AI4003
Applied Machine Learning
Instructor: Dr. M. Tariq
Today’s Class
• A little about me
• Intro to Applied Machine
Learning
• Course outline and logistics
Muhammad Tariq| Professor & Head
Electrical Engineering| FAST-NUCES| Islamabad Campus
Postdoc. Princeton University| Ph.D. Waseda University| MS. Hanyang University
Fulbright Scholar USA| MEXT Scholar Japan| HEC Scholar S. Korea
Listed among Stanford's Top 2% Scientists
Course outline
Grading Policy
Assessment method: absolute grading

Assessment      Weightage %   Due When
Assignment      6%            At the end of difficult topics
Quiz            6%            After each assignment
Project         8%            After Sessional-II
Sessional I     15%           Sessional-I week
Sessional II    15%           Sessional-II week
Final Exam      50%           Final exam week
Homework details
• Implement and apply machine learning methods in Python notebooks
• Submit a report PDF and a Jupyter notebook
Learning resources
Syllabus
Recordings
Assignments
Schedule
Lecture slides and readings
Lectures
• In-person
Office hours
• Friday, 11:30 am to 1:00 pm
Readings/textbook: Forsyth, Applied Machine Learning
The lectures are not directly based on any textbook, but will point you to relevant readings from David Forsyth's Applied Machine Learning, which is our primary text, or other online resources. The AML book is quite good and worth reading, even for the parts not covered in lectures.
Academic Integrity
These are OK
• Discuss homeworks with classmates (don’t show each other code)
• Use Stack Overflow to learn how to use a Python module
• Get ideas from online (make sure to attribute the source)
Not OK
• Copying or looking at homework-specific code (i.e., claiming credit for part of an assignment based on code you didn't write)
• Using external resources (code, ideas, data) without acknowledging them
Remember
• Ask if you’re not sure if it’s ok
• You are safe as long as you acknowledge all of your sources of inspiration, code, etc. in your write-up
Other comments
Prerequisites
• Probability, linear algebra, calculus, and signals and systems
• Experience with Python will help but is not necessary; without it, expect assignments to take more time
• Watch tutorials (see schedule: intro reading)
for linear algebra, python/numpy, and
jupyter notebooks.
How is this course different from…
This course provides a foundation for ML practice, while most ML courses provide a foundation for ML research. It has less theory, derivation, and optimization, and more emphasis on applications, representations, and examples.
Should you take this course?
Take this course if …
• You want to learn how to apply machine learning
• You like coding-based homeworks and are OK with
math too
• You are willing to spend 10-12 hours per week (maybe
even more) on lectures, reading, review, and
assignments
Do not take this course if …
• You want more of a theoretical background
• You want to focus on one application domain (take
vision, NLP, or a special topics course instead)
• You want an “easy A” (it’s not going to be easy)
Feedback is welcome
• I will occasionally solicit feedback in class, directly or indirectly – please respond
• You can always talk to me after class or send me email
• My goal is to be a force multiplier on how much you can learn with a given amount of effort
What to do next
• Read the syllabus and schedule
• Unless you consider yourself highly proficient in Python/numpy and linear
algebra, watch/do the tutorials linked in the web page
General Purpose Learners
Kamath et al. 2022
What is machine learning?
• Create predictive models or useful insights from raw data
  – Alexa speech recognition
  – Amazon product recommendations
  – Tesla autopilot
  – GPT-3 text generation
  – Image generation
  – Data visualization
[Diagram: Data → Algorithm – "ML spins raw data into gold!"]
What is Machine Learning?
• "Learning is any process by which a system improves performance from experience." – Herbert Simon
• Definition by Tom Mitchell (1998): Machine Learning is the study of algorithms that
  – improve their performance P
  – at some task T
  – with experience E
• A well-defined learning task is given by <P, T, E>.
The whole machine learning problem
1. Data preparation (example: voice recognition in Alexa)
   a. Collect and curate data
   b. Annotate the data (for supervised problems)
   c. Split your data into train, validation, and test sets
2. Algorithm and model development (our focus, but it's important to understand all of it)
   a. Design methods to extract features from the data
   b. Design a machine learning model and identify key parameters and loss
   c. Train, select parameters, and evaluate your designs using the validation set
3. Final evaluation using the test set
4. Integrate into your application
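Step 1c above can be sketched in a few lines. This is a minimal illustration using synthetic data and scikit-learn's `train_test_split`; real datasets would come from your collection and annotation steps.

```python
# Sketch of step 1c: splitting data into train, validation, and test sets.
# The data here is synthetic; in practice X and y come from your curated dataset.
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))   # 1000 examples, 10 features
y = (X[:, 0] > 0).astype(int)     # toy labels

# First carve out 20% as the held-out test set (used once, in step 3).
X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)
# Then split the rest into train (75%) and validation (25%) for model selection.
X_train, X_val, y_train, y_val = train_test_split(
    X_trainval, y_trainval, test_size=0.25, random_state=0)

print(len(X_train), len(X_val), len(X_test))  # 600 200 200
```

Keeping the test set untouched until the final evaluation is what makes the step-3 number an honest estimate of generalization.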
Course objectives
• Learn how to solve problems with ML
• Key concepts and methodologies for learning from data
• Algorithms and their strengths and limitations
• Domain-specific representations
• Ability to select the right tools for the job

"The global machine learning market is expected to grow from $21.17 billion in 2022 to $209.91 billion by 2029, at a CAGR of 38.8%. With the field growing at such an exponential rate, the number of jobs is growing too, and machine learning is one of the most trending career paths of today." – Emeritus
Traditional Programming
• Data + Program → Computer → Output

Machine Learning
• Data + Output → Computer → Program
Slide credit: Pedro Domingos
When Do We Use Machine Learning?
ML is used when:
• Human expertise does not exist (navigating on Mars)
• Humans can’t explain their expertise (speech recognition)
• Models must be customized (personalized medicine)
• Models are based on huge amounts of data (genomics)
Learning isn’t always useful:
• There is no need to “learn” to calculate payroll
Based on slide by E. Alpaydin
A classic example of a task that requires machine learning:
It is very hard to say what makes a 2
Slide credit: Geoffrey Hinton
Some more examples of tasks that are best
solved by using a learning algorithm
• Recognizing patterns: facial identities or facial expressions; handwritten or spoken words; medical images
• Generating patterns: generating images or motion sequences
• Recognizing anomalies: unusual credit card transactions; unusual patterns of sensor readings in a nuclear power plant
• Prediction: future stock prices or currency exchange rates
Slide credit: Geoffrey Hinton
Sample Applications
• Web search
• Computational biology
• Finance
• E-commerce
• Space exploration
• Robotics
• Information extraction
• Social networks
• Debugging software
• [Your favorite area]
Slide credit: Pedro Domingos
State of the Art Applications of
Machine Learning
Autonomous Cars
• Nevada made it legal for
autonomous cars to drive on
roads in June 2011
• As of 2013, four states (Nevada,
Florida, California, and
Michigan) have legalized
autonomous cars
Penn’s Autonomous Car →
(Ben Franklin Racing Team)
Autonomous Car Sensors
Autonomous Car Technology
• Path planning
• Laser terrain mapping
• Learning from human drivers
• Adaptive vision
(Stanley, Sebastian Thrun's autonomous car)
Images and movies taken from Sebastian Thrun's multimedia website.
Deep Learning in the Headlines
Deep Belief Net on Face Images
• object models
• object parts (combinations of edges)
• edges
• pixels
Based on materials by Andrew Ng
Learning of Object Parts
Slide credit: Andrew Ng
Training on Multiple Objects
• Trained on 4 classes (cars, faces,
motorbikes, airplanes).
• Second layer: Shared-features and
object-specific features.
• Third layer: More specific features.
Slide credit: Andrew Ng
Scene Labeling via Deep Learning
[Farabet et al., ICML 2012; PAMI 2013]
Inference from Deep Learned Models
Generating posterior samples from faces by “filling in” experiments
(cf. Lee and Mumford, 2003). Combine bottom-up and top-down inference.
• Input images
• Samples from feedforward inference (control)
• Samples from full posterior inference
Slide credit: Andrew Ng
Machine Learning in
Automatic Speech Recognition
A Typical Speech Recognition System
ML is used to predict phone states from the sound spectrogram.
Deep learning has state-of-the-art results:

# Hidden Layers     1     2     4     8     10    12
Word Error Rate %   16.0  12.8  11.4  10.9  11.0  11.1

Baseline GMM performance = 15.4%
[Zeiler et al., "On rectified linear units for speech recognition", ICASSP 2013]
Impact of Deep Learning in Speech Technology
Slide credit: Li Deng, MS Research
Fake Videos
• Cheapfake: video frames slowed down to make Nancy Pelosi's speech appear slurred and drunk
• Deepfake: a puppet-mastered deepfake transfers the source's head movements and facial expressions onto Putin's face
Image credit: Washington Post
Types of Learning
Types of Learning
• Supervised (inductive) learning
– Given: training data + desired outputs (labels)
• Unsupervised learning
– Given: training data (without desired outputs)
• Semi-supervised learning
– Given: training data + a few desired outputs
• Reinforcement learning
– Rewards from sequence of actions
Based on slide by Pedro Domingos
Supervised Learning: Regression
• Given (x1, y1), (x2, y2), ..., (xn, yn)
• Learn a function f (x) to predict y given x
– y is real-valued == regression
[Plot: September Arctic Sea Ice Extent (1,000,000 sq km) vs. year, 1970–2020, showing a declining trend]
Data from G. Witt. Journal of Statistics Education, Volume 21, Number 1 (2013)
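A minimal regression sketch in the spirit of this slide: fit a line to (year, extent) pairs. The numbers below are invented stand-ins for a declining trend, not the actual Witt dataset.

```python
# Regression sketch: fit f(x) = w*x + b to toy (year, extent) pairs.
# Toy values only, loosely mimicking a declining trend.
import numpy as np

years = np.array([1980, 1990, 2000, 2010, 2020], dtype=float)
extent = np.array([7.8, 6.2, 6.4, 4.9, 3.9])  # million sq km (made up)

# Least-squares fit of a degree-1 polynomial; w is the slope, b the intercept.
w, b = np.polyfit(years, extent, deg=1)

def predict(year):
    return w * year + b

print(predict(2025))  # extrapolated extent for 2025 (a real-valued prediction)
```

Because y here is real-valued, this is regression: the learned f(x) outputs a number, not a category.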
Supervised Learning: Classification
• Given (x1, y1), (x2, y2), ..., (xn, yn)
• Learn a function f (x) to predict y given x
– y is categorical == classification
Breast Cancer (Malignant / Benign)
[Plot: label 1 (malignant) or 0 (benign) vs. tumor size]
Based on example by Andrew Ng
Supervised Learning: Classification
• Given (x1, y1), (x2, y2), ..., (xn, yn)
• Learn a function f (x) to predict y given x
– y is categorical == classification
Breast Cancer (Malignant / Benign)
[Plot: label 1 (malignant) or 0 (benign) vs. tumor size, with a threshold: predict benign below it, predict malignant above it]
Based on example by Andrew Ng
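A sketch of such a classifier: logistic regression on tumor size alone. The sizes and labels below are invented for illustration; a real dataset would have many more examples and features.

```python
# Classification sketch: predict malignant (1) vs. benign (0) from tumor size.
# All data below is made up for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression

sizes = np.array([[1.0], [1.5], [2.0], [2.5], [3.5], [4.0], [4.5], [5.0]])
labels = np.array([0, 0, 0, 0, 1, 1, 1, 1])  # 0 = benign, 1 = malignant

clf = LogisticRegression().fit(sizes, labels)

print(clf.predict([[1.2], [4.8]]))   # small tumor vs. large tumor
print(clf.predict_proba([[3.0]]))    # probabilities near the decision boundary
```

Because y is categorical, this is classification: the model outputs a class (and a probability), and the learned threshold plays the role of the "predict benign / predict malignant" boundary on the slide.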
Supervised Learning
• x can be multi-dimensional
– Each dimension corresponds to an attribute, e.g.:
  - Clump Thickness
  - Uniformity of Cell Size
  - Uniformity of Cell Shape
  - …
[Plot: age vs. tumor size]
Based on example by Andrew Ng
Unsupervised Learning
• Given x1, x2, ..., x n (without labels)
• Output hidden structure behind the x’s
– E.g., clustering
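Clustering, the example above, can be sketched with k-means on two synthetic blobs; no labels are given, and the algorithm discovers the grouping on its own (data and parameters below are invented for illustration).

```python
# Clustering sketch: group unlabeled points with k-means.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Two well-separated synthetic blobs, with no labels attached.
X = np.vstack([rng.normal(0.0, 0.5, (50, 2)),
               rng.normal(5.0, 0.5, (50, 2))])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

print(km.cluster_centers_)  # roughly the two blob centers, (0, 0) and (5, 5)
```

The hidden structure (two groups) is recovered from x alone, which is exactly what distinguishes unsupervised from supervised learning.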
Unsupervised Learning
Genomics application: group individuals by genetic similarity
[Heatmap: genes × individuals, grouped by genetic similarity]
[Source: Daphne Koller]
Unsupervised Learning
• Organize computing clusters
• Social network analysis
• Market segmentation
• Astronomical data analysis
Image credit: NASA/JPL-Caltech/E. Churchwell (Univ. of Wisconsin, Madison)
Slide credit: Andrew Ng
Unsupervised Learning
• Independent component analysis – separate a
combined signal into its original sources
Image credit: statsoft.com Audio from http://www.ism.ac.jp/~shiro/research/blindsep.html
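ICA can be sketched with two synthetic sources (not the audio from the link above): mix them linearly, then ask FastICA to recover the originals.

```python
# ICA sketch: unmix two linearly mixed synthetic source signals.
import numpy as np
from sklearn.decomposition import FastICA

t = np.linspace(0, 1, 2000)
s1 = np.sin(2 * np.pi * 5 * t)             # source 1: a sine wave
s2 = np.sign(np.sin(2 * np.pi * 3 * t))    # source 2: a square wave
S = np.c_[s1, s2]

A = np.array([[1.0, 0.5],                  # mixing matrix (made up)
              [0.5, 1.0]])
X = S @ A.T                                # observed mixtures (what we record)

ica = FastICA(n_components=2, random_state=0)
recovered = ica.fit_transform(X)           # estimated sources
print(recovered.shape)                     # (2000, 2)
```

ICA recovers the sources only up to scale, sign, and ordering; those ambiguities are inherent to the blind source separation problem.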
Reinforcement Learning
• Given a sequence of states and actions with
(delayed) rewards, output a policy
– Policy is a mapping from states → actions that
tells you what to do in a given state
• Examples:
– Credit assignment problem
– Game playing
– Robot in a maze
– Balance a pole on your hand
The Agent-Environment Interface
Agent and environment interact at discrete time steps t = 0, 1, 2, …
• The agent observes the state at step t: s_t ∈ S
• produces an action at step t: a_t ∈ A(s_t)
• gets the resulting reward r_{t+1}
• and the resulting next state s_{t+1}
[Diagram: … s_t → a_t → (r_{t+1}, s_{t+1}) → a_{t+1} → (r_{t+2}, s_{t+2}) → …]
Slide credit: Sutton & Barto
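The interaction loop above can be sketched on a toy one-dimensional environment. Everything here (the dynamics, rewards, and the random policy) is invented for illustration; a real RL agent would learn its policy from the rewards instead of acting randomly.

```python
# Agent-environment loop sketch: s_t, a_t, r_{t+1}, s_{t+1} on a toy 1-D walk.
# The agent starts at position 0; position 4 is the goal.
import random

def step(state, action):
    """Toy environment dynamics: move left/right on positions 0..4."""
    next_state = max(0, min(4, state + action))
    reward = 10 if next_state == 4 else -1   # goal bonus, per-step cost
    return next_state, reward

random.seed(0)
state, total_reward = 0, 0
for t in range(20):
    action = random.choice([-1, +1])         # a random policy, for illustration
    state, reward = step(state, action)      # environment returns r_{t+1}, s_{t+1}
    total_reward += reward
    if state == 4:                           # episode ends at the goal
        break

print(state, total_reward)
```

A learned policy would map each state to the action that maximizes expected cumulative reward, rather than choosing at random.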
Reinforcement Learning
https://www.youtube.com/watch?v=4cgWya-wjgY
Inverse Reinforcement Learning
• Learn policy from user demonstrations
Stanford Autonomous Helicopter
http://heli.stanford.edu/
https://www.youtube.com/watch?v=VCdxqn0fcnE
Framing a Learning Problem
Designing a Learning System
• Choose the training experience
• Choose exactly what is to be learned
– i.e. the target function
• Choose how to represent the target function
• Choose a learning algorithm to infer the target
function from the experience
[Diagram: Environment/Experience → training data → Learner → Knowledge → Performance Element, which is evaluated on testing data]
Based on slide by Ray Mooney
Training vs. Test Distribution
• We generally assume that the training and
test examples are independently drawn from
the same overall distribution of data
– We call this “i.i.d” which stands for “independent
and identically distributed”
• If examples are not independent, requires
collective classification
• If test distribution is different, requires
transfer learning
Slide credit: Ray Mooney
ML in a Nutshell
• Tens of thousands of machine learning algorithms; hundreds of new ones every year
• Every ML algorithm has three components:
  – Representation: what the model looks like; how knowledge is represented
  – Optimization: the process for finding good models; how programs are generated
  – Evaluation: how good models are differentiated; how programs are evaluated
Various Function Representations
• Numerical functions
– Linear regression
– Neural networks
– Support vector machines
• Symbolic functions
– Decision trees
– Rules in propositional logic
– Rules in first-order predicate logic
• Instance-based functions
– Nearest-neighbor
– Case-based
• Probabilistic Graphical Models
– Naïve Bayes
– Bayesian networks
– Hidden-Markov Models (HMMs)
– Probabilistic Context Free Grammars (PCFGs)
– Markov networks
Slide credit: Ray Mooney
Various Search/Optimization
Algorithms
• Gradient descent
– Perceptron
– Backpropagation
• Dynamic Programming
– HMM Learning
– PCFG Learning
• Divide and Conquer
– Decision tree induction
– Rule learning
• Evolutionary Computation
– Genetic Algorithms (GAs)
– Genetic Programming (GP)
– Neuro-evolution
Slide credit: Ray Mooney
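The first family above, gradient descent, fits in a few lines. Here is a sketch minimizing a toy one-dimensional function; the function, learning rate, and step count are chosen purely for illustration.

```python
# Gradient descent sketch: minimize f(w) = (w - 3)^2 by stepping along -f'(w).

def grad(w):
    """Derivative of f(w) = (w - 3)^2, i.e. f'(w) = 2*(w - 3)."""
    return 2 * (w - 3)

w = 0.0          # initial guess
lr = 0.1         # learning rate (step size)
for _ in range(100):
    w -= lr * grad(w)   # step downhill

print(round(w, 4))  # 3.0 -- converges to the minimizer w = 3
```

Perceptron training and backpropagation are this same idea applied to model parameters, with the gradient computed through the model's loss.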
Evaluation
• Accuracy
• Precision and recall
• Squared error
• Likelihood
• Posterior probability
• Cost / Utility
• Margin
• Entropy
• K-L divergence
• etc.
Slide credit: Pedro Domingos
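The first few metrics above can be computed directly from true vs. predicted labels. The labels below are toy values for illustration.

```python
# Evaluation sketch: accuracy, precision, and recall on toy predictions.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

# Count true positives, false positives, and false negatives.
tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
precision = tp / (tp + fp)   # of the predicted positives, how many were right
recall = tp / (tp + fn)      # of the actual positives, how many were found

print(accuracy, precision, recall)  # 0.75 0.75 0.75
```

Which metric matters depends on the task: precision when false alarms are costly, recall when misses are costly.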
ML in Practice
• Understand domain, prior knowledge, and goals
• Loop:
  – Data integration, selection, cleaning, pre-processing, etc.
  – Learn models
  – Interpret results
• Consolidate and deploy discovered knowledge
Based on a slide by Pedro Domingos
Lessons Learned about Learning
• Learning can be viewed as using direct or indirect
experience to approximate a chosen target function.
• Function approximation can be viewed as a search
through a space of hypotheses (representations of
functions) for one that best fits a set of training data.
• Different learning methods assume different
hypothesis spaces (representation languages) and/or
employ different search techniques.
Slide credit: Ray Mooney
A Brief History of
Machine Learning
History of Machine Learning
• 1950s
– Samuel’s checker player
– Selfridge’s Pandemonium
• 1960s:
– Neural networks: Perceptron
– Pattern recognition
– Learning in the limit theory
– Minsky and Papert prove limitations of Perceptron
• 1970s:
– Symbolic concept induction
– Winston’s arch learner
– Expert systems and the knowledge acquisition bottleneck
– Quinlan’s ID3
– Michalski’s AQ and soybean diagnosis
– Scientific discovery with BACON
– Mathematical discovery with AM
Slide credit: Ray Mooney
History of Machine Learning (cont.)
• 1980s:
– Advanced decision tree and rule learning
– Explanation-based Learning (EBL)
– Learning and planning and problem solving
– Utility problem
– Analogy
– Cognitive architectures
– Resurgence of neural networks (connectionism, backpropagation)
– Valiant’s PAC Learning Theory
– Focus on experimental methodology
• 1990s
– Data mining
– Adaptive software agents and web applications
– Text learning
– Reinforcement learning (RL)
– Inductive Logic Programming (ILP)
– Ensembles: Bagging, Boosting, and Stacking
– Bayes Net learning
Slide credit: Ray Mooney
History of Machine Learning (cont.)
• 2000s
– Support vector machines & kernel methods
– Graphical models
– Statistical relational learning
– Transfer learning
– Sequence labeling
– Collective classification and structured outputs
– Computer Systems Applications (Compilers, Debugging, Graphics, Security)
– E-mail management
– Personalized assistants that learn
– Learning in robotics and vision
• 2010s
– Deep learning systems
– Learning for big data
– Bayesian methods
– Multi-task & lifelong learning
– Applications to vision, speech, social networks, learning to read, etc.
– ???
Based on slide by Ray Mooney
What We’ll Cover in this Course
• Supervised learning
  – Decision tree induction
  – Linear regression
  – Logistic regression
  – Support vector machines & kernel methods
  – Model ensembles
  – Bayesian learning
  – Neural networks & deep learning
  – Learning theory
• Unsupervised learning
  – Clustering
  – Dimensionality reduction
• Reinforcement learning
  – Temporal difference learning
  – Q learning
• Evaluation
• Applications

Our focus will be on applying machine learning to real applications.