10/23/2023
Introduction to
Machine Learning
Dr. Muhammad Amjad Iqbal
Associate Professor
University of Central Punjab, Lahore.
Amjad.iqbal@ucp.edu.pk
https://sites.google.com/a/ucp.edu.pk/iml/
Slides of Prof. Dr. Andrew Ng, Stanford
Assessment
Scale: Bad, Average, Good, Very Good
• How good are you at programming?
– Think of the “Data Structures” course.
• How good are you at Mathematics?
– Think of the “Discrete Structures” course.
• How good are you at Probability and Statistics?
Course Logistics
• Two lectures per week
• Quizzes (15% marks)
• Assignments (15% marks)
• Project (10% marks)
• 2 exams
– Mid Term Exam (20% marks)
– Final Exam (40% marks)
• Covers the whole course
Course Logistics Cont.
• Course material will be posted on:
https://cms.ucp.edu.pk/
• What is on the website:
– Course Handbook, Lectures, Assignments
Visit the website frequently
Course Logistics Cont.
• Plagiarism
– Copying someone else’s work (partial or complete) and
submitting it as if it were one’s own
– Read course handbook to know more about plagiarism
– Zero tolerance for plagiarism
• You’ll upload assignments to MS Teams or a similar
platform
– Assignments will be checked for plagiarism using Moss
or JPlag
– If plagiarism is found, everyone involved will get zero in that
assignment
Course Objectives
• To introduce the basic concepts of Machine
Learning.
• To make students understand the use of machine
learning approaches to solve some laboratory
problems initially and real world problems later on.
• To equip students with structures and strategies for
complex problem solving
• To learn by doing, using Python.
• To excite you about the field.
Reference Books
• No single textbook
• Witten, Frank and Hall. Data Mining: Practical
Machine Learning Tools and Techniques, 3rd Edition.
• Christopher M. Bishop. Pattern Recognition and
Machine Learning. Springer.
• Ethem Alpaydin. Introduction to Machine Learning, 2nd
Edition.
• T. Mitchell. Machine Learning. WCB/McGraw-Hill,
Boston, 1997.
Reference Books
• Stuart Russell and Peter Norvig. Artificial
Intelligence: A Modern Approach, 3rd Edition.
• David Barber. Bayesian Reasoning and Machine
Learning.
• Data Mining: A Knowledge Discovery Approach.
Springer.
• All available in pdf at ucpshares
Motivation
• Machine Learning is one of the most exciting
areas
• It's everywhere
SPAM
www.imdb.com
www.amazon.com
Machine Learning
• Grew out of work in AI
• Aim: building intelligent machines
• What we knew already: how to program a machine to find the
shortest path from A to B (for example)
• What we did not know: how to write AI programs that
do more interesting things like web search, photo
tagging, email anti-spam, driverless cars, etc.
• Realization: let the machine learn to do it by itself
• Machine learning was developed as a new capability
for computers
• Today it touches many segments of industry and
science
ML application areas: Database mining
• Large datasets from growth of automation/web
• One of the reasons that ML has become so
widespread
– Web click data
• Many companies collect clickstream data for
mining purposes
• To understand users better with machine
learning algorithms
• A huge segment of the software industry is currently
working on it
ML application areas: Database mining
• Electronic Medical records
– Trying to turn medical records into medical
knowledge, to understand disease better.
• An evaluation of machine-learning methods for
predicting pneumonia mortality
– G. F. Cooper et al. 1997
• Artificial Intelligence in Medicine, 9(2) 107-138
ML application areas: Database mining
• Computational biology
– Biologists collecting lots of data about gene
sequences, DNA sequences, etc.
– ML algorithms are giving us a much better
understanding of the human genome
• Engineering
ML application areas:
• Applications we can’t program by hand.
– Autonomous helicopter, Google driverless car
• Learns to do it by itself
– Handwriting recognition
• Postal Mail: A learning algorithm that has learned how to
read postal code in your handwriting (US mail)
– Most of Natural Language Processing (NLP)
and Computer Vision today
• Applied Machine learning
ML application areas:
• Self-customizing programs
– Amazon, IMDB, Youtube recommendations
• Understanding human learning (brain, cognition)
– Learning algorithms are being used today to
understand human learning and to
understand the brain.
Machine learning is a highly desirable skill in the IT
industry and in Computer Science research
Machine Learning definition
• Arthur Samuel (1959). Machine Learning:
• Field of study that gives computers the ability to
learn without being explicitly programmed.
• Tom Mitchell (1998) Well-posed Learning Problem:
A computer program is said to learn from
experience E with respect to some task T and some
performance measure P, if its performance on T, as
measured by P, improves with experience E.
“A computer program is said to learn from experience E with
respect to some task T and some performance measure P, if
its performance on T, as measured by P, improves with
experience E.”
Suppose your email program watches which emails you do or do
not mark as spam, and based on that learns how to better filter
spam. What is the task T in this setting?
• Classifying emails as spam or not spam. (This is the task T.)
• Watching you label emails as spam or not spam. (This is the experience E.)
• The number (or fraction) of emails correctly classified as spam/not spam. (This is the performance measure P.)
• None of the above: this is not a machine learning problem.
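To make Mitchell's definition concrete, here is a minimal sketch of the spam example. The word-scoring rule, the function names, and the tiny email lists are all invented for illustration; a real spam filter is far more elaborate. The point is only to show T (classifying), E (labeled emails) and P (accuracy on held-out emails) as separate pieces, with P improving as E grows.

```python
from collections import Counter


def train(labeled_emails):
    """Experience E: build per-word scores from (text, is_spam) pairs."""
    scores = Counter()
    for text, is_spam in labeled_emails:
        for word in text.split():
            scores[word] += 1 if is_spam else -1
    return scores


def classify(scores, text):
    """Task T: label an email as spam (True) or not spam (False)."""
    return sum(scores[word] for word in text.split()) > 0


def accuracy(scores, labeled_emails):
    """Performance measure P: fraction of emails classified correctly."""
    correct = sum(classify(scores, text) == label for text, label in labeled_emails)
    return correct / len(labeled_emails)


# Hypothetical emails the user has already marked (the experience).
experience = [
    ("win money now", True),
    ("meeting at noon", False),
    ("cheap pills offer", True),
    ("lunch tomorrow", False),
]

# Hypothetical held-out emails used only to measure performance.
held_out = [
    ("win cheap money", True),
    ("offer cheap pills", True),
    ("meeting tomorrow noon", False),
    ("lunch at noon", False),
]

# Measure P after seeing 0, 2 and 4 labeled emails.
accuracies = [accuracy(train(experience[:n]), held_out) for n in (0, 2, 4)]
print(accuracies)
```

On this toy data the accuracy rises as more labeled emails are seen, which is exactly what "learning" means in Mitchell's sense.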
Topics
Machine learning algorithms:
- Supervised learning
- Unsupervised learning
- Reinforcement learning
- Bayesian Networks
- Hidden Markov Models
Background Topics:
- Linear Algebra
- Probability
Also talk about: Practical advice for applying learning
algorithms. Implementation in Octave
Supervised Learning
• Probably the most common type of machine
learning problem
• Let us introduce it with an example
Housing price prediction.
[Figure: price ($, in 1000’s) versus size (feet²), with a quadratic function (second-order polynomial) as one possible fit; a size of 750 feet² is marked on the horizontal axis]
Supervised Learning: “right answers” given
Regression: Predict continuous valued output (price)
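A minimal sketch of the housing example in plain Python follows. The five data points are made up for illustration (they are not read off the slide's plot), sizes are expressed in thousands of square feet to keep the arithmetic well behaved, and `fit_polynomial` and `predict` are hypothetical helper names. It fits both a straight line and a second-order polynomial by least squares and predicts the price of a 750 square foot house.

```python
def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations.

    Returns coefficients [c0, c1, ..., c_degree] of c0 + c1*x + c2*x^2 + ...
    """
    n = degree + 1
    # Normal equations A c = b, with A[i][j] = sum(x^(i+j)) and b[i] = sum(y * x^i).
    a = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]

    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]

    # Back substitution.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        known = sum(a[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (b[i] - known) / a[i][i]
    return coeffs


def predict(coeffs, x):
    """Evaluate the fitted polynomial at x."""
    return sum(c * x ** i for i, c in enumerate(coeffs))


# Made-up training set: size in 1000s of square feet, price in $1000s.
sizes = [0.5, 1.0, 1.5, 2.0, 2.5]
prices = [110.0, 190.0, 250.0, 290.0, 310.0]

line = fit_polynomial(sizes, prices, degree=1)
quadratic = fit_polynomial(sizes, prices, degree=2)

query = 0.75  # a 750 square foot house
print("straight line:", predict(line, query))
print("quadratic:    ", predict(quadratic, query))
```

The two models give different answers for the same house, which is why choosing between a line and a quadratic matters. In practice a library routine would normally replace the hand-written solver.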
Another supervised learning example
Cancer (malignant, benign)
Classification: discrete-valued output (0 or 1)
With more than two classes, the output can be 0, 1, 2, 3, …
(e.g., Benign, T1, T2, T3, …)
Slightly different set of symbols to plot this data with 2 features
Features
- Tumor Thickness
- Uniformity of Cell Size
- Uniformity of Cell Shape
…
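As a rough illustration of classification with two features, here is a sketch of a k-nearest-neighbors classifier (one of the supervised methods listed for this course). The feature values are invented on an assumed 1 to 10 scale for tumor thickness and uniformity of cell size; they are not real patient data, and `knn_predict` is a hypothetical helper name.

```python
import math
from collections import Counter


def knn_predict(training_set, query, k=3):
    """Return the majority label among the k training points nearest to query."""
    neighbors = sorted(training_set, key=lambda example: math.dist(example[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Invented examples: (tumor thickness, uniformity of cell size) -> label.
training_set = [
    ((1, 1), "benign"),
    ((2, 1), "benign"),
    ((1, 2), "benign"),
    ((3, 2), "benign"),
    ((8, 7), "malignant"),
    ((7, 9), "malignant"),
    ((9, 8), "malignant"),
    ((6, 8), "malignant"),
]

print(knn_predict(training_set, (2, 2)))
print(knn_predict(training_set, (8, 8)))
```

The output is one of a fixed set of labels rather than a number on a continuous scale, which is what separates this from the housing price example.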
What do we do if we have an infinite number of
features?
The Support Vector Machine (SVM) algorithm can
deal with an infinite number of features by using
a neat mathematical trick (the kernel trick)
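The trick can be sketched on a small case. A kernel function computes the inner product of two points in an expanded feature space without ever building the expanded features. The sketch below (function names are illustrative, not from any library) checks this for a degree-2 polynomial kernel, where the expanded space has only three features, and also defines the Gaussian (RBF) kernel, whose implicit feature space is infinite-dimensional.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def explicit_features(x):
    """Explicit degree-2 feature map for a 2-D point (x1, x2)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)


def poly_kernel(x, z):
    """Same inner product as above, computed directly from the raw 2-D points."""
    return dot(x, z) ** 2


def rbf_kernel(x, z, gamma=0.5):
    """Gaussian kernel; its implicit feature space is infinite-dimensional."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)


x, z = (1.0, 2.0), (3.0, 4.0)
print(dot(explicit_features(x), explicit_features(z)))  # via expanded features
print(poly_kernel(x, z))                                # via the kernel
print(rbf_kernel(x, z))
```

Because an SVM only needs inner products between examples, swapping in a kernel lets it work in the expanded space at the cost of the original one.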
In supervised learning:
• For every example in our data set, we are told
the “correct answer”.
• Data is labeled with answers
• Classification: the output is binary or one of a fixed
number of classes. Ex. something is either a
chair or not.
• Regression: the output is continuous. Ex. our
prediction for tomorrow’s temperature might be
13 degrees.
Problem 1: You have a large inventory of identical items.
You want to predict how many of these items will sell over
the next 3 months.
Problem 2: You’d like software to examine individual
customer accounts, and for each account decide if it has
been hacked/compromised.
Classification or regression problems?
1. Treat both as classification problems.
2. Treat problem 1 as a classification problem, problem 2
as a regression problem.
3. Treat problem 1 as a regression problem, problem 2
as a classification problem.
4. Treat both as regression problems.
Supervised Learning Topics
• Regression
• Support Vector Machines
• Neural Networks
• Bayesian Learning
• K Nearest Neighbors
• Decision Trees
• etc.
Unsupervised Learning
• Data without “right answers”
• Data doesn't have any labels
• We're just told, here is a data set!
• Can you find some structure in the data?
Supervised Learning
[Figure: data set plotted on axes x1 and x2]
Unsupervised Learning
[Figure: data set plotted on axes x1 and x2, grouped by a clustering algorithm]
DNA microarray data to understand genomics
[Figure: color grid with genes on one axis and individuals on the other]
Colors show the degree to which different individuals do or do not have a
specific gene.
[Source: Daphne Koller]
• Cluster individuals into different categories or into different types of people.
[Source: Daphne Koller]
Organize computing clusters
• Figure out which machines tend to work together
using clustering algorithm
• Then put those machines together, to make data
center work more efficiently.
Social network analysis
Market segmentation
Image credit: NASA/JPL-Caltech/E. Churchwell (Univ. of Wisconsin, Madison)
Astronomical data analysis
Of the following examples, which would you address
using an unsupervised learning algorithm?
(select all that apply.)
1. Given email labeled as spam/not spam, learn a spam filter.
2. Given a set of news articles found on the web, group them
into sets of articles about the same story.
3. Given a database of customer data, automatically discover
market segments and group customers into different market
segments.
4. Given a dataset of patients diagnosed as either having
diabetes or not, learn to classify new patients as having
diabetes or not.
Unsupervised Learning
• K-means Clustering
• Apriori Algorithm
• Self-organizing Maps
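To give a feel for how a clustering algorithm finds structure in unlabeled data, here is a minimal K-means sketch in plain Python. The six 2-D points and the choice of starting centroids are invented for illustration, and `kmeans` is a hypothetical helper; it alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its points.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Basic K-means with a fixed number of iterations.

    Returns the final centroids and the clusters from the last assignment step.
    """
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for point in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(point, centroids[i]))
            clusters[nearest].append(point)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


# Invented, unlabeled data: two visually separate groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 9.0), (8.0, 9.5)]

# Assumption: start from one point in each region; real code often picks randomly.
centroids, clusters = kmeans(points, centroids=[points[0], points[3]])
print(centroids)
print(clusters)
```

No labels are supplied anywhere; the grouping comes only from distances between the points.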
Reinforcement learning
• Refers to problems where we don't do one-shot
decision-making
• E.g., in the supervised learning cancer prediction
problem, we have a patient and predict whether the tumor
is malignant or benign. Later we find out whether we
got it right or wrong.
• In reinforcement learning problems, we usually
have to make a sequence of decisions over time.
Reinforcement learning
• Examples: autonomous helicopter, driverless car
• Cannot program by hand
• Stochastic environment: too many possibilities
• The basic idea is to define a “reward function”
• Q-learning
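The reward-function idea can be sketched with tabular Q-learning on a toy problem. Everything about the environment here is an assumption made for illustration: a five-state corridor, a reward of 1 for reaching the rightmost state, purely random exploration, and the values chosen for `alpha` and `gamma`. It is far simpler than a helicopter or a car, but it shows a sequence of decisions being learned from reward alone.

```python
import random

N_STATES = 5            # states 0..4 in a corridor; state 4 is the goal
GOAL = N_STATES - 1
ACTIONS = (0, 1)        # 0 = move left, 1 = move right


def step(state, action):
    """Toy environment: move along the corridor; reward 1.0 on reaching the goal."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward


def q_learning(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    """Tabular Q-learning with a uniformly random behavior policy."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action]
    for _ in range(episodes):
        state = 0
        while state != GOAL:
            action = rng.choice(ACTIONS)
            next_state, reward = step(state, action)
            # The goal row of q stays at zero, so the target there is just the reward.
            target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


q = q_learning()
# Greedy policy: in each non-goal state, pick the action with the larger Q-value.
policy = [max(ACTIONS, key=lambda a: q[s][a]) for s in range(GOAL)]
print(policy)
```

Nobody tells the program which way to move in each state; it only sees the reward at the end, and the preference for moving right propagates backwards through the Q-values.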
Probabilistic reasoning
• Judea Pearl received the 2011 Turing Award (often called
the Nobel Prize of Computer Science) for his work on
probabilistic reasoning, including Bayesian networks
END