INTRODUCTION TO
MACHINE LEARNING
Outline
Topics:
• The Machine Learning Framework
• What is Supervised Learning?
• What is Unsupervised Learning?
Reading Material:
Chapter 18.1 to 18.2 in Russell & Norvig
Reference Videos:
• Machine Learning course by Andrew Ng, Coursera
− What is Machine Learning [video]
− Introduction – Supervised Learning [video]
− Introduction – Unsupervised Learning [video]
• The Machine Learning Pipeline by Evan Sparks [video]
Problems with Traditional Approach
[Diagram: an input image is fed to a complex, specific “car” program, whose output classifies the image as a car. Example hand-written rules: “if this local region looks like a door handle, and if this local region looks like a wheel, classify the image as a car”]
This will work if given the same image again, but given new images, the algorithm is expected to fail.
Problems:
- Static: cannot adapt to new input
- Complex: the program becomes unwieldy (too many variations to handle)
What is Machine Learning
Learning = Improving with experience at some task
Arthur Samuel (1959)
Machine Learning: Field of study that gives
computers the ability to learn without being
explicitly programmed.
Tom Mitchell (1998)
Well-posed Learning Problem:
A computer program is said to learn from
experience E with respect to some task T and some
performance measure P, if its performance on T, as
measured by P, improves with experience E.
The Machine Learning Framework
Machine learning algorithms build models (hypothesis functions) to tackle
tasks
• Example: a straight line (linear classifier)
A model can be adjusted by modifying its parameters.
• Example: adjusting the slope and bias of a straight line allows us to split the feature
space into two partitions
Divided into two phases:
• Training phase
− Use training samples to learn the parameters of the model
• Testing phase
− Given a new sample, apply the model (learnt from the training phase) for
the intended task (regression, classification)
Since the parameters are adjustable, we need not write custom algorithms for
different tasks. We simply need to train a new model for new problems or when
the environment changes.
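The two-phase framework can be sketched with a toy linear classifier. Everything below (the perceptron-style update rule, the learning rate, and the 2-D data) is an illustrative assumption, not something prescribed by the slides:

```python
# Toy linear classifier: the training phase learns slope/bias parameters,
# the testing phase applies them to a new sample.
# Data and learning rule (perceptron updates) are illustrative assumptions.

def train(samples, labels, epochs=20, lr=0.1):
    """Training phase: learn parameters (w1, w2, b) of the line w1*x1 + w2*x2 + b = 0."""
    w1 = w2 = b = 0.0
    for _ in range(epochs):
        for (x1, x2), y in zip(samples, labels):   # y is +1 or -1
            if y * (w1 * x1 + w2 * x2 + b) <= 0:   # misclassified -> adjust parameters
                w1 += lr * y * x1
                w2 += lr * y * x2
                b  += lr * y
    return w1, w2, b

def predict(params, sample):
    """Testing phase: apply the learnt model to a new sample."""
    w1, w2, b = params
    x1, x2 = sample
    return 1 if w1 * x1 + w2 * x2 + b > 0 else -1

# Training phase: learn from labelled samples
train_x = [(1.0, 1.0), (2.0, 1.5), (4.0, 4.0), (5.0, 4.5)]
train_y = [-1, -1, 1, 1]
params = train(train_x, train_y)

# Testing phase: classify a new, unseen sample
print(predict(params, (4.5, 4.0)))  # 1
```

The same `train`/`predict` pair would be retrained, not rewritten, for a new problem.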
A Standard Machine Learning Pipeline
[Diagram: Training phase — data preparation produces a training set and an evaluation set (with ground truths, if required); training generates a model. Testing phase — the testing set (with ground truths, if available) is input to the model, which generates predictions that are then evaluated.]
Data Preparation
Involves the following activities:
• Determine useful features or information to collect
• Collect samples for training and testing (can be very labor-intensive)
• Perform data cleaning and preprocessing
Types of data:
• Text, numbers, clickstreams, images, videos, transactions, graphs,
tables, etc.
Feature extraction:
To automatically classify fish (salmon or sea
bass) on a conveyor belt
Useful features: lightness, width, number of fins,
shape of the fins, shape of the fish
Interested to know: the class of each fish on the
conveyor belt
To perform digit classification in an image:
Useful features: intensity value of each pixel in an
image
Interested to know: the digits given an image
To predict house prices:
Useful features: size of house, age of house,
number of rooms, number of toilets, location, type
of house, population size of neighborhood, freehold
or leasehold, renovation status
Interested to know : price of an unseen house
In traditional machine learning, features are
hand-picked. Deep learning learns the features
automatically from raw data
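As a small illustration of hand-picked features, a house sample might be encoded as a fixed-length feature vector; the feature names and encodings here are hypothetical:

```python
# A sample is represented as a fixed-length feature vector before learning.
# Feature names and encodings below are illustrative assumptions.
def house_to_features(house):
    return [
        house["size_sqft"],
        house["age_years"],
        house["num_rooms"],
        1.0 if house["tenure"] == "freehold" else 0.0,  # binary-encode a category
    ]

house = {"size_sqft": 1200, "age_years": 8, "num_rooms": 3, "tenure": "freehold"}
print(house_to_features(house))  # [1200, 8, 3, 1.0]
```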
Perform data preprocessing:
o Data aggregation:
Fusing data from multiple sources
o Data cleaning:
Cleaning data to remove noise and duplicate observations
o Data transformation:
o Format conversion:
Convert the format into the desired format, e.g., from free text into vector
o Discretization and binarization:
Required when a learning algorithm expects the data to be categorical
attributes
o Feature creation:
Creation of a new set of features from the original raw feature
o Data Reduction
o Feature subset selection:
Selecting a subset of features
o Dimensionality Reduction:
Removing features that are not useful; mitigates the curse of dimensionality
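A few of these preprocessing steps can be sketched in miniature; the thresholds, bin edges, and scaling choices below are illustrative assumptions:

```python
# Small sketches of three preprocessing steps; bin edges and the
# scaling scheme are illustrative assumptions, not fixed rules.

def remove_duplicates(samples):
    """Data cleaning: drop duplicate observations, keeping first occurrences."""
    seen, cleaned = set(), []
    for s in samples:
        if s not in seen:
            seen.add(s)
            cleaned.append(s)
    return cleaned

def min_max_scale(values):
    """Data transformation: rescale numeric values to [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def discretize(value, edges=(50, 100)):
    """Discretization: map a continuous value to a categorical bin index."""
    for i, edge in enumerate(edges):
        if value < edge:
            return i
    return len(edges)

print(remove_duplicates([(1, 2), (3, 4), (1, 2)]))  # [(1, 2), (3, 4)]
print(min_max_scale([10.0, 20.0, 30.0]))            # [0.0, 0.5, 1.0]
print(discretize(75))                                # 1
```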
Training
Build a machine learning model to predict the value of a
particular attribute based on the values of other attributes
There are four types of modeling tasks:
• Classification: if the target predictive variable is discrete
− Example: for manholes, will the manhole explode next year? Y/N
• Regression: if the target predictive variable is continuous
− Example: predict stock prices based on recent trends
• Clustering: if we want to group observations into similar-
looking groups
• Recommendation: if we want to recommend someone
an item, e.g., a book, movie or product based on rating
data from customers
Example (classification):
• Categorize the fish on the conveyor belt into salmon or sea bass. Let’s say our
model is a straight line that separates the two types of fish (a linear classification
model)
• Find the best straight line (decision boundary) that partitions the feature space into
2 regions, one for each type of fish.
• Need to find the best parameter values (intercept and slope)
[Figures: lightness vs. width feature spaces with labelled salmon and sea bass samples (e.g., width 19.2 / lightness 7.3, width 19.3 / lightness 1.8, width 16.4 / lightness 7.6, width 17.3 / lightness 2.2) and a linear decision boundary separating the two classes]
Testing
Apply the learnt model to new, unseen samples to perform the intended task
• The model’s parameters are fixed at this point; no further learning takes place
• Example (classification): given the features of a new fish, use the learnt
decision boundary to predict whether it is a salmon or a sea bass
• Example (regression): given the features of an unseen house, use the learnt
model to predict its price
Evaluation
This step evaluates the performance of the learnt model
How do you measure the quality of the result?
Need ground truths, which may be hard to get
Different kinds of performance measures are available, each with its
own pros and cons.
Recall
Precision
F-measure
Accuracy
Confusion Table
etc.
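These measures can all be derived from counts of true/false positives and negatives. A minimal sketch for binary labels (1 = positive, 0 = negative; the data is made up for illustration):

```python
# Compute common evaluation measures from predictions and ground truths.
# Binary labels assumed (1 = positive, 0 = negative); data is illustrative.

def evaluate(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    precision = tp / (tp + fp)                    # of predicted positives, how many are right
    recall = tp / (tp + fn)                       # of actual positives, how many were found
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / len(y_true)
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy,
            "confusion": [[tn, fp], [fn, tp]]}    # confusion table

truth = [1, 1, 1, 0, 0, 0, 0, 1]
preds = [1, 0, 1, 0, 0, 1, 0, 1]
print(evaluate(truth, preds))
```

Each measure emphasises a different kind of error, which is why no single one suffices on its own.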
Categories of Machine Learning Techniques
Supervised Learning:
Learning a model from labeled data
        Features                      Label
        length   width   weight
fruit 1   165      38      172       Banana
fruit 2   218      39      230       Banana
fruit 3    76      80      145       Orange
fruit 4   145      35      150       Banana
fruit 5    …        …       …          …
Useful for predicting the label/value of a certain attribute of an input sample
(classification/regression tasks)
Example: Predict the type of a fruit (banana/orange) given its features (length, width and
weight)
Unsupervised Learning:
Sometimes, the labels are not available
Learning a model using features only without the labels
Useful for grouping similar samples into multiple groups (clustering)
Example: Given a group of fruits and their features (length, width, weight), cluster them into
different categories
Supervised Learning
In supervised learning, the algorithm is given some example input-output pairs
and it learns a function that maps from input to output
The input is the set of features used to describe the samples
The output is the attribute (category or value) that we are interested in predicting
Types of supervised problems:
• Classification: predicts discrete-valued output (e.g., car present / not present)
• Regression: predicts continuous-valued output (e.g., house price)
[Figures: left, object detection examples (“Images with Car”) labelled Yes/No; right, housing price prediction with price (RM x1000, 0-400) plotted against size (0-2500) and a fitted line]
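For the regression case, a straight-line model price = w * size + b can be fitted with the closed-form least-squares solution; the size/price pairs below are illustrative, not real housing data:

```python
# Fit a straight line price = w * size + b by least squares (closed form).
# The size/price pairs are illustrative assumptions.
def fit_line(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # slope = covariance(x, y) / variance(x); intercept from the means
    w = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) \
        / sum((x - mx) ** 2 for x in xs)
    b = my - w * mx
    return w, b

sizes = [500.0, 1000.0, 1500.0, 2000.0]   # size
prices = [100.0, 200.0, 300.0, 400.0]     # price (RM x1000)
w, b = fit_line(sizes, prices)
print(w * 1200 + b)  # predicted price for a 1200 sq ft house: 240.0
```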
Classification: Example
Digit Classification
• Input: images / pixel grids
• Output: a digit 0-9
• Setup:
− Get a large collection of example images, each labeled with a digit
− Note: someone has to hand label all this data!
− Want to learn to predict labels of new, future digit images
• Features:
− The attributes used to make the digit decision
− Pixels: (6,8)=ON
− Shape Patterns: NumComponents, AspectRatio, NumLoops
− …
[Figure: example input images with outputs 0, 1, 2, 1, ??]
Classification: Example
Spam mail classification
• Input: an email
• Output: spam or non-spam
• Setup:
− Get a large collection of example emails, each labeled “spam” or “non-spam”
− Note: someone has to hand label all this data!
− Want to learn to predict labels of new, future emails
• Features: the attributes used to make the spam or non-spam decision
− Words: FREE!
− Text Patterns: $dd, CAPS
− Non-text: SenderInContacts
− …
[Figure: example emails, e.g., “Dear Sir. First, I must solicit your confidence in this transaction, this is by virture of its nature as being utterly confidencial and top secret. …”, “TO BE REMOVED FROM FUTURE MAILINGS, SIMPLY REPLY TO THIS MESSAGE AND PUT "REMOVE" IN THE SUBJECT.”, “99 MILLION EMAIL ADDRESSES FOR ONLY $99”, and a non-spam post: “Ok, I know this is blatantly OT but I'm beginning to go insane. Had an old Dell Dimension XPS sitting in the corner and decided to put it to use, I know it was working pre being stuck in the corner, but when I plugged it in, hit the power nothing happened.”]
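Word features like FREE! can be encoded as a presence/absence vector over a vocabulary; the vocabulary and tokenisation below are hypothetical illustrations:

```python
# Turn an email into word-presence features for a spam classifier.
# The vocabulary and the simple tokenisation are illustrative assumptions.
VOCAB = ["free", "million", "remove", "reply", "meeting"]

def email_to_features(text):
    words = set(text.lower().replace("!", " ").split())
    return [1 if w in words else 0 for w in VOCAB]

print(email_to_features("FREE! 99 MILLION EMAIL ADDRESSES"))  # [1, 1, 0, 0, 0]
```

A classifier would then be trained on these vectors together with their spam/non-spam labels.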
Regression: Example
Predicting number of comments for blog post
https://archive.ics.uci.edu/ml/datasets/BlogFeedback
Input: blog posts
Output: number of comments received for a post in the next 24 hours
Setup:
• Crawl raw HTML documents of blogs that were posted at most 72 hours before a selected basetime.
For each blog, collect the number of comments received in the next 24 hours relative to the
basetime
• Collect from different base dates/times.
• Ensure the train and test split are temporally disjoint (Training set: 2010, 2011, Test set: 2012) to
simulate the real-world situation where training data in the past is used to predict events in the
future
Features: Attributes extracted from the blog posts
• Total number of comments before basetime (C1)
• Number of comments in the last 24 hours before the basetime (C2)
• Number of comments between 24 and 48 hours before basetime (C3)
• Difference between C2 and C3
• The length of the blog post
• Bag of words for 200 frequent words of the text of the blog post
• Day of post
• …
Unsupervised Learning
Unsupervised Learning
• Unsupervised learning involves learning patterns when the training
samples are provided without outputs (no teacher)
• Uses similarity measures to detect groupings / clusters
Supervised Learning vs. Unsupervised Learning
[Figure: left, a labelled feature space (x1 vs. x2) with positive and negative samples and the hypothesis we want to learn; right, the same feature space with unlabelled samples only]
Supervised: learn the hypothesis function for the task based on the features of the training samples and their labels.
Unsupervised: discover the underlying structure, relationships or patterns based only on the features of the training samples.
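Clustering by similarity can be sketched with a minimal k-means: each point is assigned to its nearest centroid, and centroids are recomputed, with no labels involved. The data, k = 2, the fixed iteration count, and the naive initialisation are illustrative assumptions:

```python
# Minimal k-means (k = 2) on 2-D feature points; no labels are used.
# Data, k, iteration count, and initialisation are illustrative assumptions.
def kmeans(points, k=2, iters=10):
    centroids = points[:k]                     # naive initialisation
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:                       # assign each point to its nearest centroid
            d = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[d.index(min(d))].append(p)
        centroids = [tuple(sum(v) / len(c) for v in zip(*c))  # recompute centroids
                     for c in clusters if c]
    return centroids, clusters

points = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 9.5)]
centroids, clusters = kmeans(points)
print(sorted(len(c) for c in clusters))  # [2, 2]
```

The algorithm discovers the two groups purely from feature similarity, exactly the setting described above.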
Example: News Search
Unsupervised learning is used by Google News to cluster similar news stories
[Figure: a news article by CNN, with related news discovered through clustering]
Example: Gene clustering
Understanding genomics by finding clusters of people who have or do not have
certain types of genes
[Figure: heatmap of genes (rows) against individuals (columns), partitioned into Groups 1-8]
[Source: Daphne Koller]
Other Applications
Other classification tasks:
• Spam detection (input: document, classes: spam / ham)
• OCR (input: images, classes: characters)
• Medical diagnosis (input: symptoms, classes: diseases)
• Automatic essay grading (input: document, classes: grades)
• Fraud detection (input: account activity, classes: fraud / no fraud)
• … many more
Other regression tasks:
• Sociology (input: pay, qualifications; output: measure of social status of various occupations)
• Economics (input: family’s income, number of children in family; output: family consumption expenditure)
• Political science (input: measures of public opinion, institutional variables; output: state’s level of welfare spending)
• … many more
When to apply machine learning?
Problem size is too vast for our limited reasoning capacity
(e.g., large datasets from the growth of automation and the web, such as web click data,
medical records, biology)
Applications that cannot be programmed by hand where humans are unable to
explain their expertise
(e.g., Autonomous helicopter, handwriting recognition, most of Natural Language
Processing (NLP), Computer Vision)
Solution changes with time
(e.g., tracking, preferences)
Self-customizing programs
(e.g., Amazon, Netflix product recommendations)
Understanding human learning
(brain, real AI)