Machine Learning with MATLAB
Stefan Duprey, Application Engineer
stefan.duprey@mathworks.fr
© 2013 The MathWorks, Inc.
What You Will Learn
Overview of machine learning
Algorithms available with MATLAB
MATLAB as an interactive environment
for evaluating and choosing the best algorithm
Machine Learning
Basic Concepts
Start with an initial set of data
Learn from this data
Train your algorithm
with this data
Use the resulting model
to predict outcomes
for new data sets
[Figure: plot of data points in Group1 to Group8]
Machine Learning
Characteristics and Examples
Characteristics
Lots of data (many variables)
System too complex to know
the governing equation
(e.g., black-box modeling)
Examples
Pattern recognition (speech, images)
Financial algorithms (credit scoring, algo trading)
Energy forecasting (load, price)
Biology (tumor detection, drug discovery)
Credit rating transition probabilities (row: starting rating, column: ending rating)
        AAA      AA       A        BBB      BB       B        CCC      D
AAA     93.68%   5.55%    0.59%    0.18%    0.00%    0.00%    0.00%    0.00%
AA      2.44%    92.60%   4.03%    0.73%    0.15%    0.00%    0.00%    0.06%
A       0.14%    4.18%    91.02%   3.90%    0.60%    0.08%    0.00%    0.08%
BBB     0.03%    0.23%    7.49%    87.86%   3.78%    0.39%    0.06%    0.16%
BB      0.03%    0.12%    0.73%    8.27%    86.74%   3.28%    0.18%    0.64%
B       0.00%    0.00%    0.11%    0.82%    9.64%    85.37%   2.41%    1.64%
CCC     0.00%    0.00%    0.00%    0.37%    1.84%    6.24%    81.88%   9.67%
D       0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    100.00%
Model Development Process
Exploration
Modeling
Evaluation
Deployment
Exploratory Data Analysis
Gain insight from visual examination
Identify trends and interactions
Detect patterns
Remove outliers
Shrink data
Select and pare predictors
Feature transformation
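The visual examination described above can be sketched with a grouped scatter-plot matrix. This is only an illustration: it assumes the `carsmall` sample data set from Statistics Toolbox (the variable names on the slide suggest similar car data, but that is a guess), and grouping by number of cylinders is a choice made here, not something the slide states.

```matlab
% Sketch: grouped scatter-plot matrix for exploratory analysis.
% Assumption: the Statistics Toolbox sample data set carsmall is available.
load carsmall   % provides MPG, Acceleration, Displacement, Weight, Horsepower, Cylinders, ...

X = [MPG Acceleration Displacement Weight Horsepower];
names = {'MPG','Acceleration','Displacement','Weight','Horsepower'};

% Drop cars with missing values so every panel shows the same observations.
keep = all(~isnan(X), 2);
X = X(keep, :);
cyl = Cylinders(keep);

% One panel per pair of variables, colored by number of cylinders,
% with histograms on the diagonal.
figure
gplotmatrix(X, [], cyl, [], [], [], 'on', 'hist', names);
```

Trends (for example, MPG falling as Weight rises) and unusual points tend to stand out in such a matrix before any model is fitted.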
[Figure: plot matrix of MPG, Acceleration, Displacement, Weight, and Horsepower]
Data Exploration
Interactions Between Variables
Plot Matrix by Group
Andrews Plot
Parallel Coordinates Plot
Glyph Plot
Chernoff Faces
[Figures: each plot type drawn for MPG, Acceleration, Displacement, Weight, and Horsepower; the glyph and face plots label individual cars such as chevrolet chevelle malibu, buick skylark 320, and plymouth satellite]
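The multivariate plot types named on this slide each have a Statistics Toolbox function. The sketch below is a hedged illustration on the `carsmall` sample data set (an assumption; the slide does not say which data set was used), with grouping by cylinder count chosen here for illustration.

```matlab
% Sketch: multivariate visualizations of car data.
% Assumption: the Statistics Toolbox sample data set carsmall is available.
load carsmall
X = [MPG Acceleration Displacement Weight Horsepower];
names = {'MPG','Acceleration','Displacement','Weight','Horsepower'};

keep = all(~isnan(X), 2);            % remove cars with missing values
X = X(keep, :);
cyl = Cylinders(keep);
labels = cellstr(Model(keep, :));    % car names, one per observation

Xz = zscore(X);                      % put all variables on a common scale

figure; parallelcoords(Xz, 'Group', cyl, 'Labels', names);   % parallel coordinates
figure; andrewsplot(Xz, 'Group', cyl);                        % Andrews curves f(t)
figure; glyphplot(Xz(1:9, :), 'Glyph', 'star', 'ObsLabels', labels(1:9));  % star glyphs
figure; glyphplot(Xz(1:9, :), 'Glyph', 'face', 'ObsLabels', labels(1:9));  % Chernoff faces
```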
Machine Learning Overview
Types of Learning, Categories of Algorithms
Machine Learning
Type of Learning: Unsupervised Learning
Group and interpret data based only on input data
Categories of Algorithms: Clustering
Type of Learning: Supervised Learning
Develop predictive model based on both input and output data
Categories of Algorithms: Classification, Regression
Unsupervised Learning
Clustering
Group and interpret data based only on input data
Clustering algorithms: K-means and Fuzzy K-means, Hierarchical, Neural Network, Gaussian Mixture
Clustering
Overview
What is clustering?
Segment data into groups, based on data similarity
Why use clustering?
Identify outliers
Resulting groups may be the matter of interest
How is clustering done?
Can be achieved by various algorithms
It is an iterative process (involving trial and error)
Dataset We'll Be Using
Cloud of randomly generated points
Each cluster center is randomly chosen inside specified bounds
Each cluster contains the specified number of points per cluster
Each cluster point is sampled from a Gaussian distribution
Multi-dimensional dataset
[Figure: plot of the generated points, Group1 to Group8]
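A data set of this kind could be generated roughly as follows. All names and values here (`nClusters`, `nPerCluster`, the bounds, `sigma`) are hypothetical choices for illustration; the slides do not give the actual generator.

```matlab
% Sketch: random cloud of Gaussian clusters (all parameters are assumptions).
rng(0);                 % fixed seed so the example is repeatable
nClusters   = 8;        % number of groups
nPerCluster = 50;       % points in each group
nDims       = 2;        % dimensionality (any value >= 1 works)
lowerBound  = 0;        % bounds for the cluster centers
upperBound  = 1;
sigma       = 0.03;     % standard deviation of each cluster

% Each cluster center is drawn uniformly inside the bounds.
centers = lowerBound + (upperBound - lowerBound) * rand(nClusters, nDims);

data  = zeros(nClusters * nPerCluster, nDims);
group = zeros(nClusters * nPerCluster, 1);
for c = 1:nClusters
    rows = (c - 1) * nPerCluster + (1:nPerCluster);
    % Points are sampled from a Gaussian centered on the cluster center.
    data(rows, :) = repmat(centers(c, :), nPerCluster, 1) + sigma * randn(nPerCluster, nDims);
    group(rows) = c;
end

figure
gscatter(data(:, 1), data(:, 2), group);   % Statistics Toolbox grouped scatter plot
```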
Example Cluster Analysis
K-Means
Statistics Toolbox
K-means is a partitioning method
Partitions data into k mutually exclusive clusters
Each cluster has a centroid (or center)
Sum of distances from all objects to the center is minimized
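A minimal k-means sketch, assuming Statistics Toolbox and a small synthetic data set made up here for illustration. The `'Replicates'` option reruns the algorithm from several random starts and keeps the best solution, which is one common way to reduce the risk of a poor local minimum.

```matlab
% Sketch: partition synthetic data into k clusters with kmeans.
rng(1);
data = [randn(100, 2) * 0.1;                            % cluster near (0,0)
        randn(100, 2) * 0.1 + repmat([1 1], 100, 1);    % cluster near (1,1)
        randn(100, 2) * 0.1 + repmat([1 0], 100, 1)];   % cluster near (1,0)
k = 3;

% idx: cluster index per point; centroids: k-by-2 centers;
% sumd: within-cluster sums of point-to-centroid distances.
[idx, centroids, sumd] = kmeans(data, k, 'Replicates', 5);

totalDistance = sum(sumd);   % the quantity k-means tries to minimize
```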
Distance Metrics & Group Quality
Distance measure choices
Many built-in distance metrics, or define your own
>> doc pdist
>> distances = pdist(data,metric); % pdist = pairwise distances
>> squareform(distances)
>> kmeans(data,k,'distance','cityblock') % not all metrics supported
Cosine Distance: useful for clustering variables
Cityblock Distance: useful for discrete variables
Euclidean Distance: default
Create silhouette plots
>> silhouette(data,clusters)
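The commands on this slide can be combined into a short runnable sketch. The data set is synthetic and made up here; the choice of the city-block metric simply mirrors the slide.

```matlab
% Sketch: pairwise distances, clustering with a chosen metric, silhouette values.
rng(2);
data = [randn(50, 2) * 0.1; randn(50, 2) * 0.1 + 1];   % two synthetic groups

D   = pdist(data, 'cityblock');    % condensed vector of pairwise distances
Dsq = squareform(D);               % same distances as a symmetric 100-by-100 matrix

clusters = kmeans(data, 2, 'Distance', 'cityblock', 'Replicates', 3);

% Draw the silhouette plot: values near 1 indicate well-separated points.
figure
silhouette(data, clusters, 'cityblock');

% Request the per-point silhouette values to summarize group quality.
s = silhouette(data, clusters, 'cityblock');
meanSilhouette = mean(s);
```

A low or negative mean silhouette value would suggest trying a different number of clusters or a different metric.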
Clustering
Neural Network
Networks consist of one or more layers
Outputs computed by applying a nonlinear transfer function to a weighted sum of inputs
Trained by letting the network continually adjust itself to new inputs (determines weights)
[Figure: neuron diagram with input variables, weights, bias, transfer function, and output variable]
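The computation performed by a single neuron can be written in a few lines of base MATLAB. The input values, weights, bias, and the choice of `tanh` as the transfer function are all illustrative assumptions.

```matlab
% Sketch: output of one neuron = transfer function of (weighted sum of inputs + bias).
x = [0.5; -1.2; 0.3];    % input variables (column vector)
w = [0.8 0.1 -0.4];      % one weight per input (row vector)
b = 0.2;                 % bias

n = w * x + b;           % weighted sum of inputs plus bias
y = tanh(n);             % nonlinear transfer function squashes n into (-1, 1)
```

Training amounts to adjusting `w` and `b` so that outputs like `y` match the desired behavior on the data.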
Clustering
Neural Network
Neural Network Toolbox provides
interactive apps for easily
creating and training networks
Multi-layered networks created by cascading (provide better accuracy)
Neural Network Toolbox
Example architectures for clustering:
Self-organizing maps
Competitive layers
Self Organising Map Neural Net
How it Works
Started with a regular grid of neurons laid over the dataset
Size of the grid determined the number of clusters
Neurons competed to recognize data points (by being close to them)
Winning neurons were moved closer to the data points
Repeated until convergence
[Figures: SOM Weight Positions, Weight 1 against Weight 2, at two stages of training]
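The steps above can be sketched with the Neural Network Toolbox function `selforgmap`. The grid size, epoch count, and synthetic data are assumptions made for this example; note that the toolbox expects one sample per column, hence the transposes.

```matlab
% Sketch: cluster 2-D points with a small self-organizing map.
rng(3);
data = [randn(100, 2) * 0.05 + 0.2;    % synthetic group near (0.2, 0.2)
        randn(100, 2) * 0.05 + 0.8];   % synthetic group near (0.8, 0.8)

net = selforgmap([2 2]);               % 2-by-2 grid of neurons, so at most 4 clusters
net.trainParam.showWindow = false;     % suppress the training GUI
net.trainParam.epochs = 50;            % number of passes over the data (assumed value)

net = train(net, data');               % neurons compete and move toward the data

% Each point is assigned to its winning (closest) neuron.
clusterIdx = vec2ind(net(data'));

figure
plotsompos(net, data');                % neuron weight positions over the data
```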
Gaussian Mixture Models
Statistics Toolbox
Good when clusters have different
sizes and are correlated
Assume that data is drawn
from a fixed number K
of normal distributions
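A hedged sketch of fitting a Gaussian mixture with K components follows. It uses `fitgmdist`, which to my knowledge arrived in releases after these slides were written (older releases offered `gmdistribution.fit` for the same purpose); the synthetic data and covariance values are made up for illustration.

```matlab
% Sketch: Gaussian mixture clustering of two correlated groups of different size.
rng(4);
A = mvnrnd([0 0], [0.05 0.04; 0.04 0.05], 200);   % larger, positively correlated group
B = mvnrnd([2 2], [0.2 -0.1; -0.1 0.1], 100);     % smaller, negatively correlated group
data = [A; B];

K = 2;                                      % assumed number of normal distributions
gm = fitgmdist(data, K, 'Replicates', 5);   % EM fit, best of 5 random starts

idx = cluster(gm, data);     % hard assignment: most probable component per point
P   = posterior(gm, data);   % soft assignment: membership probability per component
```

Unlike k-means, each component has its own covariance, which is why the method copes with clusters of different sizes and correlated variables.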
Cluster Analysis
Summary
Segments data into groups, based on data similarity
No method is perfect (depends on data)
Process is iterative; explore different algorithms
Beware of local minima (global optimization can help)
Clustering algorithms: K-means and Fuzzy K-means, Hierarchical, Neural Network, Gaussian Mixture
Model Development Process
Exploration
Modeling
Evaluation
Deployment
Supervised Learning
Classification for Predictive Modeling
Develop predictive model based on both input and output data
Classification algorithms: Decision Tree, Ensemble Method, Neural Network, Support Vector Machine
Classification
Overview
What is classification?
Predicting the best group for each point
Learns from labeled observations
Uses input features
Why use classification?
Accurately group data never seen before
How is classification done?
Can use several algorithms to build a predictive model
Good training data is critical
[Figure: plot of labeled points, Group1 to Group8]
Example Classification
Decision Trees
Statistics Toolbox
Builds a tree from training data
Model is a tree where each node is a
logical test on a predictor
Traverse tree by comparing
features with threshold values
The leaf of the tree
specifies the group
Ensemble Learners
Statistics Toolbox
Overview
Decision trees are weak learners
Good to classify data used to train
Often not very good with new data
Note rectangular groups
What are ensemble learners?
Combine many decision trees to create a strong learner
Uses bootstrapped aggregation
Why use ensemble methods?
Classifier has better predictive power
Note improvement in cluster shapes
[Figure: predicted class regions over x1 and x2, group1 to group8]
Decision Trees
Statistics Toolbox
How do I build them with MATLAB?
Build tree model
>> tree = classregtree(x,y);
>> view(tree)
Evaluate the model on new data
>> tree(x_new)
[Figure: tree class regions over x1 and x2, group1 to group8]
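For readers on newer releases, the same build-then-evaluate workflow can be sketched with `fitctree` and `predict`, which, as far as I know, superseded `classregtree` after these slides were written. The `fisheriris` sample data and the 30-observation hold-out split are assumptions chosen for illustration.

```matlab
% Sketch: train a classification tree and evaluate it on held-out data.
% Assumption: the Statistics Toolbox sample data set fisheriris is available.
load fisheriris                      % meas (150-by-4 features), species (150-by-1 labels)
rng(5);

n = size(meas, 1);
testIdx = false(n, 1);
testIdx(randperm(n, 30)) = true;     % hold out 30 random observations

% Build the tree from training data only.
tree = fitctree(meas(~testIdx, :), species(~testIdx));
% view(tree, 'Mode', 'graph')        % uncomment to inspect the node tests

% Evaluate the model on data it has not seen.
predicted = predict(tree, meas(testIdx, :));
accuracy  = mean(strcmp(predicted, species(testIdx)));
```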
Enhancing the Model: Ensemble Learning
Combine weak learners into a stronger learner
>> ens = fitensemble(x,y,'AdaBoostM2',200,'Tree');
Bootstrap-aggregated (bagged) forest of trees
>> ens = fitensemble(x,y,'Bag',200,'Tree','type','classification');
>> y_pred = predict(ens,x);
Visualise class boundaries
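A hedged, runnable version of both ensemble calls is sketched below on the `fisheriris` sample data (an assumption; the slides use their own `x` and `y`). The ensemble size of 100 and the error estimates are additions made here to show how predictive power might be checked.

```matlab
% Sketch: bagged and boosted tree ensembles with honest error estimates.
% Assumption: the Statistics Toolbox sample data set fisheriris is available.
load fisheriris
rng(6);

% Bootstrap aggregation: each tree sees a resampled copy of the data.
bag = fitensemble(meas, species, 'Bag', 100, 'Tree', 'Type', 'classification');
oobError = oobLoss(bag);             % error on observations left out of each bootstrap sample

% Boosting: AdaBoostM2 handles more than two classes.
boost = fitensemble(meas, species, 'AdaBoostM2', 100, 'Tree');
cvBoost = crossval(boost, 'KFold', 5);
cvError = kfoldLoss(cvBoost);        % 5-fold cross-validated misclassification rate

yPred = predict(bag, meas);          % predictions, here on the training data
```

Out-of-bag and cross-validated errors are preferable to the error on the training data, which tends to look optimistic.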
K-Nearest Neighbor Classification
Statistics Toolbox
One of the simplest classifiers
Takes the K nearest points from the training set, and chooses the majority class of those K points
No training phase: all the work is done during the application of the model
[Figure: predicted class regions over x1 and x2, group1 to group8]
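The rule described above is short enough to write out directly. This sketch uses `knnsearch` from Statistics Toolbox to find neighbors and a majority vote via `mode`; the training data, labels, and K are made up for illustration.

```matlab
% Sketch: K-nearest-neighbor classification by explicit majority vote.
rng(7);
Xtrain = [randn(40, 2) * 0.2;          % class 1 clustered near (0, 0)
          randn(40, 2) * 0.2 + 2];     % class 2 clustered near (2, 2)
ytrain = [ones(40, 1); 2 * ones(40, 1)];

Xnew = [0.1 -0.1; 1.9 2.2; 0.3 0.2];   % points to classify
K = 5;

% No training step: the work happens here, at prediction time.
idx = knnsearch(Xtrain, Xnew, 'K', K);            % indices of the K nearest training points
neighborLabels = reshape(ytrain(idx), size(idx)); % one row of K labels per new point
ypred = mode(neighborLabels, 2);                  % majority class in each row
```

With these well-separated groups the first and third points fall in class 1 and the second in class 2.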
MATLAB Helps to Manage Complexity
A single calling syntax
for all methods
Documentation helps
you choose an
appropriate algorithm for
your particular problem
Support Vector Machines
Statistics Toolbox
(as of R2013a)
Overview
Good for modeling with complex boundaries between groups
Can be very accurate
No restrictions on the predictors
What does it do?
Uses non-linear kernel to calculate the boundaries
Can be computationally intensive
Version in Statistics Toolbox only classifies into two groups
[Figure: two groups separated by a non-linear boundary, with the support vectors marked]
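A two-class SVM with a non-linear kernel might look like the sketch below. It uses `fitcsvm`, which I believe was introduced in releases after R2013a (the release named on the slide offered `svmtrain` and `svmclassify`); the data subset and kernel settings are assumptions for illustration.

```matlab
% Sketch: binary SVM with a radial basis function (non-linear) kernel.
% Assumption: the Statistics Toolbox sample data set fisheriris is available.
load fisheriris
keep = ~strcmp(species, 'setosa');   % keep two of the three classes
X = meas(keep, 3:4);                 % petal length and width
y = species(keep);

svm = fitcsvm(X, y, ...
    'KernelFunction', 'rbf', ...     % non-linear kernel
    'KernelScale', 'auto', ...       % let the software pick a kernel scale
    'Standardize', true);            % center and scale the predictors

numSupportVectors = size(svm.SupportVectors, 1);   % points that define the boundary
trainError = resubLoss(svm);                        % misclassification rate on training data
```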
Classification
Summary
No absolute best method
Simple does not mean inefficient
Watch for overfitting
Decision trees and neural networks may overfit the noise
Use ensemble learning and cross-validation
Parallelize for speedup
Classification algorithms: Decision Tree, Ensemble Method, Neural Network, Support Vector Machine
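One way to watch for overfitting is to compare the error on the training data with a cross-validated error. The sketch below does this for a single decision tree on the `fisheriris` sample data; the data set and the 10-fold setting are assumptions chosen for illustration.

```matlab
% Sketch: training error versus cross-validated error for one decision tree.
% Assumption: the Statistics Toolbox sample data set fisheriris is available.
load fisheriris
rng(8);

tree = fitctree(meas, species);

resubError = resubLoss(tree);              % error on the data used for training
cvTree     = crossval(tree, 'KFold', 10);  % retrain on 10 train/validation splits
cvError    = kfoldLoss(cvTree);            % average error on the held-out folds

fprintf('Training error: %.3f, cross-validated error: %.3f\n', resubError, cvError);
```

A cross-validated error that is much larger than the training error suggests the model is fitting noise.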
Supervised Learning
Regression for Predictive Modeling
Develop predictive model based on both input and output data
Regression algorithms: Linear, Non-linear, Non-parametric
Regression
Statistics Toolbox
Curve Fitting Toolbox
Why use regression?
Predict the continuous response
for new observations
Type of predictive modeling
Specify a model that describes
Y as a function of X
Estimate coefficients that
minimize the difference
between predicted and actual
You can apply techniques from earlier sections with
regression as well (e.g., Neural Network)
Linear Regression
Y is a linear function of the regression coefficients
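Common examples:
Straight line: $y = \beta_0 + \beta_1 x_1$
Plane: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2$
Polynomial: $y = \beta_0 + \beta_1 x_1^3 + \beta_2 x_1^2 + \beta_3 x_1$
Polynomial with cross terms: $y = \beta_0 + \beta_1 x_1^2 + \beta_2 (x_1 x_2) + \beta_3 x_2^2$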
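Because the model is linear in its coefficients, even a cubic polynomial can be fitted by ordinary least squares. The sketch below uses base MATLAB's backslash operator on noise-free synthetic data; the coefficient values are made up, and the column order follows the polynomial example above.

```matlab
% Sketch: least-squares fit of y = b0 + b1*x^3 + b2*x^2 + b3*x.
x = (0:0.5:5)';
trueBeta = [1; 0.2; -0.5; 2];              % assumed coefficients [b0; b1; b2; b3]

% Design matrix: one column per term; the model is linear in the coefficients
% even though it is non-linear in x.
X = [ones(size(x)) x.^3 x.^2 x];
y = X * trueBeta;                           % noise-free response for a clean check

betaHat = X \ y;                            % least-squares coefficient estimates
```

With noisy data the same call returns the coefficients that minimize the sum of squared differences between predicted and actual responses.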
Nonlinear Regression
Y is a nonlinear function of the regression coefficients
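Syntax for formulas:
Fourier Series: $y = \beta_0 + \beta_1 \cos(x \beta_3) + \beta_2 \sin(x \beta_3)$
y ~ b0 + b1*cos(x*b3) + b2*sin(x*b3)
Exponential Growth: $y = \beta_0 e^{\beta_1 t}$
@(b,t)(b(1)*exp(b(2)*t))
Logistic Growth: $y = \dfrac{\beta_0}{1 + \beta_1 e^{-\beta_2 t}}$
@(b,t)(b(1)./(1 + b(2)*exp(-b(3)*t)))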
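A non-linear model given as a function handle can be fitted with `nlinfit` from Statistics Toolbox. The sketch below fits the exponential growth model to synthetic data; the true coefficients, noise level, and starting guess are assumptions made for this example.

```matlab
% Sketch: fit y = b(1)*exp(b(2)*t) by non-linear least squares.
rng(9);
t = (0:0.5:10)';
trueBeta = [2; 0.3];                            % assumed coefficients
modelfun = @(b, t) b(1) * exp(b(2) * t);        % exponential growth model

y = modelfun(trueBeta, t) + 0.05 * randn(size(t));   % synthetic noisy observations

beta0   = [1; 0.1];                             % starting guess (required for non-linear fits)
betaHat = nlinfit(t, y, modelfun, beta0);       % iterative least-squares estimate
```

Unlike linear regression, the result can depend on the starting guess, so it is worth trying more than one.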
Generalized Linear Models
Extends the linear model
Define relationship between model and response variable
Model error distributions other than normal
Logistic regression
Response variable is binary (true / false)
Results are typically expressed as an odds ratio
Poisson regression
Model count data (non-negative integers)
Response variable comes from a Poisson distribution
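Both model types can be fitted with `glmfit` from Statistics Toolbox. The sketch below uses simulated data; the coefficient values and sample size are assumptions chosen for illustration.

```matlab
% Sketch: logistic and Poisson regression with glmfit.
rng(10);
n = 500;
x = randn(n, 1);

% Logistic regression: binary response with success probability p.
p = 1 ./ (1 + exp(-(-0.5 + 1.5 * x)));          % assumed true model
yBinary = double(rand(n, 1) < p);

b = glmfit(x, yBinary, 'binomial', 'link', 'logit');   % b(1) intercept, b(2) slope
oddsRatio = exp(b(2));                          % change in odds per unit increase in x
pNew = glmval(b, [-1; 0; 1], 'logit');          % predicted probabilities at new x values

% Poisson regression: non-negative integer counts (log link by default).
counts = poissrnd(exp(0.3 + 0.7 * x));          % assumed true model
bPoisson = glmfit(x, counts, 'poisson');
```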
Machine Learning with MATLAB
Interactive environment
Visual tools for exploratory data analysis
Easy to evaluate and choose best algorithm
Apps available to help you get started
(e.g., neural network tool, curve fitting tool)
Multiple algorithms to choose from
Clustering
Classification
Regression
Learn More : Machine Learning with MATLAB
http://www.mathworks.com/discovery/machine-learning.html
Data Driven Fitting with MATLAB
Multivariate Classification in the Life Sciences
Classification with MATLAB
Electricity Load and Price Forecasting
Regression with MATLAB
Credit Risk Modeling with MATLAB