Machine Learning & Application
Ashok Rao
Former Head, Network Project
CEDT, IISc, Bangalore
< ashokrao.mys@gmail.com >
Presentation Outline
What is Machine Learning ?
Why Machine Learning ?
Common Learning Schemes / Models / structure
Supervised learning
Unsupervised learning
Hybrid Learning, Semi-Supervised Learning,
Classifiers
Hybrid Classifiers
Panel of classifiers
One example using Subspace (PCA) methods
Discussion and Conclusions.
What would have happened if there were no learning?
Our dictionary defines "to learn" as:
To get knowledge of something by study, experience, or being taught.
To become aware by information or from observation.
Learning = improving with experience at some task:
Improve over task T
with respect to performance measure P
based on experience E
What is Machine Learning?
"The goal of machine learning is to build computer systems that can adapt and learn from their experience."
- Tom Dietterich
Machine Learning
What is Machine Learning?
The ability to form rules automatically, and to use them subsequently (for decisions), by exposing a system (algorithm, structure, data, sensors, etc.) to input data (information).
Why Machine Learning?
Society and information-related tasks are getting increasingly complex, data grows at an exponential rate, and responses must be quick and consistent (e.g. genome and genetic data, the pharma industry).
Structure, patterns and "rules", if they can be extracted, would be very valuable and useful in finding appropriate responses. E.g. data mining, text-to-speech conversion, etc.
Machine Learning in the Medical Domain
Health care is among the most critical of needs.
While technology has improved (CT scan, MRI, PET, in 3-D too, computer-assisted surgery, tele-medicine, etc.),
still about 70% of the world's population does not have quality, reliable health care.
Automating diagnosis and testing, and if possible doing it remotely (more effective if portable), would help.
Rather than fully automating health care, it is better to provide computer-assisted options.
One such example is radiological diagnosis of scan data.
This helps in complex and borderline cases and easily allows for second (and multiple) "opinions".
Machine Learning
Where does it fit? What is it not?
[Diagram: Machine Learning at the intersection of Artificial Intelligence, Statistics / Mathematics, Data Mining, Computer Vision and Robotics]
(No definition of a field is perfect; the diagram above is just one interpretation)
Applications of machine learning
• identify the words in handwritten text
• understand a spoken language
• predict risks in safety-critical systems
• detect errors in a network
• fraud detection
• price and market prediction
• credit card approval
Many applications are immensely hard to program directly. These almost always turn out to be "pattern recognition" tasks. Two options:
1. Program the computer to do the pattern recognition task directly.
2. Program the computer to be able to learn from examples ("training" data).
Human vs. Machine
• Human: evolved (in large part) for pattern recognition problems. Machine: designed to solve logic and arithmetic problems.
• Human: can solve gazillions of PR problems in an hour. Machine: can solve gazillions of arithmetic and logical problems in an hour.
• Human: huge number of parallel but relatively slow and unreliable processors. Machine: usually one very fast processor.
• Human: not perfectly precise. Machine: absolute precision.
• Human: not perfectly reliable. Machine: highly reliable.
Application Dependent
Sensor → Preprocessing → Feature extraction → Classification / Clustering / Prediction
Data Analytics Model
Classification and regression:
  Frequency: OneR, Naïve Bayesian, Decision tree
  Similarity: K-NN, SVM, GA
Modeling (Clustering):
  Hierarchical: Agglomerative
  Partitional: K-Means, SOM
Prediction: Historical data
Exploration:
  Categorical: Count, Pie chart, Bar chart, Entropy
  Numerical: Min, Max, Mean, Variance, Histogram, Correlation, Plot, Skewness
Classification
• Data: a set of data records (also called examples, instances or cases) described by
  • k attributes: A1, A2, …, Ak
  • a class: each example is labelled with a pre-defined class
• Goal: to learn a classification model from the data that can be used to predict the classes of new (future, or test) cases/instances.
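As a concrete sketch of this setup, here is the K-NN classifier mentioned in the model taxonomy: classify a new case by a majority vote among the k nearest labelled records. (A minimal illustration; the function names and toy data are ours, not from the slides.)

```python
import math
from collections import Counter

def knn_predict(train, labels, x, k=3):
    """Classify x by majority vote among the k nearest training points."""
    nearest = sorted(range(len(train)),
                     key=lambda i: math.dist(train[i], x))[:k]
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Toy data: two well-separated groups of labelled records.
train = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
labels = ["A", "A", "A", "B", "B", "B"]
print(knn_predict(train, labels, (0.5, 0.5)))  # "A"
print(knn_predict(train, labels, (5.5, 5.5)))  # "B"
```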
Prediction
• Quantitative
  • Causal model: Regression
  • Time series: Moving Average, Exponential Smoothing, ARIMA, Kalman Filter
• Qualitative
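The first two time-series methods in the list can each be sketched in a few lines. (Our function names; the window size, smoothing factor alpha, and toy series are illustrative choices, not from the slides.)

```python
def moving_average(series, window=3):
    """Forecast the next value as the mean of the last `window` observations."""
    return sum(series[-window:]) / window

def exp_smoothing(series, alpha=0.5):
    """Single exponential smoothing: s_t = alpha*x_t + (1 - alpha)*s_{t-1}."""
    s = series[0]
    for x in series[1:]:
        s = alpha * x + (1 - alpha) * s
    return s

data = [10, 12, 11, 13, 12, 14]
print(moving_average(data))        # (13 + 12 + 14) / 3 = 13.0
print(exp_smoothing(data, 0.5))    # 13.0
```

Exponential smoothing weighs recent observations more heavily (geometrically decaying weights), whereas the moving average weighs the last `window` observations equally.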
Clustering
• The goal of clustering is to
  • group data points that are close (or similar) to each other
  • identify such groupings (or clusters) in an unsupervised manner
• Unsupervised: no information is provided to the algorithm on which data points belong to which clusters
• Example
[Figure: a scatter of unlabelled points. What should the clusters be for these data points?]
Supervised learning
Color | Shape  | Size  | Output
Blue  | Torus  | Big   | Y
Blue  | Square | Small | Y
Blue  | Star   | Small | Y
Red   | Arrow  | Small | N
Learn to approximate the function F(x1, x2, x3) → t from a training set of (x, t) pairs.
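OneR, listed in the model taxonomy earlier, is one of the simplest learners for exactly this kind of table: it keeps the single attribute whose value-to-majority-class rule makes the fewest training errors. A sketch on the training table above (the implementation and names are ours):

```python
from collections import Counter, defaultdict

def one_r(rows, target):
    """OneR: pick the one attribute whose value -> majority-class rule
    makes the fewest errors on the training rows."""
    attrs = [a for a in rows[0] if a != target]
    best = None
    for a in attrs:
        groups = defaultdict(Counter)
        for r in rows:
            groups[r[a]][r[target]] += 1
        rule = {v: c.most_common(1)[0][0] for v, c in groups.items()}
        errors = sum(1 for r in rows if rule[r[a]] != r[target])
        if best is None or errors < best[2]:
            best = (a, rule, errors)
    return best[0], best[1]

# The training table from the slide.
rows = [
    {"Color": "Blue", "Shape": "Torus",  "Size": "Big",   "Output": "Y"},
    {"Color": "Blue", "Shape": "Square", "Size": "Small", "Output": "Y"},
    {"Color": "Blue", "Shape": "Star",   "Size": "Small", "Output": "Y"},
    {"Color": "Red",  "Shape": "Arrow",  "Size": "Small", "Output": "N"},
]
attr, rule = one_r(rows, "Output")
print(attr, rule)  # Color {'Blue': 'Y', 'Red': 'N'} -- zero training errors
```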
Supervised learning
Training data:
X1 | X2 | X3 | T
B  | T  | B  | Y
B  | S  | S  | Y
B  | S  | S  | Y
R  | A  | S  | N
Learner → Hypothesis
Testing data (predict T):
X1 | X2 | X3 | T
B  | A  | S  | ?
Y  | C  | S  | ?
Key issue: generalization
[Figure: labelled training examples ("yes" / "no") and unseen cases ("?")]
A rich (but not exhaustive) training set guards against over-fitting (e.g. character recognition, A-Z).
Unsupervised learning
What if there are no output labels?
Supervised vs. unsupervised learning
Supervised:
• Learning based on a training set where the labelling of instances represents the target (categorization) function
• Each data item in the dataset analyzed has been classified
• Needs help from the data
• Needs a great amount of data
• Outcome: a classification decision
• Examples: Neural Networks (NN), Decision Trees, Support Vector Machines (SVM)
Unsupervised:
• Learning based on un-annotated instances (the training data doesn't specify what we are trying to learn)
• Each data item in the dataset analyzed is not classified
• Doesn't need help from the data
• A great amount of data is not necessarily needed
• Outcome: a grouping of objects (instances and groups of instances)
• Examples: Clustering (Mixture Modeling), Self-Organizing Map (SOM)
(Humans are good at creating groups/categories/clusters from data)
Supervised learning success stories
Face detection
Steering an autonomous car across the US
Detecting credit card fraud
Medical diagnosis
…
Hypothesis spaces
Decision trees
Neural networks
K-nearest neighbors
Naïve Bayes classifier
Support vector machines (SVMs)
Boosted decision stumps (Ada-Boost)
…
Perceptron
(single-layer neural net)
Linearly separable data
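The perceptron learning rule is short enough to sketch in full: whenever a training point is misclassified, nudge the weights toward it; on linearly separable data this converges. (A minimal illustration with our own names and an AND-style toy dataset, labels in {-1, +1}.)

```python
def perceptron_train(X, y, epochs=20, lr=1.0):
    """Perceptron rule: w <- w + lr * t * x whenever sign(w.x) != t."""
    w = [0.0] * (len(X[0]) + 1)              # last entry acts as the bias
    for _ in range(epochs):
        for x, t in zip(X, y):
            xb = list(x) + [1.0]             # append constant bias input
            if (sum(wi * xi for wi, xi in zip(w, xb)) > 0) != (t > 0):
                w = [wi + lr * t * xi for wi, xi in zip(w, xb)]
    return w

def predict(w, x):
    s = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if s > 0 else -1

# Linearly separable data (logical AND, encoded as -1 / +1).
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [-1, -1, -1, 1]
w = perceptron_train(X, y)
print([predict(w, x) for x in X])  # [-1, -1, -1, 1]
```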
Which separating hyperplane?
The best linear separator is the one with the largest margin.
What if the data is not linearly separable?
Multilayer Perceptrons (hidden & output layers)
Design questions: how many layers? Feedback or feed-forward?
Kernel trick
[Figure: points not linearly separable in 2-D (x, y) become separable in 3-D (z1, z2, z3)]
The map (x, y) → (x², √2·xy, y²) implicitly takes the data from 2-D to 3-D, making the problem linearly separable.
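The point of the trick is that the 3-D inner product never has to be computed explicitly: for the map above it equals the squared 2-D inner product, the polynomial kernel K(p, q) = (p·q)². A quick check (toy points are our choice):

```python
import math

def phi(x, y):
    """Explicit feature map: (x, y) -> (x^2, sqrt(2)*x*y, y^2)."""
    return (x * x, math.sqrt(2) * x * y, y * y)

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

p, q = (1.0, 2.0), (3.0, -1.0)
lhs = dot(phi(*p), phi(*q))   # inner product after the explicit 3-D map
rhs = dot(p, q) ** 2          # polynomial kernel computed directly in 2-D
print(lhs, rhs)               # equal: the kernel avoids the explicit map
```

This is why kernel machines such as SVMs can work in very high-dimensional (even infinite-dimensional) feature spaces at the cost of a 2-D computation.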
Occam's Razor
In Latin: "Entia non sunt multiplicanda praeter necessitatem"
("Entities should not be multiplied unnecessarily")
- William of Ockham, c. 1320 AD
- What does this mean?
Implications of Occam's Razor
Simplicity is the order of things:
Simple explanation.
Simple model.
Simple structure.
What if the facts are "complex"?
- Treat them as a combination of "simple" parts.
Boosting
Simple classifiers (weak learners) can have their performance boosted by taking weighted combinations.
Boosting maximizes the margin.
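AdaBoost (the algorithm behind the boosted decision stumps listed earlier) is the standard instance of this idea: reweight the training points after each round so the next weak learner focuses on the mistakes, then combine the stumps with weights. A compact sketch, with our own names and a deliberately easy 1-D toy dataset:

```python
import math

def stump_predict(s, x):
    """Decision stump s = (feature, threshold, sign): predict sign if x[f] > t."""
    f, t, sign = s
    return sign if x[f] > t else -sign

def best_stump(X, y, w):
    """Exhaustively pick the stump with the lowest weighted error."""
    best, best_err = None, float("inf")
    for f in range(len(X[0])):
        for t in sorted({x[f] for x in X}):
            for sign in (1, -1):
                err = sum(wi for x, yi, wi in zip(X, y, w)
                          if stump_predict((f, t, sign), x) != yi)
                if err < best_err:
                    best, best_err = (f, t, sign), err
    return best, best_err

def adaboost(X, y, rounds=3):
    w = [1 / len(X)] * len(X)
    ensemble = []
    for _ in range(rounds):
        s, err = best_stump(X, y, w)
        err = max(err, 1e-10)                      # avoid division by zero
        alpha = 0.5 * math.log((1 - err) / err)    # weight of this weak learner
        ensemble.append((alpha, s))
        w = [wi * math.exp(-alpha * yi * stump_predict(s, x))
             for wi, x, yi in zip(w, X, y)]        # up-weight the mistakes
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def ensemble_predict(ensemble, x):
    score = sum(a * stump_predict(s, x) for a, s in ensemble)
    return 1 if score > 0 else -1

X = [(1,), (2,), (3,), (4,)]
y = [-1, -1, 1, 1]
model = adaboost(X, y)
print([ensemble_predict(model, x) for x in X])  # [-1, -1, 1, 1]
```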
What is Cluster Analysis?
• Finding groups of objects such that the objects in a group will be similar (or related) to one another and different from (or unrelated to) the objects in other groups
• Intra-cluster distances are minimized; inter-cluster distances are maximized
Difficulties of Representation
Hierarchical Clustering
• Build a tree-based hierarchical taxonomy (dendrogram) from a set of documents:
animal
├── vertebrate: fish, reptile, amphib., mammal
└── invertebrate: worm, insect, crustacean
Hierarchical Clustering
Ste Ste Ste Ste Ste
agglomerative
p0 p1 p2 p3 p4
a
ab
b
abcde
c
cde
d
de
e
divisive
Ste Ste Ste Ste Ste
p4 p3 p2 p1 p0
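A single-linkage agglomerative sketch of the merge sequence above: repeatedly merge the two closest clusters, where cluster distance is the minimum point-to-point distance. (The 1-D coordinates are our choice, picked so the merge order comes out as in the diagram.)

```python
def single_link(c1, c2, pos):
    """Single-linkage distance between two clusters of labelled 1-D points."""
    return min(abs(pos[a] - pos[b]) for a in c1 for b in c2)

def agglomerate(pos):
    """Repeatedly merge the two closest clusters; record each merge."""
    clusters = [frozenset([p]) for p in pos]
    merges = []
    while len(clusters) > 1:
        i, j = min(((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
                   key=lambda ij: single_link(clusters[ij[0]],
                                              clusters[ij[1]], pos))
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
        merges.append("".join(sorted(merged)))
    return merges

# 1-D positions chosen (by us) to reproduce the slide's merge order.
pos = {"a": 1.0, "b": 1.4, "c": 5.0, "d": 8.0, "e": 8.5}
print(agglomerate(pos))  # ['ab', 'de', 'cde', 'abcde']
```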
Partitional Clustering
[Figure: original points and a partitional clustering of them]
Partitional Clustering
• Partitioning method: construct a partition of n objects into a set of K clusters
• Given: a set of objects and the number K
• Find: a partition of K clusters that optimizes the chosen partitioning criterion
  – Globally optimal: exhaustively enumerate all partitions
  – Effective heuristic methods: K-means clustering
K-means Clustering
• Partitional clustering approach
• Each cluster is associated with a centroid (center point)
• Each point is assigned to the cluster with the closest centroid
• Number of clusters, K, must be specified
• The basic algorithm is very simple
K-Means Clustering
Step 1: Select k random seeds (initial seeds, if k = 3) such that d(ki, kj) > dmin.
Step 2: Assign each point to its nearest seed and compute new centroids.
Iterate (reassign points, recompute centroids) until stability.
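The iteration above can be sketched directly (a minimal illustration; the seeding here is plain random sampling without the d(ki, kj) > dmin check, and the names and toy points are ours):

```python
import math
import random

def kmeans(points, k, iters=100, seed=0):
    """Basic k-means: assign each point to its nearest centroid, then move
    each centroid to the mean of its points; repeat until stable."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)          # initial seeds
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:                        # assignment step
            i = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            clusters[i].append(p)
        new = [tuple(sum(c) / len(cl) for c in zip(*cl)) if cl else centroids[i]
               for i, cl in enumerate(clusters)]  # update step
        if new == centroids:                    # stability reached
            break
        centroids = new
    return centroids, clusters

pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, clusters = kmeans(pts, 2)
print(sorted(centroids))  # roughly (1/3, 1/3) and (31/3, 31/3)
```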
ML enabling technologies
Faster computers
More data
The web
Parallel corpora (machine translation)
Multiple sequenced genomes
Gene expression arrays
New ideas
Kernel trick
Large margins
Boosting
Graphical models
…
Some Select references
The web:
Kevin Murphy, MIT AI Lab, PPT slides.
Avrim Blum, Carnegie Mellon University, PPT slides.
Bishop, C. Pattern Recognition and Machine Learning. Springer, 2006.
Principal Component Analysis (PCA)
PCA seeks a projection that best represents the data in a least-squares sense.
PCA reduces the dimensionality of feature space by restricting attention to those directions along which the scatter of the data cloud is greatest.
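The direction of greatest scatter is the top eigenvector of the data's covariance matrix, which can be found by power iteration. A small sketch (our names; the toy data, scattered mostly along the line y = x, is our choice):

```python
import random

def principal_direction(X, iters=200):
    """Power iteration on the covariance matrix: returns the unit vector
    along which the centred data has the greatest scatter."""
    n, p = len(X), len(X[0])
    means = [sum(x[j] for x in X) / n for j in range(p)]
    Xc = [[x[j] - means[j] for j in range(p)] for x in X]   # centre the data
    cov = [[sum(r[i] * r[j] for r in Xc) / (n - 1) for j in range(p)]
           for i in range(p)]                               # covariance matrix
    v = [1.0] * p
    for _ in range(iters):
        v = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = sum(c * c for c in v) ** 0.5
        v = [c / norm for c in v]                           # renormalize
    return v

rng = random.Random(0)
X = [(t + 0.05 * rng.gauss(0, 1), t + 0.05 * rng.gauss(0, 1))
     for t in (rng.gauss(0, 1) for _ in range(200))]
v = principal_direction(X)
print(v)  # close to +/-(0.707, 0.707): the y = x direction
```

Projecting each centred point onto v gives the 1-D representation with the smallest least-squares reconstruction error, which is exactly the sense in which PCA "best represents" the data.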
We shall build on this idea next:
Subspace Methods
Training and Classification
Complex Models (GMM)
Statistical Methods (Monte Carlo Methods)
and more.