Introduction To Machine Learning

The document presents an overview of Machine Learning, defining it as the ability of computer systems to learn and adapt from experience. It discusses various learning schemes, including supervised, unsupervised, and hybrid learning, along with their applications in fields like healthcare and data analytics. Additionally, it covers classification, clustering, and prediction techniques, emphasizing the importance of pattern recognition in complex tasks.

Machine Learning & Application

Ashok Rao
Former Head, Network Project
CEDT, IISc, Bangalore
< ashokrao.mys@gmail.com >

Presentation Outline
• What is Machine Learning?
• Why Machine Learning?
• Common learning schemes / models / structures
• Supervised learning
• Unsupervised learning
• Hybrid learning, semi-supervised learning
• Classifiers
• Hybrid classifiers
• Panel of classifiers
• One example using subspace (PCA) methods
• Discussion and conclusions
What would have happened if there were no learning?
Our dictionary defines “to learn” as:
• To get knowledge of something by study, experience, or being taught.
• To become aware by information or from observation.

Learning = Improving with experience at some task
• Improve over task T
• with respect to performance measure P
• based on experience E
What is Machine Learning?

“The goal of machine learning is to build computer systems that can adapt and learn from their experience.”
- Tom Dietterich
Machine Learning
• What is Machine Learning?
  The ability to form rules automatically, and subsequently use them for decisions, by exposing a system (algorithm, structure, data, sensors, etc.) to input data (information).
• Why Machine Learning?
  Societal and information-related tasks are getting increasingly complex, data volumes are growing exponentially, and responses are required quickly and consistently (e.g. genome and genetic data, the pharma industry).
  Structure, patterns and “rules”, if they can be extracted, would be very valuable and useful in finding appropriate responses, e.g. data mining, text-to-speech conversion, etc.
Machine Learning in Medical Domain
• Health care is among the most critical of needs.
• Technology has improved (CT scan, MRI, PET, also in 3-D, CA surgery, tele-medicine, etc.).
• Still, about 70% of the world's population does not have quality and reliable health care.
• Automating diagnosis and testing, and if possible doing it remotely (more effective if portable), would help.
• Rather than fully automating health care, it is better to provide computer-assisted options.
• One such example is radiological diagnosis of scan data.
• This helps in complex and borderline cases and easily allows for second (and multiple) “opinions”.
Machine Learning
Where does it fit? What is it not?

[Diagram relating Machine Learning to Artificial Intelligence, Statistics / Mathematics, Data Mining, Computer Vision and Robotics]

(No definition of a field is perfect; the diagram above is just one interpretation.)
Applications of machine learning
• Identify the words in handwritten text
• Understand a spoken language
• Predict risks in safety-critical systems
• Detect errors in a network
• Fraud detection
• Price and market prediction
• Credit card approval

Many applications are immensely hard to program directly.
These almost always turn out to be “pattern recognition” tasks.

1. Program the computer to do the pattern recognition task directly.
2. Program the computer to be able to learn from examples (“training” data).
Human | Machine
Evolved (in a large part) for pattern recognition | Designed to solve logic and arithmetic problems
Can solve a gazillion PR problems in an hour | Can solve a gazillion arithmetic and logical problems in an hour
Huge number of parallel but relatively slow and unreliable processors | Usually one very fast processor
Not perfectly precise | Absolute precision
Not perfectly reliable | Highly reliable

Sensor -> Preprocessing -> Feature extraction -> Classification / Clustering / Prediction (application dependent)

Data Analytics Model
  Modeling
    Classification and regression
      Frequency: OneR, Naïve Bayesian, Decision tree
      Similarity: K-NN, SVM, GA
    Clustering
      Hierarchical: Agglomerative
      Partitional: K-Mean, SOM
    Prediction: Historical data
  Exploration
    Categorical: Count, Pie chart, Bar chart, Entropy
    Numerical: Min, Max, Mean, Variance, Histogram, Correlation, Plot, Skewness
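As a small illustration of the exploration measures listed above, the sketch below computes entropy for a categorical column and min, max, mean, variance and skewness for a numerical one. The function names and sample values are made up for this example and are not from the slides.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (in bits) of a categorical column."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def describe(xs):
    """Min, max, mean, population variance and skewness of a numerical column."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    # Skewness: third standardized moment (0 for symmetric data).
    skew = sum((x - mean) ** 3 for x in xs) / n / var ** 1.5 if var > 0 else 0.0
    return {"min": min(xs), "max": max(xs), "mean": mean,
            "variance": var, "skewness": skew}

labels = ["Y", "Y", "N", "N"]        # hypothetical categorical column
sizes = [1.0, 2.0, 3.0, 4.0, 5.0]    # hypothetical numerical column
print(entropy(labels))
print(describe(sizes))
```

A 50/50 split of two classes gives the maximum entropy for a binary column (1 bit), and a symmetric numerical column has zero skewness.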
Classification
• Data: a set of data records (also called examples, instances or cases) described by
  • k attributes: A1, A2, …, Ak
  • a class: each example is labelled with a pre-defined class
• Goal: to learn a classification model from the data that can be used to predict the classes of new (future, or test) cases/instances.
Prediction
  Quantitative
    Causal model: Regression
    Time series: Moving Average, Exponential Smoothing, ARIMA, Kalman Filter
  Qualitative
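Two of the time-series methods named above are simple enough to sketch in a few lines. This is a minimal illustration, assuming the smoothed series is initialised with the first observation; the `demand` values are hypothetical.

```python
def moving_average(series, window):
    """Simple moving average: mean of each run of `window` consecutive values."""
    return [sum(series[i:i + window]) / window
            for i in range(len(series) - window + 1)]

def exponential_smoothing(series, alpha):
    """Simple exponential smoothing: s[t] = alpha * x[t] + (1 - alpha) * s[t-1]."""
    smoothed = [series[0]]  # assumption: initialise with the first observation
    for x in series[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed

demand = [2.0, 4.0, 6.0, 8.0, 10.0]  # hypothetical historical data
ma = moving_average(demand, window=3)
es = exponential_smoothing(demand, alpha=0.5)
print(ma)
print(es)
```

In both cases the last computed value would serve as the forecast for the next period; a larger `alpha` makes exponential smoothing react faster to recent observations.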
Clustering
• The goal of clustering is to
  • group data points that are close (or similar) to each other
  • identify such groupings (or clusters) in an unsupervised manner
• Unsupervised: no information is provided to the algorithm on which data points belong to which clusters
• Example: what should the clusters be for these data points?
[Figure: scatter of unlabelled data points, each marked x]
Supervised learning

Color | Shape  | Size  | Output
Blue  | Torus  | Big   | Y
Blue  | Square | Small | Y
Blue  | Star   | Small | Y
Red   | Arrow  | Small | N

Learn to approximate the function F(x1, x2, x3) -> t from a training set of (x, t) pairs.
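Supervised learning
Training data (input to the learner):

X1 | X2 | X3 | T
B  | T  | B  | Y
B  | S  | S  | Y
B  | S  | S  | Y
R  | A  | S  | N

The learner produces a hypothesis, which is applied to the testing data to predict T:

X1 | X2 | X3 | T
B  | A  | S  | ?
Y  | C  | S  | ?

Predictions shown on the slide: Y, N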
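To make the training/prediction loop concrete, here is a minimal sketch using OneR, one of the frequency-based classifiers named earlier in the deck. It is not necessarily the learner the slide has in mind: OneR simply keeps the single attribute whose "value -> majority class" rule makes the fewest training errors. The function name `one_r` and the fallback for unseen values are assumptions of this sketch.

```python
from collections import Counter, defaultdict

# Training set from the slide: (Color, Shape, Size) -> Output
train = [
    (("Blue", "Torus", "Big"), "Y"),
    (("Blue", "Square", "Small"), "Y"),
    (("Blue", "Star", "Small"), "Y"),
    (("Red", "Arrow", "Small"), "N"),
]

def one_r(examples):
    """Pick the attribute whose value -> majority-class rule makes the fewest training errors."""
    n_attrs = len(examples[0][0])
    default = Counter(t for _, t in examples).most_common(1)[0][0]
    best = None
    for a in range(n_attrs):
        by_value = defaultdict(Counter)
        for x, t in examples:
            by_value[x[a]][t] += 1
        rule = {v: c.most_common(1)[0][0] for v, c in by_value.items()}
        errors = sum(rule[x[a]] != t for x, t in examples)
        if best is None or errors < best[0]:   # ties keep the earlier attribute
            best = (errors, a, rule)
    _, attr, rule = best
    # Assumption: unseen attribute values fall back to the overall majority class.
    return attr, lambda x: rule.get(x[attr], default)

attr, predict = one_r(train)
print(attr, predict(("Blue", "Arrow", "Small")))
```

On this data the Color attribute alone separates the classes, so the learned hypothesis is "Blue -> Y, Red -> N". Whether that rule generalizes is exactly the issue the next slide raises.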
Key issue: generalization
[Figure: training examples labelled yes / no, and new examples marked "?"]
Rich (not exhaustive) training set (over fitting, Ch reg. A-Z)
Unsupervised learning
• What if there are no output labels?
Supervised vs. unsupervised learning
Supervised | Unsupervised
Learning based on a training set where the labeling of instances represents the target (categorization) function | Learning based on un-annotated instances (the training data doesn't specify what we are trying to learn)
Each data item in the dataset analyzed has been classified | Each data item in the dataset analyzed is not classified
Needs help from the data | Doesn't need help from the data
Needs a great amount of data | A great amount of data is not necessarily needed
Outcome: a classification decision | Outcome: a grouping of objects (instances and groups of instances)
Examples: Neural Networks (NN), Decision Trees, Support Vector Machine (SVM) | Examples: Clustering (Mixture Modeling), Self Organizing Map (SOM)
(Humans are good at creating groups/categories/clusters from data)
Supervised learning success stories
• Face detection
• Steering an autonomous car across the US
• Detecting credit card fraud
• Medical diagnosis
• …
Hypothesis spaces
• Decision trees
• Neural networks
• K-nearest neighbors
• Naïve Bayes classifier
• Support vector machines (SVMs)
• Boosted decision stumps (AdaBoost)
• …
Perceptron
(Single layer neural net )

Linearly separable data
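A minimal sketch of the perceptron learning rule on a linearly separable toy problem (logical AND with targets of +1 and -1). The data, learning rate and epoch limit are assumptions chosen for illustration.

```python
def train_perceptron(samples, epochs=100, lr=1.0):
    """Perceptron rule: on each mistake, move the weights toward the misclassified example."""
    w = [0.0] * len(samples[0][0])
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, t in samples:                      # t is +1 or -1
            y = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1
            if y != t:
                w = [wi + lr * t * xi for wi, xi in zip(w, x)]
                b += lr * t
                mistakes += 1
        if mistakes == 0:                         # every sample is classified correctly
            break
    return w, b

def predict(w, b, x):
    """Classify x by which side of the hyperplane w.x + b = 0 it falls on."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1

# Logical AND with +/-1 targets: a linearly separable toy problem.
data = [((0, 0), -1), ((0, 1), -1), ((1, 0), -1), ((1, 1), 1)]
w, b = train_perceptron(data)
print(w, b, [predict(w, b, x) for x, _ in data])
```

Because the data is linearly separable, the rule stops after a finite number of updates. On data that is not separable, the loop would simply run until the epoch limit.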

Which separating hyperplane?

The best linear separator is the one with the largest margin.
[Figure: separating hyperplane with its margin marked]
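The margin of a candidate separator can be computed directly as the smallest distance from any training point to the hyperplane. The sketch below compares two hypothetical separators on made-up 2-D points; both classify every point correctly, but one leaves more room.

```python
import math

def margin(w, b, samples):
    """Smallest signed distance from any sample to the hyperplane w.x + b = 0.

    The result is negative if some sample lies on the wrong side.
    """
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(t * (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm
               for x, t in samples)

# Hypothetical 2-D points with +/-1 labels.
samples = [((2, 2), 1), ((3, 3), 1), ((0, 0), -1), ((1, 0), -1)]

# Two candidate separators with the same orientation but different offsets.
m_a = margin((1, 1), -3.0, samples)   # passes close to the point (2, 2)
m_b = margin((1, 1), -2.5, samples)   # equidistant from (2, 2) and (1, 0)
print(m_a, m_b)
```

Under the largest-margin criterion the second separator is preferred, since its closest points are farther away.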

What if the data is not linearly separable?

Multilayer Perceptrons (hidden & output layers)

Number of layers can be identified, Feedback or Feed forward?
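XOR is the standard example of data that no single-layer perceptron can separate but a network with one hidden layer can. In this sketch the weights are chosen by hand for illustration rather than learned, and the network is feed-forward.

```python
def step(z):
    """Threshold activation: 1 if the weighted sum is positive, else 0."""
    return 1 if z > 0 else 0

def xor_mlp(x1, x2):
    """Two-layer feed-forward network computing XOR with hand-set weights."""
    # Hidden layer (weights chosen by hand for illustration, not learned):
    h_or = step(x1 + x2 - 0.5)    # fires if at least one input is 1
    h_and = step(x1 + x2 - 1.5)   # fires only if both inputs are 1
    # Output layer: "OR but not AND" is exactly XOR.
    return step(h_or - h_and - 0.5)

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, xor_mlp(x1, x2))
```

Each hidden unit is itself a linear separator; the output unit combines them into a decision region that is not linear in the original inputs.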

Kernel trick

Feature map: $\phi(x, y) = (x^2, \sqrt{2}\,xy, y^2)$

[Figure: the kernel maps points from the input space (axes x1, x2) to the feature space (axes z1, z2, z3)]

The kernel implicitly maps from 2D to 3D, making the problem linearly separable.
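The "implicit" part of the kernel trick can be checked numerically: the dot product of two points after the 3-D feature map above equals the squared dot product computed in the original 2-D space. The helper names `phi`, `dot` and `poly_kernel` are assumptions of this sketch.

```python
import math

def phi(p):
    """Explicit feature map from the slide: (x, y) -> (x^2, sqrt(2)*x*y, y^2)."""
    x, y = p
    return (x * x, math.sqrt(2) * x * y, y * y)

def dot(u, v):
    """Ordinary dot product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))

def poly_kernel(p, q):
    """Degree-2 polynomial kernel, computed entirely in the original 2-D space."""
    return dot(p, q) ** 2

a, b = (1.0, 2.0), (3.0, 4.0)
print(dot(phi(a), phi(b)), poly_kernel(a, b))
```

A learner that only needs dot products can therefore work in the 3-D space without ever constructing the mapped points.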
Occam’s Razor
Translated from the Latin:
“Entities should not be multiplied unnecessarily”

- William of Occam, 1320 AD.

- What does this mean?
Implications of Occam’s Razor
• Simplicity is the order of things.
  • Simple explanation.
  • Simple model.
  • Simple structure.

What if the facts are “complex”?

- A combination of “simple” ones
Boosting

Simple classifiers (weak learners) can have their performance boosted by taking weighted combinations.

Boosting maximizes the margin
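A compact sketch of AdaBoost with decision stumps as the weak learners. The 1-D toy data is not from the slides; it is chosen so that no single threshold classifies it perfectly, while a weighted combination of three stumps does.

```python
import math

def stump(thr, pol):
    """Decision stump on a 1-D input: predict `pol` left of the threshold, `-pol` otherwise."""
    return lambda x: pol if x < thr else -pol

def adaboost(xs, ts, rounds):
    """Return a list of (alpha, stump) pairs; xs is assumed sorted, ts holds +1/-1 labels."""
    n = len(xs)
    weights = [1.0 / n] * n
    thresholds = [(a + b) / 2 for a, b in zip(xs, xs[1:])]   # midpoints between points
    ensemble = []
    for _ in range(rounds):
        # Weak learner: the stump with the lowest weighted error.
        best_err, best_h = None, None
        for thr in thresholds:
            for pol in (1, -1):
                h = stump(thr, pol)
                err = sum(w for w, x, t in zip(weights, xs, ts) if h(x) != t)
                if best_err is None or err < best_err:
                    best_err, best_h = err, h
        if best_err == 0:              # a perfect stump needs no boosting
            return [(1.0, best_h)]
        alpha = 0.5 * math.log((1 - best_err) / best_err)
        ensemble.append((alpha, best_h))
        # Raise the weight of misclassified points, lower the rest, renormalise.
        weights = [w * math.exp(-alpha * t * best_h(x))
                   for w, x, t in zip(weights, xs, ts)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def predict(ensemble, x):
    """Sign of the alpha-weighted vote of all stumps."""
    return 1 if sum(alpha * h(x) for alpha, h in ensemble) > 0 else -1

def accuracy(ensemble, xs, ts):
    return sum(predict(ensemble, x) == t for x, t in zip(xs, ts)) / len(xs)

xs = list(range(10))                         # toy 1-D data, not from the slides
ts = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]
print(accuracy(adaboost(xs, ts, 1), xs, ts), accuracy(adaboost(xs, ts, 3), xs, ts))
```

Each round focuses the next stump on the points the current combination still gets wrong, which is how a set of individually weak rules ends up fitting the whole training set.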


What is Cluster Analysis?
• Finding groups of objects such that the objects in a group will be similar (or related) to one another and different from (or unrelated to) the objects in other groups
[Figure: intra-cluster distances are minimized; inter-cluster distances are maximized]
Difficulties of Representation
Hierarchical Clustering
• Build a tree-based hierarchical taxonomy (dendrogram) from a set of documents.

animal
  vertebrate: fish, reptile, amphib., mammal
  invertebrate: worm, insect, crustacean

Hierarchical Clustering
[Figure: numbered data points grouped into nested clusters]
Hierarchical Clustering
Agglomerative (bottom-up), Step 0 to Step 4:
  Step 0: a, b, c, d, e
  Step 1: ab
  Step 2: de
  Step 3: cde
  Step 4: abcde
Divisive (top-down) runs the same steps in reverse, from Step 0 (abcde) to Step 4 (a, b, c, d, e).
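The agglomerative direction can be sketched with single-linkage merging: start with every object in its own cluster and repeatedly merge the two closest clusters. The 1-D positions below are hypothetical, chosen so that the merge order matches the a to e diagram; single linkage is one of several possible cluster-distance choices.

```python
def agglomerative(points):
    """Single-linkage agglomerative clustering; returns the cluster formed at each step."""
    clusters = [(name,) for name in sorted(points)]
    merges = []
    while len(clusters) > 1:
        # Find the pair of clusters whose closest members are nearest to each other.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(abs(points[p] - points[q])
                        for p in clusters[i] for q in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merged = tuple(sorted(clusters[i] + clusters[j]))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
        merges.append("".join(merged))
    return merges

# Hypothetical 1-D positions chosen so the merge order matches the diagram.
points = {"a": 0.0, "b": 1.0, "c": 5.0, "d": 7.0, "e": 8.5}
print(agglomerative(points))
```

Reading the returned list from last to first gives the divisive view: the full set is split step by step until every object stands alone.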
Partitional Clustering
[Figure: original points, and a partitional clustering of them]
Partitional Clustering
• Partitioning method: construct a partition of n objects into a set of K clusters
• Given: a set of objects and the number K
• Find: a partition of K clusters that optimizes the chosen partitioning criterion
– Globally optimal: exhaustively enumerate all partitions
– Effective heuristic methods: K-
K-means Clustering
• Partitional clustering approach
• Each cluster is associated with a centroid (center point)
• Each point is assigned to the cluster with the closest centroid
• Number of clusters, K, must be specified
• The basic algorithm is very simple
K-Means Clustering
Step 1: Select k random seeds s.t. d(ki, kj) > dmin
[Figure: initial seeds (if k = 3)]
K-Means Clustering: Initial Seeds
K-Means Clustering: New Centroids
K-Means Clustering: Centroids
K-Means Clustering: Iterate Until Stability
[Figure: new centroids]
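The assign-then-recompute loop can be sketched as follows. For reproducibility this example passes fixed seeds instead of drawing k random seeds at least dmin apart, and the 2-D points are toy data, not from the slides.

```python
import math

def kmeans(points, seeds, max_iter=100):
    """Basic k-means: assign each point to its closest centroid, recompute centroids, repeat."""
    centroids = list(seeds)
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda k: math.dist(p, centroids[k]))
            clusters[nearest].append(p)
        # Update step: mean of each cluster (an empty cluster keeps its old centroid).
        new_centroids = [
            tuple(sum(c) / len(cl) for c in zip(*cl)) if cl else centroids[k]
            for k, cl in enumerate(clusters)
        ]
        if new_centroids == centroids:   # stable: no centroid moved
            break
        centroids = new_centroids
    return centroids, clusters

# Toy data; seeds are fixed here, whereas the slide selects them at random.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = kmeans(points, seeds=[(1, 1), (8, 8)])
print(centroids)
```

With random seeds the result can depend on the starting positions, which is why the seed-separation condition d(ki, kj) > dmin is a sensible precaution.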
ML enabling technologies
• Faster computers
• More data
  • The web
  • Parallel corpora (machine translation)
  • Multiple sequenced genomes
  • Gene expression arrays
• New ideas
  • Kernel trick
  • Large margins
  • Boosting
  • Graphical models
  • …
Some select references
• The web
• Kevin Murphy, MIT, AI Lab, PPT slides
• Avrim Blum, Carnegie Mellon University, PPT slides
• Bishop C. Pattern Recognition and Machine Learning. Springer, 2006.
Principal Component Analysis (PCA)
PCA seeks a projection that best represents the data in a least-squares sense.

PCA reduces the dimensionality of feature space by restricting attention to those directions along which the scatter of the data cloud is greatest.
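For 2-D data the direction of greatest scatter has a closed form, which keeps the idea visible without a linear algebra library. This is a sketch under that 2-D assumption, with made-up points scattered around the line y = x; general PCA would use an eigen-decomposition of the full covariance matrix.

```python
import math

def first_principal_component(points):
    """Direction of greatest scatter for 2-D data, from the 2x2 covariance matrix."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / n
    syy = sum((y - my) ** 2 for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points) / n
    # Closed form for the leading eigenvector angle of a symmetric 2x2 matrix.
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    direction = (math.cos(theta), math.sin(theta))
    # Project the centred data onto that direction: 2-D points become 1-D scores.
    scores = [(x - mx) * direction[0] + (y - my) * direction[1] for x, y in points]
    return direction, scores, (sxx, syy)

# Hypothetical data scattered around the line y = x.
points = [(0.0, 0.2), (1.0, 0.9), (2.0, 2.1), (3.0, 2.8), (4.0, 4.1)]
direction, scores, (sxx, syy) = first_principal_component(points)
var_pc1 = sum(s * s for s in scores) / len(scores)
print(direction, var_pc1, sxx, syy)
```

The variance of the 1-D scores is at least as large as the variance along either original axis, which is what "restricting attention to the directions of greatest scatter" means in practice.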

We shall build on this idea next
• Subspace methods
• Training and classification
• Complex models (GMM)
• Statistical methods (Monte Carlo methods)
• What NOT.