INTRODUCTION
What is machine learning?
Goal: programs that detect patterns and regularities in the data
Strong patterns lead to good predictions
Problem 1: most patterns are not interesting
Problem 2: patterns may be inexact (or spurious)
Problem 3: data may be garbled or missing
Related Disciplines
Artificial Intelligence
Data Mining
Probability and Statistics
Information theory
Numerical optimization
Computational complexity theory
Control theory (adaptive)
Psychology (developmental, cognitive)
Neurobiology
Linguistics
Philosophy
What is machine learning?
A branch of artificial intelligence, concerned with the
design and development of algorithms that allow computers
to evolve behaviors based on empirical data.
As intelligence requires knowledge, it is necessary for the
computers to acquire knowledge.
Drivers: a flood of data, highly complex systems, and speed of
programming (supermarkets, banks, telephone switches,
research, medicine, Google, etc.). Is there any alternative?
A program is said to learn from experience E with respect to
task T and performance measure P, if its performance at tasks
in T, as measured by P, improves with experience E.
Machine learning is programming computers to optimize a
performance criterion using example data or past experience
What is ML?
An algorithm is a sequence of instructions that when
carried out transforms input to output.
There are tasks with no algorithms.
Consider the problem of sorting.
What if we gave a program a number of examples of unsorted
lists and the corresponding sorted lists, and wanted the
program to learn (that is, come up with an algorithm) to sort?
Can it learn the pattern in the data?
To be intelligent, a system that is in a changing environment
should have the ability to learn.
If a system can learn and adapt to such changes, the system
designer need not foresee and provide solutions for all
possible situations.
LEARNING
There are two ways that a system can improve:
1. By acquiring new knowledge
acquiring new facts
acquiring new skills
2. By adapting its behavior
solving problems more accurately
solving problems more efficiently
Why do we need Machine Learning?
• Some tasks cannot be defined well, except by examples (e.g. recognition of
faces or people).
• Large amounts of data may have hidden relationships and correlations.
Only automated approaches may be able to detect these.
• The amount of knowledge about a certain problem / task may be too large
for explicit encoding by humans (e.g. in medical diagnostics)
• Environments change over time, and new knowledge is constantly being
discovered. A continuous redesign of the systems “by hand” may be
difficult.
Some examples of tasks that are best solved by
using a learning algorithm
Recognizing patterns:
Facial identities or facial expressions
Handwritten or spoken words
Medical images
Generating patterns:
Generating images or motion sequences
Recognizing anomalies:
Unusual sequences of credit card transactions
Unusual patterns of sensor readings in a nuclear power
plant or unusual sound in your car engine.
Prediction:
Future stock prices or currency exchange rates
Some web-based examples of machine learning
The web contains a lot of data. Tasks with very big datasets
often use machine learning, especially if the data is noisy or
non-stationary.
Spam filtering, fraud detection:
The enemy adapts so we must adapt too.
Recommendation systems:
Lots of noisy data. Million dollar prize!
Information retrieval:
Find documents or images with similar content.
Data Visualization:
Display a huge database in a revealing way
Learning task
• Classification:
Prediction of an item class.
• Forecasting:
Prediction of a parameter value.
• Characterization:
Find hypotheses that describe groups of items.
• Clustering:
Partitioning of the (unassigned) data set into clusters
with common properties. (Unsupervised learning)
Dataset and pre-processing
Complexity of datasets:
• Many instances (examples)
• Instances with multiple features (properties / characteristics)
• Dependencies between the features (correlations)
Instance selection:
Remove identical / inconsistent / incomplete instances (e.g.
reduction of homologous genes, removal of wrongly annotated
genes)
Feature transformation / selection:
Projection techniques (e.g. principal components analysis)
Compression techniques (e.g. minimum description length)
Feature selection techniques
Defining the Learning Task
Improve on task, T, with respect to
performance metric, P, based on experience, E.
T: Playing checkers
P: Percentage of games won against an arbitrary opponent
E: Playing practice games against itself
T: Recognizing hand-written words
P: Percentage of words correctly classified
E: Database of human-labeled images of handwritten words
T: Driving on four-lane highways using vision sensors
P: Average distance traveled before a human-judged error
E: A sequence of images and steering commands recorded while
observing a human driver.
T: Categorize email messages as spam or legitimate.
P: Percentage of email messages correctly classified.
E: Database of emails, some with human-given labels
Designing a Learning System
Choose the training experience
Choose exactly what is to be learned, i.e. the
target function.
Choose how to represent the target function.
Choose a learning algorithm to infer the target
function from the experience.
[Figure: block diagram linking Environment/Experience, Learner, Knowledge, and Performance Element]
What is ML?
Can we improve investment gains with the help of stock data?
The learning Model
Understanding Hypothesis space
How many possible Boolean functions are there?
4 features: 2^(2^4) = 2^16 = 65536
After 7 examples, we still have
2^9 = 512 possibilities
Hypothesis space: the space of all hypotheses that
can be output by a learning algorithm
Version space: the hypotheses not ruled
out by the training examples
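The counts on this slide can be checked by brute force. With 4 Boolean features there are 16 possible inputs, so a Boolean function is a truth table of 16 output bits, giving 2^16 candidates. The sketch below enumerates them and keeps those consistent with 7 labeled examples. The target concept and the choice of which 7 inputs are observed are assumptions made only for illustration.

```python
from itertools import product

N_FEATURES = 4
inputs = list(product([0, 1], repeat=N_FEATURES))  # the 16 possible inputs

# A Boolean function over 4 features is one output bit per input.
n_hypotheses = 2 ** len(inputs)


def target(x):
    # Hypothetical target concept, used only to label the examples.
    return int(x[0] and not x[3])


# Seven labeled training examples (the first seven inputs, by assumption).
training = [(x, target(x)) for x in inputs[:7]]


def consistent(table, examples):
    """True if the truth table agrees with every labeled example."""
    return all(table[inputs.index(x)] == y for x, y in examples)


# Version space: every truth table not ruled out by the training examples.
version_space = [t for t in product([0, 1], repeat=len(inputs))
                 if consistent(t, training)]

print(n_hypotheses)        # 65536
print(len(version_space))  # 512
```

Seven of the 16 outputs are pinned down by the examples, so the remaining 9 are free, which is where 2^9 = 512 comes from.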
Learning as search
Inductive learning: find a concept description that fits the data
Example: rule sets as description language
Enormous, but finite, search space
Simple solution:
enumerate the concept space
eliminate descriptions that do not fit examples
surviving descriptions contain target concept
witten&eibe
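The "enumerate and eliminate" idea can be sketched on a tiny concept space. The description language here is an assumption chosen to keep the space small: conjunctive rules over three Boolean attributes, where each attribute must be 0, must be 1, or is ignored. The examples are made up for illustration.

```python
from itertools import product

# Concept space: each attribute is constrained to 0, to 1, or ignored (None).
concept_space = list(product([0, 1, None], repeat=3))  # 27 candidate rules


def covers(rule, x):
    """A rule covers an instance if every non-ignored attribute matches."""
    return all(r is None or r == v for r, v in zip(rule, x))


# Hypothetical labeled examples: (instance, is_positive).
examples = [
    ((1, 0, 1), True),
    ((1, 1, 1), True),
    ((0, 1, 1), False),
    ((1, 1, 0), False),
]


def fits(rule, examples):
    """A rule fits if it covers exactly the positive examples."""
    return all(covers(rule, x) == label for x, label in examples)


# Eliminate every description that does not fit the examples.
survivors = [rule for rule in concept_space if fits(rule, examples)]
print(survivors)  # [(1, None, 1)]
```

Enumeration is only feasible because this space has 27 rules; for realistic description languages the space is far too large, which is why practical learners search it instead of listing it.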
Uses of machine Learning
Machine Learning creates an optimized model of the
concept being learned based on data or past
experience. The model is parameterized.
Learning is the execution of a computer program to
optimize the parameter values so that the model fits
data or past experience well.
Uses of learning: Predictive and/or Descriptive.
Predictive: Use the model to predict things about an
unseen example.
Descriptive: Use the model to describe the examples
seen or experiences had. This model can be used in
some problem-solving situation.
The basic principle
10^5 machine learning algorithms
Hundreds new every year
Every algorithm has three components:
1. Hypothesis space: the possible outputs (ANN,
SVM, decision tree, Bayes network, etc.)
2. Search strategy: the strategy for exploring the space
(optimizing an objective function)
3. Evaluation: e.g. accuracy, precision and recall,
squared error, likelihood, posterior probability,
cost / utility, margin
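A few of the evaluation measures named above can be written out directly. This is a minimal sketch for binary labels and real-valued targets; the function names and the toy label vectors are assumptions for illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision_recall(y_true, y_pred, positive=1):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def mean_squared_error(y_true, y_pred):
    """Average squared difference between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]
print(accuracy(y_true, y_pred))          # 4 of 6 correct
print(precision_recall(y_true, y_pred))  # both equal 2/3 here
print(mean_squared_error([1.0, 2.0, 3.0], [1.5, 2.0, 2.0]))
```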
Learning system model
[Figure: learning system model with labels Input Samples, Learning Method, System, Training, Testing]
Training and testing
[Figure: from data acquisition to practical usage]
Universal set (unobserved)
Training set (observed): labels are known
Testing set (unobserved): labels are known but not given
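In practice the testing set is usually simulated by holding back part of the observed data, so that its labels are known to us but never shown to the learner. A minimal sketch, assuming a random split; the function name, the 25% fraction, and the toy data are illustrative choices.

```python
import random


def train_test_split(samples, test_fraction=0.25, seed=0):
    """Shuffle a copy of the samples and hold back a fraction for testing."""
    rng = random.Random(seed)      # fixed seed so the split is repeatable
    shuffled = samples[:]
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Toy labeled data: (input, label) pairs.
data = [(x, x % 2) for x in range(20)]
train, test = train_test_split(data)
print(len(train), len(test))  # 15 5
```

The learner is fitted on `train` only; the labels in `test` are used afterwards to measure performance on data it has not seen.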
Performance
There are several factors affecting the performance:
Types of training provided
The form and extent of any initial background knowledge
The type of feedback provided
The learning algorithms used
Two important factors:
Modeling
Optimization
Algorithms
The success of a machine learning system also depends on the
algorithms.
The algorithms control the search to find and build the
knowledge structures.
The learning algorithms should extract useful information
from training examples.
Algorithms
Supervised learning
Prediction
Classification (discrete labels), Regression (real values)
Unsupervised learning [NO FEEDBACK]
Clustering
Probability distribution estimation
Finding association (in features)
Dimension reduction
Semi-supervised learning
Reinforcement learning [INDIRECT FEEDBACK]
Decision making (robot, chess machine)
Types of learning task
Supervised learning
Learn to predict output when given an input vector
Who provides the correct answer?
Reinforcement learning
Learn action to maximize payoff
Not much information in a payoff signal
Payoff is often delayed
Reinforcement learning is an important area that will not be
covered in this course.
Unsupervised learning
Create an internal representation of the input e.g. form
clusters; extract features
How do we know if a representation is good?
This is the new frontier of machine learning because most big
datasets do not come with labels.
Algorithms
[Figure panels: Supervised learning, Unsupervised learning, Semi-supervised learning]
Machine learning structure
Supervised learning
Machine learning structure
Unsupervised learning
Semi-supervised learning (SSL)
Traditional supervised learning is limited to using labeled data.
SSL also uses unlabeled data to learn.
Let (x,y) be a labeled instance and (x,ø) be an unlabeled instance.
L: a set of n labeled instances.
U: a set of m unlabeled instances.
n << m
SSL tries to use L ∪ U to learn a predictive model.
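One simple way to use both L and U is self-training: fit a model on L, pseudo-label the unlabeled point the model is most confident about, add it to the labeled pool, and repeat. The slide does not name a specific method, so this is only one possible sketch. It assumes 1-D inputs, a nearest-centroid classifier, and "distance to the closest centroid" as the confidence measure; the data are made up.

```python
def centroids(labeled):
    """Mean input value per class."""
    sums = {}
    for x, y in labeled:
        total, count = sums.get(y, (0.0, 0))
        sums[y] = (total + x, count + 1)
    return {y: total / count for y, (total, count) in sums.items()}


def predict(cents, x):
    """Assign x to the class with the nearest centroid."""
    return min(cents, key=lambda y: abs(x - cents[y]))


def self_train(L, U):
    """Grow the labeled pool one pseudo-labeled point at a time."""
    labeled, unlabeled = list(L), list(U)
    while unlabeled:
        cents = centroids(labeled)
        # Most confident point: the one closest to any current centroid.
        best = min(unlabeled,
                   key=lambda x: min(abs(x - c) for c in cents.values()))
        labeled.append((best, predict(cents, best)))
        unlabeled.remove(best)
    return centroids(labeled)


L = [(0.0, "a"), (10.0, "b")]             # n = 2 labeled instances
U = [1.0, 2.0, 3.0, 7.0, 8.0, 9.0, 4.0]   # m = 7 unlabeled instances
model = self_train(L, U)
print(model)
print(predict(model, 1.5), predict(model, 9.5))
```

Pseudo-labels can be wrong, and errors made early are reinforced in later rounds, so self-training helps mainly when the classes are well separated.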
Learning techniques
• Linear classifier
A linear function of the input x, where w is a d-dimensional weight vector (learned)
Techniques:
Perceptron
Logistic regression
Support vector machine (SVM)
Adaline
Multi-layer perceptron (MLP)
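As a concrete instance of the first technique in the list, here is a minimal perceptron sketch. It assumes the common decision rule sign(w·x + b) with labels in {-1, +1}; the bias term b, the learning rate, the epoch limit, and the toy data are assumptions, not taken from the slide.

```python
def predict(w, b, x):
    """Linear decision rule: +1 if w.x + b >= 0, otherwise -1."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1


def train_perceptron(data, epochs=20, lr=1.0):
    """Perceptron rule: on each mistake, move w and b toward the true label."""
    d = len(data[0][0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, y in data:
            if predict(w, b, x) != y:
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
                mistakes += 1
        if mistakes == 0:   # every training point is classified correctly
            break
    return w, b


# Linearly separable toy set: the label is +1 only when both inputs are 1.
data = [((0, 0), -1), ((0, 1), -1), ((1, 0), -1), ((1, 1), 1)]
w, b = train_perceptron(data)
print(w, b)
print([predict(w, b, x) for x, _ in data])
```

The update only converges when the classes are linearly separable; on non-separable data the loop simply stops at the epoch limit.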
Learning techniques
• Non-linear case
Support vector machine (SVM):
Linear to nonlinear: Feature transform and kernel function
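A small sketch of both ideas, assuming the degree-2 polynomial kernel k(x, z) = (x·z)^2 on 2-D inputs, a choice made for illustration. The explicit feature map phi turns XOR-style data, which no line separates in the original space, into data that a linear rule separates. The kernel returns the same inner product without ever building phi.

```python
import math


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def phi(x):
    """Explicit feature transform matching the degree-2 polynomial kernel."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)


def poly_kernel(x, z):
    """k(x, z) = (x . z)^2, equal to phi(x) . phi(z)."""
    return dot(x, z) ** 2


# XOR-style data with inputs in {-1, +1}: label is +1 when the signs differ.
xor_data = [((-1, -1), -1), ((-1, 1), 1), ((1, -1), 1), ((1, 1), -1)]

# In the transformed space a single linear rule separates the classes.
w = (0.0, -1.0, 0.0)
predictions = [1 if dot(w, phi(x)) >= 0 else -1 for x, _ in xor_data]
print(predictions)  # [-1, 1, 1, -1]

# The kernel and the explicit transform agree (up to rounding).
x, z = (1.0, 2.0), (3.0, -1.0)
print(poly_kernel(x, z), dot(phi(x), phi(z)))
```

This is why a kernel SVM can learn a non-linear boundary while still solving a linear problem: all it needs are inner products in the transformed space.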
Learning techniques
Unsupervised learning categories and techniques
Clustering
K-means clustering
Spectral clustering
Density Estimation
Gaussian mixture model (GMM)
Graphical models
Dimensionality reduction
Principal component analysis (PCA)
Factor analysis
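K-means, the first clustering technique listed, alternates two steps: assign each point to its nearest center, then move each center to the mean of its points. A minimal 1-D sketch follows; the initial centers, the fixed iteration count, and the data are assumptions for illustration.

```python
def kmeans_1d(points, centers, iterations=10):
    """K-means on 1-D data: alternate assignment and mean update."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Update step: each center moves to the mean of its cluster
        # (an empty cluster keeps its previous center).
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters


points = [1.0, 1.5, 2.0, 8.0, 9.0, 10.0]
centers, clusters = kmeans_1d(points, centers=[0.0, 5.0])
print(centers)   # [1.5, 9.0]
print(clusters)  # [[1.0, 1.5, 2.0], [8.0, 9.0, 10.0]]
```

No labels are used anywhere, which is what makes this unsupervised; the result does depend on the initial centers.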
Classification
There are three methodologies:
a) Model a classification rule directly
Examples: k-NN, linear classifier, SVM, neural nets, …
b) Model the probability of class memberships given input data
Examples: logistic regression, probabilistic neural nets (softmax),…
c) Make a probabilistic model of data within each class
Examples: naive Bayes, model-based ….
Important ML taxonomy for learning models
probabilistic models vs non-probabilistic models
discriminative models vs generative models
Resulting model is also called the hypothesis
Classification
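[Figure: labeled animal images (zebra, tiger, rhino, panda, hippo, elephant, giraffe, lion, penguin, snake) go into an algorithm that produces a model; the model labels a new image "lion"]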
Given a model space and an optimality criterion, a model satisfying this criterion is sought
Some optimizing criteria:
Maximizing the prediction accuracy
Minimizing the hypothesis’ size
Maximizing the hypothesis fitness to the input data
Maximizing the hypothesis interpretability
Minimizing the time complexity of prediction
Classification
Learn a method for predicting the instance class from
pre-labeled (classified) instances
Many approaches:
Regression,
Decision Trees,
Bayesian,
Neural Networks,
...
Given a set of points from several classes,
what is the class of a new point?
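One direct answer to this question is the k-nearest-neighbor rule (k-NN) mentioned earlier: give the new point the majority class among its k closest labeled points. A minimal sketch, assuming Euclidean distance and k = 3; the 2-D points and class names are made up.

```python
import math
from collections import Counter


def knn_classify(train, query, k=3):
    """Majority vote among the k training points nearest to the query."""
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Pre-labeled instances: (point, class).
train = [
    ((1.0, 1.0), "A"), ((1.5, 2.0), "A"), ((2.0, 1.0), "A"),
    ((6.0, 6.0), "B"), ((7.0, 6.5), "B"), ((6.5, 7.5), "B"),
]

print(knn_classify(train, (2.0, 2.0)))  # A
print(knn_classify(train, (6.0, 7.0)))  # B
```

k-NN builds no explicit model: the training set itself is the hypothesis, and all the work happens at prediction time.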
Linear and Non-Linear Decision Boundary
Regression
• Regression analysis is used to predict the value of one variable (the
dependent variable) on the basis of other variables (the
independent variables).
• Learn a continuous function.
• Given the following data, can we find
the value of the output when x = 0.44?
• Goal is to predict for input x an output
f(x) that is close to the true y.
• It is generally a problem of function approximation, or
interpolation, working out the value between values that we
know.
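A minimal sketch of this kind of prediction, assuming a straight-line model fitted by ordinary least squares. The data points are hypothetical stand-ins (the slide's own data are not reproduced here) and were generated from y = 2x + 1, so the fit is exact.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical observations of (x, y).
xs = [0.0, 0.25, 0.5, 0.75, 1.0]
ys = [1.0, 1.5, 2.0, 2.5, 3.0]

slope, intercept = fit_line(xs, ys)


def f(x):
    """Learned continuous function used to predict unseen inputs."""
    return slope * x + intercept


print(f(0.44))  # approximately 1.88
```

With noisy or curved data a straight line would only approximate y, and the choice of model family (line, polynomial, and so on) becomes part of the learning problem.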