Machine Learning
using
Support Vector Machine
Seminar presentation by:
Mohammad Mohsin Ul Haq
CSE/02/13
What is Machine Learning?
• Machine learning is the subfield of computer science that “gives computers the ability
to learn without being explicitly programmed” (Arthur Samuel).
• Tom M. Mitchell provided a widely quoted, more formal definition: “A computer
program is said to learn from experience E with respect to some class of tasks T and
performance measure P if its performance at tasks in T, as measured by P, improves
with experience E.”
Types of Machine Learning
• Supervised Learning
Inferring a function from labelled training data. A supervised learning algorithm
analyses the training data (a set of inputs and their correct outputs) and produces
an appropriate function, which can be used for mapping new examples.
• Unsupervised Learning
Inferring a function to describe hidden structure from unlabelled data. No labels
are given to the learning algorithm, leaving it on its own to find structure in its
input.
• Reinforcement Learning
Concerned with how software agents ought to take actions in an environment so
as to maximize some notion of cumulative reward.
Types of Supervised Learning
• Regression
In a regression problem, we are trying to predict results within a continuous
output, meaning that we are trying to map input variables to some continuous
function. For example, predicting housing prices where the output is a real
number.
• Classification
In a classification problem, we are instead trying to predict results in a discrete
output. In other words, we are trying to map input variables into discrete
categories. For example, predicting whether a particular email is spam or not.
Machine Learning Tools and Techniques
• Linear Regression
Here the predicted function is linear. It is the most common type of
regression, as it fits many regression problems well. The parameters are learned
by minimizing a Cost Function, typically via Gradient Descent. Linear
regression can be single-variable or multivariate; the higher-degree variant
is called Polynomial Regression.
• Logistic Regression
This technique is used for classification problems: the predicted output is
mapped to a discrete set of classes. It can be seen as a modified form of Linear
Regression that passes the linear output through the Sigmoid function.
• Neural Network
This tool can be used both for regression and for classification. It consists of one
or more layers of computational units between the input and output, trained
using the Backpropagation algorithm. For complex problems with many
features to model, neural networks provide an efficient solution compared to other
techniques.
• Support Vector Machine
While a Neural Network outputs a probability that a particular input belongs to
a class, an SVM is a non-probabilistic binary linear classifier. It separates the
classes linearly with as large a margin as possible. Add to it the Kernel, and the
SVM becomes one of the most powerful classifiers, capable of handling effectively
infinite-dimensional feature vectors.
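To make the first technique concrete, here is a sketch of single-variable linear regression trained with batch gradient descent in plain Python. The function and variable names are illustrative, not from any particular library:

```python
# Minimal single-variable linear regression, trained with batch gradient
# descent on the mean-squared-error cost (names are illustrative).

def fit_linear(xs, ys, lr=0.01, epochs=5000):
    """Learn theta0 (intercept) and theta1 (slope) by gradient descent."""
    theta0, theta1 = 0.0, 0.0
    m = len(xs)
    for _ in range(epochs):
        # Gradients of J(theta) = (1/2m) * sum((h(x) - y)^2)
        errors = [(theta0 + theta1 * x) - y for x, y in zip(xs, ys)]
        grad0 = sum(errors) / m
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m
        theta0 -= lr * grad0
        theta1 -= lr * grad1
    return theta0, theta1

# Example: points on the line y = 2x + 1
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]
theta0, theta1 = fit_linear(xs, ys)
```

With enough iterations, theta0 and theta1 converge close to the true intercept 1 and slope 2.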
Support Vector Machine
The Hypothesis function for an SVM is the same as that of logistic regression:

h_θ(x) = 1 / (1 + e^(−θᵀx))

The difference lies in the Cost Function:

J(θ) = C Σᵢ₌₁ᵐ [ y⁽ⁱ⁾ cost₁(θᵀx⁽ⁱ⁾) + (1 − y⁽ⁱ⁾) cost₀(θᵀx⁽ⁱ⁾) ]
where cost₀ and cost₁ are defined as

cost₀(z) = 0 if z ≤ −1, z + 1 otherwise
cost₁(z) = 0 if z ≥ 1, −z + 1 otherwise
Thus the learned parameter vector is obtained as:

θ = argmin_θ [ J(θ) + (1/2) Σⱼ₌₁ⁿ θⱼ² ]

where the minimization can be carried out via Gradient Descent, Conjugate
Gradient, BFGS, L-BFGS, etc., and the second term is a Regularization term that prevents Overfitting.
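The two cost terms and the regularized objective above translate directly into plain Python. This is a sketch for evaluating the objective, not a full trainer, and for simplicity it regularizes every component of θ (the usual convention excludes the intercept θ₀):

```python
# Direct transcription of the SVM cost terms into plain Python.

def cost0(z):
    """Penalty for negative examples (y = 0): zero once z <= -1."""
    return 0.0 if z <= -1 else z + 1

def cost1(z):
    """Penalty for positive examples (y = 1): zero once z >= 1."""
    return 0.0 if z >= 1 else -z + 1

def svm_objective(theta, X, y, C):
    """C * J(theta) plus the regularization term (1/2) * sum of theta_j^2.

    Note: regularizes all of theta for simplicity; conventionally the
    intercept theta_0 is excluded from the sum.
    """
    total = 0.0
    for x_i, y_i in zip(X, y):
        z = sum(t * xj for t, xj in zip(theta, x_i))  # theta^T x
        total += y_i * cost1(z) + (1 - y_i) * cost0(z)
    reg = 0.5 * sum(t * t for t in theta)
    return C * total + reg
```

For example, with θ = 0 every example sits at z = 0, so each one contributes a penalty of exactly 1 regardless of its label.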
Kernel
• A Kernel is a “similarity function” that we provide to a machine learning algorithm,
most commonly an SVM. It takes two inputs and outputs how similar they are; the
means by which this similarity is determined differentiates one kernel function from
another. It is a shortcut that lets us do certain calculations faster which would
otherwise involve computations in a higher-dimensional space. Examples include the
Gaussian Kernel, String Kernel, Chi-Squared Kernel, Histogram Intersection Kernel, etc.
• Kernel methods owe their name to the use of kernel functions, which enable them to
operate in a high-dimensional, implicit feature space without ever computing the
coordinates of the data in that space, but rather by simply computing the inner
products between the images of all pairs of data in the feature space. This operation is
often computationally cheaper than the explicit computation of the coordinates. This
approach is called the "kernel trick".
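As an illustration, the Gaussian Kernel mentioned above can be written as a short function. This is a sketch; `sigma` is the bandwidth parameter controlling how quickly similarity decays with distance:

```python
# Gaussian (RBF) kernel: similarity decays with the squared distance
# between the two inputs, controlled by the bandwidth sigma.
import math

def gaussian_kernel(x1, x2, sigma=1.0):
    """K(x1, x2) = exp(-||x1 - x2||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x1, x2))
    return math.exp(-sq_dist / (2 * sigma ** 2))
```

Identical points get similarity exactly 1, the function is symmetric in its arguments, and distant points get a similarity approaching 0.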
SVM Intuition
• We have 2 colours of balls on the table that we want to separate.
• We get a stick and put it on the table. This works
pretty well, right?
• Some villain comes and places more balls on the
table. The stick kind of works, but one of the balls is
on the wrong side, and there is probably a better place
to put the stick now.
• SVMs try to put the stick in the best possible place by
having as big a gap on either side of the stick as
possible.
• Now when the villain returns the stick is still in a
pretty good spot.
• There is another trick in the SVM toolbox that is even
more important. Say the villain has seen how good
you are with a stick so he gives you a new challenge.
• There’s no stick in the world that will let you split
those balls well, so what do you do? You flip the table,
of course, throwing the balls into the air. Then, with
your pro ninja skills, you grab a sheet of paper and
slip it between the balls.
• Now, looking at the balls from where the villain is
standing, the balls will look split by some curvy line.
“The balls can be considered as data, the stick a classifier,
the biggest-gap trick an optimization, flipping the table
kernelling, and the piece of paper a hyperplane.”
THAT’S SVM FOR YOU!