Machine Learning
Group IX
What is machine learning?
● Gives computers the ability to learn rather than having every state defined explicitly
● Draws on subfields of AI such as computational learning theory and pattern recognition
● Makes programs work in two special stages, “Train” and “Predict”
Machine learning vs conditional programming
Conditional programming uses simple if-then-else rules.
Problem: detect a flower's name from its features.
Conditional approach - write if-else rules for every possible state.
ML approach - train a model and let it predict the result, as in the sketch below.
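A minimal sketch of the contrast, assuming hypothetical petal measurements as features; the rule thresholds and the use of scikit-learn's DecisionTreeClassifier are illustrative choices, not taken from the original slides.

```python
from sklearn.tree import DecisionTreeClassifier

# Conditional approach: hand-written rules for every case (hypothetical thresholds)
def classify_flower_rules(petal_length, petal_width):
    if petal_length < 2.5:
        return "setosa"
    elif petal_width < 1.8:
        return "versicolor"
    else:
        return "virginica"

# ML approach: train a model from labelled examples ("Train"), then query it ("Predict")
X_train = [[1.4, 0.2], [4.5, 1.5], [5.9, 2.1]]   # [petal_length, petal_width]
y_train = ["setosa", "versicolor", "virginica"]

model = DecisionTreeClassifier()
model.fit(X_train, y_train)           # Train stage
print(model.predict([[4.7, 1.4]]))    # Predict stage
```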
Supervised learning
Supervised learning is the machine learning task of inferring a function from
labeled training data. The training data consists of a set of training examples.
Supervised learning algorithms
Decision trees
Naive Bayes
k-nearest neighbours
(Diagram: the Train → Predict workflow)
Decision Tree
1. Decision Tree
A decision tree builds a classification model using a tree structure.
It breaks a dataset down into smaller and smaller subsets.
Finding the optimal decision tree is NP-hard, so a greedy technique is used instead.
Decision tree algorithm
1. Start with the whole training data.
2. Select the attribute or value along a dimension that gives the “best” split.
3. Create child nodes based on the split.
4. Recurse on each child using its data until a stopping criterion is reached:
• all examples have the same class - entropy is 0
• the amount of data is too small - fewer than min_samples_split
• the tree is too large
Problem: How do we choose the “best” attribute? (See the entropy sketch after the example table below.)
Simple Example
Weekend (Example) Weather Parents Money Decision (Category)
W1 Sunny Yes Rich Cinema
W2 Sunny No Rich Tennis
W3 Windy Yes Rich Cinema
W4 Rainy Yes Poor Cinema
W5 Rainy No Rich Stay in
W6 Rainy Yes Poor Cinema
W7 Windy No Poor Cinema
W8 Windy No Rich Shopping
W9 Windy Yes Rich Cinema
W10 Sunny No Rich Tennis
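A minimal sketch of how the “best” attribute can be chosen by information gain, using the weekend table above; the helper functions and variable names are illustrative, not from the original slides.

```python
from math import log2
from collections import Counter

# Weekend dataset from the table above: (Weather, Parents, Money) -> Decision
data = [
    ("Sunny", "Yes", "Rich", "Cinema"),  ("Sunny", "No", "Rich", "Tennis"),
    ("Windy", "Yes", "Rich", "Cinema"),  ("Rainy", "Yes", "Poor", "Cinema"),
    ("Rainy", "No", "Rich", "Stay in"),  ("Rainy", "Yes", "Poor", "Cinema"),
    ("Windy", "No", "Poor", "Cinema"),   ("Windy", "No", "Rich", "Shopping"),
    ("Windy", "Yes", "Rich", "Cinema"),  ("Sunny", "No", "Rich", "Tennis"),
]

def entropy(labels):
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in counts.values())

def information_gain(rows, attr_index):
    gain = entropy([r[-1] for r in rows])
    for value in set(r[attr_index] for r in rows):
        subset = [r[-1] for r in rows if r[attr_index] == value]
        gain -= len(subset) / len(rows) * entropy(subset)
    return gain

print(round(entropy([r[-1] for r in data]), 3))   # 1.571, the entropy of the Decision column
for i, name in enumerate(["Weather", "Parents", "Money"]):
    print(name, round(information_gain(data, i), 3))
```

The attribute with the highest information gain is the one chosen for the split.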
Python code
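The code from this slide is not included in the extracted text. Below is a minimal reconstruction, assuming the weekend table above, integer-encoded categorical features, and the parameters listed on the next slide (criterion="entropy", splitter="best", min_samples_split=2); the exact code and encodings used in the original slides are not known.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder
from sklearn.tree import DecisionTreeClassifier

# Weekend dataset from the table: Weather, Parents, Money -> Decision
weather  = ["Sunny", "Sunny", "Windy", "Rainy", "Rainy", "Rainy", "Windy", "Windy", "Windy", "Sunny"]
parents  = ["Yes", "No", "Yes", "Yes", "No", "Yes", "No", "No", "Yes", "No"]
money    = ["Rich", "Rich", "Rich", "Poor", "Rich", "Poor", "Poor", "Rich", "Rich", "Rich"]
decision = ["Cinema", "Tennis", "Cinema", "Cinema", "Stay in", "Cinema", "Cinema", "Shopping", "Cinema", "Tennis"]

# Integer-encode each categorical column (assumed encoding)
feature_encoders = [LabelEncoder().fit(col) for col in (weather, parents, money)]
X = np.column_stack([enc.transform(col)
                     for enc, col in zip(feature_encoders, (weather, parents, money))])
target_encoder = LabelEncoder().fit(decision)
y = target_encoder.transform(decision)

clf = DecisionTreeClassifier(criterion="entropy", splitter="best", min_samples_split=2)
clf.fit(X, y)   # "Train" stage
```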
Decision tree
When Parents is used as the splitting attribute, the entropy is 1.571.
Parameters
criterion = entropy*, gini (default)
splitter = best (default)*, random
min_samples_split = 2 (default)*
* - the value used here
How prediction works
Today is windy, I have money, and my parents are not at home. Predict what I will do.
Weather = “Windy” → 1
Parents = “No” → 0
Money = “Rich” → 1
classified = [0, 1, 0, 0] → I may go shopping! (see the prediction sketch below)
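A minimal prediction sketch continuing the training code above (the encoder and classifier variable names are the assumed ones from that snippet, and the slide's one-hot output format is not reproduced exactly):

```python
# Today: Windy weather, parents not at home, money available ("Rich")
today = np.column_stack([enc.transform([v]) for enc, v in
                         zip(feature_encoders, ("Windy", "No", "Rich"))])
print(target_encoder.inverse_transform(clf.predict(today)))   # expected: ['Shopping']
```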
Decision tree for a large dataset
The scikit-learn iris data set
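A minimal sketch of fitting a decision tree to the iris data; the train/test split and scoring are illustrative additions, not necessarily what the original slide showed.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=0)

clf = DecisionTreeClassifier(criterion="entropy")
clf.fit(X_train, y_train)
print("Test accuracy:", clf.score(X_test, y_test))
```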
Naive Bayes
2. Naive Bayes
Naive Bayes is a classification technique based on Bayes' theorem with an assumption
of independence among predictors.
It is primarily used for text classification, which involves high-dimensional training
data sets.
Examples: spam filtering, sentiment analysis, and classifying news articles.
Bayes' theorem provides a way of calculating the posterior probability P(c|x) from
P(c), P(x) and P(x|c):
P(c|x) = P(x|c) × P(c) / P(x)
● P(c|x) is the posterior probability of the class (c, target) given the predictor (x, attributes).
● P(c) is the prior probability of the class.
● P(x|c) is the likelihood, i.e. the probability of the predictor given the class.
● P(x) is the prior probability of the predictor.
How does the Naive Bayes algorithm work?
Example:
Take a training data set of weather observations and the corresponding target variable ‘Play’
(indicating whether a match was played). Then classify whether players will play
or not based on the weather condition.
Let's follow the steps below.
Steps:
1. Convert the data set into a frequency table.
2. Create a likelihood table by finding the probabilities (e.g. the probability of
Overcast is 0.29 and the probability of playing is 0.64).
3. Use the Naive Bayes equation to calculate the posterior probability for
each class. The class with the highest posterior probability is the outcome
of the prediction.
Problem: Players will play if the weather is sunny. Is this statement correct?
Solution: Solve it using the posterior probability.
P(Yes|Sunny) = P(Sunny|Yes) × P(Yes) / P(Sunny)
Here, P(Sunny|Yes) = 3/9 = 0.33
P(Sunny) = 5/14 = 0.36
P(Yes) = 9/14 = 0.64
P(Yes|Sunny) = 0.33 × 0.64 / 0.36 ≈ 0.60, which is higher than P(No|Sunny) ≈ 0.40, so the prediction is Yes.
Python Code
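The original code is not included in the extracted text. Below is a minimal sketch using scikit-learn's CategoricalNB on a 14-row weather/Play data set consistent with the worked example above (5 Sunny days with 3 Yes, 4 Overcast days all Yes, 5 Rainy days with 2 Yes); the choice of estimator and the encodings are assumptions.

```python
import numpy as np
from sklearn.naive_bayes import CategoricalNB
from sklearn.preprocessing import LabelEncoder

# Weather/Play data matching the frequency table described above
weather = ["Sunny"] * 5 + ["Overcast"] * 4 + ["Rainy"] * 5
play = (["Yes", "Yes", "Yes", "No", "No"] +
        ["Yes"] * 4 +
        ["Yes", "Yes", "No", "No", "No"])

weather_encoder = LabelEncoder().fit(weather)
play_encoder = LabelEncoder().fit(play)
X = weather_encoder.transform(weather).reshape(-1, 1)
y = play_encoder.transform(play)

# Near-zero smoothing so the model mirrors the hand calculation on the previous slide
model = CategoricalNB(alpha=1e-10)
model.fit(X, y)

sunny = weather_encoder.transform(["Sunny"]).reshape(-1, 1)
print(play_encoder.inverse_transform(model.predict(sunny)))   # ['Yes']
print(model.predict_proba(sunny))                             # approx. [[0.40, 0.60]] for [No, Yes]
```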
k-Nearest Neighbour
3. k-Nearest Neighbour
Introduction
The KNN algorithm is a robust and versatile classifier that is often
used as a benchmark for more complex classifiers such as Artificial
Neural Networks (ANN) and Support Vector Machines (SVM).
Despite its simplicity, KNN can outperform more powerful classifiers
and is used in a variety of applications such as economic forecasting,
data compression and genetics.
What is KNN?
KNN falls in the supervised learning family of algorithms. Informally,
this means that we are given a labelled dataset consisting of training
observations (x, y) and would like to capture the relationship
between x and y. More formally, our goal is to learn a function
h : X → Y so that, given an unseen observation x, h(x) can
confidently predict the corresponding output.
● KNN is non-parametric, instance-based and used in a supervised
learning setting.
● Minimal training but expensive testing.
How does KNN work?
The k-nearest neighbour algorithm essentially boils down to forming a majority vote
between the K most similar instances to a given “unseen” observation. Similarity is
defined according to a distance metric between two data points. A popular choice
is the Euclidean distance, given by
d(x, x′) = √( Σᵢ (xᵢ − x′ᵢ)² )
How it works (cont...)
1. Choose a value for k, preferably a small odd number.
2. Find the k closest points to the new point.
3. Assign the new point to the majority class among those k neighbours.
How it works (cont...)
When K is small, we are restraining the region of a given prediction and forcing
our classifier to be “more blind” to the overall distribution. A small value for K
provides the most flexible fit, which will have low bias but high variance.
Graphically, our decision boundary will be more jagged.
On the other hand, a higher K averages more voters in each prediction and hence
is more resilient to outliers. Larger values of K will have smoother decision
boundaries which means lower variance but increased bias.
Exploring KNN in Code
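The original code is not included in the extracted text; a minimal sketch using scikit-learn's KNeighborsClassifier on the iris data (the data set, split and value of k are illustrative assumptions):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=0)

# k = 5 neighbours with Euclidean distance (the scikit-learn defaults)
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)

print("Test accuracy:", knn.score(X_test, y_test))
print("Prediction:", iris.target_names[knn.predict([[5.0, 3.5, 1.5, 0.2]])])
```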
Clustering
Unsupervised learning - Clustering
● Organization of unlabeled data into similarity groups
● Three types of clustering techniques: hierarchical, partitional, and Bayesian
Clustering Algorithms
K-means
● A partitional clustering algorithm
● Choose k (random) data points (seeds) to be the initial centroids
● Assign each data point to the closest centroid
K-means
4. K-means
Algorithm
● Decide on a value for k
● Initialize the k cluster centers
● Assign each object to its nearest cluster
● Re-estimate the cluster centers
● If no object changes its cluster membership, exit; otherwise, repeat from the assignment step (a from-scratch sketch follows below)
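A minimal from-scratch sketch of these steps (the sample points, seeding strategy and variable names are illustrative assumptions):

```python
import numpy as np

def kmeans(points, k, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # Initialize the k cluster centers from k randomly chosen data points (seeds)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    labels = None
    for _ in range(max_iter):
        # Assign each object to its nearest cluster center (Euclidean distance)
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        # If no object changed its cluster membership, stop
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Re-estimate each cluster center as the mean of its members
        centers = np.array([points[labels == j].mean(axis=0) for j in range(k)])
    return labels, centers

pts = np.array([[1.0, 2.0], [1.5, 1.8], [8.0, 8.0], [9.0, 9.5]])
labels, centers = kmeans(pts, k=2)
print("Labels:", labels)
print("Centers:", centers)
```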
Steps 1-5 (figures illustrating successive iterations of the algorithm)
Python Code
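The original code is not shown in the extracted text; a minimal sketch that produces output of the kind shown below using scikit-learn's KMeans (the four 2-D training points and the query point are illustrative placeholders):

```python
import numpy as np
from sklearn.cluster import KMeans

# Four training points forming two obvious groups (illustrative values)
X = np.array([[1, 2], [1, 4], [9, 8], [10, 9]])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print("Output Labels", kmeans.labels_)               # e.g. [0 0 1 1] or [1 1 0 0]
print("Predicted Label", kmeans.predict([[0, 0]]))   # the cluster whose center is nearest the origin
```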
Output Labels [0 0 1 1]
Predicted Label [0]
Another run may instead give Output Labels [1 1 0 0] and Predicted Label [1]: the clustering is the same,
but the cluster indices are swapped, because k-means assigns the label numbers arbitrarily.
