MACHINE LEARNING
A JOURNEY FROM DATA TO DECISIONS
DEPARTMENT OF COMPUTER SCIENCE
Unsupervised Learning
Types of Unsupervised Learning Algorithms
Clustering: Clustering is a method of grouping objects into clusters such that objects with the most similarities remain in the same group and have few or no similarities with the objects of other groups.
Association: An association rule is an unsupervised learning method used to find relationships between variables in a large database.
K-Means Clustering
What is K-Means Clustering?
It is an iterative algorithm that divides an unlabeled dataset into k different clusters in such a way that each data point belongs to only one group of points with similar properties.
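A minimal sketch of k-means in Python, using scikit-learn's KMeans (the synthetic data, k = 3, and all parameter values are illustrative assumptions, not part of the original slides):

```python
# K-means sketch with scikit-learn (illustrative data and parameters).
import numpy as np
from sklearn.cluster import KMeans

# Synthetic 2-D points; in practice this would be your unlabeled dataset.
rng = np.random.default_rng(42)
X = np.vstack([
    rng.normal(loc=(0, 0), scale=0.5, size=(50, 2)),
    rng.normal(loc=(5, 5), scale=0.5, size=(50, 2)),
    rng.normal(loc=(0, 5), scale=0.5, size=(50, 2)),
])

# k = 3 is an assumption here; k must be chosen up front for k-means.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(kmeans.labels_[:10])        # cluster index for the first 10 points
print(kmeans.cluster_centers_)    # final centroids
```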
How does the K-Means Algorithm Work?
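The slides above illustrated the steps graphically; in outline, k-means repeats two steps until the centroids stop moving: assign each point to its nearest centroid, then recompute each centroid as the mean of its assigned points. A minimal from-scratch sketch (NumPy only; k, the iteration cap, and the no-empty-cluster assumption are illustrative):

```python
# From-scratch k-means sketch (assumes no cluster ever goes empty).
import numpy as np

def kmeans(X, k=3, n_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1: pick k random data points as the initial centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Step 2: assign every point to its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: move each centroid to the mean of its assigned points.
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        # Step 4: stop when the centroids no longer move.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids
```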
Final Clusters
Example K-Means Clustering
Types of Clustering Methods
▰Partitioning Clustering
▰Density-Based Clustering
▰Distribution Model-Based Clustering
▰Hierarchical Clustering
Partitioning Clustering
In this type, the dataset is divided into a set of k groups, where k defines the number of pre-defined groups.
The cluster centers are chosen so that the distance between the data points and the centroid of their own cluster is minimal compared with the distance to the centroids of other clusters.
Density-Based Clustering
This method connects highly dense areas into clusters, so arbitrarily shaped clusters can form as long as the dense regions can be connected.
The algorithm does this by identifying regions of high density in the dataset and linking them into clusters.
The dense areas in the data space are separated from each other by sparser areas.
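The slide does not name a specific algorithm; DBSCAN is the most common density-based method. A minimal sketch (the eps and min_samples values are illustrative assumptions):

```python
# Density-based clustering sketch with DBSCAN (illustrative parameters).
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

# Two interleaved half-moons: non-convex shapes that k-means handles poorly.
X, _ = make_moons(n_samples=200, noise=0.05, random_state=0)

# eps: neighborhood radius; min_samples: points needed to form a dense region.
db = DBSCAN(eps=0.2, min_samples=5).fit(X)
print(set(db.labels_))  # cluster ids; -1 marks points labeled as noise
```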
Distribution Model-Based Clustering
The data is divided based on the probability that each data point belongs to a particular distribution.
The grouping is done by assuming the data comes from a mixture of distributions, most commonly Gaussian distributions.
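A minimal sketch using a Gaussian mixture model, the usual choice for distribution-based clustering (the synthetic data and n_components = 2 are illustrative):

```python
# Distribution-model-based clustering sketch with a Gaussian mixture.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Two Gaussian blobs with different means (synthetic, illustrative).
X = np.vstack([rng.normal(0, 1, size=(100, 2)),
               rng.normal(6, 1, size=(100, 2))])

gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
labels = gmm.predict(X)        # hard assignment per point
probs = gmm.predict_proba(X)   # soft membership probabilities
print(probs[:3].round(3))
```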
Hierarchical Clustering
In this technique, the dataset is divided into clusters to create a tree-like structure, also called a dendrogram.
Any desired number of clusters can then be obtained by cutting the tree at the appropriate level.
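A minimal sketch with SciPy: build the dendrogram with Ward linkage, then "cut the tree" to obtain a chosen number of clusters (the data and the cut level are illustrative):

```python
# Hierarchical (agglomerative) clustering sketch with SciPy.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.5, size=(20, 2)),
               rng.normal(4, 0.5, size=(20, 2))])

# Build the dendrogram bottom-up with Ward linkage.
Z = linkage(X, method="ward")
# Cut the tree to obtain a chosen number of clusters (2 here).
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)
```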
Machine Learning Process
Association
Apriori Algorithm
Steps for Apriori Algorithm
▰Step-1: Determine the support of the itemsets in the transactional database, and select the minimum support and confidence.
▰Step-2: Keep all itemsets in the transactions whose support value is higher than the minimum (selected) support value.
▰Step-3: Find all the rules over these subsets that have a confidence value higher than the threshold (minimum confidence).
▰Step-4: Sort the rules in decreasing order of lift.
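The slides do not include code; as a minimal sketch, the four steps with the mlxtend library (assuming mlxtend is installed; the transactions, min_support, and min_threshold values are illustrative):

```python
# Apriori sketch with mlxtend (illustrative transactions and thresholds).
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

transactions = [["A", "B", "C"], ["A", "B"], ["A", "C"],
                ["B", "C"], ["A", "B", "C"]]

# One-hot encode the transactions into a boolean DataFrame.
te = TransactionEncoder()
df = pd.DataFrame(te.fit(transactions).transform(transactions),
                  columns=te.columns_)

# Steps 1-2: frequent itemsets above the minimum support.
frequent = apriori(df, min_support=0.4, use_colnames=True)
# Step 3: rules above the minimum confidence.
rules = association_rules(frequent, metric="confidence", min_threshold=0.5)
# Step 4: sort by lift, descending.
print(rules.sort_values("lift", ascending=False)[
    ["antecedents", "consequents", "confidence", "lift"]])
```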
Apriori Algorithm Working
Suppose we have the following dataset containing various transactions. From this dataset, we need to find the frequent itemsets and generate the association rules using the Apriori algorithm.
Step-1: Calculating C1 and L1
Candidate itemset C1 and frequent itemset L1
Step-2: Candidate Generation C2 and L2
Candidate itemset C2 and frequent itemset L2
Step-3: Candidate Generation C3 and L3
Candidate itemset C3
As we can see from the C3 table above, there is only one itemset combination whose support count meets the minimum support count.
So, L3 will have only one combination, i.e., {A, B, C}.
Step-4: Finding the association rules for the subsets

Rule       Support   Confidence
A^B → C    2         sup((A^B)^C) / sup(A^B) = 2/4 = 0.50 = 50%
B^C → A    2         sup((B^C)^A) / sup(B^C) = 2/4 = 0.50 = 50%
A^C → B    2         sup((A^C)^B) / sup(A^C) = 2/4 = 0.50 = 50%
C → A^B    2         sup(C^(A^B)) / sup(C)   = 2/5 = 0.40 = 40%
A → B^C    2         sup(A^(B^C)) / sup(A)   = 2/6 = 0.33 = 33.33%
B → A^C    2         sup(B^(A^C)) / sup(B)   = 2/7 = 0.29 = 28.57%

As the given threshold (minimum confidence) is 50%, the first three rules, A^B → C, B^C → A, and A^C → B, can be considered strong association rules for the given problem.
Splitting the Dataset - Holdout
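The slide's figure is not reproduced here; as a minimal sketch, the holdout method with scikit-learn's train_test_split (the 80/20 split and the Iris data are illustrative choices):

```python
# Holdout split sketch: reserve part of the data for testing.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
print(len(X_train), len(X_test))  # 120 training rows, 30 test rows
```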
Stratified Sampling
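A minimal sketch of stratified sampling, again with train_test_split (passing stratify=y preserves the class proportions in both splits; the dataset and split size are illustrative):

```python
# Stratified sampling sketch: keep class proportions identical in both splits.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)
print(np.bincount(y_train), np.bincount(y_test))  # [40 40 40] [10 10 10]
```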
Underfitting and Overfitting
Bias vs Variance
• Bias is the error from overly simple assumptions: the systematic difference between the observed values and the model's predicted values. High bias leads to underfitting.
• Variance is the error from sensitivity to the particular training data: it shows up as a gap between performance on the training set and on the test set. High variance leads to overfitting.
Bias vs Variance
We generally want to minimize both bias and variance, i.e., build a model that not only fits the training data well but also generalizes well to test/validation data.
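One way to see the trade-off in code: compare training and test scores of decision trees of different depths (a hedged sketch; the dataset and depth values are illustrative assumptions). Low scores on both sets suggest high bias (underfitting); a high training score with a much lower test score suggests high variance (overfitting).

```python
# Bias vs variance sketch: vary model complexity via tree depth.
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

for depth in (1, 4, None):  # shallow = high bias; unlimited = high variance
    tree = DecisionTreeRegressor(max_depth=depth, random_state=0).fit(X_tr, y_tr)
    print(depth, round(tree.score(X_tr, y_tr), 2), round(tree.score(X_te, y_te), 2))
```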
Enrich the Dataset
Improve Model Efficiency – K-Fold Testing
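A minimal sketch of k-fold cross-validation with scikit-learn (k = 5, the Iris data, and logistic regression are illustrative choices): each of the k folds serves once as the validation set, and the scores are averaged.

```python
# K-fold cross-validation sketch (k = 5 is illustrative).
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# Train and evaluate on 5 different train/validation splits, then average.
scores = cross_val_score(model, X, y, cv=5)
print(scores, scores.mean())
```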
Model Selection
Anaconda Environment
Value Addition
Sample Dataset - Iris
Dataset Types
Facets of data
■ Structured
■ Unstructured
■ Natural language
■ Machine-generated
■ Graph-based
■ Audio, video, and images
■ Streaming
Data Preprocessing Techniques - Missing Data
Two ways to deal with missing data:
1. Delete the rows that contain missing values.
2. Replace each missing value with the mean of its column (imputation).
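A minimal pandas sketch of both options (the DataFrame values are illustrative):

```python
# Handling missing values with pandas (illustrative data).
import numpy as np
import pandas as pd

df = pd.DataFrame({"age": [25, np.nan, 31, 40],
                   "salary": [50_000, 60_000, np.nan, 80_000]})

dropped = df.dropna()           # option 1: delete rows with missing values
imputed = df.fillna(df.mean())  # option 2: impute with the column mean
print(dropped)
print(imputed)
```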
Encoding Categorical Data
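As a minimal sketch, one-hot encoding with pandas (the country column is an illustrative example; label encoding is the common alternative for ordinal categories):

```python
# Encoding categorical data sketch: one boolean column per category.
import pandas as pd

df = pd.DataFrame({"country": ["France", "Spain", "Germany", "Spain"]})

encoded = pd.get_dummies(df, columns=["country"])
print(encoded)
```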
Feature Scaling
• Scaling data means transforming it so that the values fit within some range or
scale, such as 0–100 or 0–1.
• Imagine you have an image represented as a set of RGB values ranging from 0 to
255. We can scale the range of the values from 0–255 down to a range of 0–1.
• This scaling generally does not affect the algorithm's output, since every value is scaled in the same way.
• But it can speed up the training process, because now the algorithm only needs to
handle numbers less than or equal to 1.
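A minimal sketch of the 0-255 to 0-1 example with scikit-learn's MinMaxScaler (the pixel values are illustrative):

```python
# Feature scaling sketch: min-max scaling of 0-255 pixel values to 0-1.
import numpy as np
from sklearn.preprocessing import MinMaxScaler

pixels = np.array([[0.0], [64.0], [128.0], [255.0]])  # illustrative RGB values

scaled = MinMaxScaler(feature_range=(0, 1)).fit_transform(pixels)
print(scaled.ravel())  # -> 0.0, ~0.251, ~0.502, 1.0
```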
Example Dataset
Machine Learning with R
Dataset Resources
Open Data Resources
Technologies
Tools for Data Science
Applications
Image Processing
Banking and Finance
Sports
Digital Advertisements
Health Care
Speech Recognition
Internet Search
Recommender System
Gaming
Augmented Reality
Self-Driving Cars
Robots
Questions & Answers Session