MACHINE LEARNING
A JOURNEY FROM DATA TO DECISIONS
DEPARTMENT OF COMPUTER SCIENCE
Unsupervised Learning
Types of Unsupervised Learning Algorithms
Clustering: Clustering is a method of grouping objects into clusters such that objects with the most similarities remain in the same group and have few or no similarities with the objects of other groups.
Association: An association rule is an unsupervised learning method used to find relationships between variables in a large database.
K-Means Clustering
What is K-Means Clustering?
It is an iterative algorithm that divides an unlabeled dataset into k different clusters in such a way that each data point belongs to only one group of points with similar properties.
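A minimal sketch of k-means in Python, using scikit-learn's KMeans (the synthetic data, k = 3, and all parameter values are illustrative assumptions, not part of the original slides):

```python
# K-means sketch with scikit-learn (illustrative data and parameters).
import numpy as np
from sklearn.cluster import KMeans

# Synthetic 2-D points; in practice this would be your unlabeled dataset.
rng = np.random.default_rng(42)
X = np.vstack([
    rng.normal(loc=(0, 0), scale=0.5, size=(50, 2)),
    rng.normal(loc=(5, 5), scale=0.5, size=(50, 2)),
    rng.normal(loc=(0, 5), scale=0.5, size=(50, 2)),
])

# k = 3 is an assumption here; k must be chosen up front for k-means.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(kmeans.labels_[:10])        # cluster index for the first 10 points
print(kmeans.cluster_centers_)    # final centroids
```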
How does the K-Means Algorithm Work?
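The slides above illustrated the steps graphically; in outline, k-means repeats two steps until the centroids stop moving: assign each point to its nearest centroid, then recompute each centroid as the mean of its assigned points. A minimal from-scratch sketch (NumPy only; k, the iteration cap, and the no-empty-cluster assumption are illustrative):

```python
# From-scratch k-means sketch (assumes no cluster ever goes empty).
import numpy as np

def kmeans(X, k=3, n_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1: pick k random data points as the initial centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Step 2: assign every point to its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: move each centroid to the mean of its assigned points.
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        # Step 4: stop when the centroids no longer move.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids
```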
Final Clusters
Example K-Means Clustering
Types of Clustering Methods
▰Partitioning Clustering
▰Density-Based Clustering
▰Distribution Model-Based Clustering
▰Hierarchical Clustering
Partitioning Clustering
In this type, the dataset is divided into a set of k groups, where k defines the number of pre-defined groups.
The cluster centers are chosen so that the distance between the data points and the centroid of their own cluster is minimal compared with the distance to the centroids of other clusters.
Density-Based Clustering
This method connects highly dense areas into clusters, so arbitrarily shaped clusters can form as long as the dense regions can be connected.
The algorithm does this by identifying regions of high density in the dataset and linking them into clusters.
The dense areas in the data space are separated from each other by sparser areas.
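The slide does not name a specific algorithm; DBSCAN is the most common density-based method. A minimal sketch (the eps and min_samples values are illustrative assumptions):

```python
# Density-based clustering sketch with DBSCAN (illustrative parameters).
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

# Two interleaved half-moons: non-convex shapes that k-means handles poorly.
X, _ = make_moons(n_samples=200, noise=0.05, random_state=0)

# eps: neighborhood radius; min_samples: points needed to form a dense region.
db = DBSCAN(eps=0.2, min_samples=5).fit(X)
print(set(db.labels_))  # cluster ids; -1 marks points labeled as noise
```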
Distribution Model-Based Clustering
The data is divided based on the probability that each data point belongs to a particular distribution.
The grouping is done by assuming the data comes from a mixture of distributions, most commonly Gaussian distributions.
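A minimal sketch using a Gaussian mixture model, the usual choice for distribution-based clustering (the synthetic data and n_components = 2 are illustrative):

```python
# Distribution-model-based clustering sketch with a Gaussian mixture.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Two Gaussian blobs with different means (synthetic, illustrative).
X = np.vstack([rng.normal(0, 1, size=(100, 2)),
               rng.normal(6, 1, size=(100, 2))])

gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
labels = gmm.predict(X)        # hard assignment per point
probs = gmm.predict_proba(X)   # soft membership probabilities
print(probs[:3].round(3))
```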
Hierarchical Clustering
In this technique, the dataset is divided into clusters to create a tree-like structure, also called a dendrogram.
Any desired number of clusters can then be obtained by cutting the tree at the appropriate level.
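A minimal sketch with SciPy: build the dendrogram with Ward linkage, then "cut the tree" to obtain a chosen number of clusters (the data and the cut level are illustrative):

```python
# Hierarchical (agglomerative) clustering sketch with SciPy.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.5, size=(20, 2)),
               rng.normal(4, 0.5, size=(20, 2))])

# Build the dendrogram bottom-up with Ward linkage.
Z = linkage(X, method="ward")
# Cut the tree to obtain a chosen number of clusters (2 here).
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)
```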
Machine Learning Process
Association
Apriori Algorithm
Steps for Apriori Algorithm
▰Step-1: Determine the support of the itemsets in the transactional database, and select the minimum support and confidence.
▰Step-2: Keep all itemsets in the transactions whose support value is higher than the minimum (selected) support value.
▰Step-3: Find all the rules over these subsets that have a confidence value higher than the threshold (minimum confidence).
▰Step-4: Sort the rules in decreasing order of lift.
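The slides do not include code; as a minimal sketch, the four steps with the mlxtend library (assuming mlxtend is installed; the transactions, min_support, and min_threshold values are illustrative):

```python
# Apriori sketch with mlxtend (illustrative transactions and thresholds).
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

transactions = [["A", "B", "C"], ["A", "B"], ["A", "C"],
                ["B", "C"], ["A", "B", "C"]]

# One-hot encode the transactions into a boolean DataFrame.
te = TransactionEncoder()
df = pd.DataFrame(te.fit(transactions).transform(transactions),
                  columns=te.columns_)

# Steps 1-2: frequent itemsets above the minimum support.
frequent = apriori(df, min_support=0.4, use_colnames=True)
# Step 3: rules above the minimum confidence.
rules = association_rules(frequent, metric="confidence", min_threshold=0.5)
# Step 4: sort by lift, descending.
print(rules.sort_values("lift", ascending=False)[
    ["antecedents", "consequents", "confidence", "lift"]])
```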
Apriori Algorithm Working
Suppose we have the following dataset containing various transactions. From this dataset, we need to find the frequent itemsets and generate the association rules using the Apriori algorithm.
Step-1: Calculating C1 and L1
Candidate itemset C1 and frequent itemset L1
Step-2: Candidate Generation C2 and L2
Candidate itemset C2 and frequent itemset L2
Step-3: Candidate Generation C3 and L3
Candidate itemset C3
As we can see from the C3 table above, there is only one itemset combination whose support count meets the minimum support count.
So, L3 will have only one combination, i.e., {A, B, C}.
Step-4: Finding the association rules for the subsets

Rule       Support   Confidence
A^B → C    2         sup((A^B)^C) / sup(A^B) = 2/4 = 0.50 = 50%
B^C → A    2         sup((B^C)^A) / sup(B^C) = 2/4 = 0.50 = 50%
A^C → B    2         sup((A^C)^B) / sup(A^C) = 2/4 = 0.50 = 50%
C → A^B    2         sup(C^(A^B)) / sup(C)   = 2/5 = 0.40 = 40%
A → B^C    2         sup(A^(B^C)) / sup(A)   = 2/6 = 0.33 = 33.33%
B → A^C    2         sup(B^(A^C)) / sup(B)   = 2/7 = 0.29 = 28.57%

As the given threshold (minimum confidence) is 50%, the first three rules, A^B → C, B^C → A, and A^C → B, can be considered strong association rules for the given problem.
Splitting the Dataset - Holdout
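The slide's figure is not reproduced here; as a minimal sketch, the holdout method with scikit-learn's train_test_split (the 80/20 split and the Iris data are illustrative choices):

```python
# Holdout split sketch: reserve part of the data for testing.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
print(len(X_train), len(X_test))  # 120 training rows, 30 test rows
```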
Stratified Sampling
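A minimal sketch of stratified sampling, again with train_test_split (passing stratify=y preserves the class proportions in both splits; the dataset and split size are illustrative):

```python
# Stratified sampling sketch: keep class proportions identical in both splits.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)
print(np.bincount(y_train), np.bincount(y_test))  # [40 40 40] [10 10 10]
```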
Underfitting and Overfitting
Bias vs Variance
• Bias is the error from overly simple assumptions: the systematic difference between the observed values and the model's predicted values. High bias leads to underfitting.
• Variance is the error from sensitivity to the particular training data: it shows up as a gap between performance on the training set and on the test set. High variance leads to overfitting.
Bias vs Variance
We generally want to minimize both bias and variance, i.e., build a model that not only fits the training data well but also generalizes well to test/validation data.
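One way to see the trade-off in code: compare training and test scores of decision trees of different depths (a hedged sketch; the dataset and depth values are illustrative assumptions). Low scores on both sets suggest high bias (underfitting); a high training score with a much lower test score suggests high variance (overfitting).

```python
# Bias vs variance sketch: vary model complexity via tree depth.
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

for depth in (1, 4, None):  # shallow = high bias; unlimited = high variance
    tree = DecisionTreeRegressor(max_depth=depth, random_state=0).fit(X_tr, y_tr)
    print(depth, round(tree.score(X_tr, y_tr), 2), round(tree.score(X_te, y_te), 2))
```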
Enrich the Dataset
Improve Model Efficiency – K-Fold Testing
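A minimal sketch of k-fold cross-validation with scikit-learn (k = 5, the Iris data, and logistic regression are illustrative choices): each of the k folds serves once as the validation set, and the scores are averaged.

```python
# K-fold cross-validation sketch (k = 5 is illustrative).
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# Train and evaluate on 5 different train/validation splits, then average.
scores = cross_val_score(model, X, y, cv=5)
print(scores, scores.mean())
```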
Model Selection
Anaconda Environment
Value Addition
Sample Dataset - Iris
Dataset Types
Facets of data
■ Structured
■ Unstructured
■ Natural language
■ Machine-generated
■ Graph-based
■ Audio, video, and images
■ Streaming
Data Preprocessing Techniques - Missing Data
Two ways to deal with missing data:
1. Delete the rows that contain missing values.
2. Replace each missing value with the mean of its column (imputation).
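A minimal pandas sketch of both options (the DataFrame values are illustrative):

```python
# Handling missing values with pandas (illustrative data).
import numpy as np
import pandas as pd

df = pd.DataFrame({"age": [25, np.nan, 31, 40],
                   "salary": [50_000, 60_000, np.nan, 80_000]})

dropped = df.dropna()           # option 1: delete rows with missing values
imputed = df.fillna(df.mean())  # option 2: impute with the column mean
print(dropped)
print(imputed)
```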
Encoding Categorical Data
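As a minimal sketch, one-hot encoding with pandas (the country column is an illustrative example; label encoding is the common alternative for ordinal categories):

```python
# Encoding categorical data sketch: one boolean column per category.
import pandas as pd

df = pd.DataFrame({"country": ["France", "Spain", "Germany", "Spain"]})

encoded = pd.get_dummies(df, columns=["country"])
print(encoded)
```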
Feature Scaling
• Scaling data means transforming it so that the values fit within some range or
scale, such as 0–100 or 0–1.
• Imagine you have an image represented as a set of RGB values ranging from 0 to
255. We can scale the range of the values from 0–255 down to a range of 0–1.
• This scaling generally does not affect the algorithm's output, since every value is scaled in the same way.
• But it can speed up the training process, because now the algorithm only needs to
handle numbers less than or equal to 1.
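A minimal sketch of the 0-255 to 0-1 example with scikit-learn's MinMaxScaler (the pixel values are illustrative):

```python
# Feature scaling sketch: min-max scaling of 0-255 pixel values to 0-1.
import numpy as np
from sklearn.preprocessing import MinMaxScaler

pixels = np.array([[0.0], [64.0], [128.0], [255.0]])  # illustrative RGB values

scaled = MinMaxScaler(feature_range=(0, 1)).fit_transform(pixels)
print(scaled.ravel())  # -> 0.0, ~0.251, ~0.502, 1.0
```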
Example Dataset
Machine Learning with R
Dataset Resources
Open Data Resources
Technologies
Tools for Data Science
Applications
Image Processing
Banking and Finance
Sports
Digital Advertisements
Health Care
Speech Recognition
Internet Search
Recommender System
Gaming
Augmented Reality
Self-Driving Cars
Robots
Questions & Answers Session