Data Mining Basics for Beginners

The document provides an overview of data mining, focusing on machine learning and classification techniques. It explains the types of learning, including supervised and unsupervised learning, and details the classification process, its applications, and various techniques used. Additionally, it discusses the advantages and disadvantages of classification in data mining.

Uploaded by

ssri62439


CLASSIFICATION: BASIC CONCEPTS IN DATA MINING

Presented by: JAYASRI.A, VAISHNAVI.E, FELIX RAJ.A, BALACHANDAR.R

Machine learning

• Machine learning is a subfield of artificial intelligence, which is broadly defined as the capability of a machine to imitate intelligent human behavior. Artificial intelligence systems are used to perform complex tasks in a way that is similar to how humans solve problems.
• Types of learning:
  o Supervised learning
  o Unsupervised learning
Supervised learning

o It is defined by its use of labeled datasets to train algorithms to classify data or predict outcomes accurately.
o Supervised learning uses a training set to teach models to yield the desired output.
o This training dataset includes inputs and correct outputs, which allow the model to learn over time.
o The algorithm measures its error through a loss function and adjusts its parameters until that error has been sufficiently minimized.
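
The last point can be made concrete with a minimal sketch. It assumes a model with a single weight, a mean squared error loss, and plain gradient descent; the data, function names, and learning rate are hypothetical and chosen only for illustration.

```python
# Hypothetical toy training set: inputs with their correct outputs.
inputs = [1.0, 2.0, 3.0, 4.0]
targets = [2.0, 4.0, 6.0, 8.0]  # generated by y = 2 * x


def mean_squared_error(weight, xs, ys):
    """Loss function: average squared gap between prediction and correct output."""
    return sum((weight * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def train(xs, ys, learning_rate=0.01, tolerance=1e-6, max_steps=10_000):
    """Adjust a single weight by gradient descent until the loss is small enough."""
    weight = 0.0
    for _ in range(max_steps):
        if mean_squared_error(weight, xs, ys) < tolerance:
            break
        # Gradient of the mean squared error with respect to the weight.
        gradient = sum(2 * (weight * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        weight -= learning_rate * gradient
    return weight


learned_weight = train(inputs, targets)
```

The loop stops as soon as the loss drops below `tolerance`, which is one simple way to read "sufficiently minimized".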
Unsupervised learning

o Unsupervised learning, also known as unsupervised machine learning, uses machine learning algorithms to analyze and cluster unlabeled datasets.
o These algorithms discover hidden patterns or data groupings without the need for human intervention.
o Its ability to discover similarities and differences in information makes it well suited to exploratory data analysis, cross-selling strategies, customer segmentation, and image recognition.
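
As an illustration of clustering unlabeled data, here is a small sketch of k-means on one-dimensional values. K-means is only one of several clustering algorithms and is not named in the slides; the customer spend figures and starting centers are hypothetical.

```python
def k_means_1d(values, centers, iterations=10):
    """Group unlabeled numbers around the nearest of the given starting centers."""
    centers = list(centers)
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for value in values:
            # Assign the value to the index of the closest center.
            nearest = min(range(len(centers)), key=lambda i: abs(value - centers[i]))
            clusters[nearest].append(value)
        # Move each center to the mean of its cluster (keep it if the cluster is empty).
        centers = [
            sum(cluster) / len(cluster) if cluster else centers[i]
            for i, cluster in enumerate(clusters)
        ]
    return centers, clusters


# Hypothetical unlabeled data: monthly spend of six customers.
spend = [12, 15, 14, 95, 102, 99]
centers, clusters = k_means_1d(spend, centers=[0, 50])
```

No labels are supplied anywhere: the two customer groups emerge from the distances between the values alone.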
Types of supervised learning

• Classification
  o Classification predicts a discrete response value: each item is assigned to one of a fixed set of “classes”, and the model aims to reproduce the class assignments seen in the training data.
• Regression
  o Regression deals with continuous data (value functions): the predicted output values are real numbers. Typical problems include predicting the price of a house or the trend in a stock price at a given time.
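
The difference in output type can be sketched as follows. The house data, the least-squares fit, and the hand-written size threshold are all hypothetical; the threshold rule stands in for a trained classifier only to show that its output is a class rather than a number.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one input: returns (slope, intercept)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    return slope, mean_y - slope * mean_x


# Hypothetical data: house area (square metres) and price (thousands).
areas = [50, 80, 100, 120]
prices = [150, 240, 300, 360]

slope, intercept = fit_line(areas, prices)


def predict_price(area):
    """Regression: the output is a real number."""
    return slope * area + intercept


def predict_size_class(area, threshold=90):
    """Classification: the output is one of a fixed set of classes."""
    return "large" if area >= threshold else "small"
```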
Data analytics

• Data analytics focuses on processing and performing statistical analysis of existing datasets. Analysts concentrate on creating methods to capture, process, and organize data to uncover actionable insights for current problems, and establishing the best way to present this data.
• Types:
  o Descriptive
  o Diagnostic
  o Predictive
  o Prescriptive
CLASSIFICATION

• Classification is a data mining function that assigns items in a collection to target categories or classes. The goal of classification is to accurately predict the target class for each case in the data.
• Classification is a widely used technique in data mining and is applied in a variety of domains, such as email filtering, sentiment analysis, and medical diagnosis.
• Classification is the problem of identifying to which of a set of categories (subpopulations) a new observation belongs, on the basis of a training set of observations whose category membership is known.
Examples of Classification

• Classifying credit card transactions as legitimate or fraudulent
• Classifying secondary structures of protein as alpha-helix, beta-sheet, or random coil
• Categorizing news stories as finance, weather, entertainment, sports, etc.
Example: Raw mango vs. ripe mango

Feature-1: Weight (plot axis: Weight)
Feature-2: Weight and color (plot axes: Color and Weight)

Color          Weight   Label
[22-140-10]    300      Raw
[10-240-10]    330      Ripe
[12-235-10]    310      Ripe
[250-130-10]   307      Raw
[80-220-20]    333      Ripe
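
One way to turn the mango data into a classifier is a 1-nearest-neighbor lookup, sketched below. The slides do not say which algorithm was used, so this choice is an assumption, as is treating the color triple as three plain numbers and writing the labels as Raw and Ripe. The two query mangoes are hypothetical.

```python
import math

# Rows copied from the mango data: (color triple, weight, label).
# Treating the color triple as three plain numbers is an assumption.
mangoes = [
    ((22, 140, 10), 300, "Raw"),
    ((10, 240, 10), 330, "Ripe"),
    ((12, 235, 10), 310, "Ripe"),
    ((250, 130, 10), 307, "Raw"),
    ((80, 220, 20), 333, "Ripe"),
]


def distance(color_a, weight_a, color_b, weight_b):
    """Euclidean distance over the three color numbers and the weight."""
    return math.dist((*color_a, weight_a), (*color_b, weight_b))


def classify_mango(color, weight):
    """Return the label of the closest known mango (1-nearest-neighbor)."""
    nearest = min(mangoes, key=lambda row: distance(color, weight, row[0], row[1]))
    return nearest[2]


# Hypothetical new mangoes, not taken from the slides.
label_a = classify_mango((15, 230, 10), 320)
label_b = classify_mango((240, 135, 10), 305)
```

With only five labeled rows the result is fragile; in practice the features would usually be scaled so that no single number dominates the distance.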
RAW DATASET CLASSIFICATION

Raw dataset → Features/Label → Learning Algorithm → Model
Classification Techniques

• Decision tree based methods
• Rule-based methods
• Memory-based reasoning
• Neural networks
• Naive Bayes and Bayesian networks
• Support vector machines
• Linear regression (strictly a regression method; logistic regression is the usual choice for classification)
Steps of classification

1. Learning step (training phase): construction of the classification model. Different algorithms are used to build a classifier by making the model learn from the available training set. The model has to be trained before it can predict accurate results.
2. Classification step: the model is used to predict class labels. The constructed model is tested on test data to estimate the accuracy of the classification rules.
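
The two steps can be sketched with a very small classifier. The example assumes a one-feature threshold rule (a "decision stump"), which is a stand-in and not a method from the slides; the training and test values are hypothetical.

```python
def train_stump(samples):
    """Learning step: pick the threshold that classifies the most training samples correctly."""
    best_threshold, best_correct = None, -1
    for threshold, _ in samples:
        correct = sum((value >= threshold) == label for value, label in samples)
        if correct > best_correct:
            best_threshold, best_correct = threshold, correct
    return best_threshold


def classify(threshold, value):
    """Classification step: predict the class label of one value."""
    return value >= threshold


def accuracy(threshold, samples):
    """Fraction of samples whose predicted label matches the known label."""
    hits = sum(classify(threshold, value) == label for value, label in samples)
    return hits / len(samples)


# Hypothetical labeled data: (feature value, class label).
training_set = [(1.0, False), (2.0, False), (3.0, False),
                (6.0, True), (7.0, True), (8.0, True)]
test_set = [(1.5, False), (2.5, False), (6.5, True), (9.0, True)]

model = train_stump(training_set)          # step 1: learn from the training set
test_accuracy = accuracy(model, test_set)  # step 2: estimate accuracy on test data
```

The test set is never shown to `train_stump`, so `test_accuracy` estimates how the learned rule behaves on data it has not seen.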
Training and Testing

• The goal is to produce a trained (fitted) model that generalizes well to new, unknown data. The fitted model is evaluated using “new” examples from the held-out datasets (validation and test datasets) to estimate the model's accuracy in classifying new data.
• Example: Suppose a person is sitting under a fan and the fan starts to fall on him. He should move aside in order not to get hurt; this is the training part, where he learns to move away. In testing, the person sees some other heavy object coming towards him or falling on him. If he moves aside, the system has tested positively; if he does not, the system has tested negatively.
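
Holding out validation and test data can be sketched as a shuffled split. The function name, the 20% fractions, and the fixed seed are assumptions made for illustration, not values from the slides.

```python
import random


def hold_out_split(examples, validation_fraction=0.2, test_fraction=0.2, seed=0):
    """Shuffle the examples and split them into training, validation, and test sets."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_test = int(len(shuffled) * test_fraction)
    n_validation = int(len(shuffled) * validation_fraction)
    test_set = shuffled[:n_test]
    validation_set = shuffled[n_test:n_test + n_validation]
    training_set = shuffled[n_test + n_validation:]
    return training_set, validation_set, test_set


# Hypothetical dataset of ten examples, identified here only by number.
examples = list(range(10))
training_set, validation_set, test_set = hold_out_split(examples)
```

Each example lands in exactly one of the three sets, so the model is never evaluated on data it was trained on.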
ADVANTAGES

• Mining-based methods are cost-effective and efficient
• Helps in identifying criminal suspects
• Helps in predicting the risk of diseases
• Helps banks and financial institutions identify likely defaulters so that they can decide whether to approve cards, loans, etc.
DISADVANTAGES

• Privacy: There is a chance that a company may give information about its customers to other vendors or use this information for its own profit.
• Accuracy problem: An appropriate model must be selected in order to get the best accuracy and results.
THANK YOU
