CLASSIFICATION
BASIC CONCEPTS
IN
DATA MINING
PRESENTED BY
JAYASRI.A
VAISHNAVI.E
FELIX RAJ.A
BALACHANDAR.R
Machine learning
Machine learning is a subfield of artificial intelligence, which is broadly
defined as the capability of a machine to imitate intelligent human
behavior. Artificial intelligence systems are used to perform complex
tasks in a way that is similar to how humans solve problems.
TYPES OF LEARNING:
SUPERVISED LEARNING
UNSUPERVISED LEARNING
Supervised learning
o It is defined by its use of labeled datasets to train algorithms to
classify data or predict outcomes accurately.
o Supervised learning uses a training set to teach models to yield the
desired output.
o This training dataset includes inputs and correct outputs, which allow
the model to learn over time.
o The algorithm measures its accuracy through the loss function, adjusting
until the error has been sufficiently minimized.
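The loss-driven adjustment described above can be sketched in a few lines. This is only an illustration, not a method taken from the slides: the tiny labeled dataset, the logistic model, the learning rate, and the number of epochs are all assumptions chosen to keep the example small.

```python
import math

# Hypothetical labeled training set: (input, correct output) pairs.
training_set = [(-3.0, 0), (-2.0, 0), (-1.0, 0), (1.0, 1), (2.0, 1), (3.0, 1)]

def predict_probability(w, b, x):
    """Logistic model: estimated probability that x belongs to class 1."""
    return 1.0 / (1.0 + math.exp(-(w * x + b)))

def log_loss(w, b, data):
    """Average cross-entropy loss over the data (lower is better)."""
    eps = 1e-12  # avoids log(0)
    total = 0.0
    for x, y in data:
        p = predict_probability(w, b, x)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(data)

def train(data, learning_rate=0.1, epochs=1000):
    """Adjust w and b by gradient descent, recording the loss after each step."""
    w, b = 0.0, 0.0
    history = [log_loss(w, b, data)]
    for _ in range(epochs):
        grad_w = sum((predict_probability(w, b, x) - y) * x for x, y in data) / len(data)
        grad_b = sum(predict_probability(w, b, x) - y for x, y in data) / len(data)
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
        history.append(log_loss(w, b, data))
    return w, b, history

w, b, history = train(training_set)
predictions = [1 if predict_probability(w, b, x) >= 0.5 else 0 for x, _ in training_set]
print("loss before:", history[0], "loss after:", history[-1])
print("predictions:", predictions)
```

The recorded loss history shows the error shrinking as the parameters are adjusted, which is the "learn over time" behavior the bullets describe.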
Unsupervised learning
o Unsupervised learning, also known as unsupervised machine
learning, uses machine learning algorithms to analyze and cluster
unlabeled datasets.
o These algorithms discover hidden patterns or data groupings without
the need for human intervention.
o Its ability to discover similarities and differences in information makes it
well suited to exploratory data analysis, cross-selling strategies,
customer segmentation, and image recognition.
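As a rough illustration of clustering unlabeled data, here is a minimal k-means sketch. K-means is one common clustering algorithm and is an assumption here, since the slides do not name one; the customer data and the choice of starting centroids are hypothetical.

```python
# Hypothetical unlabeled data: (spend score, visits per month) per customer.
customers = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]

def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))

def k_means(points, initial_centroids, max_iterations=100):
    centroids = list(initial_centroids)
    assignments = []
    for _ in range(max_iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        new_assignments = [
            min(range(len(centroids)), key=lambda k: squared_distance(p, centroids[k]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # no point changed cluster, so the grouping is stable
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == k]
            if members:
                centroids[k] = mean_point(members)
    return assignments, centroids

# Starting from the first and fourth customers is an arbitrary choice.
assignments, centroids = k_means(customers, [customers[0], customers[3]])
print(assignments)
```

No labels are supplied at any point; the two customer groups emerge only from the distances between the points.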
Types of supervised learning
Classification
Classification is a technique that aims to
reproduce known class assignments. It predicts a
discrete response value, so the data is separated
into “classes”.
Regression
Regression deals with continuous data (value
functions). The predicted output values are real
numbers. It handles problems such as predicting
the price of a house or the trend in a stock price at
a given time.
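To make the contrast with classification concrete, the following sketch fits a straight line by ordinary least squares and predicts a house price as a real number. The sizes and prices are made-up values (deliberately lying on an exact line so the result is easy to check), and a single-feature linear model is an assumption for illustration.

```python
# Hypothetical training data: house size in square metres and price in thousands.
sizes = [50.0, 80.0, 100.0, 120.0]
prices = [120.0, 180.0, 220.0, 260.0]

def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = fit_line(sizes, prices)

# The prediction is a continuous value, not a class label.
predicted_price = slope * 90.0 + intercept
print(slope, intercept, predicted_price)
```

A classifier asked about the same house would instead return a category such as "cheap" or "expensive"; the regression model returns a number on a continuous scale.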
Data analytics
Data analytics focuses on processing and performing statistical analysis
of existing datasets. Analysts concentrate on creating methods to
capture, process, and organize data to uncover actionable insights for
current problems, and establishing the best way to present this data.
TYPES:
Descriptive
Diagnostic
Predictive
Prescriptive.
CLASSIFICATION
Classification is a data mining function that assigns items in a collection
to target categories or classes. The goal of classification is to accurately
predict the target class for each case in the data.
Classification is a widely used technique in data mining and is applied in
a variety of domains, such as email filtering, sentiment analysis, and
medical diagnosis.
Classification is the problem of identifying which of a set of categories
(subpopulations) a new observation belongs to, on the basis of a
training set of observations whose category
membership is known.
Examples of Classification
Classifying credit card transactions
as legitimate or fraudulent
Classifying secondary structures of protein
as alpha-helix, beta-sheet, or random
coil
Categorizing news stories as finance,
weather, entertainment, sports, etc.
Example:
Raw mango and ripe mango
Feature-1: Weight
Feature-2: Weight and color

Color          Weight   Label
[22-140-10]    300      Raw
[10-240-10]    330      Ripen
[12-235-10]    310      Ripen
[250-130-10]   307      Raw
[80-220-20]    333      Ripen

[Figure: plot of the mangoes, axes: Color and Weight]
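The mango table above can be used directly as a tiny training set. The sketch below classifies a new mango by copying the label of its nearest neighbor, a simple form of memory-based reasoning. Reading each color entry as an (R, G, B) triple, using unscaled Euclidean distance, and the two query mangoes are all assumptions made for illustration.

```python
import math

# Rows from the mango table: ((R, G, B) color, weight, label).
mangoes = [
    ((22, 140, 10), 300, "Raw"),
    ((10, 240, 10), 330, "Ripen"),
    ((12, 235, 10), 310, "Ripen"),
    ((250, 130, 10), 307, "Raw"),
    ((80, 220, 20), 333, "Ripen"),
]

def to_vector(color, weight):
    """Combine both features into one vector: (R, G, B, weight)."""
    return (*color, weight)

def classify_mango(color, weight):
    """Return the label of the closest mango in the table (1-nearest neighbor)."""
    query = to_vector(color, weight)
    nearest = min(mangoes, key=lambda row: math.dist(query, to_vector(row[0], row[1])))
    return nearest[2]

# Two hypothetical new mangoes.
print(classify_mango((15, 230, 12), 320))
print(classify_mango((30, 150, 10), 305))
```

Because the features are not rescaled, whichever feature has the larger numeric spread dominates the distance; a real application would normally normalize the features first.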
RAW DATASET CLASSIFICATION
Raw dataset -> Features/Label -> Learning Algorithm -> Model
Classification Techniques
Decision Tree Based Method
Rule-based method
Memory-based reasoning
Neural networks
Naive Bayes and Bayesian Networks
Support Vector Machines
Linear Regression
Steps of classification
1. Learning Step (Training Phase): Construction of the classification model.
Different algorithms are used to build a classifier by making the model
learn from the available training set. The model has to be trained to
predict accurate results.
2. Classification Step: The model is used to predict class labels. The
constructed model is tested on test data to estimate the accuracy of the
classification rules.
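The two steps can be sketched with a very small classifier. The example below uses a decision stump (a one-level decision tree) on transaction amounts, echoing the credit card example earlier in the deck; the amounts, the labels, and the choice of a stump are hypothetical and only meant to show the learning step and the classification step side by side.

```python
from itertools import permutations

# Hypothetical labeled data: (transaction amount, label).
training_set = [
    (20, "legitimate"), (35, "legitimate"), (50, "legitimate"), (60, "legitimate"),
    (700, "fraudulent"), (900, "fraudulent"), (1200, "fraudulent"),
]
test_set = [
    (45, "legitimate"), (1000, "fraudulent"), (500, "legitimate"), (800, "fraudulent"),
]

def learn_stump(training_set):
    """Learning step: pick the threshold and label order with the fewest training errors."""
    values = sorted({amount for amount, _ in training_set})
    # Candidate thresholds are midpoints between consecutive distinct amounts.
    thresholds = [(a + b) / 2 for a, b in zip(values, values[1:])]
    labels = sorted({label for _, label in training_set})
    best = None
    for threshold in thresholds:
        for low_label, high_label in permutations(labels, 2):
            errors = sum(
                1 for amount, label in training_set
                if (high_label if amount > threshold else low_label) != label
            )
            if best is None or errors < best[0]:
                best = (errors, threshold, low_label, high_label)
    return best[1:]

def classify_transaction(model, amount):
    """Classification step: apply the learned rule to one amount."""
    threshold, low_label, high_label = model
    return high_label if amount > threshold else low_label

model = learn_stump(training_set)
correct = sum(1 for amount, label in test_set if classify_transaction(model, amount) == label)
accuracy = correct / len(test_set)
print(model, accuracy)
```

The model fits the training set perfectly but misclassifies one of the four test transactions, which is why accuracy is estimated on data the model has not seen.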
Training and Testing
The goal is to produce a trained (fitted) model that generalizes well to
new, unknown data. The fitted model is evaluated using “new”
examples from the held-out datasets (validation and test datasets) to
estimate the model's accuracy in classifying new data.
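One way to hold data out is sketched below: shuffle the examples, set aside validation and test portions, fit only on the remainder, and measure accuracy on the held-out parts. The 60/20/20 proportions, the fixed seed, the synthetic data, and the nearest-class-mean model are assumptions for illustration, not details from the slides.

```python
import random

def split_dataset(examples, validation_fraction=0.2, test_fraction=0.2, seed=0):
    """Shuffle a copy of the examples and cut it into train/validation/test parts."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    n_validation = int(len(shuffled) * validation_fraction)
    test = shuffled[:n_test]
    validation = shuffled[n_test:n_test + n_validation]
    train = shuffled[n_test + n_validation:]
    return train, validation, test

def fit_class_means(train):
    """Fit step: remember the average feature value of each class."""
    sums, counts = {}, {}
    for x, label in train:
        sums[label] = sums.get(label, 0.0) + x
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}

def predict(class_means, x):
    """Predict the class whose mean is closest to x."""
    return min(class_means, key=lambda label: abs(x - class_means[label]))

def accuracy(class_means, examples):
    correct = sum(1 for x, label in examples if predict(class_means, x) == label)
    return correct / len(examples)

# Synthetic examples: one numeric feature with a "low"/"high" label.
examples = [(x, "high" if x >= 50 else "low") for x in range(100)]
train, validation, test = split_dataset(examples)

fitted = fit_class_means(train)  # the model only ever sees the training part
validation_accuracy = accuracy(fitted, validation)
test_accuracy = accuracy(fitted, test)
print(len(train), len(validation), len(test), validation_accuracy, test_accuracy)
```

The validation score is typically used while choosing between models, and the test score is kept for a final estimate of how the chosen model handles new data.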
Example:
Suppose a person is sitting under a fan and the fan starts to fall on
him. He should move aside to avoid getting hurt; this is the training
part, where he learns to move away. During testing, if the person sees a heavy
object coming towards him or falling on him and moves aside, the
system has tested positively; if he does not move aside, the
system has tested negatively.
ADVANTAGES
• Mining-based methods are cost-effective and efficient
• Helps in identifying criminal suspects
• Helps in predicting the risk of diseases
• Helps banks and financial institutions identify defaulters so
that they can decide whether to approve cards, loans, etc.
DISADVANTAGES
Privacy: There is a chance that a company may give
some information about its customers to other vendors or use this
information for its own profit.
Accuracy Problem: An appropriate model must be selected in order
to get the best accuracy and results.
THANK YOU