ICS322 Machine
Learning
Lect4- Classification-intro
DR. Athira B
Asst. Professor, CSE
IIIT Kottayam
Classification – Basic Concepts
Classification is a form of data analysis that extracts models
describing important data classes. Such models, called
classifiers, predict categorical (discrete, unordered) class labels.
For example, we can build a classification model to categorize
bank loan applications as either safe or risky.
Classification has numerous applications, including fraud
detection, target marketing, performance prediction,
manufacturing, and medical diagnosis.
General approach to classification is a two-step process.
In the first step, we build a classification model based on
previous data.
In the second step, we determine if the model’s accuracy is
acceptable, and if so, we use the model to classify new data
What Is Classification?
A bank loans officer needs analysis of her data to learn
which loan applicants are “safe” and which are “risky”
for the bank
the data analysis task is classification, where a model
or classifier is constructed to predict class (categorical)
labels, such as “safe” or “risky” for the loan application
data;
These categories can be represented by discrete
values, where the ordering among values has no
meaning.
Suppose that the marketing manager wants to predict
how much a given customer will spend during a sale.
This data analysis task is an example of numeric
prediction, where the model constructed predicts a
continuous-valued function, or ordered value, as
opposed to a class label.
This model is a predictor. Also called Regression
analysis
Classification and numeric prediction are the two major
types of prediction problems.
What is a linear classifier?
• Linear classifier
• A classifier is a supervised machine learning algorithm used to solve
classification problems. Linear classifiers are the simplest ones that
are made by linear functions for classifying observations into different
categories.
What is Supervised Machine
Learning?
• Supervised Machine Learning is where you have input variables (x) and an output
variable (Y) and you use an algorithm to learn the mapping function from the input to
the output Y = f(X).
• The goal is to approximate the mapping function so well that when you have new input
data (x) you can predict the output variables (Y) for that data.
• Supervised learning problems can be further grouped
into Regression and Classification problems.
• Regression: Regression algorithms are used to predict a continuous numerical output.
For example, a regression algorithm could be used to predict the price of a house based
on its size, location, and other features.
• Classification: Classification algorithms are used to predict a categorical output. For
example, a classification algorithm could be used to predict whether an email is spam
or not
Classification Types
• There are two main classification types in machine learning:
• Binary Classification
• In binary classification, the goal is to classify the input into one of two
classes or categories. Example – On the basis of the given health conditions
of a person, we have to determine whether the person has a certain
disease or not.
• Multiclass Classification
• In multi-class classification, the goal is to classify the input into one of
several classes or categories. For Example – On the basis of data about
different species of flowers, we have to determine which specie our
observation belongs to.
Classification Types
• Binary Classification
• In binary classification, the goal is to classify the input into one of two
classes or categories. Example – On the basis of the given health
conditions of a person, we have to determine whether the person has
a certain disease or not.
• Multiclass Classification
• In multi-class classification, the goal is to classify the input into one of
several classes or categories. For Example – On the basis of data
about different species of flowers, we have to determine which specie
our observation belongs to.
• Multi-Label Classification
• In, Multi-label Classification the goal is to predict which of several
labels a new data point belongs to. This is different from multiclass
classification, where each data point can only belong to one class. For
example, a multi-label classification algorithm could be used to
classify images of animals as belonging to one or more of the
categories cat, dog, bird, or fish.
Classification Algorithms
• Linear Classifiers
• Linear models create a linear decision boundary between classes.
They are simple and computationally efficient. Some of the
linear classification models are as follows:
• Logistic Regression
• Support Vector Machines having kernel = ‘linear’
• Single-layer Perceptron
• Stochastic Gradient Descent (SGD) Classifier
Non-linear Classifiers
• Non-linear models create a non-linear decision boundary between classes. They can capture more
complex relationships between the input features and the target variable. Some of the non-
linear classification models are as follows:
• K-Nearest Neighbours
• Kernel SVM
• Naive Bayes
• Decision Tree Classification
• Ensemble learning classifiers:
• Random Forests,
• AdaBoost,
• Bagging Classifier,
• Voting Classifier,
• ExtraTrees Classifier
• Multi-layer Artificial Neural Networks
How does Classification Machine Learning
Work?
“What about classification
accuracy?”
the model is used for classification: the predictive accuracy
of the classifier is estimated.
If we were to use the training set to measure the classifier’s
accuracy, this estimate would likely be optimistic, because
the classifier tends to overfit the data (i.e., during learning
it may incorporate some particular anomalies of the training
data that are not present in the general data set overall).
Therefore, a test set is used, made up of test tuples and
their associated class labels.
They are independent of the training tuples, meaning that
they were not used to construct the classifier.
The accuracy of a classifier on a given test set is the
percentage of test set tuples that are correctly classified
by the classifier.
The associated class label of each test tuple is
compared with the learned classifier’s class prediction
for that tuple.
If the accuracy of the classifier is considered
acceptable, the classifier can be used to classify future
data tuples for which the class label is not known
For example, the classification rules learned in Figure
from the analysis of data from previous loan
applications can be used to approve or reject new or