
Murang’a University of Technology

Innovation for Prosperity


Lecture 2

Supervised Learning
- Classification
Elements of a Learning Task
Three key elements define a learning task in machine learning; together, they frame the task's scope, input, and evaluation:

1. Task (T)
• This defines what the machine learning model is expected to accomplish.
• Examples include classification, regression, clustering, or reinforcement learning
tasks.

2. Experience (E)
• This refers to the data or interaction the model uses to learn.
• For supervised learning, the experience involves labeled datasets with input-
output pairs. In unsupervised learning, the experience comes from unlabeled data
patterns. Reinforcement learning draws experience from agent-environment
interactions and feedback (rewards).

Elements of a Learning Task
3. Performance Measure (P)
– This quantifies how well the model is achieving the task.
– Common metrics include accuracy, precision, recall, F1-score for
classification, mean squared error (MSE) for regression, or cumulative
reward for reinforcement learning.
– The performance measure evaluates the model's output on unseen
test data to ensure it generalizes well.

Introduction to Supervised Learning

• Supervised learning is a type of machine learning where the model learns from labeled data.
• The goal is to map the input to the output and predict the labels of unseen data accurately.
• Supervised learning presents two types of problems: classification and regression.
How It Works:
1. Input Data: Contains both features (independent variables) and labels
(dependent variables).
2. Learning Phase: The model identifies patterns in the data that map
inputs to outputs.
3. Prediction Phase: For new data, the model predicts the label using the
learned patterns.
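
As a minimal sketch of these three phases (assuming Python with scikit-learn; the Iris dataset and the decision tree classifier are illustrative choices, and any classifier covered later in this lecture would work equally well):

# Input data: features X and labels y; learning phase: fit; prediction phase: predict.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)                       # features and labels
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

model = DecisionTreeClassifier().fit(X_train, y_train)  # learning phase
y_pred = model.predict(X_test)                          # prediction phase on unseen data
print(model.score(X_test, y_test))                      # fraction predicted correctly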
Introduction to Classification
• Classification is a supervised learning task where the model assigns
a category or label to an input based on its features.
• It deals with discrete outputs, such as "yes" or "no," "cat" or "dog,"
or multiple classes like "setosa," "versicolor," and "virginica" in the
Iris dataset.

Key Terms in Classification:


• Classes: Categories or labels (e.g., spam/not spam).
• Features: Attributes used to classify the input (e.g., word
frequencies in an email).
• Decision Boundary: The boundary that separates different classes
in the feature space.

Types of Classification Tasks
1. Binary Classification:
• Two possible classes (e.g., spam vs. not spam).

2. Multiclass Classification:
• More than two classes (e.g., classifying images as "cat," "dog," or
"bird").

3. Multilabel Classification:
• Each instance can belong to multiple classes simultaneously (e.g.,
tagging a movie with genres like "action," "comedy," and "thriller").

Types of Classification Algorithms
1. k-Nearest Neighbors (k-NN)
• The k-NN algorithm is one of the simplest yet most powerful classification algorithms. It classifies data points based on the majority class among their nearest neighbors.

How It Works:
• Compute the distance (e.g., Euclidean, Manhattan) between the input data point
and all other points in the training set.
• Identify the k nearest neighbors to the data point.
• Assign the class that is most common among these k neighbors.
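To make these three steps concrete, here is a from-scratch sketch in Python (illustrative and unoptimized; the knn_predict helper and the toy data are invented for this example):

# k-NN from scratch: classify one query point by majority vote among its
# k nearest training points, using Euclidean distance.
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_query, k=3):
    # 1. Compute the distance from the query to every training point
    distances = np.linalg.norm(X_train - x_query, axis=1)
    # 2. Identify the k nearest neighbors
    nearest = np.argsort(distances)[:k]
    # 3. Assign the most common class among those neighbors
    return Counter(y_train[nearest]).most_common(1)[0][0]

X_train = np.array([[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.2, 4.8]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([1.1, 0.9]), k=3))  # -> 0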
Strengths:
• Simple to implement and understand.
• Works well with small datasets and non-linear decision boundaries.
Weaknesses:
• Computationally expensive for large datasets.
• Sensitive to irrelevant or redundant features.

k-Nearest Neighbors (k-NN)

[Figure: illustration of k-NN classification]
2. Naïve Bayes
• Naïve Bayes is based on Bayes' Theorem, which calculates the probability of a class
given a set of features.
• It assumes that the features are conditionally independent of each other, which
may not always be true in practice but works surprisingly well for many problems.
How It Works:
• For each class, calculate the likelihood of the data belonging to that class
based on the feature probabilities.
• Multiply these probabilities and apply Bayes’ Theorem to calculate the
posterior probability.
• Assign the class with the highest posterior probability.
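As a minimal sketch (assuming scikit-learn; GaussianNB, which models each feature's likelihood with a Gaussian, is one common variant of Naïve Bayes):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

nb = GaussianNB().fit(X_train, y_train)  # estimate per-class feature likelihoods
print(nb.predict_proba(X_test[:3]))      # posterior probability for each class
print(nb.predict(X_test[:3]))            # class with the highest posterior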
Strengths:
• Extremely fast and efficient for high-dimensional data.
• Performs well on text classification problems.
Weaknesses:
• Assumes independence among features, which may not always hold true.
• Performs poorly if features are highly correlated.
Naïve Bayes

[Figure: illustration of Naïve Bayes classification]
3. Support Vector Machines (SVM)

• SVM is a robust and versatile classification algorithm that works by finding the
hyperplane that best separates the data points of different classes.
• Depending on the type of data, there are two types of Support Vector
Machines:

Linear SVM (Simple SVM):
• Used for data that is linearly separable. A dataset is linearly separable if its two classes can be divided by a single straight line.

Nonlinear SVM (Kernel SVM):
• Used to classify data that is not linearly separable, i.e., data that cannot be divided by a straight line. It offers more flexibility for nonlinear data because kernel functions add features, allowing a separating hyperplane to be fitted in a higher-dimensional space rather than the original two-dimensional space.

Support Vector Machines (SVM)
How SVM Works
• Separate Classes: SVM finds the best hyperplane that divides
data into distinct classes.
• Maximize Margin: It ensures the margin (distance) between the
hyperplane and the nearest data points (support vectors) is as
large as possible.
• Kernel Trick: For non-linear data, SVM transforms the data into a
higher-dimensional space using kernel functions to make it
separable.
• Support Vectors: The data points closest to the hyperplane are
called support vectors, which define the decision boundary.
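A minimal sketch contrasting a linear SVM with a kernel SVM (assuming scikit-learn; the Iris dataset is an arbitrary illustrative choice):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

linear_svm = SVC(kernel="linear").fit(X_train, y_train)  # straight-line boundary
kernel_svm = SVC(kernel="rbf").fit(X_train, y_train)     # kernel trick for nonlinear data

print(linear_svm.score(X_test, y_test))
print(kernel_svm.score(X_test, y_test))
print(linear_svm.support_vectors_.shape)  # the support vectors defining the boundary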

Support Vector Machines (SVM)

[Figure: SVM illustration]
4. Decision Trees
• Decision trees use a tree-like structure where internal nodes represent feature-
based decisions, and leaf nodes represent class labels.

How It Works:
• At each node, the algorithm selects the feature that best splits the data into
pure subsets (e.g., using metrics like Gini impurity or information gain).
• This process continues recursively until the subsets are pure or a stopping criterion (e.g., maximum depth) is met.
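A minimal sketch (assuming scikit-learn; Gini impurity is the library's default split criterion, and the depth limit is an arbitrary choice to curb overfitting):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

tree = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=42)
tree.fit(X_train, y_train)
print(tree.score(X_test, y_test))
print(export_text(tree))  # the learned feature-based decision rules, as text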
Strengths:
• Easy to interpret and visualize.
• Handles both numerical and categorical data.
Weaknesses:
• Prone to overfitting, especially for deep trees.
• Sensitive to small changes in data.

Decision Trees

[Figure: decision tree illustration]
5. Random Forests
• Random Forests address the limitations of decision trees by creating an ensemble
of trees and averaging their predictions.
How It Works:
• Generates multiple decision trees using bootstrap samples of the training
data.
• At each split, a random subset of features is considered to ensure diversity
among the trees.
• Final prediction is made by majority voting or averaging.
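A minimal sketch (assuming scikit-learn; 100 trees and max_features="sqrt", i.e., a random feature subset at each split, are illustrative settings):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Each tree is trained on a bootstrap sample; the final class is a majority vote
forest = RandomForestClassifier(
    n_estimators=100, max_features="sqrt", random_state=42)
forest.fit(X_train, y_train)
print(forest.score(X_test, y_test))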
Strengths:
• Reduces overfitting compared to a single decision tree.
• Robust to noisy data and outliers.
Weaknesses:
• Can be computationally expensive for large datasets.
• Less interpretable than a single decision tree.

Random Forests

[Figure: random forest illustration]
Evaluating Classification Performance

• Evaluating the performance of a classification model is crucial for understanding how well it predicts the target variable.
• Evaluation metrics for classification tasks help us understand
how good machine learning models are by giving us valuable
information about different aspects of model performance.
• This information helps us choose, improve, and use these
models effectively.
• The following are some common metrics and techniques
used:
➢ Confusion Matrix, Accuracy, Recall, Precision and F1 Score

Confusion Matrix
• A confusion matrix is a table that summarizes the classification results
and indicates the number of true positive, true negative, false positive,
and false negative results.
• It provides a clear summary of predictions versus actual class labels, which
offers insights into the model’s accuracy and misclassifications.

Confusion Matrix

[Figure: confusion matrix layout]
Accuracy Metric
• The accuracy score represents the percentage of correct predictions
in the overall test data.
• A high accuracy score indicates that the model is making a large
proportion of correct predictions, while a low accuracy score
indicates that the model is making too many incorrect predictions.
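
In terms of confusion-matrix counts, the standard formula is:

Accuracy = (TP + TN) / (TP + TN + FP + FN)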

Recall Metric
• Recall measures the proportion of true positives among all actual positive instances.
• It provides a per-class view of accuracy, making it a crucial metric when missing positive cases is costly.
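
In terms of confusion-matrix counts:

Recall = TP / (TP + FN)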

Precision Metric
• Precision measures the proportion of true positives (correctly
classified positive cases) out of all cases classified as positive.
• Precision tells us how often the model’s positive predictions
are correct, highlighting the accuracy of its relevant
predictions.
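
In terms of confusion-matrix counts:

Precision = TP / (TP + FP)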

F1-Score
• The F1-score is the harmonic mean of precision and recall, combining the two metrics into a single balanced measure of model performance.
• The F1-score is high only when both precision and recall are high.
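
In terms of the two component metrics:

F1 = 2 × (Precision × Recall) / (Precision + Recall)

To compute all four metrics at once, here is a minimal sketch (reusing the toy labels from the confusion matrix example; assumes scikit-learn):

from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

y_true = [1, 0, 1, 1, 0, 1, 0, 0]   # actual labels (toy example)
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]   # model predictions

print(accuracy_score(y_true, y_pred))   # (TP+TN)/total = 6/8 = 0.75
print(precision_score(y_true, y_pred))  # TP/(TP+FP) = 3/4 = 0.75
print(recall_score(y_true, y_pred))     # TP/(TP+FN) = 3/4 = 0.75
print(f1_score(y_true, y_pred))         # harmonic mean of the two = 0.75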

Non-Parametric Models
• Non-parametric models do not make strong assumptions
about the data distribution and can have a flexible number of
parameters that can grow with the data.
• They are often more flexible but can be computationally more
expensive.
Examples: k-NN, Support Vector Machines (SVM), Decision
Trees, and Random Forests are non-parametric.
Strengths: Can capture complex relationships in data without
assuming a specific functional form.
Weaknesses: Require large amounts of data to generalize
effectively.

Non-Parametric Models
• Non-parametric methods make minimal assumptions about the
data compared to parametric methods. However, they still rely on
some key assumptions to function effectively.
• Here are three assumptions typically associated with non-
parametric methods:
– Independence: Data points are independent and not influenced
by others.
– Random Sampling: Data represents a random sample from the
population.
– Homogeneity of Measurement: Measurements are consistent
across all data points.

Applications of Classification
Classification models are widely used in diverse fields, offering solutions to
real-world problems:

i. Healthcare: Disease diagnosis (e.g., cancer detection using image classification).
ii. Finance: Fraud detection in credit card transactions.
iii. Marketing: Customer segmentation (e.g., classifying customers based on
purchasing behavior).
iv. Natural Language Processing (NLP): Email spam detection.
v. Image Recognition: Object detection in autonomous vehicles.
vi. Cybersecurity: Intrusion detection in networks.
vii. Education: Plagiarism detection using text classification techniques.

Limitations of Classification
Data Dependency:
• Requires labeled data, which can be expensive and time-consuming to obtain.
Overfitting:
• Complex models may overfit the training data, leading to poor generalization.
Imbalanced Data:
• Models struggle when one class dominates the dataset (e.g., fraud detection).
Computational Cost:
• Some algorithms can be computationally expensive for large datasets.
Interpretability:
• Advanced models (e.g., Neural Networks) are "black boxes," making them
hard to explain.

Class Activity
1. Implement a Support Vector Machine (SVM) classifier on the
Iris dataset. Use a linear kernel and split the data into training
and testing sets with a test size of 0.2 and random_state=42.
Calculate and print the accuracy of your model on the test set.
(7 Marks)
2. Using the breast cancer dataset from scikit-learn, implement a
binary classification model using any classifier covered in this
lecture. Print the following evaluation metrics for your model's
performance on the test set: Confusion Matrix, Accuracy,
Precision, Recall and F1-score. (13 Marks)
