DSC Module2 13.08.25

Module 2

Introduction to machine learning: How machines learn - Data storage, Abstraction, Generalization, Evaluation, Machine learning in practice - Types of machine learning algorithms.
Lazy learning: Classification using K-Nearest Neighbor algorithm - Measuring similarity with
distance, Choice of k, preparing data for use with k-NN.
Probabilistic learning: Understanding Naive Bayes - Conditional probability and Bayes
theorem, Naive Bayes algorithm for classification, The Laplace estimator, Using numeric
features with Naive Bayes.
Introduction to machine learning

AI: A machine trained to mimic the human brain. Eg: Amazon Alexa
ML: A subset of AI; a technique to achieve AI. Eg: Spam detection
DL: A subset of ML; a technique to achieve complex AI. Eg: Number plate detection
• Artificial intelligence (AI) is a technology which enables a machine to simulate
human intelligence (behaviour). The main applications of AI are online game
playing, intelligent humanoid robots, etc.
• Machine learning is a subset of AI which allows a machine to automatically learn
from past data without programming explicitly. The main applications of ML are
Google search algorithms, Facebook auto friend tagging suggestions, etc.
• Deep learning simulates the human brain, enabling systems that learn to identify
objects & perform complex tasks with increasing accuracy—all without human
intervention.
Introduction to Machine Learning
Machine learning is the subfield of computer science that gives computers the ability
to learn without being explicitly programmed.

❖ ML focuses on the development of computer programs that can access data & use it
to learn for themselves.

❖ The process of learning begins with observations or data, such as examples, direct
experience or instructions in order to look for patterns in data & make better
decisions in the future based on the examples that we provide.
o Machine learning works on a simple concept: understanding with experience.
Examples
o Facebook: Facebook continuously notices the friends you connect with, the profiles
you visit, your interests, etc. On the basis of this continuous learning, a list of
Facebook users is suggested that you can become friends with.
Tag friends: When you upload a picture of you with a friend, Facebook instantly
recognizes that friend. This is possible with the help of machine learning.
❖ Advertisement Recommendation
When you shop any product online, after some days you keep receiving
notifications for shopping suggestions.
The shopping website or app recommends some items that match your interests.
This is possible with the help of machine learning: on the basis of your behaviour
on the website/app (past purchases, items liked or added to the cart), the
product recommendations are made.
The primary aim of ML is to allow computers to learn automatically without
any human interventions.
How do machines learn?
A formal definition of ML proposed by computer scientist Tom M. Mitchell states that:
“A machine learns whenever it is able to utilize an experience such that its
performance improves on similar experiences in the future”.
While human brains are naturally capable of learning from birth, the conditions
necessary for computers to learn must be made explicit. Whether the learner is a
human or a machine, the basic learning process is similar.
It can be divided into 4 interrelated components (learning process steps):
1. Data storage
2. Abstraction
3. Generalization
4. Evaluation

1. Data Storage
Facilities for storing and retrieving huge amounts of data are an important component
of the learning process. Humans and computers alike utilize data storage as a foundation
for advanced reasoning.
• In a human being, the data is stored in the brain & data is retrieved using
electrochemical signals.
• Computers use hard disk drives, flash memory, random access memory and similar
devices to store data and use cables and other technology to retrieve data.
2. Abstraction
• The second component of the learning process is known as abstraction.
• Abstraction is the process of extracting knowledge about stored data. This involves
creating general concepts about the data as a whole. The creation of knowledge
involves the application of known models & the creation of new models. The process
of fitting a model to a dataset is known as training.
• When the model has been trained, the data is transformed into an abstract form that
summarizes the original information.
Observations → Data → Model

3. Generalization
The 3rd component of the learning process is known as generalization.
The term generalization describes the process of turning the knowledge about stored
data into a form that can be utilized for future action. These actions are to be carried
out on tasks that are similar, but not identical, to those that have been seen before.
In generalization, the goal is to discover those properties of the data that will be most
relevant to future tasks.

For example, suppose that a ML algorithm learned to identify faces by finding 2 dark
circles representing eyes, positioned above a straight line indicating a mouth.
4. Evaluation
• Provides a feedback mechanism to measure the utility of learned knowledge &
inform potential improvements.
• To evaluate or measure the learner’s success, use this information to inform
additional training if needed.

• Models fail to perfectly generalize due to the problem of noise, a term that describes
unexplained or unexplainable variations in data.
• Noisy data is caused by seemingly random events, such as:
▪ Errors due to inaccurate sensors
▪ Issues with human subjects
▪ Data quality problems, including missing, null, truncated, incorrectly coded, or
corrupted values.
Machine Learning in Practice
5 step process
1. Data Collection
• The data collection step involves gathering the learning material an algorithm will
use to generate actionable knowledge. In most cases, the data will need to be
combined into a single source like a text file, spreadsheet or database.
2. Data Exploration & Preparation
• Checking the quality of data & preparing for the learning process.
3. Model training
• Selecting an appropriate algorithm; the algorithm will represent the data in
the form of a model.
4. Model Evaluation
• To evaluate the accuracy of the model using a test dataset.
5. Model Improvement
• If better performance is needed, use advanced strategies to improve the performance
of the model.
• Switch to a different model.

❖ Image Recognition
For face detection- The categories might be face versus no face present. There
might be a separate category for each person in a database of several individuals.
For character recognition- We can segment a piece of writing into smaller images,
each containing a single character. The categories might consist of the 26 letters of
the English alphabet, the 10 digits, & some special characters.
❖ Speech Recognition
Speech recognition is the translation of spoken words into text.
❖ Banking and financial services
ML can be used to predict the customers who are likely to default on paying
loans or credit card bills. This is of supreme importance as ML helps the
banks identify the customers who can be granted loans and credit cards.
❖ Healthcare
It is used to diagnose deadly diseases (eg: cancer) based on the symptoms of
patients, tallying them with past data from similar patients.
TYPES OF MACHINE LEARNING

1. SUPERVISED LEARNING(PREDICTIVE MODELS)


Supervised learning is the machine learning task of learning a function that maps an
input to an output based on example input-output pairs. It infers a function from labeled
training data consisting of a set of training examples.
In supervised learning, each example is a pair consisting of an input object (typically a
vector) and a desired output value (also called the supervisory signal).

• Labeled data

Classification
• Features & discrete labels
• Maps an input to a discrete label (class)
• Eg: spam or not, type of cancer

Regression
• Features & continuous real values
• Predicts a real value for an input
• Eg: temperature
Supervised learning is a ML method in which models are trained using labeled data.
SL needs supervision to train the model, similar to how a student learns
in the presence of a teacher.
SL is commonly used in real world applications.
Eg: face & speech recognition, products or movie recommendations, sales forecasting etc.
In SL, learning data comes with description, labels, targets or desired outputs & the
objective is to find a general rule that maps inputs to outputs. This kind of learning
data is called labeled data.
The learned rule is then used to label new data with unknown outputs.
SL involves building a ML model that is based on labeled samples.
SL deals with learning a function from available training data. Here, a learning
algorithm analyses the training data & produces a derived function that can be used
for mapping new examples.
Eg: Logistic Regression, Neural Networks, Support Vector Machine (SVM), Naïve Bayes
Classifiers etc.
2. UNSUPERVISED LEARNING
Unsupervised learning is a type of machine learning algorithm used to draw inferences
from datasets consisting of input data without labeled responses.
The most common unsupervised learning method is cluster analysis, which is used
for exploratory data analysis to find hidden patterns or grouping in data.

Raw data → Unsupervised learning algorithm → Clusters

Goal: Construct an analyzer to find the hidden relationship between inputs

Use: Group or associate inputs according to their similarity.

3. REINFORCEMENT LEARNING

• Use software agents


• Based on rewards
• Objective is to maximize rewards for better learning
Reinforcement learning is the problem of getting an agent to act in the world so as to
maximize its rewards.
Lazy learning
(or instance-based learning or memory-based learning)
Lazy learning is a machine learning approach where the algorithm does not build a general
model during training. Instead, it stores the training data and delays most of the
processing until it receives a query (new test data) for prediction.
• In lazy learning, no explicit abstraction or model is created after training.
• When a new data point needs to be predicted, the algorithm searches the stored
training data for the most relevant examples.
• Because the learning step is minimal, predictions can be slower, especially for large
datasets.
• The approach is flexible, as it can adapt to new data without retraining.
• However, it requires more memory and computation at prediction time.
Eg: K-Nearest Neighbors (K-NN) is a classic lazy learner:
• It stores all training examples.

• When a new point is given, it calculates the distance to all stored points, finds the k
closest neighbors, and predicts the output based on them.
For instance, in a fruit classification task, K-NN compares a new fruit’s weight
and color with stored fruits and classifies it (e.g., apple, orange) based on the
closest matches.
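A minimal Python sketch of this behaviour is shown below. The fruit measurements (weight in grams, a colour score from 0 to 10) are hypothetical values chosen only for illustration; the point is that "training" is nothing more than storing the labelled examples, and all of the distance work happens when a prediction is requested.

```python
import math

# "Training" a lazy learner: simply store the labelled examples.
# Hypothetical fruits described by (weight in grams, colour score 0-10).
training_data = [
    ((140, 8), "apple"),
    ((160, 7), "apple"),
    ((120, 3), "orange"),
    ((130, 2), "orange"),
]

def predict(query, k=3):
    """Classify a query point by a majority vote among its k nearest neighbours."""
    # All of the real work happens here, at prediction time.
    distances = sorted(
        (math.dist(query, features), label) for features, label in training_data
    )
    nearest_labels = [label for _, label in distances[:k]]
    return max(set(nearest_labels), key=nearest_labels.count)

print(predict((150, 6)))   # the closest stored fruits are apples, so "apple" is returned
```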
Nearest Neighbor Classification
• Things that are alike are likely to have properties that are alike.
• Machine learning uses this principle to classify data by placing it in the same category
as similar or "nearest" neighbors.
• Classifying unlabeled examples by assigning them the class of similar labeled
examples.
• Nearest neighbor methods are extremely powerful.
• They have been used successfully for
Computer vision applications, including optical character recognition & facial
recognition in both still images & video.
Predicting whether a person will enjoy a movie or music recommendation.
Identifying patterns in genetic data, perhaps to use them in detecting specific
proteins or diseases.
o In general, nearest neighbor classifiers are well-suited for classification tasks, where
relationships among the features and the target classes are numerous, complicated,
or extremely difficult to understand, yet the items of similar class types tend to be
fairly homogeneous.
• If a concept is difficult to define, but you know it when you see it, then nearest
neighbors might be appropriate.
• If the data is noisy and thus no clear distinction exists among the groups, the nearest
neighbor algorithms may struggle to identify the class boundaries.
K-NN Algorithm
• K Nearest Neighbor is a simple algorithm that stores all the available cases and
classifies the new data or case based on a similarity measure.
• It is mostly used to classify a data point based on how its neighbors are classified.
Strengths
o Simple & effective
o Makes no assumptions about the underlying data distribution
o Fast training phase

Weaknesses
o Does not produce a model, limiting the ability to understand how the features are
related to the class
o Requires selection of an appropriate k
o Slow classification phase
o Nominal features & missing data require additional processing.

k-NN algorithm
Step 1: Define the value of k (a positive integer).
Step 2: Compute the distances between the test instance and the various
training instances.
Euclidean distance formula: d = √((x − x₁)² + (y − y₁)²)
Step 3: Sort the distances in ascending order.
Step 4: Choose the k training instances which are nearest to the test instance
(k nearest neighbours).
Step 5: Among the class labels of the k nearest neighbours, choose the class label
which occurs most frequently.
Step 6: Assign the chosen class label to the test instance.
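The six steps map directly onto a short Python function. This is only a sketch for two numeric features, assuming the training set is supplied as a list of ((x, y), label) pairs; ties in the vote are broken by Counter's ordering.

```python
from collections import Counter
from math import sqrt

def knn_classify(training_set, test_instance, k):
    """training_set: list of ((x, y), label) pairs; test_instance: (x, y)."""
    # Step 1: k is chosen by the caller.
    x, y = test_instance
    # Step 2: compute the Euclidean distance to every training instance.
    distances = [(sqrt((x - x1) ** 2 + (y - y1) ** 2), label)
                 for (x1, y1), label in training_set]
    # Step 3: sort the distances in ascending order.
    distances.sort(key=lambda pair: pair[0])
    # Step 4: keep the k nearest neighbours.
    neighbours = distances[:k]
    # Step 5: find the class label that occurs most frequently among them.
    votes = Counter(label for _, label in neighbours)
    # Step 6: assign that label to the test instance.
    return votes.most_common(1)[0][0]
```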

Eg: Food ingredients have 2 features, namely, sweetness & crunchiness. They are
measured on a scale of 1 to 10. The ingredients are of 3 types or classes, namely, “fruit",
“vegetable", and “protein". We have the data given in Table 5.1 regarding some known
ingredients:

Use k-NN algorithm to determine the food type of tomato with sweetness = 6 &
crunchiness = 4.
Solution
Step 1: We choose k = 3.
Step 2: The feature vector of the test instance is (6, 4). We calculate the distances of
the test instance from the training instances.

Step 3: We rank the distances in ascending order.

Step 4: We now choose the 3 nearest neighbours:

Step 5: Among the 3 nearest neighbours, there are 2 ingredients of type “protein" & one
of type “vegetable". Hence the majority of the food ingredients are of type “protein".
Step 6: We assign the type “protein" to "tomato".
Choosing an appropriate k in k-NN
In the k-Nearest Neighbors (K-NN) algorithm, k is the number of nearest neighbors
considered when making a prediction.
The choice of k greatly affects the result:
1. Small k → The model becomes sensitive to noise (overfitting).
2. Large k → The model becomes too smooth and may ignore local patterns
(underfitting).
3. A common approach is to choose k using cross-validation, to find the value that gives
the best accuracy on unseen data, as sketched below.
Usually, odd values of k are preferred for classification (to avoid ties).
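As a sketch of the cross-validation approach, the snippet below tries several candidate values of k and keeps the one with the highest cross-validated accuracy. It assumes scikit-learn is available and that X (the feature matrix) and y (the labels) have already been loaded.

```python
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def best_k(X, y, candidates=(1, 3, 5, 7, 9, 11)):
    """Return the candidate k with the highest 5-fold cross-validated accuracy."""
    scores = {
        k: cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5).mean()
        for k in candidates
    }
    return max(scores, key=scores.get)
```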
Eg:

A test instance (solid triangle) is surrounded by training instances of 3 types:


• Solid-circle • Solid-square • Solid-star
1. k = 7 (solid line circle):
o Among the 7 nearest neighbors: 3 solid circles, 2 solid squares, 2 solid stars.
o Majority = solid circle → Predicted class = Solid circle
2. k = 12 (dashed line circle):
o Among the 12 nearest neighbors: 4 solid circles, 5 solid squares, 2 solid stars.
o Majority = solid square → Predicted class = Solid square

Choosing k in k-NN
There is no single fixed rule for choosing the best value of k in the k-Nearest Neighbors
algorithm:
1. k = Total number of observations
• If k equals the number of training samples, the concept of "nearest neighbors"
becomes meaningless because all distances are considered.
• The algorithm will always predict the majority class.
2. k = 1
• The prediction is highly sensitive to noise and outliers.
• Example: If a training sample is wrongly labeled, any test point closest to it will
also be classified incorrectly — even if most other neighbors belong to a different
class.
• This leads to overfitting:
o Training error = 0 (perfect on training data)
o Test error = High.
3. k = SQRT(number of observations)
• A good choice when the dataset is small.
• Helps balance bias and variance.
Effect of k on performance
• Low k values → High variance, overfitting, low train error, high test error.
• High k values → More bias, smoother decision boundaries, reduced test error (to
a point).
• Ideal k is a trade-off between bias & variance.
Preparing data for use with k-NN
Since the k-Nearest Neighbors (k-NN) algorithm depends on distances between points,
features with large numeric ranges can dominate the calculation.
Therefore, rescaling (feature scaling) is important before applying k-NN.
Two common methods are:
1. Min-Max Normalization (Feature Scaling)
• Rescales feature values to a standard range, usually 0 to 1.
Xnew = (X − min(X)) / (max(X) − min(X))
Where:
• Xnew – Scaled value
• X- Original value
• min(X) - Minimum value in the dataset for that specific feature
• max(X) - Maximum value in the dataset for that specific feature
Eg: If height ranges from 150 cm to 200 cm, a height of 175 cm would be scaled as:
Xnew = (X − min(X)) / (max(X) − min(X)) = (175 − 150) / (200 − 150) = 25/50 = 0.5

2. z-score standardization (Standard Scaling)
Transforms values so that they are expressed in terms of how many standard deviations
they are from the mean.
Xnew = (X − μ) / σ = (X − Mean(X)) / StdDev(X)
where,
Xnew – Standardized value
X- Original value
µ - Mean (average) of the dataset
σ – Standard deviation of the dataset
Eg: If the mean height is 170 cm and standard deviation is 10 cm,
a height of 175 cm would have a z-score:
Xnew = (X − μ) / σ = (175 − 170) / 10 = 5/10 = 0.5
Min-Max Vs Z-score
• Min-max normalization keeps all values within a fixed range (0–1).
• Z-score standardization produces values in an unbounded range (can be negative or
positive).
Whichever scaling method is used on the training data must also be applied to the test
data before prediction, as in the sketch below.
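The sketch below applies both rescaling methods, fitting the scaling parameters on a set of hypothetical training heights and then transforming a new value, so that the same transformation is used for training and test data.

```python
def min_max_scale(train, test):
    """Rescale values to the [0, 1] range using the training data's min and max."""
    lo, hi = min(train), max(train)
    return [(x - lo) / (hi - lo) for x in test]

def z_score_scale(train, test):
    """Standardize values using the training data's mean and standard deviation."""
    mean = sum(train) / len(train)
    std = (sum((x - mean) ** 2 for x in train) / len(train)) ** 0.5
    return [(x - mean) / std for x in test]

heights_train = [150, 160, 170, 180, 200]    # hypothetical training heights (cm)
print(min_max_scale(heights_train, [175]))   # [0.5], matching the example above
print(z_score_scale(heights_train, [175]))   # about [0.17] for this particular sample
```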
Problem 1
We have the following data from a questionnaire regarding goodness or badness of tissue
papers and data regarding two attributes, namely, acid durability and strength, of the tissue
papers.

Now, the factory produces a new tissue paper that yields X1 = 3 and X2 = 7. Can we guess the
classification of the new tissue paper?

Solution
Step 1: We choose k = 3.
Step 2: The feature vector of the test instance is (3, 7). We calculate the distances of the
test instance from the training instances.
The feature vector of Sample 1 is (7, 7). The distance between Sample 1 and new tissue paper
is calculated as follows:

The various distances are given in the following table

Step 3. Sort the distances in ascending order

Step 4. Among the 3-nearest neighbours, the class label which occurs most
frequently is “Good”.
Step 5. We assign the class label Good to the test instance.
Problem 2
Food ingredients have two features, namely, sweetness and crunchiness. They are measured
on a scale of 1 to 10. We have the data given in Table 5.1 regarding some known ingredients:

Use k-NN algorithm to determine the food type of tomato with sweetness = 6 and crunchiness
= 4.
Solution
Step 1. We choose k = 3.
Step 2. The feature vector of the test instance is (6, 4). We calculate the distances of the test
instance from the training instances.
Step 3. We rank the distances in ascending order

Step 4. We now choose the three nearest neighbours

Step 5. Among the 3 nearest neighbours, there are two ingredients of type “protein” and
one of type “vegetable”. Hence the majority of the food ingredients are of type “protein”.
Step 6. We assign the type “protein” to “tomato”.

Applications of KNN
• Banking System - KNN can be used in the banking system to predict whether an
individual is fit for loan approval, and whether that individual has characteristics
similar to those of defaulters.
• Calculating Credit Ratings - KNN algorithms can be used to find an individual’s
credit rating by comparing with persons having similar traits.
• Politics - With the help of KNN algorithms, we can classify a potential voter into
various classes like “Will Vote”, “Will Not Vote”, “Will Vote for Party ‘Congress’”,
“Will Vote for Party ‘BJP’”.
• Other areas in which the KNN algorithm can be used are Speech Recognition,
Handwriting Detection, Image Recognition & Video Recognition.

Tutorial
1. We have the following data from a questionnaire regarding goodness or badness of tissue
papers and data regarding two attributes, namely, acid durability and strength, of the
tissue papers.

Now, the factory produces a new tissue paper that yields X1 = 3 & X2 = 7. Can we guess
the classification of the new tissue paper?
Soln
Features: X1, X2
Class : bad, good
Test instance: X1= 3, X2= 7
2. Based on a survey conducted in an institution, students are classified based on the two
attributes of academic excellence (X) and other Activities (Y). Given the following data,
identify the classification of a student with X = 5 and Y = 7 using the k-NN algorithm
(choose k as 3).
X (Academic Excellence) Y (Other Activities) Z(Classification)
8 6 Outstanding
5 6 Good
7 3 Good
6 9 Outstanding
Soln
Features: X, Y
Class (Z): Outstanding, good
Test instance: X=5, Y=7
k=3
3. Consider the dataset given below. Using k-NN algorithm, predict the class label for the new instance
with height=172 cm and weight =57 kg.
Choose k=1 and k=3

Soln
Features: height, weight
Class : underweight, normal
Test instance: height=172 cm, weight =57 kg
k =1 & k = 3
4. Given the following dataset. Identify the T-Shirt Size of Tom having height 161 cm and weight 61kg
using k-NN algorithm. (Choose k as 3)

Soln
Features: height, weight
Class (T-shirt) : medium, large
Test instance: height= 161 cm, 61kg
k=3
5. Consider the dataset given below. Using k-NN algorithm, predict the class label for the new instance
with brightness=20 and saturation =35. Choose k=1 and k=3.

Soln
Features: brightness, saturation
Class : red, blue
Test instance: brightness = 20, saturation = 35
k =1 & k = 3
6. Given the following dataset. Identify the T-Shirt Size of Tom having height 161 cm and weight 61kg
using k-NN algorithm. (Choose k as 3)

Soln
Features: height, weight
Class (T-shirt) : medium, large
Test instance: height = 161 cm, weight : 61 kg
K=3
7. With the given data, Use k-NN algorithm to determine the Target attribute for a new instance with X =
5 and Y =3. (Choose k as 3)

Soln
Features: X, Y
Class (Target): class1, class2
Test instance: X=5, Y = 3
k=3

Probabilistic learning: Understanding Naive Bayes - Conditional probability and
Bayes theorem, Naive Bayes algorithm for classification, The Laplace estimator, Using
numeric features with Naive Bayes.
Understanding Naïve Bayes
The word “naive” in the “Naive Bayes Algorithm” means simple, unsophisticated, or
primitive.
The word “Bayes” refers to Thomas Bayes (1701–1761), an English statistician who
formulated a special case of what is now known as Bayes’ theorem. This theorem forms the
foundation of the Naïve Bayes algorithm.
• Probability is a value between 0 and 1 that indicates the likelihood of an event
occurring based on the available evidence.
• The lower the probability, the less likely the event is to occur.
• A probability of 0 means the event will definitely not occur, while a probability of 1
means the event will occur with complete certainty.
• Classifiers based on Bayesian methods use training data to calculate the observed
probability of each outcome based on the evidence provided by the feature values.
• When the classifier is applied to unlabeled data, it uses these learned probabilities to
predict the most likely class for the new features.
• Although the method is conceptually simple, it often achieves results comparable to more
sophisticated algorithms.
Bayesian classifiers have been effectively used in:
• Text classification – e.g., spam filtering in e-mail systems
• Anomaly detection – identifying unusual patterns in computer networks
• Medical diagnosis – estimating the probability of a disease based on observed
symptoms
When to Use Bayesian Classifiers
Bayesian classifiers are particularly effective for problems where information from many
attributes must be considered together to estimate the overall probability of an outcome.
Unlike some algorithms that ignore features with weak individual effects, Bayesian methods
use all available evidence, allowing even subtle contributions to influence the prediction.
When a large number of features each have relatively small effects, their combined influence
can be substantial, resulting in more accurate and reliable predictions
Basic Concepts of Bayesian Methods
• Bayesian probability theory is based on the idea that the estimated likelihood of an
event (or a potential outcome) should be determined from the evidence at hand,
considering multiple trials or opportunities for the event to occur.
• Bayesian methods provide insights into how the probability of these events can be
estimated from observed data.
Event – A subset of outcomes from the sample space; a set of outcomes of an
experiment to which a probability is assigned.
Trial – A single opportunity for the event to occur.

EVENT TRIAL
(Possible outcomes) (Single opportunity for the event to occur)
Heads result Coin flip
Rainy weather A single day
Message is spam Incoming e-mail message
Candidate becomes president Presidential election
Win the lottery Lottery ticket
Understanding probability
The probability of an event is calculated as:
Probability of an event, P(A) = (Number of times the event occurred) / (Total number of trials)
Where:
P(A) – Probability of event A
Number of times the event occurred – Count of successful outcomes
Total number of trials – Number of attempts or observations
Eg:
(a) If it rained 3 out of 10 days with similar conditions as today:
Probability of rain, P(Rain) = 3/10 = 0.30 or 30%
(b) If 10 out of 50 prior e-mail messages were spam:
Probability of spam, P(Spam) = 10/50 = 0.20 or 20%
We write probabilities as P(A), which denotes the probability of event A occurring.
Sum of Probabilities
The probability of all possible outcomes of a trial must sum to 1.
For example,
If P(spam)=0.20 or 20%
Then P(ham) = 1 – 0.20 = 0.80
Here, spam and ham are mutually exclusive (cannot occur together) and exhaustive
(cover all possible outcomes).
➢ This concludes that spam & ham are mutually exclusive & exhaustive events
which implies that they cannot occur at the same time & are the only possible
outcomes. Because an event cannot simultaneously happen & not happen, an
event is always mutually exclusive & exhaustive with its complement
➢ The complement of event A is typically denoted Ac or A'.
➢ The shorthand notation P(¬A) or P(Ac) can be used to denote the probability of event A
not occurring, as in P(¬spam) = 0.80
The rectangle represents the possible outcomes for an e-mail message.
o The circle represents the 20 % probability that the message is spam.
o The remaining 80% represents the complement P(¬spam) or the messages that are
not spam:
Understanding Joint Probability
o Consider a second event based on the outcome that an e-mail message contains the word
Viagra.
o In most cases, this word is likely to appear only in a spam message; its presence
in an incoming e-mail is therefore a very strong piece of evidence that the message
is spam.

• We know that 20 % of all messages were spam (the left circle) and 5% of all messages
contained the word Viagra (the right circle).

• We would like to quantify the degree of overlap between these 2 proportions.


• In other words, we hope to estimate the probability that both P(spam) & P(Viagra)
occur, which can be written as P (spam ∩ Viagra). Calculating P(spam ∩ Viagra)
depends on the joint probability of the 2 events
• If the 2 events are totally unrelated, they are called independent events. For
independent events A & B, the probability of both happening can be expressed as
P(A ∩ B) = P(A) * P(B). But here we know that P(spam) & P(Viagra) are likely to be
highly dependent, which means that this calculation is incorrect.
Conditional probability
• Without knowledge of an incoming message's content, the best estimate of its spam
status would be P(spam), the probability that any prior message was spam, which we
calculated previously to be 20 percent. This estimate is known as the prior probability.
• Suppose that you obtained additional evidence by looking more carefully at the set
of previously received messages to examine the frequency that the term Viagra
appeared. The probability that the word Viagra was used in previous spam
messages, or P(Viagra|spam), is called the likelihood.
• The probability that Viagra appeared in any message at all, or P(Viagra), is known
as the marginal likelihood.
• By applying Bayes' theorem to this evidence, we can compute a posterior probability
that measures how likely the message is to be spam. If the posterior probability is
greater than 50 percent, the message is more likely to be spam than ham and it
should perhaps be filtered.
The probability of the occurrence of an event ‘A’ given that an event ‘B’ has already
occurred is called the conditional probability of A given B and is denoted by P(A|B). We
have
P(A|B) = P(A ∩ B) / P(B), if P(B) ≠ 0
Bayes theorem
The posterior probability P(A∣B), often called the conditional probability of A given B, is
determined using Bayes' Theorem. It describes the probability of event A happening,
given that event B has already occurred.
Bayes' Theorem is:
Let A & B be two events with P(B) ≠ 0 (i.e. event B has a non-zero probability). Then the
conditional probability of A given B is:
P(A|B) = P(B|A) · P(A) / P(B)
This follows from the definition of conditional probability:
P(A|B) = P(A ∩ B) / P(B) and P(B|A) = P(B ∩ A) / P(A)
Since P(A ∩ B) = P(B ∩ A),
P(A|B) · P(B) = P(B|A) · P(A)
and therefore P(A|B) = P(B|A) · P(A) / P(B)
Where:
• P(A∣B) is the posterior probability.
• P(B∣A) is the likelihood (probability of B occurring given that A is true).
• P(A) is the prior probability (initial belief about the probability of A).
• P(B) is the marginal likelihood (total probability of B).
Generally,
Posterior probability = (likelihood × prior probability) / marginal likelihood
P(A|B) = P(B|A) · P(A) / P(B)
P(spam|Viagra) = P(Viagra|spam) · P(spam) / P(Viagra)
• If the posterior probability is greater than 50 percent, the message is more likely to
be spam than ham and it should perhaps be filtered.
To calculate these components of Bayes' theorem, construct a
• Frequency table: that records the number of times Viagra appeared in spam & ham
messages.
• Likelihood table: The rows of the likelihood table indicate the conditional
probabilities for Viagra (yes/no), given that an e-mail was either spam or ham
Frequency Table (word Viagra):
        YES    NO     TOTAL
SPAM    4      16     20
HAM     1      79     80
TOTAL   5      95     100

Likelihood Table (word Viagra):
        YES     NO      TOTAL
SPAM    4/20    16/20   20
HAM     1/80    79/80   80
TOTAL   5/100   95/100  100
Posterior probability, P(A|B) = P(B|A) · P(A) / P(B)
P(spam|Viagra) = P(Viagra|spam) · P(spam) / P(Viagra)
              = (4/20 × 20/100) / (5/100) = 0.80
Classification with Naive Bayes
• Our spam filter is extended by adding a few additional terms to be monitored in addition
to the term Viagra.
• Money, Groceries, and Unsubscribe.
• The Naive Bayes learner is trained by constructing a likelihood table for the
appearance of these 4 words (labeled W1, W2, W3, and W4), for 100 e-mails.
Likelihood   Viagra (W1)     Money (W2)      Groceries (W3)   Unsubscribe (W4)   Total
             Yes    No       Yes    No       Yes    No        Yes     No
Spam         4/20   16/20    10/20  10/20    0/20   20/20     12/20   8/20        20
Ham          1/80   79/80    14/80  66/80    8/80   72/80     23/80   57/80       80
Total        5/100  95/100   24/100 76/100   8/100  92/100    35/100  65/100      100
• As new messages are received, we need to calculate the posterior probability to
determine whether they are more likely to be spam or ham, given the likelihood of the
words found in the message text.
• For example, suppose a new message contains the terms Viagra and Unsubscribe but
not Money or Groceries. Using the likelihoods above, we find that the message is spam
with approximately an 85.7 percent probability and ham with a 14.3 percent probability.
• Because these are mutually exclusive and exhaustive events, the probabilities sum
to 1
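The figures above can be reproduced from the likelihood table. The sketch below assumes, as in the usual form of this example, that the new message contains Viagra and Unsubscribe but not Money or Groceries; each class score is the product of the prior and the word likelihoods, and the scores are then normalized so that they sum to 1.

```python
# Likelihoods from the table: P(word present | class) for W1..W4.
likelihood = {
    "spam": {"viagra": 4/20, "money": 10/20, "groceries": 0/20, "unsubscribe": 12/20},
    "ham":  {"viagra": 1/80, "money": 14/80, "groceries": 8/80, "unsubscribe": 23/80},
}
prior = {"spam": 20/100, "ham": 80/100}

# Assumed new message: contains Viagra and Unsubscribe, but not Money or Groceries.
message = {"viagra": True, "money": False, "groceries": False, "unsubscribe": True}

score = {}
for cls in ("spam", "ham"):
    p = prior[cls]
    for word, present in message.items():
        p *= likelihood[cls][word] if present else (1 - likelihood[cls][word])
    score[cls] = p

total = sum(score.values())
print({cls: round(p / total, 3) for cls, p in score.items()})
# Exact fractions give P(spam) close to 0.85; rounding the ham numerator to 0.002,
# as the notes do, yields the quoted 85.7 percent figure.
```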

Naïve Bayes Algorithm
Input:
• Training dataset with features X = (x1, x2, …, xn)
• Class labels C = {c1, c2, …, cp}
• Test instance Xtest
Output:
Predicted class label for Xtest
Step 1: Compute the prior probability of each class P(ck):
P(ck) = (Number of training instances with class label ck) / (Total number of training instances)
for k = 1, 2, …, p

Step 2: Compute Conditional Probabilities (Likelihoods)


For each feature value xi given class ck:
P(xi|ck) = (Number of instances where feature value xi occurs in class ck) / (Total number of instances in class ck)
for i = 1, 2, …, n and for k = 1, 2, …, p

Step 3: Compute the posterior probability for each class


For the test instance Xtest = (x1, x2, …, xn):
qk = P(ck) · P(x1|ck) · P(x2|ck) · … · P(xn|ck)
for each class k = 1, 2, …, p

Step 4: Select the class with maximum posterior probability,


qj=max {q1, q2, …, qp}

Step 5: Assign the test instance X to class label cj
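
A minimal sketch of these steps for categorical features, using plain dictionaries of counts; the training data is assumed to be a list of (feature_tuple, class_label) pairs. Applied to the fauna problem that follows, it reproduces the classification of (Slow, Rarely, No) as "Animal".

```python
from collections import Counter, defaultdict

def naive_bayes_classify(training_data, test_instance):
    """training_data: list of (features_tuple, class_label) pairs; returns a label."""
    # Step 1: prior probabilities P(ck) from the class frequencies.
    class_counts = Counter(label for _, label in training_data)
    total = len(training_data)

    # Step 2: conditional probabilities P(xi | ck) from the frequency counts.
    value_counts = defaultdict(Counter)      # (feature index, class) -> value counts
    for features, label in training_data:
        for i, value in enumerate(features):
            value_counts[(i, label)][value] += 1

    # Steps 3-5: compute qk for every class and return the class with the largest qk.
    scores = {}
    for label, count in class_counts.items():
        q = count / total
        for i, value in enumerate(test_instance):
            q *= value_counts[(i, label)][value] / count   # 0 if the value was never seen
        scores[label] = q
    return max(scores, key=scores.get)
```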

Problem 1: Consider a training data set consisting of the fauna of the world. Each unit has
3 features named “Swim”, “Fly” and “Crawl”. Let the possible values of these features be
as follows:
Swim - Fast, Slow, No
Fly - Long, Short, Rarely, No
Crawl - Yes, No
For simplicity, each unit is classified as “Animal”, “Bird” or “Fish”. Let the training data
set be as in Table 1. Use naive Bayes algorithm to classify a particular species if its
features are (Slow, Rarely, No)?

Solution

x1 = Slow, x2 = Rarely, x3 = No
We construct the frequency table shown in Table 2 which summarizes the data. (It may
be noted that the construction of the frequency table is not part of the algorithm.)
Features:
Class         Swim (F1)         Fly (F2)                    Crawl (F3)    Total
              fast  slow  no    long  short  rarely  no     yes   no
Animal (c1)   2     2     1     0     0      1       4      2     3        5
Bird (c2)     1     0     3     1     2      0       1      0     4        4
Fish (c3)     1     2     0     0     0      0       3      0     3        3
Total         4     4     4     1     2      1       8      2     10       12

Step 1: We compute the following prior probabilities.

P(c1) = P(Animal) = (No. of records with class label Animal) / (Total no. of examples) = 5/12
P(c2) = P(Bird) = (No. of records with class label Bird) / (Total no. of examples) = 4/12
P(c3) = P(Fish) = (No. of records with class label Fish) / (Total no. of examples) = 3/12

Step 2: We construct the table of conditional probabilities P(fi|ck),
for all values of f1, f2, …, fn and for k = 1, 2, …, p.

Features:
Class         Swim (f1)            Fly (f2)                     Crawl (f3)     Total
              fast  slow  no       long  short  rarely  no      yes   no
Animal (c1)   2/5   2/5   1/5      0/5   0/5    1/5     4/5     2/5   3/5       5
Bird (c2)     1/4   0/4   3/4      1/4   2/4    0/4     1/4     0/4   4/4       4
Fish (c3)     1/3   2/3   0/3      0/3   0/3    0/3     3/3     0/3   3/3       3
Total         4     4     4        1     2      1       8       2     10        12

Table 3: Conditional probabilities P(fi|ck) for the test instance values

Class         Swim (f1): Slow         Fly (f2): Rarely           Crawl (f3): No
Animal (c1)   P(slow|Animal) = 2/5    P(rarely|Animal) = 1/5     P(no|Animal) = 3/5
Bird (c2)     P(slow|Bird) = 0/4      P(rarely|Bird) = 0/4       P(no|Bird) = 4/4
Fish (c3)     P(slow|Fish) = 2/3      P(rarely|Fish) = 0/3       P(no|Fish) = 3/3

Step 3: We can now calculate the following numbers:

qk = P(x1|ck) · P(x2|ck) · … · P(xn|ck) · P(ck)
3 class labels, k = 1, 2, 3
X = (slow, rarely, no)

q1 = P(x1|c1) · P(x2|c1) · P(x3|c1) · P(c1)
   = P(slow|Animal) · P(rarely|Animal) · P(no|Animal) · P(Animal)
   = (2/5) × (1/5) × (3/5) × (5/12) = 0.02
q2 = P(x1|c2) · P(x2|c2) · P(x3|c2) · P(c2)
   = P(slow|Bird) · P(rarely|Bird) · P(no|Bird) · P(Bird)
   = (0/4) × (0/4) × (4/4) × (4/12) = 0
q3 = P(x1|c3) · P(x2|c3) · P(x3|c3) · P(c3)
   = P(slow|Fish) · P(rarely|Fish) · P(no|Fish) · P(Fish)
   = (2/3) × (0/3) × (3/3) × (3/12) = 0

Step 4: Now
max{q1, q2, q3} = max{0.02, 0, 0} = 0.02
Step 5: The highest score corresponds to the class “Animal” (q1 = 0.02).
So we assign the class label “Animal” to the test instance (Slow, Rarely, No).
Classification with Naive Bayes
o Naive Bayes algorithm constructs tables of probabilities that are used to estimate the
likelihood that new examples belong to various classes.
o The probabilities are calculated using a formula known as Bayes' theorem, which
specifies how dependent events are related.

o Although Bayes' theorem can be computationally expensive, a simplified version that
makes so-called "naive" assumptions about the independence of features is capable of
handling extremely large datasets.
Laplace estimator (or Laplace smoothing or additive smoothing)
• The Laplace estimator is a probability estimation technique used to avoid zero
probabilities in statistical models by adding a small constant (usually 1) to each
observed count.
• It is especially useful in Naïve Bayes classification, where the absence of a feature
value in the training data can cause the probability for a class to become zero (the
zero-frequency problem).
• Without smoothing, multiplying probabilities would make the entire class
probability zero, leading to incorrect classifications.
• Laplace smoothing assigns a small non-zero probability to unseen events. This
ensures that every possible event has at least a minimal chance of occurring,
thereby improving the robustness of the model.
For categorical features:
P(x) = (Count(x) + α) / (Total count + α × V)
where
α = 1 in Laplace smoothing
V is the number of distinct categories
Eg:
A dataset of fruits with counts:
• Apple = 3, Banana = 2, Orange = 0
• Total count = 3 + 2 + 0 = 5
• Vocabulary size, V = 3 (Apple, Banana, Orange)
• Denominator = Total count + (α × V) = 5 + (1 × 3) = 8
P(Apple) = (Count(Apple) + α) / denominator = (3 + 1) / 8 = 4/8 = 0.5
P(Banana) = (Count(Banana) + α) / denominator = (2 + 1) / 8 = 3/8 = 0.375
P(Orange) = (Count(Orange) + α) / denominator = (0 + 1) / 8 = 1/8 = 0.125
Final Probabilities:
• P(Apple)=0.5
• P(Banana)=0.375
• P(Orange)=0.125
Without smoothing, Orange would have probability 0, but Laplace smoothing gives it a
small non-zero probability.
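The fruit example can be reproduced with a few lines; a minimal sketch of additive smoothing with α = 1:

```python
def laplace_smoothed_probs(counts, alpha=1):
    """Return smoothed probabilities for each category from its observed count."""
    total = sum(counts.values())
    v = len(counts)                          # number of distinct categories
    denominator = total + alpha * v
    return {category: (n + alpha) / denominator for category, n in counts.items()}

print(laplace_smoothed_probs({"Apple": 3, "Banana": 2, "Orange": 0}))
# {'Apple': 0.5, 'Banana': 0.375, 'Orange': 0.125}
```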
Using numeric features with Naive Bayes
One simple and effective solution for handling numeric features in Naïve Bayes is
discretization, which means converting continuous numeric values into categories
called bins.
• This method is especially useful when there are large amounts of training data, which
is often the case in Naïve Bayes applications.
• There are several techniques for discretizing numeric features.

Binning (Discretization or Bucketing)


✓ Binning is the process of grouping continuous or numerical data points into a smaller
number of intervals or "bins".
✓ The main objective of binning is to simplify the data & make it more manageable for
analysis.
✓ Advantage: Binned data is easier to visualize, summarize, and interpret.
Example,
Suppose we add a feature to a spam dataset that records the time of day an e-mail is sent,
ranging from 0 to 24 hours. Depicted using a histogram, the time data might reveal
natural groupings of activity, and each of these groupings could be used as a bin.

Discretizing Numeric data


• When numeric values are converted into bins (categories), some detail is lost
because exact values are replaced with groups.
• Using too few bins may hide important patterns.
• Using too many bins may result in very small groups, making the Naïve Bayes
model more sensitive to noise (random variations).
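A minimal sketch of discretizing the time-of-day feature into bins; the cut points (6, 12 and 18 hours) and bin names are illustrative choices, not values prescribed by the notes.

```python
import bisect

def to_bin(hour, edges=(6, 12, 18)):
    """Map an hour of the day (0-24) to a categorical bin."""
    labels = ["night", "morning", "afternoon", "evening"]
    return labels[bisect.bisect_right(edges, hour)]

send_times = [2.5, 8.0, 13.5, 20.0, 23.9]        # hypothetical e-mail send times
print([to_bin(t) for t in send_times])
# ['night', 'morning', 'afternoon', 'evening', 'evening']
```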
Strength & Weakness of Naïve Bayes Algorithm
Strengths
• Simple, fast, and very effective
• Works well with noisy and missing data
• Requires relatively few training examples, but also scales well with large datasets
• Easy to obtain estimated probabilities for predictions

Weaknesses
• Assumes features are equally important and independent (often unrealistic)
• Not ideal for datasets with many numeric or highly correlated features
• Estimated probabilities are less reliable than predicted classes

Tutorial 4
1. Given the data in the following table, use naive Bayes algorithm to predict the class if Weather = Sunny
and Car = Working
Weather Car Class
1 Sunny Working Go-out
2 Rainy Broken Go-out
3 Sunny Working Go-out
4 Sunny Working Go-out
5 Sunny Working Go-out
6 Rainy Broken Stay-home
7 Rainy Broken Stay-home
8 Sunny Working Stay-home
9 Sunny Broken Stay-home
10 Rainy Broken Stay-home
Soln
Features: Weather, Car
Class: Go-out, Stay-home
test instance: Sunny, Working
2. Use Naïve Bayes algorithm to determine whether a red, domestic SUV is a stolen car or not using
the following data
Ex Colour Type Origin Stolen?
1 red sports domestic Yes
2 red sports domestic No
3 red sports domestic Yes
4 yellow sports domestic No
5 yellow sports imported Yes
6 yellow SUV imported No
7 yellow SUV imported Yes
8 yellow SUV domestic No
9 red SUV imported No
10 red sports imported Yes
Soln:
Test instance: Colour = red, Type = SUV, Origin = domestic
Features: Colour, Type, Origin
Class (Stolen): yes, no
3. Given a training dataset. Predict the class of a new patient with the symptoms
Fever: Yes, Cough: No, Body Ache: Yes, Fatigue: No, using Naive Bayes classifier.

Soln
Features
Fever, cough, body ache, fatigue
Class (disease): yes, no
Test instance: yes, no, yes, no
4. Given the following data on a certain set of patients seen by a doctor. Can the doctor conclude that a
person having chills, fever, mild headache and without running nose has flu? (Use Naive Bayes
classification).
Chills Running nose Headache Fever Has flu
Y N mild Y N
Y Y no N Y
Y N strong Y Y
N N mild Y Y
N N no N N
N Y strong Y Y
N N strong N N
Y Y mild Y Y
Soln
Features: chills, running nose, headache, fever
Class (Has flu): Y, N
Test instance: Y, N, mild, Y
5. Given a training dataset. Predict the Species type of new instance with Colour=Brown, Legs=2,
Height=Tall, Smelly=No using Naive bayes classifier

Soln
Features: color, legs, height, smelly
Class (Species) : M, H
Test instance: brown, 2, tall, no

University Questions
DEC 2018
1. Explain in detail about k-NN with its choice of k (3)
11. Describe Naive Bayes classifier with suitable examples. (6)
OR
12. Explain Joint probability, Conditional probability and Naive Bayes theorem (6)
APRIL 2018

MAY 2019

AUG 2017

MAY 2017
2. What are the strengths and weaknesses of K-NN Algorithm? (3)
11. Explain K-NN Algorithm with an example. Mention its Strengths &Weaknesses.
(6)
Or
12. With an example Explain Naive Bayes classification algorithm. (6)
September 2020
11 Explain the k-Nearest Neighbour algorithm with a suitable example. Also write
the strengths and weaknesses of the algorithm. (6)
• Explain the k – Nearest Neighbour algorithm with a suitable example of your choice.
Also, write the strengths and weaknesses of the algorithm.
• Algorithm steps – 3 Marks
• Example – 2 Marks
• Strengths and Weaknesses (a minimum of two each & each to be awarded 0.25
Marks) – 1 Mark
OR
12 Give the summarized form of the Naïve Bayes classifier. Explain each of the terms
that are involved here. (6)
• The summarized Naïve Bayes classifier formula is given; explain each of its terms,
such as posterior probability, likelihood, prior probability, marginal
likelihood etc. – 3 Marks
• Give the spam detection example, i.e., computation of P(spam|Viagra). – 3
marks

1. Explain the differences between supervised and unsupervised machine learning
algorithms.
2. Describe the key concepts that define nearest neighbor classifiers, and why they are
considered "lazy" learners.
3. Explain how to apply k-NN classifier in a data science problem.
4. State Bayes' theorem in statistics. Outline the Naive Bayes algorithm to build
classification models.
5. Differentiate between supervised and unsupervised learning algorithms.
6. Explain how to choose the value of k in k-NN algorithm.
December 2022
1. Explain methods to prepare data for use with k-NN. (3)
2. Explain Laplace estimator with the help of an example. (3)
Explanation of Laplace estimator – 1.5 Marks, Example – 1.5 Marks

3. Consider the given dataset. Apply Naïve Bayes algorithm and predict that if a
fruit has following properties, then which fruit it is.
Fruit = (Yellow, Sweet, Long) (6)

4. Given the following dataset. Identify the T-Shirt Size of Tom having height 161 cm
and weight 61kg using k-NN algorithm. (Choose k as 3)

DECEMBER 2021

1. What are the strengths and weaknesses of K-NN algorithm (3)
Strengths: Simple& effective, Makes no assumptions about the underlying
data distribution, Fast training phase (Any 3 strengths -1.5 marks )
Weaknesses: (Any 3 weaknesses -1.5 marks)
• Does not produce a model, limiting the ability to understand how the features
are related to the class
• Requires selection of an appropriate k
• Slow classification phase
2. Explain the differences between supervised and unsupervised machine learning
algorithms. (3)
A supervised learning algorithm learns from labeled training data, helps you
to predict outcomes for unforeseen data. Unsupervised learning is a machine
learning technique, where you do not need to supervise the model. Instead,
you need to allow the model to work on its own to discover information. It
mainly deals with the unlabelled data. (1.5 marks)
Explanation & one example for supervised machine learning algorithm-1.5
marks
3. Based on the survey conducted in an institution the students are classified based
on the 2 attributes academic excellence and other achievements. Consider the
data set given. Find the classification of a student with value of X is 5 and Y is 7
based on the data of trained samples using KNN algorithm. Choose k = 3

Step 1: Choose k = 3
Step 2: Calculate the distances
Step 3: Sort the distances in ascending order
Step 4: Among the 3 nearest neighbours, the class label which occurs most frequently is “Good”.
Step 5: Assign the class label Good to the test instance.
Steps - 4 marks, correct answer - 2 marks

4. Consider a training data set consisting of the fauna of the world. Each unit has 3
features named “Swim”, “Fly” and “Crawl”. Let the possible values of these features
be as follows:
Swim - Fast, Slow, No
Fly - Long, Short, Rarely, No
Crawl - Yes, No
For simplicity, each unit is classified as “Animal”, “Bird” or “Fish”. Let the training
data set be as in the table below . Use naive Bayes algorithm to classify a
particular species if its features are (Slow, Rarely,No)

2021 – 2025 June University Questions


Part A
1. Explain the differences between supervised and unsupervised machine learning algorithms.
2. Differentiate between supervised and unsupervised learning algorithms.
3. Why is the k-NN algorithm called a lazy learner? Discuss.
4. Explain how to choose the value of k in k-NN algorithm.
5. What are the strengths and weaknesses of K-NN algorithm
6. Explain methods to prepare data for use with k-NN.
7. Explain disadvantages of K-NN classifier. (3)
8. State and explain Bayesian theorem for classification. (3)
9. Can Naive Bayes algorithm handle Numerical continuous variables? Justify your answer.
10. Explain Laplace estimator with the help of an example. (3)
Part B
1. Discuss various ways for preparing data to use with k-NN.
2. Explain the significance of Laplace estimator in bayesian classification. Explain the different ways to
prepare numeric features in naive Bayes algorithm?
3. Explain the differences between supervised and unsupervised machine learning algorithms.
4. Describe the key concepts that define nearest neighbour classifiers, and why they are considered
"lazy" learners.
5. Explain how to apply k-NN classifier in a data science problem.
6. Consider the dataset given below. Using k-NN algorithm, predict the class label for the new instance
with height=172 cm and weight =57 kg. Choose k=1 and k=3

7. Based on the survey conducted in an institution the students are classified based on the 2 attributes
academic excellence and other achievements. Consider the data set given. Find the classification of a
student with value of X is 5 and Y is 7 based on the data of trained samples using KNN algorithm.
Choose k = 3

8. Based on a survey conducted in an institution, students are classified based on the two attributes of
academic excellence and other activities. Given the following data, identify the classification of a
student with X = 5 and Y = 7 using k-NN algorithm (choose k as 3).

9. Consider the dataset given below. Using k-NN algorithm, predict the class label for the new instance
with brightness=20 and saturation =35. Choose k=1 and k=3.

10. Given the following dataset. Identify the T-Shirt Size of Tom having height 161 cm and weight 61kg
using k-NN algorithm. (Choose k as 3)

11. With the given data, Use k-NN algorithm to determine the Target attribute for a new instance with X
= 5 and Y =3. (Choose k as 3)

12. State Bayes' theorem in statistics. Outline the Naive Bayes algorithm to build classification models.
13. Consider a training data set consisting of the fauna of the world. Each unit has 3 features named
“Swim”, “Fly” and “Crawl”. Let the possible values of these features be as follows:
Swim - Fast, Slow, No
Fly - Long, Short, Rarely, No
Crawl - Yes, No
For simplicity, each unit is classified as “Animal”, “Bird” or “Fish”. Let the training data set be as in
the table below . Use naive Bayes algorithm to classify a particular species if its features are (Slow,
Rarely, No)

14. Consider the given dataset. Apply Naïve Bayes algorithm and predict that if a fruit has following
properties, then which fruit it is. Fruit = (Yellow, Sweet, Long)

15. Consider the training data of 10 samples in the given table where ‘Play’ is the class
attribute. Use a Bayesian classifier to predict whether there will be a play if it is a rainy day with mild
temperature, normal humidity and strong wind.

16. Given a training dataset. Predict the Species type of new instance with Color=Brown, Legs=2,
Height=Tall, Smelly=No using Naive bayes classifier

17. Given a training dataset. Predict the class of a new patient with the symptoms Fever: Yes, Cough: No,
Body Ache: Yes, Fatigue: No, using Naive Bayes classifier.

18. Use Naive Bayes algorithm to determine whether a red domestic SUV car is a stolen car or not using
the following data:

