DSC Module2 13.08.25
❖ ML focuses on the development of computer programs that can access data and use it
to learn for themselves.
❖ The process of learning begins with observations or data, such as examples, direct
experience, or instructions, in order to look for patterns in data and make better
decisions in the future based on the examples that we provide.
o Machine learning works on a simple concept: understanding with experience.
Examples
o Facebook: Facebook continuously notices the friends you connect with, the profiles
you visit, your interests, etc. On the basis of this continuous learning, a list of
Facebook users you could become friends with is suggested.
o Tag friends: When you upload a picture of yourself with a friend, Facebook instantly
recognizes that friend. This is possible with the help of machine learning.
❖ Advertisement Recommendation
When you shop for a product online, for some days afterwards you keep receiving
notifications with shopping suggestions.
The shopping website or app recommends items that match your interests.
This is possible with the help of machine learning: on the basis of your behaviour
on the website/app (past purchases, items liked or added to the cart), product
recommendations are made.
The primary aim of ML is to allow computers to learn automatically without
human intervention.
How do machines learn?
A formal definition of ML proposed by computer scientist Tom M. Mitchell states that:
“A machine learns whenever it is able to utilize experience such that its
performance improves on similar experiences in the future.”
While human brains are naturally capable of learning from birth, the conditions necessary
for computers to learn must be made explicit. Whether the learner is a human or a
machine, the basic learning process is similar.
It can be divided into 4 interrelated components (learning process steps):
1. Data storage
2. Abstraction
3. Generalization
4. Evaluation
1. Data Storage
Facilities for storing and retrieving huge amounts of data are an important component
of the learning process. Humans and computers alike utilize data storage as a foundation
for advanced reasoning.
• In a human being, the data is stored in the brain & data is retrieved using
electrochemical signals.
• Computers use hard disk drives, flash memory, random access memory and similar
devices to store data and use cables and other technology to retrieve data.
2. Abstraction
• The second component of the learning process is known as abstraction.
• Abstraction is the process of extracting knowledge from stored data. This involves
creating general concepts about the data as a whole. The creation of knowledge
involves the application of known models and the creation of new models. The process
of fitting a model to a dataset is known as training.
• When the model has been trained, the data is transformed into an abstract form that
summarizes the original information.
Observations → Data → Model
3. Generalization
The 3rd component of the learning process is known as generalization.
The term generalization describes the process of turning the knowledge about stored
data into a form that can be utilized for future action. These actions are to be carried
out on tasks that are similar, but not identical, to those that have been seen before.
In generalization, the goal is to discover those properties of the data that will be most
relevant to future tasks.
For example, suppose that a ML algorithm learned to identify faces by finding 2 dark
circles representing eyes, positioned above a straight line indicating a mouth.
4. Evaluation
• Provides a feedback mechanism to measure the utility of learned knowledge &
inform potential improvements.
• To evaluate or measure the learner’s success, use this information to inform
additional training if needed.
• Models fail to perfectly generalize due to the problem of noise, a term that describes
unexplained or unexplainable variations in data.
• Noisy data is caused by seemingly random events, such as:
▪ Errors due to inaccurate sensors
▪ Issues with human subjects
▪ Data quality problems include missing, null, truncated, incorrectly coded, or
corrupted values.
Machine Learning in Practice
5 step process
1. Data Collection
• The data collection step involves gathering the learning material an algorithm will
use to generate actionable knowledge. In most cases, the data will need to be
combined into a single source like a text file, spreadsheet or database.
2. Data Exploration & Preparation
• Check the quality of the data and prepare it for the learning process.
3. Model Training
• Select an appropriate algorithm; the algorithm will represent the data in the form
of a model.
4. Model Evaluation
• Evaluate the accuracy of the model using a test dataset.
5. Model Improvement
• If better performance is needed, use advanced strategies to improve the performance
of the model, or switch to a different model entirely.
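As an illustration only, here is a minimal sketch of the five steps in Python using scikit-learn; the built-in iris dataset and the choice of a k-NN model are assumptions made for brevity, not part of these notes.

```python
# A minimal sketch of the 5-step ML process, assuming scikit-learn is installed.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# 1. Data collection: load a single, ready-made data source
X, y = load_iris(return_X_y=True)

# 2. Data exploration & preparation: inspect and split into train/test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# 3. Model training: fit the chosen algorithm to the training data
model = KNeighborsClassifier(n_neighbors=3)
model.fit(X_train, y_train)

# 4. Model evaluation: measure accuracy on unseen test data
print("Test accuracy:", accuracy_score(y_test, model.predict(X_test)))

# 5. Model improvement: try another setting (or a different model) if needed
better = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
print("Improved accuracy:", accuracy_score(y_test, better.predict(X_test)))
```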
❖ Image Recognition
For face detection- The categories might be face versus no face present. There
might be a separate category for each person in a database of several individuals.
For character recognition- We can segment a piece of writing into smaller images,
each containing a single character. The categories might consist of the 26 letters of
the English alphabet, the 10 digits, & some special characters.
❖ Speech Recognition
Speech recognition is the translation of spoken words into text.
❖ Banking and financial services
ML can be used to predict the customers who are likely to default on loan or
credit card payments. This is of supreme importance, as ML helps banks identify
the customers to whom loans and credit cards can be granted.
❖ Healthcare
It is used to diagnose deadly diseases (eg: Cancer) based on the symptoms of
patients and tallying them with the past data of similar kind of patients.
TYPES OF MACHINE LEARNING
1. SUPERVISED LEARNING
• Uses labeled data. Two main task types:
Classification
• Features & discrete labels
• Maps an input to a discrete label (class)
• Eg: spam or not, type of cancer
Regression
• Features & continuous real values
• Predicts a real value for an input
• Eg: temperature
Supervised learning is a ML method in which models are trained using labeled data.
SL needs supervision to train the model, similar to how a student learns
in the presence of a teacher.
SL is commonly used in real world applications.
Eg: face & speech recognition, products or movie recommendations, sales forecasting etc.
In SL, learning data comes with description, labels, targets or desired outputs & the
objective is to find a general rule that maps inputs to outputs. This kind of learning
data is called labeled data.
The learned rule is then used to label new data with unknown outputs.
SL involves building a ML model that is based on labeled samples.
SL deals with learning a function from available training data. Here, a learning
algorithm analyses the training data and produces a derived function that can be used
for mapping new examples.
Eg: Logistic Regression, Neural Networks, Support Vector Machine (SVM), Naïve Bayes
Classifiers etc.
2. UNSUPERVISED LEARNING
Unsupervised learning is a type of machine learning algorithm used to draw inferences
from datasets consisting of input data without labeled responses.
The most common unsupervised learning method is cluster analysis, which is used
for exploratory data analysis to find hidden patterns or grouping in data.
Goal: Construct an analyzer to find the hidden relationships between inputs.
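As a brief illustration of this goal, the sketch below applies k-means (a common cluster-analysis method) to a handful of unlabeled 2-D points; scikit-learn and the toy data are assumptions made for illustration.

```python
# A minimal clustering sketch: group unlabeled points without any class labels.
import numpy as np
from sklearn.cluster import KMeans

# Unlabeled input data: no desired outputs are provided
X = np.array([[1.0, 2.0], [1.5, 1.8], [5.0, 8.0],
              [8.0, 8.0], [1.0, 0.6], [9.0, 11.0]])

# Cluster analysis: group the points into 2 clusters by similarity
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)           # cluster assignment for each point
print(kmeans.cluster_centers_)  # discovered group centers
```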
3. REINFORCEMENT LEARNING
Reinforcement learning is a type of machine learning in which an agent learns by
interacting with an environment, receiving rewards or penalties for its actions and
adjusting its behaviour to maximize the cumulative reward.
Nearest Neighbor Classification
• Things that are alike are likely to have properties that are alike.
• Machine learning uses this principle to classify data by placing it in the same category
as similar or "nearest" neighbors.
• Classifying unlabeled examples by assigning them the class of similar labeled
examples.
• Nearest neighbor methods are extremely powerful.
• They have been used successfully for
Computer vision applications, including optical character recognition & facial
recognition in both still images & video.
Predicting whether a person will enjoy a movie or music recommendation.
Identifying patterns in genetic data, perhaps to use them in detecting specific
proteins or diseases.
o In general, nearest neighbor classifiers are well-suited for classification tasks, where
relationships among the features and the target classes are numerous, complicated,
or extremely difficult to understand, yet the items of similar class types tend to be
fairly homogeneous.
• If a concept is difficult to define, but you know it when you see it, then nearest
neighbors might be appropriate.
• If the data is noisy and thus no clear distinction exists among the groups, the nearest
neighbor algorithms may struggle to identify the class boundaries.
K-NN Algorithm
• K Nearest Neighbor is a simple algorithm that stores all the available cases and
classifies the new data or case based on a similarity measure.
• It is mostly used to classify a data point based on how its neighbors are classified.
• When a new point is given, it calculates the distance to all stored points, finds the k
closest neighbors, and predicts the output based on them.
• For instance, in a fruit classification task, k-NN compares a new fruit's weight
and color with stored fruits and classifies it (e.g., apple, orange) based on the
closest matches.
Strengths
o Simple & effective
o Makes no assumptions about the underlying data distribution
o Fast training phase
Weaknesses
o Does not produce a model, limiting the ability to understand how the features are
related to the class
o Requires selection of an appropriate k
o Slow classification phase
o Nominal features & missing data require additional processing
k-NN algorithm
Step 1: Define the value of k (a positive integer).
Step 2: Compute the distances between the test instance and the various
training instances.
Euclidean distance formula: d = √((x − x₁)² + (y − y₁)²)
Step 3: Sort the distances in ascending order.
Step 4: Choose the k training instances which are nearest to the test instance
(k nearest neighbours).
Step 5: Among the class labels of the k nearest neighbours, choose the class label
which occurs most frequently.
Step 6: Assign the chosen class label to the test instance.
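These steps translate directly into code. Below is a minimal from-scratch sketch in Python; since Table 5.1 is not reproduced in these notes, the small ingredient dataset at the bottom is hypothetical.

```python
# A from-scratch sketch of the k-NN steps above, for two numeric features.
from collections import Counter
from math import sqrt

def knn_classify(training, test_point, k=3):
    # Step 2: Euclidean distance from the test instance to each training instance
    distances = [(sqrt((x - test_point[0]) ** 2 + (y - test_point[1]) ** 2), label)
                 for (x, y, label) in training]
    # Step 3: sort the distances in ascending order
    distances.sort(key=lambda pair: pair[0])
    # Step 4: keep the k nearest neighbours
    nearest = distances[:k]
    # Steps 5-6: assign the most frequent class label among them
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# Hypothetical (sweetness, crunchiness, type) data -- NOT the Table 5.1 values
data = [(10, 9, "fruit"), (1, 4, "protein"), (3, 10, "vegetable"),
        (8, 5, "fruit"), (2, 3, "protein"), (7, 2, "protein")]
print(knn_classify(data, (6, 4), k=3))  # -> 'protein' for this toy data
```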
Eg: Food ingredients have 2 features, namely, sweetness & crunchiness. They are
measured on a scale of 1 to 10. The ingredients are of 3 types or classes, namely, “fruit",
“vegetable", and “protein". We have the data given in Table 5.1 regarding some known
ingredients:
Use k-NN algorithm to determine the food type of tomato with sweetness = 6 &
crunchiness = 4.
Solution
Step 1: We choose k = 3.
Step 2: The feature vector of the test instance is (6, 4). We calculate the distances of
the test instance from the training instances.
Step 3: We rank the distances in ascending order.
Step 5: Among the 3 nearest neighbours, there are 2 ingredients of type “protein" & one
of type “vegetable". Hence the majority of the food ingredients are of type “protein".
Step 6: We assign the type “protein" to "tomato".
Choosing an appropriate k in k-NN
In the k-Nearest Neighbors (K-NN) algorithm, k is the number of nearest neighbors
considered when making a prediction.
The choice of k greatly affects the result:
1. Small k → The model becomes sensitive to noise (overfitting).
2. Large k → The model becomes too smooth and may ignore local patterns
(underfitting).
3. Usually, odd values of k are preferred for classification (to avoid ties).
4. A common approach is to choose k using cross-validation, selecting the value that
gives the best accuracy on unseen data.
Choosing k in k-NN
There is no single fixed rule for choosing the best value of k in the k-Nearest Neighbors
algorithm:
1. k = Total number of observations
• If k equals the number of training samples, the concept of "nearest neighbors"
becomes meaningless because all distances are considered.
• The algorithm will always predict the majority class.
2. k = 1
• The prediction is highly sensitive to noise and outliers.
• Example: If a training sample is wrongly labeled, any test point closest to it will
also be classified incorrectly — even if most other neighbors belong to a different
class.
• This leads to overfitting:
o Training error = 0 (perfect on training data)
o Test error = High.
3. k = √(number of observations)
• A good choice when the dataset is small.
• Helps balance bias and variance.
Effect of k on performance
• Low k values → High variance, overfitting, low train error, high test error.
• High k values → More bias, smoother decision boundaries, reduced test error (to
a point).
• Ideal k is a trade-off between bias & variance.
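In practice, this trade-off is often resolved with the cross-validation approach mentioned above; a minimal sketch, assuming scikit-learn and its built-in iris dataset:

```python
# Choosing k by cross-validation: pick the k with the best mean CV accuracy.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Evaluate odd candidate values of k with 5-fold cross-validation
scores = {k: cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5).mean()
          for k in [1, 3, 5, 7, 9, 11]}
best_k = max(scores, key=scores.get)
print(best_k, scores[best_k])
```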
Preparing data for use with k-NN
Since the k-Nearest Neighbors (k-NN) algorithm depends on distances between points,
features with large numeric ranges can dominate the calculation.
Therefore, rescaling (feature scaling) is important before applying k-NN.
Two common methods are:
1. Min-Max Normalization (Feature Scaling)
• Rescales feature values to a standard range, usually 0 to 1.
Xnew = (X − min(X)) / (max(X) − min(X))
Where:
• Xnew – Scaled value
• X- Original value
• min(X) - Minimum value in the dataset for that specific feature
• max(X) - Maximum value in the dataset for that specific feature
Eg: If height ranges from 150 cm to 200 cm, a height of 175 cm would be scaled as:
Xnew = (175 − 150) / (200 − 150) = 25/50 = 0.5
2. z-score standardization (Standard Scaling)
Transforms values so that they are expressed in terms of how many standard deviations
they are from the mean.
Xnew = (X − μ) / σ = (X − Mean(X)) / StdDev(X)
where,
Xnew – Standardized value
X- Original value
µ - Mean (average) of the dataset
σ – Standard deviation of the dataset
Eg: If the mean height is 170 cm and the standard deviation is 10 cm,
a height of 175 cm would have a z-score:
Xnew = (175 − 170) / 10 = 5/10 = 0.5
Min-Max Vs Z-score
• Min-max normalization keeps all values within a fixed range (0–1).
• Z-score standardization produces values in an unbounded range (can be negative or
positive).
Whichever scaling method is used on the training data must also be applied to the test
data before prediction.
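A minimal sketch of both methods, assuming scikit-learn; the small height/weight arrays are illustrative values, chosen so that the min-max result reproduces the worked example above (175 cm → 0.5). Note that the scalers are fitted on the training data only and then applied unchanged to the test data.

```python
# Rescaling features before k-NN: min-max normalization and z-score standardization.
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X_train = np.array([[150.0, 50.0], [175.0, 65.0], [200.0, 80.0]])  # (height, weight)
X_test = np.array([[175.0, 57.0]])

# Min-max normalization: fit on training data, apply the SAME scaling to test data
mm = MinMaxScaler().fit(X_train)
print(mm.transform(X_test))   # first column: height 175 -> 0.5, as in the example

# z-score standardization: again fit on train, then transform the test data
z = StandardScaler().fit(X_train)
print(z.transform(X_test))
```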
Problem 1
We have the following data from a questionnaire regarding goodness or badness of tissue
papers and data regarding two attributes, namely, acid durability and strength, of the tissue
papers.
Now, the factory produces a new tissue paper that yields X1 = 3 and X2 = 7. Can we guess
the classification of the new tissue paper?
Solution
Step 1: We choose k = 3.
Step 2: The feature vector of the test instance is (3, 7). We calculate the distances of the
test instance from the training instances.
The feature vector of Sample 1 is (7, 7). The distance between Sample 1 and the new
tissue paper is calculated as follows:
Step 4. Among the 3 nearest neighbours, the class label which occurs most
frequently is “Good”.
Step 5. We assign the class label Good to the test instance.
Applications of KNN
• Banking System - KNN can be used in the banking system to predict whether an
individual is fit for loan approval, i.e., whether the individual has characteristics
similar to those of defaulters.
• Calculating Credit Ratings - KNN algorithms can be used to find an individual’s
credit rating by comparing with persons having similar traits.
• Politics - With the help of KNN algorithms, we can classify a potential voter into
various classes like “Will Vote”, “Will Not Vote”, “Will Vote for Party ‘Congress’”,
“Will Vote for Party ‘BJP’”.
• Other areas in which the KNN algorithm can be used are Speech Recognition,
Handwriting Detection, Image Recognition & Video Recognition.
Tutorial
1. We have the following data from a questionnaire regarding goodness or badness of tissue
papers and data regarding two attributes, namely, acid durability and strength, of the
tissue papers.
Now, the factory produces a new tissue paper that yields X1 = 3 & X2 = 7. Can we guess
the classification of the new tissue paper?
Soln
Features: X1, X2
Class : bad, good
Test instance: X1= 3, X2= 7
2. Based on a survey conducted in an institution, students are classified based on the two
attributes of academic excellence (X) and other Activities (Y). Given the following data,
identify the classification of a student with X = 5 and Y = 7 using the k-NN algorithm
(choose k as 3).
X (Academic Excellence) Y (Other Activities) Z(Classification)
8 6 Outstanding
5 6 Good
7 3 Good
6 9 Outstanding
Soln
Features: X, Y
Class (Z): Outstanding, good
Test instance: X=5, Y=7
k=3
3. Consider the dataset given below. Using k-NN algorithm, predict the class label for the new instance
with height=172 cm and weight =57 kg.
Choose k=1 and k=3
Soln
Features: height, weight
Class : underweight, normal
Test instance: height=172 cm, weight =57 kg
k =1 & k = 3
4. Given the following dataset. Identify the T-Shirt Size of Tom having height 161 cm and weight 61kg
using k-NN algorithm. (Choose k as 3)
Soln
Features: height, weight
Class (T-shirt) : medium, large
Test instance: height = 161 cm, weight = 61 kg
k=3
5. Consider the dataset given below. Using k-NN algorithm, predict the class label for the new instance
with brightness=20 and saturation =35. Choose k=1 and k=3.
Soln
Features: brightness, saturation
Class : red, blue
Test instance: brightness = 20, saturation = 35
k =1 & k = 3
6. With the given data, use k-NN algorithm to determine the Target attribute for a new instance with X =
5 and Y = 3. (Choose k as 3)
Soln
Features: X, Y
Class (Target): class1, class2
Test instance: X=5, Y = 3
k=3
Probabilistic learning: Understanding Naive Bayes - Conditional probability and
Bayes theorem, Naive Bayes algorithm for classification, The Laplace estimator, Using
numeric features with Naive Bayes.
Understanding Naïve Bayes
The word “naive” in the “Naive Bayes Algorithm” means simple, unsophisticated, or
primitive.
The word “Bayes” refers to Thomas Bayes (1701–1761), an English statistician who
formulated a special case of what is now known as Bayes’ theorem. This theorem forms the
foundation of the Naïve Bayes algorithm.
• Probability is a value between 0 and 1 that indicates the likelihood of an event
occurring based on the available evidence.
• The lower the probability, the less likely the event is to occur.
• A probability of 0 means the event will definitely not occur, while a probability of 1
means the event will occur with complete certainty.
• Classifiers based on Bayesian methods use training data to calculate the observed
probability of each outcome based on the evidence provided by the feature values.
• When the classifier is applied to unlabeled data, it uses these learned probabilities to
predict the most likely class for the new features.
• Although the method is conceptually simple, it often achieves results comparable to more
sophisticated algorithms.
Bayesian classifiers have been effectively used in:
• Text classification – e.g., spam filtering in e-mail systems
• Anomaly detection – identifying unusual patterns in computer networks
• Medical diagnosis – estimating the probability of a disease based on observed
symptoms
When to Use Bayesian Classifiers
Bayesian classifiers are particularly effective for problems where information from many
attributes must be considered together to estimate the overall probability of an outcome.
Unlike some algorithms that ignore features with weak individual effects, Bayesian methods
use all available evidence, allowing even subtle contributions to influence the prediction.
When a large number of features each have relatively small effects, their combined influence
can be substantial, resulting in more accurate and reliable predictions
Basic Concepts of Bayesian Methods
• Bayesian probability theory is based on the idea that the estimated likelihood of an
event (or a potential outcome) should be determined from the evidence at hand,
considering multiple trials or opportunities for the event to occur.
• Bayesian methods provide insights into how the probability of these events can be
estimated from observed data.
Event – A subset of outcomes from the sample space; a set of outcomes of an
experiment to which a probability is assigned.
Trial – A single opportunity for the event to occur.
EVENT (possible outcome)          TRIAL (single opportunity for the event to occur)
Heads result                      Coin flip
Rainy weather                     A single day
Message is spam                   Incoming e-mail message
Candidate becomes president       Presidential election
Win the lottery                   Lottery ticket
Understanding probability
The probability of an event is calculated as:
Probability of an event, P(A) = (Number of times the event occurred) / (Total number of trials)
Where:
P(A) – Probability of event A
Number of times the event occurred – Count of successful outcomes
Total number of trials – Number of attempts or observations
Eg:
(a) If it rained 3 out of 10 days with similar conditions as today:
Probability of rain, P(Rain) = 3/10 = 0.30 or 30%
(b) If 10 out of 50 prior e-mail messages were spam:
Probability of spam, P(Spam) = 10/50 = 0.20 or 20%
We write probabilities as P(A), which denotes the probability of event A occurring.
Sum of Probabilities
The probability of all possible outcomes of a trial must sum to 1.
For example,
If P(spam)=0.20 or 20%
Then P(ham) = 1 – 0.20 = 0.80
Here, spam and ham are mutually exclusive (cannot occur together) and exhaustive
(cover all possible outcomes).
➢ Because an event cannot simultaneously happen and not happen, an event is
always mutually exclusive and exhaustive with its complement.
➢ The complement of event A is typically denoted Ac or A'.
➢ The shorthand notation P(¬A) or P(Ac) can be used to denote the probability of event A
not occurring, as in P(¬spam) = 0.80.
The rectangle represents the possible outcomes for an e-mail message.
o The circle represents the 20 % probability that the message is spam.
o The remaining 80% represents the complement P(¬spam), or the messages that are
not spam.
19
Understanding Joint Probability
o Consider a second event, based on the outcome that an e-mail message contains the
word Viagra.
o In most cases, this word is likely to appear only in a spam message; its presence
in an incoming e-mail is therefore a very strong piece of evidence that the message
is spam.
• We know that 20 % of all messages were spam (the left circle) and 5% of all messages
contained the word Viagra (the right circle).
Naïve Bayes Algorithm
Input:
• Training dataset with features X = (x1, x2, …, xn)
• Class labels C = {c1, c2, …, cp}
• Test instance Xtest
Output:
Predicted class label for Xtest
Step 1: Compute the Prior probability of each class P (ck)
for k=1, 2, …, p
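In outline, the remaining steps estimate each likelihood P(xi | ck) from the training data, form the product P(ck) × Π P(xi | ck) for each class, and assign the class with the largest value. A minimal from-scratch sketch follows; the dataset used is the Weather/Car example from Tutorial 4, Question 1 later in these notes.

```python
# A from-scratch sketch of Naive Bayes classification for categorical features.
from collections import Counter

data = [("Sunny", "Working", "Go-out"), ("Rainy", "Broken", "Go-out"),
        ("Sunny", "Working", "Go-out"), ("Sunny", "Working", "Go-out"),
        ("Sunny", "Working", "Go-out"), ("Rainy", "Broken", "Stay-home"),
        ("Rainy", "Broken", "Stay-home"), ("Sunny", "Working", "Stay-home"),
        ("Sunny", "Broken", "Stay-home"), ("Rainy", "Broken", "Stay-home")]

def naive_bayes(data, test):
    classes = Counter(row[-1] for row in data)   # class frequencies
    n = len(data)
    scores = {}
    for c, count in classes.items():
        score = count / n                        # prior P(c)
        rows = [row for row in data if row[-1] == c]
        for i, value in enumerate(test):         # likelihood P(xi | c) per feature
            score *= sum(1 for r in rows if r[i] == value) / count
        scores[c] = score                        # prior x product of likelihoods
    return max(scores, key=scores.get)           # class with the largest score

print(naive_bayes(data, ("Sunny", "Working")))   # -> 'Go-out'
```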
Problem 1: Consider a training data set consisting of the fauna of the world. Each unit has
3 features named “Swim”, “Fly” and “Crawl”. Let the possible values of these features be
as follows:
Swim - Fast, Slow, No
Fly - Long, Short, Rarely, No
Crawl - Yes, No
For simplicity, each unit is classified as “Animal”, “Bird” or “Fish”. Let the training data
set be as in Table 1. Use naive Bayes algorithm to classify a particular species if its
features are (Slow, Rarely, No)?
Solution
x1 = Slow, x2 = Rarely, x3 = No
We construct the frequency table shown in Table 2 which summarizes the data. (It may
be noted that the construction of the frequency table is not part of the algorithm.)
Class       | Swim (F1)      | Fly (F2)                 | Crawl (F3) | Total
            | fast  slow  no | long  short  rarely  no  | yes   no   |
Animal (c1) |  2     2    1  |  0     0      1      4   |  2     3   |   5
Bird (c2)   |  1     0    3  |  1     2      0      1   |  0     4   |   4
Fish (c3)   |  1     2    0  |  0     0      0      3   |  0     3   |   3
Total       |  4     4    4  |  1     2      1      8   |  2    10   |  12
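From the frequency table, the computation for the test instance (Slow, Rarely, No) works out as follows (a worked sketch):
Priors: P(Animal) = 5/12, P(Bird) = 4/12, P(Fish) = 3/12
Likelihoods for Animal: P(Slow | Animal) = 2/5, P(Rarely | Animal) = 1/5, P(No | Animal) = 3/5
Since P(Slow | Bird) = 0/4 and P(Rarely | Fish) = 0/3, the products for Bird and Fish are both 0.
Score for Animal: (5/12) × (2/5) × (1/5) × (3/5) = 30/1500 = 0.02
Hence the species with features (Slow, Rarely, No) is classified as “Animal”.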
o Although Bayes' theorem can be computationally expensive, a simplified version that
makes so-called "naive" assumptions about the independence of features is capable of
handling extremely large datasets.
Laplace estimator (or Laplace smoothing or additive smoothing)
• The Laplace estimator is a probability estimation technique used to avoid zero
probabilities in statistical models by adding a small constant (usually 1) to each
observed count.
• It is especially useful in Naïve Bayes classification, where the absence of a feature
value in the training data can cause the probability for a class to become zero (the
zero-frequency problem).
• Without smoothing, multiplying probabilities would make the entire class
probability zero, leading to incorrect classifications.
• Laplace smoothing assigns a small non-zero probability to unseen events. This
ensures that every possible event has at least a minimal chance of occurring,
thereby improving the robustness of the model.
For categorical features:
P(x) = (Count(x) + α) / (N + α × V)
where
N is the total count of observations
α = 1 in Laplace smoothing
V is the number of distinct categories
Eg:
A dataset of fruits with counts:
• Apple = 3, Banana = 2, Orange = 0
• Total count = 3 + 2 + 0 = 5
• Vocabulary size, V = 3 (Apple, Banana, Orange)
• Denominator = Total count + (α × V) = 5 + (1 × 3) = 8
P(Apple) = (Count(Apple) + α) / denominator = (3 + 1) / 8 = 4/8 = 0.5
P(Banana) = (Count(Banana) + α) / denominator = (2 + 1) / 8 = 3/8 = 0.375
P(Orange) = (Count(Orange) + α) / denominator = (0 + 1) / 8 = 1/8 = 0.125
Final Probabilities:
• P(Apple)=0.5
• P(Banana)=0.375
• P(Orange)=0.125
Without smoothing, Orange would have probability 0, but Laplace smoothing gives it a
small non-zero probability.
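A minimal sketch of the fruit example above in Python:

```python
# Laplace (additive) smoothing for the fruit counts above.
from collections import Counter

counts = Counter({"Apple": 3, "Banana": 2, "Orange": 0})
alpha = 1                       # Laplace smoothing constant
V = len(counts)                 # number of distinct categories
total = sum(counts.values())    # total count of observations

smoothed = {fruit: (c + alpha) / (total + alpha * V) for fruit, c in counts.items()}
print(smoothed)  # {'Apple': 0.5, 'Banana': 0.375, 'Orange': 0.125}
```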
Using numeric features with Naive Bayes
One simple and effective solution for handling numeric features in Naïve Bayes is
discretization, which means converting continuous numeric values into categories
called bins.
• This method is especially useful when there are large amounts of training data, which
is often the case in Naïve Bayes applications.
• There are several techniques for discretizing numeric features.
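A minimal sketch of discretization by binning; the ages and the cut points are illustrative assumptions, not prescribed by these notes:

```python
# Discretizing a numeric feature into categorical bins.
import numpy as np

ages = np.array([15, 22, 37, 45, 62, 71])

# Hand-chosen bin edges: young (0-30), middle-aged (30-55), senior (55+)
bins = [0, 30, 55, 120]
indices = np.digitize(ages, bins)   # bin index for each value
names = {1: "young", 2: "middle-aged", 3: "senior"}
print([names[i] for i in indices])  # the binned (categorical) feature
```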
Tutorial 4
1. Given the data in the following table, use naive Bayes algorithm to predict the class if Weather = Sunny
and Car = Working
Weather Car Class
1 Sunny Working Go-out
2 Rainy Broken Go-out
3 Sunny Working Go-out
4 Sunny Working Go-out
5 Sunny Working Go-out
6 Rainy Broken Stay-home
7 Rainy Broken Stay-home
8 Sunny Working Stay-home
9 Sunny Broken Stay-home
10 Rainy Broken Stay-home
Soln
Features: Weather, Car
Class: Go-out, Stay-home
test instance: Sunny, Working
2. Use Naïve Bayes algorithm to determine whether a red domestic SUV is a stolen car or not, using
the following data:
Ex Colour Type Origin Stolen?
1 red sports domestic Yes
2 red sports domestic No
3 red sports domestic Yes
4 yellow sports domestic No
5 yellow sports imported Yes
6 yellow SUV imported No
7 yellow SUV imported Yes
8 yellow SUV domestic No
9 red SUV imported No
10 red sports imported Yes
Soln:
Test instance: Colour = red, Type = SUV, Origin = domestic
Features: Colour, Type, Origin
Class (Stolen): yes, no
3. Given a training dataset. Predict the class of a new patient with the symptoms
Fever: Yes, Cough: No, Body Ache: Yes, Fatigue: No, using Naive Bayes classifier.
Soln
Features
Fever, cough, body ache, fatigue
Class (disease): yes, no
Test instance: yes, no, yes, no
4. Given the following data on a certain set of patients seen by a doctor. Can the doctor conclude that a
person having chills, fever, mild headache and without running nose has flu? (Use Naive Bayes
classification).
Chills Running nose Headache Fever Has flu
Y N mild Y N
Y Y no N Y
Y N strong Y Y
N N mild Y Y
N N no N N
N Y strong Y Y
N N strong N N
Y Y mild Y Y
Soln
Features: chills, running nose, headache, fever
Class (Has flu): Y, N
Test instance: Y, N, mild, Y
5. Given a training dataset. Predict the Species type of a new instance with Colour=Brown, Legs=2,
Height=Tall, Smelly=No using Naive Bayes classifier.
Soln
Features: color, legs, height, smelly
Class (Species) : M, H
Test instance: brown, 2, tall, no
University Questions
DEC 2018
1. Explain in detail about k-NN with its choice of k (3)
11. Describe Naive Bayes classifier with suitable examples. (6)
OR
12. Explain Joint probability, Conditional probability and Naive Bayes theorem (6)
APRIL 2018
MAY 2019
AUG 2017
MAY 2017
2. What are the strengths and weaknesses of K-NN Algorithm? (3)
11. Explain K-NN Algorithm with an example. Mention its Strengths &Weaknesses.
(6)
Or
12. With an example Explain Naive Bayes classification algorithm. (6)
September 2020
11 Explain the k-Nearest Neighbour algorithm with a suitable example. Also write
the strengths and weaknesses of the algorithm. (6)
• Explain the k – Nearest Neighbour algorithm with a suitable example of your choice.
Also, write the strengths and weaknesses of the algorithm.
• Algorithm steps – 3 Marks
• Example – 2 Marks
• Strengths and Weaknesses (a minimum of two each, with each awarded 0.25
Marks) – 1 Mark
OR
12 Give the summarized form of the Naïve Bayes classifier. Explain each of the terms
that are involved here. (6)
• The summarized Naïve Bayes classifier is given as P(c|x) = P(x|c) × P(c) / P(x).
Explain each of the terms of this, like posterior probability, likelihood, prior
probability, marginal likelihood etc. – 3 Marks
• Give the spam detection example, i.e., computation of P(spam|Viagra). – 3
Marks
1. Explain the differences between supervised and unsupervised machine learning
algorithms.
2. Describe the key concepts that define nearest neighbor classifiers, and why they are
considered "lazy" learners.
3. Explain how to apply k-NN classifier in a data science problem.
4. State Bayes' theorem in statistics. Outline the Naive Bayes algorithm to build
classification models.
5. Differentiate between supervised and unsupervised learning algorithms.
6. Explain how to choose the value of k in k-NN algorithm.
December 2022
1. Explain methods to prepare data for use with k-NN. (3)
2. Explain Laplace estimator with the help of an example. (3)
Explanation of Laplace estimator – 1.5 Marks, Example – 1.5 Marks
3. Consider the given dataset. Apply Naïve Bayes algorithm and predict that if a
fruit has following properties, then which fruit it is.
Fruit = (Yellow, Sweet, Long) (6)
4. Given the following dataset. Identify the T-Shirt Size of Tom having height 161 cm
and weight 61kg using k-NN algorithm. (Choose k as 3)
DECEMBER 2021
1. What are the strengths and weaknesses of K-NN algorithm (3)
Strengths: Simple & effective, makes no assumptions about the underlying
data distribution, fast training phase (any 3 strengths – 1.5 marks)
Weaknesses: (Any 3 weaknesses -1.5 marks)
• Does not produce a model, limiting the ability to understand how the features
are related to the class
• Requires selection of an appropriate k
• Slow classification phase
2. Explain the differences between supervised and unsupervised machine learning
algorithms. (3)
A supervised learning algorithm learns from labeled training data and helps
you predict outcomes for unforeseen data. Unsupervised learning is a machine
learning technique where you do not need to supervise the model; instead,
you allow the model to work on its own to discover information. It mainly
deals with unlabelled data. (1.5 marks)
Explanation & one example for supervised machine learning algorithm-1.5
marks
3. Based on the survey conducted in an institution the students are classified based
on the 2 attributes academic excellence and other achievements. Consider the
data set given. Find the classification of a student with value of X is 5 and Y is 7
based on the data of trained samples using KNN algorithm. Choose k = 3
4. Consider a training data set consisting of the fauna of the world. Each unit has 3
features named “Swim”, “Fly” and “Crawl”. Let the possible values of these features
be as follows:
Swim - Fast, Slow, No
Fly - Long, Short, Rarely, No
Crawl - Yes, No
For simplicity, each unit is classified as “Animal”, “Bird” or “Fish”. Let the training
data set be as in the table below . Use naive Bayes algorithm to classify a
particular species if its features are (Slow, Rarely,No)
7. Based on the survey conducted in an institution the students are classified based on the 2 attributes
academic excellence and other achievements. Consider the data set given. Find the classification of a
student with value of X is 5 and Y is 7 based on the data of trained samples using KNN algorithm.
Choose k = 3
8. Based on a survey conducted in an institution, students are classified based on the two attributes of
academic excellence and other activities. Given the following data, identify the classification of a
student with X = 5 and Y = 7 using k-NN algorithm (choose k as 3).
9. Consider the dataset given below. Using k-NN algorithm, predict the class label for the new instance
with brightness=20 and saturation =35. Choose k=1 and k=3.
10. Given the following dataset. Identify the T-Shirt Size of Tom having height 161 cm and weight 61kg
using k-NN algorithm. (Choose k as 3)
11. With the given data, Use k-NN algorithm to determine the Target attribute for a new instance with X
= 5 and Y =3. (Choose k as 3)
12. State Bayes' theorem in statistics. Outline the Naive Bayes algorithm to build classification models.
13. Consider a training data set consisting of the fauna of the world. Each unit has 3 features named
“Swim”, “Fly” and “Crawl”. Let the possible values of these features be as follows:
Swim - Fast, Slow, No
Fly - Long, Short, Rarely, No
Crawl - Yes, No
For simplicity, each unit is classified as “Animal”, “Bird” or “Fish”. Let the training data set be as in
the table below . Use naive Bayes algorithm to classify a particular species if its features are (Slow,
Rarely, No)
14. Consider the given dataset. Apply Naïve Bayes algorithm and predict that if a fruit has following
properties, then which fruit it is. Fruit = (Yellow, Sweet, Long)
15. Consider the training data of 10 samples in the given table, where ‘Play’ is the class
attribute. Use Bayesian classifier to predict whether there will be play if it is a rainy day with mild
temperature, normal humidity and strong wind.
16. Given a training dataset. Predict the Species type of a new instance with Color=Brown, Legs=2,
Height=Tall, Smelly=No using Naive Bayes classifier.
17. Given a training dataset. Predict the class of a new patient with the symptoms Fever: Yes, Cough: No,
Body Ache: Yes, Fatigue: No, using Naive Bayes classifier.
18. Use Naive Bayes algorithm to determine whether a red domestic SUV car is a stolen car or not using
the following data: