Machine Learning Question Bank

Unit-1 – Chapter-1

Define Machine Learning? Explain the various application areas of Machine Learning.

Definition of Machine Learning:

1. Meaning of Machine Learning:


Machine Learning (ML) is a branch of Artificial Intelligence that allows machines
to learn patterns from data and make decisions without being explicitly
programmed.

2. Arthur Samuel’s Definition:


According to Arthur Samuel, Machine Learning is "the field of study that gives
computers the ability to learn from data without being explicitly programmed."

3. Tom Mitchell’s Definition:


Tom Mitchell defines ML as: "A computer program is said to learn from
experience E with respect to tasks T and performance measure P, if its
performance at tasks in T, as measured by P, improves with experience E."

Application Areas of Machine Learning:

I. Image Recognition:
ML is used to identify people, objects, or scenes in images. For example,
Facebook uses ML to automatically tag friends in photos using face recognition.
II. Speech Recognition:
It helps convert voice commands into text. Examples include Siri, Google
Assistant, and Alexa that understand and act on voice inputs.
III. Traffic Prediction:
Google Maps uses ML to predict traffic conditions using GPS data and past traffic
trends, helping users find the fastest routes.
IV. Product Recommendation:
E-commerce websites like Amazon and streaming services like Netflix use ML to
recommend products or movies based on user preferences and behaviour.
V. Self-Driving Cars:
Companies like Tesla use ML to train cars to detect objects, follow lanes, and
make driving decisions using real-time sensor data.
VI. Spam and Malware Filtering:
Email services use ML algorithms like Naïve Bayes to filter spam and detect
harmful attachments automatically.
VII. Medical Diagnosis:
ML helps doctors identify diseases by analysing medical images, health records,
and symptoms—for example, detecting brain tumours or cancer early.
Discuss the Classification of Machine Learning in detail.

Machine Learning is classified into different types based on how the learning process
happens and what kind of data is provided. The three main types are:
 Supervised Learning
 Unsupervised Learning
 Reinforcement Learning
1. Supervised Learning:
In supervised learning, the model is trained on labelled data—this means the input
data is paired with the correct output. The model learns the relationship and predicts
output for new inputs. Used in applications where historical data is available.
Examples include:
 Email Spam Detection
 Risk Assessment
 Image Classification
 Fraud Detection
There are two major problems in supervised learning:
 Regression: Predicts continuous values (e.g., house price, temperature)
 Classification: Predicts categories (e.g., spam or not spam)
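A minimal sketch of this workflow in Python (scikit-learn and its bundled Iris dataset are assumed here purely for illustration):

# Train on labelled data, then predict outputs for unseen inputs (scikit-learn assumed).
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)                      # inputs X paired with correct outputs y
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
model = DecisionTreeClassifier(random_state=0)
model.fit(X_train, y_train)                            # learn the input-output relationship
print(accuracy_score(y_test, model.predict(X_test)))   # evaluate predictions on new inputs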
2. Unsupervised Learning:
In unsupervised learning, the model is given unlabelled data. The system tries to learn
patterns and structures from the data without known outputs. Used for tasks like:
 Customer Segmentation
 Anomaly Detection
 Market Basket Analysis
 Clustering Images or Documents
Two main types of problems are:
 Clustering: Grouping similar data points (e.g., K-Means, Hierarchical Clustering)
 Association: Discovering rules that describe large portions of data (e.g., Market
Basket Analysis)
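A minimal clustering sketch (K-Means from scikit-learn; the six toy points are invented for illustration):

# Group unlabelled points into clusters without any known outputs (scikit-learn assumed).
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1, 2], [1, 4], [1, 0],
              [10, 2], [10, 4], [10, 0]])              # no labels are provided
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)                                  # cluster assigned to each point
print(kmeans.cluster_centers_)                         # learned group centres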
3. Reinforcement Learning:
Reinforcement learning is a feedback-based method where an agent learns by
interacting with the environment, receiving rewards for good actions and penalties for
bad actions. Used in dynamic and sequential decision-making tasks such as:
 Game Playing (e.g., Chess, Go)
 Robotics
 Self-driving cars
 Industrial automation systems
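For intuition, a single tabular Q-learning update is sketched below; the state, action, reward, and hyperparameter values are invented for illustration and are not part of the original notes:

# One feedback-based update: the agent's value estimate moves towards the received reward.
import numpy as np

Q = np.zeros((5, 2))                                   # value table: 5 states x 2 actions
alpha, gamma = 0.1, 0.9                                # learning rate and discount factor
state, action, reward, next_state = 0, 1, 1.0, 2       # one interaction with the environment
Q[state, action] += alpha * (reward + gamma * Q[next_state].max() - Q[state, action])
print(Q[state, action])                                # estimate nudged towards the rewarded action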
Discuss various steps in designing a learning system
1. Choosing the Training Experience
The first step is to select the right training data or experience that will be fed
into the machine learning algorithm. This data must be relevant and should
have a direct or indirect impact on the success of the model. For example, in a
chess game, the moves played, and their outcomes act as training experience
from which the model learns.

2. Choosing the Target Function


Once the training data is selected, a target function must be defined. This
function describes the goal of learning — it maps the input to the desired
output. For example, in a spam detection system, the target function might be
"SpamClassifier" which decides whether an email is spam or not.

3. Choosing the Representation for the Target Function


After defining the target function, it must be represented in a suitable form such
as linear equations, hierarchical graphs, or tabular formats. This representation
helps the machine understand and apply the logic behind the function. In a
chess game, it would represent all legal and optimal moves.

4. Choosing the Function Approximation Algorithm


The algorithm used to approximate the target function is selected next. This
algorithm helps the system learn from training examples by trial and error. The
more examples it sees, the better it becomes at selecting the correct output. For
example, the system may initially make mistakes but gradually learns from
experience and improves its accuracy.

5. Final Design of the Learning System


After going through various training instances, learning from errors, and refining
predictions, the system reaches its final design. This final model can make
intelligent decisions or predictions on new, unseen data. An example is Deep
Blue, the ML-based system that beat chess champion Garry Kasparov by
learning and improving through experience.

Explain the Characteristics of Machine Learning Tasks.


1. Automated Data Visualization
Machine learning offers tools that automatically visualize complex relationships in
both structured and unstructured data. This helps businesses uncover insights and
patterns easily, leading to better decision-making.

2. Automation at Its Best


One of the key characteristics of ML is its ability to automate repetitive and time-
consuming tasks. Industries like finance use ML to automate accounting, expense
management, invoicing, and even customer queries using chatbots.

3. Enhanced Customer Engagement


ML helps businesses improve customer interaction by analysing what kind of
content, words, or products resonate with users. For example, Pinterest uses ML to
personalize content suggestions based on user behaviour.

4. Increased Efficiency with IoT Integration


When combined with Internet of Things (IoT) technologies, machine learning can
significantly improve the efficiency of industrial and business processes. ML
analyses IoT-generated data to optimize operations and reduce waste.

5. Transformation of the Mortgage Market


Machine learning allows financial institutions to better understand customer
spending behaviour and creditworthiness beyond just credit scores. This helps
lenders make more informed decisions in mortgage and loan approvals.

6. Accurate Data Analysis


Unlike traditional trial-and-error methods, machine learning provides powerful
algorithms that can handle large and diverse datasets. This leads to faster, more
precise analysis and more reliable outcomes.

7. Improved Business Intelligence


ML enhances business intelligence by processing big data and extracting useful
insights. Industries like retail, healthcare, and finance use ML to support strategic
planning, product development, and customer service.

Differentiate between training set, testing set and validation set.

Aspect | Training Set | Validation Set | Testing Set
Purpose | Used to train the machine learning model | Used to tune hyperparameters and improve model performance | Used to evaluate final model performance on unseen data
Data Type | Labelled data used to fit the model | Labelled data used for tuning and model selection | Labelled data used only for final evaluation
Usage Time | Used during model training | Used during training (for validation and tuning) | Used after training and validation are complete
Helps With | Learning patterns, building the model | Preventing overfitting, selecting the best model version | Checking generalization ability of the final model
Seen by Model? | Yes, the model directly learns from it | Yes, used indirectly during model tuning | No, completely unseen during training and tuning
Effect on Model | Directly affects how the model is trained | Helps adjust parameters to improve performance | Does not influence the model; only measures performance
Example Use | Fitting a regression or classification model | Choosing the number of layers in a neural network | Measuring accuracy, precision, recall, etc.
Risk If Misused | Model may underfit if data is insufficient | May cause overfitting if used excessively | If leaked into training, results in overestimated performance
Typical Proportion | 60–70% of the dataset | 10–20% of the dataset | 20–30% of the dataset
Related Concept | Learning from examples | Cross-validation, model selection | Final model evaluation, real-world performance check
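A short sketch of producing the three sets in code (scikit-learn assumed; the 60/20/20 split is one choice consistent with the typical proportions above):

# Split one labelled dataset into training, validation, and testing sets.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
# Hold out 20% of the data as the final, completely unseen test set.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
# Carve a validation set out of the remaining data (25% of 80% = 20% overall).
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.25, random_state=0)
print(len(X_train), len(X_val), len(X_test))           # roughly 60% / 20% / 20%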
Differentiate between Predictive and Descriptive task.

Aspect | Predictive Tasks | Descriptive Tasks
Purpose | To predict future or unknown outcomes based on input data | To discover hidden patterns or relationships in the data
Target Variable | Involves a known target variable (output is labelled) | Does not involve a target variable (no labels in data)
Learning Type | Commonly used in supervised learning | Commonly used in unsupervised learning
Examples | Classification (e.g., spam detection), Regression (e.g., predicting house prices) | Clustering (e.g., customer grouping), Association Rule Discovery (e.g., Market Basket Analysis)
Output | Predicts specific values or class labels | Summarizes or explains structure in data without making predictions
Data Requirement | Requires historical labelled data (input-output pairs) | Works with unlabelled data
Focus | Accuracy of prediction | Understanding the structure or distribution of data
Alternative Name | Sometimes called supervised predictive modelling | Sometimes called exploratory data analysis
Used For | Decision-making, forecasting, risk assessment | Data summarization, insight generation
Example | Playing checkers – predicting the probability of winning | Subgroup discovery, or clustering movies by genre
Write a note on Learning Vs. Designing.

1. Start with Training Data


In a learning-based approach, the process begins by collecting and preparing
training data. This data contains various examples that represent the problem
domain and includes relevant input and output relationships (if supervised learning
is used).
2. Feed Data into Machine Learning Algorithm
The training data is fed into a machine learning algorithm. This algorithm is
designed to analyze the data and identify patterns, correlations, or rules hidden
within the examples.
3. Build Logical and Mathematical Model
Using the data, the algorithm constructs a logical and mathematical model. This
model represents the learned knowledge and will be used to make future
predictions or decisions. The more data the model is exposed to, the better it
becomes at generalizing and improving accuracy.
4. Produce Output Based on Learning
Once the model is built, it can take new input data and generate an output. This
output is not the result of hardcoded logic, but rather the result of what the model
has learned from experience (i.e., the training data).

Learning vs Designing Philosophy


In traditional designing, the programmer manually creates a set of rules and
instructions that the system must follow. All logic is predefined, and there's no scope
for adaptation or improvement unless the programmer changes the code. In contrast,
learning allows the system to automatically improve over time by analysing more
data, thus reducing the need for manual intervention.

Example – Driverless Car


A driverless car, when designed using traditional methods, would require manually
coding every possible traffic scenario. In a learning system, the car is trained using
real-world driving data. The ML algorithm learns traffic rules, object detection, and
decision-making by building a model from this data, which results in smarter and
adaptive driving.
Unit-1 – Chapter-2

Describe the following model with example

1. Logical Model.
2. Probabilistic Model.
3. Geometric Model.

1. Logical Model

A logical model uses a series of logical conditions to divide the instance space into
segments. These models typically rely on if-then rules and are closely related to
decision trees and rule-based systems. They help classify data into groups by applying
logical expressions. In such models, the learning process involves determining which
conditions lead to which outputs.

For example, in spam filtering, a rule might be:

 if bonus = 1 then Class = spam

 else if lottery = 1 then Class = spam

 else Class = ham

This makes the logical model very easy to interpret and implement. It is especially
useful in applications where rule transparency is required.
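The same if-then rules can be written directly as a small Python function (a sketch only; the feature names follow the spam example above):

# A logical model: classify an email by applying the rules in order.
def classify(email):
    # email is a dictionary of binary features, e.g. {"bonus": 1, "lottery": 0}
    if email.get("bonus") == 1:
        return "spam"
    elif email.get("lottery") == 1:
        return "spam"
    else:
        return "ham"

print(classify({"bonus": 1, "lottery": 0}))   # spam
print(classify({"bonus": 0, "lottery": 0}))   # ham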

2. Probabilistic Model

Probabilistic models are based on statistical probability. These models assign a posterior probability to each possible output class based on given input data, often using Bayes’ theorem. The system learns the probabilities of the output given the input, using training data to calculate these values.

In the same spam detection example, a probabilistic model may calculate:

 P(spam∣bonus=1,lottery=0)

If this value is greater than 0.5, the model classifies the email as spam. Probabilistic
models are suitable when it's important to estimate the degree of belief or confidence
in the prediction.
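A hedged sketch of how such a posterior could be estimated from training counts with Bayes' theorem (all counts below are invented for illustration):

# Estimate P(spam | bonus = 1) from made-up training counts.
n_spam, n_ham = 40, 60                                 # class counts in the training data
bonus_in_spam, bonus_in_ham = 30, 6                    # how often "bonus" appears in each class

p_spam = n_spam / (n_spam + n_ham)                     # prior P(spam)
p_bonus_given_spam = bonus_in_spam / n_spam            # likelihood P(bonus = 1 | spam)
p_bonus_given_ham = bonus_in_ham / n_ham

# Bayes' theorem: posterior = prior * likelihood / evidence
evidence = p_bonus_given_spam * p_spam + p_bonus_given_ham * (1 - p_spam)
p_spam_given_bonus = p_bonus_given_spam * p_spam / evidence
print(round(p_spam_given_bonus, 2))                    # about 0.83 > 0.5, so classify as spam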

3. Geometric Model

A geometric model represents instances as points in a multidimensional space and uses geometric relationships to make decisions. These models work by creating decision boundaries (such as lines, planes, or curves) that separate different classes in space. Classification is done by checking which side of the boundary a point lies on.

An example of a geometric model is a linear classifier, where a straight line (in 2D) or
hyperplane (in higher dimensions) separates two classes. The formula used is:

 w⋅x=t
Another example is the k-nearest neighbour model, where a new instance is classified
based on the majority class among its closest neighbours in the feature space.
Geometric models are effective when data is numerically represented, and spatial
separation exists between categories.
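A tiny sketch of a geometric (linear) classifier: a point is classified by which side of the boundary w⋅x = t it falls on (the weights and threshold below are illustrative):

# Geometric model: use the decision boundary w.x = t to separate two classes.
import numpy as np

w = np.array([1.0, 2.0])                               # weight vector defining the boundary
t = 3.0                                                # threshold

def classify(x):
    return "positive" if np.dot(w, x) > t else "negative"

print(classify(np.array([2.0, 1.0])))                  # w.x = 4.0 > 3  -> positive
print(classify(np.array([0.5, 1.0])))                  # w.x = 2.5 < 3  -> negative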

Machine Learning is all about using the right features to build the right model
that achieves the right task. Justify your answer.

1. Features represent the problem to the model


Features are the measurable properties of input data. If the right features are
selected, they capture the essential patterns needed to solve the problem. Without
the right features, even the best algorithms fail to learn effectively.

2. Models learn from features to perform tasks


The model’s learning capability depends on how well the features describe the
data. For example, in spam classification, features like "bonus" or "lottery" help the
model differentiate spam from non-spam emails.

3. Each model type suits different data and tasks


Logical models work well with rule-based classification, probabilistic models handle
uncertainty, and geometric models are ideal for spatially separable data. Using the
wrong model for the feature type or task can lead to incorrect results.

4. The task defines the goal of learning


Machine learning tasks can be predictive (like classification or regression) or
descriptive (like clustering). Choosing the right task helps determine the learning
method and evaluation strategy.

5. Wrong features or models lead to poor performance


If features are irrelevant, or if the model is too simple or too complex for the task,
the system may underfit or overfit, failing to generalize well to new data.

6. All three components are interdependent


The success of a machine learning system depends on the harmony between
features, model, and task. Each one affects the effectiveness of the others and
changing one may require adjusting the rest.

7. Learning is about mapping inputs (features) to outputs (task)


The entire process of machine learning is to use features as input, a model to learn
from those features, and achieve the correct output — that is, solving the intended
task.

8. Real-world systems demonstrate this alignment


In systems like driverless cars or email spam filters, choosing the right sensory or
textual features, the right modelling approach, and a well-defined goal is what
enables high performance.

9. Model generalization depends on proper input and output setup


Generalization — the ability of a model to perform well on unseen data — is only
possible when the input features and learning task are correctly aligned with the
problem structure.
10. Conclusion
Machine learning is not just about choosing a fancy algorithm. It's about selecting
the right features that describe the data well, applying the right model that can
learn from those features, and targeting the right task that matches the problem
goal. Without alignment between all three, learning will not succeed.
What are the various types of features available? Explain one in brief.

1. Binary Features
Binary features are attributes that can take only two values: typically, 0 or 1. These
values represent true/false, yes/no, or presence/absence of a particular
characteristic. Binary features are widely used in classification tasks, especially
where logical decision rules apply.
Example:
In email classification, a binary feature could be:
o bonus = 1 if the word "bonus" is present
o bonus = 0 if it is not present
This feature allows for straightforward rules like:
if bonus = 1 then Class = spam
2. Nominal Features
Nominal features are categorical attributes that can take on one of several discrete
values, but these values have no inherent order. Each category represents a label,
and all categories are treated as equally distinct without any ranking.
Example:
In movie classification, a feature like Genre can take values like:
o Action
o Comedy
o Drama
o Horror
3. Ordinal Features
Ordinal features are like nominal features, but with an important difference: their
values have a clear, meaningful order. However, the distance between the values is
not defined.
Example:
A satisfaction rating:
o Poor < Fair < Good < Excellent
While "Excellent" is clearly better than "Good", we cannot quantify how much
better. These features are important in models that can handle ordered
information.
4. Quantitative Features
Quantitative features (also called numerical features) are those that take on real
numerical values and have a mathematical meaning. The differences between
values are measurable and consistent.
Example:
o Age: 25, 30, 45
o Price: ₹199, ₹250, ₹399
These features can be directly used in mathematical computations like calculating
averages, distances, or trends, and are essential for regression and geometric
models.

Discuss the need for feature construction and feature transformation and explain how they can be achieved.

Need for Feature Construction and Transformation

In machine learning, raw data collected from various sources often contains irrelevant,
redundant, or unstructured information. Models cannot perform efficiently if the input
features are poorly represented. Therefore, feature construction and feature
transformation are essential steps to improve the learning process by making data
more meaningful and usable for algorithms.

These processes help in:

 Improving model accuracy by creating more informative features

 Reducing noise and redundancy

 Making data compatible with algorithms that expect input in a certain form

 Helping the model generalize better to unseen data

1. Feature Construction

Feature construction involves creating new features from the existing raw data to
enhance the model’s predictive power. These new features help represent the
underlying patterns more clearly.

 Example :
From an email’s text, new features like "bonus", "lottery", and "win" can be
extracted. These do not exist explicitly in the raw data but are constructed based
on word presence or frequency.

This process helps transform unstructured text into structured features suitable for
machine learning algorithms.

2. Feature Transformation

Feature transformation refers to modifying existing features to make them more suitable for learning. This includes scaling, normalizing, or encoding features so that the model can process them effectively.

 Techniques :
o Text to Binary Transformation: Converting words into binary values — e.g., if the
word “bonus” appears, feature = 1, else 0.

o Word Frequency Count: Converting text features into numerical form based on
how often a word appears in the document.

o Dimensionality Reduction: Reducing the number of features to eliminate noise and improve performance.

These transformations make the data machine-readable and help algorithms like
decision trees, SVMs, and neural networks learn more efficiently.
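A short sketch of the text-to-binary and word-frequency transformations described above, using scikit-learn's CountVectorizer on two invented emails:

# Construct structured word features from raw email text (scikit-learn assumed).
from sklearn.feature_extraction.text import CountVectorizer

emails = ["win a bonus lottery now", "meeting agenda for monday"]
vectorizer = CountVectorizer(binary=True)              # binary=True gives presence/absence features
X = vectorizer.fit_transform(emails)
print(vectorizer.get_feature_names_out())              # constructed features such as 'bonus', 'lottery'
print(X.toarray())                                     # 1 if the word appears in the email, else 0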

Explain the various approaches that can be used for feature selection.

Feature selection is the process of identifying and selecting the most relevant features
from a dataset to improve model performance, reduce overfitting, and lower
computational cost. There are three main approaches to feature selection:

1. Filter Approach

The filter approach selects features based on their statistical properties, independently of any learning algorithm. Features are ranked using metrics like information gain, correlation, or chi-square test, and the top-ranked ones are selected.

 Example: Selecting words in a text classification task based on their frequency or relevance.

 Advantage: Fast and simple; works well as a preprocessing step.

 Limitation: Does not consider feature interactions.

2. Wrapper Approach

The wrapper approach evaluates subsets of features by actually training and testing a
model on them. It searches for the best-performing combination of features using
techniques like forward selection, backward elimination, or recursive feature
elimination.

 Example: Adding or removing one feature at a time and testing how the model
accuracy changes.

 Advantage: Considers interaction between features.

 Limitation: Computationally expensive, especially with large datasets.

3. Embedded Approach

The embedded approach performs feature selection as part of the model training
process. The learning algorithm itself selects the most important features while
building the model.
 Example: Decision trees automatically select the best features at each split;
LASSO regression shrinks less important feature coefficients to zero.

 Advantage: Efficient and integrated with model building.

 Limitation: Specific to certain algorithms.
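A brief sketch contrasting the three approaches with scikit-learn (the dataset, k = 5, and alpha = 0.1 are illustrative choices, not prescribed by the notes):

# Filter, wrapper, and embedded feature selection in miniature (scikit-learn assumed).
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, chi2, RFE
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import Lasso

X, y = load_breast_cancer(return_X_y=True)

# Filter: rank features by a statistic (chi-square), independently of any model.
X_filtered = SelectKBest(chi2, k=5).fit_transform(X, y)

# Wrapper: repeatedly fit a model and drop the weakest features (recursive feature elimination).
rfe = RFE(DecisionTreeClassifier(random_state=0), n_features_to_select=5).fit(X, y)

# Embedded: Lasso shrinks unimportant coefficients to exactly zero while training.
lasso = Lasso(alpha=0.1).fit(X, y)

print(X_filtered.shape[1], rfe.support_.sum(), (lasso.coef_ != 0).sum())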

Discuss the terms variance and bias with respect to overfitting and
underfitting.

In machine learning, bias and variance are two key components that contribute to a
model's prediction error. Understanding how they relate to underfitting and overfitting
helps in building models that generalize well on unseen data.

Bias – Linked to Underfitting

Bias refers to the error that is introduced by approximating a complex real-world problem using a simplified model. When a model has high bias, it means it makes strong assumptions about the data and fails to learn the underlying patterns properly.

 Such models are often too simple to capture the complexity of the data.

 They tend to ignore important features and relationships in the dataset.

 This results in poor performance both on the training set and the test set.

 The model is said to underfit the data.

Example:
Using a straight line to fit a clearly curved dataset results in high bias. The model
cannot learn the curve and gives inaccurate predictions, even on training data.

Variance – Linked to Overfitting

Variance measures how much a model’s predictions change when it is trained on different subsets of the data. A model with high variance is very sensitive to the training data and learns even the noise or random fluctuations.

 Such models are typically too complex relative to the amount of data available.

 They perform very well on training data but fail to generalize on new, unseen
data.

 This leads to overfitting, where the model captures noise instead of the true
signal.

Example:
A deep decision tree that fits all training examples perfectly, including outliers, may
fail to predict well on test data due to high variance.
Bias-Variance Trade-off

There is a natural trade-off between bias and variance:

 High bias, low variance models are stable but often inaccurate — they underfit.

 Low bias, high variance models are flexible but often unstable — they overfit.

The challenge is to find the optimal model complexity that maintains a balance:

 Enough flexibility to capture real patterns (low bias)

 Enough simplicity to ignore noise (low variance)

This balance leads to low total error and good generalization.


Unit-2

Explain Multiclass classification with an example.

1. Multi-class classification is a supervised learning problem where the model predicts one class label out of three or more possibilities. Unlike binary classification, which deals with only two outcomes, this method requires the model to handle multiple categories at once.

2. The classifier learns patterns from input data and maps them to the correct class
label. During training, it sees many examples of each class so that it can later
identify the correct category for new, unseen data.

3. A common example is crop classification, where the model predicts whether the
data represents wheat, rice, maize, or cotton. Each crop type is treated as a
separate class, and the algorithm uses features like soil type and climate to make
predictions.

4. Another example is music genre classification. Here, audio features such as rhythm, melody, and tempo are analysed to categorize songs into genres like rock, jazz, classical, or pop.

5. Multi-class classification is more challenging than binary classification because the model must learn to separate multiple groups at once. This often requires more complex decision boundaries.

6. Real-world applications include handwriting recognition, where digits 0–9 each represent a class, and image recognition, where objects like cars, dogs, and birds must be identified correctly.
7. Algorithms commonly used for multi-class classification include Decision Trees,
Naïve Bayes, Support Vector Machines with one-vs-one or one-vs-all strategies, and
Neural Networks.

8. To evaluate performance, metrics such as accuracy, confusion matrix, and weighted accuracy are used. These help check how well the model works, especially when class sizes are imbalanced.

9. Multi-class problems require larger datasets compared to binary ones, as each class needs sufficient training examples. Without this, the model may perform poorly on less represented categories.

10. In summary, multi-class classification is an important machine learning approach that extends classification to multiple categories. It is widely used in fields like agriculture, music analysis, healthcare, and computer vision.

Write a note on :

1. R2 method.
2. Mean Absolute Error.
3. Root Mean Square.

R² Method (Coefficient of Determination)

 R² is a statistical measure that shows how much of the variation in the dependent variable is explained by the independent variables in a regression model. In simple terms, it tells us how well the model fits the data.

 A value of R² close to 1 indicates that the model explains most of the variance,
while a value close to 0 means the model explains very little. For example, if R²
= 0.85, then 85% of the variation in the target is explained by the model.

 Example: Suppose we build a regression model to predict house prices based on size. If the R² value is 0.90, it means 90% of the changes in house prices can be explained by house size, and only 10% is due to other factors not included in the model.

2. Mean Absolute Error (MAE)

 MAE measures the average of the absolute differences between actual and
predicted values. It shows how far predictions are from the true values, on
average, without considering direction (positive or negative).

 It is less sensitive to outliers compared to RMSE, making it a good metric when large errors should not dominate the evaluation.
 Example: If the actual values are [10, 15, 20] and the predicted values are [12,
14, 18], then the absolute errors are [2, 1, 2]. The MAE = (2+1+2)/3 = 1.67.
This means, on average, the model’s predictions are about 1.67 units away from
the true values.

3. Root Mean Squared Error (RMSE)

 RMSE measures the square root of the average squared differences between
actual and predicted values. Unlike MAE, it penalizes larger errors more heavily
since the errors are squared before averaging.

 It is useful when large deviations are particularly undesirable. A lower RMSE value indicates a better model fit.

 Example: Using the same data, actual values [10, 15, 20] and predictions [12,
14, 18], squared errors are [(2)², (1)², (2)²] = [4, 1, 4]. The mean squared error =
(4+1+4)/3 = 3. Then RMSE = √3 ≈ 1.73. This tells us the model’s average error
is about 1.73 units, with higher weight given to larger mistakes.
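The worked numbers above can be reproduced in a couple of lines (numpy assumed):

# Recompute MAE and RMSE for the actual/predicted values used in the examples.
import numpy as np

actual = np.array([10, 15, 20])
predicted = np.array([12, 14, 18])

mae = np.mean(np.abs(actual - predicted))              # (2 + 1 + 2) / 3
rmse = np.sqrt(np.mean((actual - predicted) ** 2))     # sqrt((4 + 1 + 4) / 3)
print(round(mae, 2), round(rmse, 2))                   # 1.67 1.73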

Explain the concept of cost function and gradient descent in regression.

1. Cost Function

 In regression, the cost function is used to measure how well the regression line
fits the data points. It calculates the difference between the predicted values (ŷ)
and the actual values (y). The goal is to minimize this difference so that the
model makes accurate predictions.

 The most common cost function in regression is the Mean Squared Error
(MSE). It is calculated by squaring the difference between actual and predicted
values, summing them across all data points, and dividing by the total number
of points.

 Formula:
MSE = (1/n) Σ (yᵢ − ŷᵢ)²
 Example: Suppose we are predicting house prices. If the actual prices are [200,
220] and the model predicts [210, 230], then errors are [-10, -10]. Squared
errors = [100, 100]. MSE = (100+100)/2 = 100. This means on average, the
model makes squared errors of 100 units, which we want to minimize.

2. Gradient Descent

 Gradient Descent is an optimization algorithm used to minimize the cost function by adjusting the regression coefficients (parameters like slope m and intercept c in linear regression).

 The process starts with random values of parameters and then iteratively
updates them in the direction of the negative gradient of the cost function. This
helps the model gradually move towards the values that minimize error.
 The size of each step is controlled by a parameter called the learning rate. A
small learning rate ensures steady but slow progress, while a large one speeds
up learning but risks overshooting the minimum point.

 Example: Imagine standing on a U-shaped hill (representing the cost curve) and
trying to reach the bottom. Each step you take downhill represents an update to
the model parameters. If the steps are too small, it will take longer to reach the
bottom; if they are too large, you might overshoot and miss the lowest point.
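A compact sketch of gradient descent for simple linear regression y = m·x + c (the learning rate, iteration count, and toy data are illustrative assumptions):

# Minimise the MSE cost function by repeatedly stepping along the negative gradient.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.0, 5.0, 7.0, 9.0])                     # generated from y = 2x + 1

m, c = 0.0, 0.0                                        # start from arbitrary parameter values
lr = 0.05                                              # learning rate: size of each step
for _ in range(2000):
    y_pred = m * x + c
    dm = (-2 / len(x)) * np.sum(x * (y - y_pred))      # gradient of MSE with respect to m
    dc = (-2 / len(x)) * np.sum(y - y_pred)            # gradient of MSE with respect to c
    m -= lr * dm                                       # step downhill
    c -= lr * dc
print(round(m, 2), round(c, 2))                        # approaches 2 and 1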

State and explain the role of regularization in preventing overfitting, and compare Ridge & Lasso Regression.

1. Role of Regularization in Preventing Overfitting

 Regularization is a technique used to improve a model’s ability to generalize by preventing it from fitting noise or irrelevant details in the training data. Overfitting occurs when a model becomes too complex, learning patterns that only exist in the training data but fail to appear in new, unseen data.

 Regularization works by adding a penalty term to the cost function, which discourages the model from assigning very large weights to features. This makes the model simpler and less sensitive to random fluctuations in the training set.

 There are two main types of regularization: hard constraints, where strict limits
are set on parameter values, and soft constraints, where penalties are applied
through modified cost functions.

 By balancing the trade-off between model accuracy on training data and model
simplicity, regularization reduces variance and ensures that the regression
model generalizes better to unseen data.

2. Comparison Between Ridge Regression and Lasso Regression.

Aspect | Ridge Regression (L2) | Lasso Regression (L1)
Penalty Type | Uses L2 regularization (sum of squares of coefficients added to the cost function) | Uses L1 regularization (sum of absolute values of coefficients added to the cost function)
Effect on Coefficients | Shrinks coefficients towards zero but never makes them exactly zero | Can shrink coefficients to exactly zero, effectively removing some features
Feature Selection | Does not perform feature selection; all variables remain in the model | Performs automatic feature selection by eliminating unimportant features
Best Use Case | Works well when most predictors are useful and multicollinear | Works best when only a few predictors are truly important and others can be discarded
Model Complexity | Simplifies the model by reducing variance and stabilizing coefficients | Simplifies the model by reducing both variance and irrelevant variables

Explain the concept of class probability estimation.

1. Class probability estimation is an approach in classification where the model does not simply output a label like “spam” or “not spam”. Instead, it provides the probability of the input belonging to each possible class.

2. This means the model assigns a confidence score for its prediction. For example,
instead of saying an email is spam, it may say “there is an 80% chance this is
spam”. This gives richer information than a hard yes/no output.

3. In binary classification, only one probability is needed, such as P(positive class). For
instance, if P(spam) = 0.8, then P(not spam) = 0.2 automatically, since
probabilities sum to one.

4. In multi-class classification, the model outputs a vector of probabilities, one for each class. For example, for music genre classification, probabilities may look like [rock: 0.6, jazz: 0.2, classical: 0.1, pop: 0.1].

5. Since true probabilities in real data are not directly known, models estimate them
by learning from patterns in the training data. These estimates depend on how
similar new inputs are to examples seen before.
6. Two extreme approaches to probability estimation exist. In one extreme, all
instances are considered identical, so the model always predicts the overall
proportion of positives (e.g., 30% spam for every email).

7. In the other extreme, only identical instances are considered similar. In this case, if
the model has seen the same input before, it predicts with complete certainty, but
it fails to generalize for unseen inputs.

8. A practical balance is achieved using methods like decision trees, where data is
split into groups based on features. At each leaf, the probability is calculated from
the proportion of positives and negatives in that group.

9. Assessing the quality of probability estimates is done using metrics like Squared
Error (SE) or Mean Squared Error (MSE), also known as the Brier Score. These
penalize models for being overconfident or uncertain.

10. In summary, class probability estimation allows models to express how confident
they are about predictions. This makes them more useful in real-world applications
like medical diagnosis, risk analysis, or spam filtering, where knowing the degree of
certainty is just as important as the predicted label.
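A small sketch of obtaining class probabilities and scoring them with the Brier score (scikit-learn and its bundled breast-cancer dataset are assumed for illustration):

# Predict class probabilities instead of hard labels, then measure their quality.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import brier_score_loss

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)).fit(X_tr, y_tr)
probs = model.predict_proba(X_te)[:, 1]                # estimated P(positive class) per instance
print(probs[:3])                                       # confidence scores rather than hard labels
print(brier_score_loss(y_te, probs))                   # mean squared error of the probabilities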

What is a hypothesis? Explain the different types of hypotheses.

 A hypothesis is a provisional explanation or assumption made about a population or a process. It is like an educated guess that can be tested using data and experiments.

 In machine learning and statistics, a hypothesis must be testable and falsifiable, meaning there should be a way to prove it wrong if evidence contradicts it.

 Example: In a study about teaching methods, one might hypothesize that “students who study with visual aids perform better than those who do not.” This can be tested using data from student performance.

Types of Hypotheses

I. Null Hypothesis (H₀)


o The null hypothesis assumes that there is no effect, no difference, or no
relationship between variables.
o It acts as the default or baseline assumption in hypothesis testing.
o Example: “There is no difference in exam scores between students who
study at night and those who study in the morning.”
II. Alternative Hypothesis (H₁ or Ha)
o The alternative hypothesis is the statement that we want to prove. It
assumes there is an effect, difference, or relationship present.
o Rejecting the null hypothesis usually supports the alternative hypothesis.
o Example: “Students who study at night score significantly higher than
those who study in the morning.”
III. Simple Hypothesis
o A simple hypothesis makes a prediction about the relationship between
one independent variable and one dependent variable.
o Example: “Increasing study time improves test scores.”
IV. Complex Hypothesis
o A complex hypothesis involves the relationship between two or more
independent variables and dependent variables.
o Example: “Student performance depends on study time and the type of
study material used.”
V. Directional Hypothesis
o A directional hypothesis not only predicts the existence of a relationship
but also the direction of the effect.
o Example: “Students who sleep more than 7 hours perform better in exams
than those who sleep less.”
VI. Non-Directional Hypothesis
o A non-directional hypothesis predicts a relationship exists but does not
state the direction.
Explain binary classification with a suitable example. Explain how the
performance of binary classification is assessed.

1. Binary Classification

1. Binary classification is a supervised learning task where the goal is to assign input data into one of two possible classes. The classes are usually labelled as 0 and 1, or positive and negative.

2. It is one of the most common types of classification problems, widely used in real-world applications like spam detection, medical diagnosis, and fraud detection.

3. Example: In email spam filtering, the model must decide whether an email is
spam (class 1) or not spam (class 0). Each incoming email is analysed, and the
classifier assigns it to one of the two categories.

4. Other examples include predicting whether a patient has a disease (yes/no), classifying an image as a cat or dog, or determining if a bank transaction is fraudulent or genuine.

5. Binary classifiers are built using algorithms such as Logistic Regression, Decision
Trees, Support Vector Machines, or Neural Networks, depending on the
complexity of the data and problem.

2. Assessing Performance of Binary Classification

6. The performance of a binary classifier is usually assessed using a Confusion Matrix, which compares actual outcomes with predicted outcomes. It consists of four key terms: True Positives (TP), False Negatives (FN), False Positives (FP), and True Negatives (TN).

7. True Positive (TP) means the model correctly predicted the positive class (e.g.,
predicting spam when it is spam). True Negative (TN) means the model correctly
predicted the negative class (not spam when it is not spam).

8. False Positive (FP) occurs when the model incorrectly predicts a positive
outcome (e.g., classifying a genuine email as spam). False Negative (FN) occurs
when the model misses a positive case (e.g., failing to detect a spam email).

9. Based on the confusion matrix, performance metrics such as Accuracy, True Positive Rate (Recall), True Negative Rate (Specificity), Precision, and F1-Score are calculated. For example, Accuracy = (TP + TN) / (Total instances).

10. Example: Suppose a spam filter is tested on 100 emails. Out of 75 spam
emails, it correctly identifies 60 (TP) but misses 15 (FN). Out of 25 normal
emails, it correctly classifies 15 (TN) but wrongly marks 10 as spam (FP). Using
this, we can calculate Accuracy = (60+15)/100 = 0.75 or 75%, and other
metrics for deeper evaluation.
List and explain at least 3 error measures used to evaluate the performance
of a regression model.

1. Mean Absolute Error (MAE)

 MAE measures the average of the absolute differences between actual values
and predicted values. It shows how far the predictions are from the true values
on average.

 Formula:
MAE = (1/n) Σ |yᵢ − ŷᵢ|

 Example: If actual values are [10, 15, 20] and predicted values are [12, 14, 18],
the absolute errors are [2, 1, 2]. MAE = (2+1+2)/3 = 1.67. This means the
model is off by about 1.67 units on average.

2. Root Mean Squared Error (RMSE)

 RMSE measures the square root of the average squared differences between
actual and predicted values. It penalizes larger errors more strongly because of
squaring.

 Formula:
RMSE = √( (1/n) Σ (yᵢ − ŷᵢ)² )
 Example: With actual [10, 15, 20] and predicted [12, 14, 18], squared errors =
[4, 1, 4]. MSE = (4+1+4)/3 = 3. RMSE = √3 ≈ 1.73. Here, the model’s average
error is about 1.73, with larger mistakes weighted more.

3. R-Squared (R²)

 R², also called the coefficient of determination, measures how much of the
variation in the dependent variable is explained by the model. It ranges from 0
to 1.

 Formula:
R² = 1 − (SS_res / SS_tot)
 Example: If R² = 0.85 in a housing price model, it means 85% of the variation in
house prices is explained by features like size or location, and only 15% is
unexplained.

In summary:

 MAE → average absolute error, less sensitive to outliers.

 RMSE → square-root of squared error, penalizes large errors more.

 R² → explains the proportion of variance captured by the model.

What is a Confusion Matrix? Construct one and explain how it is used to evaluate classification performance.

 A confusion matrix (also called a contingency table) is a tool used to evaluate the
performance of a classification model.

 It compares the actual values from the dataset with the predicted values from the
model and organizes them into a table.

 Rows represent actual classes, while columns represent predicted classes.

Structure of a Confusion Matrix

                | Predicted Positive  | Predicted Negative
Actual Positive | True Positive (TP)  | False Negative (FN)
Actual Negative | False Positive (FP) | True Negative (TN)
 True Positive (TP): Model correctly predicts positive class.

 True Negative (TN): Model correctly predicts negative class.

 False Positive (FP): Model predicts positive when it is actually negative (Type I
error).

 False Negative (FN): Model predicts negative when it is actually positive (Type II
error).

Example of Confusion Matrix

Suppose we test a spam detection system on 100 emails:

 75 emails are spam, and 25 are not spam.

 The model predicts 60 spams correctly (TP = 60), misses 15 spam emails (FN =
15), correctly identifies 15 normal emails (TN = 15), but wrongly marks 10
normal emails as spam (FP = 10).

                | Predicted Spam | Predicted Not Spam
Actual Spam     | TP = 60        | FN = 15
Actual Not Spam | FP = 10        | TN = 15

How It Evaluates Performance

 Accuracy = (TP + TN) / Total = (60+15)/100 = 75%.

 True Positive Rate (Recall or Sensitivity) = TP / (TP + FN) = 60/75 = 80%.

 True Negative Rate (Specificity) = TN / (TN + FP) = 15/25 = 60%.

 Precision = TP / (TP + FP) = 60/70 ≈ 85.7%.
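The same numbers can be verified in code (a sketch with scikit-learn, encoding spam as 1 and not-spam as 0):

# Rebuild the spam-filter confusion matrix and its metrics from labels and predictions.
from sklearn.metrics import confusion_matrix, accuracy_score, recall_score, precision_score

y_true = [1] * 75 + [0] * 25                           # 75 actual spam, 25 actual not-spam
y_pred = [1] * 60 + [0] * 15 + [1] * 10 + [0] * 15     # 60 TP, 15 FN, 10 FP, 15 TN

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tp, fn, fp, tn)                                  # 60 15 10 15
print(accuracy_score(y_true, y_pred))                  # 0.75
print(recall_score(y_true, y_pred))                    # 0.80 (sensitivity)
print(round(precision_score(y_true, y_pred), 3))       # about 0.857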

Explain VC Dimension in detail & discuss its impact on growth function.

1. What is VC Dimension?

 The Vapnik–Chervonenkis (VC) dimension is a measure of the capacity or complexity of a hypothesis class (the set of functions a model can learn).

 It was introduced by Vladimir Vapnik and Alexey Chervonenkis in the 1970s and
is central to statistical learning theory.

 Formally, the VC dimension of a hypothesis class H, denoted d_VC(H), is the largest number of points that can be shattered by H.

2. What is Shattering?
 A hypothesis class is said to “shatter” a set of data points if, for every possible
labelling of those points, there exists a hypothesis in the class that classifies
them correctly.

 Example: A straight line in 2D has VC dimension 3. It can shatter a set of 3 points in general position (separating them as + or − in all 2³ = 8 ways). But it cannot shatter 4 points, since not every labelling of 4 points can be separated by a straight line.

3. VC Dimension and Model Complexity

 A model with a higher VC dimension can represent more complex decision boundaries and fit more complicated datasets.

 However, too high a VC dimension increases the risk of overfitting, while too low
a VC dimension may cause underfitting.

4. Growth Function

 The growth function, denoted m_H(n), measures the maximum number of distinct labellings (dichotomies) that the hypothesis class H can implement on n points.

 If a hypothesis class can shatter n points, then m_H(n) = 2ⁿ. But if it cannot shatter them, the growth function increases more slowly.

5. Impact of VC Dimension on Growth Function

 The VC dimension determines where the growth function changes behaviour.

 If d_VC(H) = d, then:

o For n ≤ d, m_H(n) = 2ⁿ (all possible labellings are achievable).

o For n > d, m_H(n) grows more slowly and is bounded by Sauer’s Lemma:
m_H(n) ≤ Σᵢ₌₀ᵈ C(n, i)   (the sum of the binomial coefficients "n choose i" for i = 0 to d)
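As a quick worked check of this bound (an added example, using the straight-line case above): for d_VC(H) = 3 and n = 4, Sauer's Lemma gives m_H(4) ≤ C(4,0) + C(4,1) + C(4,2) + C(4,3) = 1 + 4 + 6 + 4 = 15, which is strictly less than 2⁴ = 16. So at least one labelling of 4 points cannot be realised by a straight line, exactly as the shattering argument predicts.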

Explain the concept of Underfitting and Overfitting with suitable examples, and suggest some techniques to handle them.

1. Underfitting

 Definition: Underfitting happens when a model is too simple to capture the underlying patterns in the data. It fails to learn the relationship between input and output, giving poor performance on both training and test data.
 Example: Trying to fit a straight line (linear regression) to data that follows a
curved pattern. The model cannot capture the curve, so predictions are
inaccurate.
 Key Sign: Low accuracy on training data as well as test data.

2. Overfitting
 Definition: Overfitting occurs when a model is too complex, learning not only the
real patterns but also the noise in the training data. It performs very well on
training data but poorly on unseen data.
 Example: Fitting a high-degree polynomial to a small dataset. The curve passes
through almost all training points but gives wrong predictions for new data.
 Key Sign: High accuracy on training data but low accuracy on test data.

3. Techniques to Handle Underfitting

i. Use more complex models: If a linear model underfits, try polynomial regression, decision trees, or neural networks.
ii. Feature engineering: Add meaningful features or interactions between variables
so the model can capture more relationships.
iii. Reduce regularization strength: If regularization (like Ridge or Lasso) is too
strong, it may overly simplify the model. Lowering it can improve performance.

4. Techniques to Handle Overfitting

i. Regularization: Apply Ridge (L2), Lasso (L1), or Elastic Net to penalize large
coefficients and simplify the model.

ii. Cross-validation: Use techniques like k-fold cross-validation to tune model complexity and prevent over-reliance on training data.

iii. Pruning (for trees): Remove unnecessary branches in decision trees to make the
model simpler.

iv. Early stopping (for neural nets): Stop training once validation error starts
increasing, even if training error decreases.

v. Increase training data: More data helps the model generalize better and
reduces the chance of memorizing noise.

Define Regularization in Machine Learning. How does it contribute to generalization? Briefly explain L1 & L2 regularization with examples.

1. Definition of Regularization

 Regularization in machine learning is a technique used to reduce overfitting by adding a penalty to the model’s loss (or cost) function.

 It controls the complexity of the model by discouraging large weights, making the model simpler and more robust.

2. Contribution to Generalization
 A model that fits training data too closely often performs poorly on new data.
This is called overfitting.

 Regularization allows the model to accept a slightly higher training error in exchange for better performance on unseen data (generalization).

 By balancing accuracy and simplicity, regularization ensures the model does not
memorize noise but learns true patterns.

3. L1 Regularization (Lasso Regression)

 How it works: Adds the sum of absolute values of coefficients to the cost
function as a penalty.

 Effect: Shrinks some coefficients to exactly zero, automatically performing feature selection.

 Example: Suppose we predict student performance with 10 features, but only 3 are useful. L1 regularization will set the coefficients of the 7 irrelevant features to zero, leaving only the important ones.

4. L2 Regularization (Ridge Regression)

 How it works: Adds the sum of squared values of coefficients to the cost function
as a penalty.

 Effect: Shrinks all coefficients towards zero but does not eliminate them
completely. It distributes weights more evenly, especially when predictors are
correlated.

 Example: In house price prediction with correlated features like number of rooms and total area, Ridge keeps both features but reduces their coefficients to prevent one from dominating.
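A brief sketch comparing the two penalties in code (scikit-learn assumed; the toy data and alpha values are illustrative, with only the first two of five features actually relevant):

# Ridge (L2) shrinks all coefficients; Lasso (L1) can set irrelevant ones to exactly zero.
import numpy as np
from sklearn.linear_model import Ridge, Lasso

rng = np.random.RandomState(0)
X = rng.randn(100, 5)
y = 3 * X[:, 0] + 2 * X[:, 1] + 0.1 * rng.randn(100)   # only features 0 and 1 matter

ridge = Ridge(alpha=1.0).fit(X, y)
lasso = Lasso(alpha=0.1).fit(X, y)
print(np.round(ridge.coef_, 2))                        # useful weights kept, others shrunk near zero
print(np.round(lasso.coef_, 2))                        # irrelevant weights driven to exactly 0.0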
