MIT School of Computing

Department of Computer Science & Engineering

Third Year Engineering

21BTCS003-ARTIFICIAL INTELLIGENCE & MACHINE LEARNING

Class - T.Y. (SEM-II)

Unit III: Introduction to Machine Learning

AY 2024-2025 SEM-II

Syllabus: Unit III (9 hours)

Introduction to machine learning; applications and motivation; programming approach vs. machine learning approach in Artificial Intelligence; components of a learning problem (such as data, model, and error functions); process of learning (training); testing; bias and variance error; accuracy, confusion matrix, precision-recall; over-fitting; under-fitting; role of cross-validation; regularization; bias-variance analysis


Definition of Machine Learning:

• Learning is any process by which a system improves its performance from experience.

• Machine learning is a branch of artificial intelligence concerned with the design and development of algorithms that allow
computers to evolve behaviors based on empirical data.

• Definition by Tom Mitchell (1997): A computer program is said to learn from experience E with respect to
some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves
with experience E. For example, in spam filtering, T is classifying emails as spam or not, P is the fraction of
emails classified correctly, and E is a corpus of emails already labeled by users.


Introduction to machine learning


Machine learning (ML) is a branch of artificial intelligence (AI) that enables computers to “self-learn” from
training data and improve over time, without being explicitly programmed. Machine learning algorithms are
able to detect patterns in data and learn from them in order to make their own predictions.

Types of Machine Learning

Supervised Learning: supervised learning models make predictions based on labeled training data.


Unsupervised Learning : Unsupervised learning algorithms uncover insights and relationships in unlabeled data.

Reinforcement Learning : Reinforcement learning is concerned with how a software agent (or computer program)
ought to act in a situation to maximize the reward.
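As a minimal illustration of supervised learning in Python (a sketch only, assuming scikit-learn and its bundled iris dataset, neither of which comes from these notes):

# Supervised learning: fit a classifier on labeled examples, then predict.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)                    # features and labels
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = KNeighborsClassifier(n_neighbors=3)
model.fit(X_train, y_train)                          # learn from labeled data
print("accuracy on unseen data:", model.score(X_test, y_test))

Unsupervised and reinforcement learning follow the same spirit but without labels (e.g., clustering) or with rewards instead of labels, respectively.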


WHY IS MACHINE LEARNING IMPORTANT?


• Data is the lifeblood of every business. Data-driven decisions increasingly make the difference between keeping up
with the competition or falling further behind. Machine learning can be the key to unlocking the value of corporate
and customer data and enacting decisions that keep a company ahead of the competition.

• MACHINE LEARNING USE CASES


1. Manufacturing. Predictive maintenance and condition monitoring
2. Retail. Upselling and cross-channel marketing
3. Healthcare and life sciences. Disease identification and risk stratification
4. Travel and hospitality. Dynamic pricing
5. Financial services. Risk analytics and regulation
6. Energy. Energy demand and supply optimization


• Why Machine Learning


• The following is a list of some of the typical applications of machine learning.


1. In the retail business, machine learning is used to study consumer behavior.
2. In finance, banks analyze their past data to build models to use in credit applications, fraud detection, and
the stock market.
3. In manufacturing, learning models are used for optimization, control, and troubleshooting.
4. In medicine, learning programs are used for medical diagnosis.
5. In telecommunications, call patterns are analyzed for network optimization and maximizing the quality of
service.
6. In science, the large amounts of data in physics, astronomy, and biology can only be analyzed fast enough by
computers. The World Wide Web is huge and constantly growing, and searching it for relevant information
cannot be done manually.
7. In artificial intelligence, it is used to teach a system to learn and adapt to changes so that the system
designer need not foresee and provide solutions for all possible situations.
8. It is used to find solutions to many problems in vision, speech recognition, and robotics.
9. Machine learning methods are applied in the design of computer-controlled vehicles to steer correctly
when driving on a variety of roads.

programming approach vs. machine learning approach in Artificial Intelligence


• Approach to Problem Solving:
• Traditional Programming: In traditional programming, a programmer writes explicit rules or instructions
for the computer to follow. These rules dictate exactly how the computer should process input data to
produce the desired output. It requires a deep understanding of the problem and a clear way to encode the
solution in a programming language.

• Machine Learning: In machine learning, instead of writing explicit rules, a programmer trains a model
using a large dataset. The model learns patterns and relationships from the data, enabling it to make
predictions or decisions without being explicitly programmed for each possibility. This approach is
particularly useful for complex problems where defining explicit rules is difficult or impossible.
• Data Dependency:
• Traditional Programming: Relies less on data. The quality of the output depends mainly on the logic
defined by the programmer.
• Machine Learning: Heavily reliant on data. The quality and quantity of the training data significantly
impact the performance and accuracy of the model.

• Flexibility and Adaptability:


• Traditional Programming: Has limited flexibility. Changes in the problem domain require manual
updates to the code.
• Machine Learning: Offers higher adaptability to new scenarios, especially if the model is retrained with
updated data.
• Problem Complexity:
• Traditional Programming: Best suited for problems with clear, deterministic logic.
• Machine Learning: Better for dealing with complex problems where patterns and relationships are not
evident, such as image recognition, natural language processing, or predictive analytics.
• Development Process:
• Traditional Programming: The development process is generally linear and predictable, focusing on
implementing and debugging predefined logic.
• Machine Learning: Involves an iterative process where models are trained, evaluated, and fine-tuned.
This process can be less predictable and more experimental.


components of a learning problem


The learning process, whether by a human or a machine, can be divided into four components, namely, data
storage, abstraction, generalization, and evaluation. The figure illustrates the various components and the
steps involved in the learning process.


Machine Learning Steps

1. Collecting Data
2. Preparing the Data
3. Choosing a Model
4. Training the Model
5. Evaluating the Model
6. Parameter Tuning
7. Making Predictions
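A minimal sketch of these seven steps, again assuming scikit-learn; the dataset, model, and tuned hyperparameter are illustrative choices only:

# Steps 1-7 in miniature: collect/prepare data, choose a model, then train,
# evaluate, tune a hyperparameter, and finally predict on new samples.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)                       # 1. collect
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

pipe = make_pipeline(StandardScaler(),                           # 2. prepare
                     LogisticRegression(max_iter=1000))          # 3. choose

search = GridSearchCV(pipe, {"logisticregression__C": [0.1, 1.0, 10.0]})
search.fit(X_train, y_train)                        # 4-6. train/evaluate/tune
print("test accuracy:", search.score(X_test, y_test))

print("predictions:", search.predict(X_test[:5]))                # 7. predict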


What Is Training Data?

Training data (or a training dataset) is the initial data used to train machine learning models.
Training datasets are fed to machine learning algorithms to teach them how to make predictions or perform a
desired task.

What Is the Difference Between Training Data and Testing Data?

Training data is the initial dataset you use to teach a machine learning application to recognize patterns or
perform to your criteria, while testing or validation data is used to evaluate your model’s accuracy.
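A small sketch of the split, assuming scikit-learn's train_test_split; the 80/20 ratio is just a common convention:

# Hold out 20% of the examples as testing data; train only on the other 80%.
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(100).reshape(50, 2)   # toy feature matrix: 50 samples, 2 features
y = np.arange(50) % 2               # toy binary labels

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
print(len(X_train), len(X_test))    # -> 40 10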


• What is Bias?

Bias is the difference between the values predicted by the machine learning model and the correct values. High
bias gives a large error on both the training and the test data. An algorithm should therefore have low bias, to
avoid the problem of underfitting. With high bias, the model's predictions take an overly simple form (for
example, a straight line) and fail to fit the data set accurately. Such fitting is known as Underfitting.

• What is Variance?

The variability of a model's predictions for a given data point, which tells us the spread of those predictions, is
called the variance of the model. A model with high variance fits the training data very closely and is therefore
not able to fit accurately data it has not seen before. As a result, such models perform very well on training
data but have high error rates on test data. When a model has high variance, it is said to be Overfitting the data.


▪ Bias: high bias shows up as high error on the training data (underfitting)

▪ Variance: high variance shows up as low training error but high error on the test data (overfitting)
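One way to see both effects, sketched with scikit-learn polynomial models on synthetic data (degrees 1 and 15 are illustrative extremes, not prescriptions):

# High bias vs high variance: fit the same noisy data with a too-simple
# (degree 1) and a too-flexible (degree 15) polynomial model.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = np.linspace(0, 1, 40).reshape(-1, 1)
y = np.sin(2 * np.pi * X).ravel() + rng.normal(scale=0.2, size=40)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for degree in (1, 15):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    # degree 1: poor scores everywhere (high bias); degree 15: near-perfect
    # training score but a much worse test score (high variance).
    print(degree, model.score(X_train, y_train), model.score(X_test, y_test))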


• What is a Confusion Matrix?
A confusion matrix is an N x N matrix used for evaluating the performance of a classification model, where N is
the total number of target classes. The matrix compares the actual target values with those predicted by the
machine learning model.


• Important Terms in a Confusion Matrix

1. True Positive (TP): The predicted value matches the actual value, i.e., the predicted class matches the actual
class.
• The actual value was positive, and the model predicted a positive value.

2. True Negative (TN): The predicted value matches the actual value, i.e., the predicted class matches the actual
class.
• The actual value was negative, and the model predicted a negative value.

3. False Positive (FP) – Type I Error: The predicted value was falsely predicted.
• The actual value was negative, but the model predicted a positive value.

4. False Negative (FN) – Type II Error: The predicted value was falsely predicted.
• The actual value was positive, but the model predicted a negative value.
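A small sketch using scikit-learn's confusion_matrix (the labels below are made up for illustration):

# Recover TP, TN, FP, FN from scikit-learn's confusion matrix, whose rows
# are actual classes and whose columns are predicted classes.
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]   # actual labels
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]   # model's predictions

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("TP:", tp, "TN:", tn, "FP:", fp, "FN:", fn)   # TP: 4 TN: 4 FP: 1 FN: 1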


What is accuracy?
Accuracy is a metric that measures how often a machine learning model correctly predicts the
outcome. You can calculate accuracy by dividing the number of correct predictions by the total
number of predictions.
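In terms of the confusion-matrix counts defined earlier:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$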


What is precision?
• Precision is a metric that measures how often a machine learning model correctly predicts the positive class.
You can calculate precision by dividing the number of correct positive predictions (true positives) by the total
number of instances the model predicted as positive (both true and false positives).

What is recall?
• Recall is a metric that measures how often a machine learning model correctly identifies positive instances
(true positives) from all the actual positive samples in the dataset. You can calculate recall by dividing the
number of true positives by the number of positive instances. The latter includes true positives (successfully
identified cases) and false negative results (missed cases).
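Equivalently, in confusion-matrix terms:

$$\text{Precision} = \frac{TP}{TP + FP}, \qquad \text{Recall} = \frac{TP}{TP + FN}$$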


What is underfitting?
• Underfitting occurs when a model is not able to make accurate predictions based on the training data and hence
does not have the capacity to generalize well to new data. Another case of underfitting is when a model is
not able to learn enough from the training data, making it difficult to capture the dominating trend (the
model is unable to create a mapping between the input and the target variable).


What is overfitting?
• A model is considered overfitting when it does extremely well on the training data but fails to perform at the same
level on the validation data (like a child who memorized every problem in a math problem book and would
struggle when facing problems from anywhere else). An overfitting model fails to generalize well, as it learns
the noise and patterns of the training data to the point where this negatively impacts the performance of the
model on new data.


Reasons for Underfitting

• The model is too simple, so it may not be capable of representing the complexities in the data.
• The input features used to train the model are not adequate representations of the underlying factors
influencing the target variable.
• The training dataset is too small.
• Excessive regularization is used to prevent overfitting, which constrains the model from capturing the data
well.
• Features are not scaled.

Reasons for Overfitting

• High variance and low bias.
• The model is too complex.
• The training dataset is too small.


How to avoid Overfitting and Underfitting?

• To avoid the issues of overfitting and underfitting, it is important to choose an appropriate model for the
given dataset.
• Hyperparameter tuning can also be performed.
• For overfitting, reducing the model complexity can help; similarly, for underfitting, the model complexity can
be increased.
• Since overfitting can be caused by too many features in the dataset and underfitting by too few, the number
of features can be decreased or increased during feature engineering to avoid overfitting and underfitting
respectively.


What is Cross-Validation?
• Cross-validation is a technique used in machine learning to evaluate the performance of a model on unseen
data. It involves dividing the available data into multiple folds or subsets, using one of these folds as a
validation set, and training the model on the remaining folds. This process is repeated multiple times, each
time using a different fold as the validation set. Finally, the results from each validation step are averaged to
produce a more robust estimate of the model's performance.
What is cross-validation used for?
• The main purpose of cross-validation is to guard against overfitting, which occurs when a model fits the
training data too closely and performs poorly on new, unseen data. By evaluating the model on multiple validation
sets, cross-validation provides a more realistic estimate of the model's generalization performance, i.e., its
ability to perform well on new, unseen data.
Types of Cross-Validation
• There are several types of cross-validation techniques, including k-fold cross-validation, leave-one-out
cross-validation, holdout validation, and stratified cross-validation. The choice of technique depends on the
size and nature of the data, as well as the specific requirements of the modeling problem.
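A minimal k-fold sketch, assuming scikit-learn's cross_val_score with k = 5 (the dataset and model are illustrative):

# 5-fold cross-validation: train on 4 folds, validate on the 5th, rotate,
# and average the five scores for a more robust performance estimate.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
print("per-fold accuracy:", scores)
print("mean accuracy:", scores.mean())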


What is Regularization in Machine Learning?


• Regularization refers to techniques that are used to calibrate machine learning models in order
to minimize the adjusted loss function and prevent overfitting or underfitting.


• Regularization Techniques
1. Ridge Regularization:
Also known as Ridge Regression, it modifies over-fitted or under-fitted models by adding a penalty
equivalent to the sum of the squares of the magnitudes of the coefficients.
This means that the mathematical function representing our machine learning model is minimized and the
coefficients are calculated. The magnitudes of the coefficients are squared and added to the loss. Ridge Regression
performs regularization by shrinking the coefficients. The cost function of ridge regression is shown below:
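In standard notation, where λ ≥ 0 controls the strength of the penalty:

$$\text{Cost} = \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2 + \lambda \sum_{j=1}^{p} \beta_j^2$$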


2. Lasso Regression:
• It modifies over-fitted or under-fitted models by adding a penalty equivalent to the sum of the absolute
values of the coefficients.
• Lasso regression also performs coefficient minimization, but instead of squaring the magnitudes of the
coefficients, it takes their absolute values. Because of this, some coefficients can be shrunk exactly to zero,
so Lasso also performs feature selection. Consider the cost function for Lasso regression:
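In the same notation, the Lasso cost replaces the squared penalty with absolute values:

$$\text{Cost} = \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2 + \lambda \sum_{j=1}^{p} \left| \beta_j \right|$$

A brief sketch contrasting the two in scikit-learn, where the alpha parameter plays the role of λ (the dataset is chosen purely for illustration):

# Ridge vs Lasso on the same data; alpha is the penalty weight lambda above.
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Lasso, Ridge

X, y = load_diabetes(return_X_y=True)
ridge = Ridge(alpha=1.0).fit(X, y)
lasso = Lasso(alpha=1.0).fit(X, y)
print("ridge coefficients:", ridge.coef_)   # shrunk toward zero
print("lasso coefficients:", lasso.coef_)   # several driven exactly to zero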

