The document provides an introduction to machine learning, defining it as a field that enables computers to learn from data without explicit programming. It discusses the increasing need for machine learning due to its ability to handle complex tasks and highlights various applications, advantages, and disadvantages of machine learning techniques. Additionally, it covers different types of machine learning, including supervised and unsupervised learning, along with key concepts such as training and testing datasets, overfitting, and performance measures.
Introduction to Machine learning

Module 1
What is Machine Learning?
● “ A Field of study that gives computers the ability to learn without being
explicitly programmed”
-Arthur Samuel (1959)

● Machine learning is a branch of AI in which algorithms use data to
analyse information and make decisions automatically, without human
intervention.
● It describes how computers perform tasks on their own based on previous
experience.
● The difference between conventional software and machine learning is
that a developer does not write code instructing the system how to react to
every situation; instead, the system is trained on a large amount of data.
Need for Machine Learning

● The need for machine learning is increasing day by day, because it is capable of doing
tasks that are too complex for a person to implement directly.
● As humans, we have limitations: we cannot process huge amounts of data manually. For
this we need computer systems, and machine learning makes things easy for us.
● We can train machine learning algorithms by providing them with huge amounts of data,
letting them explore the data, construct models, and predict the required output
automatically.
● The performance of a machine learning algorithm depends on the amount of data, and it
can be measured by the cost function. With the help of machine learning, we can save both
time and money.
Applications of Machine learning
● Virtual personal assistants
● Speech recognition
● Email spam and malware filtering
● Bioinformatics
● Natural language processing
● Traffic prediction
● Online transportation
● Social media services
● Product recommendation
● Online fraud detection
Advantages of ML
● Fast, accurate, and efficient.
● Automation of most applications.
● Wide range of real-life applications.
● Enhanced cyber security and spam detection.
● No human intervention is needed.
● Handles multi-dimensional data.
Disadvantages of ML
● It is difficult to identify and rectify errors.
● Acquiring enough high-quality data is hard.
● Interpretation of results requires more time and space.
Difference Between Machine Learning And Artificial Intelligence

● Artificial Intelligence is the concept of creating intelligent machines that simulate human
behaviour, whereas machine learning is a subset of Artificial Intelligence that allows machines
to learn from data without being explicitly programmed.

Difference between Signal Processing and Machine learning

● Signal processing is a branch of electrical engineering used to model and analyse analog
and digital data representations of physical events. All the technology we use today and
even rely on in our everyday lives (computers, radios, videos, mobile phones) is enabled
by signal processing. Hence, it truly represents the science behind our digital lives.
● Machine learning is the study of computer algorithms that learn to do prediction and/or
classification based on just a set of collected data.
Steps involved in developing a machine learning application
Types of Machine learning
1. Supervised Machine Learning
2. Unsupervised Machine Learning
3. Semi-Supervised Machine Learning
4. Reinforcement Learning
Supervised Learning
● Supervised learning is a machine learning method in which we provide sample
labeled data to the machine learning system in order to train it; on that basis, it predicts
the output.
● The system creates a model using labeled data to understand the dataset. Once training
and processing are done, we test the model by providing sample data to check whether it
predicts the correct output.
● As input data is fed into the model, it adjusts its weights until the model has been
fitted appropriately, which occurs as part of the cross-validation process. Supervised
learning helps organizations solve a variety of real-world problems at scale, such
as classifying spam into a separate folder from your inbox.
Supervised Machine Learning
Steps Involved in Supervised Learning:

● First, determine the type of training dataset.
● Collect/gather the labelled training data.
● Split the dataset into a training dataset, a test dataset, and a validation dataset.
● Determine the input features of the training dataset, which should carry enough information
for the model to accurately predict the output.
● Determine a suitable algorithm for the model, such as a support vector machine or a decision tree.
● Execute the algorithm on the training dataset. Sometimes we need validation sets as control
parameters; these are subsets of the training dataset.
● Evaluate the accuracy of the model on the test set. If the model predicts the correct
output, the model is accurate.
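The steps above can be sketched end to end with scikit-learn. This is a minimal illustration, not part of the original slides: the iris dataset and the decision tree classifier are arbitrary choices made for the example.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Gather a labelled dataset (iris is used purely for illustration)
X, y = load_iris(return_X_y=True)

# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

# Choose an algorithm and execute it on the training dataset
model = DecisionTreeClassifier(random_state=42)
model.fit(X_train, y_train)

# Evaluate the model's accuracy on the held-out test set
accuracy = accuracy_score(y_test, model.predict(X_test))
print(f'Test accuracy: {accuracy:.2f}')
```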
How supervised learning works
● Supervised learning uses a training set to teach models to
yield the desired output. This training dataset includes
inputs and correct outputs, which allow the model to learn
over time. The algorithm measures its accuracy through the
loss function, adjusting until the error has been sufficiently
minimized.
● Supervised learning can be separated into two types of
problems when data mining—classification and regression:
● Classification uses an algorithm to accurately assign test
data into specific categories. It recognizes specific entities
within the dataset and attempts to draw some conclusions
on how those entities should be labeled or defined.
● Common classification algorithms are linear classifiers,
support vector machines (SVM), decision trees, k-nearest
neighbors, and random forests.
● Regression algorithms are used when there is a relationship between the input variable and the
output variable. They are used for the prediction of continuous variables, such as weather
forecasting or market trends. Common algorithms include Linear Regression, Regression Trees,
Non-Linear Regression, Bayesian Linear Regression, and Polynomial Regression.

● Classification algorithms are used when the output variable is categorical, e.g. classes such as
Yes-No, Male-Female, or True-False, as in spam filtering. Common algorithms include
Random Forest, Decision Trees, Logistic Regression, and Support Vector Machines.

● Disadvantages of Supervised Learning:

● Supervised learning models are not suitable for handling very complex tasks.
● Supervised learning cannot predict the correct output if the test data differs from the
training dataset.
● Training requires a lot of computation time.
● In supervised learning, we need enough knowledge about the classes of objects.
Unsupervised Learning
● As the name suggests, unsupervised learning is a
machine learning technique in which models are not
supervised using a labeled training dataset. Instead, the
models themselves find hidden patterns and insights in the given data.
● The goal of unsupervised learning is to find the
underlying structure of the dataset, group the data
according to similarities, and represent the dataset in a
compressed format.
● For example, the task of an unsupervised learning algorithm may be to
identify image features on its own. The algorithm performs this task
by clustering the image dataset into groups according to the
similarities between images.
Unsupervised Learning
Clustering
Clustering: Clustering is a method of grouping objects into clusters such that
objects with the most similarities remain in one group and have few or no similarities with
the objects of another group.

The clustering technique can be widely used in various tasks. Some most common uses of this technique are:

● Market Segmentation
● Statistical data analysis
● Social network analysis
● Image segmentation

It is used by Amazon in its recommendation system to provide recommendations based on a user's past
product searches. Netflix also uses this technique to recommend movies and web series to its users based
on their watch history.
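As a sketch of clustering, k-means can group unlabeled points by similarity. The two synthetic blobs below are an assumption made purely for illustration:

```python
import numpy as np
from sklearn.cluster import KMeans

# Two well-separated blobs of 2-D points (synthetic, for illustration only)
rng = np.random.default_rng(0)
group_a = rng.normal(loc=[0, 0], scale=0.5, size=(50, 2))
group_b = rng.normal(loc=[5, 5], scale=0.5, size=(50, 2))
X = np.vstack([group_a, group_b])

# Group the unlabeled points into 2 clusters by similarity
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# Each point gets a cluster label; points from the same blob share a label
labels = kmeans.labels_
print(labels[:5], labels[-5:])
```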
Unsupervised Learning

● Association: An association rule is an unsupervised learning method used for
finding relationships between variables in a large database. It determines the set of
items that occur together in the dataset. Association rules make marketing strategy more
effective: for example, people who buy item X (say, bread) also tend to purchase item Y
(butter/jam). A typical example of association rules is Market Basket Analysis.
● Example: Search engines also work on the clustering technique. The search
results appear based on the objects closest to the search query, grouping similar data
objects into one group that is far from other, dissimilar objects. The accuracy of a query's
results depends on the quality of the clustering algorithm used.
Some popular unsupervised learning algorithms:

● K-means clustering
● Hierarchical clustering
● Anomaly detection
● Neural networks
● Principal Component Analysis
● Independent Component Analysis
● Apriori algorithm
● Singular value decomposition
Advantages of Unsupervised Learning

● Unsupervised learning can be used for more complex tasks than supervised
learning, because it does not require labeled input data.
● Unsupervised learning is often preferable because unlabeled data is easier to obtain than
labeled data.

Disadvantages of Unsupervised Learning

● Unsupervised learning is intrinsically more difficult than supervised learning because
there is no corresponding output to learn from.
● The results of an unsupervised learning algorithm may be less accurate because the input
data is not labeled, so the algorithm does not know the exact output in advance.
Supervised Vs Unsupervised Learning
Training and Testing Data Set

● In machine learning projects, we generally divide the original dataset into


training data and test data.
● We train our model over a subset of the original dataset, i.e., the training
dataset, and then evaluate whether it can generalize well to the new or unseen
dataset or test set.
Therefore, train and test datasets are the two key concepts of machine learning,
where the training dataset is used to fit the model, and the test dataset is used to
evaluate the model.

What is a Training Dataset?

The training data is the largest subset of the original dataset and is
used to train or fit the machine learning model. The training data is fed to the
ML algorithm, which lets it learn how to make predictions for the given task.
What is a Testing Dataset?
● For Unsupervised learning, the training data contains unlabeled data points, i.e., inputs
are not tagged with the corresponding outputs. Models are required to find the patterns
from the given training datasets in order to make predictions.
● On the other hand, for supervised learning, the training data contains labels in order to
train the model and make predictions.
● The type of training data that we provide to the model is highly responsible for the
model's accuracy and prediction ability. It means that the better the quality of the
training data, the better will be the performance of the model.
● Training data is approximately more than or equal to 60% of the total data for an ML
project.
● The test dataset is another subset of original data, which is independent of the
training dataset. Usually, the test dataset is approximately 20-25% of the total original data
for an ML project.
Overfitting and Underfitting
● At this stage, we can also check and compare the testing accuracy with the training
accuracy, which means how accurate our model is with the test dataset against the
training dataset.
● If the accuracy of the model on training data is greater than that on testing data, then
the model is said to have overfitting.
● On the other hand, the model is said to be under-fitted when it is not able to capture the
underlying trend of the data.
● It means the model shows poor performance even with the training dataset.
● In most cases, underfitting issues occur when the model is not perfectly suitable for
the problem that we are trying to solve.
● To avoid the underfitting issue, we can increase the training time of the model or
increase the number of features in the dataset; overfitting is addressed with techniques
such as cross-validation, early stopping, and regularization.
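Comparing training and testing accuracy, as described above, is a quick overfitting check. A hedged sketch (the breast-cancer dataset and the unconstrained decision tree are illustrative choices, not from the slides):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# An unconstrained decision tree can memorise the training data
model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

train_acc = model.score(X_train, y_train)
test_acc = model.score(X_test, y_test)

# Training accuracy noticeably above test accuracy suggests overfitting
print(f'train={train_acc:.2f} test={test_acc:.2f}')
```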
How to detect underfitting and overfitting
Cross-Validation
● There are various ways by which we can avoid overfitting in the
model, such as Using the Cross-Validation method, early stopping
the training, or by regularization etc.
● Cross-validation is a technique for validating the model efficiency by
training it on the subset of input data and testing on previously
unseen subset of the input data.
● We can also say that it is a technique to check how a statistical model
generalizes to an independent dataset.
Hence the basic steps of cross-validations are:

● Reserve a subset of the dataset as a validation set.
● Train the model using the training dataset.
● Evaluate model performance using the validation set. If the model
performs well on the validation set, proceed to the next step; otherwise,
check for issues.
Methods of cross validation
There are some common methods that are used for cross-validation. These methods are
given below:
1. Leave one out cross-validation
2. K-fold cross-validation
3. Stratified k-fold cross-validation
4. Time series Cross Validation
● Cross-validation dataset: It is used to overcome the disadvantage of a single train/test split
by splitting the dataset into several groups of train/test splits and averaging the results.
● It can be used if we want to optimize a model that has been trained on the training
dataset for the best performance.
● It is more efficient than a single train/test split, as every observation is used for both
training and testing.
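The k-fold idea above can be sketched with scikit-learn's `cross_val_score`; the dataset and model here are assumptions for illustration:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score, KFold
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# 5-fold cross-validation: every observation is used for both training and testing
cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=cv)

print(f'Fold accuracies: {scores}')
print(f'Mean accuracy: {scores.mean():.2f}')
```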
Hypothesis Testing in Machine learning
● To trust a model and make
predictions, we use hypothesis
testing. When we use sample data
to train a model, we make
assumptions about the population. By
performing hypothesis testing, we
validate these assumptions at a
desired significance level.
● ML professionals and data scientists make an initial assumption about the solution of
the problem. This assumption in machine learning is known as a hypothesis.
● A hypothesis is defined as a supposition or proposed explanation based on
insufficient evidence or assumptions. It is just a guess based on some known facts
but has not yet been proven.
● A Hypothesis is an assumption made by scientists, whereas a model is a
mathematical representation that is used to test the hypothesis.
● Hypothesis space (H):
● Hypothesis space is defined as a set of all possible legal hypotheses; hence it is
also known as a hypothesis set. It is used by supervised machine learning algorithms to
determine the best possible hypothesis to describe the target function or best maps input
to output.
● Hypothesis (h):
● It is defined as the approximate function that best describes the target in supervised
machine learning algorithms. It is primarily based on data as well as bias and restrictions
applied to data.
Key steps to perform a hypothesis test are as follows:

1. Formulate a hypothesis.
2. Determine the significance level.
3. Determine the type of test.
4. Calculate the test statistic and the p value. (The p value is a probability
between 0 and 1, computed under the assumption that the null hypothesis is true.)
5. Make a decision.
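A hedged sketch of these steps using a one-sample t-test from SciPy; the assumed population mean, the synthetic sample, and the significance level are all invented for illustration:

```python
import numpy as np
from scipy import stats

# Null hypothesis (assumed for illustration): the population mean is 50.
# We observe a synthetic sample drawn around 52.
rng = np.random.default_rng(1)
sample = rng.normal(loc=52, scale=2, size=40)

# Step 2: choose a significance level
alpha = 0.05

# Steps 3-4: a one-sample t-test gives the test statistic and p value
t_stat, p_value = stats.ttest_1samp(sample, popmean=50)

# Step 5: decide -- reject the null hypothesis if p < alpha
decision = 'reject H0' if p_value < alpha else 'fail to reject H0'
print(f't={t_stat:.2f}, p={p_value:.4f}, decision={decision}')
```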
Reference: https://www.youtube.com/watch?v=gP3tFs2mArw
Performance Measures
1. Performance Metrics for Classification

In machine learning, each task or problem is divided into classification and Regression.

Different evaluation metrics are used for both Regression and Classification tasks.

● Accuracy
● Confusion Matrix
● Precision
● Recall
● F-Score
● AUC(Area Under the Curve)-ROC
1. ACCURACY: Accuracy is the ratio of correctly predicted observations to the total
number of observations, i.e. (TP + TN) / (TP + TN + FP + FN).

2. CONFUSION MATRIX: A confusion matrix is a tabular representation of the prediction
outcomes of a binary classifier, used to describe the performance of the
classification model on a set of test data for which the true values are known.
from sklearn.metrics import accuracy_score

y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1]

accuracy = accuracy_score(y_true, y_pred)
print(f'Accuracy: {accuracy}')

If a model correctly predicts 90 out of 100 instances, the accuracy is 90%.


● Precision:

● Definition: Precision is the ratio of correctly predicted positive observations to


the total predicted positives.
● Example: If a model predicted 20 instances as positive and 18 of them were
actually positive, the precision is 18/20.

from sklearn.metrics import precision_score

# y_true and y_pred as defined in the accuracy example above
precision = precision_score(y_true, y_pred)
print(f'Precision: {precision}')
● Recall (Sensitivity or True Positive Rate):

● Definition: Recall is the ratio of correctly predicted positive observations to the


all observations in the actual class.
● Example: If there were a total of 30 actual positive instances, and the model
predicted 25 of them correctly, the recall is 25/30.

from sklearn.metrics import recall_score

# y_true and y_pred as defined in the accuracy example above
recall = recall_score(y_true, y_pred)
print(f'Recall: {recall}')
● F1 Score:

● Definition: The F1 Score is the harmonic mean of precision and recall. It ranges
from 0 to 1, where 1 is the best possible F1 Score.
● Example: If a model has precision of 0.8 and recall of 0.7, the F1 Score is 2 *
(0.8 * 0.7) / (0.8 + 0.7) ≈ 0.75.

from sklearn.metrics import f1_score

# y_true and y_pred as defined in the accuracy example above
f1 = f1_score(y_true, y_pred)
print(f'F1 Score: {f1}')
1. True Positive (TP): the model predicts positive, and the actual value is positive.
2. True Negative (TN): the model predicts negative, and the actual value is negative.
3. False Positive (FP): the model predicts positive, but the actual value is negative.
4. False Negative (FN): the model predicts negative, but the actual value is positive.
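Using the same example labels as in the accuracy snippet earlier, the four counts can be read off scikit-learn's confusion matrix:

```python
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1]

# For binary labels, ravel() flattens the 2x2 matrix into (TN, FP, FN, TP)
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f'TP={tp} TN={tn} FP={fp} FN={fn}')  # TP=3 TN=1 FP=1 FN=1
```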

III. Precision

Precision determines the proportion of positive predictions that were actually correct. It is
calculated as the True Positives divided by the total positive predictions (True
Positives + False Positives).

IV. Recall or Sensitivity

Recall is calculated as the True Positives divided by the total number of actual positives,
whether correctly predicted as positive or incorrectly predicted as negative (True Positives +
False Negatives).
When to use Precision and Recall?

From the above definitions of Precision and Recall, we can say that recall determines the
performance of a classifier with respect to a false negative, whereas precision gives
information about the performance of a classifier with respect to a false positive.

So, if we want to minimize false negatives, then recall should be as near to 100% as possible,
and if we want to minimize false positives, then precision should be as close to 100% as possible.

In simple words, maximizing precision minimizes FP errors, and maximizing recall
minimizes FN errors.
V. F-Scores
● F-score or F1 Score is a metric to evaluate a binary classification model on the basis of
predictions that are made for the positive class.
● It is calculated with the help of Precision and Recall.
● It is a type of single score that represents both Precision and Recall.
● So, the F1 Score can be calculated as the harmonic mean of both precision and Recall,
assigning equal weight to each of them.

● When to use F-Score?

● As the F-score makes use of both precision and recall, it should be used when both of them
are important for evaluation, but one (precision or recall) is slightly more important to
consider than the other: for example, when false negatives are comparatively more important
than false positives, or vice versa.
● To calculate the value at any point on a ROC curve, we could evaluate a logistic regression
model multiple times with different classification thresholds, but this would not be
very efficient.
● So, an efficient method is used instead, which is known as AUC.

● AUC: Area Under the ROC curve

AUC stands for Area Under the ROC Curve. As its name suggests, AUC
measures the two-dimensional area under the entire ROC curve.
ROC example:
● True Positives = Radar Operator interpreted signal as Enemy Planes and there were
Enemy planes (Good Result: No wasted Resources)
● True Negatives = Radar Operator said no planes and there were none (Good Result:
No wasted resources)
● False Positives = Radar Operator said planes, but there were none (Geese: wasted
resources)
● False Negatives = Radar Operator said no plane, but there were planes (Bombs
dropped: very bad outcome)
● Sensitivity = probability of correctly interpreting the radar signal as enemy planes
among those times when enemy planes were actually coming
○ SE = True Positives / (True Positives + False Negatives)
● Specificity = probability of correctly interpreting the radar signal as no enemy
planes among those times when no enemy planes were actually coming
○ SP = True Negatives / (True Negatives + False Positives)
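Sensitivity and specificity follow directly from the four counts. The counts below are invented for illustration, not taken from real radar data:

```python
# Illustrative confusion-matrix counts for the radar example (assumed values)
tp, tn, fp, fn = 80, 90, 10, 20

sensitivity = tp / (tp + fn)  # SE: true positive rate
specificity = tn / (tn + fp)  # SP: true negative rate

print(f'SE={sensitivity:.2f} SP={specificity:.2f}')  # SE=0.80 SP=0.90
```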
Performance Metrics for Regression
● Regression is a supervised learning technique that aims to find the relationships
between the dependent and independent variables.
● A predictive regression model predicts a numeric (continuous) value.
● The performance of a Regression model is reported as errors in the prediction.
Following are the popular metrics that are used to evaluate the performance of
Regression models.
• Mean Absolute Error.
• Mean Squared Error.
• R2 Score
• Adjusted R2.
R2 score
● The R2 score, or coefficient of determination, is a statistical measure that evaluates how well a
regression model's predictions align with actual data.
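The regression metrics listed above can be computed with scikit-learn. The actual and predicted values below are invented for illustration, and the adjusted R2 formula assumes a single predictor:

```python
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Illustrative actual vs predicted values (assumed, not from the slides)
y_true = [3.0, 5.0, 2.5, 7.0]
y_pred = [2.5, 5.0, 3.0, 8.0]

mae = mean_absolute_error(y_true, y_pred)   # average absolute error
mse = mean_squared_error(y_true, y_pred)    # average squared error
r2 = r2_score(y_true, y_pred)               # 1.0 means a perfect fit

# Adjusted R2 penalises R2 for the number of predictors p (assumed p=1 here)
n, p = len(y_true), 1
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)

print(f'MAE={mae:.3f} MSE={mse:.3f} R2={r2:.3f} AdjR2={adj_r2:.3f}')
```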
REFERENCES:
● https://www.javatpoint.com/
● https://www.analyticsvidhya.com/blog/
● Introduction to Machine Learning by Ethem Alpaydin
