Introduction to Machine
Learning
Dr. R. Rekha
Associate Professor
Department of IT,
PSG College of Technology,
Coimbatore
rekha.psgtech@gmail.com
rra.it@psgtech.ac.in
Mobile: 9842163683
21NN03 APPLIED MACHINE LEARNING
3 0 0 3
MATHEMATICAL BASICS AND LEARNING SYSTEM: Definition of learning systems, Goals and applications of machine learning, Probability theory, Statistical
decision theory, Learning versus design, Feasibility of learning, Training versus testing, Labeled versus unlabeled dataset, Error, Noise, Theory of
generalization, Hypothesis class, Vapnik-Chervonenkis (VC) dimension, Bias, Variance, Learning curve, Model selection, Under-fitting and over-fitting, Cross
validation, Concept representation, Function approximation.
(12)
SUPERVISED LEARNING: Learning a class from examples, Learning multiple classes, Dimensions of a supervised machine learning algorithm, Discriminant
functions, Probabilistic generative models, Probabilistic discriminative models, Logistic regression, Linear regression, Perceptron Learning Algorithm.
(12)
UNSUPERVISED AND ENSEMBLE METHODS: Clustering, Expectation maximization (EM) for soft clustering, Semi-supervised learning with EM using
labeled and unlabeled data, Ensemble learning: boosting, bagging, Sampling: Basic sampling methods - Markov Chain Monte Carlo.
(10)
REINFORCEMENT LEARNING: Model free reinforcement learning: Q Learning, Algorithm for learning Q, Convergence, Updating sequences strategies,
Model based learning: Value iteration - Policy iteration, K-Armed bandit - Elements.
(11)
Total L: 45
REFERENCES:
1. Tom Mitchell, “Machine Learning”, McGraw Hill, USA, 2017.
2. Christopher Bishop, “Pattern Recognition and Machine Learning”, Springer, USA, 2011.
3. Suresh Samudrala, "Machine Intelligence: Demystifying Machine Learning, Neural Networks
and Deep Learning", Notion Press, New Delhi, 2019
4. Abu-Mostafa Y S, Magdon-Ismail M and Lin H T, “Learning from Data”, AML Book Publishers,
USA, 2012.
5. Ethem Alpaydin, “Introduction to Machine Learning”, 3rd Edition, PHI Learning Private, USA,
2015.
6. Kevin P. Murphy, “Machine Learning: A Probabilistic Perspective”, MIT Press, USA, 2012.
Theory Courses with no Tutorial Component
(CA: 50% + FE: 50%)
• CA Distribution:
(i) Assignment Presentation 10 Marks
(ii) Objective Tests I (Surprise type) 05 Marks
(iii) Objective Tests II (Surprise type) 05 Marks
(iv) Internal Tests (Average of 2): 30 Marks
• Test I (conducted for 50 marks) 30 Marks
• Test II (conducted for 50 marks) 30 Marks
• Final Examination (FE) 50 Marks
Google classroom code
• q246tgu
• https://classroom.google.com/c/NTU2MDkwNzI2NDM5?cjc=q246tgu
Enormous Opportunities for
Machine Learning
• Virtual Personal Assistants: Siri, Alexa, Google Now
• Predictions while Commuting
• Video Surveillance
• Social Media Services
• Email Spam and Malware Filtering
• Online Customer Support
• Search Engine Result Refining
• Product Recommendations
A Bit of History
What is Machine Learning?
A computer program is said to learn from experience E with respect to some
class of tasks T and performance measure P, if its performance at tasks in T, as
measured by P, improves with experience E.
How do machines learn?
We don’t want to code the logic for our program; instead, we want the machine to figure out the logic from the data on its own.
Payroll of employees in the company:
How do we predict the salary for a person with 8 or 16 years of experience?
if experience <= 10:
    salary = experience * 1.5 * 100000
else:
    salary = experience * 2 * 100000
We don’t do this here! In ML we don’t hand-write if/else rules, because we are not focused on writing the algorithm ourselves.
• Machines find the relation between experience, job level, rare skill and salary.
The factor of 1.5 or 2 used in the previous example is called a weight!
The columns in yellow are called features and the column in red is called the label.
So ML calculates the weights of the features that contribute to deciding the label, based on the algorithm we use!
Salary = Experience * Weight_1 + JobLevel * Weight_2 + Skill * Weight_3
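As a rough sketch of this idea (the data values and column names below are made up for illustration, not taken from the slides), a linear regression model can learn these weights directly from example rows:

# Minimal sketch: learn salary weights from hypothetical example data.
import numpy as np
from sklearn.linear_model import LinearRegression

# Features: [experience (years), job level, rare-skill flag]
X = np.array([
    [2, 1, 0],
    [5, 2, 0],
    [8, 2, 1],
    [12, 3, 1],
    [16, 4, 1],
])
y = np.array([300000, 750000, 1400000, 2400000, 3200000])  # salary (label)

model = LinearRegression().fit(X, y)
print("Learned weights:", model.coef_)      # Weight_1, Weight_2, Weight_3
print("Intercept:", model.intercept_)
print("Prediction for 8 years, level 2, rare skill:", model.predict([[8, 2, 1]]))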
How does Machine Learning work?
Machine Learning ≈ Looking for a
Function
Taxonomy of Machine Learning
STAGES OF MACHINE LEARNING
• Gathering data
• Kaggle and UCI Machine Learning Repository
• Data pre-processing
• 80/20 rule
• Missing data, Noisy data, Inconsistent data
• Numeric, Categorical, Ordinal
• Conversion of data, Ignoring the missing values,
Filling the missing values, Outlier detection
• Researching the model that will be best for the type
of data – Supervised, Unsupervised
• Training and testing the model
– ‘Training data’ ,‘Validation data’ and ‘Testing data’.
• Evaluation – Confusion matrix, Accuracy
True positives : cases in which we predicted TRUE and the actual output is also TRUE, so our prediction is correct.
True negatives : we predicted FALSE and the actual output is also FALSE, so our prediction is correct.
False positives : we predicted TRUE, but the actual output is FALSE.
False negatives : we predicted FALSE, but the actual output is TRUE.
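A quick way to obtain these four counts in practice (the labels below are hypothetical, just to illustrate the call) is scikit-learn's confusion_matrix:

# Sketch: counting TP, TN, FP, FN from hypothetical predictions.
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1, 0]   # actual labels (1 = TRUE, 0 = FALSE)
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]   # model predictions

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("TP:", tp, "TN:", tn, "FP:", fp, "FN:", fn)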
Machine Learning Framework
Basic mathematics for machine learning
Sources of data:
What is data?
• Data is a collection of facts, such as numbers, words,
measurements, observations or just descriptions of things.
Why Is Data Dirty?
• Incomplete data may come from
• “Not applicable” data value when collected
• Different considerations between the time when the data was collected and when it is analyzed.
• e.g., occupation=“ ”
• Noisy data (incorrect values) may come from
• Faulty data collection instruments
• Human or computer error at data entry
• Errors in data transmission
• e.g., Salary=“-10”
• Inconsistent data may come from
• Different data sources
• e.g., Was rating “1,2,3”, now rating “A, B, C”
How to Handle Incomplete data ?
• Ignore the tuple:
• usually done when class label is missing
• Not effective when the percentage of missing values per attribute varies considerably.
• Fill in the missing value manually:
• tedious + infeasible
• Fill in it automatically with
• a global constant : e.g., “unknown”, a new class?!
• the attribute mean
• the most probable value:
• inference-based such as Bayesian formula or decision tree
(e.g., predict age based on the info)
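For instance, with pandas (the column names and values here are invented for illustration), the options above look like this:

# Sketch: a few ways to handle missing values with pandas (hypothetical data).
import numpy as np
import pandas as pd

df = pd.DataFrame({"age": [25, np.nan, 40, 35],
                   "occupation": ["engineer", None, "teacher", None]})

dropped   = df.dropna()                           # ignore the tuple
constant  = df.fillna({"occupation": "unknown"})  # fill with a global constant
mean_fill = df.fillna({"age": df["age"].mean()})  # fill with the attribute mean
print(dropped, constant, mean_fill, sep="\n\n")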
How to Handle Noisy Data?
• Binning
• first sort data and partition into (equal-frequency) bins
• then one can smooth by bin means, smooth by bin median,
smooth by bin boundaries, etc.
• Regression
• smooth by fitting the data into regression functions
• Clustering
• detect and remove outliers
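A small sketch of equal-frequency binning followed by smoothing by bin means (the values are arbitrary sample data):

# Sketch: equal-frequency binning and smoothing by bin means.
import numpy as np

data = np.sort(np.array([4, 8, 9, 15, 21, 21, 24, 25, 26, 28, 29, 34]))
bins = np.array_split(data, 4)                        # 4 equal-frequency bins
smoothed = [np.full(len(b), b.mean()) for b in bins]  # replace each value by its bin mean
print([b.tolist() for b in bins])
print([s.round(1).tolist() for s in smoothed])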
Data Transformation
• Normalization:
• scaled to fall within a small, specified range
• min-max normalization
• z-score normalization
• normalization by decimal scaling
Min max normalization
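The standard formulas behind these three methods (in the usual textbook notation, where A is the attribute being scaled and v is an original value):
• Min-max normalization (to a new range [new_min_A, new_max_A]):
  v' = (v − min_A) / (max_A − min_A) × (new_max_A − new_min_A) + new_min_A
• Z-score normalization: v' = (v − mean_A) / std_A
• Decimal scaling: v' = v / 10^j, where j is the smallest integer such that max(|v'|) < 1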
Accuracy is a statistical measure defined as the number of correct predictions made by a classifier divided by the total number of predictions made by the classifier.
The classifier in our example correctly predicted 42 male instances and 32 female instances.
Therefore, the accuracy can be calculated by:
accuracy = (42 + 32) / (42 + 8 + 18 + 32)
which is 0.74
Suppose a spam recognition classifier is described by the following confusion matrix:
Accuracy: (TN + TP) / (TN + TP + FN + FP)
Precision is the ratio of the correctly identified positive cases to all the predicted positive cases
Precision: TP/ (TP + FP)
Recall, also known as sensitivity, is the ratio of the correctly identified positive cases to all the actual positive cases
Recall: TP/ (TP + FN)
Confusion matrix (rows = actual, columns = predicted):
              Predicted Spam   Predicted Ham
Actual Spam         12               14
Actual Ham           0              114

Treating "ham" as the positive class (TP = 114, FP = 14, FN = 0, TN = 12):
precision = 114 / (114 + 14) ≈ 0.89
recall = 114 / (114 + 0) = 1.00
When a spam mail is not recognized as "spam" and is instead presented to us as "ham":
- If the percentage is not too high, it is annoying but not a disaster.
In contrast, when a non-spam message is wrongly labeled as spam, the email will in many cases not be shown, or may even be deleted automatically.
- This carries a high risk of losing customers and friends.
There is a risk of making each type of error in every analysis, and how much of each risk you accept is under your control.
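A small check of these numbers in code, using the matrix above with "ham" as the positive class:

# Sketch: accuracy, precision and recall for the spam/ham matrix above.
TP, FP, FN, TN = 114, 14, 0, 12   # positive class = "ham"

accuracy  = (TP + TN) / (TP + TN + FP + FN)
precision = TP / (TP + FP)
recall    = TP / (TP + FN)
print(f"accuracy={accuracy:.2f}, precision={precision:.2f}, recall={recall:.2f}")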
Symmetric vs. Skewed Data
■ Median, mean and mode of symmetric, positively and negatively
skewed data
Positively and Negatively Correlated Data
Not Correlated Data
PROBABILITY
Probability
• How likely something is to happen.
• Many events can't be predicted with total certainty.
• The best we can say is how likely they are to happen, using the idea of
probability.
Tossing a Coin
• When a coin is tossed, there are two possible outcomes:
• heads (H) or
• tails (T)
• We say that the probability of the coin landing H is ½
• And the probability of the coin landing T is ½
Throwing Dice
• When a single die is thrown, there are six possible
outcomes: 1, 2, 3, 4, 5, 6.
• The probability of any one of them is 1/6
Example: the chances of rolling a "4" with a die
• Number of ways it can happen: 1
• (there is only 1 face with a "4" on it)
• Total number of outcomes: 6
• (there are 6 faces altogether)
• So the probability = 1/6
There are 5 marbles in a bag: 4 are blue, and 1 is red. What is the
probability that a blue marble gets picked?
• Number of ways it can happen: 4 (there are 4 blues)
• Total number of outcomes: 5 (there are 5 marbles in total)
• So the probability = 4/5 = 0.8
Frequentist probability
• The frequentist probability denotes the frequency with which an event occurs across many trials.
• Rolling a die is frequentist: a probability of 1/6 means that out of infinitely
many rolls of the die, a 6 will show up in roughly one sixth of them.
Not all scenarios are frequency related
Bayesian probability
• We say that this event could occur with a certain
probability/certainty.
• Consider the statement — there’s a 32% chance that a diabetic
patient is going to develop heart failure.
• This statement isn’t prone to repetition where we create infinite
replicas of the patient’s symptoms.
• We instead quantify with a 32% certainty that heart failure could
happen.
EVENTS
• A probability event can be defined as a set of outcomes of an experiment.
• The toss of a coin and the throw of a die are examples of random events.
• Example Events:
• Getting a Tail when tossing a coin is an event
• Rolling a "5" is an event.
• Events can be:
• Independent (each event is not affected by other events),
• Dependent (also called "Conditional", where an event is affected by other
events)
• Mutually Exclusive (events can't happen at the same time)
Independent Events
• Events can be "Independent", meaning each event is not affected by
any other events.
• Example: You toss a coin three times and it comes up "Heads" each
time ... what is the chance that the next toss will also be a "Head"?
• The chance is simply 1/2, or 50%, just like ANY OTHER toss of the coin.
• What it did in the past will not affect the current toss!
Dependent Events
• Some events can be "dependent" ... which means they can be affected by
previous events.
• Example: Drawing 2 Cards from a Deck
• Let's look at the chances of getting a King.
• For the 1st card the chance of drawing a King is 4 out of 52
• But for the 2nd card:
• If the 1st card was a King, then the 2nd card is less likely to be a King, as only 3 of the 51
cards left are Kings.
• If the 1st card was not a King, then the 2nd card is slightly more likely to be a King, as 4
of the 51 cards left are Kings.
Mutually Exclusive
• Mutually Exclusive means we can't get both events at the same time.
• Examples:
• Turning left or right are Mutually Exclusive (you can't do both at the same
time)
• Heads and Tails are Mutually Exclusive
MARGINAL PROBABILITY
• It gives the probabilities of various values of the variables
without reference to the values of the other variables
P(Female) = 0.46 which completely
ignores the sport the Female prefers,
P(Rugby) = 0.25 completely ignores
the gender.
Joint Probability
• The Joint probability is a statistical measure that is used to
calculate the probability of two events occurring together at the
same time — P(A and B) or P(A,B).
The joint probability of someone being a male
and liking football is 0.24.
The Joint probability is symmetrical meaning
that P(Male and Football) = P(Football and
Male)
Joint probability
• Find the probability that a candidate has got additional certification and also a good
salary package (counts taken from a table of candidates without vs. with additional certification).
• Ans: 30/105 ≈ 0.286
Conditional probability
• It defines the probability of one event occurring given that another
event has occurred
• If we want to calculate the probability that a person would like
Rugby given that they are a female, we must take the joint
probability that the person is female and likes rugby (P(Female
and Rugby)) and divide it by the probability of the condition.
• P(Female, Rugby) = 0.05
• P(Female) = 0.46
• P(Rugby | Female) = 0.05 / 0.46 = 0.11 (to 2 decimal places).
Bayes' Theorem
• Way of finding a probability when we know certain other
probabilities.
• Which tells us:
• how often A happens given that B happens, written P(A|B),
• When we know:
• how often B happens given that A happens, written P(B|A)
• and how likely A is on its own, written P(A)
• and how likely B is on its own, written P(B)
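Putting these four quantities together gives the usual statement of the theorem:
P(A|B) = P(B|A) × P(A) / P(B)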
EXAMPLE
• Past data tells you that 10% of patients entering your clinic have liver disease. The litmus test says
that “Patient is an alcoholic.” Five percent of the clinic’s patients are alcoholics. You might also
know that among those patients diagnosed with liver disease, 7% are alcoholics. Find the chances
of a person having liver disease given that the person is alcoholic.
• Solution:
• A – Liver disease, B – Alcoholic
• P(Liver disease) = 0.10, P(Alcoholic) = 0.05
• the probability that a patient is alcoholic, given that they have liver disease, is 7%, i.e.
P(Alcoholic | Liver Disease) = 0.07
• Bayes’ theorem tells you:
• P(Liver disease | Alcoholic) = (0.07 × 0.1) / 0.05 = 0.14
• In other words, if the patient is an alcoholic, their chances of having liver disease is 0.14
(14%).
Example
• You are planning a picnic today, but the morning is cloudy
• 50% of all rainy days start off cloudy!
• But cloudy mornings are common (about 40% of days start cloudy)
• And this is usually a dry month (only 3 of 30 days tend to be rainy, or 10%)
• What is the chance of rain during the day?
• Answer: P(Rain | Cloud) = P(Rain) × P(Cloud | Rain) / P(Cloud) = (0.1 × 0.5) / 0.4 = 0.125, so there is a 12.5% chance of rain.
Example
• Can you discover P(Man|Pink)?
Example
• Hunter says she is itchy. There is a test for Allergy to Cats, but
this test is not always right:
• For people that really do have the allergy, the test says
"Yes" 80% of the time
• For people that do not have the allergy, the test says
"Yes" 10% of the time ("false positive")
• If 1% of the population have the allergy, and Hunter's test
says "Yes", what are the chances that Hunter really has the
allergy?
• Solution:
• We want to know the chance of having the allergy when the test says "Yes", written P(Allergy | Yes).
• P(Allergy | Yes) = P(Yes | Allergy) × P(Allergy) / P(Yes) = (0.8 × 0.01) / (0.8 × 0.01 + 0.1 × 0.99) = 0.008 / 0.107 ≈ 0.075, i.e. only about a 7.5% chance.
PROBABILITY DISTRIBUTION
EXAMPLE
Types of Distributions
1. Binomial Distribution
2. Bernoulli Distribution
3. Uniform Distribution
4. Normal Distribution
5. Poisson Distribution
6. Exponential Distribution
BINOMIAL DISTRIBUTION
Which flavor people prefer most??
1. Binomial Distribution
A binomial distribution graph where the probability of success does not equal the probability of failure is skewed (asymmetric).
When the probability of success equals the probability of failure, the graph of the binomial distribution is symmetric and bell-shaped.
EXPONENTIAL
DISTRIBUTION
NORMAL DISTRIBUTION
CENTRAL LIMIT THEOREM
Use of central limit theorem
• Biologists use the central limit theorem whenever they use data
from a sample of organisms to draw conclusions about the
overall population of organisms.
• For example, a biologist may measure the height of 30 randomly
selected plants and then use the sample mean height to estimate
the population mean height.
• If the biologist finds that the sample mean height of the 30 plants is 10.3
inches, then her best guess for the population mean height will also be
10.3 inches.
Surveys
• Human Resources departments often use the central limit theorem when using
surveys to draw conclusions about overall employee satisfaction at companies.
• For example, the HR department of some company may randomly select 50
employees to take a survey that assesses their overall satisfaction on a scale of 1
to 10.
• If it’s found that the average satisfaction among employees in the survey is 8.5
then the best guess for the average satisfaction rating of all employees at the
company is also 8.5.
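A quick simulation of the idea behind these examples (the population distribution and sample size are arbitrary choices for illustration): sample means of even a skewed population pile up around the true mean and look roughly normal.

# Sketch: central limit theorem by simulation, using an exponential population.
import numpy as np

rng = np.random.default_rng(0)
population_mean = 10.0
sample_means = [rng.exponential(population_mean, size=50).mean()
                for _ in range(10_000)]
print("mean of sample means:", round(float(np.mean(sample_means)), 2))  # close to 10
print("std of sample means:", round(float(np.std(sample_means)), 2))    # about 10/sqrt(50)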
Example
• Uniform distribution
• The probability of getting heads in a coin flip is 0.5, and the probability
of tails is also 0.5.
• In the case of a die, the probability of getting a specific number
between 1 and 6 is 1/6 ≈ 0.167.
• In both these examples, the probabilities are uniformly distributed,
which means that each value has the same probability.
• Normal Distribution
• If you look at the distribution of heights within a population, you will find
that some heights are more common than others.
EXAMPLE - binomial distribution
• Banks use the binomial distribution to model the probability that a certain number of credit card
transactions are fraudulent.
• For example, suppose it is known that 2% of all credit card transactions in a certain region are
fraudulent. If there are 50 transactions per day in a certain region, we can use a Binomial Distribution
Calculator to find the probability that more than a certain number of fraudulent transactions occur in a
given day:
• P(X > 1 fraudulent transaction) = 0.26423
• P(X > 2 fraudulent transactions) = 0.07843
• P(X > 3 fraudulent transactions) = 0.01776
• And so on.
• This gives banks an idea of how likely it is that more than a certain number of fraudulent transactions
will occur in a given day.
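These probabilities can be reproduced with scipy's binomial distribution (n = 50 transactions per day, p = 0.02 probability of fraud):

# Sketch: P(X > k) for X ~ Binomial(n=50, p=0.02), matching the values above.
from scipy.stats import binom

n, p = 50, 0.02
for k in (1, 2, 3):
    print(f"P(X > {k}) = {binom.sf(k, n, p):.5f}")
# P(X > 1) = 0.26423, P(X > 2) = 0.07843, P(X > 3) = 0.01776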
Need for Probability distribution
• Probability distributions indicate the likelihood (the chance of
something happening) of an event or outcome.
• Probability distributions are used to determine the risk of certain
outcomes.
• You can use this information to make better decisions
• Probability distributions cannot guarantee an outcome; they only
give the probability of observing any given outcome.
• If the probability distribution is correct, then repeating the same
experiment multiple times will provide results that follow the trend of the
underlying probability distribution.
Let's get some terminology straight: generally, when we say 'a model' we refer to a particular method
for describing how some input data relates to what we are trying to predict. We don't generally refer to
particular instances of that method as different models. So you might say 'I have a linear regression model'
but you wouldn't call two different sets of the trained coefficients different models. At least not in the context of
model selection.
So, when you do K-fold cross validation, you are testing how well your model is able to get trained by
some data and then predict data it hasn't seen. We use cross validation for this because if you train using all
the data you have, you have none left for testing. You could do this once, say by using 80% of the data to
train and 20% to test, but what if the 20% you happened to pick to test happens to contain a bunch of points
that are particularly easy (or particularly hard) to predict? We will not have come up with the best estimate
possible of the model's ability to learn and predict.
We want to use all of the data. So to continue the above example of an 80/20 split, we would do 5-
fold cross validation by training the model 5 times on 80% of the data and testing on 20%. We ensure that
each data point ends up in the 20% test set exactly once. We've therefore used every data point we have to
contribute to an understanding of how well our model performs the task of learning from some data and
predicting some new data.
But the purpose of cross-validation is not to come up with our final model. We don't use these 5 instances of
our trained model to do any real prediction. For that we want to use all the data we have to come up with the
best model possible. The purpose of cross-validation is model checking, not model building.
Now, say we have two models, say a linear regression model and a neural network. How can we say
which model is better? We can do K-fold cross-validation and see which one proves better at predicting the
test set points. But once we have used cross-validation to select the better-performing model, we train that
model (whether it be the linear regression or the neural network) on all the data. We don't use the actual
model instances we trained during cross-validation as our final predictive model.
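A minimal sketch of this model-checking step with scikit-learn (the dataset and the two candidate models are arbitrary stand-ins):

# Sketch: 5-fold cross-validation to compare two candidate models,
# then refitting the chosen one on all of the data.
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPRegressor

X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

candidates = {
    "linear regression": LinearRegression(),
    "neural network": MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000,
                                   random_state=0),
}
for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=5)   # R^2 on each held-out fold
    print(f"{name}: mean CV score = {scores.mean():.3f}")

# Cross-validation was only for checking; the final model is trained on all the data.
final_model = LinearRegression().fit(X, y)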
GENERALIZATION
Underfitting is a scenario in data science where a data model is unable to capture the relationship
between the input and output variables accurately, generating a high error rate on both the
training set and unseen data.
Overfitting is a concept in data science which occurs when a statistical model fits too closely against
its training data. When the model memorizes the noise and fits too closely to the training set, the
model becomes “overfitted,” and it is unable to generalize well to new data.
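A common way to see both failure modes is to fit polynomials of different degrees to the same noisy data (the data below is synthetic, purely for illustration): a low degree underfits, a very high degree overfits.

# Sketch: underfitting vs. overfitting with polynomial fits on synthetic data.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 30)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.shape)  # noisy training data

x_test = np.linspace(0, 1, 100)
y_test = np.sin(2 * np.pi * x_test)                              # noise-free test data

for degree in (1, 3, 15):
    coeffs = np.polyfit(x, y, degree)
    train_err = np.mean((np.polyval(coeffs, x) - y) ** 2)
    test_err = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree:2d}: train MSE = {train_err:.3f}, test MSE = {test_err:.3f}")
# Degree 1: high error on both sets (underfitting).
# Degree 15: near-zero training error but larger test error (overfitting).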
• When choosing a classifier for your data, an obvious question to
ask is “What kind of data can this classifier classify?”.
• For example, if you know your points can easily be separated
by a single line, you may opt to choose a simple linear
classifier, whereas if you know your points will be in many
separate groups, you may opt to choose a more powerful
classifier such as a random forest or multilayer perceptron.
• This fundamental question can be answered using a
classifier’s VC dimension, that formally quantifies the power of
a classification algorithm.
The VC dimension of a classifier is defined by Vapnik and
Chervonenkis to be the cardinality (size) of the largest set of
points that the classification algorithm can shatter.
In order to have a VC dimension of at least N, a classifier must
be able to shatter a single configuration of N points.
In classification in general, the hypothesis class is the set of possible classification
functions you're considering;
the learning algorithm picks a function from the hypothesis class.
For a decision tree learner, the hypothesis class would just be the set of all possible
decision trees.
• VC dimension is a formal measure of bias.
• The VC dimension of a representation system is defined to be
• the maximum number of datapoints that can be separated (i.e.,
grouped) in all possible ways.
• Another way of saying this is to describe it as the most datapoints that
can be `shattered' by the representation.
• More powerful representations are able to shatter larger sets of
datapoints. These have higher VC dimension.
• Less powerful representations can only shatter smaller sets of
datapoints. These then have lower VC dimension.
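For example, a linear classifier in the plane can shatter a suitable set of 3 points but no set of 4, so its VC dimension is 3. A brute-force sketch of the shattering check (using a perceptron as the linear classifier is an assumption of this illustration, not something stated in the slides):

# Sketch: check whether a linear classifier can shatter a given set of points.
from itertools import product
import numpy as np
from sklearn.linear_model import Perceptron

def can_shatter(points):
    """True if some linear separator realizes every labeling of the points."""
    points = np.asarray(points, dtype=float)
    for labels in product([0, 1], repeat=len(points)):
        if len(set(labels)) < 2:
            continue  # single-class labelings are trivially realizable
        clf = Perceptron(max_iter=1000, tol=None).fit(points, labels)
        if clf.score(points, labels) < 1.0:   # this labeling cannot be separated
            return False
    return True

three_points = [(0, 0), (1, 0), (0, 1)]
four_points = [(0, 0), (1, 1), (1, 0), (0, 1)]   # the XOR labeling is not separable
print(can_shatter(three_points))  # True
print(can_shatter(four_points))   # False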
ERROR
• Error measures are a tool in ML that quantify the question “how
wrong was our estimation”.
• It is a function that compares the output of a learned hypothesis with
the output of the real target function.
• What this means in practice is that we compare the prediction of our
model with the real value in data.
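Two common examples of such error measures, written out as a small sketch (the example vectors are hypothetical):

# Sketch: two common error measures comparing predictions with true values.
import numpy as np

def mean_squared_error(y_true, y_pred):
    return np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)

def classification_error(y_true, y_pred):
    return np.mean(np.asarray(y_true) != np.asarray(y_pred))

print(mean_squared_error([3.0, 5.0, 2.5], [2.5, 5.0, 4.0]))   # 0.833...
print(classification_error([1, 0, 1, 1], [1, 1, 1, 0]))       # 0.5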
Bias-Variance-Noise Tradeoff
• The prediction error of a machine learning model, that is, the difference
between ground truth and the learned model, is traditionally composed of
three parts:
Error = Variance + Bias + Noise
• Here, variance measures the fluctuation of learned functions given different
datasets,
• bias measures the difference between the ground truth and the best possible
function within our modeling space,
• Noisy Target: a target function can have different outputs for two identical
observations, e.g. x₁ = x₂ = (1, 2, 3) but f(x₁) = yes while f(x₂) = no
(two different outputs for x₁ = x₂).
• and noise refers to the irreducible error due to non-deterministic outputs of the
ground truth function itself.
Example: Playing Dice
1. No Features
• Rolling a die is associated with generating a random
number M between one and six, each with an equal probability of about 16.7%.
• From this perspective, the output function (the number rolled) is
completely non-deterministic, and the error is fully characterized by
the noise term.
Error = Noise
• Since we are not using any features, there is no model to learn and
thus no variance.
2. Some Features
• Let’s repeat the same experiment, but this time, let’s record the number N facing
up at the moment the die is released and the height h it is dropped from.
• Based on these two features, we can generate pairs of training data x = (h,
N) and y = M, where M is the result of the die roll.
• Armed with enough training data and a good model, we expect that for two new
inputs h and N our model is able to predict M better than chance.
• For example, if N = 1 and h = 5 cm, our model may predict M = 1 with 60%
confidence and thus show an improvement over our first model, which would
have predicted M = 1 with only about 16.7% confidence.
• Thus, we managed to reduce the overall prediction error by reducing the noise
term. By training a machine learning model we also introduced a bias and
variance term in our overall error.
Error = Variance + Bias + Noise
3. All Features
• Rolling a die is completely deterministic and there is absolutely no
randomness in it.
• As long as we keep track of all relevant quantities such as initial speed,
angular momentum, air resistance, drop height, etc., we can predict the
outcome of the roll with 100% accuracy.
• In this case, we expect that noise is completely eliminated and we are left
with just bias and variance.
Error = Variance + Bias
• If we consider a very complex (and suitable) modeling space, we will have
almost no bias.
• If we further assume a huge amount of training data, our variance will also
be very small. In such a case, our overall prediction error will be close to
zero.
Practical Implications
• The same techniques that reduce bias also reduce noise, and vice
versa.
• In particular, techniques that reduce variance such as collecting
more training samples won’t help reduce noise.
• Adding more features and considering more complex models will
help reduce both noise and bias.