Lecture 08

Machine Learning

ICT-4261

By-
Dr. Jesmin Akhter
Professor
Institute of Information Technology
Jahangirnagar University
Contents
The course will mainly cover the following topics:
 A Gentle Introduction to Machine Learning
 Linear Regression
 Logistic Regression
 Naive Bayes
 Support Vector Machines
 Decision Trees and Ensemble Learning
 Clustering Fundamentals
 Hierarchical Clustering
 Neural Networks and Deep Learning
 Unsupervised Learning
Outline

 Logistic Regression
– Gradient Descent
– Linear regression vs logistic regression
– Types of logistic regression
– Key properties of the logistic regression equation
– Stochastic gradient descent algorithms
– Linear classification
Gradient Descent

 Gradient descent is an iterative process: at each step it moves down the slope of the cost function and gets closer to a local minimum, which is how we optimize the parameters of a machine learning model. It is particularly common in neural networks, but is also used in logistic regression and support vector machines.
 It is the most typical method for iterative minimization of a cost function. Its major limitation, though, is that it is guaranteed to converge only to a local, not necessarily global, minimum.
Gradient Descent

 The gradient is calculated with respect to the model's vector of parameters, typically the weights $\theta$.
 The sign of the gradient tells us the direction of the closest minimum of the cost function. For a given parameter $\theta_j$, we iteratively optimize the vector by computing:

\[
\theta_j := \theta_j - \alpha \frac{\partial J(\theta)}{\partial \theta_j}
\]

[Figure: graphical representation of gradient descent stepping down the cost curve toward a minimum.]
Gradient Descent

 At step $j$ of the iteration, the weights are all modified by the product of the hyperparameter $\alpha$ (the learning rate) and the gradient of the cost function computed with those weights. If the gradient is positive, we decrease the weights; conversely, if the gradient is negative, we increase them.

 We can summarize the gradient descent algorithm as:

• Start with a random $\theta$
• Loop until convergence:
– Compute the gradient $\nabla_{\theta} J(\theta)$
– Update $\theta := \theta - \alpha \nabla_{\theta} J(\theta)$
• Return $\theta$
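As a minimal Python sketch of this loop (the quadratic cost $J(\theta) = (\theta - 3)^2$, its gradient $2(\theta - 3)$, and all numeric choices are illustrative, not from the slides):

```python
import random

def gradient(theta):
    # Gradient of the illustrative cost J(theta) = (theta - 3)^2
    return 2.0 * (theta - 3.0)

def gradient_descent(alpha=0.1, tol=1e-8, max_iters=10_000):
    theta = random.uniform(-10, 10)        # start with a random theta
    for _ in range(max_iters):
        grad = gradient(theta)             # compute the gradient at theta
        new_theta = theta - alpha * grad   # step against the gradient
        if abs(new_theta - theta) < tol:   # convergence check
            break
        theta = new_theta
    return theta

print(gradient_descent())  # converges near the minimum at theta = 3
```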
Minimizing the Cost with Gradient Descent

 How can gradient descent iteratively approximate the local minimum of a function to an arbitrary degree of precision?
 We start by identifying a starting point $\theta^{(0)}$ in sufficient proximity to the function's local minimum.
 Then, iteratively, we move towards the closest local minimum by exploiting the gradient of the function around the current point.
Minimizing the Cost with Gradient Descent

 Gradient descent is an iterative optimization algorithm that finds the minimum of a differentiable function. In this process, we try different parameter values and update them to reach the optimal ones, minimizing the output.
 We can apply this method to the cost function of logistic regression and find an optimal solution that minimizes the cost over the model parameters.
 We use the sigmoid function as the hypothesis function in logistic regression:

\[
h_\theta(x) = \sigma(\theta^{\top} x) = \frac{1}{1 + e^{-\theta^{\top} x}}
\]

 Assume we have a total of $n$ features, so the vector $\theta$ has $n$ parameters. To minimize our cost function, we need to run gradient descent on each parameter $\theta_j$:

\[
\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta)
\]
Minimizing the Cost with Gradient Descent

 Furthermore, we need to update each parameter simultaneously at each iteration, in the direction that decreases the cost function. In other words, we loop through all the parameters $\theta_0, \theta_1, \dots, \theta_n$.
 In the case of logistic regression, analogously to linear regression, we use a cost function that contains a logarithmic expression and apply the gradient descent algorithm to it:

\[
J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \Big[ y^{(i)} \log h_\theta(x^{(i)}) + \big(1 - y^{(i)}\big) \log\big(1 - h_\theta(x^{(i)})\big) \Big]
\]

 Plugging this into the gradient descent function leads to the update rule:

\[
\theta_j := \theta_j - \alpha \sum_{i=1}^{m} \big( h_\theta(x^{(i)}) - y^{(i)} \big)\, x_j^{(i)}
\]

 Surprisingly, this update rule has the same form as the one derived using the sum of squared errors in linear regression. As a result, we can use the same gradient descent formula for logistic regression as well.
 By iterating over the training samples until convergence, we reach the optimal parameters $\theta$, leading to the minimum cost.
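A hedged NumPy sketch of exactly this procedure (the synthetic data and hyperparameters are invented for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_gradient_descent(X, y, alpha=0.1, iters=5000):
    """Batch gradient descent for logistic regression.

    X: (m, n) feature matrix; y: (m,) labels in {0, 1}.
    Returns the fitted parameter vector theta of shape (n,).
    """
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iters):
        h = sigmoid(X @ theta)        # hypothesis h_theta(x) for all samples
        grad = X.T @ (h - y) / m      # gradient of the log-loss
        theta -= alpha * grad         # simultaneous update of all theta_j
    return theta

# Tiny synthetic example (illustrative data, not from the lecture)
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(100), rng.normal(size=100)])  # bias + 1 feature
y = (X[:, 1] > 0).astype(float)
print(logistic_gradient_descent(X, y))
```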
Deriving the loss function in matrix form

Video walkthrough: https://youtu.be/ABrrSwMYWSg?si=iBaxEQ3bqsUErd-i
Linear regression vs logistic regression
1. Linear regression is used to predict a continuous dependent variable from a given set of independent variables. Logistic regression is used to predict a categorical dependent variable from a given set of independent variables.
2. Linear regression is used for solving regression problems. Logistic regression is used for solving classification problems.
3. In linear regression, the relationship between the dependent and independent variables must be linear. In logistic regression, the relationship does NOT need to be linear.
4. Linear regression finds and uses the line of best fit to predict outputs. Logistic regression uses the S-curve (sigmoid) to classify predicted outputs.
5. Linear regression uses the least-squares method to estimate the model coefficients. Logistic regression uses the maximum-likelihood estimation method.
6. In linear regression, the output must be a continuous value, such as price or age. In logistic regression, the output must be a categorical value, such as 0 or 1, or yes or no.
7. In linear regression, there is a possibility of collinearity between the independent variables. In logistic regression, there should not be any collinearity between the independent variables.
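To make the contrast concrete, here is a small scikit-learn sketch (assuming scikit-learn is installed; the toy data is invented):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

X = np.arange(10).reshape(-1, 1).astype(float)

# Linear regression: continuous target, least-squares fit
y_cont = 2.5 * X.ravel() + 1.0
print(LinearRegression().fit(X, y_cont).predict([[10.0]]))  # continuous output

# Logistic regression: binary target, maximum-likelihood fit of a sigmoid
y_bin = (X.ravel() > 4).astype(int)
clf = LogisticRegression().fit(X, y_bin)
print(clf.predict([[10.0]]), clf.predict_proba([[10.0]]))   # class and probability
```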
Some Examples

Some examples of such classifications, and instances where a binary response is expected or implied, are:
 Fraud detection: Logistic regression models can help teams identify data anomalies that are predictive of fraud. Certain behaviors or characteristics may have a higher association with fraudulent activities, which is particularly helpful to banks and other financial institutions in protecting their clients.
 Disease prediction: In medicine, this analytics approach can be used to predict the likelihood of disease or illness for a given population. Healthcare organizations can set up preventative care for individuals who show a higher tendency toward specific illnesses.
– Determining the probability of heart attacks: With the help of a logistic model, medical practitioners can determine the relationship between variables such as an individual's weight, exercise, etc., and use it to predict whether the person will suffer a heart attack or another medical complication.
Some Examples

 Possibility of enrolling in a university: Application aggregators can determine the probability of a student getting accepted to a particular university or degree course by studying the relationship between predictor variables such as GRE, GMAT, or TOEFL scores.
 Identifying spam emails: Email inboxes are filtered to determine whether a communication is promotional or spam by examining the predictor variables and applying a logistic regression algorithm to check its authenticity.
Types of logistic regression

There are three types of logistic regression models, which are defined based on the nature of the categorical response:
 Binary logistic regression
 Multinomial logistic regression
 Ordinal logistic regression

 Binary logistic regression
– In this approach, the response or dependent variable is dichotomous in nature, i.e., it has only two possible outcomes (e.g. success/failure, 0/1, or true/false). It is one of the most common classifiers for binary classification.
– Some popular examples of its use include:
• Predicting whether an e-mail is spam or not, or whether a tumor is malignant or not.
• Deciding whether or not to offer a loan to a bank customer: outcome = yes or no.
• Evaluating the risk of cancer: outcome = high or low.
• Predicting a team's win in a football match: outcome = yes or no.
Types of logistic regression

 Multinomial logistic regression


– In multinomial regression, the categorical dependent variable has three or more discrete outcomes, i.e., this regression type has more than two possible outcomes; however, these values have no specified order, such as "cat", "dog", or "sheep".
Examples:
– Say you want to predict the most popular transportation type for 2040. Here, transport type is the dependent variable, and the possible outcomes can be electric cars, electric trains, electric buses, and electric bikes.
– Predicting whether a student will join a college, a vocational/trade school, or industry.
– Estimating the type of food consumed by pets; the outcome may be wet food, dry food, or junk food.
Types of logistic regression

 Ordinal logistic regression


– This type of logistic regression model is applied when the response variable has three or more possible outcomes, but in this case, these values do have a defined order.
• Examples of ordered dependent variables:
• "low", "medium", or "high"; grading scales from A to F; or rating scales from 1 to 5.
• Formal shirt size: outcomes = XS/S/M/L/XL
• Survey answers: outcomes = Agree/Disagree/Unsure
• Scores on a math test: outcomes = Poor/Average/Good

In short, whenever the outcomes have a defined order, the variable is ordinal.


Key Advantages of Logistic Regression

1. Easier to implement than other machine learning methods: A machine learning model can be effectively set up with the help of training and testing. Training identifies patterns in the input data (e.g. an image) and associates them with some form of output (a label). Training a logistic regression model does not demand high computational power. As such, logistic regression is easier to implement, interpret, and train than many other ML methods.
2. Suitable for linearly separable datasets: A linearly separable dataset is one in which a straight line can separate the two data classes. In logistic regression, the y variable takes only two values. Hence, one can effectively classify data into two separate classes if the data is linearly separable.
3. Provides valuable insights: Logistic regression measures how relevant an independent/predictor variable is (via coefficient size) and also reveals the direction of its relationship or association (positive or negative).
Key properties of the logistic regression equation

Typical properties of the logistic regression equation include:
 Logistic regression's dependent variable follows a Bernoulli distribution.
 Estimation/prediction is based on maximum likelihood.
 Logistic regression does not evaluate the coefficient of determination (R squared) as observed in linear regression. Instead, the model's fit is assessed through concordance measures.
 For example, the Kolmogorov-Smirnov (KS) statistic looks at the difference between cumulative events and cumulative non-events to determine the efficacy of models in credit scoring.
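As a hedged illustration, a two-sample KS statistic can be computed with SciPy (the score arrays below are invented):

```python
import numpy as np
from scipy.stats import ks_2samp

# Hypothetical model scores for events (y = 1) and non-events (y = 0)
scores_events = np.array([0.9, 0.8, 0.75, 0.6, 0.55])
scores_non_events = np.array([0.4, 0.35, 0.3, 0.2, 0.1])

# KS statistic: maximum gap between the two cumulative distributions;
# larger values indicate better separation of events from non-events
stat, p_value = ks_2samp(scores_events, scores_non_events)
print(stat, p_value)
```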
Key properties of the logistic regression equation

While implementing logistic regression, one needs to keep in mind the following key assumptions:
1. The dependent/response variable is binary or dichotomous
 The first assumption of logistic regression is that the response variable can take on only two possible outcomes, e.g. pass/fail, male/female, or malignant/benign.
 This assumption can be checked by simply counting the unique outcomes of the dependent variable. If more than two possible outcomes surface, the assumption is violated.
2. Little or no multicollinearity between the predictor/explanatory variables
 This assumption implies that the predictor (independent) variables should be independent of each other. Multicollinearity arises when two or more independent variables are highly correlated. Such variables do not provide unique information to the regression model and lead to wrongful interpretation.
 The assumption can be verified with the variance inflation factor (VIF), which measures the correlation strength between the independent variables in a regression model, as in the sketch below.
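A minimal sketch of the VIF check using statsmodels (the data frame is invented; x3 is deliberately made collinear with x1):

```python
import numpy as np
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Hypothetical predictor matrix; x3 is nearly a copy of x1
rng = np.random.default_rng(1)
df = pd.DataFrame({"x1": rng.normal(size=200), "x2": rng.normal(size=200)})
df["x3"] = df["x1"] * 0.95 + rng.normal(scale=0.1, size=200)

X = df.to_numpy()
for i, col in enumerate(df.columns):
    # A VIF above roughly 5-10 is a common rule of thumb for problematic collinearity
    print(col, variance_inflation_factor(X, i))
```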
Key properties of the logistic regression equation

3. Linear relationship of independent variables to log odds
 Log odds are a way of expressing probabilities and are different from probabilities themselves. Odds are the ratio of successes to failures, while probability is the ratio of successes to all possible outcomes.
 For example, suppose you play twelve tennis games with your friend and win five. The odds of you winning are 5 to 7 (or 5/7), while the probability of you winning is 5/12 (as the total number of games played is 12).
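Working the same tennis example through symbolically:

\[
\text{odds} = \frac{p}{1-p} = \frac{5/12}{7/12} = \frac{5}{7} \approx 0.714,
\qquad
\log(\text{odds}) = \ln\frac{5}{7} \approx -0.336
\]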
Key properties of the logistic regression equation

4. Prefers a large sample size
 Logistic regression analysis yields reliable, robust, and valid results when a larger sample size is used.
 This assumption can be validated with a rule of thumb: use at least 10 cases of the least frequent outcome for each estimator variable. Consider a case where you have three predictor variables and the probability of the least frequent outcome is 0.30. Then the minimum sample size would be (10 × 3) / 0.30 = 100.
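Written as a general formula, with $k$ predictors and $p$ the probability of the least frequent outcome:

\[
N_{\min} = \frac{10\,k}{p} = \frac{10 \times 3}{0.30} = 100
\]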
5. Problems with extreme outliers
 Another critical assumption of logistic regression is that there are no extreme outliers in the dataset.
 This assumption can be verified by calculating Cook's distance (D_i) for each observation to identify influential data points that may negatively affect the regression model (see the sketch below). When outliers exist, one can implement the following solutions:
 Eliminate or remove the outliers;
 Replace the outliers with a mean or median value; or
 Keep the outliers in the model but note them when reporting the regression results.
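A sketch of the Cook's distance check, assuming a recent statsmodels version in which GLM results expose get_influence() (the data and the injected outlier are invented):

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical binary data with one injected, mislabeled extreme point
rng = np.random.default_rng(2)
x = rng.normal(size=100)
y = (x + rng.normal(scale=0.5, size=100) > 0).astype(int)
x[0], y[0] = 5.0, 0  # extreme predictor value with a contradictory label

X = sm.add_constant(x)
res = sm.GLM(y, X, family=sm.families.Binomial()).fit()

# Cook's distance per observation; large values flag influential points
cooks_d = res.get_influence().cooks_distance[0]
print(np.argsort(cooks_d)[-3:])  # indices of the three most influential points
```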
Key properties of the logistic regression equation

6. Consider independent observations
 This assumption states that the dataset's observations should be independent of each other. The observations should not be related to each other or arise from repeated measurements of the same individual.
 The assumption can be verified by plotting the residuals against time, which signifies the order of observations. The plot helps in determining the presence or absence of a random pattern: if the residuals show a systematic (non-random) pattern over time, the assumption may be considered violated.
Thank You

Derivation of Cost Function:

 Now, we will derive the gradient of the cost function with the help of the chain rule, as it allows us to compute complex partial derivatives by breaking them down. Write the log-likelihood for one sample as $LL = y \log p + (1-y)\log(1-p)$, with $p = \sigma(z)$ and $z = \theta^{\top}x$.
 Step 1: Use the chain rule to break up the partial derivative of the log-likelihood:

\[
\frac{\partial LL}{\partial \theta_j} = \frac{\partial LL}{\partial p} \cdot \frac{\partial p}{\partial z} \cdot \frac{\partial z}{\partial \theta_j} \quad (1)
\]

 Step 2: Find the derivative of the log-likelihood w.r.t. $p$:

\[
\frac{\partial LL}{\partial p} = \frac{y}{p} - \frac{1-y}{1-p}
\]

 Step 3: Find the derivative of $p$ w.r.t. $z$ (the sigmoid derivative):

\[
\frac{\partial p}{\partial z} = p(1-p)
\]

 Step 4: Find the derivative of $z$ w.r.t. $\theta_j$:

\[
\frac{\partial z}{\partial \theta_j} = x_j
\]

 Step 5: Put all the derivatives into equation (1):

\[
\frac{\partial LL}{\partial \theta_j} = \left(\frac{y}{p} - \frac{1-y}{1-p}\right) p(1-p)\, x_j = (y - p)\, x_j
\]

 Hence the derivative of our cost function (the negative log-likelihood) is:

\[
\frac{\partial J}{\partial \theta_j} = (p - y)\, x_j
\]

 Now that we have the derivative of the cost function, we can write our gradient descent update as $\theta_j := \theta_j - \alpha (p - y)\, x_j$:
– If the slope is negative (downward slope), gradient descent adds some value to the parameter, directing it towards the minimum point of the convex curve. If the slope is positive (upward slope), gradient descent subtracts some value to direct it towards the minimum point.
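As a sanity check on the derivative $(p - y)\,x_j$ obtained above, here is an illustrative finite-difference comparison in Python (not from the slides):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, x, y):
    # Negative log-likelihood for a single sample
    p = sigmoid(x @ theta)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

theta = np.array([0.5, -0.3])
x = np.array([1.0, 2.0])   # one sample, first entry acting as a bias term
y = 1.0

# Analytic gradient from the chain rule: (p - y) * x
analytic = (sigmoid(x @ theta) - y) * x

# Numerical gradient via central differences
eps = 1e-6
numeric = np.array([
    (cost(theta + eps * e, x, y) - cost(theta - eps * e, x, y)) / (2 * eps)
    for e in np.eye(2)
])
print(analytic, numeric)  # the two should agree to roughly 1e-8
```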
Deriving the Gradient Descent formula for Logistic Regression
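The slide images are not reproduced here; reconstructing the standard matrix form from the per-parameter derivation above, with $X$ the $m \times n$ design matrix and $y$ the label vector:

\[
J(\theta) = -\frac{1}{m}\Big[\, y^{\top}\log \sigma(X\theta) + (1-y)^{\top}\log\big(1-\sigma(X\theta)\big) \Big],
\qquad
\nabla_{\theta} J(\theta) = \frac{1}{m}\, X^{\top}\big(\sigma(X\theta) - y\big)
\]

so the vectorized update is $\theta := \theta - \alpha \nabla_{\theta} J(\theta)$.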
How Does Gradient Descent Work?

 Gradient descent is an optimization algorithm used to minimize the cost function of a model.
 The cost function measures how well the model fits the training data and is defined based on the difference between
the predicted and actual values.
 The gradient of the cost function is the derivative with respect to the model’s parameters and points in the direction of
the steepest ascent.
 The algorithm starts with an initial set of parameters and updates them in small steps to minimize the cost function.
 In each iteration of the algorithm, the gradient of the cost function with respect to each parameter is computed.
 The gradient tells us the direction of the steepest ascent, and by moving in the opposite direction, we can find the
direction of the steepest descent.
 The size of the step is controlled by the learning rate, which determines how quickly the algorithm moves towards the
minimum.
 The process is repeated until the cost function converges to a minimum, indicating that the model has reached the
optimal set of parameters.
 There are different variations of gradient descent, including batch gradient descent, stochastic gradient descent, and mini-batch gradient descent, each with its own advantages and limitations (see the sketch after this list).
 Efficient implementation of gradient descent is essential for achieving good performance in machine learning tasks. The choice of learning rate and number of iterations can significantly impact the performance of the algorithm.
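A hedged sketch of the mini-batch variant (batch_size = 1 recovers stochastic gradient descent, batch_size = m recovers batch gradient descent; all names and values are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def minibatch_sgd(X, y, alpha=0.1, epochs=50, batch_size=16, seed=0):
    """Mini-batch stochastic gradient descent for logistic regression."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(epochs):
        order = rng.permutation(m)                # reshuffle samples each epoch
        for start in range(0, m, batch_size):
            idx = order[start:start + batch_size]
            Xb, yb = X[idx], y[idx]
            grad = Xb.T @ (sigmoid(Xb @ theta) - yb) / len(idx)
            theta -= alpha * grad                 # step against the mini-batch gradient
    return theta
```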
