Lecture 08

Machine Learning

ICT-4261

By-
Dr. Jesmin Akhter
Professor
Institute of Information Technology
Jahangirnagar University
Contents
The course will mainly cover the following topics:
 A Gentle Introduction to Machine Learning
 Linear Regression
 Logistic Regression
 Naive Bayes
 Support Vector Machines
 Decision Trees and Ensemble Learning
 Clustering Fundamentals
 Hierarchical Clustering
 Neural Networks and Deep Learning
 Unsupervised Learning
Outline

 Logistic Regression
– Gradient Descent
– Linear regression vs logistic regression
– Types of logistic regression
– Key properties of the logistic regression equation
– Stochastic gradient descent algorithms
– Linear classification
Gradient Descent

 Gradient descent is an iterative process: at each step it moves down the slope of the cost function and gets closer to a local minimum, which is how we optimize the parameters of a machine learning model. It is particularly common in neural networks, but is also used in logistic regression and support vector machines.
 It is the most typical method for iterative minimization of a cost function. Its major limitation, though, is that it is guaranteed to converge only to a local, not necessarily global, minimum.
Gradient Descent

 The gradient is calculated with respect to the model's vector of parameters, typically the weights $\theta$.
 The sign of the gradient tells us the direction of the closest minimum of the cost function. For a given parameter $\theta_j$, we iteratively optimize the vector by computing:

\[
\theta_j := \theta_j - \alpha \frac{\partial J(\theta)}{\partial \theta_j}
\]

[Figure: graphical representation of gradient descent stepping down the cost curve toward a minimum.]
Gradient Descent

 At step $j$ of the iteration, the weights are all modified by the product of the hyperparameter $\alpha$ (the learning rate) and the gradient of the cost function computed with those weights. If the gradient is positive, we decrease the weights; conversely, if the gradient is negative, we increase them.

 We can summarize the gradient descent algorithm as:

• Start with a random $\theta$
• Loop until convergence:
– Compute the gradient $\nabla_{\theta} J(\theta)$
– Update $\theta := \theta - \alpha \nabla_{\theta} J(\theta)$
• Return $\theta$
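As a minimal Python sketch of this loop (the quadratic cost $J(\theta) = (\theta - 3)^2$, its gradient $2(\theta - 3)$, and all numeric choices are illustrative, not from the slides):

```python
import random

def gradient(theta):
    # Gradient of the illustrative cost J(theta) = (theta - 3)^2
    return 2.0 * (theta - 3.0)

def gradient_descent(alpha=0.1, tol=1e-8, max_iters=10_000):
    theta = random.uniform(-10, 10)        # start with a random theta
    for _ in range(max_iters):
        grad = gradient(theta)             # compute the gradient at theta
        new_theta = theta - alpha * grad   # step against the gradient
        if abs(new_theta - theta) < tol:   # convergence check
            break
        theta = new_theta
    return theta

print(gradient_descent())  # converges near the minimum at theta = 3
```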
Minimizing the Cost with Gradient Descent

 How can gradient descent iteratively approximate the local minimum of a function to an arbitrary degree of precision?
 We start by identifying a starting point $\theta^{(0)}$ in sufficient proximity to the function's local minimum.
 Then, iteratively, we move towards the closest local minimum by exploiting the gradient of the function around the current point.
Minimizing the Cost with Gradient Descent

 Gradient descent is an iterative optimization algorithm that finds the minimum of a differentiable function. In this process, we try different parameter values and update them to reach the optimal ones, minimizing the output.
 We can apply this method to the cost function of logistic regression and find an optimal solution that minimizes the cost over the model parameters.
 We use the sigmoid function as the hypothesis function in logistic regression:

\[
h_\theta(x) = \sigma(\theta^{\top} x) = \frac{1}{1 + e^{-\theta^{\top} x}}
\]

 Assume we have a total of $n$ features, so the vector $\theta$ has $n$ parameters. To minimize our cost function, we need to run gradient descent on each parameter $\theta_j$:

\[
\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta)
\]
Minimizing the Cost with Gradient Descent

 Furthermore, we need to update each parameter simultaneously at each iteration, in the direction that decreases the cost function. In other words, we loop through all the parameters $\theta_0, \theta_1, \dots, \theta_n$.
 In the case of logistic regression, analogously to linear regression, we use a cost function that contains a logarithmic expression and apply the gradient descent algorithm to it:

\[
J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \Big[ y^{(i)} \log h_\theta(x^{(i)}) + \big(1 - y^{(i)}\big) \log\big(1 - h_\theta(x^{(i)})\big) \Big]
\]

 Plugging this into the gradient descent function leads to the update rule:

\[
\theta_j := \theta_j - \alpha \sum_{i=1}^{m} \big( h_\theta(x^{(i)}) - y^{(i)} \big)\, x_j^{(i)}
\]

 Surprisingly, this update rule has the same form as the one derived using the sum of squared errors in linear regression. As a result, we can use the same gradient descent formula for logistic regression as well.
 By iterating over the training samples until convergence, we reach the optimal parameters $\theta$, leading to the minimum cost.
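A hedged NumPy sketch of exactly this procedure (the synthetic data and hyperparameters are invented for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_gradient_descent(X, y, alpha=0.1, iters=5000):
    """Batch gradient descent for logistic regression.

    X: (m, n) feature matrix; y: (m,) labels in {0, 1}.
    Returns the fitted parameter vector theta of shape (n,).
    """
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iters):
        h = sigmoid(X @ theta)        # hypothesis h_theta(x) for all samples
        grad = X.T @ (h - y) / m      # gradient of the log-loss
        theta -= alpha * grad         # simultaneous update of all theta_j
    return theta

# Tiny synthetic example (illustrative data, not from the lecture)
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(100), rng.normal(size=100)])  # bias + 1 feature
y = (X[:, 1] > 0).astype(float)
print(logistic_gradient_descent(X, y))
```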
Deriving the loss function in matrix form

Video walkthrough: https://youtu.be/ABrrSwMYWSg?si=iBaxEQ3bqsUErd-i
Linear regression vs logistic regression
1. Linear regression is used to predict a continuous dependent variable from a given set of independent variables. Logistic regression is used to predict a categorical dependent variable from a given set of independent variables.
2. Linear regression is used for solving regression problems. Logistic regression is used for solving classification problems.
3. In linear regression, the relationship between the dependent and independent variables must be linear. In logistic regression, the relationship does NOT need to be linear.
4. Linear regression finds and uses the line of best fit to predict outputs. Logistic regression uses the S-curve (sigmoid) to classify predicted outputs.
5. Linear regression uses the least-squares method to estimate the model coefficients. Logistic regression uses the maximum-likelihood estimation method.
6. In linear regression, the output must be a continuous value, such as price or age. In logistic regression, the output must be a categorical value, such as 0 or 1, or yes or no.
7. In linear regression, there is a possibility of collinearity between the independent variables. In logistic regression, there should not be any collinearity between the independent variables.
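To make the contrast concrete, here is a small scikit-learn sketch (assuming scikit-learn is installed; the toy data is invented):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

X = np.arange(10).reshape(-1, 1).astype(float)

# Linear regression: continuous target, least-squares fit
y_cont = 2.5 * X.ravel() + 1.0
print(LinearRegression().fit(X, y_cont).predict([[10.0]]))  # continuous output

# Logistic regression: binary target, maximum-likelihood fit of a sigmoid
y_bin = (X.ravel() > 4).astype(int)
clf = LogisticRegression().fit(X, y_bin)
print(clf.predict([[10.0]]), clf.predict_proba([[10.0]]))   # class and probability
```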
Some Examples

Some examples of such classifications, and instances where a binary response is expected or implied, are:
 Fraud detection: Logistic regression models can help teams identify data anomalies that are predictive of fraud. Certain behaviors or characteristics may have a higher association with fraudulent activities, which is particularly helpful to banks and other financial institutions in protecting their clients.
 Disease prediction: In medicine, this analytics approach can be used to predict the likelihood of disease or illness for a given population. Healthcare organizations can set up preventative care for individuals who show a higher tendency toward specific illnesses.
– Determining the probability of heart attacks: With the help of a logistic model, medical practitioners can determine the relationship between variables such as an individual's weight, exercise, etc., and use it to predict whether the person will suffer a heart attack or another medical complication.
Some Examples

 Possibility of enrolling in a university: Application aggregators can determine the probability of a student getting accepted to a particular university or degree course by studying the relationship between predictor variables such as GRE, GMAT, or TOEFL scores.
 Identifying spam emails: Email inboxes are filtered to determine whether a communication is promotional or spam by examining the predictor variables and applying a logistic regression algorithm to check its authenticity.
Types of logistic regression

There are three types of logistic regression models, which are defined based on the nature of the categorical response:
 Binary logistic regression
 Multinomial logistic regression
 Ordinal logistic regression

 Binary logistic regression
– In this approach, the response or dependent variable is dichotomous in nature, i.e., it has only two possible outcomes (e.g. success/failure, 0/1, or true/false). It is one of the most common classifiers for binary classification.
– Some popular examples of its use include:
• Predicting whether an e-mail is spam or not, or whether a tumor is malignant or not.
• Deciding whether or not to offer a loan to a bank customer: outcome = yes or no.
• Evaluating the risk of cancer: outcome = high or low.
• Predicting a team's win in a football match: outcome = yes or no.
Types of logistic regression

 Multinomial logistic regression


– In multinomial regression, the categorical dependent variable has three or more discrete outcomes, i.e., this regression type has more than two possible outcomes; however, these values have no specified order, such as "cat", "dog", or "sheep".
Examples:
– Say you want to predict the most popular transportation type for 2040. Here, transport type is the dependent variable, and the possible outcomes can be electric cars, electric trains, electric buses, and electric bikes.
– Predicting whether a student will join a college, a vocational/trade school, or industry.
– Estimating the type of food consumed by pets; the outcome may be wet food, dry food, or junk food.
Types of logistic regression

 Ordinal logistic regression


– This type of logistic regression model is applied when the response variable has three or more possible outcomes, but in this case, these values do have a defined order.
• Examples of ordered dependent variables:
• "low", "medium", or "high"; grading scales from A to F; or rating scales from 1 to 5.
• Formal shirt size: outcomes = XS/S/M/L/XL
• Survey answers: outcomes = Agree/Disagree/Unsure
• Scores on a math test: outcomes = Poor/Average/Good

In short, whenever the outcomes have a defined order, the variable is ordinal.


Key Advantages of Logistic Regression

1. Easier to implement than other machine learning methods: A machine learning model can be effectively set up with the help of training and testing. Training identifies patterns in the input data (e.g. an image) and associates them with some form of output (a label). Training a logistic regression model does not demand high computational power. As such, logistic regression is easier to implement, interpret, and train than many other ML methods.
2. Suitable for linearly separable datasets: A linearly separable dataset is one in which a straight line can separate the two data classes. In logistic regression, the y variable takes only two values. Hence, one can effectively classify data into two separate classes if the data is linearly separable.
3. Provides valuable insights: Logistic regression measures how relevant an independent/predictor variable is (via coefficient size) and also reveals the direction of its relationship or association (positive or negative).
Key properties of the logistic regression equation

Typical properties of the logistic regression equation include:
 Logistic regression's dependent variable follows a Bernoulli distribution.
 Estimation/prediction is based on maximum likelihood.
 Logistic regression does not evaluate the coefficient of determination (R squared) as observed in linear regression. Instead, the model's fit is assessed through concordance measures.
 For example, the Kolmogorov-Smirnov (KS) statistic looks at the difference between cumulative events and cumulative non-events to determine the efficacy of models in credit scoring.
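As a hedged illustration, a two-sample KS statistic can be computed with SciPy (the score arrays below are invented):

```python
import numpy as np
from scipy.stats import ks_2samp

# Hypothetical model scores for events (y = 1) and non-events (y = 0)
scores_events = np.array([0.9, 0.8, 0.75, 0.6, 0.55])
scores_non_events = np.array([0.4, 0.35, 0.3, 0.2, 0.1])

# KS statistic: maximum gap between the two cumulative distributions;
# larger values indicate better separation of events from non-events
stat, p_value = ks_2samp(scores_events, scores_non_events)
print(stat, p_value)
```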
Key properties of the logistic regression equation

While implementing logistic regression, one needs to keep in mind the following key assumptions:
1. The dependent/response variable is binary or dichotomous
 The first assumption of logistic regression is that the response variable can take on only two possible outcomes, e.g. pass/fail, male/female, or malignant/benign.
 This assumption can be checked by simply counting the unique outcomes of the dependent variable. If more than two possible outcomes surface, the assumption is violated.
2. Little or no multicollinearity between the predictor/explanatory variables
 This assumption implies that the predictor (independent) variables should be independent of each other. Multicollinearity arises when two or more independent variables are highly correlated. Such variables do not provide unique information to the regression model and lead to wrongful interpretation.
 The assumption can be verified with the variance inflation factor (VIF), which measures the correlation strength between the independent variables in a regression model, as in the sketch below.
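A minimal sketch of the VIF check using statsmodels (the data frame is invented; x3 is deliberately made collinear with x1):

```python
import numpy as np
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Hypothetical predictor matrix; x3 is nearly a copy of x1
rng = np.random.default_rng(1)
df = pd.DataFrame({"x1": rng.normal(size=200), "x2": rng.normal(size=200)})
df["x3"] = df["x1"] * 0.95 + rng.normal(scale=0.1, size=200)

X = df.to_numpy()
for i, col in enumerate(df.columns):
    # A VIF above roughly 5-10 is a common rule of thumb for problematic collinearity
    print(col, variance_inflation_factor(X, i))
```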
Key properties of the logistic regression equation

3. Linear relationship of independent variables to log odds
 Log odds are a way of expressing probabilities and are different from probabilities themselves. Odds are the ratio of successes to failures, while probability is the ratio of successes to all possible outcomes.
 For example, suppose you play twelve tennis games with your friend and win five. The odds of you winning are 5 to 7 (or 5/7), while the probability of you winning is 5/12 (as the total number of games played is 12).
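Working the same tennis example through symbolically:

\[
\text{odds} = \frac{p}{1-p} = \frac{5/12}{7/12} = \frac{5}{7} \approx 0.714,
\qquad
\log(\text{odds}) = \ln\frac{5}{7} \approx -0.336
\]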
Key properties of the logistic regression equation

4. Prefers a large sample size
 Logistic regression analysis yields reliable, robust, and valid results when a larger sample size is used.
 This assumption can be validated with a rule of thumb: use at least 10 cases of the least frequent outcome for each estimator variable. Consider a case where you have three predictor variables and the probability of the least frequent outcome is 0.30. Then the minimum sample size would be (10 × 3) / 0.30 = 100.
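Written as a general formula, with $k$ predictors and $p$ the probability of the least frequent outcome:

\[
N_{\min} = \frac{10\,k}{p} = \frac{10 \times 3}{0.30} = 100
\]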
5. Problems with extreme outliers
 Another critical assumption of logistic regression is that there are no extreme outliers in the dataset.
 This assumption can be verified by calculating Cook's distance (D_i) for each observation to identify influential data points that may negatively affect the regression model (see the sketch below). When outliers exist, one can implement the following solutions:
 Eliminate or remove the outliers;
 Replace the outliers with a mean or median value; or
 Keep the outliers in the model but note them when reporting the regression results.
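A sketch of the Cook's distance check, assuming a recent statsmodels version in which GLM results expose get_influence() (the data and the injected outlier are invented):

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical binary data with one injected, mislabeled extreme point
rng = np.random.default_rng(2)
x = rng.normal(size=100)
y = (x + rng.normal(scale=0.5, size=100) > 0).astype(int)
x[0], y[0] = 5.0, 0  # extreme predictor value with a contradictory label

X = sm.add_constant(x)
res = sm.GLM(y, X, family=sm.families.Binomial()).fit()

# Cook's distance per observation; large values flag influential points
cooks_d = res.get_influence().cooks_distance[0]
print(np.argsort(cooks_d)[-3:])  # indices of the three most influential points
```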
Key properties of the logistic regression equation

6. Consider independent observations
 This assumption states that the dataset's observations should be independent of each other. The observations should not be related to each other or arise from repeated measurements of the same individual.
 The assumption can be verified by plotting the residuals against time, which signifies the order of observations. The plot helps in determining the presence or absence of a random pattern: if the residuals show a systematic (non-random) pattern over time, the assumption may be considered violated.
Thank You

Derivation of Cost Function:

 Now, we will derive the gradient of the cost function with the help of the chain rule, as it allows us to compute complex partial derivatives by breaking them down. Write the log-likelihood for one sample as $LL = y \log p + (1-y)\log(1-p)$, with $p = \sigma(z)$ and $z = \theta^{\top}x$.
 Step 1: Use the chain rule to break up the partial derivative of the log-likelihood:

\[
\frac{\partial LL}{\partial \theta_j} = \frac{\partial LL}{\partial p} \cdot \frac{\partial p}{\partial z} \cdot \frac{\partial z}{\partial \theta_j} \quad (1)
\]

 Step 2: Find the derivative of the log-likelihood w.r.t. $p$:

\[
\frac{\partial LL}{\partial p} = \frac{y}{p} - \frac{1-y}{1-p}
\]

 Step 3: Find the derivative of $p$ w.r.t. $z$ (the sigmoid derivative):

\[
\frac{\partial p}{\partial z} = p(1-p)
\]

 Step 4: Find the derivative of $z$ w.r.t. $\theta_j$:

\[
\frac{\partial z}{\partial \theta_j} = x_j
\]

 Step 5: Put all the derivatives into equation (1):

\[
\frac{\partial LL}{\partial \theta_j} = \left(\frac{y}{p} - \frac{1-y}{1-p}\right) p(1-p)\, x_j = (y - p)\, x_j
\]

 Hence the derivative of our cost function (the negative log-likelihood) is:

\[
\frac{\partial J}{\partial \theta_j} = (p - y)\, x_j
\]

 Now that we have the derivative of the cost function, we can write our gradient descent update as $\theta_j := \theta_j - \alpha (p - y)\, x_j$:
– If the slope is negative (downward slope), gradient descent adds some value to the parameter, directing it towards the minimum point of the convex curve. If the slope is positive (upward slope), gradient descent subtracts some value to direct it towards the minimum point.
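As a sanity check on the derivative $(p - y)\,x_j$ obtained above, here is an illustrative finite-difference comparison in Python (not from the slides):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, x, y):
    # Negative log-likelihood for a single sample
    p = sigmoid(x @ theta)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

theta = np.array([0.5, -0.3])
x = np.array([1.0, 2.0])   # one sample, first entry acting as a bias term
y = 1.0

# Analytic gradient from the chain rule: (p - y) * x
analytic = (sigmoid(x @ theta) - y) * x

# Numerical gradient via central differences
eps = 1e-6
numeric = np.array([
    (cost(theta + eps * e, x, y) - cost(theta - eps * e, x, y)) / (2 * eps)
    for e in np.eye(2)
])
print(analytic, numeric)  # the two should agree to roughly 1e-8
```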
Deriving the Gradient Descent formula for Logistic Regression
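The slide images are not reproduced here; reconstructing the standard matrix form from the per-parameter derivation above, with $X$ the $m \times n$ design matrix and $y$ the label vector:

\[
J(\theta) = -\frac{1}{m}\Big[\, y^{\top}\log \sigma(X\theta) + (1-y)^{\top}\log\big(1-\sigma(X\theta)\big) \Big],
\qquad
\nabla_{\theta} J(\theta) = \frac{1}{m}\, X^{\top}\big(\sigma(X\theta) - y\big)
\]

so the vectorized update is $\theta := \theta - \alpha \nabla_{\theta} J(\theta)$.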
How Does Gradient Descent Work?

 Gradient descent is an optimization algorithm used to minimize the cost function of a model.
 The cost function measures how well the model fits the training data and is defined based on the difference between
the predicted and actual values.
 The gradient of the cost function is the derivative with respect to the model’s parameters and points in the direction of
the steepest ascent.
 The algorithm starts with an initial set of parameters and updates them in small steps to minimize the cost function.
 In each iteration of the algorithm, the gradient of the cost function with respect to each parameter is computed.
 The gradient tells us the direction of the steepest ascent, and by moving in the opposite direction, we can find the
direction of the steepest descent.
 The size of the step is controlled by the learning rate, which determines how quickly the algorithm moves towards the
minimum.
 The process is repeated until the cost function converges to a minimum, indicating that the model has reached the
optimal set of parameters.
 There are different variations of gradient descent, including batch gradient descent, stochastic gradient descent, and mini-batch gradient descent, each with its own advantages and limitations (see the sketch after this list).
 Efficient implementation of gradient descent is essential for achieving good performance in machine learning tasks. The choice of learning rate and number of iterations can significantly impact the performance of the algorithm.
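A hedged sketch of the mini-batch variant (batch_size = 1 recovers stochastic gradient descent, batch_size = m recovers batch gradient descent; all names and values are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def minibatch_sgd(X, y, alpha=0.1, epochs=50, batch_size=16, seed=0):
    """Mini-batch stochastic gradient descent for logistic regression."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(epochs):
        order = rng.permutation(m)                # reshuffle samples each epoch
        for start in range(0, m, batch_size):
            idx = order[start:start + batch_size]
            Xb, yb = X[idx], y[idx]
            grad = Xb.T @ (sigmoid(Xb @ theta) - yb) / len(idx)
            theta -= alpha * grad                 # step against the mini-batch gradient
    return theta
```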
