What is a regression?
A regression analysis is a method for modeling relationships between variables.
It makes it possible to infer or predict a variable based on one or more other variables.
The variable we want to infer or predict is called the dependent variable or criterion.
The variables we use for prediction are called independent variables or predictors.
What is the difference between a linear regression and a logistic regression?
In a linear regression, the dependent variable is a metric variable, e.g. salary or electricity consumption.
In a logistic regression, the dependent variable is a dichotomous variable.
What is a dichotomous variable?
Dichotomous variables are variables with only two values.
For example:
whether a person buys or does not buy a particular product
or
whether a disease is present or not
How can logistic regression be used?
With the help of logistic regression, we can determine what has an influence on whether a certain disease is present or not.
We could study the influence of age, gender and smoking status on that particular disease.
Our data set might look like this:

Age  Gender  Smoker status  Disease
22   female  Non-smoker     1
25   female  Smoker         1
18   male    Smoker         0
45   male    Non-smoker     0
12   female  Smoker         0
43   male    Smoker         1
23   male    Smoker         0
33   male    Smoker         1
…    …       …              …

The columns Age, Gender and Smoker status are the independent variables, and the column Disease is the dependent variable with 0 and 1.
In this case 0 stands for not diseased and 1 for diseased.
We could now investigate what influence the independent variables have on the disease.
If there is an influence, then we can predict how likely a person is to have a certain disease.
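To make the data set concrete, here is a minimal sketch of how the visible rows of the table could be turned into numbers. The coding of gender (male = 1) and smoker status (Smoker = 1) and the helper name `encode` are assumptions made for illustration; the source does not specify a coding, and the table continues beyond the rows shown.

```python
# The eight rows visible in the example table; the original table continues ("…"),
# so this is only an excerpt.
rows = [
    (22, "female", "Non-smoker", 1),
    (25, "female", "Smoker", 1),
    (18, "male", "Smoker", 0),
    (45, "male", "Non-smoker", 0),
    (12, "female", "Smoker", 0),
    (43, "male", "Smoker", 1),
    (23, "male", "Smoker", 0),
    (33, "male", "Smoker", 1),
]


def encode(row):
    """Turn one table row into numeric predictors and the 0/1 outcome."""
    age, gender, smoker_status, disease = row
    predictors = [
        float(age),
        1.0 if gender == "male" else 0.0,           # assumed coding: male = 1
        1.0 if smoker_status == "Smoker" else 0.0,  # assumed coding: Smoker = 1
    ]
    return predictors, disease


X = [encode(row)[0] for row in rows]  # independent variables
y = [encode(row)[1] for row in rows]  # dependent variable (0 or 1)

for predictors, outcome in zip(X, y):
    print(predictors, outcome)
```

Any consistent 0/1 coding would work equally well; it only changes the sign and interpretation of the corresponding coefficient.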
In logistic regression, the dependent variable is dichotomous, and the probability for the occurrence of the characteristic 1 (= characteristic present) is estimated.
Now, of course, the question arises:
Why do we need logistic regression in this case?
Why can't we just use linear regression?
A quick recap:
In linear regression, this is our regression equation:
$$\hat{y} = b_1 x_1 + b_2 x_2 + \dots + b_k x_k + a$$
We have the dependent variable $\hat{y}$, the independent variables $x_1, \dots, x_k$ and the regression coefficients $b_1, \dots, b_k$ together with the constant $a$.
However, we now have a dependent variable that is either 0 or 1.
No matter which value we have for the independent variables, only 0 or 1 results.
A linear regression would now simply put a straight line through the points.
[Figure: data points at y = 0 and y = 1 plotted against x, with a straight line fitted through them]
We can now see that in the case of linear regression, values between plus and minus infinity can occur.
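The problem with the straight line can be checked numerically. The following sketch fits an ordinary least-squares line to age and disease status from the eight visible rows of the example table, using only age as predictor (a simplification chosen for illustration). The function names `fit_line` and `predict` are hypothetical, and the inputs -20 and 100 are purely mathematical extrapolations, not realistic ages.

```python
# Age and disease status from the eight visible rows of the example table.
ages = [22, 25, 18, 45, 12, 43, 23, 33]
disease = [1, 1, 0, 0, 0, 1, 0, 1]


def fit_line(xs, ys):
    """Ordinary least squares with one predictor: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


slope, intercept = fit_line(ages, disease)


def predict(x):
    """Value of the fitted straight line at x."""
    return slope * x + intercept


# The line is unbounded: far enough to the left or right it leaves the range 0 to 1.
for x in (-20, 30, 100):
    print(x, round(predict(x), 3))
```

Because the fitted slope is not zero, the line necessarily drops below 0 on one side and rises above 1 on the other, so its values cannot be read as probabilities.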
However, the goal of logistic regression is to estimate the probability of occurrence.
The value range for the prediction should therefore be between 0 and 1.
So we need a function that only takes values between 0 and 1!
And that is exactly what the logistic function does.
[Figure: the logistic function, with the x-axis running from -∞ to +∞, the y-axis from 0 to 1, and the value 1/2 at x = 0]
No matter where we are on the x-axis, between minus and plus infinity only values between 0 and 1 result.
And that is exactly what we want!
The equation for the logistic function looks like this:
$$f(z) = \frac{1}{1 + e^{-z}}$$
The logistic function is now used by the logistic regression.
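As a quick check of this behaviour, here is a minimal sketch of the logistic function 1 / (1 + e^(-z)) in Python. The function name `logistic` and the two-branch form, which avoids floating-point overflow for very negative z, are choices made for this illustration.

```python
import math


def logistic(z):
    """Logistic function 1 / (1 + e^(-z)), written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z use the equivalent form e^z / (1 + e^z).
    e = math.exp(z)
    return e / (1.0 + e)


# Inputs far to the left approach 0, inputs far to the right approach 1.
for z in (-10, -1, 0, 1, 10):
    print(z, logistic(z))
```

At z = 0 the function returns exactly 1/2, matching the midpoint of the curve.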
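For z, the equation of the linear regression is now simply inserted.
This gives us this equation:
$$f(z) = \frac{1}{1 + e^{-(b_1 x_1 + \dots + b_k x_k + a)}}$$
Here $x_1, \dots, x_k$ are the independent variables, $b_1, \dots, b_k$ the regression coefficients and $a$ the constant.
Thus, the probability that the dependent variable is 1 is given by:
$$P(y = 1 \mid x_1, \dots, x_k) = \frac{1}{1 + e^{-(b_1 x_1 + \dots + b_k x_k + a)}}$$
What does this look like for our example?
In our example, the probability of having a certain disease is a function of age, gender and smoking status:
$$P(\text{disease}) = \frac{1}{1 + e^{-(b_1 \cdot \text{age} + b_2 \cdot \text{gender} + b_3 \cdot \text{smoker status} + a)}}$$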
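For the disease example, inserting the linear regression into the logistic function could be sketched as follows. The coefficient values are hypothetical and chosen only for illustration; they are not estimated from the data. The 0/1 coding of gender and smoker status and the function name `disease_probability` are likewise assumptions.

```python
import math


def disease_probability(age, is_male, is_smoker, b_age, b_male, b_smoker, a):
    """Probability of disease = logistic function of the linear combination z."""
    z = b_age * age + b_male * is_male + b_smoker * is_smoker + a
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical coefficients, NOT estimated from the example table.
coefficients = dict(b_age=0.04, b_male=0.3, b_smoker=0.8, a=-2.5)

p_smoker = disease_probability(age=40, is_male=1, is_smoker=1, **coefficients)
p_non_smoker = disease_probability(age=40, is_male=1, is_smoker=0, **coefficients)

print("smoker:    ", p_smoker)
print("non-smoker:", p_non_smoker)
```

With a positive coefficient for smoker status, the predicted probability for a smoker is higher than for an otherwise identical non-smoker; both results stay between 0 and 1.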
Now we need to determine the coefficients
so that our model best represents the given data.
To solve this problem, the so-called
maximum likelihood method is used.
For this purpose, there are good numerical
methods that can solve the problem efficiently.
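As an illustration of what such a numerical method does, here is a minimal sketch that maximizes the log-likelihood by plain gradient ascent on the eight visible rows of the example table. Gradient ascent is a stand-in chosen for brevity, and statistics software typically uses faster methods. The 0/1 coding, the rescaling of age, the step size and the number of steps are all assumptions of this sketch.

```python
import math

# Visible rows of the example table as (age, male?, smoker?) plus disease status.
# Assumed coding: male = 1, Smoker = 1.
X = [
    (22, 0, 0), (25, 0, 1), (18, 1, 1), (45, 1, 0),
    (12, 0, 1), (43, 1, 1), (23, 1, 1), (33, 1, 1),
]
y = [1, 1, 0, 0, 0, 1, 0, 1]

# Prepend a 1 for the constant and rescale age so all inputs are of similar size.
rows = [(1.0, age / 100.0, float(male), float(smoker)) for age, male, smoker in X]


def probability(coefs, row):
    """Logistic function applied to the linear combination of one row."""
    z = sum(c * v for c, v in zip(coefs, row))
    return 1.0 / (1.0 + math.exp(-z))


def log_likelihood(coefs):
    """Sum of log P(observed outcome) over all rows."""
    total = 0.0
    for row, target in zip(rows, y):
        p = probability(coefs, row)
        total += math.log(p) if target == 1 else math.log(1.0 - p)
    return total


def fit(steps=5000, learning_rate=0.1):
    """Gradient ascent on the log-likelihood, starting from all-zero coefficients."""
    coefs = [0.0] * 4
    for _ in range(steps):
        # Gradient of the log-likelihood: sum over rows of (y - p) * x.
        grad = [0.0] * 4
        for row, target in zip(rows, y):
            error = target - probability(coefs, row)
            for j, value in enumerate(row):
                grad[j] += error * value
        coefs = [c + learning_rate * g for c, g in zip(coefs, grad)]
    return coefs


start = log_likelihood([0.0] * 4)
coefs = fit()
print("log-likelihood before fitting:", start)
print("log-likelihood after fitting: ", log_likelihood(coefs))
print("coefficients (constant, age/100, male, smoker):", coefs)
```

With only eight rows, the resulting coefficients are not meaningful estimates; the point is only that the fitted coefficients make the observed data more likely than the starting values do.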