Lecture 7 Logistic Regression

The document compares linear regression and logistic regression, highlighting that linear regression is suitable for continuous outcomes while logistic regression is appropriate for binary outcomes. It explains the logistic function, how to interpret coefficients, and provides examples of modeling binary outcomes using logistic regression. Additionally, it discusses variable selection methods for multiple regression, including forward selection, backward elimination, and stepwise selection.

ASC399

LOGISTIC REGRESSION
Linear Regression vs Logistic Regression
• Linear regression models have a particular form.
• The regression formula is the equation for a
straight line.
• Among the properties of a straight line is that it
goes on forever, continuously in both directions.
• These properties make linear regression models
well-suited to estimating continuous quantities
that can take a wide range of values.
Linear Regression vs Logistic Regression
• The same properties that make linear
regression models appropriate for modeling
unbounded, continuous targets make them
unsuitable for modeling binary outcomes such
as yes/no or good/bad.
• Logistic regression is a regression model suitable for modeling binary outcomes.
Modeling Binary Outcomes
• Modeling a binary outcome tries to answer "What is the probability that this record belongs to class one?"
• Because probabilities are numbers, modeling a binary outcome is an estimation task.
Logistic Function
• The goal is to estimate the probability that an event occurs, p.
• The first step is to transform that probability p into odds, by taking the ratio of p over 1 - p. Recall that odds and probability say exactly the same thing, but while probabilities are restricted to the range 0 to 1, odds go from 0 to infinity.

$$\text{Odds} = \frac{p}{1-p}$$
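For example, a probability of p = .75 corresponds to odds of .75/(1 - .75) = 3, i.e. 3-to-1 in favor of the event.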
Example setup:
Y – Cancer Outcome (1 = Improved, 0 = Otherwise)
p = P(Improved)
Odds = p/(1 - p)
X1 – Survival Rating by Physician
Taking the log of the odds allows the relationships shown above to become linear.
Logistic Function
• Setting up the log odds as the target variable for the regression makes the regression equation look like:

$$\text{logit} = \ln\!\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 X$$

• This is the logit form of the logistic regression model.
• A method called maximum likelihood is used to find the best-fit line for logistic regression.
Logistic Function
• Thus the odds now become:

$$\text{Odds} = \frac{p}{1-p} = e^{\beta_0 + \beta_1 X}$$

• Solving for the probability p requires a bit of algebra, the result of which is:

$$p = \frac{e^{\beta_0 + \beta_1 X}}{1 + e^{\beta_0 + \beta_1 X}} = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X)}}$$
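As a quick illustration, here is a minimal Python sketch of these transformations; the coefficient and predictor values are arbitrary placeholders, not estimates from any model in these slides.

import numpy as np

beta0, beta1 = -1.0, 0.5   # arbitrary illustrative coefficients
x = 2.0

logit = beta0 + beta1 * x   # the log odds
odds = np.exp(logit)        # odds = p / (1 - p)
p = odds / (1 + odds)       # equivalently: 1 / (1 + np.exp(-logit))
print(logit, odds, p)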
Interpreting the Coefficients
• We can use either the original or
exponentiated logistic coefficients for
interpretation. The two types of logistic
coefficient differ in that they reflect the
relationship of the independent variable with
the two forms of the dependent variable, as
shown here:
Logistic Coefficient    Reflects Changes in ...
Original                Logit (logged odds)
Exponentiated           Odds
Coefficients of Metric (Interval/Ratio) Independent Variables

Exponentiated coefficient (e^b)    .20     .50     1.0     1.5     1.8
e^b - 1.0                          -.80    -.50    0.0     .50     .80
Percentage change in odds          -80%    -50%    0%      50%     80%

The model's predicted probability of occurrence is lower when e^b < 1, unchanged when e^b = 1, and higher when e^b > 1. Likewise, for any positive change in the independent variable (X), the odds will decrease when e^b < 1, stay the same when e^b = 1, and increase when e^b > 1.

• If e^b = 0.20:
– A one-unit change in X will reduce the odds of Y by 80%
– Thus an inverse relationship
• If e^b = 1.5:
– A one-unit change in X will increase the odds of Y by 50%
– Thus a direct relationship
• If e^b = 1.0:
– A one-unit change in X will not change the odds of Y
– Thus no relationship between X and Y
Coefficients of Nonmetric (Categorical/Dummy) Independent Variables
• Dummy variables represent a single category of a nonmetric variable.
• A dummy variable takes on just the values 1 or 0, indicating the presence or absence of a characteristic.
• Any time a dummy variable is used, it is essential to note the reference or omitted category.
– E.g. Gender – 1 (Male), 0 (Female) => X1 = 1 or 0; the omitted or reference category is Female
– E.g. Race – Malay, Chinese, Indian => X1 = 1 (Malay), X2 = 1 (Chinese); the omitted or reference category is Indian
Coefficients of Nonmetric (Categorical/Dummy) Independent Variables
• If the nonmetric variable is gender, the two
possibilities are male and female.
• The dummy variable can be defined as
representing males (i.e., value of 1 if male, 0 if
female) or females (i.e., value of 1 if female, 0
if male).
• Whichever way is chosen, however,
determines how the coefficient is interpreted.
Coefficients of Nonmetric (Categorical/Dummy) Independent Variables
• Let's assume that a 1 is given to females, so the exponentiated coefficient represents the odds for females relative to males (the reference category).
• If the exponentiated coefficient is 1.25, then females have 25 percent higher odds than males (1.25 - 1.0 = .25).
• Likewise, if the exponentiated coefficient is .80, then the odds for females are 20 percent lower (.80 - 1.0 = -.20) than for males.
Example
A researcher is interested in how variables such as GRE (Graduate Record Exam scores), GPA (grade point average) and prestige of the undergraduate institution affect admission into graduate school. The outcome variable, admit/don't admit, is binary.

This data set has a binary response (outcome, dependent) variable called admit, which is equal to 1 if the individual was admitted to graduate school, and 0 otherwise. There are three predictor variables: gre, gpa, and rank. We will treat the variables gre and gpa as continuous. The variable rank takes on the values 1 through 4. Institutions with a rank of 1 have the highest prestige, while those with a rank of 4 have the lowest.
$$\ln\!\left(\frac{p}{1-p}\right) = -5.5414 + .00226\,\text{GRE} + .804\,\text{GPA} + 1.5514\,\text{Rank}_1 + .876\,\text{Rank}_2 + .2112\,\text{Rank}_3$$
X       b        e^b (odds ratio)   e^b - 1    % change in odds
GRE     0.00226  1.002263           0.002263   0.23
GPA     0.804    2.234461           1.234461   123.45
Rank 1  1.5514   4.718071           3.718071   371.81
Rank 2  0.876    2.401275           1.401275   140.13
Rank 3  0.2112   1.235159           0.235159   23.52

Interpretation:
• A one-unit change in GRE will increase the odds of admission to graduate school by 0.23%.
• A one-unit change in GPA will increase the odds of admission to graduate school by 123.45%.
• Having attended an undergraduate institution with a rank of 1 will increase the odds of admission to graduate school by 371.81%, compared to attending an institution with a rank of 4.
• Having attended an undergraduate institution with a rank of 2 will increase the odds of admission to graduate school by 140.13%, compared to attending an institution with a rank of 4.
• Having attended an undergraduate institution with a rank of 3 will increase the odds of admission to graduate school by 23.52%, compared to attending an institution with a rank of 4.
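A model like this can be fit in Python; here is a minimal sketch using statsmodels, assuming the admissions data are in a local CSV file (the file name admissions.csv is a placeholder) with columns admit, gre, gpa, and rank.

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Placeholder file name; assumes columns admit, gre, gpa, rank.
df = pd.read_csv("admissions.csv")

# Treat rank as categorical, with rank 4 as the omitted/reference category,
# matching the slide's coding of the Rank 1, Rank 2, Rank 3 dummies.
model = smf.logit("admit ~ gre + gpa + C(rank, Treatment(reference=4))", data=df).fit()
print(model.summary())

# Odds ratios (e^b) and the percentage change in odds for each coefficient.
odds_ratios = np.exp(model.params)
print(100 * (odds_ratios - 1))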
$$\ln\!\left(\frac{p}{1-p}\right) = -5.5414 + .00226\,\text{GRE} + .804\,\text{GPA} + 1.5514\,\text{Rank}_1 + .876\,\text{Rank}_2 + .2112\,\text{Rank}_3$$

Odds:

$$\frac{p}{1-p} = e^{\,-5.5414 + .00226\,\text{GRE} + .804\,\text{GPA} + 1.5514\,\text{Rank}_1 + .876\,\text{Rank}_2 + .2112\,\text{Rank}_3}$$

Probability:

$$p = \frac{1}{1 + e^{-(-5.5414 + .00226\,\text{GRE} + .804\,\text{GPA} + 1.5514\,\text{Rank}_1 + .876\,\text{Rank}_2 + .2112\,\text{Rank}_3)}}$$
Using the probability formula above, the predicted probabilities for four observations are:

X        b        obs 62   obs 64   obs 66   obs 79
GRE      0.00226  560      680      600      540
GPA      0.804    3.32     3.85     3.59     3.12
Rank 1   1.5514   0        0        0        1
Rank 2   0.876    0        0        1        0
Rank 3   0.2112   0        1        0        0
p(y=1)            0.1671   0.3323   0.3958   0.4351
Yhat              0        0        0        0
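The predicted probabilities can also be computed by hand from the coefficients; here is a minimal sketch, with the coefficients taken from the estimated equation above rather than re-estimated.

import numpy as np

# Coefficients from the estimated equation above.
b0, b_gre, b_gpa, b_r1, b_r2, b_r3 = -5.5414, 0.00226, 0.804, 1.5514, 0.876, 0.2112

def predict_prob(gre, gpa, rank1, rank2, rank3):
    logit = b0 + b_gre * gre + b_gpa * gpa + b_r1 * rank1 + b_r2 * rank2 + b_r3 * rank3
    return 1 / (1 + np.exp(-logit))

# Observations 62, 64, 66 and 79 from the table: (GRE, GPA, Rank1, Rank2, Rank3).
observations = [(560, 3.32, 0, 0, 0), (680, 3.85, 0, 0, 1),
                (600, 3.59, 0, 1, 0), (540, 3.12, 1, 0, 0)]
for obs in observations:
    p = predict_prob(*obs)
    # Classify at the usual 0.5 cutoff, which reproduces the Yhat row.
    print(f"p(y=1) = {p:.4f}, yhat = {int(p >= 0.5)}")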
Variable Selection for Multiple Regression
• Forward Selection
– Forward selection starts with a set of candidate input variables; variables are added to the model one at a time.
– The first step is to create a separate regression model for each input variable; if there are n input variables, then the first step considers n different models with one input variable. The variable whose model scores best on some test becomes the first variable included in the forward selection model.
– At each subsequent step, each variable that is NOT yet in the model is tested for inclusion. The most significant of these variables is added to the model.
Variable Selection for Multiple Regression
• Backward Elimination
– The backward elimination approach to variable selection begins by creating a multiple regression model using all n input variables (it starts by fitting a model with all the input variables).
– Then, using a statistical test, the least significant variable is dropped from the model, and the model is refit without it. This process continues until all remaining variables in the model are statistically significant OR some stopping criterion, such as a minimum number of variables desired, is reached.
Variable Selection for Multiple Regression
• Stepwise Selection
– The stepwise selection method combines forward selection and backward elimination. It allows a variable added earlier in the process to be dropped from the model, and a variable dropped at one point to be added back in. A sketch of the forward-selection step is shown below.
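As an illustration of the forward step, here is a minimal sketch of greedy forward selection for a logistic model, scored by AIC; the DataFrame df, the target column name, and the candidate list are assumed inputs, and a real implementation might instead use significance tests or cross-validation.

import statsmodels.formula.api as smf

def forward_select(df, target, candidates):
    """Greedy forward selection for a logistic model, scored by AIC."""
    selected = []
    remaining = list(candidates)
    current_aic = float("inf")
    while remaining:
        # Fit one candidate model per remaining variable and keep the best.
        scores = []
        for var in remaining:
            formula = target + " ~ " + " + ".join(selected + [var])
            fit = smf.logit(formula, data=df).fit(disp=0)
            scores.append((fit.aic, var))
        best_aic, best_var = min(scores)
        if best_aic >= current_aic:  # no improvement: stop
            break
        selected.append(best_var)
        remaining.remove(best_var)
        current_aic = best_aic
    return selected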
Example
Epping-Jordan, Compas, & Howell (1994)
We were interested in looking at cancer outcomes as a function of psychological variables, specifically intrusions and avoidance behavior.

Variables:
Outcome : 1 = Improved, 0 = Worse
SurvRate : higher scores = better prognosis
Intrus : intrusive thoughts
Avoid : avoidance behavior
Model Generation
• Using all inputs.
• After removing Intrus (not significant, p > .05); a sketch of these two steps is shown below.
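Here is a minimal sketch of this two-step fit, assuming the data are in a local CSV file (cancer.csv is a placeholder) with the columns named in the slides.

import pandas as pd
import statsmodels.formula.api as smf

# Placeholder file name; assumes columns Outcome, SurvRate, Intrus, Avoid.
df = pd.read_csv("cancer.csv")

# Step 1: fit using all inputs and inspect the p-values.
full = smf.logit("Outcome ~ SurvRate + Intrus + Avoid", data=df).fit(disp=0)
print(full.pvalues)

# Step 2: Intrus is not significant (p > .05), so refit without it.
reduced = smf.logit("Outcome ~ SurvRate + Avoid", data=df).fit(disp=0)
print(reduced.summary())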


Interpret the coefficients:

X         b      e^b    e^b - 1   % change in odds
SurvRate  -.082  0.92   -0.08     -8
Avoid     .133   1.14   0.14      14

• A one-unit change in SurvRate will decrease the odds of Improved by 8%.
• A one-unit change in Avoid will increase the odds of Improved by 14%.
Write down the estimated equation:

$$p = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X)}}$$

$$p = \frac{1}{1 + e^{-(1.196 - .082\,\text{SURVRATE} + .133\,\text{AVOID})}}$$
Estimate the status of the cancer outcome for 3 new observations:

$$p = \frac{1}{1 + e^{-(1.196 - .082\,\text{SURVRATE} + .133\,\text{AVOID})}}$$

X                  b      Obs 114   Obs 120   Obs 118
SurvRate           -.082  15        91        14
Avoid              .133   19        17        23
P(Y=1) (Improved)         0.9236    0.01789   0.9572
Y (predict)               1         0         1
Another Example
Mean of our response variable: attending a self-help group (FYI)
• The sample mean of Y is the number of successes (yes to attend) divided by the sample size, n.
• The sample mean is therefore the proportion of successful outcomes.
• Here, 44 said yes and n = 400, so the mean proportion of yes is 44/400 = .11, or 11%.
Odds ratio and % change in odds by age
• Age: β = -.0586 with p-value (0.0072) < .01, and β is negative. Thus the log odds of attending a self-help group decrease as a person gets older.
• exp(β) = .9431 is the odds ratio; since exp(β) < 1, the odds decrease.
• The % change (in this case a reduction) in the odds of attending for each additional year of age is 100(exp(β) - 1) = 100(.9431 - 1) = -5.69%, i.e. 5.69% less likely for each year one ages.
Predicted probability of attending by age

$$p = \frac{e^{\beta_1 X_1}}{1 + e^{\beta_1 X_1}}$$

• A point estimate for age 80 would be:
  $p = e^{(-.0586)(80)} / (1 + e^{(-.0586)(80)}) = .00912$
  The probability of those 80 years of age attending a help group is about 0.9%, roughly 1%.
• A point estimate for age 40 would be:
  $p = e^{(-.0586)(40)} / (1 + e^{(-.0586)(40)}) = .0875$
  The probability of those 40 years of age attending a help group is 8.75%, roughly 9%.
Odds ratio and % change in odds by gender
• Gender: β = 1.2540 with p-value (0.0163) < .05. Thus the log odds of attending a self-help group are greater among females (the reference category is male and β is positive).
• exp(β) = 3.5043 is the odds ratio; the odds of attending are 3.5 times as large for females as for males (exp(β) > 1).
• The % change (in this case an increase) in the odds of attending when a person is female is 100(exp(β) - 1) = 100(3.50 - 1) = 250% compared to males.
Predicted probability of attending by gender
• A point estimate for females would be:
  $p = e^{(1.254)(1)} / (1 + e^{(1.254)(1)}) = .77$
  Thus, the probability of attending among females is 77%.
• A point estimate for males would be:
  $p = e^{(1.254)(0)} / (1 + e^{(1.254)(0)}) = .50$
  Thus, the probability of attending among males is 50%.
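These point estimates are easy to verify; here is a minimal sketch using the coefficients from the slides, with the intercept omitted as in the hand calculations above.

import numpy as np

def point_estimate(b, x):
    """p = e^(b*x) / (1 + e^(b*x)), matching the point estimates above."""
    z = b * x
    return np.exp(z) / (1 + np.exp(z))

print(point_estimate(-0.0586, 80))  # ~0.0091: attending at age 80
print(point_estimate(-0.0586, 40))  # ~0.0875: attending at age 40
print(point_estimate(1.2540, 1))    # ~0.78 (slide rounds to .77): female
print(point_estimate(1.2540, 0))    # 0.50: male (reference)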
