
San José State University

Math 261A: Regression Theory & Methods

Generalized Linear Models (GLMs)

Dr. Guangliang Chen


This lecture is based on the following textbook sections:

• Chapter 13: 13.1 – 13.3

Outline of this presentation:

• What is a GLM?

• Logistic regression

• Poisson regression

What is a GLM?
In ordinary linear regression, we assume that the response is a linear
function of the regressors plus Gaussian noise:

y = β0 + β1 x1 + · · · + βk xk + ε,   ε ∼ N(0, σ²)

where the linear form β0 + β1 x1 + · · · + βk xk = x′β, so that y ∼ N(x′β, σ²).

The model can be reformulated in terms of

• distribution of the response: y | x ∼ N (µ, σ 2 ), and

• dependence of the mean on the predictors: µ = E(y | x) = x′β
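In R terms, this reformulation says that lm() fits the same model as glm() with the Gaussian family; a quick sketch (the data frame d and variables y, x1, x2 are placeholder names):

lm(y ~ x1 + x2, data = d)                       # ordinary linear regression
glm(y ~ x1 + x2, data = d, family = gaussian)   # the same fit, viewed as a GLM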


[Figure: a sample from the ordinary linear regression model with β = (1, 2); the responses y and the line β0 + β1 x plotted against x ∈ [0, 1].]


Generalized linear models (GLMs) extend linear regression by allowing the response variable to have

• a general distribution (with mean µ = E(y | x)), and

• a mean that depends on the predictors through a link function g. That is,

g(µ) = β′x,   or equivalently,   µ = g⁻¹(β′x)


In a GLM, the response is typically assumed to have a distribution in the exponential family, a large class of probability distributions whose pdfs have the form f(x | θ) = a(x)b(θ) exp(c(θ) · T(x)), including

• Normal - ordinary linear regression

• Bernoulli - logistic regression, modeling binary data

• Binomial - multinomial logistic regression, modeling general categorical data

• Poisson - Poisson regression, modeling count data

• Exponential, Gamma - survival analysis
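For example, the Bernoulli(p) pmf fits this form:

f(x | p) = p^x (1 − p)^(1−x) = (1 − p) · exp(x · log(p/(1 − p))),   x ∈ {0, 1},

with a(x) = 1, b(p) = 1 − p, c(p) = log(p/(1 − p)) and T(x) = x. Note that c(p) is exactly the logit, which is why the logit is the canonical link for Bernoulli responses.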


In theory, any combination of the response distribution and link function (which relates the mean response to a linear combination of the predictors) specifies a generalized linear model.

Some combinations turn out to be much more useful and mathematically more tractable than others in practice.

Response distribution | Link function g(µ)   | Use
Normal                | Identity: µ          | OLS
Bernoulli             | Logit: log(µ/(1−µ))  | Logistic regression
Poisson               | Log: log(µ)          | Poisson regression
Exponential/Gamma     | Inverse: −1/µ        | Survival analysis
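In R, each of these combinations is selected through the family argument of glm(); a minimal sketch (the data frame d and variables y, x are placeholder names):

glm(y ~ x, data = d, family = gaussian(link = "identity"))  # ordinary linear regression (OLS)
glm(y ~ x, data = d, family = binomial(link = "logit"))     # logistic regression
glm(y ~ x, data = d, family = poisson(link = "log"))        # Poisson regression
glm(y ~ x, data = d, family = Gamma(link = "inverse"))      # Gamma regression (survival-type data)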


Applications:

• Logistic regression: predict the likelihood that a consumer of an online shopping website will buy a specific item (say, a camera) within the next month based on the consumer's purchase history.

• Poisson regression: model the number of children a couple has as a function of their ages, numbers of siblings, income, education levels, etc.

• Exponential: model the survival time (time until death) of patients in a clinical study as a function of disease, age, gender, type of treatment, etc.


Logistic regression
Logistic regression is a GLM that combines the Bernoulli distribution (for
the response) and the logit link function (relating the mean response to
predictors):

log(µ/(1 − µ)) = β′x   (y ∼ Bernoulli(p))

Remark. Since µ = E(y | x) = p, we have

log(p/(1 − p)) = β′x   (y ∼ Bernoulli(p))

where p: probability of success, p/(1 − p): odds, log(p/(1 − p)): log-odds.
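For example, p = 0.8 corresponds to odds 0.8/0.2 = 4 and log-odds log 4 ≈ 1.39, while p = 0.5 corresponds to odds 1 and log-odds 0.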


Solving for µ (and also p), we obtain that


µ = 1/(1 + e^(−β′x)) = s(β′x),   s(z) = 1/(1 + e^(−z)),
where s(·) is the sigmoid function, also called the logistic function.

Properties of the sigmoid function:

• s(0) = 0.5

• 0 < s(z) < 1 for all z

• s(z) monotonically increases as z goes from −∞ to +∞

[Figure: the sigmoid curve µ = s(z) plotted for z ∈ [−4, 4].]
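A short R sketch (not from the slides) that reproduces the plot and checks the first property:

s <- function(z) 1/(1 + exp(-z))                       # the sigmoid (logistic) function
s(0)                                                   # 0.5
curve(s, from = -4, to = 4, xlab = "z", ylab = "mu")   # monotone, bounded between 0 and 1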


For fixed β (model parameter) and each given x (sampled location),

µ = p = s(z),   z = β′x

has the following interpretations:

• mean response: E(y | x, β) = s(z)

• probability of success: P(y = 1 | x, β) = s(z)

Population model: y | x, β ∼ Bernoulli(p = s(β′x))

A sample from the logistic regression model, with p = s(−3 + 2x)

[Figure: binary responses y ∈ {0, 1} plotted against x ∈ [0, 4] for β = (−3, 2).]
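A short R sketch (not from the slides; the sample size and the range of x are illustrative assumptions) that generates such a sample:

set.seed(1)
x <- runif(200, 0, 4)                    # sampled locations (assumed range)
p <- 1/(1 + exp(-(-3 + 2*x)))            # success probabilities s(-3 + 2x)
y <- rbinom(200, size = 1, prob = p)     # binary responses
plot(x, y)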


Parameter estimation via MLE

Given a data set (x1, y1), . . . , (xn, yn), fitting a logistic regression model is equivalent to choosing the value of β such that the mean response

µ = s(β′x)

matches the sample as “closely” as possible.

Mathematically, the best β is usually found by maximizing the likelihood of the sample:

L(β | y1, . . . , yn) = f(y1, . . . , yn | β) = ∏_{i=1}^n f(yi | β)

where f(yi | β) is the probability function of the ith observation:

f(yi | β) = pi^yi (1 − pi)^(1−yi) = pi if yi = 1, and 1 − pi if yi = 0,

and

pi = 1/(1 + e^(−β′xi))
However, there is no closed-form solution, and the optimal β has to be
computed numerically.
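A minimal R sketch of this numerical optimization, reusing the simulated x and y from the sketch above (glm() actually uses iteratively reweighted least squares, but a general-purpose optimizer illustrates the idea):

negloglik <- function(beta) {
  eta <- beta[1] + beta[2]*x             # linear predictor beta'x
  -sum(y*eta - log(1 + exp(eta)))        # negative Bernoulli log-likelihood
}
fit <- optim(c(0, 0), negloglik, method = "BFGS")
fit$par                                  # close to coef(glm(y ~ x, family = binomial))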

Prediction by logistic regression

Once the optimal parameter β̂ is found, the mean response at a new location x0 is

E(y | x0, β̂) = 1/(1 + e^(−β̂′x0))

Note that this would not be our exact prediction at x0 (why?).


To make a prediction at x0 based on the estimates β̂, consider

y0 | x0, β̂ ∼ Bernoulli(p̂0),   p̂0 = 1/(1 + e^(−β̂′x0)).

The prediction at x0 is

ŷ0 = 1 if p̂0 > 0.5, and ŷ0 = 0 if p̂0 < 0.5.


R scripts

x = c(162, 165, 166, 170, 171, 168, 171, 175, 176, 182, 185)
y = c(0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1)
model <- glm(y ~ x, family = binomial(link = "logit"))

p = model$fitted.values
# p = [0.0168, 0.0708, 0.1114, 0.4795, 0.6026, 0.2537, 0.6026, 0.9176,
#      0.9483, 0.9973, 0.9994]

beta = model$coefficients  # beta = [-84.8331094, 0.4985354]

fitted.prob <- predict(model, data.frame(x = c(168, 170, 173)), type = "response")
# fitted.prob = [0.2537, 0.4795, 0.8043]
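To turn these fitted probabilities into class predictions with the 0.5 cutoff from the prediction rule above (a small addition, not in the original script):

yhat <- as.numeric(fitted.prob > 0.5)    # [0, 0, 1]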


[Figure: the fitted logistic curve p̂ = 1/(1 + exp(−(−84.8331 + 0.4985x))) plotted over x ∈ [160, 190].]


Other models for binary response data

Instead of using the logit link function,

p = 1/(1 + e^(−β′x))

to force the estimated probabilities to lie between 0 and 1:

y | x, β ∼ Bernoulli(p)

one could use

• Probit: p = Φ(β′x), where Φ is the cdf of the standard normal distribution.

• Complementary log-log: p = 1 − exp(−exp(β′x))
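Both alternatives are available in R through the link argument of the binomial family; a quick sketch using the same x and y as in the R scripts above:

probit.model  <- glm(y ~ x, family = binomial(link = "probit"))    # probit regression
cloglog.model <- glm(y ~ x, family = binomial(link = "cloglog"))   # complementary log-log regression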



Poisson regression
Poisson regression is a GLM that combines the Poisson distribution (for the
response) and the log link function (relating mean response to predictors):

log(µ) = β′x   (y ∼ Poisson(λ))

Remark. Since µ = E(y | x) = λ, we have

log λ = β′x,   or   λ = e^(β′x)

That is,

y | x, β ∼ Poisson(λ = e^(β′x))


[Figure: a sample from the Poisson regression model with β = (1, −3), plotted against x ∈ [−1, 0], together with the true model and fitted model curves.]
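A short R sketch (sample size and design are illustrative assumptions, not from the slides) that generates a sample like the one in the figure. The coefficients reported in the R code below come from the author's own sample, so a fresh simulation will give slightly different values:

set.seed(1)
x <- runif(100, -1, 0)             # sampled locations (assumed range)
lambda <- exp(1 - 3*x)             # mean response exp(beta'x) with beta = (1, -3)
y <- rpois(100, lambda)            # Poisson counts
plot(x, y)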


R code

poisson.model <- glm(y ~ x, family = poisson(link = "log"))

poisson.model$coefficients
# (Intercept)           x
#    1.003291   -3.019297


Summary and beyond


We talked about the concept of generalized linear models (GLMs) and two of its special instances:

• Logistic regression: logit link function + Bernoulli distribution

• Poisson regression: log link function + Poisson distribution

Note that parameter estimation for GLMs is done through MLE; prediction is based on the mean (plus some necessary adjustments).

Further learning on logistic and multinomial regression:
http://www.sjsu.edu/faculty/guangliang.chen/Math251F18/lec5logistic.pdf
