Risk Management
Lecture 2-2: Estimation and Hypothesis Testing
Chen Tong
SOE & WISE, Xiamen University
September 12, 2024
Chen Tong (SOE&WISE) Risk Management September 12, 2024 1 / 62
Estimation and Hypothesis Testing
This chapter includes:
1. Statistical Inference and Estimation
2. Unbiasedness, Efficiency and Consistency
3. Interval Estimation and Confidence Intervals
4. Hypothesis Testing
5. LLN and CLT
6. ∗ ∗ ∗ Maximum Likelihood Estimation (MLE)
1. Statistical Inference and Estimation
Statistical Inference
▶ We usually use a sample from a population (e.g. all working adults in China's urban labor market) to draw inferences about the properties of this population.
▶ Since we do not know the parameters of interest in the population
(such as the expected value and the variance), we use the sample to
estimate these parameters.
▶ We want to use a procedure that allows us to estimate the parameters as precisely as possible
▶ We want to test whether certain hypotheses are in line with
information included in the sample
Estimator and Estimate
▶ Given a random sample {Y1, Y2, ..., Yn} drawn from a population distribution that depends on an unknown parameter θ, an estimator W of θ is a rule that assigns a value of θ to each possible outcome of the sample
▶ The rule does not depend on the data actually obtained
▶ An estimator W of a parameter θ can be expressed as a function of
{Y1 , Y2 , . . . , Yn }
W = h (Y1 , Y2 , . . . , Yn )
▶ W is a random variable because it depends on Y1 , Y2 , . . . , Yn
▶ Given the actual data {y1 , y2 , . . . , yn }, we obtain the point estimate
w = h (y1 , y2 , . . . , yn )
Estimator and Estimate
▶ Example: Given the random sample {Y1 , Y2 , . . . , Yn }, a natural
estimator of the mean µ is the average Ȳ of a random sample:
$$\bar{Y} = \frac{1}{n}\sum_{i=1}^{n} Y_i$$
▶ For actual data outcomes {y1 , y2 , . . . , yn }, the estimate is the
average in the sample:
$$\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$$
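The estimator/estimate distinction above can be sketched in a few lines of Python; the data values here are hypothetical, chosen only for illustration:

```python
def sample_mean(values):
    """Estimator rule h: map a sample outcome {y1, ..., yn} to a value of the parameter."""
    return sum(values) / len(values)

# Hypothetical data outcomes {y1, ..., yn}
y = [2.0, 4.0, 6.0, 8.0]
w = sample_mean(y)   # the point estimate w = h(y1, ..., yn)
print(w)             # 5.0
```

The function `sample_mean` is the rule W = h(·), fixed before any data is seen; the number it returns for the observed data is the point estimate w.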
Estimator and Estimate
▶ Since we want to use an estimator that is as precise as possible, we have to define what "as precise as possible" means
▶ Most important criteria:
- Unbiasedness
- Efficiency
- Consistency
2. Unbiasedness, Efficiency and Consistency
Unbiasedness
▶ An estimator W of θ is unbiased if
E (W ) = θ
⇒ If we could indefinitely draw random samples on Y from the
population and compute an estimate each time and then average
these estimates over all random samples, we would obtain θ
Unbiasedness
▶ It can be shown that Ȳ is an unbiased estimator of µ :
" n # " n # n
1X 1 X 1X
E (Ȳ ) = E Yi = E Yi = E (Yi )
n n n
i=1 i=1 i=1
n
1X 1
= µ = nµ = µ
n n
i=1
▶ For hypothesis testing, we need to estimate the variance σ 2 from a
population with mean µ
Unbiasedness
▶ Letting {Y1 , Y2 , . . . , Yn } denote the random sample from the
population with E (Y ) = µ and Var(Y ) = σ 2 , the estimator of σ 2 is
given by the sample variance
$$S^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(Y_i - \bar{Y}\right)^2$$
▶ It can be shown that S² is an unbiased estimator of σ²:
$$E(S^2) = \sigma^2$$
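The claim E(S²) = σ² can be verified exactly for a tiny made-up population by averaging S² over every equally likely sample; a minimal sketch:

```python
from itertools import product

population = [0.0, 3.0]                   # toy two-point population
mu = sum(population) / len(population)    # population mean: 1.5
sigma2 = sum((x - mu) ** 2 for x in population) / len(population)  # 2.25

def s2(sample):
    """Sample variance with the n - 1 divisor."""
    n = len(sample)
    ybar = sum(sample) / n
    return sum((y - ybar) ** 2 for y in sample) / (n - 1)

# Average S^2 over all equally likely samples of size n = 2 (with replacement)
samples = list(product(population, repeat=2))
mean_s2 = sum(s2(s) for s in samples) / len(samples)
print(mean_s2, sigma2)   # 2.25 2.25 -> E(S^2) = sigma^2 exactly
```

Dividing by n instead of n − 1 would make the average come out below σ², which is exactly the bias the n − 1 divisor removes.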
Efficiency
▶ There is usually more than one unbiased estimator, and different unbiased estimators can have different sampling distributions
▶ If the sampling distribution of an estimator is more dispersed, it is
more likely that we will obtain a random sample that yields an
estimate very far from θ. ⇒ We need to rely on the variance of an
estimator
▶ If W1 and W2 are two unbiased estimators of θ, W1 is efficient
relative to W2 when Var (W1 ) ≤ Var (W2 ) for all θ (with strict
inequality for at least one value of θ)
▶ Comparing variances is difficult if we do not restrict our attention to
unbiased estimators because we could always use a trivial estimator
with variance zero that is biased.
Sampling Variance
▶ Var(Ȳ ) is called the sampling variance because it is the variance
associated with the sampling distribution
▶ The sampling variance is a constant, not a random variable
▶ The variance of the sample average Ȳ is given by
$$\mathrm{Var}(\bar{Y}) = \mathrm{Var}\left(\frac{1}{n}\sum_{i=1}^{n} Y_i\right) = \frac{1}{n^2}\sum_{i=1}^{n}\mathrm{Var}(Y_i) = \frac{1}{n^2}\,n\sigma^2 = \frac{\sigma^2}{n}$$
⇒ Var(Ȳ ) gets smaller if the sample size n increases
Consistency
▶ We can study the asymptotic properties of estimators for large
sample sizes, i.e. we can approximate the features of the sampling
distribution of an estimator for large sample sizes n
▶ We usually want to know the distance of an estimator from the
"true" parameter if the sample size increases indefinitely
▶ Let Wn be an estimator of θ based on the sample Y1, Y2, ..., Yn of size n. Then Wn is a consistent estimator of θ if for every ε > 0 (even a very small one),
$$P(|W_n - \theta| > \varepsilon) \to 0 \quad \text{as } n \to \infty$$
▶ Alternative notation: plim (Wn ) = θ (the probability limit of Wn is
θ) ⇒ The distribution of Wn becomes more concentrated about θ if
n increases
Consistency
▶ Unbiased estimators are not necessarily consistent, but those whose
variances shrink to zero as the sample size increases are consistent
⇒ If Wn is an unbiased estimator of θ and Var (Wn ) → 0 as
n → ∞, then plim (Wn ) = θ
▶ The sample average Ȳ is consistent: since Var(Ȳn) = σ²/n for any sample size n, Var(Ȳn) → 0 as n → ∞, so Ȳ is a consistent estimator of µ
3. Interval Estimation and Confidence Intervals
Interval Estimation and Confidence Intervals
▶ A point estimate provides no information about how close the
estimate is "likely" to be to the population parameter
▶ We cannot know how close an estimate for a particular sample is to
the population parameter because the population value is unknown
▶ However, we can obtain an interval estimate that contains the
population parameter with a certain probability
▶ Suppose the population has a normal distribution N(µ, σ²) and let {Y1, Y2, ..., Yn} be a random sample from this population. Then the sample average has a normal distribution: Ȳ ∼ N(µ, σ²/n)
▶ The standardized sample average Z̄ is given by
$$\bar{Z} = \frac{\bar{Y} - \mu}{\sigma/\sqrt{n}}, \quad \text{with } \bar{Z} \sim N(0, 1)$$
▶ We may obtain a confidence interval about Z̄ by choosing a certain
confidence level (typically chosen: 95% or 99% )
▶ In general, this confidence level is 1 − α, where α is called
significance level
▶ Formally, we look for critical values −dα/2 and dα/2, so that
$$P\left(-d_{\alpha/2} < \bar{Z} < d_{\alpha/2}\right) = 1 - \alpha$$
Interval Estimation and Confidence Intervals
▶ Since the event
$$-d_{\alpha/2} < \frac{\bar{Y} - \mu}{\sigma/\sqrt{n}} < d_{\alpha/2}$$
is identical to the event
$$\bar{Y} - d_{\alpha/2}\,\sigma/\sqrt{n} < \mu < \bar{Y} + d_{\alpha/2}\,\sigma/\sqrt{n},$$
it follows that
$$P\left(\bar{Y} - d_{\alpha/2}\,\sigma/\sqrt{n} < \mu < \bar{Y} + d_{\alpha/2}\,\sigma/\sqrt{n}\right) = 1 - \alpha$$
▶ The random interval (Ȳ − dα/2 σ/√n, Ȳ + dα/2 σ/√n) contains the population mean µ with probability 1 − α
Interval Estimation and Confidence Intervals
▶ We obtain an interval estimate by plugging in the sample outcome
of the average, ȳ and the sample standard deviation s :
$$\left(\bar{y} - d_{\alpha/2}\,s/\sqrt{n},\;\; \bar{y} + d_{\alpha/2}\,s/\sqrt{n}\right),$$
with $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$ and $s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(y_i - \bar{y})^2}$
▶ Unfortunately, the interval estimate does not preserve the confidence level 1 − α, because s depends on the particular sample ⇒ In other words, the random interval (Ȳ − dα/2 σ/√n, Ȳ + dα/2 σ/√n) no longer contains µ with probability 1 − α if we replace σ with the random variable S
Interval Estimation and Confidence Intervals
▶ Solution: We consider a standardized sample average that has a t
distribution with n − 1 degrees of freedom,
$$\frac{\bar{Y} - \mu}{S/\sqrt{n}} \sim t_{n-1}$$
where S is the sample standard deviation
$$\Rightarrow\; P\left(\bar{Y} - c_{\alpha/2}\,S/\sqrt{n} < \mu < \bar{Y} + c_{\alpha/2}\,S/\sqrt{n}\right) = 1 - \alpha,$$
where cα/2 is the critical value of the t distribution
▶ The confidence interval may be written as Ȳ ± cα/2 (S/√n)
Interval Estimation and Confidence Intervals
Example:
▶ Changes in worker productivity on "scrap rates" for a sample of
Michigan manufacturing firms
▶ Was there a significant change in the scrap rate between 1987 and
1988?
▶ n = 20 ⇒ The critical value for a 95% confidence interval for
n − 1 = 19 degrees of freedom is 2.093, which is the 97.5th
percentile in a t19 distribution (see Wooldridge, page 825)
▶ ȳ = −1.15, se(ȳ) = s/√n = .54
▶ The confidence interval for the mean change in scrap rates µ is
[ȳ ± 2.093 se(ȳ )]
⇒ The 95% confidence interval is [−2.28, −.02]
⇒ The average change in scrap rates is statistically significant (i.e.
significantly different from zero) at a significance level of 5%
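Plugging the slide's numbers into the confidence-interval formula reproduces the reported interval:

```python
ybar = -1.15   # sample mean change in scrap rates
se = 0.54      # standard error of ybar, s / sqrt(n)
c = 2.093      # 97.5th percentile of the t_19 distribution

lo, hi = ybar - c * se, ybar + c * se
print(round(lo, 2), round(hi, 2))   # -2.28 -0.02
```

Since zero lies outside [−2.28, −.02], the change is statistically significant at the 5% level.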
4. Hypothesis Testing
▶ We often want to test a certain hypothesis to learn something about
the "true" value θ
▶ Formally, we want to test whether θ is significantly different from a
certain value µ0 ,
H0 : θ = µ0
▶ Since µ0 = 0 is the most common hypothesis, H0 is called the null hypothesis
▶ The alternative hypothesis is
H1 : θ ̸= µ0
▶ If the value µ0 does not lie within the calculated confidence interval,
then we reject the null hypothesis
▶ If the value µ0 lies within the calculated confidence interval, then we
fail to reject the hypothesis
▶ In both cases, there is a certain risk that our conclusion is wrong
▶ We can reject the null hypothesis when it is in fact true (Type I
error)
▶ We can fail to reject the null hypothesis when it is actually false
(Type II error)
▶ We usually address these problems by the choice of the significance
level α
▶ We will never know with certainty whether we committed an error
▶ Testing hypotheses about the mean µ from a N(µ, σ²) distribution is straightforward:
▶ Null hypothesis:
H0 : µ = µ0
▶ Alternative hypotheses:
A. H1 : µ > µ0 (one-sided hypothesis)
B. H1 : µ < µ0 (one-sided hypothesis)
C. H1 : µ ̸= µ0 (two-sided hypothesis)
Testing hypotheses about the mean µ from a N(µ, σ²) distribution is straightforward:
A. One-tailed test: µ > µ0 : We reject H0 in favor of H1 when the value
of the sample average, ȳ , is "sufficiently" greater than µ0 :
1. Calculate the t-statistic: t = (ȳ − µ0)/se(ȳ)
2. Compare t with the critical value c for a significance level of 5%
⇒ If n is large: c = 1.645
⇒ If t > c, we reject H0 at a significance level of 5%
⇒ If t < c, we fail to reject H0 at a significance level of 5%
B. One-tailed test: µ < µ0 : We reject H0 in favor of H1 when the value
of the sample average, ȳ , is "sufficiently" smaller than µ0
⇒ If t < −c, we reject H0 at a significance level of 5%
⇒ If t > −c, we fail to reject H0 at a significance level of 5%
C. Two-tailed test: µ ̸= µ0 : We reject H0 in favor of H1 when the value
of the sample average, ȳ , is far from µ0 in absolute value
⇒ If |t| > c, we reject H0 at a significance level of 5%
⇒ If |t| < c, we fail to reject H0 at a significance level of 5%
A. One-tailed test: µ > µ0 : If the significance level is α = .05 = 5%,
then the critical value c is the 100(1 − α) = 95th percentile in the tn−1
distribution
B. One-tailed test: µ < µ0 : If the significance level is α = .05 = 5%,
then the critical value −c is the 100α = 5th percentile in the tn−1
distribution
C. Two-tailed test: µ ̸= µ0 : If the significance level is α = .05 = 5%,
then the critical value c is the 100(1 − α/2) = 97.5th percentile in the
tn−1 distribution
p-value
To provide additional information, we could ask the question: What is
the largest significance level at which we could carry out the test and still
fail to reject the null hypothesis?
⇒ We can consider the p-value of a test:
▶ Calculate the t-statistic t
▶ The largest significance level at which we would fail to reject H0 is
the significance level associated with using t as our critical value:
p-value = 1 − Φ(t),
where Φ(·) denotes the standard normal cdf (we assume that n is
large enough to treat the test statistic as having a standard normal
distribution)
Two-tailed test:
p-value = 2(1 − Φ(|t|))
General approach:
1. State H0 (µ = µ0 )
2. State H1 (µ < µ0 , µ > µ0 or µ ̸= µ0 )
3. If necessary, calculate ȳ , s and se(ȳ )
4. Calculate the t-statistic: t = (ȳ − µ0)/se(ȳ)
5. Find the critical value which depends on (i) the significance level,
(ii) the alternative hypothesis (i.e. one-tailed or two-tailed test) and
(iii) the degrees of freedom (n − 1)
6. Compare t with c for the given significance level α; reject or fail to
reject the null hypothesis. The interpretation depends on H1:
   µ > µ0 (one-tailed test): reject H0 if t > c
   µ < µ0 (one-tailed test): reject H0 if t < −c
   µ ̸= µ0 (two-tailed test): reject H0 if |t| > c
7. (If requested:) Calculate the p-value
$$p\text{-value} = \begin{cases} 1 - \Phi(t) & \text{if one-tailed test} \\ 2(1 - \Phi(|t|)) & \text{if two-tailed test} \end{cases}$$
Example
1. H0 : µ = 0
2. H1 : µ ̸= 0
3. ȳ = −1.15, se(ȳ ) = .54
4. t = (ȳ − 0)/se(ȳ) = −1.15/0.54 ≈ −2.13
5. critical value:
(i) significance level: 5%
(ii) two-tailed test
(iii) degrees of freedom: n − 1 = 20 − 1 = 19
⇒ c = 2.093
6. Reject H0 if |t| > c : 2.13 > 2.093 ⇒ We reject H0 at a significance
level of 5%
7. The smallest significance level at which we would reject H0 ?
⇒ p-value = 2[1 − Φ(|t|)] = 2[1 − Φ(|2.13|)] ≈ .033
⇒ We would still reject H0 at a significance level of 3.3%
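Assuming the large-n normal approximation that the slides use for p-values, the t-statistic and two-sided p-value of this example can be reproduced with `math.erf`:

```python
import math

def norm_cdf(x):
    """Standard normal cdf, built from the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

ybar, se = -1.15, 0.54
t = (ybar - 0.0) / se                 # t-statistic under H0: mu = 0
p = 2.0 * (1.0 - norm_cdf(abs(t)))   # two-tailed p-value
print(round(t, 2), round(p, 3))      # -2.13 0.033
```

The exact p-value from the t19 distribution would differ slightly; the normal approximation matches the slide's 3.3%.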
5. LLN and CLT
The Law of Large Numbers (LLN) for i.i.d. data
▶ For n observations of i.i.d. (independent and identically distributed)
data X1, X2, X3, ..., Xn, with E(X) = µ and Var(X) = σ² < ∞, we have
$$\frac{1}{n}\sum_{i=1}^{n} X_i \xrightarrow{P} E(X) = \mu$$
▶ How to prove it?
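A simulation sketch of the LLN, using a made-up Uniform(0,1) population with a fixed seed so the run is reproducible; the sample mean settles near µ = 0.5 as n grows:

```python
import random

random.seed(12345)
mu = 0.5                                  # mean of the Uniform(0, 1) distribution
n = 100_000
xbar = sum(random.random() for _ in range(n)) / n
# Var(xbar) = (1/12)/n, so the deviation from mu should be tiny here
print(abs(xbar - mu) < 0.01)              # True
```

Re-running with larger n concentrates x̄ ever more tightly around µ, which is exactly what convergence in probability describes.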
Proof of LLN for i.i.d data
▶ (Chebyshev inequality) For all ε > 0,
$$P\{|X - \mu| \geq \varepsilon\} \leq \frac{\sigma^2}{\varepsilon^2}$$
▶ How to prove it?
$$P\{|X - \mu| \geq \varepsilon\} = \int_{|x-\mu|\geq\varepsilon} f(x)\,dx \leq \int_{|x-\mu|\geq\varepsilon} \frac{|x-\mu|^2}{\varepsilon^2}\, f(x)\,dx \leq \frac{1}{\varepsilon^2}\int |x-\mu|^2 f(x)\,dx = \frac{\sigma^2}{\varepsilon^2}$$
Proof of LLN for i.i.d data (cont.)
▶ Using the Chebyshev inequality, we have
$$P\{|\bar{X} - \mu| \geq \varepsilon\} \leq \frac{\mathrm{Var}(\bar{X})}{\varepsilon^2} = \frac{\sigma^2}{n\varepsilon^2}$$
which means
$$\lim_{n\to\infty} P\{|\bar{X} - \mu| \geq \varepsilon\} = 0$$
▶ The key is the computation of Var(X̄): for i.i.d. X, we have
$$\mathrm{Var}(\bar{X}) = \mathrm{Var}\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = \frac{1}{n^2}\mathrm{Var}\left(\sum_{i=1}^{n} X_i\right) = \frac{\mathrm{Var}(X_i)}{n}$$
▶ What is the case for dependent data?
The Central Limit Theorem for i.i.d. data
▶ For n observations of i.i.d. (independent and identically distributed)
data X1, X2, X3, ..., Xn, with E(X) = µ and Var(X) = σ² < ∞, we have
$$\sqrt{n}\,\frac{\bar{X} - \mu}{\sigma} \xrightarrow{d} N(0, 1)$$
▶ How to prove it? (self-reading)
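A simulation sketch of the CLT for made-up Uniform(0,1) data (fixed seed for reproducibility): the standardized sample means have mean near 0 and variance near 1, as the theorem predicts.

```python
import math
import random

random.seed(7)
mu, sigma = 0.5, math.sqrt(1.0 / 12.0)   # Uniform(0, 1): mean and std dev
n, reps = 200, 5000

# Collect standardized sample means sqrt(n) * (xbar - mu) / sigma
z = []
for _ in range(reps):
    xbar = sum(random.random() for _ in range(n)) / n
    z.append(math.sqrt(n) * (xbar - mu) / sigma)

m = sum(z) / reps
v = sum((zi - m) ** 2 for zi in z) / reps
print(round(m, 2), round(v, 2))   # near 0 and near 1
```

A histogram of `z` would look close to the standard normal bell curve even though the underlying data is uniform, not normal.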
Proof of CLT for i.i.d data
▶ The characteristic function (CF) of Y is defined as
$$\varphi(t) = E\left(e^{itY}\right), \quad i = \sqrt{-1}$$
▶ Define Yn as
$$Y_n = \sqrt{n}\,\frac{\bar{X} - \mu}{\sigma} = \frac{n(\bar{X} - \mu)}{\sqrt{n}\,\sigma} = \sum_{i=1}^{n}\frac{(X_i - \mu)}{\sqrt{n}\,\sigma} = \sum_{i=1}^{n}\frac{\eta_i}{\sqrt{n}\,\sigma}$$
where ηi = Xi − µ
▶ The CF of Yn is
$$\varphi(t) = E\left(e^{itY_n}\right) = \left[\phi\!\left(\frac{t}{\sigma\sqrt{n}}\right)\right]^n$$
where ϕ(t) is the CF of ηi
Proof of CLT for i.i.d data (cont.)
▶ Expanding ϕ at zero using a Taylor expansion, with ϕ(0) = 1, ϕ′(0) = iE(ηi) = 0 and ϕ′′(0) = −E(ηi²) = −σ²,
$$\phi\!\left(\frac{t}{\sigma\sqrt{n}}\right) = \phi(0) + \phi'(0)\,\frac{t}{\sigma\sqrt{n}} + \frac{\phi''(0)}{2!}\left(\frac{t}{\sigma\sqrt{n}}\right)^2 + o\!\left(\frac{t^2}{n}\right) = 1 + 0 - \frac{t^2}{2n} + o\!\left(\frac{t^2}{n}\right)$$
therefore (using the limit (1 + x/n)ⁿ → eˣ),
$$\varphi(t) = \left(1 - \frac{t^2}{2n} + o\!\left(\frac{t^2}{n}\right)\right)^n \to e^{-\frac{t^2}{2}}, \quad n \to +\infty$$
and $e^{-t^2/2}$ is the CF of N(0, 1).
▶ To derive the LLN and CLT for serially dependent data (or time
series data), we need more concepts and assumptions on the
stochastic process, including:
▶ The Mean and Autocovariance
▶ Stationarity
▶ Ergodicity
6. Maximum Likelihood Estimation (MLE)
Motivating Examples: Time Invariant Model
▶ Time Invariant Model
yt = σzt
where σ is the scale parameter and zt ∼ N(0, 1). Thus yt ∼ N(0, σ²). The density function of yt is
$$f(y_t; \theta) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left(-\frac{y_t^2}{2\sigma^2}\right)$$
Motivating Examples: Count Model
▶ Consider a time series of counts from a Poisson distribution
$$f(y; \theta) = \frac{\theta^y \exp(-\theta)}{y!}, \quad y = 0, 1, 2, 3, \ldots$$
where θ > 0 is an unknown parameter.
Motivating Examples: Linear Regression Model
▶ Consider the regression model
yt = βxt + σzt , zt ∼ iid N(0, 1)
where xt is an explanatory variable that is independent of zt and θ = (β, σ²). The distribution of yt conditional on xt is still normal, with mean βxt and variance σ²:
$$f(y_t \mid x_t; \theta) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left[-\frac{(y_t - \beta x_t)^2}{2\sigma^2}\right]$$
Motivating Examples: Autoregressive Model
▶ A first-order autoregressive model, denoted AR(1), is
$$y_t = \rho y_{t-1} + u_t, \quad u_t \sim \text{iid } N(0, \sigma^2)$$
with |ρ| < 1 and θ = (ρ, σ²). The distribution of yt conditional on yt−1 is normal with mean ρyt−1 and variance σ²:
$$f(y_t \mid y_{t-1}; \theta) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left[-\frac{(y_t - \rho y_{t-1})^2}{2\sigma^2}\right]$$
Joint Probability Distribution
▶ The joint pdf for a sample of T observations is
$$f(y_1, y_2, \ldots, y_T; \psi)$$
where ψ is the vector of parameters.
Joint Probability Distribution
▶ Independent case: the yt are independent, so
$$f(y_1, y_2, \ldots, y_T; \theta) = \prod_{t=1}^{T} f(y_t; \theta)$$
▶ Dependent case:
$$f(y_1, y_2, \ldots, y_T; \theta) = f(y_1; \theta)\prod_{t=2}^{T} f(y_t \mid y_{t-1}, y_{t-2}, \ldots, y_1; \theta)$$
where we write
$$f(y_t \mid y_{t-1}, y_{t-2}, \ldots, y_1) = f(y_t \mid \mathcal{F}_{t-1})$$
with Ft−1 denoting the information available through time t − 1
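For the AR(1) example above, the dependent-case factorization turns the (conditional) log-likelihood into a sum of Gaussian log-densities log f(yt | yt−1; θ); a minimal sketch on a hypothetical two-point series:

```python
import math

def ar1_cond_loglik(y, rho, sigma2):
    """Sum of log f(y_t | y_{t-1}) for a Gaussian AR(1), t = 2, ..., T."""
    ll = 0.0
    for t in range(1, len(y)):
        resid = y[t] - rho * y[t - 1]
        ll += -0.5 * math.log(2 * math.pi * sigma2) - resid ** 2 / (2 * sigma2)
    return ll

y = [0.0, 1.0]            # hypothetical series
ll = ar1_cond_loglik(y, rho=0.5, sigma2=1.0)
print(round(ll, 3))       # -1.419
```

This conditions on the first observation y1; a full (exact) likelihood would also include the marginal density f(y1; θ).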
Maximum Likelihood Framework
▶ Maximum Likelihood Principle:
▶ The maximum likelihood estimator is the value of θ that is "most likely" to have generated the observed data:
$$\hat{\theta} = \underset{\theta}{\operatorname{argmax}}\; f(y_1, y_2, \ldots, y_T; \theta)$$
▶ Maximum Likelihood Framework
▶ Log-likelihood function
▶ Gradient
▶ Hessian
Log-likelihood function:
▶ Log-likelihood function:
$$\ln L_T(\theta) = \ln f(y_1, y_2, \ldots, y_T; \theta)$$
▶ The maximum likelihood estimator (MLE) of θ is defined as the value of θ, denoted θ̂, that maximizes the log-likelihood function ln LT(θ).
Log-likelihood function
▶ Poisson Distribution: Let {y1 , y2 . . . yT } be iid observations from a
Poisson distribution
$$f(y; \theta) = \frac{\theta^y \exp(-\theta)}{y!}, \quad y = 0, 1, 2, 3, \ldots$$
The log-likelihood function for the sample is
$$\ln L_T(\theta) = \sum_{t=1}^{T}\left[y_t \ln\theta - \theta - \ln(y_t!)\right]$$
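Setting the Poisson score Σ(yt/θ − 1) to zero gives θ̂ = ȳ; a quick numeric check on hypothetical count data:

```python
import math

y = [2, 0, 3, 1, 4]                # hypothetical Poisson counts

def loglik(theta):
    """Poisson log-likelihood: sum of y_t*ln(theta) - theta - ln(y_t!)."""
    return sum(yt * math.log(theta) - theta - math.log(math.factorial(yt))
               for yt in y)

theta_hat = sum(y) / len(y)        # MLE: the sample mean, 2.0
# The log-likelihood is strictly lower at nearby parameter values
print(loglik(theta_hat) > loglik(1.8), loglik(theta_hat) > loglik(2.2))  # True True
```

Since d²lnL/dθ² = −Σyt/θ² < 0, the log-likelihood is concave in θ, so the score's unique zero is indeed the maximum.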
Gradient (or Score)
▶ Differentiating ln LT(θ) with respect to the (K × 1) parameter vector θ yields a (K × 1) gradient vector, also known as the score, given by
$$G_T(\theta) = \frac{\partial \ln L_T(\theta)}{\partial\theta} = \begin{bmatrix} \frac{\partial \ln L_T(\theta)}{\partial\theta_1} \\ \vdots \\ \frac{\partial \ln L_T(\theta)}{\partial\theta_K} \end{bmatrix} = \sum_{t=1}^{T} g_t(\theta)$$
where $g_t(\theta) = \frac{\partial \ell_t(\theta)}{\partial\theta}$ and $\ell_t(\theta) = \ln f(y_t \mid \mathcal{F}_{t-1}; \theta)$
▶ In most cases, the maximum likelihood estimator θ̂ satisfies the first-order (necessary) condition
$$G_T(\hat{\theta}) = \left.\frac{\partial \ln L_T(\theta)}{\partial\theta}\right|_{\theta=\hat{\theta}} = 0$$
Gradient (or Score)
▶ Normal Distribution
$$G_T(\theta) = \begin{bmatrix} \frac{\partial \ln L_T(\theta)}{\partial\mu} \\ \frac{\partial \ln L_T(\theta)}{\partial\sigma^2} \end{bmatrix} = \begin{bmatrix} \frac{1}{\sigma^2}\sum_{t=1}^{T}(y_t - \mu) \\ -\frac{T}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{t=1}^{T}(y_t - \mu)^2 \end{bmatrix}$$
Setting GT(θ̂) = 0, we have
$$\hat{\mu} = \frac{1}{T}\sum_{t=1}^{T} y_t = \bar{y}, \qquad \hat{\sigma}^2 = \frac{1}{T}\sum_{t=1}^{T}(y_t - \bar{y})^2$$
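The closed-form MLEs can be verified on a small hypothetical sample (note the 1/T divisor in σ̂², unlike the unbiased S², which divides by n − 1):

```python
y = [1.0, 2.0, 3.0, 4.0]   # hypothetical sample
T = len(y)

mu_hat = sum(y) / T                                      # 2.5
sigma2_hat = sum((yt - mu_hat) ** 2 for yt in y) / T     # 1.25 (divisor T, not T - 1)
print(mu_hat, sigma2_hat)   # 2.5 1.25
```

Both estimators are exactly the score's zeros: plugging them back into the gradient expressions makes each component vanish.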
Hessian
▶ The Hessian matrix is
$$H_T(\theta) = \frac{\partial^2 \ln L_T(\theta)}{\partial\theta\,\partial\theta'} = \sum_{t=1}^{T} h_t(\theta) = \begin{bmatrix} \frac{\partial^2 \ln L_T(\theta)}{\partial\theta_1\partial\theta_1} & \frac{\partial^2 \ln L_T(\theta)}{\partial\theta_1\partial\theta_2} & \cdots & \frac{\partial^2 \ln L_T(\theta)}{\partial\theta_1\partial\theta_K} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial^2 \ln L_T(\theta)}{\partial\theta_K\partial\theta_1} & \frac{\partial^2 \ln L_T(\theta)}{\partial\theta_K\partial\theta_2} & \cdots & \frac{\partial^2 \ln L_T(\theta)}{\partial\theta_K\partial\theta_K} \end{bmatrix}$$
▶ The second-order condition for the MLE to maximize the log-likelihood function is that the Hessian matrix HT(θ̂) is negative definite.
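For the two-parameter normal example, negative definiteness of the 2 × 2 Hessian can be checked via the leading principal minors (H11 < 0 and det H > 0); a finite-difference sketch on hypothetical data (the analytic Hessian at the MLE works out to diag(−T/σ̂², −T/(2σ̂⁴))):

```python
import math

y = [1.0, 2.0, 3.0, 4.0]   # hypothetical sample
T = len(y)

def loglik(mu, s2):
    """Gaussian log-likelihood as a function of (mu, sigma^2)."""
    return sum(-0.5 * math.log(2 * math.pi * s2) - (yt - mu) ** 2 / (2 * s2)
               for yt in y)

mu_hat = sum(y) / T
s2_hat = sum((yt - mu_hat) ** 2 for yt in y) / T

# Central finite-difference Hessian at the MLE
h = 1e-4
f0 = loglik(mu_hat, s2_hat)
H11 = (loglik(mu_hat + h, s2_hat) - 2 * f0 + loglik(mu_hat - h, s2_hat)) / h**2
H22 = (loglik(mu_hat, s2_hat + h) - 2 * f0 + loglik(mu_hat, s2_hat - h)) / h**2
H12 = (loglik(mu_hat + h, s2_hat + h) - loglik(mu_hat + h, s2_hat - h)
       - loglik(mu_hat - h, s2_hat + h) + loglik(mu_hat - h, s2_hat - h)) / (4 * h**2)

# A symmetric 2x2 matrix is negative definite iff H11 < 0 and det > 0
print(H11 < 0, H11 * H22 - H12 ** 2 > 0)   # True True
```

Here H11 ≈ −T/σ̂² and H22 ≈ −T/(2σ̂⁴) with H12 ≈ 0, confirming the stationary point is a maximum.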
Asymptotic Properties: Consistency
▶ If f(y; θ) is correctly specified, then under suitable regularity conditions the MLE is consistent:
$$\operatorname{plim}\hat{\theta} = \theta_0$$