CHAPTER 2
STATISTICAL ESTIMATIONS
Introduction
The objective of statistics is to make inferences about a
population based on information contained in a sample.
Inference: the process of making interpretations or
conclusions about the whole population from sample data.
✓ The process of reaching a conclusion by inferring.
✓ A conclusion reached on the basis of evidence and
reasoning.
✓ Specifically decision making and prediction.
✓ Each of us faces daily personal decisions and situations
that require predictions concerning the future.
Example 1: Whether inflation will rise in the next 6
months.
Example 2: Whether a new drug is effective.
One aspect of inferential statistics is estimation, which is
the process of estimating the value of the parameter from
information obtained from a sample.
The inferences that these individuals make should be
based on relevant facts which we call data (observation).
Populations are characterized by numerical descriptive
measures called parameters, such as $N$, $\mu$ and $\sigma^2$.
Typical population parameters are the mean, the median, the
standard deviation and a proportion $\pi$.
In statistics there are two ways through which inference
about parameters can be made:
i) Estimation
ii) Hypothesis Testing
➢ Data analysis is the process of extracting relevant
information from the summarized data.
✓ These methods answer two questions:
i) What is the value of the population parameter?
ii) Does the population parameter satisfy a specific
condition?
Example: i) What is the average age of Management students?
$\bar{X} = \frac{\sum X_i}{n}$
ii) Is the average age of Management students less than 20 years?
$H_0: \mu = 20$ vs. $H_1: \mu < 20$
An important question in estimation is that of sample
size. How large should the sample be in order to make an
accurate estimate?
This question is not easy to answer since the size of the
sample depends on several factors, such as the accuracy
desired and the probability of making a correct estimate.
Statistical Estimation
This is one way of making inference about the
population parameter where the investigator does not
have any prior notion about values or characteristics of
the population parameter.
✓ It is one of the problems of statistical inference.
✓ It consists of estimating a population parameter.
❑ Estimation can be divided into two types.
i) Point Estimation:- deals with computing a single value
(statistic) from the sample data to estimate a population
parameter.
✓ It is a procedure that results in a single value as an
estimate for a parameter.
✓ It is a specific numerical value (single value) used as an estimate of
a population parameter (for example, $\mu = 87$).
✓ The best point estimates of the population mean and
proportion are the sample mean and the sample proportion
respectively.
Estimator and Estimate
An Estimator: the sample statistic used to estimate a
population parameter.
✓ It is a rule that tells you how to calculate an estimate based
on information in the sample, and it is generally
expressed as a formula, e.g. $\hat{\mu} = \bar{X} = \frac{\sum X_i}{n}$.
An Estimate: the value taken by the estimator, e.g. the value of the
sample mean.
✓ It is a specific (single) observed value of the
statistic, e.g. $\bar{X} = 20$.
✓ Different samples can give different estimates; these are the
possible values which the estimator can assume.
For instance, the sample mean is an estimator of the
population mean, and the value of the sample mean computed
from a particular sample is an estimate.
Point Estimate: a single value used to estimate a
parameter.
Interval Estimate: a range of values used to estimate a
parameter.
Point Estimation of the Population Mean: μ
➢ The observed value of a statistic is called a point estimate when
it is used to estimate the value of a parameter.
➢ A point estimator is the mathematical rule we use to compute
the point estimate.
➢ For instance, the sum of $X_i$ divided by $n$ is the point estimator
used to compute the estimate of the population mean $\mu$.
That is, $\bar{X} = \frac{\sum X_i}{n}$ is a point estimator of the population
mean.
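As a minimal sketch of the point estimator above, the following Python snippet computes the sample mean for a small sample. The function name and the ages are hypothetical illustration values, not data from this chapter.

```python
def point_estimate_of_mean(sample):
    """Return the sample mean, the point estimate of the population mean."""
    return sum(sample) / len(sample)

# Hypothetical ages of six sampled students (illustration only).
ages = [19, 21, 20, 22, 18, 20]
x_bar = point_estimate_of_mean(ages)
print(x_bar)
```

The function is the estimator (the rule); the number it returns for this particular sample is the estimate.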
Qualities of Good Estimator
Goodness of an estimator is evaluated by observing its
behavior in repeated sampling.
A good estimator is the one which provides an estimate
with the following qualities.
1. Unbiasedness: An estimator is said to be an unbiased
estimator of a given parameter when the expected value of
that estimator can be shown to be equal to the parameter
being estimated.
✓ An estimator of a parameter is said to be unbiased if the
mean of the sampling distribution of the statistic is equal to
the corresponding true value of the population parameter;
otherwise, it is said to be biased.
Let $S$ be the sample statistic.
$E(S) = \theta$, where $\theta$ is the population parameter.
→ If $S = \bar{X}$, then $E(S) = E(\bar{X}) = \mu$, so $\bar{X}$ is an unbiased
estimator of the population mean ($\mu$).
→ If $E(S) \neq \theta$, then $S$ is a biased estimator.
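The unbiasedness of the sample mean can be illustrated by repeated sampling. The sketch below assumes a normal population with mean 50 and standard deviation 10 and samples of size 5; these values, the seed and the number of repetitions are arbitrary choices for illustration.

```python
import random

random.seed(1)  # fixed seed so the sketch is repeatable

# Assumed population and sampling design (illustration only).
MU, SIGMA, N, REPS = 50.0, 10.0, 5, 20000

sample_means = []
for _ in range(REPS):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    sample_means.append(sum(sample) / N)

# The average of many sample means approximates E(X-bar).
avg_of_means = sum(sample_means) / REPS
print(round(avg_of_means, 2), "vs population mean", MU)
```

Individual sample means differ from 50, but their long-run average stays close to it, which is what $E(\bar{X}) = \mu$ expresses.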
2. Efficiency: An efficient estimator is one which has the
smallest standard error among all unbiased estimators.
✓ If you compare two statistics computed from samples of the same
size and try to decide which one is the more efficient estimator,
you should pick the statistic that has the smaller standard
error (standard deviation of the sampling distribution).
If $S_1$ and $S_2$ are both unbiased estimators of the
population parameter $\theta$ and the variance of $S_1$ is
less than the variance of $S_2$, then $S_1$ is more efficient than
$S_2$.
i.e., if $E(S_1) = E(S_2) = \theta$ and $\sigma_{S_1}^2 < \sigma_{S_2}^2$, then $S_1$ is more
efficient than $S_2$.
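To make the comparison concrete, the following simulation sketch compares two estimators of the centre of a normal population: the sample mean and the sample median. The choice of these two estimators, the standard normal population, the sample size of 25 and the seed are all assumptions made for illustration.

```python
import random
from statistics import median, pvariance

random.seed(2)  # fixed seed so the sketch is repeatable

# Assumed population and sampling design (illustration only).
MU, SIGMA, N, REPS = 0.0, 1.0, 25, 5000

means, medians = [], []
for _ in range(REPS):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    means.append(sum(sample) / N)
    medians.append(median(sample))

# Variances of the two simulated sampling distributions.
var_mean = pvariance(means)
var_median = pvariance(medians)
print("variance of sample mean:  ", round(var_mean, 4))
print("variance of sample median:", round(var_median, 4))
```

For a normal population both estimators are centred on $\mu$, so the one with the smaller variance (here the sample mean) is the more efficient.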
3. Consistency: A point estimator is consistent if its value
tends to get closer to the population parameter as the sample size
increases.
✓ The standard deviation of an estimator is called the
standard error of the estimate.
✓ A large standard error indicates more error in the
estimate.
✓ The standard error is a commonly
used index of the error entailed in estimating a population
parameter based on the information in a random sample
of size n from the entire population.
4. Sufficiency: This is another quality of a good estimator.
✓ A point estimator $S$ is a sufficient estimator of the parameter $\theta$ if
$S$ exhausts (holds) all the information in the sample about the
population parameter $\theta$.
✓ An estimator is sufficient if it makes so much use of the
information in the sample that no other estimator could
extract from the sample additional information about the
population parameter being estimated.
ii) Interval Estimation
Interval Estimation: the method of finding an
interval that may contain the corresponding population
parameter.
✓ It is the procedure that results in an interval of values as
an estimate for a parameter, that is, an interval that
contains the likely values of the parameter.
✓ An interval estimate specifies a range of values for the
parameter, such as $84 < \mu < 90$.
✓ It is a range of values used to estimate the
parameter. This estimate may or may not contain the
value of the parameter being estimated.
✓ It deals with identifying the upper and lower limits of a
parameter.
✓ The limits themselves are random variables.
✓ An interval estimate of a population parameter $\theta$ consists
of two bounds between which $\theta$ is expected to
lie: $L \le \theta \le U$, where $L$ is the lower boundary and $U$ is the
upper boundary.
✓ Thus, instead of saying that the sample mean $\bar{X}$ is exactly
equal to the population mean, we obtain an interval by
subtracting a number from $\bar{X}$ and adding the same
number to $\bar{X}$; we then state that this interval contains
the population mean µ.
✓ To find the number to subtract and add, we first
take two things into consideration:
a) The standard deviation $\sigma_{\bar{X}}$ of the sample mean $\bar{X}$.
b) The level of confidence to be attached to the interval.
Confidence level and confidence interval
An interval estimate constructed with regard to a given
confidence level is called a confidence interval.
The confidence level is the percentage of the time the interval
estimate will contain the true value in repeated sampling. It states
how much confidence we have that the interval contains
the true population parameter and is denoted by $1 - \alpha$.
Degrees of freedom: the number of data values which
are free to vary once a statistic has been determined.
Confidence Interval Estimation of the Population Mean
✓ Although $\bar{X}$ possesses nearly all the qualities of a good
estimator, because of sampling error we know that our sample
statistic is not likely to be exactly equal to the population
parameter; instead it will fall within some interval of values around it.
✓ We will have to be satisfied knowing that the statistic is "close
to" the parameter. That leads to the obvious question: what is
"close"?
✓ We can phrase the latter question differently: How confident
can we be that the value of the statistic falls within a certain
"distance" of the parameter? Or, what is the probability that the
parameter's value is within a certain range of the statistic's
value? This range is the confidence interval.
➢ The confidence level is the probability that the value of
the parameter falls within the range specified by the
confidence interval surrounding the statistic.
➢ There are different cases to be considered to construct
confidence intervals.
➢ To determine Confidence Interval (CI) for population
mean (𝜇) we shall consider three cases:
Case 1: If the population variance $\sigma^2$ is known, and either the
sample size is large or the sample is drawn from a normal
population.
➢ Recall the Central Limit Theorem, which applies to the
sampling distribution of the mean of a sample. Consider
samples of size $n$ drawn, with replacement and with order
important, from a population whose mean is $\mu$ and whose
standard deviation is $\sigma$.
➢ As $n$ becomes large ($n \ge 30$), $\bar{X} \sim N(\mu, \sigma^2/n)$ by the CLT.
➢ This allows us to use the normal distribution curve for
computing CIs.
$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$ has a normal distribution with mean 0 and variance 1.
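⇒ $\mu = \bar{X} \pm Z\frac{\sigma}{\sqrt{n}}$
➢ The $(1 - \alpha)100\%$ CI for the mean is $\bar{X} \pm \varepsilon$, where $\varepsilon$ is a measure of
error.
⇒ $\varepsilon = Z\frac{\sigma}{\sqrt{n}}$
➢ For the interval estimator to be good, the error should be small.
How can it be made small?
By making n large
By having small variability ($\sigma$)
By taking Z small (a lower confidence level)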
➢ The sampling error usually decreases with increase in sample size
(number of units selected in the sample).
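The effect of the sample size on the error $\varepsilon = Z\sigma/\sqrt{n}$ can be seen with a few lines of Python. The values $Z = 1.96$ and $\sigma = 4$ and the three sample sizes are assumed for illustration.

```python
from math import sqrt

def margin_of_error(z, sigma, n):
    """Margin of error epsilon = z * sigma / sqrt(n)."""
    return z * sigma / sqrt(n)

# Assumed values: z = 1.96 (95% confidence) and sigma = 4.
for n in (25, 100, 400):
    print(n, round(margin_of_error(1.96, 4, n), 3))
```

Because $n$ enters through $\sqrt{n}$, the sample size has to be quadrupled to halve the margin of error.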
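➢ To obtain the value of Z, we have to attach this to a
theory of chance. That is, there is a central area of size $1 - \alpha$
under the standard normal curve such that $P(Z > Z_{\alpha/2}) = \alpha/2$.
Where:
$\alpha$ is the probability that the parameter lies outside the interval.
$Z_{\alpha/2}$ is the value of the standard normal variable to the right of which
a probability of $\alpha/2$ lies, i.e.
$P(-Z_{\alpha/2} < Z < Z_{\alpha/2}) = 1 - \alpha$
⇒ $P\left(-Z_{\alpha/2} < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < Z_{\alpha/2}\right) = 1 - \alpha$
⇒ $P\left(\bar{X} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha$
⇒ $\left(\bar{X} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{X} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right)$ is a $100(1 - \alpha)\%$ CI for $\mu$.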
Case 2: If the sample size is large and $\sigma^2$ (the population
variance) is unknown
➢ Usually $\sigma^2$ is not known; in that case we estimate it by its
point estimator $S^2$.
➢ That is, we use the sample variance in place of the
population variance.
➢ The $(1 - \alpha)100\%$ CI for the population mean will be:
$\bar{X} \pm \varepsilon$, where $\varepsilon = Z_{\alpha/2}\frac{S}{\sqrt{n}}$
⇒ $\left(\bar{X} - Z_{\alpha/2}\frac{S}{\sqrt{n}},\ \bar{X} + Z_{\alpha/2}\frac{S}{\sqrt{n}}\right)$ is a $(1 - \alpha)100\%$ CI for $\mu$.
❖ The most commonly used confidence levels are 90%, 95% and 99%.
❖ The corresponding z values are 1.645, 1.96 and 2.576 respectively.
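The critical values for these confidence levels can also be computed instead of read from a table. The sketch below uses `NormalDist` from Python's standard `statistics` module; the helper name `z_critical` is a hypothetical choice.

```python
from statistics import NormalDist

def z_critical(confidence_level):
    """Return Z_{alpha/2}, the value with area alpha/2 to its right."""
    alpha = 1 - confidence_level
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: z = {z_critical(level):.3f}")
```

A higher confidence level gives a larger $Z_{\alpha/2}$ and therefore a wider interval.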
Case 3: If the sample size is small ($n < 30$), $\sigma^2$ (the population
variance) is not known and the parent population is normal
➢ If the random sample is drawn from a normal or
approximately normal distribution, with
$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$ and $S^2 = \frac{1}{n-1}\left(\sum_{i=1}^{n} X_i^2 - \frac{\left(\sum_{i=1}^{n} X_i\right)^2}{n}\right)$,
then the standardized sample mean follows the t-distribution:
$t = \frac{\bar{X} - \mu}{S/\sqrt{n}}$ has a t-distribution with $n - 1$ degrees of freedom.
⇒ $\left(\bar{X} - t_{\alpha/2}\frac{S}{\sqrt{n}},\ \bar{X} + t_{\alpha/2}\frac{S}{\sqrt{n}}\right)$ is a $(1 - \alpha)100\%$ CI for $\mu$.
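➢ The (1-α) 100% confidence interval for µ is:
$\bar{X} \pm Z_{\alpha/2}\,\sigma_{\bar{X}}$, if σ is known, for all sample sizes
$\bar{X} \pm Z_{\alpha/2}\,S_{\bar{X}}$, if σ is not known and the sample size is large (n ≥ 30)
$\bar{X} \pm t_{\alpha/2}\,S_{\bar{X}}$, if σ is not known and the sample size is small (n < 30),
where $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$ and $S_{\bar{X}} = \frac{S}{\sqrt{n}}$.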
The value of Z used here is read from the standard normal
distribution table for the given confidence level and the t-
value is obtained from the t- distribution table for n-1
degrees of freedom and the given confidence level.
Hence the width of the confidence interval depends on the
value of z and the sample size n. As the confidence level
decreases and the sample size increases, the width of the
confidence interval decreases.
The unit of measurement of the confidence interval
is the standard error. This is just the standard
deviation of the sampling distribution of the statistic.
Example 1: Suppose a particular species of understory
plants is known to have a variance in heights of 16 cm². If
this species is sampled with the heights of 25 plants
averaging 15 cm, find the 95% confidence interval for the
population mean.
Solution:
Given n = 25, $\bar{X}$ = 15 cm, $\sigma^2$ = 16 cm², $\sigma$ = 4 cm and
$Z_{\alpha/2} = Z_{0.05/2} = Z_{0.025} = 1.96$ (from the table),
CI = $\bar{X} \pm Z_{\alpha/2}\,\sigma_{\bar{X}} = 15 \pm 1.96\left(\frac{4}{\sqrt{25}}\right) = 15 \pm 1.568$
CI = (13.432, 16.568)
We are 95% confident that the population mean
µ lies between 13.432 cm and 16.568 cm.
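Example 1 can be checked numerically. The sketch below implements the known-variance interval $\bar{X} \pm Z_{\alpha/2}\,\sigma/\sqrt{n}$; the function name `z_interval` is hypothetical, and the critical value is computed with the standard library rather than read from a table.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(x_bar, sigma, n, confidence_level=0.95):
    """Confidence interval for the mean when sigma is known."""
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    margin = z * sigma / sqrt(n)
    return x_bar - margin, x_bar + margin

# Values from Example 1: n = 25, sample mean 15 cm, sigma = 4 cm.
low, high = z_interval(x_bar=15, sigma=4, n=25)
print(round(low, 3), round(high, 3))
```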
Example 2: An investigator wanted to estimate the mean
nutrition level of all plants living in the Amazon forest. He took
a sample of 25 plants; the nutrition
levels of all plants are approximately normally distributed.
If the sample mean and sample standard deviation of the 25 plants
are 186 and 12 respectively, construct a 95% confidence
interval for the population mean µ.
Solution:
Given n = 25, $\bar{X}$ = 186, S = 12, confidence level = 95%,
$S_{\bar{X}} = \frac{S}{\sqrt{n}} = \frac{12}{\sqrt{25}} = 2.40$
df = n - 1 = 25 - 1 = 24 and $t_{0.025}^{24} = 2.064$ (from the t-table).
CI = $\bar{X} \pm t_{\alpha/2}\,S_{\bar{X}}$ = 186 ± 2.064(2.4) = 186 ± 4.95
CI = (181.05, 190.95)
We are 95% confident that the mean nutrition level
of all plants lies between 181.05 and 190.95.
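The small-sample interval of Example 2 can be sketched in the same way. Python's standard library has no t-distribution quantile function, so this sketch takes the critical value as an argument and uses the table value 2.064 (24 degrees of freedom) quoted in the solution; the function name `t_interval` is hypothetical.

```python
from math import sqrt

def t_interval(x_bar, s, n, t_crit):
    """Confidence interval for the mean using a t critical value from a table."""
    margin = t_crit * s / sqrt(n)
    return x_bar - margin, x_bar + margin

# Values from Example 2: n = 25, sample mean 186, S = 12, t = 2.064.
low, high = t_interval(x_bar=186, s=12, n=25, t_crit=2.064)
print(round(low, 2), round(high, 2))
```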
Exercises:
1. A sample of size 25 from a normal population has a mean of 32.
The population standard deviation is 4.2.
Find:
a) A 95% CI for the population mean (𝜇)
b) A 99% CI for the population mean (𝜇)
2. A sample survey conducted in a large city showed that 324
families spent an average of 3942 Birr on food. Past
experience showed that the S.D. of food expenditure of
families in the city was 450 Birr. Then:
a) Find the standard error of the sample mean.
b) Construct a 95.44% CI for the population mean (𝜇)
expenditure.
3. An entomologist sprayed 120 adult melon flies with a specific
low concentration of malathion and observed their survival
times. The mean and standard deviation were found to be 18.3 and
5.2 days respectively. Construct the 99% CI for the
population mean (𝜇) of survival time.
4. In a psychological depth-perception test, a random sample of
14 airline pilots were asked to judge the distance between two
markers at the other end of a laboratory. The sample data
(recorded in feet) are given below:
2.7, 2.4, 2.6, 2.4, 1.9, 1.3, 1.9, 2.2, 2.5, 2.3, 1.8, 2.5, 2.0, 2.2
Use the sample data to construct a 95% CI for the population mean (𝜇) of
the recorded distance in the psychological test.
Point and Interval Estimation of the Population Proportion
The population and sample proportions are denoted by $P$
and $\hat{p}$ respectively, and calculated as
$P = \frac{X}{N}$ (population proportion) and $\hat{p} = \frac{x}{n}$ (sample proportion).
For a large sample:
The sampling distribution of the sample proportion is
approximately normal.
The mean $\mu_{\hat{p}}$ of the sampling distribution of $\hat{p}$ is equal to
the population proportion $P$.
The standard deviation $\sigma_{\hat{p}}$ is equal to $\sqrt{\frac{pq}{n}}$, where $q = 1 - p$.
In the case of proportion a sample is considered to be
large if np and nq are both greater than 5.
If p and q are not known, then $n\hat{p}$ and $n\hat{q}$ should each be
greater than 5 for the sample to be large.
When estimating the value of the population proportion we
do not know the values of p and q.
Consequently we cannot compute $\sigma_{\hat{p}}$.
Therefore, in the estimation of the population proportion,
we use the value of $S_{\hat{p}}$ as an estimate of $\sigma_{\hat{p}}$.
The value of $S_{\hat{p}}$, which gives a point estimate of $\sigma_{\hat{p}}$, is
calculated as
$S_{\hat{p}} = \sqrt{\frac{\hat{p}\hat{q}}{n}}$, where $\hat{q} = 1 - \hat{p}$
$\hat{p}$ is the point estimator of the corresponding population
proportion P.
✓ The margin of error associated with this point estimate
at the 95% confidence level is Margin of error = $\pm 1.96\,S_{\hat{p}}$
✓ The (1-α) 100% confidence interval for the population
proportion P is $\hat{p} \pm Z_{\alpha/2}\,S_{\hat{p}}$
Example 1: An epidemiologist wishes to determine the rate
of breast cancer in women 60 to 65 years old in Ireland.
She surveys a random sample of 5000 women in this age
group and determines that exactly 50 have had this form of
cancer at some time during their lifetime.
So $\hat{p} = \frac{x}{n} = \frac{50}{5000} = 0.01$
She now has an estimate of the population rate of breast
cancer and needs a way of expressing her confidence in this
value.
Solution:
$\hat{p}$ is an unbiased estimator of P. Since $\hat{p}$ is actually a sample
mean, its estimated standard error is
$S_{\hat{p}} = \sqrt{\frac{\hat{p}\hat{q}}{n}} = \sqrt{\frac{0.01(0.99)}{5000}} \approx 0.0014$
CI = $\hat{p} \pm Z\,S_{\hat{p}} = 0.01 \pm 1.96\sqrt{\frac{0.01(0.99)}{5000}}$
= (0.01 - 0.003, 0.01 + 0.003) = (0.007, 0.013)
We are 95% confident that the population proportion
lies between 0.007 and 0.013.
Example 2: Suppose you wanted to evaluate the number of
jars of jelly that have less than the correct amount of
product and you randomly sampled 100 jars of the
production line. Suppose that 23 jars were found to have
some type of fill problem. Now you want to estimate the
proportion of jars of jelly that have some type of fill
problem. You will accept a 90% confidence level.
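Solution:
$\hat{p} = \frac{23}{100} = 0.23$; with a 90% confidence level, Z = 1.645
CI = $\hat{p} \pm Z\,S_{\hat{p}} = 0.23 \pm 1.645\sqrt{\frac{0.23(0.77)}{100}} = 0.23 \pm 0.069$
CI = (0.161, 0.299)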
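Both proportion examples can be reproduced with a short function. This is a sketch of the large-sample interval $\hat{p} \pm Z_{\alpha/2}\sqrt{\hat{p}\hat{q}/n}$; the function name `proportion_interval` is hypothetical, and the critical value is computed rather than taken from a table.

```python
from math import sqrt
from statistics import NormalDist

def proportion_interval(x, n, confidence_level=0.95):
    """Large-sample confidence interval for a population proportion."""
    p_hat = x / n
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# Example 1: 50 cases among 5000 women, 95% confidence.
ex1 = proportion_interval(50, 5000, 0.95)
# Example 2: 23 problem jars among 100 jars, 90% confidence.
ex2 = proportion_interval(23, 100, 0.90)
print([round(v, 3) for v in ex1])
print([round(v, 3) for v in ex2])
```

In both examples $n\hat{p}$ and $n\hat{q}$ exceed 5, so the large-sample normal approximation used here is the one the text describes.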
Sample Size Determination
➢ One reason why we usually conduct a sample survey and
not a census is that almost always we have limited
resources at our disposal.
Determining the sample size for the estimation of µ
Given the confidence level and the standard deviation of
the population, the sample size that will produce a
predetermined maximum error E of the confidence
interval estimate of µ is
$n = \frac{Z^2\sigma^2}{E^2}$ if $\sigma^2$ is known; otherwise use $S^2$ instead of $\sigma^2$.
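A minimal sketch of this sample-size formula follows. The inputs ($\sigma = 4$, maximum error $E = 1$, 95% confidence) are assumed illustration values, and the result is rounded up because a sample size must be a whole number.

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_mean(sigma, max_error, confidence_level=0.95):
    """Smallest n with z * sigma / sqrt(n) <= max_error."""
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    return ceil((z * sigma / max_error) ** 2)

# Assumed inputs (illustration only): sigma = 4, E = 1, 95% confidence.
n_needed = sample_size_for_mean(sigma=4, max_error=1)
print(n_needed)
```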
Determining the sample size for the estimation of P
Given the confidence level and the values of p and q, the
sample size that will produce a predetermined
maximum error E of the confidence interval estimate of P
is:
$n = \frac{Z^2 pq}{E^2}$, where $E = Z\sigma_{\hat{p}} = Z\sqrt{\frac{pq}{n}}$
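The same idea applies to proportions. In the sketch below the inputs ($p = 0.5$, maximum error $E = 0.05$, 95% confidence) are assumed illustration values; $p = 0.5$ is used because it makes $pq$ as large as possible and so gives the most conservative sample size.

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_proportion(p, max_error, confidence_level=0.95):
    """Smallest n with z * sqrt(p * q / n) <= max_error."""
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    return ceil(z ** 2 * p * (1 - p) / max_error ** 2)

# Assumed inputs (illustration only): p = 0.5, E = 0.05, 95% confidence.
n_needed = sample_size_for_proportion(p=0.5, max_error=0.05)
print(n_needed)
```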
The Student’s t-Distribution
➢ The t-distribution is used in a variety of statistical
studies, including Student's t-test for determining
the statistical significance of a difference in two
sample means, the generation of confidence intervals
for a difference in two population means, and linear
regression analysis.
➢ It is a family of continuous distributions, one for each
number of degrees of freedom.
➢ It also has a bell-shaped curve.
✓ It is similar to the normal distribution, except that:
The population variance is not known, so it is
estimated from the sample.
The sample size, n, is less than 30.
Table values are read using n-1 degrees of
freedom.
It differs from the standard normal in that its
variance is greater than one (1); the total area under the curve is still 1.
The shape of the curve of the t-distribution is determined by the
sample size (degrees of freedom).
Applications of t-distribution
➢ The t-distribution has the following important
applications in testing the hypotheses for small samples.
✓ To test significance of a single population mean, when
population variance is unknown.
✓ To test the two population means when population
variances are equal and unknown.
✓ The t-distribution is most useful for small sample sizes,
when the population standard deviation is not known, or
both. As the sample size increases, the t-distribution
becomes more similar to a normal distribution.
Characteristics of t-distribution
➢ The t-distribution, like the normal distribution,
is bell-shaped and symmetric, but:
it has heavier tails
it tends to produce values that fall far from its mean more often
➢ Its values are used in statistics to assess significance.
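The heavier tails can be illustrated by simulation. The sketch below draws t-distributed values using the standard construction $T = Z/\sqrt{\chi^2_{df}/df}$, which is an assumption brought in for illustration rather than something stated in this chapter; the 4 degrees of freedom, the cutoff 1.96 and the seed are likewise arbitrary choices.

```python
import random
from math import sqrt

random.seed(3)  # fixed seed so the sketch is repeatable

def draw_t(df):
    """Draw one t-distributed value as Z / sqrt(chi-square / df)."""
    z = random.gauss(0, 1)
    chi2 = sum(random.gauss(0, 1) ** 2 for _ in range(df))
    return z / sqrt(chi2 / df)

REPS = 20000
CUTOFF = 1.96  # cuts off about 5% in the two tails of the standard normal

# Share of draws falling further than CUTOFF from the mean (zero).
t_tail = sum(abs(draw_t(4)) > CUTOFF for _ in range(REPS)) / REPS
z_tail = sum(abs(random.gauss(0, 1)) > CUTOFF for _ in range(REPS)) / REPS
print("t (4 df):", t_tail, " standard normal:", z_tail)
```

A larger share of the t draws lands beyond the cutoff, which is why t critical values are larger than the corresponding z values for small samples.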