Statistical Analysis in Finance
Session 3:
Sampling and Estimation
Dr. Nemanja Radić
www.cranfield.ac.uk/som
Content :
Sessions 1 & 2: Probability and Probability Distributions
SESSION 3: SAMPLING AND ESTIMATION
Session 4: Hypothesis Testing
Session 5: Problem Solving
Sessions 6 & 7: Regression Analysis
Session 8: Regression Models with Dummy Variables
Sessions 9 & 10: Problem Solving and Exam Revision
Reading:
Statistical Techniques in Business and Economics
(17/E) by Douglas A. Lind, William G. Marchal and
Samuel A. Wathen, 2017. McGraw-Hill. Chapters 8
and 9.
Intended Learning Outcomes
• Understand simple random sampling, sampling
distribution and sampling error.
• Understand the Central Limit Theorem and its
importance.
• Be familiar with techniques of point estimation.
• Be able to estimate confidence intervals for a
variety of data.
Sampling
• Population - consists of all members of a specified
group.
• Population Parameter is unknown.
• Sample - a subset of the population.
• Sample Statistic is calculated from sample and
used to make inferences about the population.
Most Commonly Used Probability
Sampling Methods
• Simple Random Sample:
• A sample selected so that each item or person in the population has
the same chance of being included.
• Systematic Random Sampling:
• The items or individuals of the population are arranged in some
order. A random starting point is selected and then every kth
member of the population is selected for the sample.
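The two methods above can be sketched in a few lines of Python. This is only an illustration: the population of ID numbers 1 to 100, the function names and the seed are assumptions made for the example, not part of the course material.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Draw n items without replacement; every item is equally likely."""
    rng = random.Random(seed)
    return rng.sample(population, n)


def systematic_sample(population, k, seed=None):
    """Pick a random start among the first k items, then take every k-th item."""
    rng = random.Random(seed)
    start = rng.randrange(k)
    return population[start::k]


# Hypothetical population: ID numbers 1..100, already arranged in order.
population = list(range(1, 101))

srs = simple_random_sample(population, 10, seed=42)
sys_sample = systematic_sample(population, 10, seed=42)

print("Simple random sample:", sorted(srs))
print("Systematic sample:   ", sys_sample)
```

With 100 items and k = 10, the systematic sample always contains 10 items spaced exactly 10 apart; only the starting point is random.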
Most Commonly Used Probability Sampling
Methods (cont’d)
• Stratified Random Sampling
• A population is first divided into subgroups, called strata, and a
sample is selected from each stratum. Useful when a population can
be clearly divided in groups based on some characteristics.
• Cluster Sampling
• A population is divided into clusters using naturally occurring
geographic or other boundaries. Then, clusters are randomly
selected and a sample is collected by randomly selecting from each
cluster.
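A minimal sketch of the difference between the two designs, assuming a made-up population of firms grouped by sector (the group names, sizes and function names are hypothetical). Stratified sampling draws from every group; cluster sampling first selects some groups at random and then draws only from those.

```python
import random


def stratified_sample(strata, n_per_stratum, seed=None):
    """Draw a random sample from EVERY stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}


def cluster_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Randomly select some clusters, then sample within the selected ones only."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: rng.sample(clusters[name], n_per_cluster) for name in chosen}


# Hypothetical population: 20 firms in each of four sectors.
groups = {
    sector: [f"{sector}_{i}" for i in range(20)]
    for sector in ("banks", "insurers", "retailers", "utilities")
}

strat = stratified_sample(groups, n_per_stratum=3, seed=1)
clus = cluster_sample(groups, n_clusters=2, n_per_cluster=3, seed=1)

print("Stratified:", strat)  # all four sectors appear
print("Cluster:   ", clus)   # only two sectors appear
```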
Stratified versus Cluster Sampling
• Stratified Sampling
• Sample consists of elements from each group.
• Preferred when the objective is to increase precision.
• Cluster Sampling
• Sample consists of elements from the selected groups.
• Preferred when the objective is to reduce costs.
Selecting Samples in Finance
• Investment analysts commonly work with both time-series and
cross-sectional data.
• No economic basis for how long a time series should be.
• May need to combine data from two different periods, such as
fixed and floating exchange rate regimes.
• As a consequence, we would not be sampling from a population
described by a single set of parameters.
• Whenever we sample cross-sectionally, certain assumptions must
be met if we wish to summarize the data in a meaningful way.
• For example, we might choose to summarize company-level data by
industry.
Parameter versus Statistics
• Population is described by parameters.
• A parameter is a constant, whose value may be
unknown.
• Only one population.
• Sample is described by statistics.
• A statistic is a random variable whose value depends
on the chosen random sample.
• Statistics are used to make inferences about the
population parameters.
• Can draw multiple random samples of size n.
Sampling Error
The sampling error is the difference between a
sample statistic and its corresponding
population parameter.
Examples:
$\bar{X} - \mu$
$s - \sigma$
$s^2 - \sigma^2$
$p - \pi$
Sampling Distribution of the Sample
Mean
• The sampling distribution of the sample mean is a probability
distribution consisting of all possible sample means of a given
sample size selected from a population.
• It is not to be confused with the sample distribution, i.e. the
distribution of values in a single sample (note the "-ing"
ending).
• To get the sampling distribution of a sample mean, we need
to first select all possible samples of the same size from the
population, calculate the mean from each sample, and then
construct the distribution of all the means we calculated.
Sampling Distribution of the Sample
Means – Example 1
A firm has seven production employees (considered the population). The
hourly earnings of each employee are given in the table below.
1. What is the population mean?
2. What is the sampling distribution of the sample mean for samples of
size 2?
3. What is the mean of the sampling distribution?
4. What observations can be made about the population and the
sampling distribution?
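The four questions can be worked through by enumerating every possible sample of size 2. The sketch below uses hypothetical hourly earnings for the seven employees, since the table itself is not reproduced here; the values are an assumption for illustration only.

```python
from collections import Counter
from itertools import combinations
from statistics import fmean

# Hypothetical hourly earnings for the seven employees (NOT the slide's table).
earnings = [14, 14, 16, 16, 14, 16, 18]

# 1. Population mean.
pop_mean = fmean(earnings)

# 2. Sampling distribution: the mean of every possible sample of size 2.
sample_means = [fmean(pair) for pair in combinations(earnings, 2)]
distribution = Counter(sample_means)

# 3. Mean of the sampling distribution.
mean_of_means = fmean(sample_means)

print(f"Population mean:            {pop_mean:.4f}")
print(f"Number of possible samples: {len(sample_means)}")
for value, count in sorted(distribution.items()):
    print(f"  sample mean {value:5.1f}: probability {count / len(sample_means):.4f}")
print(f"Mean of sample means:       {mean_of_means:.4f}")
```

For question 4, the output illustrates two general observations: the mean of the sample means equals the population mean, and the sample means are less spread out than the individual population values.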
Central Limit Theorem
If all samples of a particular size are selected from any population, the
sampling distribution of the sample mean is approximately a normal
distribution. This approximation improves with larger samples.
• If the population follows a normal probability distribution, then for any
sample size the sampling distribution of the sample mean will also be
normal.
• If the population distribution is symmetrical (but not normal), the normal
shape of the distribution of the sample mean emerges with samples as small
as 10.
• If the population distribution is skewed or has thick tails, it may require
samples of 30 or more to observe the normality feature.
• The mean of the sampling distribution, $\mu_{\bar{X}}$, is equal to μ and its
variance is equal to σ²/n.
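A small simulation can illustrate the last point. The sketch below assumes a skewed population (an exponential distribution with μ = 1 and σ² = 1, chosen purely for illustration) and repeatedly draws samples of size 30; the seed and number of repetitions are arbitrary choices.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 30                  # sample size
reps = 5000             # number of samples drawn

# Exponential population with rate 1: mean = 1, variance = 1 (strongly skewed).
sample_means = [
    statistics.fmean(rng.expovariate(1.0) for _ in range(n))
    for _ in range(reps)
]

mean_of_means = statistics.fmean(sample_means)
var_of_means = statistics.pvariance(sample_means)

print(f"Mean of sample means:     {mean_of_means:.4f} (theory: 1.0000)")
print(f"Variance of sample means: {var_of_means:.4f} (theory: {1 / n:.4f})")
```

The simulated mean and variance should land close to μ and σ²/n, although the exact figures depend on the seed.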
Sampling Methods and the Central Limit
Theorem
Point Estimate
• A point estimate is a single value (point) derived from a
sample and used to estimate a population value.
$\bar{X} \to \mu$
$s \to \sigma$
$s^2 \to \sigma^2$
$p \to \pi$
Confidence Interval (C.I.)
• A CI estimate is a range of values constructed from sample data so
that the population parameter is likely to occur within that range at a
specified probability.
• The specified probability is called the degree of confidence,
symbolised as 1 – α.
• α denotes the probability of error, also known as the level of
significance. This is the allowed probability that the estimation
procedure will generate an interval that does not contain the true
parameter.
• If we let α = 5%, we are 100(1 – α)% = 95% confident that a single
95% C.I. contains the population mean.
• We are justified in making this statement because we know that
95% of all possible C.I. constructed in the same manner will
contain the population mean.
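The "95% of all possible C.I." interpretation can be checked by simulation. The sketch below assumes a normal population with μ = 10 and a known σ = 2 (both values hypothetical), builds a 95% interval from each of 2,000 samples of size 25, and counts how often the interval contains μ.

```python
import math
import random
from statistics import NormalDist, fmean

MU, SIGMA = 10.0, 2.0   # hypothetical population parameters
N, REPS = 25, 2000      # sample size and number of repeated samples

z = NormalDist().inv_cdf(0.975)          # reliability factor for 95% confidence
half_width = z * SIGMA / math.sqrt(N)    # same for every sample since sigma is known

rng = random.Random(1)
hits = 0
for _ in range(REPS):
    xbar = fmean(rng.gauss(MU, SIGMA) for _ in range(N))
    if xbar - half_width <= MU <= xbar + half_width:
        hits += 1

coverage = hits / REPS
print(f"Share of intervals containing the true mean: {coverage:.3f}")
```

The share should be close to 0.95. Any single interval either contains μ or it does not; the 95% refers to the long-run behaviour of the procedure.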
Construction of C.I.
• A 100(1 – α)% confidence interval for a parameter has the
following structure:
• Point estimate ± Reliability factor × Standard error
• Point estimate = the value of the sample statistic.
• Reliability factor = a number based on the assumed
distribution of the point estimate and the degree of confidence
(1 – α) for the C.I.
• Standard error = the standard deviation of the sample statistic
providing the point estimate (for the sample mean, the standard
deviation of the sample means).
Factors affecting confidence interval
estimates
The width of a confidence interval is determined by:
1. The sample size, n.
2. The variability in the population, σ, usually
estimated by s.
3. The desired level of confidence.
Confidence Intervals for a Mean – σ Known
• A 100(1 – α)% confidence interval for the population mean μ when we are
sampling from a normal distribution with known variance
σ² is given by
$$\bar{X} \pm z \frac{\sigma}{\sqrt{n}}$$
• We use the following reliability factors when we construct
C.I.s based on the standard normal distribution:
• 90%, α = 0.10, z = 1.645 (often rounded to 1.65).
• 95%, α = 0.05, z = 1.96.
• 99%, α = 0.01, z = 2.58.
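The reliability factors can be reproduced from the standard normal distribution, and the interval follows directly from the formula. In the sketch below the function names and the sample figures (a mean of 50, σ = 10, n = 100) are hypothetical, chosen only to show the arithmetic.

```python
import math
from statistics import NormalDist


def z_value(confidence):
    """Two-sided reliability factor: the 1 - alpha/2 quantile of N(0, 1)."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)


def z_interval(xbar, sigma, n, confidence=0.95):
    """C.I. for the mean when the population standard deviation is known."""
    margin = z_value(confidence) * sigma / math.sqrt(n)
    return xbar - margin, xbar + margin


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: z = {z_value(level):.3f}")

# Hypothetical data: sample mean 50, known sigma 10, sample size 100.
lower, upper = z_interval(xbar=50, sigma=10, n=100, confidence=0.95)
print(f"95% C.I.: ({lower:.2f}, {upper:.2f})")
```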
C.I. for a Mean – σ Unknown
• If we are sampling from a population with
unknown variance, then a 100(1 – α)% C.I. for the population mean μ is
given by:
$$\bar{X} \pm t \frac{s}{\sqrt{n}}$$
where the number of df for t is n – 1 and n is the sample size.
The t-distribution
• It is, like the z distribution, a continuous distribution, defined by a
single parameter known as the degrees of freedom, df.
• It is, like the z distribution, bell-shaped and symmetrical.
• There is not one t distribution, but rather a family of t distributions. All
t distributions have a mean of 0, but their standard deviations differ
according to the sample size, n.
• The t distribution is more spread out and flatter at the center than
the standard normal distribution. As the sample size increases,
however, the t distribution approaches the standard normal distribution.
Comparing the z and t Distributions
when n is small, 95% Confidence Level
The t distribution is flatter and has a greater spread than the z distribution,
so the value of t for a given level of confidence is larger in magnitude.
Confidence Interval for the Mean
– Example 3
A tyre manufacturer wishes to investigate the tread life of its
tyres. A sample of 10 tyres driven 50,000 miles revealed a sample
mean of 0.32 inch of tread remaining with a standard
deviation of 0.09 inch.
Construct a 95 percent confidence interval for the
population mean.
Would it be reasonable for the manufacturer to conclude that
after 50,000 miles the population mean amount of tread remaining
is 0.30 inches?
Given in the problem:
n = 10
$\bar{x} = 0.32$
s = 0.09
Compute the C.I. using the t-distribution (since σ is unknown):
$$\bar{X} \pm t_{\alpha,\,n-1} \frac{s}{\sqrt{n}}$$
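The arithmetic for this example can be sketched as follows. The reliability factor t = 2.262 is not stated in the problem; it is the standard t-table value for 95% confidence (two-sided) with n – 1 = 9 degrees of freedom, entered here by hand.

```python
import math

n = 10        # sample size
xbar = 0.32   # sample mean tread remaining (inches)
s = 0.09      # sample standard deviation (inches)

# Assumption: t-table value for 95% confidence (two-sided) and 9 df.
t_crit = 2.262

standard_error = s / math.sqrt(n)
margin = t_crit * standard_error
lower, upper = xbar - margin, xbar + margin

print(f"Standard error: {standard_error:.4f}")
print(f"95% C.I.: ({lower:.3f}, {upper:.3f})")
print("0.30 inside the interval:", lower <= 0.30 <= upper)
```

The interval runs from roughly 0.256 to 0.384 inch. Because 0.30 lies inside it, a population mean of 0.30 inch is a reasonable conclusion given this sample.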
C.I. for a Proportion (π)
To develop a confidence interval for a proportion, we need to meet
the following assumptions.
1. The binomial conditions, discussed last week, have been met.
Briefly, these conditions are:
a. The sample data is the result of counts.
b. There are only two possible outcomes.
c. The probability of a success remains the same from one trial
to the next.
d. The trials are independent. This means the outcome on one
trial does not affect the outcome on another.
2. The values nπ and n(1 – π) should both be greater than or equal
to 5. This condition allows us to invoke the central limit theorem
and employ the standard normal distribution, that is, z, to construct
a confidence interval.
C.I. for a Proportion – σ Known
• A 100(1 – α)% confidence interval for the population
proportion is given by
$$p \pm z \sqrt{\frac{p(1-p)}{n}}$$
where $p = \frac{X}{n}$ is the sample proportion.
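A minimal sketch of the calculation, using hypothetical survey figures (X = 1,600 successes out of n = 2,000) that are not taken from the slides. The function name is also an assumption, and the size check uses the sample proportion p in place of the unknown π.

```python
import math
from statistics import NormalDist


def proportion_interval(successes, n, confidence=0.95):
    """C.I. for a population proportion using the normal approximation."""
    p = successes / n
    # Sample-size condition, checked with p standing in for the unknown pi.
    if n * p < 5 or n * (1 - p) < 5:
        raise ValueError("normal approximation not appropriate for this sample")
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * math.sqrt(p * (1 - p) / n)
    return p, p - margin, p + margin


# Hypothetical data: 1,600 of 2,000 respondents answer "yes".
p, lower, upper = proportion_interval(successes=1600, n=2000, confidence=0.95)
print(f"Sample proportion: {p:.3f}")
print(f"95% C.I.: ({lower:.3f}, {upper:.3f})")
```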
Selecting an appropriate sample size
There are 3 factors that determine the size of a
sample, none of which has any direct relationship to
the size of the population.
• The level of confidence desired.
• The margin of error the researcher will tolerate.
• The variation in the population being studied.
Sample size for estimating the population
mean
$$E = z \frac{\sigma}{\sqrt{n}}$$
$$n = \left( \frac{z \sigma}{E} \right)^2$$
where:
n is the size of the sample.
z is the standard normal value corresponding to the desired level of
confidence.
σ is the population standard deviation.
E is the maximum allowable error.
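The formula can be applied as in the sketch below. The inputs (95% confidence, σ = 1,000, E = 100) and the function name are hypothetical; the result is rounded up because a sample size must be a whole number and rounding down would exceed the allowable error.

```python
import math
from statistics import NormalDist


def sample_size_for_mean(sigma, max_error, confidence=0.95):
    """n = (z * sigma / E)^2, rounded up to the next whole observation."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z * sigma / max_error) ** 2)


# Hypothetical inputs: sigma = 1,000, allowable error E = 100, 95% confidence.
n_required = sample_size_for_mean(sigma=1000, max_error=100, confidence=0.95)
print("Required sample size:", n_required)
```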
Sample size for estimating a population
proportion
$$E = z \sqrt{\frac{\pi(1-\pi)}{n}}$$
$$n = \pi(1-\pi) \left( \frac{z}{E} \right)^2$$
where:
n is the size of the sample
z is the standard normal value corresponding to
the desired level of confidence
π is the population proportion
E is the maximum allowable error
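A short sketch of the calculation, with hypothetical inputs and a hypothetical function name. When no prior estimate of π is available, π = 0.5 is a common conservative choice because it maximises π(1 – π) and therefore gives the largest required sample.

```python
import math
from statistics import NormalDist


def sample_size_for_proportion(pi, max_error, confidence=0.95):
    """n = pi * (1 - pi) * (z / E)^2, rounded up to the next whole observation."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(pi * (1 - pi) * (z / max_error) ** 2)


# Hypothetical inputs: prior estimate pi = 0.30, allowable error E = 0.03.
n_with_estimate = sample_size_for_proportion(pi=0.30, max_error=0.03)

# No prior estimate: use pi = 0.5 as the conservative (largest-n) choice.
n_conservative = sample_size_for_proportion(pi=0.50, max_error=0.03)

print("With pi = 0.30:", n_with_estimate)
print("With pi = 0.50:", n_conservative)
```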