Statistics and Probability

This document is a learning resource focused on statistics and probability, specifically covering random sampling techniques, parameters and statistics, and the sampling distribution of the sample mean. It outlines various sampling methods such as simple random, systematic, stratified, and cluster sampling, and explains how to compute population and sample statistics including mean, variance, and standard deviation. The document also provides examples and formulas for calculating these statistical measures, emphasizing the importance of sampling in research.

Uploaded by ainaferrera21
Copyright © All Rights Reserved
STATISTICS AND PROBABILITY

QUARTER III

Begin

The hands are one of the greatest assets of the human body. No other being in the world has hands that can grasp, hold,
move, and manipulate objects like human hands. Through our hands, we may learn, create, and accomplish. Hence, the hand in this
learning resource signifies that you, as a learner, are capable and empowered to successfully achieve the relevant and essential
competencies and skills at your own pace and time. Your academic success lies in your own hands!
This module was designed to provide you with fun and meaningful opportunities for guided and independent learning at your
own pace and time. You will be able to process the contents of the learning resource while being an active learner.

After going through this module, you are expected to:


1. identify different random sampling techniques;
2. calculate the parameter or statistic of the given data;
3. solve problems involving sampling distribution of the sample mean;
4. identify regions under the t-distribution corresponding to different t-values;
5. identify percentiles using the t-table;
6. solve problems involving the length of a confidence interval; and
7. compute the length of the confidence interval.

Identifying the Different Random Sampling Techniques


In research, data can be collected either from the entire population or from a subset of it called a sample. If a
researcher opts to use a sample rather than the whole population, they must consider the sample size and how the members of
the sample will be chosen from the target population.
A population includes all the elements of a set of data. The size of the population is the number of observations in the
population. For example, if the ABSCBN network has 11,000 employees with the blood type required in a certain study, then we have a
population of size 11,000.
A sample consists of one or more observations drawn from the population. It is a subset, or an incomplete set, taken from a
population of objects or observations. Studying a sample instead of the whole population is less time-consuming and more
cost-effective. Although sampling has these advantages, it can also be a source of bias and inaccuracy.
Random sampling is a method of choosing representatives from the population wherein every member of the population has an
equal chance of being selected. Random sampling techniques reduce selection bias, which helps researchers collect accurate data.

Let us analyze the following situations.


1. A researcher writes the name of each student on a piece of paper, mixes the papers in a bowl, and draws 7 pieces of paper.
Situation 1 illustrates simple random sampling. Each piece of paper corresponds to one student, an element of the population. All
of the students have an equal chance of being selected, since the 7 pieces of paper are picked at random from the bowl.
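The lottery method in Situation 1 can be imitated with Python's standard `random` module. This is a minimal sketch, assuming a hypothetical class of 40 students identified by ID numbers; `random.Random.sample` draws without replacement, so each student is equally likely to be chosen and no one is picked twice.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct elements, each equally likely (the lottery method)."""
    rng = random.Random(seed)          # seeded generator so a draw can be reproduced
    return rng.sample(population, n)   # sampling without replacement

# Hypothetical class of 40 students identified by ID numbers 1 to 40
students = list(range(1, 41))
chosen = simple_random_sample(students, 7, seed=2024)
print(sorted(chosen))
```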

2. A researcher selects every 7th student from a random list.


3. A researcher asks the class to count off and then selects the students whose numbers are multiples of 7.
Situations 2 and 3 illustrate systematic random sampling because the samples are selected at a consistent interval k.
Selecting every 7th student on a randomly ordered list of names gives all of the students an equal chance. The same happens when
selecting the students whose numbers are multiples of 7, that is, 7, 14, 21, and so on.
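A minimal sketch of systematic sampling, assuming a hypothetical randomly ordered roster of 49 students and a sample of 7, so the interval is k = 49 ÷ 7 = 7. The situations above do not say how the first student is chosen; the sketch assumes a random start within the first interval, which is a common convention.

```python
import random

def systematic_sample(population, n, seed=None):
    """Select every k-th element after a random start, where k = N // n."""
    N = len(population)
    k = N // n                    # sampling interval: population size / sample size
    rng = random.Random(seed)
    start = rng.randrange(k)      # random starting position among the first k elements
    return [population[start + i * k] for i in range(n)]

# Hypothetical randomly ordered roster of 49 students; a sample of 7 gives k = 7
roster = [f"Student {i}" for i in range(1, 50)]
print(systematic_sample(roster, 7, seed=1))
```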

4. A researcher separates the list of boys and girls, then draws 7 names by gender.
Situation 4 illustrates stratified random sampling because the students were divided into two different strata or groups, boys and girls.
With a proportional number for each group, samples will then be selected at random from these two groups.
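Situation 4 can be sketched with proportional allocation, assuming a hypothetical class of 18 boys and 24 girls. Each stratum contributes round(n × stratum size ÷ N) members, picked by simple random sampling within the stratum.

```python
import random

def stratified_sample(strata, n, seed=None):
    """Proportional allocation: each stratum contributes about n * (size / N) members."""
    rng = random.Random(seed)
    N = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        quota = round(n * len(members) / N)         # proportional share of the sample
        sample[name] = rng.sample(members, quota)   # simple random sample within the stratum
    return sample

# Hypothetical class: 18 boys and 24 girls, sample of 7
strata = {
    "boys": [f"B{i}" for i in range(1, 19)],
    "girls": [f"G{i}" for i in range(1, 25)],
}
result = stratified_sample(strata, 7, seed=3)
print({k: len(v) for k, v in result.items()})  # {'boys': 3, 'girls': 4}
```

With other stratum sizes, rounding can make the quotas add up to one more or one less than n, so a real study would need a rule for adjusting them.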

5. A researcher surveys all students from 3 randomly selected classes out of 7 classes.
Situation 5 illustrates cluster sampling, since all students are divided into clusters (classes), and then 3 of the 7 classes are
selected at random. All of the students in these three classes make up the sample of the study. Take note that the clusters should
ideally be similar to one another (mutually homogeneous) yet diverse within (internally heterogeneous).
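Situation 5 can be sketched the same way, assuming a hypothetical school with 7 classes (labeled A to G) of 30 students each. Whole classes, not individual students, are drawn at random, and every member of a chosen class is kept.

```python
import random

def cluster_sample(clusters, m, seed=None):
    """Randomly pick m whole clusters and keep every member of each."""
    rng = random.Random(seed)
    picked = rng.sample(sorted(clusters), m)   # random choice among the cluster names
    return {name: clusters[name] for name in picked}

# Hypothetical school: 7 classes of 30 students each
classes = {f"Class {c}": [f"{c}-{i}" for i in range(1, 31)] for c in "ABCDEFG"}
chosen = cluster_sample(classes, 3, seed=5)
print(sorted(chosen), sum(len(v) for v in chosen.values()))
```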

There are different types of random sampling.


a. Simple random sampling is the most basic random sampling technique, wherein each element in the population has an
equal probability of being selected. The elements are usually represented by unique identification numbers written on
papers of equal size and shape, and the sample is then selected through the lottery method. Random numbers may also be
selected to decide which elements are included in the sample. The number of papers drawn is based on the desired
sample size.
b. Systematic random sampling uses a list of all the elements in the population, from which elements are selected at a
consistent interval k. To get the interval k, divide the population size by the sample size.

c. Stratified random sampling divides the population into different strata or divisions. Sample members are picked from
each stratum in proportion to its size, which is why all strata are represented in the sample.
d. Cluster sampling divides the population into clusters or groups, and then clusters are selected at random. All elements
of the randomly selected clusters make up the sample of the study.

Sampling techniques that involve random selection are called probability sampling. Simple random, systematic,
stratified, and cluster sampling are all probability sampling techniques.
There are also sampling techniques that do not involve random selection of data. They are called non-probability sampling. An
example is convenience sampling, wherein the researcher gathers data from nearby sources of information with minimal
effort. Convenience sampling is used, for example, by people who hand out questionnaires on the street to passers-by.
Purposive sampling is also not considered random sampling, since the respondents are selected based on the goal of the
researcher's study. If the study is about students who are children of OFWs, the researcher will select only students who are
children of OFWs. This excludes other students from being in the sample.

Computing for the Parameter and Statistic


A parameter is a measure used to describe a population, while a statistic is a measure used to describe a sample. To
understand these two measures better, let us discuss them.
Below are the third-quarter grades in Statistics of Grade 11 students.
94 85 88 79 78 75 89 91 84 77
Let us compute the population mean, population variance, and population standard deviation.

POPULATION MEAN
The mean is the sum of the data divided by the number of data. It describes the point around which a set of data tends to
concentrate. The population mean is the mean computed from all the elements of the population. The symbol µ (read as "mu")
is used to represent the population mean. To compute the population mean, we add all the data (X) and then divide the sum by the
number of elements in the population (N). We apply the formula: $\mu = \frac{\sum X}{N}$
where:
µ = the population mean
N = number of elements in the population
∑X = the summation of X (sum of the measures)
In our case, adding all 10 grades gives a sum of 840. Substituting into the formula $\mu = \frac{\sum X}{N}$ gives
µ = 840/10 = 84. Our computed population mean is µ = 84.

POPULATION VARIANCE AND POPULATION STANDARD DEVIATION


Variance and standard deviation measure how far the data in a set spread or scatter from the mean. Standard deviation
is simply the square root of the variance.

Population variance is the computed variance of the elements of the population. The symbol σ² (read as "sigma squared") is
used to represent population variance.


To compute the population variance, we apply the formula: $\sigma^2 = \frac{\sum (X - \mu)^2}{N}$

where:
X = given data
N = number of elements in the population
µ = the population mean

Population standard deviation is the computed standard deviation of the elements of the population. The symbol 𝜎 (read as
“sigma”) is used to represent population standard deviation.
To compute the population standard deviation, we use the formula: $\sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}}$

where:
X = given data
N = number of elements in the population
µ = the population mean

Using the data given above, we solve for the population variance and population standard deviation with the help of this table:

The third column is computed by subtracting the mean from each score, while the fourth column is computed by squaring
the third column. Since the formula contains the summation symbol ∑, we add the computed values in the fourth column, which gives
∑(X − µ)² = 382.
Again, for the population mean, µ = 840/10 = 84.

For the population variance, we substitute the computed values into our formula; thus, $\sigma^2 = \frac{\sum (X - \mu)^2}{N} = \frac{382}{10} = 38.2$

For the population standard deviation, we can also substitute the computed values into the formula, or we can simply take the
square root of the variance: $\sigma = \sqrt{38.2} \approx 6.18$

Population mean (µ), population variance (σ²), and population standard deviation (σ) are what we call parameters.
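The three parameters can be checked with Python's standard `statistics` module, whose `pvariance` and `pstdev` functions divide by N as the population formulas do. This is a quick verification sketch, not part of the module's method.

```python
import statistics

# Third-quarter Statistics grades, treated here as the whole population
grades = [94, 85, 88, 79, 78, 75, 89, 91, 84, 77]

mu = statistics.fmean(grades)            # population mean: sum(X) / N
sigma2 = statistics.pvariance(grades)    # population variance: divides by N
sigma = statistics.pstdev(grades)        # population standard deviation

print(mu, sigma2, round(sigma, 2))       # 84.0 38.2 6.18
```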

STATISTIC
From the previous population data, suppose that we randomly select only 7 of the 10 grades in the
population.

Compute the sample mean, sample variance, and sample standard deviation. Here is the result:

SAMPLE MEAN
The sample mean is the average of all the data in the sample. The symbol x̄ (read as "x bar") is used to represent the sample
mean. To compute the sample mean, we add all the data and divide the sum by the number of elements in the sample (n). We
apply the formula: $\bar{x} = \frac{\sum x}{n}$

where:
x̄ = the sample mean
n = number of elements in the sample
∑x = the summation of x (sum of the measures)

In our case, adding the 7 sampled grades gives a sum of 602. Substituting into the formula $\bar{x} = \frac{\sum x}{n}$ gives
x̄ = 602/7 = 86. Our computed sample mean is x̄ = 86.

In this example, the sample mean (86) differs slightly from the population mean (84). Notice, however, that the method of
determining the two means is the same; only the divisor differs. The population mean µ uses N (population size), while the sample
mean x̄ uses n (sample size).

SAMPLE VARIANCE AND SAMPLE STANDARD DEVIATION


Sample variance is the computed variance of the elements of the sample. The symbol s² is used to represent sample variance. To
compute the sample variance, we apply the formula: $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$

where:
x̄ = the sample mean
x = given data
n = number of elements in the sample

Sample standard deviation is the computed standard deviation of the elements of the sample. The symbol s is used to represent
sample standard deviation. To compute the sample standard deviation, we use the formula: $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$

where:
x̄ = the sample mean
x = given data
n = number of elements in the sample

As you may notice, the sample standard deviation is also the square root of the sample variance.

The fourth column is computed by subtracting the mean from the grades, while the last column is computed by squaring the fourth
column. Since the formula contains the summation symbol ∑, we add the computed values in the last column.
Sample mean (x̄), sample variance (s²), and sample standard deviation (s) are what we call statistics. Remember that
parameters describe populations, while statistics describe samples.
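The module's actual 7-grade sample is not shown, so this sketch uses a hypothetical sample of 7 of the 10 grades chosen only because it also sums to 602. It assumes the n − 1 divisor commonly used for sample variance (`statistics.variance` and `statistics.stdev`), and prints the divide-by-n value alongside for comparison.

```python
import statistics

# Hypothetical sample of 7 of the 10 grades (not the module's own sample);
# these values were picked only because they also add up to 602.
sample = [94, 85, 88, 78, 89, 91, 77]

x_bar = statistics.fmean(sample)          # sample mean: sum(x) / n
s2 = statistics.variance(sample)          # sample variance: divides by n - 1
s = statistics.stdev(sample)              # sample standard deviation
s2_with_n = statistics.pvariance(sample)  # same data divided by n, for comparison

print(x_bar, round(s2, 2), round(s, 2), round(s2_with_n, 2))  # 86.0 41.33 6.43 35.43
```

Dividing by n − 1 gives a slightly larger value than dividing by n, which compensates for measuring spread around x̄ instead of the unknown µ.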

Other examples of parameters and statistics are the proportion and the correlation coefficient. For proportion, we use "p" for the
sample and "P" for the population. For the correlation coefficient, we use "r" for the sample and "ρ" (read as "rho") for the
population. These will be discussed later in this course.

Sampling Distribution of the Sample Mean


The sampling distribution of the sample mean is a frequency distribution of the sample means computed from all possible
random samples of a particular size taken from a given population. Steps to follow in making a sampling distribution of the
sample mean:
1. Determine the number of sets of all possible random samples that can be drawn from the given population by using the
formula NCn, where N is the population size and n is the sample size.
In our activity, we are given a population of 1, 2, 3, 4, and 5 and a sample size of 3; therefore, we have $_{5}C_{3} = \frac{5!}{3!\,2!} = 10$ possible samples.

2. List all the possible random samples and solve for the sample mean of each set of samples.

3. Construct a frequency and probability distribution table of the sample means, indicating the number of occurrences
(frequency) and the probability of each sample mean.
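The three steps can be sketched in Python for the activity's population 1, 2, 3, 4, 5 with samples of size 3. `itertools.combinations` lists every possible sample (step 1 and step 2), and `fractions.Fraction` keeps means such as 8/3 exact before the frequency and probability table is printed (step 3).

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations
from math import comb

population = [1, 2, 3, 4, 5]
n = 3

samples = list(combinations(population, n))       # all NCn possible samples
assert len(samples) == comb(len(population), n)   # 5C3 = 10

# Exact sample means (Fractions avoid rounding, e.g. 8/3 instead of 2.67)
means = [Fraction(sum(s), n) for s in samples]
freq = Counter(means)

for mean in sorted(freq):
    prob = Fraction(freq[mean], len(samples))      # P(X̄) = frequency / number of samples
    print(f"{float(mean):.2f}  frequency={freq[mean]}  P={prob}")
```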

Mean and Variance of the Sampling Distribution of the Sample Mean

Mean of the Sampling Distribution of the Sample Mean:
$\mu_{\bar{X}} = \sum [\bar{X} \cdot P(\bar{X})]$

Variance of the Sampling Distribution of the Sample Mean:
$\sigma^2_{\bar{X}} = \sum [P(\bar{X}) \cdot (\bar{X} - \mu)^2]$, or equivalently, $\sigma^2_{\bar{X}} = \sum [\bar{X}^2 \cdot P(\bar{X})] - \mu^2$

Where:
µ_X̄ = mean of the sampling distribution of the sample mean
σ²_X̄ = variance of the sampling distribution of the sample mean
∑[X̄ · P(X̄)] = sum of the products of the sample mean and the probability of the sample mean
µ = population mean
σ² = population variance
n = sample size
N = population size
X̄ = sample mean
P(X̄) = probability of the sample mean
(X̄ − µ)² = square of the difference between the sample mean and the population mean
∑[P(X̄) · (X̄ − µ)²] = summation of the products of the probability of the sample mean and the square of the difference between the
sample mean and the population mean
µ² = square of the population mean
X̄² = square of the sample mean
X̄² · P(X̄) = product of the square of the sample mean and the probability of the sample mean
∑[X̄² · P(X̄)] = sum of the products of the square of the sample mean and the probability of the sample mean

Example:
Mark is conducting a survey on the Grade 12 students of Nasyonalismo High School. He found that only a few students knew
about the makers of the Philippine flag: 1, 2, 3, 4, and 5 SHS students from 5 sections. Suppose that samples of 2 sections are drawn
from this population without replacement. Describe the sampling distribution of the sample means.

1. Compute the mean of the population using the formula µ = Σx/N. The value equals 3.0.
Solution:
$\mu = \frac{\sum x}{N} = \frac{1 + 2 + 3 + 4 + 5}{5} = 3.00$

2. Compute the variance of the population using the formula $\sigma^2 = \frac{\sum (x - \mu)^2}{N}$. a. Subtract the computed
population mean from each measurement. b. Square the results obtained in (a), then add them. Divide the sum by the number of
measurements to get the value of the population variance: $\sigma^2 = \frac{(-2)^2 + (-1)^2 + 0^2 + 1^2 + 2^2}{5} = \frac{10}{5} = 2.0$

3. Determine the number of possible samples of size 2 (without replacement). Use the combination formula NCn, where N is the
population size and n is the sample size: $_{N}C_{n} = \frac{N!}{n!\,(N - n)!}$. Here N = 5 and n = 2.
$_{5}C_{2} = \frac{5!}{2!\,3!} = 10$. So, there are 10 possible samples of size 2 that can be drawn.

4. List all possible samples and compute the corresponding means.

5. Construct the sampling distribution of the sample means.

6. Compute the mean of the sampling distribution of the sample means. Follow these steps:
a. Multiply each sample mean by the corresponding probability.
b. Add the results obtained in a. (The sum of the values in that column is the mean of the sampling distribution
of the sample means.) $\mu_{\bar{X}} = \sum [\bar{X} \cdot P(\bar{X})]$
7. Compute the variance of the sampling distribution of the sample means using the formula $\sigma^2_{\bar{X}} = \sum [P(\bar{X}) \cdot (\bar{X} - \mu)^2]$.
Follow these steps:
a. Subtract the population mean from each sample mean.
b. Square the differences obtained in a.
c. Multiply each result in b by the corresponding probability.
d. Add the results in c. (The sum is the variance of the sampling distribution of the sample means.)

MIND THIS: The mean of the population is equal to the mean of the sampling distribution of the sample means.
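Steps 1 to 7 of Mark's example can be carried out in a short Python sketch using exact fractions. It computes the variance both with the definition and with the shortcut Σ[X̄² · P(X̄)] − µ². As an extra check that the module itself does not state, it also compares the result with the standard finite population correction formula for sampling without replacement, σ²_X̄ = (σ²/n) · (N − n)/(N − 1).

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

population = [1, 2, 3, 4, 5]   # students per section who knew the flag makers
n = 2
N = len(population)

mu = Fraction(sum(population), N)                        # population mean = 3
sigma2 = sum((x - mu) ** 2 for x in population) / N      # population variance = 2

samples = list(combinations(population, n))              # 5C2 = 10 samples
freq = Counter(Fraction(sum(s), n) for s in samples)
dist = {m: Fraction(c, len(samples)) for m, c in freq.items()}  # X̄ -> P(X̄)

mean_xbar = sum(m * p for m, p in dist.items())                    # Σ[X̄ · P(X̄)]
var_xbar = sum(p * (m - mu) ** 2 for m, p in dist.items())         # Σ[P(X̄) · (X̄ − µ)²]
var_shortcut = sum(m ** 2 * p for m, p in dist.items()) - mu ** 2  # Σ[X̄² · P(X̄)] − µ²

# Standard identity for sampling without replacement (finite population correction)
var_fpc = sigma2 / n * Fraction(N - n, N - 1)

print(mean_xbar, var_xbar, var_shortcut, var_fpc)   # 3 3/4 3/4 3/4
```

The mean of the sampling distribution equals the population mean (3), as stated above, and all three variance computations agree at 0.75.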

Defining Sampling Distribution of the Sample Mean for Normal Population


Activity
Instruction: Read and analyze the following situations. Determine whether each situation has a known or unknown population
variance. Write your answer on the space provided. Identify also the formula to be used to estimate the standard error of the mean
by writing the symbol $\sigma_{\bar{x}}$ when the population variance is known and the symbol $s_{\bar{x}}$ when the population
variance is unknown.

Solution:
1. Known – This is population data. Although the value of the variance is not given, you can still determine the population
mean and population variance of the data using the formulas $\mu = \frac{\sum X}{N}$ and $\sigma^2 = \frac{\sum (X - \mu)^2}{N}$,
respectively. So, this situation is an example of the sampling distribution when the variance is known. Since the population
variance can be calculated, use $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$ in the computation of the standard error of the mean.
2. Unknown – The population variance is unknown. The given values are the population mean µ = 15, the sample standard
deviation s = 6, and the sample size n = 16. Use the formula $s_{\bar{x}} = \frac{s}{\sqrt{n}}$ for the standard error of the mean.
3. Known – The population variance is given as the square of the standard deviation, σ² = 15². Apply the formula
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$ in computing the standard error of the mean.
4. Unknown – The population variance is unknown. The only given values are the population mean µ = 92.78 and the sample
size n = 10. Use the formula $s_{\bar{x}} = \frac{s}{\sqrt{n}}$ for the standard error of the mean.
5. Known – This is population data, so you can determine the population mean and variance using the formulas
$\mu = \frac{\sum X}{N}$ and $\sigma^2 = \frac{\sum (X - \mu)^2}{N}$. So, this situation is an example of the sampling distribution
where the population variance can be computed. Apply the formula $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$ to solve for the
standard error of the mean.
Based on the previous activity, you learned that by analyzing a situation, you can easily find out whether the given problem
provides the value of the population variance or whether the population variance is unknown. Also, when the population variance is
known, the standard deviation of the sampling distribution of the mean is computed using the formula $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$,
while the formula $s_{\bar{x}} = \frac{s}{\sqrt{n}}$ is used to estimate the standard error of the mean when the population variance is unknown.
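Both standard-error formulas have the same shape, so one small Python function covers them. This is a sketch: pass σ when the population variance is known and s when it is unknown. The usage line applies it to Situation 2 of the activity.

```python
import math

def standard_error(sd, n):
    """Standard error of the mean: sd / sqrt(n).

    Pass the population SD (sigma) when the population variance is known,
    giving sigma_xbar; pass the sample SD (s) when it is unknown, giving s_xbar.
    """
    return sd / math.sqrt(n)

# Situation 2 of the activity: s = 6, n = 16, population variance unknown
print(standard_error(6, 16))   # 1.5
```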

Distribution of the Sample Mean for Normal Population

1. Population variance σ² is known. If the population has mean µ and variance σ², the distribution of the sample mean is (at
least approximately) normal, and the standard error of the mean is $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$, where σ is the population
standard deviation and n is the sample size. To determine the probability of a certain event, we can use the z-distribution by
transforming the sample mean into an approximately standard normal variable using the relation $z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$.
This distribution is best applied for large sample sizes, say n ≥ 30.

2. Population variance σ² is unknown. The standard error of the mean becomes $s_{\bar{x}} = \frac{s}{\sqrt{n}}$, where s is the sample
standard deviation, the point estimate of the population standard deviation σ, and n is the sample size. To estimate the population
parameters, we can use the t-distribution with the formula $t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$. Remember that when the sample size n
is very large, the sample standard deviation s is almost indistinguishable from the population standard deviation σ, and therefore the
t and z distributions are essentially identical. We use the t-distribution for small sample sizes, say n < 30.

Illustrating the Central Limit Theorem

If the population is normally distributed, the sampling distribution of the sample mean is also normally distributed. But what if the
population is not normal?
This is the question the Central Limit Theorem addresses. The distribution of the sample mean tends toward the normal
distribution as the sample size increases, regardless of the distribution from which we are sampling. As a simple guideline, the sample
mean can be considered approximately normally distributed if the sample size is at least 30 (n ≥ 30). If the sample size is sufficiently
large, the Central Limit Theorem lets us answer questions about the sample mean in the same manner that a normal distribution is used
to answer questions about individual observations. This also means that even if the population is not normally distributed, or if we do
not know its distribution, the Central Limit Theorem allows us to conclude that the distribution of the sample mean will be approximately
normal if the sample size is sufficiently large. It is generally accepted that a sample size of at least 30 is large enough for the Central
Limit Theorem to ensure an approximately normal sampling distribution regardless of the distribution of the original population.
Further, we can continue to use the z conversion formula in our calculations. This time we will use the formula
$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$
Why is it important to know the Central Limit Theorem?
Suppose that the average age of the people living in a Barangay is 34, with a standard deviation of 4. If 100 residents of the
Barangay decide to take a summer outing for bonding and relaxation after the COVID-19 pandemic, once the Enhanced Community
Quarantine has been lifted, what is the probability that the average age of these residents is less than 35?

Solution:
It is not given that the population is normally distributed, but since n > 30, you can assume that the sampling distribution of
the mean ages of 100 barangay residents is approximately normal according to the Central Limit Theorem.
The Central Limit Theorem describes the normality of the distribution of the sample mean taken from a population that is not
normally distributed.

Step 1: Write the given data.
µ = 34, σ = 4, X̄ = 35, n = 100

Step 2: Convert the raw score to the standard score using the formula.
$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{35 - 34}{4/\sqrt{100}} = \frac{1}{0.4} = 2.5$

Step 3: Use the z-table to find P(Z < 2.5).
P(Z < 2.5) = 0.9938
Therefore, the probability that a random sample of 100 persons has an average age of less than 35 years is 0.9938, or 99.38%.
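The same computation can be reproduced with `statistics.NormalDist` from Python's standard library (Python 3.8 or later). Its `cdf` method gives the area to the left of a value under the normal curve and takes the place of a printed z-table in this sketch.

```python
import math
from statistics import NormalDist

mu, sigma, x_bar, n = 34, 4, 35, 100

se = sigma / math.sqrt(n)       # standard error of the mean = 0.4
z = (x_bar - mu) / se           # z = 2.5
p = NormalDist().cdf(z)         # area to the left of z under the standard normal curve

print(round(z, 2), round(p, 4))  # 2.5 0.9938
```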
Consider the illustrations below for a better understanding of the Central Limit Theorem. Refer again to our previous example:

Suppose that the average age of the people living in a Barangay is 34, with a standard deviation of 4. One hundred (100) residents
of the Barangay decided to take a summer outing for bonding and relaxation after the COVID-19 pandemic, once the Enhanced
Community Quarantine had been lifted.
If we make relative frequency histograms of the sample means for various sample sizes, they would look like the histograms below.

As the sample size increases and we calculate the sample mean of each sample, the histogram becomes approximately normally
distributed. This is where the Central Limit Theorem is used to make better inferences.
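The histograms themselves are not reproduced here, but a small Monte Carlo sketch shows the same behavior numerically. It assumes a hypothetical right-skewed population of ages, 30 plus an exponential variable with mean 4, so the population mean is 34 and the standard deviation is 4 as in the example. The module does not specify the population's shape; this one was chosen only because it is clearly not normal.

```python
import random
import statistics

rng = random.Random(42)

def skewed_age():
    # Hypothetical right-skewed population: 30 + exponential with mean 4,
    # so the population mean is 34 and the standard deviation is 4.
    return 30 + rng.expovariate(1 / 4)

def sample_means(n, repetitions=2000):
    """Draw many samples of size n and return the mean of each one."""
    return [statistics.fmean(skewed_age() for _ in range(n)) for _ in range(repetitions)]

for n in (2, 10, 100):
    means = sample_means(n)
    # Center stays near 34 while the spread shrinks roughly like 4 / sqrt(n)
    print(n, round(statistics.fmean(means), 2), round(statistics.stdev(means), 3))
```

Plotting each list of means as a histogram would show the skew fading as n grows; for n = 100 the spread of the means is close to the standard error 4/√100 = 0.4 used in the example.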

Solving Problems Involving Sampling Distribution of the Sample Mean


A sampling distribution of the sample mean is a frequency distribution of the sample means computed from all possible random
samples of a specific size n taken from a population. The probability distribution of the sample mean is also called the sampling
distribution of the sample mean. The standard deviation of the sampling distribution of the sample mean is also known as the
standard error of the mean.
1. What is the probability that a randomly selected senior high school student will complete the examination in less than 48
minutes?
To answer the problem, follow these steps:

Step 1. Identify the given information: µ = 50.6, σ = 6, X = 48
Step 2. Identify what is asked for: P(X < 48)
Step 3. Identify the formula to be used: The problem deals with an individual value obtained from the population, so the formula
$z = \frac{X - \mu}{\sigma}$ is used to convert 48 to a standard score.
Step 4. Compute the probability:
$z = \frac{X - \mu}{\sigma} = \frac{48 - 50.6}{6} \approx -0.43$
Find P(X < 48) by getting the area under the normal curve.
P(X < 48) = P(z < −0.43) = 0.3336

Therefore, the probability that a randomly selected senior high school student will complete the examination in less than 48 minutes is
0.3336, or 33.36%.

2. If 49 randomly selected senior high school students take the examination, what is the probability that the mean time it takes
the group to complete the test will be less than 48 minutes?

Step 1: Identify the given information: µ = 50.6, σ = 6, X̄ = 48, n = 49
Step 2: Identify what is asked: P(X̄ < 48)
Step 3: Identify the formula to be used. The problem deals with the sample mean of n observations, so the formula used to
standardize 48 is $z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$.
Step 4: Compute the probability:
$z = \frac{48 - 50.6}{6/\sqrt{49}} = \frac{-2.6}{0.857} \approx -3.03$
Find P(X̄ < 48) by getting the area under the normal curve.
P(X̄ < 48) = P(z < −3.03) = 0.0012

The probability that the mean completion time of 49 randomly selected senior high school students is less than 48 minutes
is 0.0012, or 0.12%.

3. If 49 randomly selected senior high school students take the examination, what is the probability that the mean time it takes
the group to complete the test will be more than 51 minutes?

Step 1: Identify the given information: µ = 50.6, σ = 6, X̄ = 51, n = 49
Step 2: Identify what is asked: P(X̄ > 51)
Step 3: Identify the formula to be used. The problem deals with the sample mean of n observations, so the formula used to
standardize 51 is $z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$.
Step 4: Compute the probability:
$z = \frac{51 - 50.6}{6/\sqrt{49}} = \frac{0.4}{0.857} \approx 0.47$
Find P(X̄ > 51) by getting the area under the normal curve.
P(X̄ > 51) = P(z > 0.47)
= 1 − P(z < 0.47)
= 1 − 0.6808
= 0.3192
The probability that the mean completion time of 49 randomly selected senior high school students is more than 51 minutes
is 0.3192, or 31.92%.

4. If 49 randomly selected senior high school students take the examination, what is the probability that the mean time it takes
the group to complete the test is between 47.8 and 53 minutes?

Step 1: Identify the given information: µ = 50.6, σ = 6, X̄ = 47.8 and 53, n = 49
Step 2: Identify what is asked: P(47.8 < X̄ < 53)
Step 3: Identify the formula to be used. The problem deals with the sample mean of n observations, so the formula used to
standardize 47.8 and 53 is $z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$.
Step 4: Compute the probability:
$z_1 = \frac{47.8 - 50.6}{6/\sqrt{49}} \approx -3.27$ and $z_2 = \frac{53 - 50.6}{6/\sqrt{49}} = 2.8$
Find P(X̄ < 47.8) by getting the area under the normal curve: P(X̄ < 47.8) = P(z < −3.27) = 0.0005
Find P(X̄ < 53) by getting the area under the normal curve: P(X̄ < 53) = P(z < 2.8) = 0.9974

To find the probability that the mean completion time of 49 randomly selected senior high school students is between 47.8 and
53 minutes, subtract the smaller area from the bigger area under the normal curve:

0.9974 − 0.0005 = 0.9969, or 99.69%
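All four examination problems can be checked with one Python sketch built on `statistics.NormalDist`. The helper `z_score` is a hypothetical name: with n = 1 it standardizes one student's time (Problem 1), and with n = 49 it standardizes a sample mean (Problems 2 to 4). Problem 1 treats individual times as normal, as the worked solution does. The sketch rounds z to two decimals before looking up the area, to mirror reading a printed z-table.

```python
import math
from statistics import NormalDist

mu, sigma = 50.6, 6          # mean and SD of examination times, in minutes
Z = NormalDist()             # standard normal distribution

def z_score(value, n=1):
    """Standardize a value; n=1 for one student, n>1 for a sample mean."""
    return (value - mu) / (sigma / math.sqrt(n))

def area_left(value, n=1):
    # Round z to two decimals first, as when reading a printed z-table
    return Z.cdf(round(z_score(value, n), 2))

p1 = area_left(48)                                  # Problem 1: P(X < 48)
p2 = area_left(48, n=49)                            # Problem 2: P(X̄ < 48)
p3 = 1 - area_left(51, n=49)                        # Problem 3: P(X̄ > 51)
p4 = area_left(53, n=49) - area_left(47.8, n=49)    # Problem 4: P(47.8 < X̄ < 53)

for label, p in [("P1", p1), ("P2", p2), ("P3", p3), ("P4", p4)]:
    print(label, round(p, 4))
```

Comparing Problems 1 and 2 shows the effect of the smaller standard error: the same 48 minutes is 0.43 standard deviations below the mean for one student but about 3.03 standard errors below it for a group of 49.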

Illustrating the t-Distribution

According to the Central Limit Theorem, the sampling distribution of a statistic (like a sample mean, x̄) will follow an approximately
normal distribution as long as the sample size (n) is sufficiently large. Therefore, when we know the standard deviation of the population,
we can compute a z-score and use the normal distribution to evaluate probabilities involving the sample mean.
But sample sizes are sometimes small, and often we do not know the standard deviation of the population. When either of these
problems occurs, the solution is to use a different distribution.

Student’s t-distribution

The Student's t-distribution is a probability distribution that is used to estimate population parameters when the sample size is
small (i.e., sample size < 30) and/or when the population variance is unknown. It was developed by William Sealy Gosset in
1908. He used the pseudonym, or pen name, "Student" when he published the paper describing the distribution, which is why it is
called "Student's t-distribution". He worked at a brewery and was interested in the problems of small samples, for example, the
chemical properties of barley. In the problems he analyzed, the sample size might be as low as three.
Suppose you are about to draw a random sample of n observations from a normally distributed population. You previously learned
that
$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$,
where z is the z-score, x̄ is the sample mean, µ is the population mean, σ is the population standard deviation, and n is the sample
size, has the standard normal distribution. (Note that if we are standardizing a single observation, the value of n is 1; hence, the
formula becomes $z = \frac{X - \mu}{\sigma}$.) You can use this concept to construct a confidence interval for the population mean µ.
But in practice, you encounter a problem: you don't know the value of the population standard deviation σ. The standard deviation of
the entire population, σ, is a parameter, and you don't typically know its value, so you can't use it in your formula. If that happens,
you do the next best thing: instead of using the population standard deviation σ, you use your sample standard deviation s to estimate
it. And instead of $\frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$, you have $\frac{\bar{x} - \mu}{s/\sqrt{n}}$, where s is your sample standard deviation.

You must take note of the change in the formula. The quantity σ is a constant whose value you don't know, so you used s, which is
a statistic. This statistic s has a sampling distribution, and its value varies from sample to sample. Hence, the quantity
$\frac{\bar{x} - \mu}{s/\sqrt{n}}$ no longer has the standard normal distribution. This quantity is labeled t because it has a
t-distribution. When you are sampling from a normally distributed population, the quantity $t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$ has the
t-distribution with n − 1 degrees of freedom.
Note that the number of degrees of freedom is one less than the sample size. So, if the sample size n is 25, the number of degrees of
freedom is 24. Similarly, for a t-distribution with 16 degrees of freedom, the sample size is 17.
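The t formula can be sketched in Python for raw sample data. The function name `t_statistic`, the sample of 8 measurements, and the population mean of 4 are all hypothetical, chosen only for illustration. `statistics.stdev` uses the n − 1 divisor, so s is the usual sample standard deviation.

```python
import math
import statistics

def t_statistic(sample, mu):
    """Return (t, degrees of freedom) for a sample compared with mean mu."""
    n = len(sample)
    x_bar = statistics.fmean(sample)
    s = statistics.stdev(sample)             # sample SD with the n - 1 divisor
    t = (x_bar - mu) / (s / math.sqrt(n))    # s replaces the unknown sigma
    return t, n - 1                          # df is one less than the sample size

# Hypothetical sample of 8 measurements and hypothetical population mean 4
t, df = t_statistic([2, 4, 4, 4, 5, 5, 7, 9], mu=4)
print(round(t, 4), df)   # 1.3229 7
```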
What does the t-distribution look like? The statistic $\frac{\bar{x} - \mu}{s/\sqrt{n}}$ looks like a z-statistic, which has the standard
normal distribution, except that the population standard deviation σ has been replaced by the sample standard deviation s. You are
estimating a parameter with a statistic, so there is greater variability. Hence, the t-distribution looks like the normal distribution,
except with greater variance.

The figure shows the standard normal distribution in black and t-distributions with 3, 5, 20, and 30 degrees of freedom in red, green, violet, and blue, respectively. You can see that both the z-distribution and the t-distributions are symmetric about 0 and bell-shaped, but the t-distributions have heavier tails (more area in the tails) and lower peaks.
The exact shape of the t-distribution depends on the degrees of freedom. The figure shows that as the degrees of freedom increase, the t-distribution approaches the standard normal distribution. At 30 degrees of freedom, the blue curve looks very close to the normal curve, but if you look very closely, you will see that the t-distribution still has slightly heavier tails and a slightly lower peak. As the degrees of freedom continue to increase, the t-distribution gets closer and closer to the standard normal distribution.
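The comparison in the figure can also be checked numerically. This sketch evaluates the t density at the peak (x = 0) and out in the tail (x = 3) for the same degrees of freedom as the figure, and compares it with the standard normal density from Python’s `statistics.NormalDist`. The density formula used, f(x) = Γ((v+1)/2) / (√(vπ) Γ(v/2)) · (1 + x²/v)^(−(v+1)/2), is the standard one for the t-distribution and is not given in this lesson; the helper name `t_pdf` is hypothetical.

```python
import math
from statistics import NormalDist


def t_pdf(x, v):
    """Density of the t-distribution with v degrees of freedom."""
    log_c = math.lgamma((v + 1) / 2) - math.lgamma(v / 2)
    c = math.exp(log_c) / math.sqrt(v * math.pi)
    return c * (1 + x * x / v) ** (-(v + 1) / 2)


z = NormalDist()  # standard normal: mean 0, standard deviation 1
print(f"normal:     peak {z.pdf(0):.4f}, density at 3 {z.pdf(3):.5f}")
for v in (3, 5, 20, 30):
    print(f"t, df = {v:2d}: peak {t_pdf(0, v):.4f}, density at 3 {t_pdf(3, v):.5f}")
```

Each t peak is below the normal peak of about 0.3989, each t density at x = 3 is above the normal value, and both gaps shrink as the degrees of freedom grow.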
Properties of t-distribution
The t-distribution has the following properties:
1. The t-distribution is symmetrical about 0. That means if you draw a segment from the peak of the curve down to the 0 mark on the horizontal axis, the curve is divided into two equal parts or areas. The t-scores on the horizontal axis are divided the same way, with half of the t-scores positive and half negative.
2. The t-distribution is bell-shaped like the normal distribution but has heavier tails. That means it is more prone to producing
values that fall far from the mean. The tails are asymptotic to the horizontal axis. (Each tail approaches the horizontal axis but
never touches it.)

3. The mean, median, and mode of the t-distribution are all equal to zero.
4. The variance is always greater than 1. For v > 2, it is equal to $\frac{v}{v-2}$, where v is the number of degrees of freedom. As the number of degrees of freedom increases and approaches infinity, the variance approaches 1. Using the formula, if the number of degrees of freedom is 10, the variance is $\frac{10}{10-2} = \frac{10}{8} = 1.25$.
5. As the degrees of freedom increase, the t-distribution curve looks more and more like the normal distribution. With infinite degrees of freedom, the t-distribution is the same as the standard normal distribution.
6. The standard deviation of the t-distribution varies with the sample size (through the degrees of freedom) and is always greater than 1, unlike the standard normal distribution, which has a standard deviation of 1.
7. The total area under a t-distribution curve is 1 or 100%. One can say that the area under the t-distribution curve represents the
probability or the percentage associated with specific sets of t-values.
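Properties 4 and 6 can be tabulated with a few lines of code. The sketch below (function name hypothetical) computes the variance v/(v − 2) and the standard deviation √(v/(v − 2)); it refuses v ≤ 2, where the variance is not finite.

```python
import math


def t_variance(v):
    """Variance of the t-distribution with v degrees of freedom: v / (v - 2), for v > 2."""
    if v <= 2:
        raise ValueError("the variance is finite only when v > 2")
    return v / (v - 2)


for v in (3, 10, 30, 1000):
    variance = t_variance(v)
    print(f"df = {v:4d}: variance = {variance:.4f}, "
          f"standard deviation = {math.sqrt(variance):.4f}")
```

For 10 degrees of freedom this gives a variance of 1.2500, matching the worked value, and both the variance and the standard deviation move toward 1 as the degrees of freedom grow.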

Identifying Percentiles Using the t-Table


You have learned from the previous lesson the different properties of the t-distribution, some of which are essential in this lesson. What is the total area under the t-distribution curve? Yes, it is equal to 1 or 100%. The area under the t-distribution curve also represents the probability associated with specific sets of t-values. That means that, given a t-value, you can compute the area or probability using a table or software. A t-value or t-statistic tells you how many estimated standard errors (s/√n) the sample mean lies from the population mean. The t-values are usually written below the horizontal axis of the t-distribution curve. Another property of the t-distribution is that its exact shape depends on the degrees of freedom. Remember that the fewer the degrees of freedom, the lower the peak and the heavier the tails. As the degrees of freedom increase, the tails become thinner and the peak becomes higher. That means that, given an area or probability, the t-value depends on the number of degrees of freedom. For example, for an area of 0.05 in the right tail of the t-distribution, the t-value is 2.015 with 5 degrees of freedom but 1.725 with 20 degrees of freedom.

The t-Table

In finding areas and percentiles for a t-distribution, you need to familiarize yourself with the t-table. This table is different from the z-table you used in finding areas under the normal curve.
Below is an example of a t-table. It is a right-tailed t-table because the areas it gives are areas in the right tail of the t-distribution. Some t-tables are slightly different in format. In the first (leftmost) column of the table, you have the degrees of freedom, ranging from 1 down to ∞. The first (top) row gives the area under the right tail of the t-distribution; the given areas range from 0.25 down to 0.0005. The remaining entries in the body of the table are values of the variable t (t-values).
By looking at the table, you can see that the t-value for an area of 0.10 in the right tail of the t-distribution with 10 degrees of freedom is 1.372. This is the intersection of the row for 10 degrees of freedom and the column for an area of 0.10.
Similarly, for a t-distribution with 15 degrees of freedom, the area in the right tail beyond the t-value 2.249 is 0.02. Focus on the row for 15 degrees of freedom, look for the t-value 2.249, and read the area at the top of its column: 0.02.
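A printed t-table is a list of right-tail areas, so its entries can be reproduced by integrating the t density. The sketch below uses the standard t density (not stated in this lesson) and Simpson’s rule, a numerical-integration method swapped in here for the table or statistical software, with P(T > t) = 0.5 − (area from 0 to t), which follows from symmetry and a total area of 1. The helper names and the choice of 2,000 subintervals are assumptions of this sketch.

```python
import math


def t_pdf(x, v):
    """Density of the t-distribution with v degrees of freedom."""
    c = math.exp(math.lgamma((v + 1) / 2) - math.lgamma(v / 2)) / math.sqrt(v * math.pi)
    return c * (1 + x * x / v) ** (-(v + 1) / 2)


def right_tail_area(t, v, steps=2000):
    """Area to the right of t >= 0: 0.5 minus a Simpson's-rule integral over [0, t]."""
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, v) + t_pdf(t, v)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, v)
    return 0.5 - total * h / 3


# (t-value, degrees of freedom) pairs quoted in this section.
for t, v in [(1.372, 10), (2.249, 15), (2.015, 5), (1.725, 20)]:
    print(f"df = {v:2d}, t = {t}: right-tail area = {right_tail_area(t, v):.4f}")
```

The computed areas come out at about 0.10, 0.02, 0.05, and 0.05, agreeing with the table to the precision the table is printed.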

Identifying Percentiles Using the t-Table
1. A percentile of a t-distribution is the t-value below which the given percentage of the area lies. For example, the 90th percentile of the t-distribution is the t-value whose left-tail probability is 90% and whose right-tail probability is 10%. Since the area under the t-distribution curve also represents probability, the 90th percentile of the t-distribution is the t-value with an area of 0.90 to its left and an area of 0.10 to its right.

2. Find the 95th percentile of a t-distribution with 6 degrees of freedom.


You can sketch a t-distribution and mark what you are looking for. The 95th percentile is the value of the variable t that has an area of 95% or 0.95 to its left. That value is somewhere near a t-value of 2. At this point, you don’t need to locate it exactly on your sketch; you just need a rough idea of where it is.

And since the area of the entire curve is 1, this implies that the area
to the right of the 95th percentile is 0.05. Hence, the 95th percentile
is the value of the variable t that has an area of 0.05 to the right.
That means finding the 95th percentile is looking for the t-value
with an area to the right of 0.05 under a t-distribution with 6
degrees of freedom.

So, you are going to focus on the row for 6 degrees of freedom and the column for a right-tail area of 0.05. (The appropriate row and column are highlighted in red.)

From the figure, you can see that the value you need is 1.943. Hence, the 95th percentile is 1.943. That means the t-value 1.943 has 95%, or 0.95, of the area to its left; equivalently, it has an area of 0.05 to its right. And so, using the t-table, you find that the 95th percentile is 1.943.
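Reading a percentile off the table means searching for the t-value whose right-tail area equals 1 − p. The sketch below automates that search with bisection on a numerically integrated right-tail area (Simpson’s rule on the standard t density, standing in for the table). The helper names, the search bracket [0, 50], and the iteration count are assumptions; the function only handles percentiles from the 50th up.

```python
import math


def t_pdf(x, v):
    """Density of the t-distribution with v degrees of freedom."""
    c = math.exp(math.lgamma((v + 1) / 2) - math.lgamma(v / 2)) / math.sqrt(v * math.pi)
    return c * (1 + x * x / v) ** (-(v + 1) / 2)


def right_tail_area(t, v, steps=2000):
    """Area to the right of t >= 0: 0.5 minus a Simpson's-rule integral over [0, t]."""
    h = t / steps
    total = t_pdf(0, v) + t_pdf(t, v)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, v)
    return 0.5 - total * h / 3


def t_percentile(p, v, hi=50.0, iterations=40):
    """t-value with area p to its left, for 0.5 <= p < 1, found by bisection."""
    if not 0.5 <= p < 1:
        raise ValueError("this sketch handles only percentiles from the 50th up")
    lo, target = 0.0, 1 - p  # target: area that must lie to the right
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if right_tail_area(mid, v) > target:
            lo = mid  # too much area on the right, so the percentile is larger
        else:
            hi = mid
    return (lo + hi) / 2


print(f"95th percentile, df = 6: {t_percentile(0.95, 6):.3f}")
```

Rounded to three decimals, the search returns 1.943, the same value read from the table.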

3. Find the 5th percentile of a t-distribution with 6 degrees of freedom.


The 5th percentile is the value of the variable t that has an area of 5% or 0.05 to its left. Since the area of the entire curve is 1, the area to the right of the 5th percentile is 0.95. Hence, the 5th percentile is the value of the variable t that has an area of 0.95 to its right. Therefore, finding the 5th percentile is the same as finding the t-value with an area of 0.95 to its right under a t-distribution with 6 degrees of freedom.
But if you look at the areas in the first row of the t-table, there is no entry for an area of 0.95. You cannot find an area of 0.95 because your table is a right-tailed t-table, which displays only areas under the right tail of the t-distribution.

Also, if you look at your illustration of the 5th percentile below you will
realize that the t-value that you are looking for lies between -1 and -2. Hence
its value should be a negative number. But if you observe the body of the table
where t-values are located, you cannot find any negative t-value. The table
gives only positive values of t.

At this point, you need to recall one of the properties of the t-distribution: it is symmetric about zero. That means the right tail of the distribution is exactly the mirror image of its left tail, so you can find values in the left tail by relying on this “symmetry about zero” property. If you are going to find the value of t such that the area to its left is 0.05, recall that the area to the right of 1.943 is also 0.05.

Therefore, since the t-distribution is symmetric about 0, the t-value with an area of 0.05 to its left must be −1.943. So, the 5th percentile is −1.943.
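The symmetry argument can be confirmed numerically for 6 degrees of freedom: the area to the left of −1.943 should equal the area to the right of +1.943. This sketch integrates the standard t density (not given in this lesson) with Simpson’s rule over [−1.943, 0] and [0, 1.943]; the helper names are hypothetical.

```python
import math


def t_pdf(x, v):
    """Density of the t-distribution with v degrees of freedom."""
    c = math.exp(math.lgamma((v + 1) / 2) - math.lgamma(v / 2)) / math.sqrt(v * math.pi)
    return c * (1 + x * x / v) ** (-(v + 1) / 2)


def simpson(f, a, b, steps=2000):
    """Integral of f over [a, b] by Simpson's rule (steps must be even)."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def density_df6(x):
    return t_pdf(x, 6)


# Half of the total area (0.5) lies on each side of 0.
left_of_negative = 0.5 - simpson(density_df6, -1.943, 0)  # area to the left of -1.943
right_of_positive = 0.5 - simpson(density_df6, 0, 1.943)  # area to the right of +1.943
print(f"left of -1.943: {left_of_negative:.4f}")
print(f"right of 1.943: {right_of_positive:.4f}")
```

Both areas come out at about 0.05, which is why the 5th percentile is simply the negative of the 95th percentile.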
4. What is the area to the right of 2.4 under a t-distribution with 7 degrees of freedom?

Remember that in the previous examples, you found t-values from given areas under the t-distribution curve. In this example, you will do the opposite: you are given a t-value and need to find the area to its right under the t-distribution with 7 degrees of freedom.

You can illustrate the problem with the figure shown below. The t-value of 2.4 is somewhere between 2 and 3, and you are
going to find the area to the right of it.

So, looking back at the table, focus on the row for 7 degrees of freedom. The t-value 2.4 does not appear in this row, but you do find two values that surround it: 2.365 and 2.517 (2.4 is between 2.365 and 2.517).

The table tells you that the area to the right of 2.365 is 0.025 and the area to the right of 2.517 is 0.02. Since 2.4 falls between 2.365 and 2.517, it stands to reason that the area to the right of 2.4 lies between 0.02 and 0.025.
So, using the table, you find that the area to the right of 2.4 under the t-distribution with 7 degrees of freedom is somewhere between 0.02 and 0.025.
If you need to get the exact value, you need to use software that easily calculates the area under the t-distribution curve with
the given t-value and number of degrees of freedom. Using such software, you could find that the area to five decimal places is
0.02373.
What if you needed to use the t-table to find the area to the left of 2.4?
Since the area under the entire curve is 1, the area to the left of 2.4 is equal to 1 minus the area to the right of 2.4. So, based
on the table the area to the left of 2.4 under the t distribution with 7 degrees of freedom must lie somewhere between 0.98 and 0.975 (1
– 0.02 = 0.98 and 1 – 0.025 = 0.975). But since you already knew that the area to the right of 2.4 is 0.02373, you could find the exact
area to the left of 2.4 to five decimal places as 1 minus 0.02373 or 0.97627.
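The “software” step can be sketched without any statistics library: integrate the standard t density (not given in this lesson) from 0 to 2.4 with Simpson’s rule and subtract from 0.5. Simpson’s rule and the helper names are assumptions of this sketch, not the specific software the lesson has in mind.

```python
import math


def t_pdf(x, v):
    """Density of the t-distribution with v degrees of freedom."""
    c = math.exp(math.lgamma((v + 1) / 2) - math.lgamma(v / 2)) / math.sqrt(v * math.pi)
    return c * (1 + x * x / v) ** (-(v + 1) / 2)


def right_tail_area(t, v, steps=2000):
    """Area to the right of t >= 0: 0.5 minus a Simpson's-rule integral over [0, t]."""
    h = t / steps
    total = t_pdf(0, v) + t_pdf(t, v)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, v)
    return 0.5 - total * h / 3


right = right_tail_area(2.4, 7)
left = 1 - right  # the total area under the curve is 1
print(f"area to the right of 2.4: {right:.5f}")
print(f"area to the left of 2.4:  {left:.5f}")
```

To five decimal places this gives 0.02373 to the right and 0.97627 to the left, inside the 0.02 to 0.025 bracket found from the table.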
Identifying the Length of a Confidence Interval
What is the difference between the Confidence Level and Confidence interval?

The confidence level of an interval estimate of a parameter is the probability that the interval estimate contains the parameter. It describes what percentage of intervals from many different samples contain the unknown population parameter.

Each confidence level has a corresponding coefficient called the confidence coefficient. These coefficients are used to find the margin of error; the table below shows the confidence coefficient for each confidence level.
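The confidence coefficient for a given level is the standard normal value $z_{\alpha/2}$ that leaves an area of α/2 in the right tail, where α = 1 − (confidence level). This sketch computes it with `statistics.NormalDist.inv_cdf` (available from Python 3.8) instead of reading it from a table; the function name is hypothetical.

```python
from statistics import NormalDist


def confidence_coefficient(level):
    """z_(alpha/2): the z-value with area alpha/2 to its right, where alpha = 1 - level."""
    alpha = 1 - level
    return NormalDist().inv_cdf(1 - alpha / 2)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z = {confidence_coefficient(level):.3f}")
```

This reproduces the familiar coefficients 1.645 (90%), 1.96 (95%), and 2.576 (99%).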

Confidence interval or interval estimate is a range of values that is used to estimate a parameter. This estimate may or may not contain the true parameter value. We write it in the form

Lower limit < 𝜇 < Upper limit

or

(Lower limit, Upper limit)

The lower limit is obtained using the formula LL = 𝑿̅ − 𝑬, while the upper limit is obtained using the formula UL = 𝑿̅ + 𝑬, where E is the margin of error and 𝑿̅ is the sample mean.


As mentioned earlier, the confidence coefficient is used in finding the margin of error, which is the range of values above and below the sample statistic. The margin of error is obtained using the formula:

$E = z_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right)$

where
$z_{\alpha/2}$ = confidence coefficient
n = sample size
𝜎 = population standard deviation
E = margin of error

In this lesson, however, the margin of error will be given, as well as the sample mean.

Example:
A random sample of 46 scores from the examination of ABM learners is taken. It gives a sample mean of 78, with an interval of scores between 77.18 and 78.82 at a 90% level of confidence.

Let’s answer the questions!

• What is 𝑥̅ in the given statement?
Since it is given in the statement above, the sample mean is 78.

• What is the upper limit? What is the lower limit?
The upper limit is 78.82, while the lower limit is 77.18.

• What is the margin of error in the given statement?
As we can see, the margin of error is not directly mentioned, but the lower and upper limits are. As mentioned earlier, the formulas for the lower limit and the upper limit include the margin of error: LL = 𝑿̅ – E.
The lower limit and mean are given, so

77.18 = 78 – E
E = 78 – 77.18
E = 0.82

Let’s see if we get the same value of E using the formula for the upper limit, UL = 𝑿̅ + E.
The upper limit and mean are given, so we have

78.82 = 78 + E
E = 78.82 – 78
E = 0.82
Therefore, the margin of error is 0.82.

• What is the confidence interval in the given statement?
To find the confidence interval, we use Lower limit < 𝝁 < Upper limit and substitute the given data. We have

77.18 < 𝝁 < 78.82 or (77.18, 78.82)

So, the confidence interval is between 77.18 and 78.82.

• What is the confidence level? How will you conclude?
The confidence level is 90%. So, we are 90% confident that the true mean score lies between 77.18 and 78.82.

Note: Sometimes, you just need to rearrange the formula to find what is missing.
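Rearranging works in general here: adding LL = x̄ − E and UL = x̄ + E gives x̄ = (LL + UL)/2, and subtracting them gives E = (UL − LL)/2. A minimal sketch (the function name is hypothetical) applying this to the ABM example:

```python
def interval_parts(lower, upper):
    """Recover the sample mean and margin of error from LL = x_bar - E and UL = x_bar + E."""
    x_bar = (lower + upper) / 2   # the interval is centered on the sample mean
    margin = (upper - lower) / 2  # half the width of the interval
    return x_bar, margin


x_bar, e = interval_parts(77.18, 78.82)
print(f"sample mean = {x_bar:.2f}, margin of error = {e:.2f}")
```

This recovers the sample mean 78 and the margin of error 0.82 found above.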

Computing the Length of the Confidence Interval

The confidence level, expressed as a percentage, measures how likely the confidence interval is to contain the true population parameter: it is the percentage of all possible samples whose intervals can be expected to contain that parameter. The most common values of the level of confidence are shown in the zc table below.
Confidence coefficients, also called critical values, are denoted zc. The confidence interval is also called the interval estimate. It is a range of values used to estimate a parameter. However, this estimate may or may not contain the true parameter value.

The margin of error is the range of values above and below the sample statistic in a confidence interval. To compute the margin of error, use the formula:

$E = z_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right)$

where $z_{\alpha/2}$ is the critical value or confidence coefficient, σ is the population standard deviation, and n is the sample size.
Consider the example below.

Example 1:
Isabel owns a shoe store. She used 160 pairs of shoes of different designs as her sample. The population standard deviation of the price of the shoes is ₱75. Suppose that Isabel wants a 95% level of confidence in estimating the mean price of all the shoes she is selling. Compute the margin of error of her estimate.

Solution:
Step 1: Write the given data. n = 160, σ = ₱75, 95% confidence level where zc = 1.96
Step 2: Apply the formula and substitute the given data.
E = (1.96)(75/√160) = (1.96)(5.929) = 11.62
So, the margin of error of Isabel’s estimate is ₱11.62.

COMPUTING THE LENGTH OF CONFIDENCE INTERVAL


Computing the margin of error and the confidence interval was discussed in the previous module. We will use the formulas presented there to compute the length of the confidence interval.

Recall that the confidence interval can be written in the form

Lower limit < 𝜇 < Upper limit

or

(Lower limit, Upper limit)

The lower limit is obtained using the formula LL = 𝑿̅ − 𝑬, while the upper limit is obtained using the formula UL = 𝑿̅ + 𝑬, where E is the margin of error and 𝑿̅ is the sample mean.


Meanwhile, the margin of error is obtained using the formula:

$E = z_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right)$

where
$z_{\alpha/2}$ = confidence coefficient
𝜎 = population standard deviation
n = sample size
E = margin of error

The length of the confidence interval (L) is simply:

L = UL – LL

where UL is the upper limit and LL is the lower limit of the confidence interval.
However, if the sample mean is not given, we cannot compute the upper and lower limits of the confidence interval. In that case, the following formula can be used:

$L = 2E = 2z_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right)$

where E is the margin of error, $z_{\alpha/2}$ is the confidence coefficient, 𝜎 is the population standard deviation, n is the sample size, and n > 0.

Example 2:
The population of the Sulu Hornbill (one of the endangered bird species in the Philippines) has a standard deviation of 40. Compute the length of the confidence interval for a 90% confidence level with a sample size of 150 and a sample mean of 65.
Solution:

Step 1: Write the given data.
n = 150, 𝑥̅ = 65, σ = 40, 90% confidence level where zc = 1.645

Step 2: Compute the margin of error.
E = (1.645)(40/√150) = (1.645)(3.266) = 5.37

Step 3: Compute the upper and lower limits of the confidence interval.
UL = 𝑿̅ + 𝑬 = 65 + 5.37 = 70.37
LL = 𝑿̅ − 𝑬 = 65 − 5.37 = 59.63

Step 4: Compute the length of the confidence interval.
L = UL – LL
= 70.37 – 59.63
= 10.74

Since the length of the confidence interval is just twice the margin of error, a shorter solution uses the formula $L = 2z_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right) = 2E$. Therefore, L = 2(5.37) = 10.74.
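Both routes to the length in Example 2 can be checked in a few lines. This sketch (function name hypothetical) follows the solution’s convention of rounding E to two decimals before using it.

```python
import math


def margin_of_error(z, sigma, n):
    """E = z_(alpha/2) * sigma / sqrt(n), rounded to two decimals as in the worked solution."""
    return round(z * sigma / math.sqrt(n), 2)


z, sigma, n, x_bar = 1.645, 40, 150, 65
e = margin_of_error(z, sigma, n)
upper, lower = x_bar + e, x_bar - e
print(f"E = {e:.2f}, UL = {upper:.2f}, LL = {lower:.2f}")
print(f"L = UL - LL = {upper - lower:.2f}; L = 2E = {2 * e:.2f}")
```

Both routes give 10.74. Doubling the unrounded margin (about 5.3725) instead would give 10.75, so it pays to apply one rounding convention consistently.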
Example 3:
Jennifer wanted to know the average price of the shoes that her customers purchased. She sampled 160 pairs of shoes that were sold and found that the mean price was ₱800, with a standard deviation of ₱75. Construct a 95% confidence interval for the mean price of all shoes that were sold, and compute the length of the confidence interval.
Solution:

Step 1: Write the given data.
n = 160, 𝑥̅ = ₱800, σ = ₱75, 95% confidence level where zc = 1.96

Step 2: Compute the margin of error.
$E = z_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right)$
E = (1.96)(75/√160)
E = (1.96)(5.929) (use three decimal places for partial answers)
E = 11.62 (round off the final answer to two decimal places)

Step 3: Compute the upper and lower limits of the confidence interval.
UL = 𝑿̅ + 𝑬 = 800 + 11.62 = 811.62
LL = 𝑿̅ − 𝑬 = 800 − 11.62 = 788.38

Step 4: Write the confidence interval.
788.38 < 𝜇 < 811.62 or (788.38, 811.62)
This means that we are 95% confident that the true mean lies between 788.38 and 811.62. In the context of this problem, Jennifer can state that she is 95% confident that the average price of a pair of shoes purchased by her customers lies between ₱788.38 and ₱811.62, or between ₱788 and ₱812 when rounded to the nearest peso.

Step 5: Compute the length of the confidence interval.
L = UL – LL
= 811.62 – 788.38
= 23.24
Or, since E = 11.62, L = 2(11.62) = 23.24.
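The five steps of Example 3 can be bundled into one small function. This is a sketch under the lesson’s assumptions (σ known, z = 1.96 for 95%, E rounded to two decimals); the function name is hypothetical.

```python
import math


def confidence_interval(x_bar, sigma, n, z):
    """Return (LL, UL, L) for a z-interval, rounding E to two decimals as the lesson does."""
    e = round(z * sigma / math.sqrt(n), 2)  # margin of error
    lower, upper = x_bar - e, x_bar + e
    return lower, upper, upper - lower


ll, ul, length = confidence_interval(x_bar=800, sigma=75, n=160, z=1.96)
print(f"({ll:.2f}, {ul:.2f}), length {length:.2f}")
```

This reproduces the interval (788.38, 811.62) and the length 23.24.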
Statistics and Probability

Name: _____________________________ Date: __________


Strand/Section: ________________________ Score: _________

I. Directions: Solve each of the following problems. Round all answers to two decimal places.

1. The IQs of Grade 11 students in MAKATAO NATIONAL HIGH SCHOOL were measured and found to be normally distributed with a mean of 98 and a standard deviation of 8.
a. If a student from the school is chosen at random, what is the probability that the student’s score is higher than 110?
________________________________________________________________________
b. What is the probability that a random sample of 4 students will have an average above 110?
________________________________________________________________________

2. The mean annual salary of all the frontliners (nurses, medical technologists, radiologic technologists, phlebotomists) in the Philippines is Php 42,500. Assume that salaries are normally distributed with a standard deviation of Php 5,600. If a random sample of 25 health workers is drawn from this population, find the probability that the mean salary of the sample is:
a. between Php 40,400 and Php 45,000?
_________________________________________________________________________
b. greater than Php 41,000?
________________________________________________________________________

II. Directions: Answer the following problems.

1. Find the values of t for which the area in the right tail of the t-distribution is 0.05 and the number of degrees of freedom is equal to: a. 15 b. 28

2. Find the 99th percentile of the t-distribution with 18 degrees of freedom.

3. Find the 90th percentile of the t-distribution if the sample size is 25.

III. Directions: Compute the length of the confidence interval for estimating the population mean using a sample size of 300 and a standard deviation of 84. Use a 92% confidence level.
Write all the given data based on the problem statement and show the complete solution.
n = __________
σ = __________
zc = __________
Formula to be used: ________________
Solution:

I. Directions: Identify the random sampling technique used in each item.

________________1. You are given a list of all graduating students in your school. You decide to survey every 10th student on the list and ask them which organization they belong to.
________________2. You wish to compare gender differences in Mathematics performance. You divide the population into two groups, male and female, and randomly pick respondents from each group.
________________3. You assign numbers to the members of the population and then draw lots to obtain your sample for a survey on the most popular festivals in the country.
________________4. You randomly pick five out of fifteen barangays in your municipality or city to survey about their best environment-friendly practices.
________________5. You write the name of each student on a piece of paper, shuffle the pieces, and then draw eight names to answer a survey on their ethical media practices.
