STAT 11613 Fundamentals of Statistics
Chapter 1
Introduction to Statistics
Definition of Statistics:
Statistics is a science dealing with the collection, classification, analysis and interpretation of
numerical data in order to draw conclusions on the basis of their probability.
Some Basic Terminology
Population
A population is the entire set of organisms, objects or events that a study is concerned with.
Each member of the population must be clearly defined, so that it can be known with certainty whether
or not any given individual or event belongs to that population.
Sample
A sample is any subgroup or subset of a population. Most populations in real situations are
so large that it is impractical to observe all of their members, so scientists resort to observing
a relatively small number, termed a sample, which serves to represent the population. The
characteristics of many populations can never be known in the sense of having been directly
observed; rather, they are inferred from measures taken on samples.
[Figure: Sample (any subgroup) within Population (entire set of all organisms, objects or events)]
DSCS 1 2022
Parameter
A parameter is a numerical term that summarizes or describes a population.
e.g. Population mean
Statistic
A statistic is a numerical term that summarizes or describes a sample. Statistics are obtained from
samples and are used to estimate population parameters. A parameter is a purely descriptive term,
but a statistic is both a descriptive term (because it describes a sample characteristic), and an
estimate of the corresponding population characteristic.
e.g. sample mean
[Figure: Population and Sample, connected by Sampling and Inference]
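To make the distinction concrete, the following minimal Python sketch computes a parameter (the population mean) and a statistic (the mean of one random sample). The population values, the sample size of 4 and the seed are hypothetical, chosen only for illustration; they do not come from the course material.

```python
import random
import statistics

# Hypothetical population: heights (cm) of 12 individuals (made-up values).
population = [150, 152, 155, 158, 160, 161, 163, 165, 168, 170, 172, 176]

# Parameter: a number that describes the whole population.
population_mean = statistics.mean(population)

# Statistic: a number that describes one sample and estimates the parameter.
rng = random.Random(1)               # fixed seed so the run is repeatable
sample = rng.sample(population, 4)   # 4 members drawn without replacement
sample_mean = statistics.mean(sample)

print("Population mean (parameter):", population_mean)
print("Sample:", sample)
print("Sample mean (statistic):", sample_mean)
```

Changing the seed gives a different sample and usually a different sample mean, while the population mean stays fixed. This is why a statistic is treated as an estimate of the parameter.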
Exercise:
1. Identify the population and the parameter of interest in the following research studies.
(a) Finding the average z-score of STAT 11514 students.
(b) Finding the proportion of dengue patients who died in 2017.
(c) Finding the average GPA of undergraduates who were awarded university colors in 2019.
2. An election will be held next week and, by polling a sample of the voting population, we
are trying to predict whether the JVP or the Good Governance candidate will win the
Provincial Election. Which of the following methods of selection is likely to yield a
representative sample?
(a) Poll all people of voting age leaving a dinner dance at a five-star hotel in Galle Face.
(b) Obtain a copy of the voter registration list, randomly choose 100 names, and
question them.
(c) Use the results of a television call-in poll, in which the station asked its viewers to
call in and name their choice.
(d) Choose names from the telephone directory and call these people.
(e) Poll people who participate in a rally organized by the JVP.
Simple Random Sample:
A simple random sample of size n is a sample that has been selected from a population in such a
way that each possible sample of size n is equally likely to be selected.
Example: To understand the nature of a simple random sample, think of a lottery. Imagine that
10,000 lottery tickets have been sold and that 5 winners are to be chosen. What is the
fairest way to choose the winners? The fairest way is to put the 10,000 tickets in a drum,
mix them thoroughly, and then reach in and one by one draw 5 tickets out. These 5
winning tickets are a simple random sample from the population of 10,000 lottery
tickets. Each ticket is equally likely to be one of the 5 tickets drawn. More importantly,
each collection of 5 tickets that can be formed from the 10,000 is equally likely to
comprise the group of 5 that is drawn. It is this idea that forms the basis for the definition
of a simple random sample.
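The lottery draw can be imitated on a computer. The sketch below is a minimal illustration, assuming the tickets are numbered 1 to 10,000 (the numbering, the helper name draw_simple_random_sample and the seed are assumptions, not part of the example above). It relies on Python's random.sample, which draws without replacement.

```python
import random

def draw_simple_random_sample(population, n, seed=None):
    """Draw n distinct members so that every subset of size n is equally likely."""
    rng = random.Random(seed)
    return rng.sample(population, n)   # sampling without replacement

# Assumed numbering: tickets 1 to 10,000.
tickets = range(1, 10_001)

# Fixed seed only so that the run is repeatable.
winners = draw_simple_random_sample(tickets, 5, seed=2022)
print(winners)
```

Because the draw is without replacement, no ticket can win twice, which matches drawing tickets one by one from the drum without putting them back.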
Exercise:
(1) A physical education professor wants to study the physical fitness levels of students at her
university. There are 20,000 students enrolled at the university, and she wants to draw a
sample of size 100 to take a physical fitness test. She obtains a list of all 20,000 students,
numbered from 1 to 20,000. She uses a computer random number generator to generate
100 random integers between 1 and 20,000 and then invites the 100 students corresponding
to those numbers to participate in the study. Is this a simple random sample?
(2) A quality engineer wants to inspect rolls of wallpaper in order to obtain information on the
rate at which flaws in the printing are occurring. She decides to draw a sample of 50 rolls
of wallpaper from a day’s production. Each hour for 5 hours, she takes the 10 most recently
produced rolls and counts the number of flaws on each. Is this a simple random sample?
(3) Suppose there are 850 students in a school, from which a sample of 10 students is to be
selected. The students are numbered from 1 to 850. Since the population size has three
digits, use three-digit random numbers from a random number table. All
numbers exceeding 850 are ignored because they do not correspond to any serial number
in the population. If the same number occurs again, the repetition is ignored. Following
these rules, select 10 students for the sample.
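The selection rules in exercise (3) can be sketched in Python as follows. This is an illustration under stated assumptions: a computer random number generator stands in for the printed random number table, the function name and seed are hypothetical, and the entry 000 is also skipped (a rule not stated in the exercise, added here because no student has serial number 0).

```python
import random

def select_by_three_digit_numbers(population_size, sample_size, seed=None):
    """Imitate reading three-digit entries from a random number table."""
    rng = random.Random(seed)
    selected = []
    while len(selected) < sample_size:
        number = rng.randint(0, 999)   # one three-digit entry, 000 to 999
        if number == 0 or number > population_size:
            continue                   # no student has this serial number
        if number in selected:
            continue                   # repetition is ignored
        selected.append(number)
    return selected

students = select_by_three_digit_numbers(850, 10, seed=7)
print(students)
```

Rejected entries are simply skipped, so every valid serial number keeps the same chance of being chosen at each step.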
Data and raw data
In science, observation is tantamount to measurement, and measures are usually expressed as
numerical values that are termed data. Data may be defined as any recorded observations, although
for our purposes these will invariably take numerical form. The word data is plural and refers to a group
of observations; any particular observation is called a datum. When observations are recorded and
gathered together they are termed the raw data. These are the observations or measures just as they were
obtained, before anything has been done to them.
How to collect data?
Observational studies:
An observational study measures the characteristics of a population by studying individuals in a
sample, but does not attempt to manipulate or influence the variables of interest.
Example: surveys, the average daily temperature in Colombo, the relationship between class
attendance and final exam score, and studies to determine the effect of cigarette smoking
on the risk of lung cancer.
Designed experiments/ Controlled experiments:
These experiments are designed to determine the effect of changing one or more factors on the
values of a response. In designed experiments researchers make purposeful changes in controllable
variables and then observe characteristics and take measurements on experimental units. (These
are called controlled experiments, since values of the factors are under the control of the
experimenter.)
Example: The effect on yield was compared for three different varieties of tea and four different
concentrations of fertilizer.
Experimental units: The individuals or items on which the experiment is performed.
Here: plots of tea.
Response variable: The variable of interest to be measured in the experiment.
Here: yield.
Explanatory variable (factor): A variable whose effect on the response variable is of
interest in the experiment.
Here: variety of tea and concentration of fertilizer.
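In the tea example, each combination of a variety and a fertilizer concentration is one treatment, giving 3 x 4 = 12 treatments. The sketch below lists them and allocates them to plots at random. The example does not say how plots were allocated, so the random allocation, the two plots per treatment, and the labels V1 to V3 and C1 to C4 are all assumptions made for illustration.

```python
import itertools
import random
from collections import Counter

varieties = ["V1", "V2", "V3"]               # three tea varieties (labels assumed)
concentrations = ["C1", "C2", "C3", "C4"]    # four fertilizer concentrations (labels assumed)

# Every variety paired with every concentration: 3 x 4 = 12 treatments.
treatments = list(itertools.product(varieties, concentrations))

replicates = 2                               # assumed number of plots per treatment
plots = [f"plot{i}" for i in range(1, len(treatments) * replicates + 1)]

rng = random.Random(11)                      # fixed seed so the run is repeatable
rng.shuffle(plots)                           # random order of experimental units

# After shuffling, hand out the treatments in turn so each gets `replicates` plots.
assignment = {plot: treatments[i % len(treatments)] for i, plot in enumerate(plots)}

print(len(treatments), "treatments on", len(plots), "plots")
print(Counter(assignment.values()))
```

Here the plots are the experimental units, the treatment pairs are the levels of the two factors, and yield would be the response measured on each plot.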
When designed and conducted properly, controlled experiments can produce reliable information
about cause-and-effect relationships between factors and response.
Probably the biggest difference between observational studies and designed experiments is the
issue of association versus causation. Since observational studies do not control any variables, their
results can only establish associations. Because variables are controlled in a designed experiment, it
can support conclusions about causation.
Exercise
1. A study considered a random sample of adults and asked them about their bedtime habits.
The data showed that people who drank a cup of tea before bedtime were more likely to go
to sleep earlier than those who didn't drink tea. What type of study is this?
a) Observational study
b) Designed experiment
2. A research study considered a group of adults and randomly divided them into two groups.
One group was asked to drink tea every night for a week, while the other group was asked
not to drink tea in that week. The researcher then compared when each group fell asleep. What
type of study is this?
a) Observational study
b) Designed experiment
Variable
A variable is any observable or measurable property of organisms, objects or events such that
individuals may differ in the amount or kind of this property. The behavior or property under
investigation is the variable of interest.
[Figure: classification of variables]
Variable
    Qualitative: Nominal, Ordinal
    Quantitative: Discrete, Continuous
Qualitative variable
A qualitative variable reflects a distinction of kind, not amount. Qualitative measurement consists
of classification into categories, such as when people are classified as being male or female. The
designations male and female do not imply different amounts of the variable gender but rather
indicate different kinds or qualities of this variable. These variables are also called categorical
variables.
Nominal variable
Nominal variables have two or more categories without having any kind of natural order.
Ordinal variable
An ordinal variable is a categorical variable for which the possible values are ordered.
Quantitative variable
A quantitative variable is one in which the number derived from the measurement reflects the
amount of the property in question. Height is a quantitative measurement. Height is expressed as
the number of measurement units such as centimeters and this numerical score corresponds to the
actual physical size of the object. Quantitative measurement is the assignment of numerical
quantity to the variable and is what we ordinarily understand the act of measurement to mean.
Continuous variable
A continuous variable is one that may assume any value between maximum and minimum
limits. Height is an example of a continuous variable, since within limits any value is
possible.
Discrete variable
A discrete variable is one that can assume only certain numerical values, for example only
whole numbers.
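The four types of variable can be illustrated with a short Python sketch. All values and variable names below are hypothetical. The point is only that the type of a variable decides which operations make sense: counting for nominal data, ordering for ordinal data, and arithmetic for quantitative data.

```python
from collections import Counter
import statistics

# All values below are hypothetical.
blood_group = ["A", "O", "B", "O", "AB", "O"]                   # nominal: no natural order
rating = ["low", "high", "medium", "low", "medium", "medium"]   # ordinal: ordered categories
siblings = [0, 2, 1, 3, 1, 1]                                   # discrete: whole numbers only
height_cm = [158.4, 171.2, 165.0, 180.3, 162.7, 169.9]          # continuous: any value within limits

# Nominal: categories can only be counted.
most_common_group = Counter(blood_group).most_common(1)[0]

# Ordinal: categories can also be put in order.
order = {"low": 0, "medium": 1, "high": 2}
sorted_rating = sorted(rating, key=order.get)

# Quantitative: arithmetic such as a mean is meaningful.
mean_siblings = statistics.mean(siblings)
mean_height = statistics.mean(height_cm)

print(most_common_group)
print(sorted_rating)
print(mean_siblings, mean_height)
```

Note that a mean of a discrete variable need not itself be a whole number, even though each individual value is.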
Exercise: Classify the variables in the following examples:
(1) The following table contains the results of some students, along with their Z-scores, for the G.C.E.
A/L examination.
Reg_No       St_Name              Gender  Mathematics  Physics  Chemistry  Z_Score
PS/2018/001  A.M. Bandara         M       C            C        C          1.269
PS/2018/002  K.D.A.M. Abeyrathna  M       A            B        A          1.852
PS/2018/003  R. Wicramasinghe     M       B            C        C          1.360
PS/2018/004  W.P.R.Kumara         M       A            A        A          1.982
PS/2018/005  A.V.H.Udumulla       F       C            S        C          0.975
PS/2018/006  W.A.Nisansala        F       B            C        C          1.324
PS/2018/007  J.J. Thomson         M       B            A        C          1.425
PS/2018/008  A.K.M.Vidyarathna    F       A            C        S          1.011
PS/2018/009  K.L.Edirisinghe      F       B            A        C          1.580
PS/2018/010  A.G.H.Kawshalya      F       C            B        A          1.365
(2) The following table contains data on some vehicles belonging to a company.
Vehicle Type  Make        Model    Colour  Fuel Type  Number of Passengers  Number of Km
Car           Toyota      Corolla  Maroon  Petrol     4                     12000.85
Van           Toyota      Dolphin  White   Diesel     16                    86254.14
Car           Nissan      Leaf     Blue    Petrol     4                     2350.62
Car           Toyota      Corolla  Silver  Petrol     4                     15231.08
Cab           Toyota      Hilux    Red     Diesel     5                     14235.82
Lorry         Toyota      DYNA     White   Diesel     2                     74325.15
Bus           Mitsubishi  Rosa     White   Diesel     28                    57825.22
Car           Nissan      Sylphy   Silver  Petrol     4                     29625.43
Lorry         Toyota      DYNA     Blue    Diesel     2                     115296.48
Lorry         Toyota      DYNA     White   Diesel     2                     68549.75
Objective observation
All science may rest on observation, but not all observation is science. Objectivity is the special
quality of scientific observation. An objective observation is one that is not in any way affected by
the opinions, values, or biases of the observer.
Subjective observation
A subjective observation is one that reflects the observer’s personal point of view; clearly, there
can be no science if the raw data are a matter of opinion.
Why do we use Statistics?
Statistical methods are used for two main purposes:
Description.
Inference.
Descriptive Statistics
Descriptive statistics consists of the techniques for organizing, summarizing and extracting
information from numerical data.
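As a small illustration of descriptive statistics, the sketch below summarizes the ten Z_Score values listed in the student table of the variable-classification exercise. The choice of summary measures and the use of Python's statistics module are assumptions made for this example, not a method prescribed by the course.

```python
import statistics

# Z_Score column of the student table in the classification exercise.
z_scores = [1.269, 1.852, 1.360, 1.982, 0.975, 1.324, 1.425, 1.011, 1.580, 1.365]

# Organize: put the raw data in order.
ordered = sorted(z_scores)

# Summarize: a few numbers that describe the whole data set.
summary = {
    "n": len(z_scores),
    "min": min(z_scores),
    "max": max(z_scores),
    "mean": statistics.mean(z_scores),
    "median": statistics.median(z_scores),
    "stdev": statistics.stdev(z_scores),   # sample standard deviation (divisor n - 1)
}

print(ordered)
for name, value in summary.items():
    print(name, round(value, 4))
```

These numbers describe only the ten students in the table. Treating them as estimates for a larger group of students would be a matter of inference, not description.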
Inferential Statistics
Inferential statistics is the body of rules and procedures by which general statements are made
about people or events. If statements could be made only about those individuals or events that have
been directly observed, science would be impractical. Statistics provides us with procedures for
making predictions based on observed data and for interpreting the outcome of experiments
designed to test predictions.
In summary, statistics may be considered the study of techniques for
Collecting data,
Presenting data,
Summarizing data,
Analyzing data,
Interpreting data,
Communicating decisions based on the observed data.
How to carry out a proper statistical investigation
Write down the problem under study very clearly.
Write down your objective/s.
Define the parameter/s in your problem.
Decide which statistical method/s you are going to use for estimating the parameter/s.
Collect only the data required for your statistical analysis.
Analyze your data using the method/s decided at the beginning.
Draw conclusion/s based on the analysis.
[Figure: the cycle of a statistical investigation]
Raise questions for which we seek answers
Decide on variables that must be measured to answer questions
Identify the population of interest
Select a method for obtaining a representative sample
Obtain information on variables from the selected sample
Calculate appropriate summary statistics on each variable measured in the sample
Using theory of probability make inferences about the population
Answer questions raised initially and plan a course of action
Exercise:
Determine the population, sample, variable/s and parameter/s of interest under study in the
following situations:
(1) Suppose you read an article in a local newspaper which states that
the average college student plays 2 hours of video games per week. To test whether this
claim is true for your school, you randomly approach 20 fellow students and ask them how
long (in hours) they play video games per week. You find that, among those you questioned, a student
plays video games for 1 hour per week on average.
(2) Small farmers in a certain village are registered with the Farmers’ Cooperative Organization, which
provides agricultural assistance to them. To get a better return on their investment, the
Farmers’ Cooperative Organization conducts a study on pineapple in an experimental field
to see how long it takes the fruit to mature (measured in days from the time of planting)
with a particular fertilizer.