STA 132: Lecture Notes
Course Title: Laboratory For Inference
Course Code: STA132
Credit: 2
Status: Core (for Statistics Major); Required or Optional (for Non-statistics Major)
Course Lecturer: Professor W. B. Yahya
Course Contents:
Presentation and analysis of data. Curve fitting and goodness-of-fit tests. Construction of
questionnaires and simple index numbers. Use of random numbers and statistical tables. 90h(P); C
SOME DEFINITIONS
Statistics (As a field of study): Statistics is the science of collecting, organizing, summarizing,
analyzing, and making inferences from data.
The subject of statistics is divided into two broad areas: descriptive statistics and inferential
statistics.
Population and Sample
A population consists of all elements that are being studied. For example, we may be interested
in studying the distribution of scores obtained by all the students that offered STA132 during the
2018/2019 academic session.
A sample is a subset of the population. For example, we may be interested in studying the
distribution of scores obtained by 100 randomly selected students that offered STA132 during the
2018/2019 academic session.
Since parameters are descriptions of the population, a population can have many parameters.
Similarly, a sample can have many statistics.
Data, Parameter and Statistics:
In order to obtain information, data are collected from variables used to describe an event. Data
are the values or measurements that variables describing an event can assume.
Data are individual pieces of factual information recorded and used for the purpose of analysis. They
are the raw information from which statistics are created. In other words, a statistic is a
characteristic of or a fact about a sample.
Both populations and samples have characteristics that are associated with them. These
are called parameters and statistics, respectively.
A parameter is a characteristic of or a fact about a population. For instance, the average age of
students in Nigerian Universities is µ, say 29 years.
A statistic is a characteristic of or a fact about a sample. For instance, the average age of n
(e.g. 15) randomly selected students of the University of Ilorin is $\bar{x}$, say 26 years. Loosely
speaking, statistics can be regarded as the results of data analysis. We talk of statistics after some
computations have taken place that provide some understanding of what the data mean.
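The distinction between a parameter and a statistic can be illustrated with a minimal Python sketch. The population below is simulated (the ages, the population size of 5,000 and the seed are assumptions made purely for illustration, not real data); the sample size of 15 follows the example above.

```python
import random
import statistics

random.seed(132)  # fixed seed so the sketch is reproducible

# Hypothetical population: ages of 5,000 students (simulated, not real data).
population = [random.randint(16, 40) for _ in range(5000)]

# Parameter: a fact about the whole population.
mu = statistics.mean(population)

# Statistic: the same summary computed on a random sample of n = 15.
sample = random.sample(population, 15)
x_bar = statistics.mean(sample)

print(f"population mean (parameter): {mu:.2f}")
print(f"sample mean (statistic):     {x_bar:.2f}")
```

Drawing a different sample changes the statistic, while the parameter stays fixed for the given population.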
[Figure: Difference Between Population and Sample]
Example 1: Suppose $X$ is the (random) variable that describes the scores obtained by students in the
STA132 first CA test. Then, the scores obtained by 10 randomly selected students in this test,
represented by $x_i$, $i = 1, 2, \ldots, 10$, are given as follows:
$x_i$: 20, 4, 21, 19, 12, 7, 8, 25, 17, 11.
Example 2: The sample mean ($\bar{x}$) is a statistic.
The sample mean score of the students in Example 1 is $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{144}{10} = 14.4$.
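The arithmetic for the sample mean of the Example 1 scores can be checked with a short Python sketch. The variable names are illustrative only.

```python
scores = [20, 4, 21, 19, 12, 7, 8, 25, 17, 11]  # Example 1 data

n = len(scores)       # sample size
total = sum(scores)   # sum of the observations
x_bar = total / n     # sample mean

print(n, total, x_bar)  # prints: 10 144 14.4
```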
Realizations: These are specific values of a random variable. For instance, in Example 1 above,
$x_i$: 20, 4, 21, … are realizations, which are all specific values of the random variable $X$.
Types of Variables/Data
The manner in which you analyze data depends on the type of data/variables that you are
evaluating. There are several different classifications that are used in classifying data.
Variable
A variable is an item of data that varies from one subject/unit/observation to another.
Examples of variables include quantities such as: gender, investment type, test scores, and
weight.
For instance, if $X$ represents the monthly salary of academic staff in Nigerian universities, then
$X$ is a variable.
Note: Variables whose values are determined by chance are called random variables.
Types/Classifications of Variables
Qualitative: non-numerical quality
Quantitative: numerical
    Discrete: counts
    Continuous: measures
Qualitative Variable/Data
Qualitative data are data values that can be placed into distinct categories, according to some
characteristic or attribute.
This variable describes the quality of something in a non-numerical format. Examples: Colour
(Red, Black, …); Gender (Male, Female); Class of Degree (First Class, Second Class Upper, …).
Counts can be applied to qualitative data, but you cannot order (if purely nominal) or measure
this type of variable. Examples are gender, marital status, geographical region of an
organization, job title….
Example 3: Distribution of Colour of cars owned by academic staff in the Faculty of Physical
Sciences, University of Ilorin.
Car Colour   Red   Blue   Yellow   Green   White
Frequency     20     54       12      56       3
Qualitative data are usually treated as categorical data.
With categorical data, the observations can be sorted into non-overlapping
categories according to some characteristic.
For example, shirts can be sorted according to color; the characteristic 'color' can have
non-overlapping categories: white, black, red, etc. People can be sorted by gender with
categories male and female.
Categories should be chosen carefully since a bad choice can prejudice the outcome.
Every value of a data set should belong to one and only one category.
Measurement Scale
Nominal: classifies with no ranking (e.g. color, investment type...)
Ordinal: classifies with ranking (e.g. product satisfaction, grades…)
Analyze qualitative data using:
Frequency tables, Contingency tables (for 2 variables)
Modes - most frequently occurring
Graphs: Bar Charts, Pie Charts, Pareto Charts
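As a minimal sketch of these tools, the Python snippet below takes the car-colour frequencies from Example 3, finds the mode, and builds the ordering that a Pareto chart would display (categories sorted by decreasing frequency, with cumulative percentages). The variable names are illustrative, and the chart itself is not drawn.

```python
from collections import Counter

# Frequencies from Example 3 (car colours).
freq = Counter({"Red": 20, "Blue": 54, "Yellow": 12, "Green": 56, "White": 3})
total = sum(freq.values())

# Mode: the most frequently occurring category.
mode, mode_count = freq.most_common(1)[0]

# Pareto ordering: decreasing frequency with cumulative percentage.
cumulative = 0
pareto = []
for colour, count in freq.most_common():
    cumulative += count
    pareto.append((colour, count, round(100 * cumulative / total, 1)))

print("total:", total, "mode:", mode)
for row in pareto:
    print(row)
```

Only counting is used here, which is all that nominal data permit.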
Quantitative Data
Quantitative or numerical data arise when the observations are frequencies or measurements
that are numeric.
Discrete Data
The data are said to be discrete if the measurements are integers (e.g. number of
employees of a company, number of incorrect answers on a test, number of
participants in a program…)
Continuous Data
The data are said to be continuous if the measurements can take on any value,
usually within some range (e.g. weight). Age and income are continuous quantitative
variables. For continuous variables, arithmetic operations such as differences and
averages make sense.
Analysis can take almost any form:
Create groups or categories and generate frequency tables.
Effective graphs include: Histograms, Stem-and-Leaf plots, Dot Plots, Box plots,
and XY Scatter Plots (2 variables).
All descriptive statistics can be applied.
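A grouped frequency table and a few descriptive statistics can be sketched in Python using the Example 1 scores. The class width of 5 is an assumption chosen only for illustration; other widths are equally valid.

```python
import statistics
from collections import Counter

scores = [20, 4, 21, 19, 12, 7, 8, 25, 17, 11]  # Example 1 data

width = 5  # class width (an assumption for illustration)

# Map each score to the lower limit of its class, e.g. 17 -> 15.
classes = Counter((x // width) * width for x in scores)

# Grouped frequency table: class limits and class frequency.
for lower in sorted(classes):
    print(f"{lower:2d}-{lower + width - 1:2d}: {classes[lower]}")

median = statistics.median(scores)
print("min:", min(scores), "median:", median, "max:", max(scores))
```

The class counts are what a histogram of these scores would plot as bar heights.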
Measurement Scale
Interval: ordered, and the difference between values is meaningful (e.g. standardized
scores...)
Ratio: ordered, the difference between values is meaningful, and there is a true zero, so ratios
of values are also meaningful
Note: Some “quantitative” variables can be treated only as ranks; they have a natural order, but
these values are not strictly measured (ordinal data). Examples are: 1) age group (taking the
values child, teen, adult, senior), and 2) Likert Scale data (responses such as strongly agree,
agree, neutral, disagree, strongly disagree). For these variables, the distinction between adjacent
points on the scale is not necessarily the same, and the ratio of values is not meaningful.
Analyze using:
Frequency tables
Mode, Median, Quartiles
Graphs: Bar Charts, Dot Plots, Pie Charts, and Line Charts (2 variables)
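For ordinal data, the mode and median depend only on the ordering of the categories, not on numeric distances between them. The Python sketch below uses a small set of hypothetical Likert responses (made up for illustration) and finds the median by sorting the responses according to their rank on the scale.

```python
from collections import Counter

# Ordered scale, from lowest to highest agreement.
scale = ["strongly disagree", "disagree", "neutral", "agree", "strongly agree"]
rank = {label: i for i, label in enumerate(scale)}

# Hypothetical responses (made up for illustration).
responses = ["agree", "neutral", "strongly agree", "agree", "disagree",
             "agree", "neutral", "strongly disagree", "agree"]

# Frequency table and mode.
freq = Counter(responses)
mode = freq.most_common(1)[0][0]

# Median: the middle response after sorting by rank (odd count, so it is unique).
ordered = sorted(responses, key=rank.get)
median = ordered[len(ordered) // 2]

for label in scale:
    print(f"{label:18s} {freq[label]}")
print("mode:", mode, "| median:", median)
```

A mean is deliberately not computed, since the spacing between adjacent points on the scale is not necessarily equal.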