VARIABLES AND DATA
Introduction
• A variable is a characteristic that changes (or varies)
over time and/or for different individuals or objects
under consideration
• Examples: Hair colour, height, white blood cell count,
time to failure of a computer component.
• Data is the information coming from observations, counts,
measurements, or responses about a variable
Introduction to Data
• Definition: Data is a collection of facts or information used for
analysis.
• Forms of Data: Numbers, text, images, sounds.
Types of VARIABLES
• The two types of variables are Qualitative and Quantitative
• Qualitative variables are variables which measure a quality or
characteristic and are also called categorical variables.
• Examples include Hair colour (black, brown, blonde...), Make of car
(Toyota, Nissan, Honda, Ford...) and Gender (male, female).
• Quantitative variables are variables which measure a numerical quantity.
• Discrete and continuous variables are the two types of quantitative
variables.
• A discrete variable is one which can assume only a finite or countable
number of values (typically whole-number counts).
• A continuous variable is one which can assume any value within an
interval, such as height.
Categories of variables in data
collection
• 1. Independent Variable
• Definition: The variable that is manipulated or changed in an experiment.
• Example: Time spent studying.
• 2. Dependent Variable
• Definition: The variable that is measured or observed.
• Example: Test scores.
• 3. Control Variables
• Definition: Variables kept constant to ensure a fair test.
• Example: Classroom environment.
Levels of Measurement
• 1. Nominal Scale
• Definition: Categorizes data without a specific
order.
• Nominal scales are used for labeling variables,
without any quantitative value.
• Examples: Gender (Female = 1, Male = 2), Religious
Affiliation (Catholic = 1, Protestant = 2, Jewish = 3,
Muslim = 4, Other = 5) and Hair Colour (Brown = 1,
Black = 2, Blonde = 3, Gray = 4, Other = 5).
• 2. Ordinal Scale
• Definition: Categorizes data with a meaningful order but no
consistent difference between categories.
• Example: Customer satisfaction ratings (e.g., poor, fair, good).
• 3. Interval Scale
• Definition: Numeric scale with equal intervals but no true zero.
• This is a numeric scale in which we know not only the order,
but also the exact differences between the values.
• A good example of an interval scale is the Celsius temperature scale,
in which a 10°C difference has the same meaning anywhere
along the scale. For example, the difference between 10°C
and 20°C is the same as between 80°C and 90°C.
• 4. Ratio Scale
• Definition: Numeric scale with a true zero point, allowing for comparison of
magnitudes.
• Ratio scales have all the attributes of interval scale variables. In addition, ratio
scales have an absolute zero.
• Example: Height, weight.
FREQUENCY TABLE
• The frequency of a particular data value is the number of times the
data value occurs.
• Example: Present the following data in a frequency distribution table:
• 2.4, 2.4, 2.4, 2.5, 2.6, 2.6, 2.7, 2.7, 2.7, 2.7, 2.8, 2.8, 2.9, 3.0, 3.0, 3.0
CLASS WORK
• Let's compute the relative frequency and cumulative frequency for the data above.
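One way to check the class work is with a short Python sketch. It builds the frequency table for the sixteen values in the example above, taking relative frequency as frequency divided by the number of observations and cumulative frequency as the running total of frequencies. The function name `frequency_table` is only illustrative.

```python
from collections import Counter

# Data from the frequency table example above
data = [2.4, 2.4, 2.4, 2.5, 2.6, 2.6, 2.7, 2.7, 2.7, 2.7,
        2.8, 2.8, 2.9, 3.0, 3.0, 3.0]


def frequency_table(values):
    """Return rows of (value, frequency, relative frequency, cumulative frequency)."""
    counts = Counter(values)      # how many times each value occurs
    n = len(values)               # total number of observations
    rows = []
    cumulative = 0
    for value in sorted(counts):  # list the values in ascending order
        freq = counts[value]
        cumulative += freq        # running total of the frequencies
        rows.append((value, freq, freq / n, cumulative))
    return rows


table = frequency_table(data)
print(f"{'Value':>5} {'Freq':>5} {'Rel. freq':>10} {'Cum. freq':>10}")
for value, freq, rel, cum in table:
    print(f"{value:>5} {freq:>5} {rel:>10.4f} {cum:>10}")
```

The relative frequencies should add up to 1, and the last cumulative frequency should equal the number of observations (16 here).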
DATA VISUALISATION
• Definition:
• Graphical data representation uses visual formats to present data clearly,
making complex information more digestible.
• Importance:
• These visual tools help identify trends, patterns, and anomalies quickly,
facilitating better communication of data insights.
• Using the frequency table from the example above, draw a pie
chart, a histogram and a bar graph.
STATISTICAL INFERENCES
• Definition:
• Inferential statistics is a branch of statistics that uses
various analytical tools to draw inferences about a population
from a sample.
• Statistical inference involves using data collected from a sample to
draw conclusions about a larger population. This process helps to
estimate population parameters and make predictions.
• Statistical inference is vital in research and decision-making across
various fields, including healthcare, business, and social sciences. It
enables analysts to understand trends, test theories, and evaluate
hypotheses.
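The idea of estimating a population parameter from a sample can be illustrated with a small simulation. This is only a sketch: the population of heights is made up (generated from a normal distribution with an assumed mean of 170 cm and standard deviation of 10 cm), and in real research the population mean would usually be unknown.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch gives the same result each run

# Hypothetical population: 100,000 made-up heights in cm (assumed values)
population = [random.gauss(170, 10) for _ in range(100_000)]

# Draw a simple random sample of 1,000 members without replacement
sample = random.sample(population, 1000)

population_mean = statistics.mean(population)  # the parameter (normally unknown)
sample_mean = statistics.mean(sample)          # the estimate from the sample

print(f"Population mean: {population_mean:.2f}")
print(f"Sample mean:     {sample_mean:.2f}")
```

The two means should be close, which is the reason a well-chosen sample can stand in for the whole population.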
DEFINITIONS
• Population: The entire set of individuals or items that we want to
learn about. This could be all people, animals, or objects in a defined
group.
• Example: For a political survey, the population might consist of every eligible
voter in a country.
• Considerations: Often large and challenging to access fully.
• Sample: A smaller group selected from the population, intended to
represent the characteristics of the whole.
• Example: A sample might include 1,000 randomly chosen voters from the larger population
• Cost-effective:
• Collecting data from a sample is generally much less expensive than surveying an entire
population, making research feasible for many organizations.
• Time-saving:
• Data can be gathered and analyzed much more quickly with a sample, allowing for faster
decision-making.
• Feasibility:
• In many scenarios, it’s not realistic to reach every individual in a population due to
time, budget, or logistical constraints. Sampling provides a manageable alternative
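SAMPLING TECHNIQUES
• Simple random sampling: every member of the population has an equal
chance of being selected
• Stratified sampling: the population is divided into groups (strata)
according to their common characteristics, and members are sampled from each group
• Systematic sampling: for example, every 10th member is selected
• Cluster sampling: the population is divided into clusters and whole clusters
are selected, for example all members from the northern region
• Convenience sampling: the members who are easiest to reach are selected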
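Three of the techniques listed above can be sketched in Python as follows. The population of 100 members, the `region` labels, the 10% sampling fraction and the helper name `stratified_sample` are all assumptions made for illustration.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: 100 members, 40 in the north and 60 in the south
population = [{"id": i, "region": "north" if i < 40 else "south"}
              for i in range(100)]

# Simple random sampling: every member has the same chance of selection
simple = random.sample(population, 10)

# Systematic sampling: pick a random start, then take every 10th member
start = random.randrange(10)
systematic = population[start::10]


def stratified_sample(members, key, fraction):
    """Sample the same fraction of members from each stratum (group)."""
    strata = {}
    for member in members:
        strata.setdefault(member[key], []).append(member)
    chosen = []
    for group in strata.values():
        k = round(len(group) * fraction)  # members to draw from this stratum
        chosen.extend(random.sample(group, k))
    return chosen


# Stratified sampling: 10% from each region, so 4 northern and 6 southern members
stratified = stratified_sample(population, "region", 0.1)

print(len(simple), len(systematic), len(stratified))
```

Stratified sampling keeps the regional proportions of the population in the sample, whereas a simple random sample may over- or under-represent a region by chance.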
Introduction to Normal
Distribution
• A normal distribution is a continuous probability distribution
characterized by its bell-shaped curve, where most observations
cluster around the central peak
• Many statistical methods, including hypothesis testing and confidence
intervals, assume that the data follows a normal distribution, making
this concept foundational for inferential statistics
Characteristics of Normal
Distribution
• The distribution is symmetrical about the mean, meaning the left and right
sides are mirror images. The mean, median, and mode are all located at the
center.
• The shape is determined by the mean (average) and the standard deviation
(measure of spread).
• Its basic foundation is its mean, standard deviation and range.
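The characteristics above can be checked numerically with Python's standard `statistics.NormalDist` class. The mean of 50 and standard deviation of 10 are assumed values chosen only for illustration.

```python
from statistics import NormalDist

# Hypothetical normal distribution: mean 50, standard deviation 10 (assumed)
dist = NormalDist(mu=50, sigma=10)

# Symmetry: the curve has the same height 7 units below and above the mean
left = dist.pdf(50 - 7)
right = dist.pdf(50 + 7)

# The mean, median and mode all sit at the centre of the curve
centre = (dist.mean, dist.median, dist.mode)

# Proportion of observations within one standard deviation of the mean
within_one_sd = dist.cdf(60) - dist.cdf(40)

print(left, right)
print(centre)
print(round(within_one_sd, 4))
```

Changing `mu` shifts the curve left or right, while changing `sigma` makes it narrower or wider; the bell shape itself stays the same.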
EXERCISE
1. Present the following data in a frequency distribution table:
2.4 2.4 2.4 2.5 2.6 2.6 2.7 2.7 2.7 2.7 2.8 2.8 2.9 3.0 3.0 3.0
Compute the relative frequency and cumulative frequency.
2. Using the following data set of scores:
61 85 100 86 72 75 81 90 100 64 87 73 90 87 93 83
find the mean, median, mode, variance and standard
deviation.
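After working question 2 by hand, the answers can be checked with Python's standard `statistics` module. The exercise does not say whether the scores are a whole population or a sample, so this sketch computes both versions of the variance and standard deviation (dividing by n for a population and by n - 1 for a sample). The data set has more than one mode, so `multimode` is used to list all of them.

```python
import statistics

# Scores from question 2 of the exercise
scores = [61, 85, 100, 86, 72, 75, 81, 90, 100, 64, 87, 73, 90, 87, 93, 83]

mean = statistics.mean(scores)
median = statistics.median(scores)             # average of the two middle values
modes = sorted(statistics.multimode(scores))   # every value with the top frequency

population_variance = statistics.pvariance(scores)  # divides by n
sample_variance = statistics.variance(scores)       # divides by n - 1
population_sd = statistics.pstdev(scores)
sample_sd = statistics.stdev(scores)

print("Mean:", mean)
print("Median:", median)
print("Modes:", modes)
print("Variance (population, sample):", population_variance, sample_variance)
print("Std dev (population, sample):", population_sd, sample_sd)
```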
Graphical data presentation and
interpretation
• Categorical data can be presented on a pie chart or bar chart.
• A pie chart is the familiar circular graph that shows how the
measurements are distributed among the categories
• A bar chart shows the same distribution of measurements in
categories, with the height of the bar measuring how often a
particular category was observed
Example 1
• A bag contains 25 balls with different colours as in the table below.
Draw the pie chart and bar chart for the data
Colour Frequency
Red 3
Blue 6
Green 4
Orange 5
Brown 3
Yellow 4
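Steps to draw a pie chart
• First compute the relative frequency of each category (frequency divided by the total)
• Multiply the relative frequency by 100 to get the percentage for each slice
• Multiply the relative frequency by 360 to get the angle of each slice in degrees
• Use a protractor to mark off each angle and draw the pie chart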
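The calculation behind the pie chart for the ball colours can be sketched in Python. Besides the relative frequency and the percentage, the sketch also computes the angle of each slice in degrees (relative frequency multiplied by 360), since degrees are what a protractor measures. The function name `pie_slices` is only illustrative.

```python
# Frequencies from the table of ball colours in Example 1
colours = {"Red": 3, "Blue": 6, "Green": 4, "Orange": 5, "Brown": 3, "Yellow": 4}


def pie_slices(frequencies):
    """Map each category to (relative frequency, percentage, angle in degrees)."""
    total = sum(frequencies.values())
    return {
        name: (freq / total, 100 * freq / total, 360 * freq / total)
        for name, freq in frequencies.items()
    }


slices = pie_slices(colours)
print(f"{'Colour':<8} {'Rel. freq':>10} {'Percent':>8} {'Angle':>7}")
for name, (rel, percent, angle) in slices.items():
    print(f"{name:<8} {rel:>10.2f} {percent:>7.1f}% {angle:>6.1f}°")
```

The percentages should add up to 100 and the angles to 360, which is a quick check before drawing.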
[Figure: pie chart of the ball colours]
[Figure: bar graph of the ball colours]
Correlation
• Introduction to Correlation
• Definition: Correlation measures the strength and direction of a linear
relationship between two variables.
• Importance: Used in statistics to identify relationships that can inform
decisions and predictions
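One common measure of linear correlation is Pearson's correlation coefficient r, which ranges from -1 (perfect negative linear relationship) through 0 (no linear relationship) to +1 (perfect positive linear relationship). The sketch below computes it from first principles. The study hours and test scores are made-up illustrative values, and the function name `pearson_r` is hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products and squares of the deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)


# Made-up data: hours spent studying and the resulting test scores
hours = [1, 2, 3, 4, 5]
scores = [52, 58, 65, 70, 78]

r = pearson_r(hours, scores)
print(f"r = {r:.3f}")
```

A value of r close to +1 here would indicate a strong positive linear relationship between study time and test score; it would not by itself show that studying causes the higher scores.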