Chapter one
Introduction to Biostatistics
Samrawit .F (MSc. in Biostatistics)
Oct, 2024
Course objective
At the end of this session you will be able to understand the following:
Definition and classification of statistics
Rationale for studying biostatistics
Basic data types/variables
Qualitative & quantitative variables
Discrete Vs. continuous
Scales of measurement
The tools of statistics are employed in many fields such as business,
education, psychology, agriculture, and economics.
When we apply statistical methods to the fields
of biology and the health sciences, we use the term biostatistics.
What is Biostatistics?
Statistics: a field of study concerned with
1- collection, organization, summarization and analysis of data.
2- drawing of inferences about a body of data when only a part
of the data is observed.
It helps us use numbers to communicate ideas.
Why Study Statistics? Answers provided by statistical approaches
can provide the basis for making decisions or choosing actions.
Branches of Statistics
1. Descriptive statistics: is concerned with the organization,
presentation, and summarization of data.
Helps to identify the general features and trends in a set of
data and to extract useful information
Also very important in communicating the final results of a study
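A minimal sketch of descriptive statistics, using only the Python standard library. The birth weights below are made-up illustrative values, not data from any study.

```python
import statistics

# Hypothetical birth weights in grams (illustrative values only)
weights_g = [2900, 3100, 3250, 3000, 3400, 2750, 3600]

# Organize and summarize the data with a few descriptive measures
summary = {
    "n": len(weights_g),
    "mean": statistics.mean(weights_g),
    "median": statistics.median(weights_g),
    "min": min(weights_g),
    "max": max(weights_g),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

The summary describes only these seven observations; nothing is claimed about a wider population.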
2. Inferential statistics: deals with techniques of making
conclusions about the population based on the information
obtained from a sample drawn from that population.
The inferences are drawn from particular properties of sample
to particular properties of population.
Inferential statistics builds upon descriptive statistics.
Includes: Making inferences, Estimation, hypothesis testing,
determining relationships, making predictions, etc.
Rationale for studying biostatistics
Provides methods of organizing information
Assessment of health status
Resource allocation (planning)
Magnitude of association between exposure and outcome
Strong vs weak association between exposure and
outcome
Assessment of risk factors
Cause & effect relationship
Evaluation of new vaccine or drug
How effective is the vaccine (drug)?
Is the effect due to chance or some bias?
Drawing of inferences
Information from sample to population
Essential for understanding, appraisal and critique
of scientific literature
Data, Variables, population, Sample, parameter, Statistic
Data: numbers obtained by taking measurements, by counting,
or by observation.
Numerical descriptions of things
The raw material for statistics.
Data
We may define data as figures. Figures result from
the process of counting or from taking a
measurement.
Example:
When a hospital administrator counts the number
of patients (counting).
When a nurse weighs a patient (measurement)
Sources of data: Records, Surveys (comprehensive or sample), Experiments
Source of Data
We search for suitable data to serve as the raw
material for our investigation.
Such data are available from one or more of the
following sources:
1- Routinely kept records.
Example:
Hospital medical records contain immense
amounts of information on patients.
Hospital accounting records contain a wealth of
data on the facility’s business activities.
2- Surveys:
The source may be a survey if the data are needed to
answer certain questions.
Example:
If the administrator of a clinic wishes to obtain
information regarding the mode of transportation
used by patients to visit the clinic, then a survey
may be conducted among patients to obtain this
information.
3- Experiments.
Frequently the data needed to answer a question
are available only as the result of an
experiment.
For example:
If a nurse wishes to know which of several
strategies is best for maximizing patient compliance,
she might conduct an experiment in which the
different strategies of motivating compliance are tried
with different patients.
Variable
• Variable: A characteristic which takes different values in
different persons, places, or things.
• Any aspect of an individual or object that is measured (e.g. BP)
or recorded (e.g. age, sex) and takes any value.
• There may be one variable in a study or many.
• E.g. A study of treatment outcome of TB may record:
sex, weight (kg), smear result (positive, negative or uncertain), culture
result (negative, positive), cured after 6 months (yes/no).
Types of variables
Quantitative variables
It can be measured in the usual sense.
For example:
the heights of adult males,
the weights of preschool children,
the ages of patients seen in a dental clinic.
Qualitative variables
Many characteristics are not capable of being measured.
Some of them can be ordered or ranked.
For example:
classification of people into socio-economic groups,
social classes based on income, education, etc.
1. Categorical variable: A variable which cannot be measured in
quantitative form but can only be sorted by name or category
• Not able to be measured as we measure height or weight
• The notion of magnitude is absent or implicit.
Categorical variables are divided into two types:
1. Nominal:
• The simplest type of variable, in which the values fall into
un-ordered categories or classes
• Uses names, labels or symbols to assign each
measurement.
– Examples: Blood type, sex, race, marital status
2. Ordinal:
• Assigns each measurement to one of a limited number of
categories that are ranked in terms of order.
• Although non-numerical, can be considered to have a
natural ordering
– Examples: Patient status, cancer stages
2. Quantitative variable: A variable that can be measured or
counted and expressed numerically.
• Height, weight, # of children, etc.
• Has the notion of magnitude.
Quantitative variables are divided into two types:
1. Discrete: It can only have a limited number of discrete values
(whole numbers).
– E.g. the number of episodes of diarrhoea a child has had in a
year. You can’t have 12.5 episodes of diarrhoea
• Characterized by gaps or interruptions in the values.
• Both the order and magnitude of the values matter.
• The values are not just labels, but are actual measurable quantities.
2. Continuous variable:
It can have an infinite number of possible values in any given
interval.
• Both the magnitude and the order of the values matter.
• Does not possess the gaps or interruptions
• E.g. Weight, Height, etc.
Types of variables & scale of measurement
Quantitative variables (numerical): interval, ratio
Qualitative variables (categorical): nominal, ordinal
Scales of measurement
• Not all measurements are the same.
• Measuring weight, e.g. 40 kg
• Measuring the status of a patient on scale = “improved”,
“stable”, “not improved”.
• There are four types of scales of measurement.
Nominal
The simplest type of scale of measurement
unordered categories
Uses names, labels or symbols to assign each
measurement.
numbers used to represent categories
averages are meaningless; look at
frequency/proportion in each category
dichotomous e.g. gender: male = 1, female = 0
polytomous e.g. blood type: O = 1, A = 2, B = 3, AB = 4
• If nominal data take only two possible values, they are
called dichotomous or binary.
• E.g. sex is dichotomous (male or female).
• Yes/no questions
– E.g., Is the patient cured from TB at 6 months of Rx?
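A small sketch of how nominal data are usually summarized: by the frequency and proportion in each category rather than by an average. The blood types listed are invented for illustration.

```python
from collections import Counter

# Hypothetical blood types of ten patients (illustrative values only)
blood_types = ["O", "A", "B", "O", "AB", "A", "O", "B", "O", "A"]

# Frequency of each category
counts = Counter(blood_types)

# Proportion of each category
n = len(blood_types)
proportions = {blood_type: count / n for blood_type, count in counts.items()}

for blood_type, count in counts.most_common():
    print(f"{blood_type}: n={count}, proportion={proportions[blood_type]:.2f}")
```

Coding O = 1, A = 2, B = 3, AB = 4 and averaging the codes would give a number with no interpretation, which is why counts and proportions are used here.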
Example of nominal scale:
Race/Ethnicity:
1. Black
2. White
3. Latino
4. Other
• The numbers have NO meaning
• They are labels only
Ordinal
ordered categories
numbers used to represent categories
order matters; magnitude does not
differences between categories are meaningless
Example of ordinal scale:
• Pain level:
1. None
2. Mild
3. Moderate
4. Severe
• The numbers have LIMITED meaning
• 4>3>2>1 is all we know apart from their utility as labels
Interval
Equal differences between values represent equal
differences in the property being measured
The zero point is arbitrary and does not imply the
absence of the property being measured
E.g. Degrees Fahrenheit
The difference between 30 and 40 is the same as that
between 70 and 80 degrees. But 80 is not twice as hot as 40.
E.g. years:
The difference between 1993 and 1994 is the same as that between
1995 and 1996, but year 0 was not the beginning of time.
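The Fahrenheit example can be checked numerically. The sketch below uses the standard Fahrenheit-to-Kelvin conversion; the function name is chosen here for illustration.

```python
def fahrenheit_to_kelvin(deg_f):
    """Convert degrees Fahrenheit to kelvin (a ratio scale with a true zero)."""
    return (deg_f - 32) * 5 / 9 + 273.15

# Differences are meaningful on an interval scale:
# 30 -> 40 and 70 -> 80 degrees F are the same change in temperature.
diff_low = fahrenheit_to_kelvin(40) - fahrenheit_to_kelvin(30)
diff_high = fahrenheit_to_kelvin(80) - fahrenheit_to_kelvin(70)
print(diff_low, diff_high)

# Ratios are not meaningful on an interval scale:
# 80/40 = 2 in Fahrenheit, but the ratio on the Kelvin scale is far from 2.
ratio_fahrenheit = 80 / 40
ratio_kelvin = fahrenheit_to_kelvin(80) / fahrenheit_to_kelvin(40)
print(ratio_fahrenheit, round(ratio_kelvin, 3))
```

Because the Fahrenheit zero is arbitrary, the apparent factor of 2 disappears once the same temperatures are expressed on a scale with a true zero.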
Ratio
The most detailed and objectively interpretable of
the measurement scales.
Interval scale with an absolute zero: it has a true
zero point (absence of the property being measured)
as well as equal intervals
E.g. Height, weight, money, age, time, speed, class size,
the Kelvin scale of temperature
A measurement on a higher scale can be transformed into one on
a lower scale, but not vice versa.
E.g., weight of a child = 3000 g (ratio scale)
Weight of a child = underweight, normal, overweight (ordinal scale)
Weight of a child = normal, not normal (nominal scale)
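A sketch of moving a measurement down the scales, from ratio to ordinal to nominal. The cut-off values 2500 g and 4000 g are assumptions made for this illustration, not thresholds given in these slides.

```python
def weight_category(weight_g, low=2500, high=4000):
    """Ratio -> ordinal. The cut-offs are hypothetical, for illustration only."""
    if weight_g < low:
        return "underweight"
    if weight_g > high:
        return "overweight"
    return "normal"

def weight_status(weight_g):
    """Ratio -> nominal (dichotomous): normal vs. not normal."""
    return "normal" if weight_category(weight_g) == "normal" else "not normal"

for grams in (2000, 3000, 4500):
    print(grams, weight_category(grams), weight_status(grams))
```

The reverse is not possible: knowing only that a child is "normal" does not recover the weight in grams.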
Degree of precision in measuring (lowest to highest):
Nominal, Ordinal, Interval, Ratio
Dependent vs. Independent Variable
Dependent: The variable (s) we measure as
the outcome of interest , or response
Independent: The variable that explains the
dependent variable (s), or explanatory/
predictor variable.
E.g. Parasitic infections (independent) → Anemia (dependent)
Independent variables
Precede dependent variables in time
Are often manipulated by the researcher
The treatment or intervention that is used in a study
Dependent variables
What is measured as an outcome in a study
Values depend on the independent variable
Variable
Qualitative (categorical)
Nominal, e.g. ethnic group
Ordinal, e.g. response to treatment
Quantitative (numerical)
Discrete, e.g. # of admissions
Continuous, e.g. height
Population and sample
Population: refers to any well-defined group of subjects/objects that
share common characteristics.
A group of people, institutions or items that have something
in common for which we wish to draw conclusions at a
particular time.
E.g., All TB patients in Ethiopia, all hospitals in Hawassa
A population is generally large, so it is difficult to study all of its members.
Population and sample…
Sample:
A small group or subset of a population about
which information is actually obtained.
Samples are used to describe & make
inferences about
the populations from which they arise
Statistical methods are based on these samples
Samples should be selected using a
suitable method so that they are
representative (random sample)
A Sample:
Random sample
Subjects are selected from a population so that each
individual has an equal chance of being selected
Random samples tend to be representative of the source
population
Non-random samples may not be representative
May be biased regarding age, severity of the condition,
socioeconomic status etc
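A minimal sketch of drawing a simple random sample, in which every individual has an equal chance of selection. The population of 500 patient IDs, the sample size of 50 and the seed are all assumptions made for illustration.

```python
import random

# Hypothetical source population: patient IDs 1..500
population_ids = list(range(1, 501))

# A seeded generator makes the draw reproducible (the seed value is arbitrary)
rng = random.Random(2024)

# Simple random sample without replacement: each ID is equally likely to be chosen
sample_ids = rng.sample(population_ids, k=50)

print(sorted(sample_ids)[:10])
```

Selecting, say, the first 50 patients who arrive at a clinic would not give every patient an equal chance, which is one way the biases listed above can arise.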
Parameter and statistic
Parameter:
A numerical descriptive measure derived from the
data of a population.
Parameters exist, but their specific values are usually unknown
Statistic: A descriptive measure computed from
the data of a sample.
Parameter and statistic….
• Since the population is usually large and is
not actually observed, the parameters are
considered unknown constants.
• Statistical inferential methods are used to
make inferences/statements concerning the
unknown parameters, based on sample data.
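The distinction can be illustrated with a simulated population, where the parameter is known only because the data are generated artificially. The blood-pressure-like values (mean 120, standard deviation 15), the population size and the sample size are assumptions made for this sketch.

```python
import random
import statistics

rng = random.Random(1)  # arbitrary seed so the example is reproducible

# Simulated population of 10,000 values (hypothetical, not real measurements)
population = [rng.gauss(120, 15) for _ in range(10_000)]

# Parameter: computed from the whole population (normally unknown in practice)
mu = statistics.mean(population)

# Statistic: computed from a random sample of the population
sample = rng.sample(population, k=100)
x_bar = statistics.mean(sample)

print(f"population mean (parameter): {mu:.2f}")
print(f"sample mean (statistic):     {x_bar:.2f}")
```

The sample mean will usually be close to, but not exactly equal to, the population mean; inferential methods quantify that uncertainty.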
GOAL OF STATISTICS
Mekele University College of Health Sciences
Department of Public Health: Biostatistics
3/26/2025
Types of statistical methods
Descriptive statistics
Describe the data by summarizing them
Inferential statistics
Techniques by which inferences about the
population parameters are drawn from the sample
statistics
OR
observed sample statistics are used to infer the
corresponding population parameters
Measurement Examples
• Raw score on a statistics exam: interval
• Room temperature in Kelvin: ratio
• Nationality of HU students: nominal
• Severe injury: ordinal
• Low income: ordinal
• CD4 count: ratio
• Year of birth: interval
• IQ scores: interval
Exercise
I. Classify the variables below as quantitative or qualitative, and
write in brackets whether each is nominal, ordinal, discrete or continuous
A. Number of female students in your class
B. Marital status: 1=married, 2=single, 3=widowed, 4=divorced
C. Prognosis: 1=very good, 2=good, 3=fair, 4=bad, 5=very bad
D. First temperature following admission (°F)
E. Received oral medications: 1=yes, 2= no
G. Weight of infant at birth (gm)
H. Type of disease: 1= chronic 2= acute
Thank You!!