Spring semester, 2020-2021
General information
Instructors:
A.Prof.Dr. Nguyen Tan Khoi: ntkhoi@hcmiu.edu.vn (8 weeks)
Ms. Do Ngoc Phuc Chau: dnpchau@hcmiu.edu.vn (7 weeks)
Course assessment: (decided by A.Prof. Khoi)
Labwork assessment: 30% quizzes
70% lab exam
(this is given by Ms. Chau)
Course outline
- Chapter 1: Introduction
- Chapter 2: Descriptive statistics
- Chapter 3: Probability & Distribution of probability
- Chapter 4: Continuous distribution of probability
- Chapter 5: Hypothesis testing
- Chapter 6: ANOVA
- Chapter 7: Regression and correlation analysis
- Chapter 8: Normality test
- Chapter 9: Non-parametric tests
INTRODUCTION
WHAT IS BIOSTATISTICS?
statistics used in biological fields
So, what is statistics?
the process of converting data into information.
consists of various steps such as generating hypotheses,
collecting data, and applying analysis methods.
Biostatistics, then, teaches us how to summarize, analyze, and
draw meaningful inferences from data, which in turn lead to
confirmation of hypotheses that relate to biological problems.
CATEGORIES of STATISTICS
How many pairs of shoes does each
student in our class own?
Descriptive Statistics
Collect
Organize
Summarize
Display
Analyze
Inferential Statistics
Predict and forecast
values of population
parameters
Test hypotheses about
values of population
parameters
TYPES of DATA
Qualitative data (Categorical or Nominal)
Examples are:
• Color
• Gender
• Level of agreement
Quantitative data (Measurable or Countable)
• Discrete variable
• Continuous variable
Examples are:
• Temperatures
• Salaries
• Number of students in a group
SCALES of MEASUREMENT
Nominal Scale – groups or classes
Ordinal Scale – order matters
Interval Scale – difference or distance matters – has arbitrary zero value
Ratio Scale – ratio matters – has a natural zero value
DISPLAYING DATA
Discrete variables
Pie chart
Bar chart
Line graph
Continuous variables
Histogram
Frequency polygon
Ogive (or Cumulative frequency graph)
Stem-and-Leaf diagram
Scatter plot
SAMPLE and POPULATION
A population – consists of the set
of all measurements in which the
investigator is interested
A sample – is a subset of the
measurements selected from the
population
A census – is a complete
enumeration of every item in a
population
?? Population vs. Census ??
POPULATION or SAMPLE
What is the average height of IU students?
Population                          Sample
- all of about 7,000 IU students    - 300 students
Impossible                          Possible
Impractical                         Easy to achieve
Too costly                          Cheaper
Takes a long time                   Faster
Sampling and Simple random sample
• Sampling from the population is often done randomly,
such that every possible sample of equal size (n) will have
an equal chance of being selected
• A sample selected in this way is called a simple random
sample or just a random sample
• A random sample allows chance to determine its elements
How can we draw a random sample of 300 students from all IU students?
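One possible way to draw such a sample is sketched below. It assumes a complete list of students is available; the ID numbers 1 to 7000 are hypothetical stand-ins for a real roster, and the fixed seed is there only to make the draw reproducible.

```python
import random

# Hypothetical sampling frame: IDs 1..7000 stand in for the real student roster.
population = list(range(1, 7001))

# A fixed seed is used only so that the same sample can be reproduced.
rng = random.Random(2021)

# Draw 300 students without replacement; every subset of size 300
# has the same chance of being selected.
sample = rng.sample(population, k=300)

print(len(sample), "students selected, first five IDs:", sample[:5])
```

Because the draw is made without replacement, no student can appear twice in the sample.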
EXPERIMENT, SET and EVENT
Set and Complement of set
Intersection of sets
Union of sets
Mutually exclusive or disjoint sets
Partitions
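The set operations listed above can be tried out with Python's built-in set type. The experiment (one roll of a die) and the events A, B, and C are hypothetical examples, and is_partition is an illustrative helper, not a library function.

```python
# Hypothetical experiment: one roll of a six-sided die.
sample_space = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}   # event "even number"
B = {4, 5, 6}   # event "greater than 3"
C = {1, 3, 5}   # event "odd number"

complement_A = sample_space - A   # outcomes not in A
intersection = A & B              # outcomes in both A and B
union = A | B                     # outcomes in A or B (or both)
disjoint = A.isdisjoint(C)        # True when A and C share no outcome


def is_partition(parts, whole):
    """True if the parts are non-empty, pairwise disjoint, and cover `whole`."""
    covered = set().union(*parts)
    total = sum(len(p) for p in parts)
    # Equal total size and equal coverage together imply no overlap.
    return all(parts) and covered == whole and total == len(whole)


print(complement_A, intersection, union, disjoint)
print(is_partition([A, C], sample_space))
```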
DESCRIPTIVE STATISTICS
SUMMARY MEASURES
Measures of Central Tendency
Median
Mode
Mean
Measures of Variability
Range
Interquartile range
Variance
Standard deviation
Other measures
Skewness
Kurtosis
Does the girl own more shoes than the boy?
MEASURES of CENTRAL TENDENCY
Median – middle value when sorted in order of magnitude
Mode – most frequently-occurring value
Mean – average
Arithmetic Mean or Average
Population Mean: $\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$
Sample Mean: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
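As a small illustration, the three measures of central tendency can be computed with Python's standard statistics module. The data (pairs of shoes owned by nine students) are made up for this sketch.

```python
import statistics

# Hypothetical data: pairs of shoes owned by nine students.
shoes = [2, 3, 3, 4, 5, 5, 5, 8, 10]

mean_value = statistics.mean(shoes)      # sum of the values divided by their count
median_value = statistics.median(shoes)  # middle value of the sorted data
mode_value = statistics.mode(shoes)      # most frequently occurring value

print(mean_value, median_value, mode_value)
```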
MEASURES of VARIABILITY or DISPERSION
Range – difference between maximum and minimum values
Interquartile range (IQR) – difference between third and
first quartile
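Variance – average of the squared deviations from the mean
Population Variance: $\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2$
Sample Variance: $s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$
Standard deviation (SD) – square root of the variance
Population SD: $\sigma = \sqrt{\sigma^2}$
Sample SD: $s = \sqrt{s^2}$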
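The sketch below computes these measures with Python's statistics module. It assumes the usual convention that the population versions divide by N and the sample versions divide by n - 1; the data are hypothetical.

```python
import statistics

# Hypothetical data: pairs of shoes owned by nine students.
shoes = [2, 3, 3, 4, 5, 5, 5, 8, 10]

data_range = max(shoes) - min(shoes)

# Population versions divide the sum of squared deviations by N.
pop_variance = statistics.pvariance(shoes)
pop_sd = statistics.pstdev(shoes)

# Sample versions divide by n - 1.
sample_variance = statistics.variance(shoes)
sample_sd = statistics.stdev(shoes)

print(data_range, pop_variance, pop_sd, sample_variance, sample_sd)
```

For the same data, the sample variance is always slightly larger than the population variance because of the smaller divisor.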
Quartiles – the percentage points that break down the ordered data set
into quarters
• The first quartile, Q1, or lower quartile is the 25th percentile – the
point below which lies ¼ of the data
• The second quartile, Q2, or middle quartile is the 50th percentile –
the point below which lies ½ of the data. This is also called the median
• The third quartile, Q3, or upper quartile is the 75th percentile – the
point below which lies ¾ of the data
Example: finding percentile
Find the 50th, 80th and 90th
percentiles of this data set.
*** Sorting data (by Excel):
???
???
*** Determine the data point:
To find the 50th percentile (Median),
determine the data point in position (n + 1)
x P / 100 of the data set
Position (20 + 1) x 50 / 100 = 10.5
In Excel, input =(20+1)*50/100
*** Determine the percentile value:
* Find the 50th percentile:
The 50th percentile is located at the 10.5th position
of the data set.
The 10th observation in the ordered set is 22 and
the 11th observation is also 22.
The 50th percentile will lie halfway between the
10th and 11th values (which are both 22 in this
case), so the value is 22
* Find the 80th percentile:
To find the 80th percentile, determine the data
point in position (n + 1) x P / 100 of the data set
Position (20 + 1) x 80 / 100 = 16.8
The 80th percentile is located at the 16.8th position
of the data set. The 16th observation in the ordered
set is 32 and the 17th observation is 33.
The 80th percentile will lie at 0.8 between the 16th
and 17th values, so the value is
16th + (17th – 16th) * 0.8 = 32 + (33 – 32) * 0.8 = 32.8
* Find the 90th percentile:
To find the 90th percentile, determine the data
point in position (n + 1) x P / 100 of the data set
Position (20 + 1) x 90 / 100 = 18.9
The 90th percentile is located at the 18.9th position
of the data set. The 18th observation in the ordered
set is 49 and the 19th observation is 52.
The 90th percentile will lie at 0.9 between the 18th
and 19th values, so the value is
49 + (52 – 49) * 0.9 = 51.7
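The position rule (n + 1) x P / 100 with interpolation between neighbouring values can be sketched as a small function. The data set from the slides is not reproduced here, so the values 2, 4, ..., 40 are hypothetical. Returning the smallest or largest value when the position falls outside the data is an assumption of this sketch, not something stated in the slides.

```python
def percentile(data, p):
    """P-th percentile by the (n + 1) * P / 100 position rule."""
    if not data:
        raise ValueError("data must not be empty")
    ordered = sorted(data)
    n = len(ordered)
    position = (n + 1) * p / 100   # 1-based position, may be fractional
    lower = int(position)          # whole part: rank of the lower neighbour
    fraction = position - lower    # how far to move towards the upper neighbour
    if lower < 1:                  # assumption: clamp to the smallest value
        return ordered[0]
    if lower >= n:                 # assumption: clamp to the largest value
        return ordered[-1]
    low_value = ordered[lower - 1]
    high_value = ordered[lower]
    return low_value + (high_value - low_value) * fraction


# Hypothetical data: 20 observations 2, 4, ..., 40.
data = list(range(2, 41, 2))

for p in (50, 80, 90):
    print(p, percentile(data, p))
```

With n = 20 the positions are 10.5, 16.8, and 18.9, the same as in the worked example, but the resulting values differ because the data are different.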
Example: finding Quartile
The interquartile range ????
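A minimal sketch of how the quartiles and the interquartile range might be computed with Python's statistics.quantiles follows. Its "exclusive" method places the quartiles at positions based on n + 1, which agrees with the (n + 1) x P / 100 rule used for percentiles. The data are hypothetical and are not the data set from the slides.

```python
import statistics

# Hypothetical data: 20 observations 2, 4, ..., 40.
data = list(range(2, 41, 2))

# n=4 asks for the three cut points that split the ordered data into quarters.
q1, q2, q3 = statistics.quantiles(data, n=4, method="exclusive")

iqr = q3 - q1   # interquartile range: spread of the middle half of the data

print(q1, q2, q3, iqr)
```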
OTHER MEASURES
Skewness – measure of the degree of
asymmetry of a frequency distribution
•Skewed to left
•Symmetric or unskewed
•Skewed to right
Kurtosis – measure of flatness or
peakedness of a frequency distribution
•Platykurtic (relatively flat)
•Mesokurtic (normal)
•Leptokurtic (relatively peaked)
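Both measures can be illustrated with moment-based formulas. This is one common definition, and statistical software often applies additional sample-size corrections. The helper names and the data below are hypothetical.

```python
def central_moment(data, k):
    """k-th central moment: average of (x - mean) ** k."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** k for x in data) / len(data)


def skewness(data):
    """Negative: skewed to left; near 0: symmetric; positive: skewed to right."""
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5


def excess_kurtosis(data):
    """Negative: platykurtic; near 0: mesokurtic; positive: leptokurtic."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2 - 3


symmetric = [1, 2, 3, 4, 5]        # hypothetical symmetric data
right_skewed = [1, 1, 1, 1, 10]    # hypothetical data with a long right tail

print(skewness(symmetric), excess_kurtosis(symmetric))
print(skewness(right_skewed))
```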
RELATIONS between the MEAN and S.D.
Chebyshev’s Theorem
•Applies to any distribution, regardless of shape
•Places lower limits on the percentages of observations within a
given number of standard deviations from the mean
Empirical Rule
•Applies only to roughly mound-shaped and symmetric
distributions
•Specifies approximate percentages of observations within a given
number of standard deviations from the mean
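Chebyshev’s Theorem
•At least 1 - 1/k^2 of the observations lie within k standard
deviations of the mean (for any k > 1)
•k = 2: at least 75%; k = 3: at least 89%; k = 4: at least 94%
Empirical Rule
•About 68% of the observations lie within 1 standard deviation of the mean
•About 95% lie within 2 standard deviations of the mean
•About 99.7% (almost all) lie within 3 standard deviations of the mean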
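The two rules can be compared numerically. This sketch assumes the standard forms of both: Chebyshev's lower bound 1 - 1/k^2 for k > 1, and the approximate 68% / 95% / 99.7% figures of the Empirical Rule. The data and the helper names are hypothetical.

```python
import statistics


def chebyshev_lower_bound(k):
    """Minimum share of observations within k SDs of the mean (any distribution)."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1 - 1 / k ** 2


# Approximate shares for mound-shaped, symmetric distributions.
EMPIRICAL_RULE = {1: 0.68, 2: 0.95, 3: 0.997}


def share_within(data, k):
    """Observed share of the data within k population SDs of the mean."""
    mean = statistics.mean(data)
    sd = statistics.pstdev(data)
    inside = [x for x in data if abs(x - mean) <= k * sd]
    return len(inside) / len(data)


# Hypothetical data: pairs of shoes owned by nine students.
shoes = [2, 3, 3, 4, 5, 5, 5, 8, 10]

for k in (2, 3):
    print(k, chebyshev_lower_bound(k), EMPIRICAL_RULE[k], share_within(shoes, k))
```

The observed share can never fall below the Chebyshev bound, while the Empirical Rule figures are only approximations that need not hold for skewed data such as these.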
Homework
Bulimic Adolescents        Healthy Adolescents
15.9   17.0   18.9         30.6   40.8
16.0   17.6   19.6         25.7   37.4
16.5   28.7   21.5         25.3   37.1
18.9   28.0   24.1         24.5   30.6
18.4   25.6   23.6         20.7   33.2
18.1   25.2   22.9         22.4   33.7
30.9   25.1   21.6         23.1   36.6
29.2   24.5                23.8