Introduction to Biostatistics – Pharmacy Assoc. Prof. Pham The Hai (pham-the.hai@usth.edu.vn)

EXERCISE BOOK
1. Important concepts of descriptive statistics:
Measures of Central Tendency: These measures indicate the central or typical value of a dataset.
The commonly used measures are the mean, median, and mode.
Measures of Dispersion: These measures quantify the spread or variability of a dataset. Common
measures include the range, variance, standard deviation, and interquartile range.
Percentiles: Percentiles divide a dataset into hundredths and provide information about the relative
position of a particular value within the dataset. The median represents the 50th percentile.
Boxplots: Boxplots provide a visual summary of the dataset's distribution, including the median,
quartiles, range, and any potential outliers.
Histograms: Histograms display the frequency distribution of a continuous variable by dividing it
into intervals or bins. They provide insights into the shape and spread of the data.
Normal Distribution: The normal distribution, also known as the Gaussian distribution, is a
symmetrical probability distribution frequently encountered in biostatistics. It is characterized by its
mean and standard deviation.
Z-Score: The z-score measures the number of standard deviations a particular observation is from
the mean. It is used to compare and standardize values across different distributions.
Confidence Intervals: Confidence intervals provide a range of plausible values for a population
parameter. They are constructed so that a stated proportion of such intervals (e.g., 95%) would contain
the true value under repeated sampling. They account for sampling variability and express the
uncertainty associated with the estimate.
Correlation: Correlation measures the strength and direction of the linear relationship between two
variables. It is often used to assess the association between variables in biostatistical studies.
Scatter Plots: Scatter plots visualize the relationship between two continuous variables. They help
identify patterns, trends, and the nature of the association between variables.
Formulas for calculating descriptive statistics
1.1. Mean: The mean is the sum of all values in a dataset divided by the number of observations.
Formula: Mean = (x₁ + x₂ + x₃ + ... + xₙ) / n
where x₁, x₂, x₃, ..., xₙ are the individual data points and n is the number of observations.
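The formula can be checked with a minimal R sketch. The vector below contains hypothetical values that are not taken from the exercises. The sketch computes the mean by hand and compares it with R's built-in mean():

```r
# Hypothetical example values (not taken from the exercises)
x <- c(4, 8, 15, 16, 23, 42)

# Mean = (x1 + x2 + ... + xn) / n
n <- length(x)
manual_mean <- sum(x) / n

# Compare the hand calculation with R's built-in mean()
builtin_mean <- mean(x)
print(c(manual = manual_mean, builtin = builtin_mean))
```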
1.2. Median: The median is the middle value in an ordered dataset. If the dataset has an odd
number of observations, the median is the middle value. If the dataset has an even number
of observations, the median is the average of the two middle values.
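Formula (position of the median in the ordered data): (n + 1) / 2
where n is the number of observations. For odd n this position is a whole number. For even n it falls
halfway between the two middle values, which are then averaged.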
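The position rule can be sketched in R as a small helper. The function name median_by_position() and the example vectors are hypothetical and chosen only for illustration. The helper sorts the data, locates position (n + 1) / 2, and averages the two neighbouring values when that position is not a whole number:

```r
# Hypothetical helper: median via the (n + 1) / 2 position rule
median_by_position <- function(x) {
  x <- sort(x)
  n <- length(x)
  pos <- (n + 1) / 2                # position of the median in the ordered data
  # For odd n, floor(pos) == ceiling(pos), so this returns the middle value;
  # for even n, it averages the two middle values
  (x[floor(pos)] + x[ceiling(pos)]) / 2
}

odd_x  <- c(7, 1, 5, 3, 9)        # n = 5, position 3
even_x <- c(7, 1, 5, 3, 9, 11)    # n = 6, position 3.5

print(median_by_position(odd_x))
print(median_by_position(even_x))
```

In practice R's median() does the same job. The helper only makes the position rule explicit.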
1.3. Mode: The mode is a measure of central tendency that represents the most frequently
occurring value in a dataset. It is the value that has the highest frequency or probability density.
To calculate the mode in biostatistics, you can use the following concept and formula:


Concept: The mode corresponds to the value in the dataset that occurs with the highest frequency. It
represents the peak or most common value in the distribution.
Method: For a discrete dataset, the mode is found by identifying the value with the highest frequency.
If two or more values share the highest frequency, the dataset is bimodal (two modes) or, more
generally, multimodal.
For a continuous dataset, the mode can be estimated by finding the peak of the probability density
function (PDF) or the highest point on the histogram.
It's important to note that not all datasets have a mode. Some datasets may have a uniform
distribution where each value occurs with equal frequency, resulting in no distinct mode.
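R has no built-in function for the statistical mode. Note that R's mode() reports the storage type of an object, not the most frequent value. A minimal sketch of the rules above follows. The helper name find_modes() is hypothetical, and it assumes numeric data. It returns every value tied for the highest frequency, or NULL when all values occur equally often:

```r
# Hypothetical helper: all most-frequent values (assumes numeric data)
find_modes <- function(x) {
  counts <- table(x)
  # If all distinct values occur equally often, there is no distinct mode
  if (length(counts) > 1 && all(counts == max(counts))) {
    return(NULL)
  }
  # table() stores the values as character names, so convert them back to numbers
  as.numeric(names(counts)[counts == max(counts)])
}

print(find_modes(c(1, 2, 2, 3, 3, 4)))  # bimodal example
print(find_modes(c(5, 6, 7)))           # every value occurs once: no mode
```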
1.4. Percentiles: Percentiles divide a dataset into hundredths and provide information about the
relative position of a particular value within the dataset.
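Formula (position of the Pth percentile in the ordered data): (P/100) * (n + 1)
where P is the desired percentile and n is the number of observations. If the position is not a whole
number, interpolate between the two neighbouring ordered values. Statistical software may use
slightly different conventions.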
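The position rule can be sketched in R with linear interpolation. The helper name percentile_by_position() and the data are hypothetical. As far as we know, R's quantile() implements this (n + 1) rule as type = 6, while its default (type = 7) uses a slightly different rule. The sketch compares against type = 6:

```r
# Hypothetical helper: P-th percentile via position = (P/100) * (n + 1)
percentile_by_position <- function(x, P) {
  x <- sort(x)
  n <- length(x)
  pos <- (P / 100) * (n + 1)
  # Positions outside 1..n are clamped to the smallest/largest observation
  if (pos <= 1) return(x[1])
  if (pos >= n) return(x[n])
  lower <- floor(pos)
  frac  <- pos - lower
  # Linear interpolation between the two neighbouring ordered values
  x[lower] + frac * (x[lower + 1] - x[lower])
}

x <- c(2, 4, 6, 8, 10, 12, 14, 16, 18, 20)
p25 <- percentile_by_position(x, 25)   # position 0.25 * 11 = 2.75
print(p25)
print(quantile(x, probs = 0.25, type = 6))
```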
1.5. Variance: Variance measures the variability or dispersion of a dataset. It quantifies how
spread out the data points are from the mean.
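Formula (population variance): Variance = Σ(xi - μ)² / N
where xi represents each data point, μ is the population mean, and N is the number of observations
in the population. For a sample, the variance is s² = Σ(xi - x̄)² / (n - 1), where x̄ is the sample
mean and n is the sample size. R's var() uses this n - 1 version.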
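The difference between dividing by n and by n - 1 can be seen in a short R sketch. The data are hypothetical. The sketch computes both versions by hand and shows that R's var() matches the n - 1 (sample) version:

```r
# Hypothetical example values
x <- c(2, 4, 4, 4, 5, 5, 7, 9)
n <- length(x)
dev_sq <- (x - mean(x))^2           # squared deviations from the mean

pop_var    <- sum(dev_sq) / n       # population formula: divide by n
sample_var <- sum(dev_sq) / (n - 1) # sample formula: divide by n - 1

print(c(population = pop_var, sample = sample_var, r_var = var(x)))
```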
1.6. Covariance: Covariance measures the directional relationship between two variables. It
indicates whether the variables move together (positive covariance) or in opposite directions
(negative covariance).
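Formula (population covariance): Cov(X,Y) = Σ((xi - μx) * (yi - μy)) / N
where xi and yi are paired data points, μx and μy are the means of the respective variables, and N is
the number of pairs. The sample covariance uses the sample means x̄ and ȳ and divides by n - 1
instead. This is what R's cov() computes.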
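A minimal R sketch of the covariance formula follows, using hypothetical paired data. It computes the population (divide by n) and sample (divide by n - 1) versions and compares them with R's cov():

```r
# Hypothetical paired observations
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 5)
n <- length(x)

# Products of paired deviations from each variable's mean
cross <- (x - mean(x)) * (y - mean(y))

pop_cov    <- sum(cross) / n        # population formula
sample_cov <- sum(cross) / (n - 1)  # sample formula, as used by cov()
print(c(population = pop_cov, sample = sample_cov, r_cov = cov(x, y)))
```

A positive result here indicates that x and y tend to move in the same direction.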
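1.7. Standard Deviation: The standard deviation is the square root of the variance. It describes the
typical distance between the data points and the mean, in the same units as the data.
Formula (population): Standard Deviation = √(Σ(xi - μ)² / N). For a sample, s = √(Σ(xi - x̄)² / (n - 1)),
which is what R's sd() computes.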
1.8. Z-Score: The z-score measures the number of standard deviations an observation is from the
mean. It is used to standardize values and compare them across different distributions.
Formula: Z = (x - μ) / σ
where x is the individual data point, μ is the mean, and σ is the standard deviation.
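In R, z-scores can be computed directly from the formula or with the built-in scale() function. The sketch below uses hypothetical data. Note that it uses sd(), the sample standard deviation, in place of σ, as is usual when the population value is unknown:

```r
# Hypothetical example values
x <- c(50, 60, 70, 80, 90)

# z = (x - mean) / SD, using the sample SD from sd()
z_manual <- (x - mean(x)) / sd(x)

# scale() centres and divides by sd() by default; as.vector() drops its attributes
z_scale <- as.vector(scale(x))
print(round(z_manual, 3))
```

Standardized values always have mean 0 and standard deviation 1, which is what makes them comparable across distributions.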
1.9. Confidence Intervals: A confidence interval (CI) gives a range of plausible values for a population
parameter. The formula for constructing a confidence interval depends on the distribution of the
data and the desired level of confidence (e.g., 95%, 99%).
Formula (for a sample mean with known population standard deviation): CI = x̄ ± (Z * σ / √n)
where x̄ is the sample mean, Z is the Z-score corresponding to the desired confidence level, σ is the
population standard deviation, and n is the sample size.
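The known-σ formula translates directly into R, with qnorm() supplying the Z value for the chosen confidence level. All inputs below (sample mean 120, σ = 15, n = 36) are hypothetical and serve only to illustrate the calculation:

```r
# Hypothetical inputs (not from the exercises)
x_bar <- 120   # sample mean
sigma <- 15    # known population standard deviation
n     <- 36    # sample size
conf  <- 0.95  # confidence level

# Two-sided critical value, about 1.96 for 95%
z <- qnorm(1 - (1 - conf) / 2)

# CI = x_bar +/- Z * sigma / sqrt(n)
margin <- z * sigma / sqrt(n)
ci <- c(lower = x_bar - margin, upper = x_bar + margin)
print(ci)
```

When σ is unknown and estimated from the sample, the usual approach replaces qnorm() with qt() on n - 1 degrees of freedom.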


1.10. Correlation Coefficient (Pearson's Correlation): The correlation coefficient measures the
strength and direction of the linear relationship between two variables. The value ranges
from -1 to +1, where -1 indicates a perfect negative correlation, +1 indicates a perfect positive
correlation, and 0 indicates no linear correlation.
Formula: r = Σ((xi - x̄) * (yi - ȳ)) / √(Σ(xi - x̄)² * Σ(yi - ȳ)²)
where xi and yi are paired data points, and x̄ and ȳ are the means of the respective variables.
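Pearson's formula can be evaluated term by term in R and checked against cor(). The paired values below are hypothetical:

```r
# Hypothetical paired observations
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 5)

# Deviations from each variable's mean
dx <- x - mean(x)
dy <- y - mean(y)

# r = sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))
r_manual <- sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))
print(c(manual = r_manual, builtin = cor(x, y)))
```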
1.11. Discrete Probability Distribution: Binomial Distribution
Concept:
The binomial distribution describes the probability of obtaining a specific number of successes in a
fixed number of independent Bernoulli trials.
Formula:
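P(X = x) = (nCx) * p^x * (1 - p)^(n - x)
where n is the number of trials, x is the number of successes, p is the probability of success on each
trial, and nCx = n! / (x! * (n - x)!) is the number of ways to choose x successes from n trials.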
R Code Example:
# Required library
library(ggplot2)

# Example: probability that a drug is effective in x out of 10 patients (e.g., 7 out of 10)
# Parameters
n <- 10 # Number of trials (patients)
p <- 0.5 # Probability of success

# Probability calculation (x = number of patients in whom the drug is effective)

x <- 0:n
prob <- dbinom(x, size = n, prob = p)

# Bar plot
binom_plot <- ggplot(data.frame(x, prob), aes(x, prob)) +
geom_bar(stat = "identity", fill = "blue") +
labs(x = "Number of Successes", y = "Probability") +
ggtitle("Binomial Distribution") +
theme_minimal()

binom_plot
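To see that dbinom() implements the formula above, the sketch below evaluates the formula directly with choose(), using the same n = 10 and p = 0.5 as the example. The probabilities for all outcomes sum to 1:

```r
n <- 10   # number of trials
p <- 0.5  # probability of success on each trial
x <- 0:n  # every possible number of successes

# P(X = x) = nCx * p^x * (1 - p)^(n - x)
manual <- choose(n, x) * p^x * (1 - p)^(n - x)
print(round(manual, 4))

# Element 8 corresponds to x = 7 (7 successes out of 10)
print(manual[8])
```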


1.12. Continuous Probability Distribution: Normal Distribution

Concept:
The normal distribution is a continuous probability distribution with a symmetric bell-shaped curve.
Formula:
PDF: f(x) = (1 / (σ * √(2π))) * exp(-(x - μ)^2 / (2σ^2))
R Code Example:
# Required library
library(ggplot2)

# Parameters
mu <- 0 # Mean
sigma <- 1 # Standard deviation

# Probability calculation
x <- seq(-4, 4, by = 0.1)
density <- dnorm(x, mean = mu, sd = sigma)

# Density plot
normal_plot <- ggplot(data.frame(x, density), aes(x, density)) +
geom_line(color = "blue") +
labs(x = "x", y = "Density") +
ggtitle("Normal Distribution") +
theme_minimal()

normal_plot
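As a quick check of the PDF formula, the sketch below evaluates it by hand with the same μ = 0 and σ = 1 as the example above and compares the result with dnorm(). It also uses pnorm() to show that about 68% of the probability lies within one standard deviation of the mean:

```r
mu <- 0
sigma <- 1
x <- seq(-4, 4, by = 0.1)

# f(x) = 1 / (sigma * sqrt(2 * pi)) * exp(-(x - mu)^2 / (2 * sigma^2))
manual_pdf <- (1 / (sigma * sqrt(2 * pi))) * exp(-(x - mu)^2 / (2 * sigma^2))
print(max(abs(manual_pdf - dnorm(x, mean = mu, sd = sigma))))

# Area under the curve between mu - sigma and mu + sigma
within_1sd <- pnorm(mu + sigma, mu, sigma) - pnorm(mu - sigma, mu, sigma)
print(within_1sd)
```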

1.13. Poisson Distribution

Concept:
The Poisson distribution models the probability of a certain number of events occurring within a fixed
interval of time or space.
Formula:
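P(X = x) = (exp(-λ) * λ^x) / x!
where λ is the average number of events per interval and x = 0, 1, 2, ... is the number of events.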


R Code Example:
# Required library
library(ggplot2)
# A probability distribution describes how the values of a dataset are spread over the possible outcomes

# Parameter
lambda <- 3 # Average rate of events

# Probability calculation
x <- 0:10
prob <- dpois(x, lambda)

# Bar plot
poisson_plot <- ggplot(data.frame(x, prob), aes(x, prob)) +
geom_bar(stat = "identity", fill = "blue") +
labs(x = "Number of Events", y = "Probability") +
ggtitle("Poisson Distribution") +
theme_minimal()

poisson_plot
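The Poisson formula can also be evaluated directly and compared with dpois(). The sketch uses the same λ = 3 and x = 0 to 10 as the example above. Because the Poisson distribution has no upper limit on x, these eleven probabilities sum to slightly less than 1. Their total equals ppois(10, lambda):

```r
lambda <- 3
x <- 0:10

# P(X = x) = exp(-lambda) * lambda^x / x!
manual <- exp(-lambda) * lambda^x / factorial(x)
print(round(manual, 4))

# Total probability for x = 0..10 (just under 1)
print(sum(manual))
```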

Note: These R code examples demonstrate how to calculate and visualize probability distributions in
biostatistics using dbinom(), dnorm() and dpois() from R's built-in stats package, with ggplot2 for
plotting. The plots show how likely each outcome is under each distribution.
_______________________________
____________Problems___________
Calculation of descriptive statistics
Problem 1:
A researcher is studying the heights of a sample of 51 individuals. The heights (in centimeters) are as
follows: 165, 170, 168, 172, 160, 175, 163, 169, 171, 166, 173, 167, 169, 160, 174, 168, 172, 167,
165, 170, 169, 171, 167, 170, 175, 170, 168, 165, 172, 166, 170, 171, 168, 173, 165, 172, 169, 160,
171, 173, 167, 172, 170, 169, 165, 168, 173, 166, 170, 174, 168.
Compute the mean, median, mode, range, variance, and standard deviation of the heights.

Suggestions: (The exam is written in essay format, so practice writing out the R commands by hand.)

• Mean: Add up all the heights and divide by the number of values listed (51 in this case).
• Median: Arrange the heights in ascending order and find the middle value. If there's an even
number of values, take the average of the two middle values.
• Mode: Identify the height(s) that appear(s) most frequently in the data.
• Range: Find the difference between the maximum and minimum heights.
• Variance: Calculate the average squared deviation from the mean. It measures the spread of
data.
• Standard Deviation: Take the square root of the variance. It provides a measure of the average
distance between each data point and the mean.
Problem 2:
A study examined the blood pressure readings (in mmHg) of 30 participants. The blood pressure
values are as follows: 120, 118, 122, 124, 130, 126, 128, 124, 120, 122, 124, 126, 128, 130, 132, 134,
136, 138, 140, 142, 144, 146, 148, 150, 152, 154, 156, 158, 160, 162.
Calculate the five-number summary (minimum, lower quartile, median, upper quartile, maximum)
and construct a box plot of the blood pressure data.
Suggestions:
• Five-Number Summary: The minimum value, lower quartile (25th percentile), median (50th
percentile), upper quartile (75th percentile), and maximum value.
• Box Plot: Construct a graphical representation that displays the five-number summary. It helps
visualize the distribution of the data, including outliers and skewness.
Problem 3:
A researcher is investigating the enzyme activity levels of a sample of 25 specimens. The enzyme
activity values (in units per minute) are as follows: 10, 12, 15, 8, 14, 9, 13, 11, 16, 12, 10, 11, 13, 15,
9, 8, 14, 12, 16, 11, 13, 10, 9, 14, 12.
Calculate the mean, median, and range of the enzyme activity levels. Also, compute the interquartile
range (IQR), which covers the middle 50% of the data (from the first quartile Q1 to the third quartile
Q3), and construct a box plot.
Suggestions:
• Interquartile Range (IQR): The difference between the upper quartile and the lower quartile.
It represents the spread of the middle 50% of the data.
• Box Plot: Similar to Problem 2, construct a box plot to visualize the data and observe any
potential outliers.
Problem 4:
A study measures the body mass index (BMI) of 40 participants. The BMI values are as follows: 22.5,
24.8, 25.2, 26.7, 27.1, 28.3, 29.6, 30.2, 31.4, 32.0, 25.9, 27.3, 28.1, 29.4, 30.8, 32.2, 33.0, 34.5, 35.1,
36.7, 25.1, 26.7, 27.5, 29.0, 30.4, 31.9, 33.1, 34.2, 35.7, 37.2, 26.1, 27.7, 28.9, 30.3, 31.6, 33.4, 34.8,
36.2, 37.9, 39.0.
Compute the mean, median, and standard deviation of the BMI values. Also, determine the z-score
for an individual with a BMI of 32.8.


Suggestions:
• Mean: Calculate the average of the BMI values.
• Median: Find the middle value when the BMI values are arranged in ascending order.
• Standard Deviation: Measure the spread of the BMI values around the mean.
• Z-Score: Determine the standardized value by subtracting the mean from an individual's BMI
and dividing by the standard deviation. It indicates how many standard deviations an
individual's BMI is away from the mean.
Problem 5:
A researcher measures the reaction times (in milliseconds) of a sample of 20 participants. The reaction
times are as follows: 250, 260, 255, 270, 275, 280, 290, 295, 305, 310, 320, 315, 300, 280, 275, 270,
265, 255, 250, 245.
Calculate the mean, median, and variance of the reaction times. Additionally, compute the coefficient
of variation (CV) and interpret its meaning in the context of the data.
Suggestions:
• Mean: Calculate the average of the reaction times.
• Median: Find the middle value when the reaction times are arranged in ascending order.
• Variance: Measure the spread of the reaction times around the mean.
• Coefficient of Variation (CV): Divide the standard deviation by the mean and multiply by 100.
It represents the relative variability of the data, allowing comparison between datasets with
different units of measurement.
Note: In each of these problems, you would apply various descriptive statistics measures to
summarize and analyze the given data. It's important to understand the context of the data and
interpret the results accordingly. Descriptive statistics provide summary measures that help describe
the central tendency, variability, and shape of the data distribution. These measures include mean,
median, mode, range, variance, standard deviation, quartiles, box plots, z-scores, and coefficient of
variation. Make sure to use the appropriate formulas and techniques to calculate these statistics
accurately.
Solutions
Problem 1:
# Heights data
heights <- c(165, 170, 168, 172, 160, 175, 163, 169, 171, 166, 173, 167,
169, 160, 174, 168, 172, 167, 165, 170, 169, 171, 167, 170, 175, 170,
168, 165, 172, 166, 170, 171, 168, 173, 165, 172, 169, 160, 171, 173,
167, 172, 170, 169, 165, 168, 173, 166, 170, 174, 168)

# Mean
mean_height <- mean(heights)

# Median
median_height <- median(heights)


# Mode: the most frequent height (if several values tie, only the first is returned)
mode_height <- unique(heights)[which.max(tabulate(match(heights, unique(heights))))]

# Range
range_height <- max(heights) - min(heights)

# Variance
var_height <- var(heights)

# Standard Deviation
sd_height <- sd(heights)

# Visualize the heights data

hist(heights, main = "Height Distribution", xlab = "Height", ylab =
"Frequency")

Problem 2:
# Blood pressure data
blood_pressure <- c(120, 118, 122, 124, 130, 126, 128, 124, 120, 122,
124, 126, 128, 130, 132, 134, 136, 138, 140, 142, 144, 146, 148, 150,
152, 154, 156, 158, 160, 162)

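# Five-Number Summary (Tukey): minimum, lower hinge, median, upper hinge, maximum
five_num <- fivenum(blood_pressure)

# summary() also reports the mean and uses R's default quantile rule for Q1 and Q3
summary_stats <- summary(blood_pressure)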

# Box Plot
boxplot(blood_pressure, main = "Blood Pressure Box Plot")

Problem 3:
# Enzyme activity data
enzyme_activity <- c(10, 12, 15, 8, 14, 9, 13, 11, 16, 12, 10, 11, 13,
15, 9, 8, 14, 12, 16, 11, 13, 10, 9, 14, 12)

# Mean
mean_activity <- mean(enzyme_activity)


# Median
median_activity <- median(enzyme_activity)

# Range
range_activity <- max(enzyme_activity) - min(enzyme_activity)

# Interquartile Range (IQR)

iqr_activity <- IQR(enzyme_activity)

# Box Plot
boxplot(enzyme_activity, main = "Enzyme Activity Box Plot")

Problem 4:
# BMI data
bmi <- c(22.5, 24.8, 25.2, 26.7, 27.1, 28.3, 29.6, 30.2, 31.4, 32.0,
25.9, 27.3, 28.1, 29.4, 30.8, 32.2, 33.0, 34.5, 35.1, 36.7, 25.1, 26.7,
27.5, 29.0, 30.4, 31.9, 33.1, 34.2, 35.7, 37.2, 26.1, 27.7, 28.9, 30.3,
31.6, 33.4, 34.8, 36.2, 37.9, 39.0)

# Mean
mean_bmi <- mean(bmi)

# Median
median_bmi <- median(bmi)

# Standard Deviation
sd_bmi <- sd(bmi)

# Z-Score
individual_bmi <- 32.8
z_score <- (individual_bmi - mean_bmi) / sd_bmi

# Visualize the BMI data

hist(bmi, main = "BMI Distribution", xlab = "BMI", ylab = "Frequency")

Problem 5:
# Reaction times data
reaction_times <- c(250, 260, 255, 270, 275, 280, 290, 295, 305, 310,
320, 315, 300, 280, 275, 270, 265, 255, 250, 245)

# Mean
mean_reaction <- mean(reaction_times)

# Median
median_reaction <- median(reaction_times)

# Variance
var_reaction <- var(reaction_times)

# Coefficient of Variation (CV)

cv_reaction <- (sd(reaction_times) / mean_reaction) * 100

# Visualize the reaction times data

hist(reaction_times, main = "Reaction Times Distribution", xlab =
"Reaction Time", ylab = "Frequency")

Calculation of distribution and probability

Problem 6: Body Mass Index (BMI)
The distribution of BMI values in a population follows a normal distribution with a mean of 25 and
a standard deviation of 3. Suppose we want to find the probability that a randomly selected individual
has a BMI greater than 30.
Solution:
We need to calculate the area under the normal distribution curve to the right of BMI = 30. First,
standardize the value using the z-score formula:
Z = (x - μ) / σ
Z = (30 - 25) / 3 ≈ 1.67
Next, find the probability from the standard normal distribution table (or with statistical software):
P(BMI > 30) = P(Z > 1.67) = 1 - P(Z ≤ 1.67) ≈ 1 - 0.9525 = 0.0475
Therefore, the probability that a randomly selected individual has a BMI greater than 30 is
approximately 4.8% (without rounding Z, the value is about 0.0478).
R codes
# Required libraries
library(ggplot2)
library(patchwork)

# Parameters
mean_bmi <- 25
sd_bmi <- 3

# Probability calculation: P(BMI > 30) = 1 - P(BMI <= 30), using pnorm() for the lower tail
prob_bmi <- 1 - pnorm(30, mean = mean_bmi, sd = sd_bmi)

# Equivalent on the z scale: 1 - pnorm((30 - mean_bmi) / sd_bmi)

# Visualization
x <- seq(10, 40, by = 0.1)
density <- dnorm(x, mean = mean_bmi, sd = sd_bmi)

# Density plot
density_plot <- ggplot(data.frame(x), aes(x)) +
geom_line(aes(y = density), color = "blue") +
geom_area(aes(y = density, fill = (x >= 30)), alpha = 0.3) +
labs(x = "BMI", y = "Density") +
ggtitle("Normal Distribution of BMI") +
theme_minimal()

# Probability plot: share of the population with BMI > 30 versus BMI <= 30
prob_df <- data.frame(outcome = c("BMI > 30", "BMI <= 30"),
                      prob = c(prob_bmi, 1 - prob_bmi))
prob_plot <- ggplot(prob_df, aes(x = "", y = prob, fill = outcome)) +
  geom_bar(stat = "identity") +
  coord_polar(theta = "y") +
  labs(x = "", y = "Probability") +
  ggtitle("Probability of BMI > 30") +
  theme_minimal()

# Combine plots
density_plot + prob_plot + plot_layout(ncol = 2)

Note: This code calculates the probability of BMI > 30 using the pnorm() function. It visualizes the
normal distribution of BMI as a density plot and shows the probability as a bar chart drawn in polar
coordinates.

Problem 7: Drug Dosage

The distribution of blood plasma concentrations of a drug in a population follows a normal
distribution with a mean of 100 mg/L and a standard deviation of 10 mg/L. If the desired therapeutic
range for the drug concentration is between 90 mg/L and 110 mg/L, what percentage of the population
falls within this range?
Solution:
To find the percentage of the population within the desired therapeutic range, we need to calculate
the area under the normal distribution curve between 90 mg/L and 110 mg/L.
First, we need to standardize the values using the z-score formula:
Z1 = (90 - 100) / 10 = -1.0
Z2 = (110 - 100) / 10 = 1.0
Now, we can find the probability using the z-scores and the standard normal distribution table:
P(90 ≤ X ≤ 110) = P(-1.0 ≤ Z ≤ 1.0)
From the table, we find that P(-1.0 ≤ Z ≤ 1.0) is approximately 0.6826 or 68.26%.
Therefore, approximately 68.26% of the population falls within the desired therapeutic range of 90
mg/L to 110 mg/L for the drug concentration.
R codes:
# Required libraries
library(ggplot2)
library(patchwork) # needed for plot_layout()

# Parameters
mean_concentration <- 100 # Mean (mg/L)
sd_concentration <- 10 # Standard deviation (mg/L)

# Probability calculation
prob_range <- diff(pnorm(c(90, 110), mean = mean_concentration, sd =
sd_concentration))

# Visualization
x <- seq(70, 130, by = 0.1)
density <- dnorm(x, mean = mean_concentration, sd = sd_concentration)

# Density plot
density_plot <- ggplot(data.frame(x), aes(x)) +
geom_line(aes(y = density), color = "blue") +
geom_area(aes(y = density, fill = (x >= 90 & x <= 110)), alpha = 0.3) +
labs(x = "Concentration (mg/L)", y = "Density") +
ggtitle("Normal Distribution of Drug Concentration") +
theme_minimal()

# Probability plot: share of the population inside versus outside 90-110 mg/L
prob_df <- data.frame(outcome = c("90-110 mg/L", "Outside range"),
                      prob = c(prob_range, 1 - prob_range))
prob_plot <- ggplot(prob_df, aes(x = "", y = prob, fill = outcome)) +
  geom_bar(stat = "identity") +
  coord_polar(theta = "y") +
  labs(x = "", y = "Probability") +
  ggtitle("Probability of Concentration in Range (90-110)") +
  theme_minimal()

# Combine plots
density_plot + prob_plot + plot_layout(ncol = 2)

Note: This code calculates the probability of the drug concentration falling within the range of 90
mg/L to 110 mg/L using the pnorm function and visualizes the normal distribution of the drug
concentration along with the probability as a density plot and a bar plot.

These examples illustrate how normal distribution and probability concepts can be applied in
biostatistics to solve problems related to various variables and parameters of interest, such as
BMI and drug dosage.
The R code examples demonstrate how to solve the given problems using appropriate functions
and visualize the data using ggplot2 for clear and informative plots.

Problem 8: Binomial Distribution

A drug is known to cure a certain disease in 80% of cases. If a doctor treats 10 patients with the drug,
what is the probability that exactly 7 patients will be cured?
Solution:
This problem follows a binomial distribution with parameters n = 10 (number of patients) and p = 0.8
(probability of success, i.e., being cured). We need to calculate P(X = 7), where X represents the
number of cured patients. Using the binomial probability formula:
P(X = x) = (nCx) * p^x * (1 - p)^(n - x)
P(X = 7) = (10C7) * 0.8^7 * (1 - 0.8)^(10 - 7) = 120 * 0.2097 * 0.008 ≈ 0.2013
R codes:
# Required library
library(ggplot2)

# Parameters
n <- 10 # Number of patients
p <- 0.8 # Probability of success (cured patients)

# Probability calculation
x <- 7
prob <- dbinom(x, size = n, prob = p)

# Print the probability

print(prob)

# Bar plot
binom_plot <- ggplot(data.frame(x, prob), aes(x, prob)) +
geom_bar(stat = "identity", fill = "blue") +
labs(x = "Number of Cured Patients", y = "Probability") +
ggtitle("Binomial Distribution") +
theme_minimal()

binom_plot


Problem 9: In a clinical trial, the success rate of a new treatment for a specific disease is 60%. If 100
patients are treated with the new drug, what is the probability that at least 70 patients will respond
positively to the treatment?
Solution:
This problem follows a binomial distribution with parameters n = 100 (number of patients) and p =
0.6 (probability of success, i.e., positive response). We need to calculate P(X >= 70), where X
represents the number of patients responding positively. Using the binomial cumulative probability
function
P(X >= x) = 1 - P(X < x)
P(X >= 70) = 1 - P(X < 70)
= 1 - sum(dbinom(0:69, size = 100, prob = 0.6))
R codes:
# Required library
library(ggplot2)

# Parameters
n <- 100 # Number of patients
p <- 0.6 # Probability of success (positive response)

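# Probability calculation: P(X >= 70) = 1 - P(X <= 69)
prob <- 1 - sum(dbinom(0:69, size = n, prob = p))

# Print the probability

print(prob)

# Bar plot of the whole distribution, highlighting the outcomes X >= 70
x <- 0:n
binom_df <- data.frame(x = x, prob_x = dbinom(x, size = n, prob = p), at_least_70 = x >= 70)
binom_plot <- ggplot(binom_df, aes(x, prob_x, fill = at_least_70)) +
  geom_bar(stat = "identity") +
  labs(x = "Number of Positive Responses", y = "Probability", fill = "X >= 70") +
  ggtitle("Binomial Distribution") +
  theme_minimal()

binom_plot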

Problem 10:
In a population, the prevalence of a certain disease is 10%. A diagnostic test for the disease has a
sensitivity of 80% and a specificity of 90%. If a randomly selected individual tests positive for the
disease, what is the probability that the individual actually has the disease?
Solution:
This problem requires applying conditional probability and Bayes' theorem. Let's denote the
following:
D: The individual has the disease (event D)
T: The individual tests positive for the disease (event T)
We need to calculate P(D | T), i.e., the probability that the individual has the disease given that they
tested positive. Using Bayes' theorem:
P(D | T) = (P(T | D) * P(D)) / P(T)
P(T | D) = Sensitivity = 0.80
P(D) = Prevalence = 0.10
P(T) = P(T | D) * P(D) + P(T | D') * P(D')
P(T | D') = 1 - Specificity = 1 - 0.90 = 0.10
P(D') = 1 - P(D) = 0.90
Plugging in the values: P(D | T) = (0.80 * 0.10) / ((0.80 * 0.10) + (0.10 * 0.90)) = 0.08 / 0.17 ≈ 0.471
Therefore, only about 47% of the individuals who test positive actually have the disease.
R codes:
# Parameters
prevalence <- 0.10
sensitivity <- 0.80
specificity <- 0.90

# Probability calculation: positive predictive value P(D | T) by Bayes' theorem
ppv <- (sensitivity * prevalence) / ((sensitivity * prevalence) +
  (1 - specificity) * (1 - prevalence))

# Print the probability

print(ppv)

Problem 11: Poisson Distribution

The number of bacteria in a water sample follows a Poisson distribution with an average rate of 5
bacteria per milliliter. What is the probability that there are exactly 3 bacteria in a 1-milliliter sample?
Solution:
This problem follows a Poisson distribution with parameter λ = 5 (average rate of bacteria per
milliliter). We need to calculate P(X = 3), where X represents the number of bacteria in a 1-milliliter
sample. Using the Poisson probability formula
P(X = x) = (exp(-λ) * λ^x) / x!
P(X = 3) = (exp(-5) * 5^3) / 3! = (0.006738 * 125) / 6 ≈ 0.1404
Therefore, the probability that there are exactly 3 bacteria in a 1-milliliter sample is approximately
0.1404.
R codes:
# Required library
library(ggplot2)

# Parameter
lambda <- 5 # Average rate of bacteria per milliliter

# Probability calculation
x <- 3
prob <- dpois(x, lambda)

# Print the probability

print(prob)

# Bar plot
poisson_plot <- ggplot(data.frame(x, prob), aes(x, prob)) +
geom_bar(stat = "identity", fill = "blue") +
labs(x = "Number of Bacteria", y = "Probability") +
ggtitle("Poisson Distribution") +
theme_minimal()

poisson_plot

Problem 12:
The number of heart attacks occurring in a particular city follows a Poisson distribution with an
average rate of 2 heart attacks per day. What is the probability that there are more than 3 heart attacks
in a given day?
Solution:
This problem follows a Poisson distribution with parameter λ = 2 (average rate of heart attacks per
day). We need to calculate P(X > 3), where X represents the number of heart attacks in a day. Using
the Poisson cumulative probability function
P(X > x) = 1 - P(X <= x)
P(X > 3) = 1 - sum(dpois(0:3, lambda = 2))
= 1 - (0.1353 + 0.2707 + 0.2707 + 0.1804) ≈ 0.1429
Therefore, the probability of more than 3 heart attacks in a given day is approximately 0.1429.

R codes
# Required library
library(ggplot2)

# Parameter
lambda <- 2 # Average rate of heart attacks per day

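# Probability calculation: P(X > 3) = 1 - P(X <= 3)
prob <- 1 - sum(dpois(0:3, lambda = lambda))

# Print the probability

print(prob)

# Bar plot of P(X = x) for x = 0..10, highlighting the outcomes X > 3
x <- 0:10
poisson_df <- data.frame(x = x, prob_x = dpois(x, lambda = lambda), more_than_3 = x > 3)
poisson_plot <- ggplot(poisson_df, aes(x, prob_x, fill = more_than_3)) +
  geom_bar(stat = "identity") +
  labs(x = "Number of Heart Attacks", y = "Probability", fill = "X > 3") +
  ggtitle("Poisson Distribution") +
  theme_minimal()

poisson_plot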

Problem 14: Probability

In a population, the prevalence of a certain genetic disorder is 0.02. If a random individual is selected,
what is the probability that they have the disorder?
Solution:
The problem involves calculating the probability of an event occurring, given the prevalence of the
disorder. Because the individual is selected at random, this probability equals the prevalence:
P(Having the disorder) = 0.02
Therefore, the probability that a random individual has the genetic disorder is 0.02 (2%).
R codes:
# Probability calculation
probability <- 0.02


# Print the probability

print(probability)

Problem 15:
In a population, the prevalence of a certain genetic mutation is 0.05. If two individuals are randomly
selected, what is the probability that both individuals have the mutation?
Solution:
The problem involves calculating the probability that both events occur (each individual having the
mutation). Assuming the two individuals are selected independently, the probabilities multiply:
P(Both have the mutation) = 0.05 * 0.05 = 0.0025.
R codes:
# Probability calculation
probability <- 0.05 * 0.05

# Print the probability

print(probability)
Problem 16:
A diagnostic test for a certain disease has a false positive rate of 5% and a false negative rate of 10%.
The prevalence of the disease in the population is 10%. If a randomly selected individual tests positive
for the disease, what is the probability that the individual does not have the disease?
Solution:
This problem requires applying conditional probability (Bayes' theorem). Let's denote the following:
D: The individual has the disease (event D)
T: The individual tests positive for the disease (event T)
We need to calculate P(D' | T), i.e., the probability that the individual does not have the disease given
that they tested positive.
P(D' | T) = (P(T | D') * P(D')) / P(T)
P(T | D') = False Positive Rate = 0.05
P(D') = 1 - Prevalence = 1 - 0.10 = 0.90
P(T) = P(T | D) * P(D) + P(T | D') * P(D')
P(T | D) = 1 - False Negative Rate = 1 - 0.10 = 0.90
P(D) = Prevalence = 0.10
Plugging in the values: P(D' | T) = (0.05 * 0.90) / ((0.90 * 0.10) + (0.05 * 0.90)) = 0.045 / 0.135 ≈ 0.333
R codes:
# Parameters
prevalence <- 0.10


false_positive <- 0.05

false_negative <- 0.10

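# Probability calculation: P(D' | T) by Bayes' theorem
# P(T | D') = false_positive; P(T | D) = 1 - false_negative (the sensitivity)
p_not_disease <- (false_positive * (1 - prevalence)) /
  ((false_positive * (1 - prevalence)) + ((1 - false_negative) * prevalence))

# Print the probability

print(p_not_disease)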


2. Statistical analysis
1. Student's t-test: The t-test is a parametric test that compares the means of two groups. It
calculates a t-value, which represents the difference between the means relative to the
variability within the groups. The t-value is compared to a critical value from the t-distribution
to determine if the difference is statistically significant. The independent samples t-test is used
when the two groups are independent, while the paired samples t-test is used when the samples
are related or matched.
a. Problem 1
A researcher wants to compare the effectiveness of two different cholesterol-lowering drugs (Drug A
and Drug B). They randomly assign 30 participants to receive either Drug A or Drug B for a period
of 12 weeks. After the treatment, they measure the participants' cholesterol levels. The data are as
follows:
Drug A: 180, 195, 200, 185, 190, 205, 195, 180, 200, 195, 190, 185, 200
Drug B: 175, 185, 190, 165, 170, 180, 185, 170, 190, 195, 180, 175, 185
Is there a significant difference between the mean cholesterol levels of the two drugs? Use a
significance level of 0.05.
Suggestion:
In this problem, the researcher wants to compare the effectiveness of two different cholesterol-
lowering drugs (Drug A and Drug B) by measuring the participants' cholesterol levels. The data
provided for Drug A and Drug B are the cholesterol level measurements for each group. To determine
if there is a significant difference between the mean cholesterol levels of the two drugs, you would
perform an independent samples t-test.
To conduct the t-test, you would calculate the t-value using the formula:
t = (mean of Drug A - mean of Drug B) / sqrt[(variance of Drug A / sample size of Drug A)
+ (variance of Drug B / sample size of Drug B)]
where each variance is the sample variance (the squared standard deviation) of that group. Because
both groups here have the same size, this gives the same t-value as the pooled-variance formula used
by var.equal = TRUE in the R code below.
Once you have calculated the t-value, you would compare its absolute value to the two-sided critical
value from the t-distribution with (n_A + n_B - 2) degrees of freedom at the given significance level
(0.05 in this case). If |t| exceeds the critical value, you would conclude that there is a significant
difference between the mean cholesterol levels of the two drugs.
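The hand calculation described above can be sketched in R before calling t.test(). This is a minimal illustration using the Problem 1 data: it computes the standard error from the two sample variances, forms t, and compares |t| with the two-sided critical value from qt(). With equal group sizes the result matches the t-value reported by t.test(..., var.equal = TRUE).

```r
# Problem 1 data
drug_A <- c(180, 195, 200, 185, 190, 205, 195, 180, 200, 195, 190, 185, 200)
drug_B <- c(175, 185, 190, 165, 170, 180, 185, 170, 190, 195, 180, 175, 185)

n_A <- length(drug_A)
n_B <- length(drug_B)

# var() returns the sample variance (denominator n - 1)
se       <- sqrt(var(drug_A) / n_A + var(drug_B) / n_B)
t_manual <- (mean(drug_A) - mean(drug_B)) / se

# Two-sided critical value with pooled-variance degrees of freedom
alpha  <- 0.05
t_crit <- qt(1 - alpha / 2, df = n_A + n_B - 2)

print(c(t = t_manual, critical = t_crit))
abs(t_manual) > t_crit   # TRUE means reject H0 at alpha = 0.05
```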
R codes:
# Cholesterol data
drug_A <- c(180, 195, 200, 185, 190, 205, 195, 180, 200, 195, 190, 185,
200)
drug_B <- c(175, 185, 190, 165, 170, 180, 185, 170, 190, 195, 180, 175,
185)

# Independent samples t-test
result <- t.test(drug_A, drug_B, alternative = "two.sided", var.equal = TRUE)

# Print the results

print(result)

b. Problem 2
A study aims to investigate the effect of a new exercise program on blood pressure. A sample of 20
participants (10 per group) is randomly assigned to either the exercise group or the control group. After 8 weeks,
their systolic blood pressure readings are recorded. The data are as follows:
Exercise group: 130, 125, 135, 140, 132, 128, 127, 130, 133, 135
Control group: 135, 140, 145, 138, 142, 140, 150, 136, 140, 143
Is there a significant difference in the mean systolic blood pressure between the exercise group and
the control group? Use a significance level of 0.01.
Suggestion:
In this problem, the study aims to investigate the effect of a new exercise program on systolic blood
pressure. The participants are divided into an exercise group and a control group, and their systolic
blood pressure readings are recorded. To determine if there is a significant difference in the mean
systolic blood pressure between the two groups, you would perform an independent samples t-test.
Similar to Problem 1, you would calculate the t-value using the formula mentioned earlier and
compare its absolute value to the two-sided critical value from the t-distribution for the given
significance level (0.01 in this case). If |t| exceeds the critical value, you would conclude that there
is a significant difference in the mean systolic blood pressure between the exercise group and the
control group.
R codes:
# Blood pressure data
exercise_group <- c(130, 125, 135, 140, 132, 128, 127, 130, 133, 135)
control_group <- c(135, 140, 145, 138, 142, 140, 150, 136, 140, 143)

# Independent samples t-test

result <- t.test(exercise_group, control_group, alternative = "two.sided",
var.equal = TRUE)

# Print the results

print(result)
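The critical-value rule and the p-value rule always lead to the same decision. A minimal sketch with the Problem 2 data at the stricter level alpha = 0.01: qt() gives the two-sided critical value for the degrees of freedom reported by t.test(), and conf.level = 1 - alpha (an addition to the original call) makes the reported interval a 99% CI to match the test level.

```r
# Problem 2 data
exercise_group <- c(130, 125, 135, 140, 132, 128, 127, 130, 133, 135)
control_group  <- c(135, 140, 145, 138, 142, 140, 150, 136, 140, 143)

alpha  <- 0.01
# conf.level = 1 - alpha gives a 99% CI to match the test level
result <- t.test(exercise_group, control_group, var.equal = TRUE,
                 conf.level = 1 - alpha)

dof    <- unname(result$parameter)          # n1 + n2 - 2 = 18
t_obs  <- unname(result$statistic)
t_crit <- qt(1 - alpha / 2, df = dof)       # two-sided critical value

# The two decision rules are equivalent
reject_by_t <- abs(t_obs) > t_crit
reject_by_p <- result$p.value < alpha
print(c(t = t_obs, critical = t_crit, p = result$p.value))
print(c(reject_by_t = reject_by_t, reject_by_p = reject_by_p))
```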

c. Problem 3:
A researcher investigates the effect of a new drug on pain relief. They randomly assign 14 patients (7 per group) to
receive either the new drug or a placebo. After a specified period, the patients rate their pain levels
on a scale of 1-10 (higher values indicating more pain). The data are as follows:
New drug: 4, 5, 3, 6, 4, 5, 3
Placebo: 6, 7, 5, 7, 6, 8, 7
Is there a significant difference in the mean pain levels between the patients who received the new
drug and those who received the placebo? Use a significance level of 0.05.
Suggestions:
In this problem, the researcher investigates the effect of a new drug on pain relief. The patients are
randomly assigned to receive either the new drug or a placebo, and they rate their pain levels on a
scale of 1-10. To determine if there is a significant difference in the mean pain levels between the two
groups, you would perform an independent samples t-test.
Using the provided data, you would calculate the t-value and compare its absolute value to the
two-sided critical value from the t-distribution for the given significance level (0.05 in this case). If
|t| exceeds the critical value, you would conclude that there is a significant difference in the mean
pain levels between the patients who received the new drug and those who received the placebo.
These examples illustrate the application of the independent samples t-test in comparing the means
of two groups. By calculating the t-value and comparing it to the critical value, you can determine if
the observed differences in the data are statistically significant or if they could have occurred by
chance. The t-test is a commonly used statistical test for evaluating group differences in various
research studies.
R codes:
# Pain level data
new_drug <- c(4, 5, 3, 6, 4, 5, 3)
placebo <- c(6, 7, 5, 7, 6, 8, 7)

# Independent samples t-test

result <- t.test(new_drug, placebo, alternative = "two.sided", var.equal
= TRUE)

# Print the results

print(result)

Important notes:
• Data Preparation:
Before running the t-test, make sure you have the data properly formatted. In the provided examples,
the data are stored in separate vectors (drug_A, drug_B, exercise_group, control_group, new_drug,
placebo).
Ensure that the data are numerical and correspond to the appropriate groups or conditions you want
to compare.

• t-test Function:
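The t.test() function in R performs one-sample, independent two-sample, and paired t-tests; given two
data vectors and the default paired = FALSE, it performs the independent samples t-test.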
The first argument of the function corresponds to the data for the first group, and the second argument
corresponds to the data for the second group.
The alternative argument specifies the alternative hypothesis and can be set to "two.sided" (default),
"less" (for a lower-tailed test), or "greater" (for an upper-tailed test).
The var.equal argument specifies whether to assume equal variances between the groups. The default,
FALSE, performs Welch's t-test, which does not assume equal variances; setting it to TRUE performs
the classical Student's t-test with a pooled variance estimate.
• Result Interpretation:
The t.test() function returns an object of class "htest" (a list) that includes the t-value, degrees of
freedom, p-value, and confidence interval for the difference in means.
The t-value represents the test statistic, which measures the difference between the sample means
relative to the variability within the groups. A larger absolute t-value indicates stronger evidence
against the null hypothesis of equal means.
The p-value indicates the probability of obtaining the observed difference (or a more extreme
difference) assuming the null hypothesis is true. A p-value below the chosen significance level
indicates statistical significance.
The confidence interval provides a range of plausible values for the true difference in means, with the
chosen level of confidence.
• Result Printing:
The print() function is used to display the result of the t-test.
By default, the output includes the t-value, degrees of freedom, p-value, and the confidence interval.
You can customize the output by accessing the individual elements of the result list. For example,
result$p.value will give you only the p-value.
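The points above can be seen side by side in a short sketch using the Problem 3 data: it runs both Student's test (var.equal = TRUE) and Welch's test (the default) and then pulls individual elements such as $statistic, $parameter, $p.value and $conf.int out of the returned objects. With equal group sizes the two t-values coincide; only the degrees of freedom, and therefore the p-values, differ.

```r
# Problem 3 data
new_drug <- c(4, 5, 3, 6, 4, 5, 3)
placebo  <- c(6, 7, 5, 7, 6, 8, 7)

student <- t.test(new_drug, placebo, var.equal = TRUE)  # pooled variance
welch   <- t.test(new_drug, placebo)                    # default: Welch

# Extract individual components from the two "htest" objects
comparison <- data.frame(
  test    = c("Student", "Welch"),
  t       = c(student$statistic, welch$statistic),
  df      = c(student$parameter, welch$parameter),
  p_value = c(student$p.value, welch$p.value)
)
print(comparison)

# 95% confidence interval for the difference in means (new_drug - placebo)
print(student$conf.int)
```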

2. Analysis of Variance (ANOVA): ANOVA is a parametric test used to compare the means of
three or more groups. It determines if there are significant differences among the means by
analyzing the variation between groups and within groups. ANOVA calculates an F-value,
which compares the between-group variation to the within-group variation. The F-value is
compared to a critical value from the F-distribution to determine if the group differences are
statistically significant.
Problem 4: ANOVA Example
A researcher wants to compare the mean blood pressure levels among three different treatment groups
(A, B, and C). The data collected are as follows:
Group A: 130, 135, 140, 145
Group B: 125, 130, 135, 140
Group C: 120, 125, 130, 135

Perform a one-way ANOVA to determine if there are any significant differences in the mean blood
pressure levels among the treatment groups. Use a significance level of 0.05.
Solution:
To perform a one-way ANOVA, we use the F-test to compare the variability between groups to the
variability within groups. The null hypothesis is that there are no differences in the means of the
treatment groups.
R Codes:
# Blood pressure data
groupA <- c(130, 135, 140, 145)
groupB <- c(125, 130, 135, 140)
groupC <- c(120, 125, 130, 135)

# Perform one-way ANOVA

result <- aov(c(groupA, groupB, groupC) ~ factor(rep(c("A", "B", "C"),
each = 4)))

# Summary of ANOVA
print(summary(result))

# Post hoc test (Tukey's HSD)

posthoc <- TukeyHSD(result)
print(posthoc)
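To see where the F-value in the ANOVA summary comes from, the sketch below recomputes it by hand from the Problem 4 data: the between-group and within-group sums of squares, their mean squares, and their ratio. For these data SS_between = 200 (df = 2) and SS_within = 375 (df = 9), so F = 100 / 41.67 = 2.4.

```r
# Problem 4 data
groupA <- c(130, 135, 140, 145)
groupB <- c(125, 130, 135, 140)
groupC <- c(120, 125, 130, 135)
groups <- list(A = groupA, B = groupB, C = groupC)

all_values <- unlist(groups)
grand_mean <- mean(all_values)
k <- length(groups)        # number of groups
N <- length(all_values)    # total number of observations

# Between-group SS: group size times squared distance of group mean from grand mean
ss_between <- sum(sapply(groups, function(g) length(g) * (mean(g) - grand_mean)^2))
# Within-group SS: squared deviations of observations from their own group mean
ss_within  <- sum(sapply(groups, function(g) sum((g - mean(g))^2)))

ms_between <- ss_between / (k - 1)
ms_within  <- ss_within / (N - k)
F_value    <- ms_between / ms_within
p_value    <- pf(F_value, k - 1, N - k, lower.tail = FALSE)

print(c(F = F_value, p = p_value))   # F = 2.4 for these data
```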

Data Visualization (Box plots)

# Box plots Example
# Blood pressure data (Problem 4)
groupA <- c(130, 135, 140, 145)
groupB <- c(125, 130, 135, 140)
groupC <- c(120, 125, 130, 135)

# Combine data into a single vector

data <- c(groupA, groupB, groupC)

# Create factor variable for groups

groups <- factor(rep(c("A", "B", "C"), each = 4))

# Box plot
boxplot(data ~ groups, xlab = "Groups", ylab = "Blood Pressure",
main = "Blood Pressure by Group")
Problem 5: Drug Efficacy Study
A pharmaceutical company is testing the efficacy of three different drugs (A, B, and C) for treating a
specific condition. They randomly assign 30 patients (10 per group) into three groups: Group A receives Drug A,
Group B receives Drug B, and Group C receives Drug C. After a certain treatment period, the patients'
symptom scores are measured. The company wants to determine if there are any significant
differences in the mean symptom scores among the three drug groups.
Solution
After performing the one-way ANOVA, you will obtain a summary table that provides information
on the between-group variability (sums of squares, degrees of freedom, mean squares) and the within-
group variability (residual sum of squares, degrees of freedom, mean squares). The F-statistic and its
corresponding p-value are also reported. If the p-value is below the chosen significance level (e.g.,
0.05), you can conclude that at least one group mean differs significantly from the others.
Post hoc tests, such as Tukey's HSD, can be performed to determine which specific groups differ
significantly from each other. The post hoc test results will provide confidence intervals and p-values
for pairwise group comparisons.
R codes:
# Drug Efficacy Study Example
# Symptom scores
groupA <- c(3, 4, 2, 5, 3, 4, 3, 2, 4, 5)
groupB <- c(2, 3, 2, 4, 3, 2, 1, 3, 4, 2)
groupC <- c(4, 5, 5, 4, 3, 5, 3, 4, 4, 3)

# Perform one-way ANOVA

result <- aov(c(groupA, groupB, groupC) ~ factor(rep(c("A", "B", "C"),
each = 10)))

# Summary of ANOVA
print(summary(result))

# Post hoc test (Tukey's HSD)

posthoc <- TukeyHSD(result)

print(posthoc)
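Tukey's HSD compares every pair of group means using the studentized range distribution. As an illustration, assuming the balanced Problem 5 design with 10 scores per group, the sketch below reproduces the adjusted p-value for the B vs A comparison with ptukey() and then prints the matching row of TukeyHSD() for comparison. Storing the data in a data frame with a named factor column also makes the TukeyHSD() output easier to index.

```r
# Problem 5 data
groupA <- c(3, 4, 2, 5, 3, 4, 3, 2, 4, 5)
groupB <- c(2, 3, 2, 4, 3, 2, 1, 3, 4, 2)
groupC <- c(4, 5, 5, 4, 3, 5, 3, 4, 4, 3)
df_scores <- data.frame(
  score = c(groupA, groupB, groupC),
  group = factor(rep(c("A", "B", "C"), each = 10))
)

fit <- aov(score ~ group, data = df_scores)
mse <- sum(residuals(fit)^2) / df.residual(fit)   # within-group mean square
n_per_group <- 10

# Studentized range statistic for the B vs A comparison (balanced design)
diff_BA <- mean(groupB) - mean(groupA)
q_BA    <- abs(diff_BA) / sqrt(mse / n_per_group)
p_BA    <- ptukey(q_BA, nmeans = 3, df = df.residual(fit), lower.tail = FALSE)

print(c(diff = diff_BA, q = q_BA, p_adj = p_BA))
print(TukeyHSD(fit)$group["B-A", ])   # built-in result for the same pair
```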

Visualizing the data:

Data visualization is an essential aspect of data analysis.
Here's an example of how you can visualize the data using box plots in R:
# Box plot Example
# Symptom scores
groupA <- c(3, 4, 2, 5, 3, 4, 3, 2, 4, 5)
groupB <- c(2, 3, 2, 4, 3, 2, 1, 3, 4, 2)
groupC <- c(4, 5, 5, 4, 3, 5, 3, 4, 4, 3)

# Combine data into a single vector

data <- c(groupA, groupB, groupC)

# Create factor variable for groups

groups <- factor(rep(c("A", "B", "C"), each = 10))

# Box plot
boxplot(data ~ groups, xlab = "Groups", ylab = "Symptom Scores", main =
"Symptom Scores by Group")

Problem 6: Drug Dosage Comparison

A pharmaceutical company is comparing the effectiveness of three different dosages (Low, Medium,
High) of a drug in reducing blood pressure. They randomly assign 30 patients (10 per group) into the three dosage
groups. After a certain treatment period, the patients' blood pressure levels are recorded. The company
wants to determine if there are any significant differences in the mean blood pressure levels among
the three dosage groups.
Solution:
In this problem, we have three dosage groups: Low, Medium, and High. We want to determine if there
are any significant differences in the mean blood pressure levels among these groups.
To solve this, we perform a one-way ANOVA and examine the p-value. If the p-value is below the
chosen significance level (e.g., 0.05), we can conclude that there are significant differences among
the dosage groups. Additionally, we can conduct post hoc tests, such as Tukey's HSD, to determine
which specific groups differ significantly from each other.
R codes:
# Drug Dosage Comparison Example
# Blood pressure levels
low_dosage <- c(120, 122, 118, 125, 123, 120, 116, 118, 122, 119)
medium_dosage <- c(130, 128, 132, 135, 129, 136, 130, 133, 137,
131)
high_dosage <- c(140, 138, 142, 145, 139, 136, 140, 142, 141, 144)

# Perform one-way ANOVA

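# levels = ... keeps the natural Low < Medium < High order (default is alphabetical)
result <- aov(c(low_dosage, medium_dosage, high_dosage) ~
factor(rep(c("Low", "Medium", "High"), each = 10),
levels = c("Low", "Medium", "High")))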

# Summary of ANOVA
print(summary(result))

# Post hoc test (Tukey's HSD)

posthoc <- TukeyHSD(result)
print(posthoc)
Visualization of the data using box plots:
# Box plot Example
# Blood pressure levels
low_dosage <- c(120, 122, 118, 125, 123, 120, 116, 118, 122, 119)
medium_dosage <- c(130, 128, 132, 135, 129, 136, 130, 133, 137, 131)
high_dosage <- c(140, 138, 142, 145, 139, 136, 140, 142, 141, 144)

# Combine data into a single vector

data <- c(low_dosage, medium_dosage, high_dosage)

# Create factor variable for dosage groups

groups <- factor(rep(c("Low", "Medium", "High"), each = 10),
levels = c("Low", "Medium", "High"))

# Box plot
boxplot(data ~ groups, xlab = "Dosage Groups", ylab = "Blood Pressure Levels",
main = "Blood Pressure Levels by Dosage Group")

Note:
This code will generate a box plot visualizing the distribution of blood pressure levels for each dosage
group. The x-axis represents the dosage groups (Low, Medium, High), while the y-axis represents the
blood pressure levels. The plot title and axis labels can be customized to suit your specific needs.

By examining the box plots, you can visually compare the central tendency and dispersion of blood
pressure levels among the dosage groups. This visualization can help identify any potential
differences or patterns in the data.
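The boxes in the plot can also be read as numbers. A minimal sketch with the Problem 6 data: fivenum() returns Tukey's five-number summary for each dosage group, and boxplot() draws its box edges and median line from these same hinges. The whiskers, however, stop at the most extreme observation within 1.5 box lengths of the box, so they can differ from the minimum and maximum when outliers are present.

```r
# Problem 6 data
low_dosage    <- c(120, 122, 118, 125, 123, 120, 116, 118, 122, 119)
medium_dosage <- c(130, 128, 132, 135, 129, 136, 130, 133, 137, 131)
high_dosage   <- c(140, 138, 142, 145, 139, 136, 140, 142, 141, 144)

bp   <- c(low_dosage, medium_dosage, high_dosage)
dose <- factor(rep(c("Low", "Medium", "High"), each = 10),
               levels = c("Low", "Medium", "High"))

# One row per dosage group: min, lower hinge, median, upper hinge, max
five_num <- t(sapply(split(bp, dose), fivenum))
colnames(five_num) <- c("min", "lower_hinge", "median", "upper_hinge", "max")
print(five_num)
```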
Problem 7: Two-Way ANOVA Example
A study is conducted to investigate the effects of two factors, diet type (A, B, C) and exercise intensity
(low, medium, high), on cholesterol levels. The data collected are as follows:
Diet A, Low Exercise: 150, 155, 160
Diet A, Medium Exercise: 140, 145, 150
Diet A, High Exercise: 130, 135, 140
Diet B, Low Exercise: 160, 165, 170
Diet B, Medium Exercise: 150, 155, 160
Diet B, High Exercise: 140, 145, 150
Diet C, Low Exercise: 170, 175, 180
Diet C, Medium Exercise: 160, 165, 170
Diet C, High Exercise: 150, 155, 160
Perform a two-way ANOVA to determine if there are any significant effects of diet type, exercise
intensity, or their interaction on cholesterol levels. Use a significance level of 0.05.
Solution:
To perform a two-way ANOVA, we examine the effects of two factors (diet type and exercise
intensity) and their interaction on the outcome (cholesterol levels).
R codes:
# Cholesterol data
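# Data are listed diet by diet (A, B, C), with Low, Medium, High exercise
# (3 values each) within each diet
diet <- factor(rep(c("A", "B", "C"), each = 9))
exercise <- factor(rep(rep(c("Low", "Medium", "High"), each = 3), times = 3),
levels = c("Low", "Medium", "High"))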
cholesterol <- c(150, 155, 160, 140, 145, 150, 130, 135, 140, 160, 165,
170, 150, 155, 160, 140, 145, 150, 170, 175, 180, 160, 165, 170, 150, 155,
160)

# Perform two-way ANOVA

result <- aov(cholesterol ~ diet + exercise + diet:exercise)

# Summary of ANOVA
print(summary(result))

# Post hoc test (Tukey's HSD)

posthoc <- TukeyHSD(result)

print(posthoc)
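In a two-way layout it is easy to mislabel observations when building factor vectors with rep(). A quick check, sketched here for the Problem 7 data, is to tabulate the design and compute cell means with tapply(): every diet and exercise combination should contain three values, and the means should match the listing above (for example, Diet A with low exercise averages 155).

```r
diet     <- factor(rep(c("A", "B", "C"), each = 9))
exercise <- factor(rep(rep(c("Low", "Medium", "High"), each = 3), times = 3),
                   levels = c("Low", "Medium", "High"))
cholesterol <- c(150, 155, 160, 140, 145, 150, 130, 135, 140,
                 160, 165, 170, 150, 155, 160, 140, 145, 150,
                 170, 175, 180, 160, 165, 170, 150, 155, 160)

# Replication per cell: each diet x exercise combination should show 3
print(table(diet, exercise))

# Cell means: rows = diet, columns = exercise
cell_means <- tapply(cholesterol, list(diet, exercise), mean)
print(cell_means)
```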

Problem 8: Drug Efficacy Study with Gender

A pharmaceutical company is testing the efficacy of a drug on a specific condition, considering both
the drug type (A, B, C) and gender (Male, Female) as factors. They randomly assign patients to
different drug groups and record their symptom scores. The company wants to determine if there are
any significant differences in the mean symptom scores considering both the drug type and gender.
Solution:
To solve this, we perform a two-way ANOVA to analyze the effects of both the drug type and gender
on the symptom scores. We examine the main effects of drug type and gender, as well as the
interaction effect between drug type and gender.
R codes:
# Drug Efficacy Study Example with Gender
# Symptom scores
drugA_male <- c(3, 4, 2, 5, 3)
drugB_male <- c(2, 3, 2, 4, 3)
drugC_male <- c(4, 5, 5, 4, 3)
drugA_female <- c(4, 3, 2, 3, 4)
drugB_female <- c(3, 2, 1, 3, 4)
drugC_female <- c(5, 3, 4, 4, 3)

# Combine data into a single vector

data <- c(drugA_male, drugB_male, drugC_male, drugA_female, drugB_female,
drugC_female)

# Create factor variables for drug type and gender

# Data order: all males (drugs A, B, C; 5 each), then all females
drug <- factor(rep(rep(c("A", "B", "C"), each = 5), times = 2))
gender <- factor(rep(c("Male", "Female"), each = 15))

# Two-way ANOVA
result <- aov(data ~ drug * gender)
print(summary(result))

# Interaction plot
interaction.plot(drug, gender, data, xlab = "Drug", ylab = "Symptom Scores",
legend = TRUE)

Note: This code will generate an interaction plot that shows the interaction effect of drug type and
gender on symptom scores. The x-axis represents the drug type, the lines represent the gender (Male,
Female), and the y-axis represents the symptom scores. The plot helps us visualize how the
relationship between drug type and symptom scores differs across gender groups.
Problem 9: Drug Dosage Study with Time Points
A pharmaceutical company is comparing the efficacy of different dosages (Low, Medium, High) of a
drug for treating a condition, considering multiple time points (Baseline, Week 4, Week 8) as factors.
They measure the symptom scores at each time point for patients in each dosage group. The company
wants to determine if there are any significant differences in the mean symptom scores considering
both the dosages and time points.
Solution:
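To solve this, we perform a two-way ANOVA to analyze the effects of both the dosage and time points
on the symptom scores. We examine the main effects of dosage and time points, as well as the
interaction effect between dosage and time points. This analysis treats the scores at different time
points as independent observations; if the same patients are measured at every time point, a
repeated-measures ANOVA or a mixed-effects model is more appropriate.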
R codes:
# Drug Dosage Study Example with Time Points
# Symptom scores
low_baseline <- c(3, 4, 2, 5, 3)
low_week4 <- c(2, 3, 2, 4, 3)
low_week8 <- c(4, 5, 5, 4, 3)
medium_baseline <- c(4, 3, 2, 3, 4)
medium_week4 <- c(3, 2, 1, 3, 4)
medium_week8 <- c(5, 3, 4, 4, 3)
high_baseline <- c(2, 3, 2, 4, 3)
high_week4 <- c(4, 5, 5, 4, 3)
high_week8 <- c(3, 2, 1, 3, 4)

# Combine data into a single vector

data <- c(low_baseline, low_week4, low_week8,
medium_baseline, medium_week4, medium_week8,
high_baseline, high_week4, high_week8)

# Create factor variables for dosages and time points

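# Data order: dosage (Low, Medium, High), then time point within each dosage
dosage <- factor(rep(c("Low", "Medium", "High"), each = 15),
levels = c("Low", "Medium", "High"))
time <- factor(rep(rep(c("Baseline", "Week 4", "Week 8"), each = 5), times = 3))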

# Two-way ANOVA
result <- aov(data ~ dosage * time)
print(summary(result))

# Line plot
interaction.plot(dosage, time, data, xlab = "Dosage", ylab = "Symptom Scores",
legend = TRUE, type = "b")

Note: This code will generate a line plot that shows the interaction effect of dosage and time points
on symptom scores. The x-axis represents the dosages, the lines represent the time points (Baseline,
Week 4, Week 8), and the y-axis represents the symptom scores. The plot helps us visualize how the
symptom scores change over time for each dosage group and how the relationship between dosage
and symptom scores differs across time points.
Remember to customize the plot labels and titles as needed to suit your specific analysis!
Problem 10: Vaccine Efficacy Study with Ethnicity
A research team is studying the efficacy of a new vaccine for preventing a certain disease, considering
both the vaccine type (A, B, C) and ethnicity (Asian, Black, White) as factors. They administer the
different vaccines to individuals from different ethnic backgrounds and record the disease incidence
in each group. The team wants to determine if there are any significant differences in the disease
incidence considering both the vaccine type and ethnicity.
R codes:
# Vaccine Efficacy Study Example with Ethnicity
# Disease incidence
vaccineA_asian <- c(10, 5, 7, 12, 8)
vaccineB_asian <- c(8, 4, 6, 9, 7)
vaccineC_asian <- c(6, 3, 5, 7, 5)
vaccineA_black <- c(9, 6, 7, 11, 9)
vaccineB_black <- c(7, 5, 6, 8, 6)
vaccineC_black <- c(5, 4, 4, 6, 5)
vaccineA_white <- c(11, 7, 8, 14, 10)
vaccineB_white <- c(9, 6, 7, 10, 8)
vaccineC_white <- c(7, 5, 6, 8, 6)

# Combine data into a single vector

data <- c(vaccineA_asian, vaccineB_asian, vaccineC_asian, vaccineA_black,
vaccineB_black, vaccineC_black, vaccineA_white, vaccineB_white,
vaccineC_white)

# Create factor variables for vaccine type and ethnicity

# Data order: ethnicity (Asian, Black, White); within each ethnicity,
# vaccines A, B, C have 5 values each
vaccine <- factor(rep(rep(c("A", "B", "C"), each = 5), times = 3))

ethnicity <- factor(rep(c("Asian", "Black", "White"), each = 15))

# Two-way ANOVA
result <- aov(data ~ vaccine * ethnicity)
print(summary(result))

# Interaction plot
interaction.plot(vaccine, ethnicity, data, xlab = "Vaccine", ylab =
"Disease Incidence", legend = TRUE)

Problem 11: Clinical Trial with Treatment and Gender

A clinical trial is conducted to evaluate the effectiveness of two different treatments (Treatment A,
Treatment B) for a specific medical condition, considering both the treatment type and gender as
factors. Patients are randomly assigned to one of the treatment groups, and their treatment outcomes
are recorded. The researchers want to determine if there are any significant differences in the
treatment outcomes considering both the treatment type and gender.
Solution:
For each of these examples, you would perform a two-way ANOVA to analyze the effects of both factors
on the outcome variable (e.g., symptom scores, disease incidence, treatment outcomes). You would
examine the main effects of each factor (drug type, dosage, time point, vaccine type, ethnicity,
treatment type, gender) and the interaction effect between the factors.
R codes:
# Clinical Trial Example with Treatment and Gender
# Treatment outcomes
treatmentA_male <- c(3, 4, 2, 5, 3)
treatmentB_male <- c(2, 3, 2, 4, 3)
treatmentA_female <- c(4, 3, 2, 3, 4)
treatmentB_female <- c(3, 2, 1, 3, 4)

# Combine data into a single vector

data <- c(treatmentA_male, treatmentB_male, treatmentA_female,
treatmentB_female)

# Create factor variables for treatment and gender

# Data order: A-male, B-male, A-female, B-female (5 values each)
treatment <- factor(rep(rep(c("A", "B"), each = 5), times = 2))
gender <- factor(rep(c("Male", "Female"), each = 10))

# Two-way ANOVA
result <- aov(data ~ treatment * gender)
print(summary(result))

# Interaction plot
interaction.plot(treatment, gender, data, xlab = "Treatment", ylab =
"Treatment Outcomes", legend = TRUE)

Note: In each example, replace the data vectors with your actual data, adjust the factor levels as
needed, and customize the plot labels and titles according to your specific study.
The R code for visualizing these examples would depend on the specific nature of the data and the
desired visualization technique. Some common visualization options for two-way ANOVA include
interaction plots, bar plots, heatmaps, or scatter plots.
Here's a general template for visualizing a two-way ANOVA using an interaction plot in R:
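# Load necessary libraries
library(ggplot2)

# Create a data frame with your data
# (placeholders: n = observations per cell; outcome_values = your outcome
# vector, ordered to match the two factor columns)
df <- data.frame(
Factor1 = factor(rep(c("Level1", "Level2", "Level3"), each = 3 * n)),
Factor2 = factor(rep(rep(c("LevelA", "LevelB", "LevelC"), each = n), times = 3)),
Outcome = outcome_values
)

# Perform the two-way ANOVA
result <- aov(Outcome ~ Factor1 * Factor2, data = df)
summary(result)

# Create an interaction plot of the cell means
# (each "+" must end a line so that R continues the expression)
ggplot(df, aes(x = Factor1, y = Outcome, color = Factor2, group = Factor2)) +
stat_summary(fun = mean, geom = "line") +
stat_summary(fun = mean, geom = "point") +
labs(x = "Factor 1", y = "Outcome", color = "Factor 2") +
theme_bw()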

Remember to replace "Factor1", "Factor2", and "Outcome" with the appropriate variable names from
your dataset, and customize the plot labels and titles as needed.

These examples and the provided R code offer a starting point for analyzing and visualizing two-way
ANOVA data in biostatistics. You can adapt them to fit your own datasets and research questions,
exploring different visualization techniques based on the nature of your data and the specific
hypotheses you want to investigate.
3. Chi-square test: The chi-square test is a non-parametric test used to analyze the association
between two categorical variables. It compares the observed frequencies in each category to
the expected frequencies under the assumption of independence. The test calculates a chi-
square statistic, which measures the overall discrepancy between observed and expected
frequencies. The chi-square statistic is compared to a critical value from the chi-square
distribution to determine if the association is statistically significant.
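The calculation described above can be sketched in R. The 2 x 2 table below is hypothetical (invented counts for illustration only): the code computes the expected counts under independence, the chi-square statistic and its p-value by hand, then runs chisq.test() with correct = FALSE so that no Yates continuity correction is applied and the two results agree.

```r
# Hypothetical 2 x 2 table (illustrative counts, not from the exercises)
observed <- matrix(c(30, 20,
                     15, 35),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(Exposure = c("Yes", "No"),
                                   Outcome  = c("Yes", "No")))

# Expected counts under independence: row total * column total / grand total
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Chi-square statistic: sum of (O - E)^2 / E over all cells
chi_sq <- sum((observed - expected)^2 / expected)
dof    <- (nrow(observed) - 1) * (ncol(observed) - 1)
p_val  <- pchisq(chi_sq, df = dof, lower.tail = FALSE)
print(c(chi_sq = chi_sq, df = dof, p = p_val))

# Built-in test without continuity correction, to match the hand calculation
print(chisq.test(observed, correct = FALSE))
```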
4. Pearson correlation coefficient: The Pearson correlation coefficient, denoted as "r,"
measures the strength and direction of the linear relationship between two continuous
variables. It ranges from -1 to +1, where -1 represents a perfect negative linear relationship,
+1 represents a perfect positive linear relationship, and 0 represents no linear relationship. The
correlation coefficient is calculated as the covariance between the two variables divided by the
product of their standard deviations.
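A minimal sketch of the formula above, using hypothetical x and y values invented for illustration: the ratio of the covariance to the product of the standard deviations matches cor(), and cor.test() additionally tests whether the population correlation is zero and reports a 95% confidence interval.

```r
# Hypothetical paired measurements (illustrative only)
x <- c(1.2, 2.4, 3.1, 4.8, 5.0, 6.3)
y <- c(2.0, 2.9, 3.8, 5.1, 5.4, 7.0)

# r = covariance / (sd_x * sd_y)
r_manual <- cov(x, y) / (sd(x) * sd(y))
print(r_manual)
print(cor(x, y))         # built-in Pearson correlation (default method)
print(cor.test(x, y))    # test of H0: rho = 0, with a 95% CI
```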
5. Simple linear regression: Simple linear regression is a parametric model that examines the
relationship between a dependent variable and one independent variable. It assumes a linear
relationship and estimates the slope and intercept of the regression line. The regression line
represents the best-fit line that minimizes the sum of squared differences between the observed
and predicted values. The model can be used to predict the values of the dependent variable
based on the independent variable.
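For one predictor, the least-squares line has a closed form: the slope is cov(x, y) / var(x), and the intercept is mean(y) - slope * mean(x), so the line passes through the point of means. The sketch below uses hypothetical data to check that these hand-computed values agree with lm(), then uses predict() for a new x value.

```r
# Hypothetical data (illustrative only)
x <- c(1, 2, 3, 4, 5, 6)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1, 12.2)

# Closed-form least-squares estimates
slope     <- cov(x, y) / var(x)
intercept <- mean(y) - slope * mean(x)

fit <- lm(y ~ x)
print(c(intercept = intercept, slope = slope))
print(coef(fit))                      # should match the hand calculation

# Predicted y for a new x value
predict(fit, newdata = data.frame(x = 7))
```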
General solution:
For each problem, you would perform a simple linear regression analysis using appropriate statistical
software (such as R, Python, or SPSS). The analysis would involve fitting a regression line to the data
and assessing the significance of the relationship between the independent variable and dependent
variable.
Here's a general template for performing a simple linear regression analysis using R:
# Load necessary libraries
library(ggplot2)

# Create a data frame with your data

df <- data.frame(
Independent_Variable = c(independent_variable_values),
Dependent_Variable = c(dependent_variable_values)
)

# Perform simple linear regression

model <- lm(Dependent_Variable ~ Independent_Variable, data = df)

# Print the regression coefficients and statistical information

summary(model)

# Visualize the regression line

ggplot(df, aes(x = Independent_Variable, y = Dependent_Variable)) +
geom_point() +
geom_smooth(method = "lm", se = FALSE) +
labs(x = "Independent Variable", y = "Dependent Variable") +
theme_bw()

Note: Remember to replace "Independent_Variable" and "Dependent_Variable" with the appropriate
variable names from your dataset, and customize the plot labels and titles as needed.
These examples and the provided R code offer a starting point for analyzing and visualizing simple
linear regression data in biostatistics. You can adapt them to fit your own datasets and research
questions, exploring different statistical software and techniques based on your specific study
objectives.
Problem 1: Height and Weight Relationship
A researcher wants to investigate the relationship between height (independent variable) and weight
(dependent variable) in a sample of individuals. The researcher collects data on the height and weight
of 31 participants. They want to determine if there is a significant linear relationship between height
and weight and estimate the weight based on the height of an individual.
Example of R codes:
# Height and Weight Relationship Example
library(ggplot2)   # needed for ggplot()
# Height (independent variable)
height <- c(150, 160, 165, 170, 155, 175, 180, 158, 166, 172, 168, 162,
157, 169, 163, 171, 176, 159, 173, 167, 161, 164, 154, 177, 181, 156, 174,
179, 153, 178, 152)

# Weight (dependent variable)

weight <- c(50, 58, 60, 65, 52, 70, 75, 55, 62, 68, 66, 57, 54, 67, 58,
69, 72, 56, 71, 65, 59, 60, 51, 73, 77, 53, 68, 74, 49, 76, 48)

# Create a data frame

df <- data.frame(Height = height, Weight = weight)

# Perform simple linear regression

model <- lm(Weight ~ Height, data = df)

# Print the regression coefficients and statistical information

summary(model)

# Visualize the regression line

ggplot(df, aes(x = Height, y = Weight)) +
geom_point() +
geom_smooth(method = "lm", se = FALSE) +
labs(x = "Height", y = "Weight") +
theme_bw()

Problem 2: Blood Pressure and Age

A study is conducted to examine the association between blood pressure (dependent variable) and age
(independent variable) in a group of 35 participants. The researchers measure the blood pressure and
record the age of each participant. They aim to determine if age can be used as a predictor of blood
pressure and quantify the strength and direction of the relationship.
Example of R codes:
# Blood Pressure and Age Example
library(ggplot2)   # needed for ggplot()
# Age (independent variable)
age <- c(25, 33, 42, 51, 37, 60, 45, 29, 48, 55, 40, 34, 39, 47, 52, 43,
36, 31, 56, 38, 44, 27, 50, 41, 57, 46, 32, 59, 28, 53, 30, 35, 49, 54,
58)

# Blood Pressure (dependent variable)

blood_pressure <- c(120, 130, 140, 150, 140, 160, 150, 130, 155, 165, 135,
125, 138, 150, 155, 142, 136, 128, 160, 137, 145, 125, 158, 140, 163, 147,
130, 162, 127, 156, 132, 134, 148, 152, 159)

# Create a data frame

df <- data.frame(Age = age, Blood_Pressure = blood_pressure)

# Perform simple linear regression

model <- lm(Blood_Pressure ~ Age, data = df)

# Print the regression coefficients and statistical information

summary(model)

# Visualize the regression line

ggplot(df, aes(x = Age, y = Blood_Pressure)) +

geom_point() +
geom_smooth(method = "lm", se = FALSE) +
labs(x = "Age", y = "Blood Pressure") +
theme_bw()

Problem 3: Cholesterol Level and Dietary Fat Intake

A nutritionist is interested in understanding the relationship between cholesterol levels (dependent
variable) and dietary fat intake (independent variable) among a sample of 35 individuals. The
nutritionist assesses the cholesterol levels and dietary fat intake of each participant and wants to
examine if higher dietary fat intake is associated with increased cholesterol levels.
Example R code (illustrative data for 35 participants):
# Cholesterol Level and Dietary Fat Intake Example
# Dietary Fat Intake (independent variable)
fat_intake <- c(40, 50, 60, 70, 55, 65, 75, 45, 55, 65, 50, 55, 58, 62,
68, 52, 58, 60, 70, 65, 45, 75, 40, 50, 55, 60, 70, 65, 55, 45, 68, 62,
58, 50, 48)

# Cholesterol Level (dependent variable)

cholesterol <- c(180, 190, 200, 210, 195, 205, 215, 185, 195, 205, 190,
195, 198, 202, 208, 192, 198, 200, 210, 205, 185, 215, 180, 190, 195, 200,
210, 205, 195, 185, 208, 202, 198, 190, 188)

# Create a data frame

df <- data.frame(Fat_Intake = fat_intake, Cholesterol = cholesterol)

# Perform simple linear regression

model <- lm(Cholesterol ~ Fat_Intake, data = df)

# Print the regression coefficients and statistical information

summary(model)

# Visualize the regression line

ggplot(df, aes(x = Fat_Intake, y = Cholesterol)) +
geom_point() +
geom_smooth(method = "lm", se = FALSE) +
labs(x = "Dietary Fat Intake", y = "Cholesterol Level") +
theme_bw()
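In the illustrative data above, every cholesterol value equals the fat intake plus 140, so all points lie exactly on one line. A quick check such as the sketch below makes this visible: the fitted intercept is 140 and the slope is 1. `summary(model)` will therefore report an R-squared of 1 (R may also warn of an essentially perfect fit), and its standard errors and p-values are not meaningful for these data. Real measurements would scatter around the line.

```r
# Same vectors as in the example above
fat_intake <- c(40, 50, 60, 70, 55, 65, 75, 45, 55, 65, 50, 55, 58, 62,
                68, 52, 58, 60, 70, 65, 45, 75, 40, 50, 55, 60, 70, 65, 55, 45, 68, 62,
                58, 50, 48)
cholesterol <- c(180, 190, 200, 210, 195, 205, 215, 185, 195, 205, 190,
                 195, 198, 202, 208, 192, 198, 200, 210, 205, 185, 215, 180, 190, 195, 200,
                 210, 205, 195, 185, 208, 202, 198, 190, 188)

# Every observation satisfies cholesterol == fat_intake + 140 exactly
print(all(cholesterol == fat_intake + 140))

# The fitted line therefore recovers intercept 140 and slope 1
model <- lm(cholesterol ~ fat_intake)
print(coef(model))
```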

6. Multiple linear regression: Multiple linear regression extends simple linear regression to
include several independent variables. It models the dependent variable as a linear function of
these variables plus random error. Each coefficient is estimated while holding the other
independent variables constant, which allows the analysis to adjust for confounding factors.
The model can be used for prediction, inference, and identifying significant predictors.
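Adjusting for a confounder is easiest to see with simulated data where the truth is known. In the sketch below (simulated data, not from any study; the variable names `age`, `exposure` and `outcome` are hypothetical), the outcome depends only on age, while the exposure also rises with age. A simple regression on the exposure alone shows a spurious positive slope; adding age as a second independent variable brings the exposure coefficient back close to its true value of 0.

```r
# Simulated data: age confounds the exposure-outcome relationship
set.seed(42)
n <- 1000
age <- rnorm(n, mean = 50, sd = 10)
exposure <- 0.5 * age + rnorm(n)   # exposure increases with age
outcome <- 2 * age + rnorm(n)      # outcome depends on age only, not on exposure

crude <- lm(outcome ~ exposure)           # simple regression, ignores age
adjusted <- lm(outcome ~ exposure + age)  # multiple regression, adjusts for age

print(coef(crude)["exposure"])     # large, spurious positive slope
print(coef(adjusted)["exposure"])  # close to the true value of 0
```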
Problem 1: Blood Pressure Prediction
A researcher wants to predict blood pressure (dependent variable) based on age (independent variable
1), body mass index (BMI) (independent variable 2), and cholesterol level (independent variable 3).
The researcher gathers data on 100 individuals, including their age, BMI, cholesterol level, and
corresponding blood pressure measurements. The goal is to develop a multiple linear regression
model to predict blood pressure using the three independent variables.
Problem 2: Disease Progression Prediction
A study aims to predict the progression of a particular disease (dependent variable) based on variables
such as age (independent variable 1), gender (independent variable 2), smoking status (independent
variable 3), and genetic marker (independent variable 4). The researchers collect data from 200
patients, recording their age, gender, smoking status, genetic marker status, and disease progression
scores. The objective is to build a multiple linear regression model to predict the disease progression
based on the given independent variables.
Problem 3: Drug Dosage Optimization
A pharmaceutical company is conducting a study to optimize the dosage of a new drug (dependent
variable) based on variables like body weight (independent variable 1), age (independent variable 2),
and liver function (independent variable 3). The company collects data on 50 patients, including their
body weight, age, liver function test results, and the corresponding optimal drug dosage. The aim is
to develop a multiple linear regression model to determine the optimal drug dosage based on the
independent variables.
Solution:
For each problem, you would perform a multiple linear regression analysis using appropriate
statistical software (such as R, Python, or SPSS). The analysis would involve fitting a regression
model to the data and assessing the significance of the relationships between the independent
variables and the dependent variable.
Here's a general template for performing a multiple linear regression analysis using R:
# Load necessary libraries
library(ggplot2)

# Create a data frame with your data (replace each placeholder with a numeric vector; all vectors must have the same length)

df <- data.frame(
Dependent_Variable = c(dependent_variable_values),
Independent_Variable_1 = c(independent_variable_1_values),
Independent_Variable_2 = c(independent_variable_2_values),
Independent_Variable_3 = c(independent_variable_3_values)
)

# Perform multiple linear regression ("." uses every other column of df as a predictor)

model <- lm(Dependent_Variable ~ ., data = df)

# Print the regression coefficients and statistical information

summary(model)

# Visualize the predicted vs. observed values

ggplot(df, aes(x = Dependent_Variable, y = fitted(model))) +
geom_point() +
geom_abline(intercept = 0, slope = 1, color = "red") +
labs(x = "Observed", y = "Predicted") +
theme_bw()

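Detailed R code for each problem (each example uses a small illustrative dataset of 15 observations rather than the full study samples described above):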

Problem 1:
# Blood Pressure Prediction Example
# Age (independent variable 1)
age <- c(35, 42, 50, 55, 60, 38, 45, 52, 58, 65, 40, 48, 54, 59, 63)

# Body Mass Index (independent variable 2)

bmi <- c(25, 28, 30, 31, 29, 26, 27, 29, 32, 33, 24, 27, 31, 30, 34)

# Cholesterol Level (independent variable 3)

cholesterol <- c(180, 195, 200, 210, 190, 185, 195, 205, 220, 230, 180,
195, 200, 210, 225)

# Blood Pressure (dependent variable)

blood_pressure <- c(120, 128, 135, 140, 130, 122, 126, 133, 142, 148, 118,
130, 138, 142, 150)

# Create a data frame

df <- data.frame(Age = age, BMI = bmi, Cholesterol = cholesterol,
Blood_Pressure = blood_pressure)

# Perform multiple linear regression

model <- lm(Blood_Pressure ~ Age + BMI + Cholesterol, data = df)

# Print the regression coefficients and statistical information

summary(model)

# Visualize the predicted vs. observed values

ggplot(df, aes(x = Blood_Pressure, y = fitted(model))) +
geom_point() +
geom_abline(intercept = 0, slope = 1, color = "red") +
labs(x = "Observed", y = "Predicted") +
theme_bw()
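Since the goal of Problem 1 is prediction, a natural next step is to predict blood pressure for a new individual. The sketch below rebuilds the Problem 1 model from the same vectors and calls `predict()` with `interval = "prediction"`. The new patient's values (age 50, BMI 28, cholesterol 200) are hypothetical and were chosen only because they lie within the observed ranges.

```r
# Rebuild the Problem 1 data and model
age <- c(35, 42, 50, 55, 60, 38, 45, 52, 58, 65, 40, 48, 54, 59, 63)
bmi <- c(25, 28, 30, 31, 29, 26, 27, 29, 32, 33, 24, 27, 31, 30, 34)
cholesterol <- c(180, 195, 200, 210, 190, 185, 195, 205, 220, 230, 180,
                 195, 200, 210, 225)
blood_pressure <- c(120, 128, 135, 140, 130, 122, 126, 133, 142, 148, 118,
                    130, 138, 142, 150)
df <- data.frame(Age = age, BMI = bmi, Cholesterol = cholesterol,
                 Blood_Pressure = blood_pressure)
model <- lm(Blood_Pressure ~ Age + BMI + Cholesterol, data = df)

# Hypothetical new patient (illustrative values within the observed ranges)
new_patient <- data.frame(Age = 50, BMI = 28, Cholesterol = 200)

# Point prediction with a 95% prediction interval for one individual
pred <- predict(model, newdata = new_patient, interval = "prediction", level = 0.95)
print(pred)
```

A prediction interval (for one person) is wider than a confidence interval for the mean response (`interval = "confidence"`), because it also includes the individual's random error.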

Problem 2:
# Disease Progression Prediction Example
# Age (independent variable 1)
age <- c(45, 52, 60, 62, 55, 48, 50, 57, 65, 68, 41, 49, 56, 63, 67)

# Gender (independent variable 2)

gender <- c("Male", "Female", "Male", "Female", "Male", "Female", "Male",
"Female", "Male", "Female", "Male", "Female", "Male", "Female", "Male")

# Smoking Status (independent variable 3)

smoking <- c("Non-Smoker", "Smoker", "Non-Smoker", "Smoker", "Non-Smoker",
"Smoker", "Non-Smoker", "Smoker", "Non-Smoker", "Smoker", "Non-Smoker",
"Smoker", "Non-Smoker", "Smoker", "Non-Smoker")

# Genetic Marker (independent variable 4)

genetic_marker <- c("Present", "Absent", "Present", "Absent", "Present",
"Absent", "Present", "Absent", "Present", "Absent", "Present", "Absent",
"Present", "Absent", "Present")

# Disease Progression (dependent variable)

disease_progression <- c(3, 5, 8, 10, 6, 4, 5, 7, 12, 15, 2, 5, 7, 9, 13)

# Create a data frame

df <- data.frame(Age = age, Gender = gender, Smoking = smoking,
Genetic_Marker = genetic_marker, Disease_Progression =
disease_progression)

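# Perform multiple linear regression
# Note: in these illustrative data Gender, Smoking and Genetic_Marker follow the
# same alternating pattern, so they are perfectly collinear; lm() can estimate
# only one of them and reports NA for the other two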

model <- lm(Disease_Progression ~ Age + Gender + Smoking + Genetic_Marker,
data = df)

# Print the regression coefficients and statistical information

summary(model)

# Visualize the predicted vs. observed values

ggplot(df, aes(x = Disease_Progression, y = fitted(model))) +
geom_point() +
geom_abline(intercept = 0, slope = 1, color = "red") +
labs(x = "Observed", y = "Predicted") +
theme_bw()
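Before interpreting the Problem 2 coefficients, it helps to check whether the categorical predictors carry separate information. The sketch below rebuilds the same data (the alternating categorical vectors are written with `rep()`, which produces exactly the sequences listed above) and shows two standard checks: a cross-tabulation of the predictors, and the NA coefficients and `alias()` output that `lm()` produces for perfectly collinear terms.

```r
# Same data as Problem 2; rep() reproduces the alternating vectors exactly
age <- c(45, 52, 60, 62, 55, 48, 50, 57, 65, 68, 41, 49, 56, 63, 67)
gender <- rep(c("Male", "Female"), length.out = 15)
smoking <- rep(c("Non-Smoker", "Smoker"), length.out = 15)
genetic_marker <- rep(c("Present", "Absent"), length.out = 15)
disease_progression <- c(3, 5, 8, 10, 6, 4, 5, 7, 12, 15, 2, 5, 7, 9, 13)

df <- data.frame(Age = age, Gender = gender, Smoking = smoking,
                 Genetic_Marker = genetic_marker,
                 Disease_Progression = disease_progression)

# Cross-tabulations: only two of the possible combinations ever occur
print(with(df, table(Gender, Smoking)))
print(with(df, table(Gender, Genetic_Marker)))

model <- lm(Disease_Progression ~ Age + Gender + Smoking + Genetic_Marker,
            data = df)

# Aliased (perfectly collinear) terms receive NA coefficients
print(coef(model))
print(alias(model))
```

With real data, each categorical predictor should vary independently enough that all coefficients are estimable.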

Problem 3:
# Drug Dosage Optimization Example
# Body Weight (independent variable 1)
weight <- c(65, 70, 75, 80, 85, 69, 75, 78, 82, 88, 72, 76, 79, 83, 87)

# Age (independent variable 2)

age <- c(45, 52, 60, 62, 55, 48, 50, 57, 65, 68, 41, 49, 56, 63, 67)

# Liver Function (independent variable 3)

liver_function <- c(85, 90, 92, 87, 88, 86, 88, 91, 95, 98, 84, 87, 89,
92, 96)

# Drug Dosage (dependent variable)

dosage <- c(150, 160, 170, 180, 190, 155, 165, 168, 175, 185, 158, 162,
169, 173, 180)

# Create a data frame

df <- data.frame(Weight = weight, Age = age, Liver_Function =
liver_function, Dosage = dosage)

# Perform multiple linear regression

model <- lm(Dosage ~ Weight + Age + Liver_Function, data = df)

# Print the regression coefficients and statistical information

summary(model)

# Visualize the predicted vs. observed values

ggplot(df, aes(x = Dosage, y = fitted(model))) +
geom_point() +
geom_abline(intercept = 0, slope = 1, color = "red") +
labs(x = "Observed", y = "Predicted") +
theme_bw()
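With only 15 patients and three predictors that may be correlated with one another, coefficient estimates can be unstable. A minimal sketch of two routine checks for Problem 3 follows: the correlation matrix of the predictors (values near +1 or -1 suggest multicollinearity) and 95% confidence intervals for each coefficient. Both use base R; `pred_cor` and `ci` are illustrative names.

```r
# Rebuild the Problem 3 data and model
weight <- c(65, 70, 75, 80, 85, 69, 75, 78, 82, 88, 72, 76, 79, 83, 87)
age <- c(45, 52, 60, 62, 55, 48, 50, 57, 65, 68, 41, 49, 56, 63, 67)
liver_function <- c(85, 90, 92, 87, 88, 86, 88, 91, 95, 98, 84, 87, 89,
                    92, 96)
dosage <- c(150, 160, 170, 180, 190, 155, 165, 168, 175, 185, 158, 162,
            169, 173, 180)
df <- data.frame(Weight = weight, Age = age, Liver_Function = liver_function,
                 Dosage = dosage)
model <- lm(Dosage ~ Weight + Age + Liver_Function, data = df)

# Pairwise correlations among the predictors
pred_cor <- cor(df[, c("Weight", "Age", "Liver_Function")])
print(round(pred_cor, 2))

# 95% confidence intervals for the intercept and each coefficient
ci <- confint(model)
print(ci)
```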

7. Logistic regression: Logistic regression is a statistical model used when the dependent
variable is binary (for example, disease vs. no disease); extensions such as multinomial logistic
regression handle categorical outcomes with more than two levels. It models the log-odds of the
outcome as a linear function of the independent variables. Exponentiating a coefficient gives an
odds ratio: the multiplicative change in the odds of the outcome for a one-unit increase in that
independent variable, holding the others constant. The model is widely used in medical and
biological research for predicting binary outcomes and assessing the impact of risk factors.
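A minimal sketch of fitting a logistic regression in R with `glm(..., family = binomial)`. The data are simulated (not from any study) so that the true log-odds slope is known to be 1; the names `exposure` and `disease` are hypothetical. The odds ratio is obtained by exponentiating the coefficient, and `confint.default()` gives a Wald interval on the log-odds scale that is exponentiated in the same way.

```r
# Simulated data (not from any study): binary outcome, one continuous exposure
set.seed(123)
n <- 500
exposure <- rnorm(n, mean = 0, sd = 1)
log_odds <- -0.5 + 1.0 * exposure   # true model on the log-odds scale
disease <- rbinom(n, size = 1, prob = plogis(log_odds))

df <- data.frame(Disease = disease, Exposure = exposure)

# family = binomial uses the logit link by default
fit <- glm(Disease ~ Exposure, data = df, family = binomial)
print(summary(fit))

# Odds ratio per one-unit increase in exposure, with a Wald 95% CI
odds_ratio <- exp(coef(fit)["Exposure"])
or_ci <- exp(confint.default(fit)["Exposure", ])
print(odds_ratio)
print(or_ci)
```

The estimated odds ratio should be near the true value exp(1), about 2.7, with the exact figure depending on the simulated sample.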
8. Survival analysis: Survival analysis is a statistical method used to analyze time-to-event data,
where the event of interest could be death, disease recurrence, or any other event. It estimates
survival probabilities over time, accounts for censored observations (participants whose event
has not occurred by the end of follow-up or who are lost to follow-up), and examines the impact
of different variables on survival. Kaplan-Meier curves estimate the survival probability over
time, and the log-rank test compares survival between groups. Cox proportional hazards
regression is a commonly used model in survival analysis to estimate hazard ratios and assess
the effect of covariates on survival.
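The three methods named above map directly onto functions in the `survival` package, which ships with standard R installations as a recommended package. The sketch below uses its built-in `lung` dataset (NCCTG lung cancer trial), where `status` is coded 1 = censored and 2 = dead, and `sex` is coded 1 = male and 2 = female; `Surv()` accepts this 1/2 coding directly.

```r
library(survival)

# Kaplan-Meier survival curves by sex
km <- survfit(Surv(time, status) ~ sex, data = lung)
print(summary(km, times = c(180, 365)))   # survival probability at 180 and 365 days

# Log-rank test comparing survival between the two groups
lr <- survdiff(Surv(time, status) ~ sex, data = lung)
print(lr)

# Cox proportional hazards model; exp(coef) is the hazard ratio
# for a one-unit increase in sex, i.e. female vs. male
cox <- coxph(Surv(time, status) ~ sex, data = lung)
hr <- exp(coef(cox))
print(summary(cox))
```

`plot(km, lty = 1:2)` draws the two Kaplan-Meier curves. In these data the hazard ratio is below 1, meaning female patients had a lower hazard of death than male patients.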
9. Non-parametric tests: Non-parametric tests are used when the data do not meet the
assumptions of parametric tests, such as normality or equal variances. These tests
make fewer assumptions about the underlying distribution and rely on ranks or other
distribution-free methods. The Wilcoxon rank-sum test, which is equivalent to the Mann-Whitney U
test, compares two independent groups; it is often described as comparing medians, which is
strictly valid only when the two distributions have the same shape. The Kruskal-Wallis test
extends this comparison to three or more independent groups. These tests are robust
alternatives when the assumptions of parametric tests are not met.
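A minimal sketch of both rank-based tests in base R, using the built-in `PlantGrowth` dataset (dried plant weight for a control group and two treatment groups). This dataset is chosen purely for illustration. `droplevels()` is needed because the formula interface of `wilcox.test()` requires a grouping factor with exactly two levels. `exact = FALSE` requests the normal approximation, because the data contain ties.

```r
# Built-in data: plant weight in groups "ctrl", "trt1" and "trt2"
data(PlantGrowth)

# Wilcoxon rank-sum (Mann-Whitney U) test: control vs. treatment 1
two_groups <- droplevels(subset(PlantGrowth, group %in% c("ctrl", "trt1")))
wt <- wilcox.test(weight ~ group, data = two_groups, exact = FALSE)
print(wt)   # R reports the statistic as W, which equals the Mann-Whitney U for the first group

# Kruskal-Wallis test: all three groups
kw <- kruskal.test(weight ~ group, data = PlantGrowth)
print(kw)
```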
