One-way ANOVA:
# Importing library
from scipy.stats import f_oneway
# Performance measured with each of the
# four engine oils applied
performance1 = [89, 89, 88, 78, 79]
performance2 = [93, 92, 94, 89, 88]
performance3 = [89, 88, 89, 93, 90]
performance4 = [81, 78, 81, 92, 82]
# Conduct the one-way ANOVA
print(f_oneway(performance1, performance2, performance3, performance4))
Output: F_onewayResult(statistic=4.625000000000002, pvalue=0.016336459839780215)
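To read the result above, compare the p-value with a significance level. The sketch below reuses the four performance lists and assumes alpha = 0.05, a conventional choice that these notes do not state.

```python
from scipy.stats import f_oneway

performance1 = [89, 89, 88, 78, 79]
performance2 = [93, 92, 94, 89, 88]
performance3 = [89, 88, 89, 93, 90]
performance4 = [81, 78, 81, 92, 82]

alpha = 0.05  # assumed significance level (not given in the notes)

result = f_oneway(performance1, performance2, performance3, performance4)
f_stat, p_value = result.statistic, result.pvalue

# H0: all four group means are equal
if p_value < alpha:
    decision = "reject H0: at least one group mean differs"
else:
    decision = "fail to reject H0: no evidence the group means differ"

print(f"F = {f_stat:.3f}, p = {p_value:.4f} -> {decision}")
```

With p of about 0.016 the null hypothesis is rejected at the 0.05 level. The test alone does not say which oils differ from each other.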
###############################################
import pandas as pd
# load data file
df = pd.read_excel("C:/Users/user/Documents/sampanova.xlsx")
# reshape the DataFrame into long format, as required by the statsmodels formula API
df_melt = pd.melt(df.reset_index(), id_vars=['index'], value_vars=['A', 'B', 'C', 'D'])
# replace column names
df_melt.columns = ['index', 'treatments', 'value']
# generate a boxplot to see the data distribution by treatment; a boxplot makes
# differences between treatments easy to spot
import matplotlib.pyplot as plt
import seaborn as sns
ax = sns.boxplot(x='treatments', y='value', data=df_melt, color='#99c2a2')
ax = sns.swarmplot(x="treatments", y="value", data=df_melt, color='#7d0013')
plt.show()
import scipy.stats as stats
# stats.f_oneway takes the groups as input and returns the ANOVA F statistic and p-value
fvalue, pvalue = stats.f_oneway(df['A'], df['B'], df['C'], df['D'])
print(fvalue, pvalue)
# 17.492810457516338 2.639241146210922e-05
# get the ANOVA table as R-style output
import statsmodels.api as sm
from statsmodels.formula.api import ols
# Ordinary Least Squares (OLS) model
model = ols('value ~ C(treatments)', data=df_melt).fit()
anova_table = sm.stats.anova_lm(model, typ=2)
print(anova_table)
#######################
# install
pip install bioinfokit
# upgrade to latest version
pip install bioinfokit --upgrade
# uninstall
pip uninstall bioinfokit
################################
t-test
import scipy.stats as stats
import numpy as np
# Creating data groups
data_group1 = np.array([14, 15, 15, 16, 13, 8, 14,
17, 16, 14, 19, 20, 21, 15,
15, 16, 16, 13, 14, 12])
data_group2 = np.array([15, 17, 14, 17, 14, 8, 12,
19, 19, 14, 17, 22, 24, 16,
13, 16, 13, 18, 15, 13])
# Print the variance of both data groups
print(np.var(data_group1), np.var(data_group2))
output: 7.727500000000001 12.260000000000002
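The variances are printed to judge whether an equal-variance t-test is reasonable. One common rule of thumb, assumed here and not stated in these notes, treats the variances as roughly equal when the larger one is less than four times the smaller. A minimal sketch of that check:

```python
import numpy as np

data_group1 = np.array([14, 15, 15, 16, 13, 8, 14,
                        17, 16, 14, 19, 20, 21, 15,
                        15, 16, 16, 13, 14, 12])
data_group2 = np.array([15, 17, 14, 17, 14, 8, 12,
                        19, 19, 14, 17, 22, 24, 16,
                        13, 16, 13, 18, 15, 13])

# np.var defaults to the population variance (ddof=0), as in the print above
var1, var2 = np.var(data_group1), np.var(data_group2)

ratio = max(var1, var2) / min(var1, var2)
threshold = 4.0  # assumed rule-of-thumb cutoff, not from the notes
equal_var = ratio < threshold

print(f"variance ratio = {ratio:.3f}, treat as equal: {equal_var}")
```

Because both groups have the same size, the ratio is the same whether ddof=0 or ddof=1 is used.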
1. Performing Two-Sample T-Test
Method 1
# Python program demonstrating how to
# perform a two-sample t-test
# Import the library
import scipy.stats as stats
import numpy as np
# Creating data groups
data_group1 = np.array([14, 15, 15, 16, 13, 8, 14,
17, 16, 14, 19, 20, 21, 15,
15, 16, 16, 13, 14, 12])
data_group2 = np.array([15, 17, 14, 17, 14, 8, 12,
19, 19, 14, 17, 22, 24, 16,
13, 16, 13, 18, 15, 13])
# Perform the two sample t-test with equal variances
print(stats.ttest_ind(a=data_group1, b=data_group2, equal_var=True))
output: Ttest_indResult(statistic=-0.6337397070250238, pvalue=0.5300471010405257)
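To see what the equal-variance test computes, the statistic can be reproduced by hand from the pooled variance. This is a sketch for checking the numbers, not a replacement for stats.ttest_ind; the variable names are illustrative.

```python
import numpy as np
import scipy.stats as stats

data_group1 = np.array([14, 15, 15, 16, 13, 8, 14,
                        17, 16, 14, 19, 20, 21, 15,
                        15, 16, 16, 13, 14, 12])
data_group2 = np.array([15, 17, 14, 17, 14, 8, 12,
                        19, 19, 14, 17, 22, 24, 16,
                        13, 16, 13, 18, 15, 13])

n1, n2 = len(data_group1), len(data_group2)

# Sample variances (ddof=1) are what the t-test uses
s1_sq = np.var(data_group1, ddof=1)
s2_sq = np.var(data_group2, ddof=1)

# Pooled variance: weighted average of the two sample variances
sp_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)

# t statistic and two-sided p-value with n1 + n2 - 2 degrees of freedom
t_manual = (data_group1.mean() - data_group2.mean()) / np.sqrt(sp_sq * (1 / n1 + 1 / n2))
dof = n1 + n2 - 2
p_manual = 2 * stats.t.sf(abs(t_manual), dof)

# Cross-check against SciPy
t_scipy, p_scipy = stats.ttest_ind(a=data_group1, b=data_group2, equal_var=True)
print(t_manual, p_manual)
print(t_scipy, p_scipy)
```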
Method 2
# Python program to conduct a two-sample
# t-test using the pingouin library
# Importing libraries
import numpy as np
import pingouin as pg
# Creating data groups
data_group1 = np.array([160, 150, 160, 156.12, 163.24,
160.56, 168.56, 174.12,
167.123, 165.12])
data_group2 = np.array([157.97, 146, 140.2, 170.15,
167.34, 176.123, 162.35, 159.123,
169.43, 148.123])
# Conduct the two-sample t-test; correction=True applies
# Welch's correction for unequal variances
result = pg.ttest(data_group1, data_group2, correction=True)
# Print the result
print(result)
output:
               T        dof alternative  ...   cohen-d   BF10     power
T-test  0.653148  14.389477   two-sided  ...  0.292097  0.462  0.094912
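The fractional dof in the output above comes from Welch's correction. A minimal NumPy sketch of the Welch-Satterthwaite formula, cross-checked against SciPy's equal_var=False test rather than pingouin (variable names are illustrative):

```python
import numpy as np
import scipy.stats as stats

data_group1 = np.array([160, 150, 160, 156.12, 163.24,
                        160.56, 168.56, 174.12,
                        167.123, 165.12])
data_group2 = np.array([157.97, 146, 140.2, 170.15,
                        167.34, 176.123, 162.35, 159.123,
                        169.43, 148.123])

n1, n2 = len(data_group1), len(data_group2)

# Squared standard error of each group mean (sample variance / n)
v1 = np.var(data_group1, ddof=1) / n1
v2 = np.var(data_group2, ddof=1) / n2

# Welch t statistic: no pooling of the two variances
t_welch = (data_group1.mean() - data_group2.mean()) / np.sqrt(v1 + v2)

# Welch-Satterthwaite approximation to the degrees of freedom
dof_welch = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))

# Two-sided p-value from the t distribution with fractional dof
p_welch = 2 * stats.t.sf(abs(t_welch), dof_welch)

print(t_welch, dof_welch, p_welch)
print(stats.ttest_ind(data_group1, data_group2, equal_var=False))
```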
Method 3
# Two-sample t-test using the statsmodels library
from statsmodels.stats.weightstats import ttest_ind
import numpy as np
# Creating data groups
data_group1 = np.array([160, 150, 160, 156.12, 163.24,
                        160.56, 168.56, 174.12,
                        167.123, 165.12])
data_group2 = np.array([157.97, 146, 140.2, 170.15,
                        167.34, 176.123, 162.35,
                        159.123, 169.43, 148.123])
# Conduct the two-sample t-test (statsmodels pools the variances by default)
print(ttest_ind(data_group1, data_group2))
output: (0.6531479162158739, 0.5219170107019715, 18.0)  # (t-statistic, p-value, degrees of freedom)
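Methods 2 and 3 report different degrees of freedom (about 14.39 versus 18.0) because pingouin was asked for Welch's correction while statsmodels pools the variances by default. The sketch below unpacks the statsmodels tuple and uses its usevar argument to switch between the two.

```python
import numpy as np
from statsmodels.stats.weightstats import ttest_ind

data_group1 = np.array([160, 150, 160, 156.12, 163.24,
                        160.56, 168.56, 174.12,
                        167.123, 165.12])
data_group2 = np.array([157.97, 146, 140.2, 170.15,
                        167.34, 176.123, 162.35,
                        159.123, 169.43, 148.123])

# Default: usevar="pooled" (equal variances assumed)
t_stat, p_value, dof = ttest_ind(data_group1, data_group2)

# usevar="unequal" gives Welch's test with fractional degrees of freedom
t_w, p_w, dof_w = ttest_ind(data_group1, data_group2, usevar="unequal")

print(f"pooled: t = {t_stat:.6f}, p = {p_value:.6f}, df = {dof:.1f}")
print(f"Welch:  t = {t_w:.6f}, p = {p_w:.6f}, df = {dof_w:.6f}")
```

The two t statistics coincide here because the groups have equal sizes; only the degrees of freedom and the p-value change.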
linear regression
pip install sklearn-pandas==1.5.0