How to Perform t-Tests in Pandas (3 Examples)
The following examples show how to perform three
different t-tests using a pandas DataFrame:
Independent Two Sample t-Test
Welch’s Two Sample t-Test
Paired Samples t-Test
Example 1: Independent Two Sample t-Test in Pandas
An independent two sample t-test is used to determine if
two population means are equal.
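Stated a little more formally, the hypotheses and the test statistic can be written as below. The symbols are introduced here for illustration only: $\mu_1, \mu_2$ are the population means, $\bar{x}_i$, $s_i^2$ and $n_i$ are each sample's mean, variance and size, and $s_p^2$ is the pooled variance used under the equal-variance assumption.

```latex
H_0: \mu_1 = \mu_2 \qquad H_A: \mu_1 \neq \mu_2

t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}},
\qquad
s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}
```

Under the null hypothesis, $t$ follows a t distribution with $n_1 + n_2 - 2$ degrees of freedom.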
For example, suppose a professor wants to know if
two different studying methods lead to different
mean exam scores.
To test this, he recruits 10 students to use method
A and 10 students to use method B.
The following code shows how to enter the scores
of each student in a pandas DataFrame and then
use the ttest_ind() function from the SciPy library to
perform an independent two sample t-test:
import pandas as pd
from scipy.stats import ttest_ind
#create pandas DataFrame
df = pd.DataFrame({'method': ['A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A',
'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B'],
'score': [71, 72, 72, 75, 78, 81, 82, 83, 89, 91, 80, 81, 81,
84, 88, 88, 89, 90, 90, 91]})
#view first five rows of DataFrame
df.head()
method score
0 A 71
1 A 72
2 A 72
3 A 75
4 A 78
#define samples
group1 = df[df['method']=='A']
group2 = df[df['method']=='B']
#perform independent two sample t-test
ttest_ind(group1['score'], group2['score'])
Ttest_indResult(statistic=-2.6034304605397938, pvalue=0.017969284594810425)
From the output we can see:
t-test statistic: -2.6034
p-value: 0.0180
Since the p-value is less than .05, we reject the null
hypothesis of the t-test and conclude that there is
sufficient evidence to say that the two methods
lead to different mean exam scores.
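To see where these numbers come from, the test can be reproduced by hand from per-group summary statistics. The following is a minimal sketch, not part of the original example: the variable names (summary, pooled_var, t_manual, p_manual) are chosen here for illustration, and the DataFrame is rebuilt so the block runs on its own.

```python
import numpy as np
import pandas as pd
from scipy import stats

# same data as in the example above
df = pd.DataFrame({
    'method': ['A'] * 10 + ['B'] * 10,
    'score': [71, 72, 72, 75, 78, 81, 82, 83, 89, 91,
              80, 81, 81, 84, 88, 88, 89, 90, 90, 91],
})

# per-group sample size, mean and sample variance (pandas uses ddof=1 for var)
summary = df.groupby('method')['score'].agg(['count', 'mean', 'var'])
n1, m1, v1 = summary.loc['A']
n2, m2, v2 = summary.loc['B']

# pooled variance under the equal-variance assumption
dof = n1 + n2 - 2
pooled_var = ((n1 - 1) * v1 + (n2 - 1) * v2) / dof

# t statistic and two-sided p-value from the t distribution
t_manual = (m1 - m2) / np.sqrt(pooled_var * (1 / n1 + 1 / n2))
p_manual = 2 * stats.t.sf(abs(t_manual), dof)

# compare against SciPy's built-in test
result = stats.ttest_ind(df.loc[df['method'] == 'A', 'score'],
                         df.loc[df['method'] == 'B', 'score'])
print(t_manual, p_manual)
print(result.statistic, result.pvalue)
```

Both printed lines should agree with the output of ttest_ind() shown above, up to floating-point rounding.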
Example 2: Welch’s t-Test in Pandas
Welch’s t-test is similar to the independent two
sample t-test, except it does not assume that the
two populations that the samples came from
have equal variance.
To perform Welch’s t-test on the exact same
dataset as the previous example, we simply need
to specify equal_var=False within the ttest_ind()
function as follows:
import pandas as pd
from scipy.stats import ttest_ind
#create pandas DataFrame
df = pd.DataFrame({'method': ['A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A',
'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B'],
'score': [71, 72, 72, 75, 78, 81, 82, 83, 89, 91, 80, 81, 81,
84, 88, 88, 89, 90, 90, 91]})
#define samples
group1 = df[df['method']=='A']
group2 = df[df['method']=='B']
#perform Welch's t-test
ttest_ind(group1['score'], group2['score'], equal_var=False)
Ttest_indResult(statistic=-2.603430460539794, pvalue=0.02014688617423973)
From the output we can see:
t-test statistic: -2.6034
p-value: 0.0201
Since the p-value is less than .05, we reject the null
hypothesis of Welch’s t-test and conclude that
there is sufficient evidence to say that the two
methods lead to different mean exam scores.
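The test statistic is the same as in Example 1 because both groups have 10 observations. Only the degrees of freedom, and therefore the p-value, change. A minimal sketch of the calculation, using the Welch-Satterthwaite approximation for the degrees of freedom, could look like this (the variable names are illustrative, not from the original example):

```python
import numpy as np
import pandas as pd
from scipy import stats

# same data as in the example above
df = pd.DataFrame({
    'method': ['A'] * 10 + ['B'] * 10,
    'score': [71, 72, 72, 75, 78, 81, 82, 83, 89, 91,
              80, 81, 81, 84, 88, 88, 89, 90, 90, 91],
})
a = df.loc[df['method'] == 'A', 'score']
b = df.loc[df['method'] == 'B', 'score']

# each group's contribution to the squared standard error: s^2 / n
se2_a = a.var() / len(a)
se2_b = b.var() / len(b)

# Welch's t statistic: no pooling of the two variances
t_welch = (a.mean() - b.mean()) / np.sqrt(se2_a + se2_b)

# Welch-Satterthwaite approximation to the degrees of freedom
dof_welch = (se2_a + se2_b) ** 2 / (
    se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1)
)

# two-sided p-value
p_welch = 2 * stats.t.sf(abs(t_welch), dof_welch)
print(t_welch, dof_welch, p_welch)
```

The approximate degrees of freedom fall below the 18 used by the pooled test because the sample variances differ noticeably (roughly 50 for method A versus 18 for method B), which is why the p-value is slightly larger here.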
Example 3: Paired Samples t-Test in Pandas
A paired samples t-test is used to determine if two
population means are equal when each
observation in one sample can be paired with an
observation in the other sample.
For example, suppose a professor wants to know if
two different studying methods lead to different
mean exam scores.
To test this, he recruits 10 students to use method
A and then take a test. Then, he has the same 10
students use method B to prepare for and take
another test of similar difficulty.
Since all of the students appear in both samples,
we can perform a paired samples t-test in this
scenario.
The following code shows how to enter the scores
of each student in a pandas DataFrame and then
use the ttest_rel() function from the SciPy library to
perform a paired samples t-test:
import pandas as pd
from scipy.stats import ttest_rel
#create pandas DataFrame
df = pd.DataFrame({'method': ['A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A',
'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B'],
'score': [71, 72, 72, 75, 78, 81, 82, 83, 89, 91, 80, 81, 81,
84, 88, 88, 89, 90, 90, 91]})
#view first five rows of DataFrame
df.head()
method score
0 A 71
1 A 72
2 A 72
3 A 75
4 A 78
#define samples
group1 = df[df['method']=='A']
group2 = df[df['method']=='B']
#perform paired samples t-test (rows must be in the same student order in both groups)
ttest_rel(group1['score'], group2['score'])
Ttest_relResult(statistic=-6.162045351967805, pvalue=0.0001662872100210469)
From the output we can see:
t-test statistic: -6.1620
p-value: 0.0002
Since the p-value is less than .05, we reject the null
hypothesis of the paired samples t-test and
conclude that there is sufficient evidence to say
that the two methods lead to different mean exam
scores.
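A paired samples t-test is equivalent to a one-sample t-test on the within-pair differences, which also makes the pairing explicit. The sketch below rests on an assumption: the dataset has no student identifier, so it supposes that the i-th 'A' row and the i-th 'B' row belong to the same student. The 'student' column and the names wide and diff are hypothetical additions for illustration.

```python
import pandas as pd
from scipy.stats import ttest_rel, ttest_1samp

# same data as in the example above
df = pd.DataFrame({
    'method': ['A'] * 10 + ['B'] * 10,
    'score': [71, 72, 72, 75, 78, 81, 82, 83, 89, 91,
              80, 81, 81, 84, 88, 88, 89, 90, 90, 91],
})

# ASSUMPTION: row order within each method identifies the student
df['student'] = df.groupby('method').cumcount()

# reshape to one row per student, one column per method
wide = df.pivot(index='student', columns='method', values='score')

# within-student differences (method A minus method B)
diff = wide['A'] - wide['B']

# the paired test and the one-sample test on the differences
paired = ttest_rel(wide['A'], wide['B'])
one_sample = ttest_1samp(diff, popmean=0)

print(paired.statistic, paired.pvalue)
print(one_sample.statistic, one_sample.pvalue)
```

Both lines should print the same statistic and p-value. With real data, pivoting on an actual student ID column is safer than relying on row order, since ttest_rel() pairs observations purely by position.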