3/14/25, 4:35 PM L-2 (Data Frame Part 1).ipynb - Colab
Python Data Frames Part 1
Basic Operations on Data
# import libraries
import numpy as np
import pandas as pd
from google.colab import drive  # Colab-only helper for accessing Google Drive

# Mount Google Drive so the CSV stored there can be read
drive.mount('/content/drive', force_remount=True)
marks = pd.read_csv("/content/drive/MyDrive/Data_Analytics/Test_data.csv")
print(marks)
Mounted at /content/drive
RollNo Name Eco Maths
0 1 Arnab 18 57
1 2 Kritika 23 45
2 3 Divyam 51 37
3 4 Vivaan 40 60
4 5 Aaaroosh 18 27
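If Google Drive is not mounted, the same table can be rebuilt in memory so the later cells still run. This is a sketch that assumes the CSV holds exactly the five rows printed above; `csv_text` is a hypothetical stand-in, not the real file. `read_csv` accepts any file-like object, so `io.StringIO` can replace the Drive path.

```python
import io

import pandas as pd

# Hypothetical in-memory stand-in for Test_data.csv, rebuilt only from the
# rows printed above; the real file on Drive may contain more data.
csv_text = """RollNo,Name,Eco,Maths
1,Arnab,18,57
2,Kritika,23,45
3,Divyam,51,37
4,Vivaan,40,60
5,Aaaroosh,18,27
"""

# read_csv reads from any file-like object, not only from a path
marks = pd.read_csv(io.StringIO(csv_text))
print(marks)
```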
marks.columns = ['ROLLNO', 'NAME', 'ECONOMICS', 'MATHS']  # Rename all columns; the list must have one name per column
marks
ROLLNO NAME ECONOMICS MATHS
0 1 Arnab 18 57
1 2 Kritika 23 45
2 3 Divyam 51 37
3 4 Vivaan 40 60
4 5 Aaaroosh 18 27
# nsmallest(n, column_label) returns the n rows with the smallest values in that column, as a new DataFrame sorted in ascending order
least2 = marks.nsmallest(2, "ECONOMICS")
print(least2)
ROLLNO NAME ECONOMICS MATHS
0 1 Arnab 18 57
4 5 Aaaroosh 18 27
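Arnab and Aaaroosh are tied on 18 in ECONOMICS, so the `keep` parameter of `nsmallest` decides which tied rows survive when n cuts through a tie. A small sketch, with the renamed table typed in from the output above; `first_only` and `all_ties` are illustrative names.

```python
import pandas as pd

# The renamed marks table, typed in from the output above
marks = pd.DataFrame({
    "ROLLNO": [1, 2, 3, 4, 5],
    "NAME": ["Arnab", "Kritika", "Divyam", "Vivaan", "Aaaroosh"],
    "ECONOMICS": [18, 23, 51, 40, 18],
    "MATHS": [57, 45, 37, 60, 27],
})

# Ask for a single row while two rows share the smallest value
first_only = marks.nsmallest(1, "ECONOMICS")            # keep="first" (default): earliest tied row only
all_ties = marks.nsmallest(1, "ECONOMICS", keep="all")  # keep="all": every row tied at the cut-off
print(first_only)
print(all_ties)
```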
# nlargest(n, column_label) returns the n rows with the largest values in that column, as a new DataFrame sorted in descending order
great2 = marks.nlargest(2, "MATHS")
print(great2)
ROLLNO NAME ECONOMICS MATHS
3 4 Vivaan 40 60
0 1 Arnab 18 57
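Called on a single column instead of the whole DataFrame, `nlargest` returns a Series of just the values, with the original row labels kept. Those labels can then pull the matching names back out. A sketch using only the NAME and MATHS columns from the output above; `top2_scores` is an illustrative name.

```python
import pandas as pd

marks = pd.DataFrame({
    "NAME": ["Arnab", "Kritika", "Divyam", "Vivaan", "Aaaroosh"],
    "MATHS": [57, 45, 37, 60, 27],
})

# On a Series, nlargest returns the values themselves (index labels kept)
top2_scores = marks["MATHS"].nlargest(2)
print(top2_scores)

# The kept index labels locate the full rows in the original table
print(marks.loc[top2_scores.index, "NAME"].tolist())
```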
# between(left, right, inclusive) returns a boolean Series: True where left <= value <= right when inclusive="both"
result = marks["MATHS"].between(35, 45, inclusive="both")
print(marks[result])  # Filter the DataFrame with the boolean Series
ROLLNO NAME ECONOMICS MATHS
1 2 Kritika 23 45
2 3 Divyam 51 37
print(result)
print(type(result))
0 False
1 True
2 True
3 False
4 False
Name: MATHS, dtype: bool
<class 'pandas.core.series.Series'>
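The `inclusive` argument controls whether the endpoints count. The sketch below compares "both" with "neither" on the MATHS values shown above and checks that `between` matches two comparisons joined with `&`. It assumes pandas 1.3 or later, where `inclusive` takes strings such as "both", "neither", "left" and "right".

```python
import pandas as pd

maths = pd.Series([57, 45, 37, 60, 27], name="MATHS")

both = maths.between(35, 45, inclusive="both")        # 35 <= x <= 45
neither = maths.between(35, 45, inclusive="neither")  # 35 <  x <  45

# inclusive="both" is equivalent to two comparisons combined with &
manual = (maths >= 35) & (maths <= 45)

print(maths[both].tolist())     # 45 is kept because the right end is included
print(maths[neither].tolist())  # 45 is dropped
```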
datadic = {"P":[2, 9, 8, 7],
"Q":[1, 20, 12, 5],
"R":[14, 30, 18, 52],
"S":[52, 46, 12, 83]}
df = pd.DataFrame(datadic)
df
https://colab.research.google.com/drive/1Bos1V9K5-scUxXEBk7rNViiSpMC4k0wJ#scrollTo=A6cdItmM9Kkg&printMode=true 1/5
P Q R S
0 2 1 14 52
1 9 20 30 46
2 8 12 18 12
3 7 5 52 83
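count = df['P'].count()  # number of non-missing values in column P
print(count)
max_val = df['P'].max()
print("Maximum Value of one Column P\n", max_val)
max_row = df.max(axis=1)  # axis=1: maximum across the columns of each row
print("Maximum Value Rowwise\n", max_row)
max_col = df.max(axis=0)  # axis=0: maximum down each column
print("Maximum Value Columnwise\n", max_col)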
Maximum Value of one Column P
9
Maximum Value Rowwise
0 52
1 46
2 18
3 83
dtype: int64
Maximum Value Columnwise
P 9
Q 20
R 52
S 83
dtype: int64
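`max` reports the largest value; `idxmax` reports *where* it occurs, which is often what is needed next. A sketch on the same P/Q/R/S data: with `axis=0` it gives the row label of each column's maximum, with `axis=1` the column label of each row's maximum.

```python
import pandas as pd

df = pd.DataFrame({"P": [2, 9, 8, 7], "Q": [1, 20, 12, 5],
                   "R": [14, 30, 18, 52], "S": [52, 46, 12, 83]})

print(df["P"].idxmax())   # row label holding the largest P
print(df.idxmax(axis=0))  # for each column, the row label of its maximum
print(df.idxmax(axis=1))  # for each row, the column label of its maximum
```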
min_val = df['P'].min()
print("Minimum Value of one Column P\n", min_val)
min_row = df.min(axis=1)  # axis=1: minimum across the columns of each row
print("Minimum Value Rowwise\n", min_row)
min_col = df.min(axis=0)  # axis=0: minimum down each column
print("Minimum Value Columnwise\n", min_col)
Minimum Value of one Column P
2
Minimum Value Rowwise
0 1
1 9
2 8
3 5
dtype: int64
Minimum Value Columnwise
P 2
Q 1
R 14
S 12
dtype: int64
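The column-wise minimum and maximum can also be produced together with `agg`, which takes a list of reduction names and returns one row per function. A sketch on the same data; `summary` is an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({"P": [2, 9, 8, 7], "Q": [1, 20, 12, 5],
                   "R": [14, 30, 18, 52], "S": [52, 46, 12, 83]})

# Each function name in the list becomes a row label in the result
summary = df.agg(["min", "max"])
print(summary)
```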
Basic Statistical Functions
mean_val = df['P'].mean()
print("Mean Value of one Column P\n", mean_val)
mean_row = df.mean(axis=1)  # axis=1: mean across the columns of each row
print("Mean Value Rowwise\n", mean_row)
mean_col = df.mean(axis=0)  # axis=0: mean down each column
print("Mean Value Columnwise\n", mean_col)
Mean Value of one Column P
6.5
Mean Value Rowwise
0 17.25
1 26.25
2 12.50
3 36.75
dtype: float64
Mean Value Columnwise
P 6.50
Q 9.50
R 28.50
S 48.25
dtype: float64
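As a check on the numbers above, the mean is the sum divided by the count of non-missing values, along whichever axis is chosen. For column P: (2 + 9 + 8 + 7) / 4 = 6.5. A sketch verifying this for both axes:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"P": [2, 9, 8, 7], "Q": [1, 20, 12, 5],
                   "R": [14, 30, 18, 52], "S": [52, 46, 12, 83]})

# mean = sum / number of non-missing values, along the chosen axis
col_mean = df.sum(axis=0) / df.count(axis=0)
row_mean = df.sum(axis=1) / df.count(axis=1)
print(col_mean)
print(row_mean)
```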
# mean() on a DataFrame that contains missing (NaN) values.
# A separate name keeps the P/Q/R/S DataFrame `df` available for the cells below.
df_nan = pd.DataFrame({"Anu": [12, 4, 5, None, 1], "Bina": [7, 2, 54, 3, None],
                       "Chitra": [20, 16, 11, 3, 8], "Deep": [14, 3, None, 2, 6]})
print(df_nan)
# skipna=True (the default) ignores NaN values when computing the mean
df_nan.mean(axis=1, skipna=True)  # axis=1: mean across the columns of each row
Anu Bina Chitra Deep
0 12.0 7.0 20 14.0
1 4.0 2.0 16 3.0
2 5.0 54.0 11 NaN
3 NaN 3.0 3 2.0
4 1.0 NaN 8 6.0
0 13.250000
1 6.250000
2 23.333333
3 2.666667
4 5.000000
dtype: float64
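With `skipna=True`, row 2 is averaged over its three present values (70 / 3). Two alternatives behave differently: `skipna=False` propagates NaN to any row that has a gap, and filling gaps with 0 keeps the divisor at 4. A sketch on the same table; `strict` and `as_zero` are illustrative names, and treating a missing mark as 0 is only an assumption about what a gap means.

```python
import pandas as pd

df_nan = pd.DataFrame({"Anu": [12, 4, 5, None, 1], "Bina": [7, 2, 54, 3, None],
                       "Chitra": [20, 16, 11, 3, 8], "Deep": [14, 3, None, 2, 6]})

# skipna=False: any NaN in a row makes that row's mean NaN
strict = df_nan.mean(axis=1, skipna=False)

# Filling NaN with 0 counts the gap as a value, so every row divides by 4
as_zero = df_nan.fillna(0).mean(axis=1)

print(strict)
print(as_zero)
```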
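mode_val = df['P'].mode()  # mode() returns every most frequent value, sorted; here each value occurs once, so all are modes
print("Mode Value of one Column P\n", mode_val)
mode_row = df.mode(axis=1)  # rows with fewer modes are padded with NaN
print("Mode Value Rowwise\n", mode_row)
mode_col = df.mode(axis=0)
print("Mode Value Columnwise\n", mode_col)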
Mode Value of one Column P
0 2
1 7
2 8
3 9
Name: P, dtype: int64
Mode Value Rowwise
0 1 2 3
0 1.0 2.0 14.0 52.0
1 9.0 20.0 30.0 46.0
2 12.0 NaN NaN NaN
3 5.0 7.0 52.0 83.0
Mode Value Columnwise
P Q R S
0 2 1 14 12
1 7 5 18 46
2 8 12 30 52
3 9 20 52 83
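The outputs above look odd because almost no value repeats. On data with repeats, `mode` returns only the most frequent value(s); on a tie it returns all of them in sorted order. When one value per column is wanted, taking the first row of `df.mode()` gives the smallest mode. A sketch; `s`, `t` and `first_modes` are illustrative names.

```python
import pandas as pd

# 3 occurs three times, more than any other value: a single mode
s = pd.Series([3, 1, 3, 2, 1, 3])
print(s.mode().tolist())

# 4 and 5 both occur twice: both are returned, sorted
t = pd.Series([4, 4, 5, 5, 6])
print(t.mode().tolist())

df = pd.DataFrame({"P": [2, 9, 8, 7], "Q": [1, 20, 12, 5],
                   "R": [14, 30, 18, 52], "S": [52, 46, 12, 83]})

# .iloc[0] keeps only the first (smallest) mode of each column
first_modes = df.mode(axis=0).iloc[0]
print(first_modes)
```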
median_val = df['P'].median()
print("Median Value of one Column P\n", median_val)
median_row = df.median(axis=1)  # axis=1: median across the columns of each row
print("Median Value Rowwise\n", median_row)
median_col = df.median(axis=0)  # axis=0: median down each column
print("Median Value Columnwise\n", median_col)
Median Value of one Column P
7.5
Median Value Rowwise
0 8.0
1 25.0
2 12.0
3 29.5
dtype: float64
Median Value Columnwise
P 7.5
Q 8.5
R 24.0
S 49.0
dtype: float64
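Column P has an even number of values, so its median is the mean of the two middle values once sorted: (7 + 8) / 2 = 7.5. A sketch reproducing that by hand; `manual_median` is an illustrative name.

```python
import pandas as pd

p = pd.Series([2, 9, 8, 7], name="P")

values = sorted(p)   # [2, 7, 8, 9]
mid = len(values) // 2
# Even count: average the two middle values
manual_median = (values[mid - 1] + values[mid]) / 2

print(manual_median, p.median())
```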
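std_val = df['P'].std()  # sample standard deviation (ddof=1: divides by n - 1)
print("Standard Deviation Value of one Column P\n", std_val)
std_row = df.std(axis=1)  # axis=1: across the columns of each row
print("Standard Deviation Value Rowwise\n", std_row)
std_col = df.std(axis=0)  # axis=0: down each column
print("Standard Deviation Value Columnwise\n", std_col)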
Standard Deviation Value of one Column P
3.1091263510296048
Standard Deviation Value Rowwise
0 23.907809
1 15.713582
2 4.123106
3 37.703890
dtype: float64
Standard Deviation Value Columnwise
P 3.109126
Q 8.346656
R 17.078251
S 29.101833
dtype: float64
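pandas `std` defaults to the sample standard deviation (`ddof=1`), while NumPy's `np.std` defaults to the population version (`ddof=0`), so the two libraries give different answers on the same data unless `ddof` is set. For P, the squared deviations from 6.5 sum to 29, giving sqrt(29 / 3) ≈ 3.109 versus sqrt(29 / 4) ≈ 2.693. A sketch:

```python
import numpy as np
import pandas as pd

p = pd.Series([2, 9, 8, 7], name="P")

sample_std = p.std()              # ddof=1: divides by n - 1 (pandas default)
population_std = p.std(ddof=0)    # divides by n
numpy_std = np.std(p.to_numpy())  # NumPy's default is ddof=0

print(sample_std, population_std, numpy_std)
```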
print(df.cov())
P Q R S
P 9.666667 22.000000 21.666667 -19.833333
Q 22.000000 69.666667 2.333333 -100.833333
R 21.666667 2.333333 291.666667 379.833333
S -19.833333 -100.833333 379.833333 846.916667
df['P'].cov(df['Q'])  # Covariance between two specific columns
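The 22.0 in the P/Q cell of the covariance matrix can be reproduced from the definition of sample covariance: multiply each pair of deviations from the column means, sum, and divide by n - 1. A sketch using only columns P and Q; `manual_cov` is an illustrative name.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"P": [2, 9, 8, 7], "Q": [1, 20, 12, 5]})
p, q = df["P"], df["Q"]

# Sample covariance: sum of products of deviations from each mean, over n - 1
manual_cov = ((p - p.mean()) * (q - q.mean())).sum() / (len(p) - 1)
print(manual_cov, p.cov(q))
```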
print(df.corr())
P Q R S
P 1.000000 0.847758 0.408047 -0.219198
Q 0.847758 1.000000 0.016369 -0.415118
R 0.408047 0.016369 1.000000 0.764239
S -0.219198 -0.415118 0.764239 1.000000
df['P'].corr(df['Q']) # Correlation between two specific columns
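The default (Pearson) correlation is the covariance rescaled by both standard deviations, which is why it always lies between -1 and 1. Dividing 22.0 by the two standard deviations of P and Q recovers the 0.847758 shown in the matrix. A sketch; `r` is an illustrative name.

```python
import numpy as np
import pandas as pd

p = pd.Series([2, 9, 8, 7], name="P")
q = pd.Series([1, 20, 12, 5], name="Q")

# Pearson correlation = covariance / (std of P * std of Q)
r = p.cov(q) / (p.std() * q.std())
print(r, p.corr(q))
```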
print(df)
print(df.cumsum(axis=0))
P Q R S
0 2 1 14 52
1 9 20 30 46
2 8 12 18 12
3 7 5 52 83
P Q R S
0 2 1 14 52
1 11 21 44 98
2 19 33 62 110
3 26 38 114 193
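`cumsum(axis=0)` is a running total down each column, the same thing `itertools.accumulate` computes for a plain sequence, so its last row equals the column totals from `sum`. A sketch checking both facts; `running` is an illustrative name.

```python
from itertools import accumulate

import pandas as pd

df = pd.DataFrame({"P": [2, 9, 8, 7], "Q": [1, 20, 12, 5],
                   "R": [14, 30, 18, 52], "S": [52, 46, 12, 83]})

running = df.cumsum(axis=0)

# A running total of column P computed without pandas
print(list(accumulate(df["P"])))

# The last row of a column-wise cumsum is each column's total
print(running.iloc[-1].tolist(), df.sum(axis=0).tolist())
```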
print(df)
print(df.cumsum(axis=1))
P Q R S
0 2 1 14 52
1 9 20 30 46
2 8 12 18 12
3 7 5 52 83
P Q R S
0 2 3 17 69
1 9 29 59 105
2 8 20 38 50
3 7 12 64 147
print(df)
print(df.cumprod(axis=0))
P Q R S
0 2 1 14 52
1 9 20 30 46
2 8 12 18 12
3 7 5 52 83
P Q R S
0 2 1 14 52
1 18 20 420 2392
2 144 240 7560 28704
3 1008 1200 393120 2382432
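Likewise, the last row of `cumprod(axis=0)` is the product of each column, which `prod` or `math.prod` give directly. Products grow quickly: column S already reaches 2,382,432 after four rows, and on longer integer columns NumPy's fixed-width integers can overflow without raising an error. A sketch:

```python
import math

import pandas as pd

df = pd.DataFrame({"P": [2, 9, 8, 7], "Q": [1, 20, 12, 5],
                   "R": [14, 30, 18, 52], "S": [52, 46, 12, 83]})

running = df.cumprod(axis=0)

# The last row of a column-wise cumprod equals the product of each column
print(running.iloc[-1].tolist())
print(df.prod(axis=0).tolist())
print(math.prod(df["S"]))
```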