Chapter – 2
PYTHON PANDAS
DataFrame Data Structure – II
ITERATING OVER A DATAFRAME
Sometimes every data value of a DataFrame has to be processed. Writing a
separate statement to access each individual value makes the process
cumbersome, so in such cases we iterate over the DataFrame instead. The
iterrows( ) and iteritems( ) methods are used for this; in pandas 2.0 and
later, iteritems( ) is available only under the name items( ).
i. iterrows( ) method: This method accesses the elements of a DataFrame
horizontally i.e. row wise. Each horizontal subset is in the form
(row_index, Series) where Series contains the values in the row for the
row_index. The syntax of iterrows( ) method is as follows:
Syntax:
Dataframe_object.iterrows( )
row_index  Name    Marks  Grade  City
11251      Anil    65.5   B      Abu Road
11252      Vinay   89.0   A      Udaipur
11253      Kartik  72.5   B      Jaipur
11254      Ravi    45.5   C      Bikaner
11255      Nakul   95.0   A+     Delhi
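A minimal sketch of row-wise iteration, using the values from the table above. The variable names df and row_labels are chosen here only for illustration.

```python
import pandas as pd

# Data copied from the table above; the roll numbers form the row index
df = pd.DataFrame(
    {
        "Name": ["Anil", "Vinay", "Kartik", "Ravi", "Nakul"],
        "Marks": [65.5, 89.0, 72.5, 45.5, 95.0],
        "Grade": ["B", "A", "B", "C", "A+"],
        "City": ["Abu Road", "Udaipur", "Jaipur", "Bikaner", "Delhi"],
    },
    index=[11251, 11252, 11253, 11254, 11255],
)

row_labels = []
for row_index, row in df.iterrows():
    # row is a Series holding the values of one row, labelled by column name
    row_labels.append(row_index)
    print(row_index, row["Name"], row["Marks"])
```

Each pass through the loop receives one (row_index, Series) pair, so individual values are read from the Series by column name.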
ii. iteritems( ) method: This method accesses the elements of a DataFrame
vertically i.e. column wise. Each vertical subset is in the form (col_index,
Series) where Series contains the values in the column for the col_index.
This method was removed in pandas 2.0; in newer versions use items( ), which
works the same way and is also available in older versions.
The syntax is as follows:
Syntax:
Dataframe_object.iteritems( )
Dataframe_object.items( )
col_index: Name    Marks  Grade  City
11251      Anil    65.5   B      Abu Road
11252      Vinay   89.0   A      Udaipur
11253      Kartik  72.5   B      Jaipur
11254      Ravi    45.5   C      Bikaner
11255      Nakul   95.0   A+     Delhi
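A minimal sketch of column-wise iteration over the same data. It uses items( ), which is assumed here because it runs on both older and newer pandas versions; the names df and col_labels are illustrative.

```python
import pandas as pd

# Data copied from the table above
df = pd.DataFrame(
    {
        "Name": ["Anil", "Vinay", "Kartik", "Ravi", "Nakul"],
        "Marks": [65.5, 89.0, 72.5, 45.5, 95.0],
        "Grade": ["B", "A", "B", "C", "A+"],
        "City": ["Abu Road", "Udaipur", "Jaipur", "Bikaner", "Delhi"],
    },
    index=[11251, 11252, 11253, 11254, 11255],
)

col_labels = []
for col_index, column in df.items():
    # column is a Series holding the values of one column, labelled by row index
    col_labels.append(col_index)
    print(col_index, list(column))
```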
BINARY OPERATIONS / CALCULATIONS ON DATAFRAME
In binary operations, the data from two DataFrames are aligned on the basis of their
row and column indexes. Where a row and column index match, the given operation is
performed; where they do not match, NaN is stored in the result. The operations which
can be performed on DataFrames are:
S.N.  Function     Description                             Example
1     add( )       Adds two DataFrames df1 and df2.        df3 = df1.add(df2)       or  df3 = df1 + df2
2     sub( )       Subtracts DataFrame df2 from df1.       df3 = df1.sub(df2)       or  df3 = df1 - df2
3     mul( )       Multiplies two DataFrames df1 and df2.  df3 = df1.mul(df2)       or  df3 = df1 * df2
4     div( )       Divides DataFrame df1 by df2.           df3 = df1.div(df2)       or  df3 = df1 / df2
5     mod( )       Divides DataFrame df1 by df2 and gives  df3 = df1.mod(df2)       or  df3 = df1 % df2
                   the remainder as result.
6     floordiv( )  Divides DataFrame df1 by df2 and gives  df3 = df1.floordiv(df2)  or  df3 = df1 // df2
                   the integer part as result.
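The division-related rows of the table can be tried with a small sketch. The two DataFrames below are made-up values for illustration, not data from this chapter.

```python
import pandas as pd

# Hypothetical DataFrames with identical row and column indexes
df1 = pd.DataFrame({"A": [7, 9], "B": [10, 12]})
df2 = pd.DataFrame({"A": [2, 4], "B": [3, 5]})

quotient = df1.div(df2)      # same as df1 / df2
remainder = df1.mod(df2)     # same as df1 % df2
whole = df1.floordiv(df2)    # same as df1 // df2

print(quotient)
print(remainder)
print(whole)
```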
Sample Program 1:
Create two DataFrames term1 and term2 to store marks of students in three
subjects for Term-I and Term-II exams. DataFrame term1 contains marks of 4
students in 3 subjects and DataFrame term2 contains marks of 5 students in 3
subjects. Add marks of Term-I and Term-II and store in third DataFrame total and
display the output.
DataFrame: term1
       Phy  Chem  Maths
11251   50    70     75
11252   40    45     42
11253   90    92     99
11254   68    74     90

DataFrame: term2
       Phy  Chem  Maths
11251   65    55     75
11252   57    63     70
11253   89    95     98
11254   78    69     76
11255   50    60     70
Program & Output:
Note: The result for non-matching values is NaN
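One possible way to write Sample Program 1 is sketched below, using the marks given in the two tables. The helper name subjects is an assumption made for brevity.

```python
import pandas as pd

subjects = ["Phy", "Chem", "Maths"]

# Term-I marks of 4 students
term1 = pd.DataFrame(
    [[50, 70, 75], [40, 45, 42], [90, 92, 99], [68, 74, 90]],
    index=[11251, 11252, 11253, 11254],
    columns=subjects,
)

# Term-II marks of 5 students
term2 = pd.DataFrame(
    [[65, 55, 75], [57, 63, 70], [89, 95, 98], [78, 69, 76], [50, 60, 70]],
    index=[11251, 11252, 11253, 11254, 11255],
    columns=subjects,
)

# Rows are aligned on roll number; 11255 exists only in term2
total = term1 + term2
print(total)
```

Roll number 11255 has no Term-I marks, so its row in total holds NaN in every subject.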
Sample Program 2:
Create two DataFrames term1 and term2 to store marks of students in three
subjects for Term-I and Term-II exams. DataFrame term1 contains marks of 4
students in 3 subjects (Phy, Chem, Maths) and DataFrame term2 contains marks of
4 students in 3 subjects (Phy, Chem, Comp). Find the difference in marks of
Term-I and Term-II, store it in a third DataFrame diff and display the output.
DataFrame: term1
       Phy  Chem  Maths
11251   50    70     75
11252   40    45     42
11253   90    92     99
11254   68    74     90

DataFrame: term2
       Phy  Chem  Comp
11251   65    55    75
11252   57    63    70
11253   89    95    98
11254   78    69    76
Program & Output:
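One possible way to write Sample Program 2 is sketched below, using the marks from the two tables. The third column differs between the DataFrames (Maths in term1, Comp in term2), so those columns have no partner to subtract.

```python
import pandas as pd

rolls = [11251, 11252, 11253, 11254]

term1 = pd.DataFrame(
    [[50, 70, 75], [40, 45, 42], [90, 92, 99], [68, 74, 90]],
    index=rolls,
    columns=["Phy", "Chem", "Maths"],
)
term2 = pd.DataFrame(
    [[65, 55, 75], [57, 63, 70], [89, 95, 98], [78, 69, 76]],
    index=rolls,
    columns=["Phy", "Chem", "Comp"],
)

# Only Phy and Chem exist in both DataFrames;
# Maths and Comp are filled with NaN in the result
diff = term1 - term2
print(diff)
```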
Sample Program 3:
Create two DataFrames term1 and term2 to store marks of students in three
subjects for Term-I and Term-II exams. DataFrame term1 contains marks of 4
students in 3 subjects (Phy, Chem, Maths) and DataFrame term2 contains marks of
5 students in 3 subjects (Phy, Chem, Comp). Find the product of the marks of
Term-I and Term-II, store it in a third DataFrame product and display the output.
DataFrame: term1
       Phy  Chem  Maths
11251   50    70     75
11252   40    45     42
11253   90    92     99
11254   68    74     90

DataFrame: term2
       Phy  Chem  Comp
11251   65    55    75
11252   57    63    70
11253   89    95    98
11254   78    69    76
11255   50    60    70
Program & Output:
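One possible way to write Sample Program 3 is sketched below, using the marks from the two tables. Here both the row indexes and the column indexes differ between the DataFrames.

```python
import pandas as pd

term1 = pd.DataFrame(
    [[50, 70, 75], [40, 45, 42], [90, 92, 99], [68, 74, 90]],
    index=[11251, 11252, 11253, 11254],
    columns=["Phy", "Chem", "Maths"],
)
term2 = pd.DataFrame(
    [[65, 55, 75], [57, 63, 70], [89, 95, 98], [78, 69, 76], [50, 60, 70]],
    index=[11251, 11252, 11253, 11254, 11255],
    columns=["Phy", "Chem", "Comp"],
)

# Values are multiplied only where both the roll number and the subject match
product = term1 * term2
print(product)
```

The result has NaN in the whole row for 11255 and in the whole Maths and Comp columns.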
DESCRIPTIVE STATISTICS WITH PANDAS
Python Pandas is a widely used data science library. Among its many functions,
Pandas offers useful statistical and aggregate functions. Some of these
functions are as follows:
1. min( ) function: The min( ) function finds out the minimum value from a given
DataFrame.
Syntax:
DataFrame.min( axis = None, skipna = None, numeric_only = None)
Parameters:
Parameter     Value          Description
axis          0 or 1         0 means index (one result per column) and 1 means
                             columns (one result per row). By default, the
                             minimum value is calculated along axis 0.
skipna        True or False  If True, it excludes NaN/null values when
                             computing the result.
numeric_only  True or False  If True, it includes only int, float and Boolean
                             columns for calculating min. If value is None, it
                             will attempt to use everything, then uses only
                             numeric data.
Example: Program to find out minimum marks of each student.
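A minimal sketch of this example. It assumes the term1 marks from Sample Program 1 as the data, and the name lowest is illustrative.

```python
import pandas as pd

# Assumed data: term1 marks from Sample Program 1
term1 = pd.DataFrame(
    [[50, 70, 75], [40, 45, 42], [90, 92, 99], [68, 74, 90]],
    index=[11251, 11252, 11253, 11254],
    columns=["Phy", "Chem", "Maths"],
)

# axis=1 works across the columns, giving one minimum per student (row)
lowest = term1.min(axis=1)
print(lowest)
```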
2. max( ) function: The max( ) function finds out the maximum value from a given
DataFrame.
Syntax:
DataFrame.max( axis = None, skipna = None, numeric_only = None)
Parameters:
Parameter     Value          Description
axis          0 or 1         0 means index (one result per column) and 1 means
                             columns (one result per row). By default, the
                             maximum value is calculated along axis 0.
skipna        True or False  If True, it excludes NaN/null values when
                             computing the result.
numeric_only  True or False  If True, it includes only int, float and Boolean
                             columns for calculating max. If value is None, it
                             will attempt to use everything, then uses only
                             numeric data.
Example: Program to find out maximum marks subject wise.
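A minimal sketch of this example, again assuming the term1 marks from Sample Program 1. The name highest is illustrative.

```python
import pandas as pd

# Assumed data: term1 marks from Sample Program 1
term1 = pd.DataFrame(
    [[50, 70, 75], [40, 45, 42], [90, 92, 99], [68, 74, 90]],
    index=[11251, 11252, 11253, 11254],
    columns=["Phy", "Chem", "Maths"],
)

# axis=0 works down the rows, giving one maximum per subject (column)
highest = term1.max(axis=0)
print(highest)
```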
3. mode( ) function: The mode( ) function returns the mode value (i.e. the value that
appears the greatest number of times) from a set of values.
Syntax:
DataFrame.mode(axis = 0, numeric_only = False, dropna = True)
Parameters:
Parameter     Value          Description
axis          0 or 1         0 means index (mode of each column) and 1 means
                             columns (mode of each row). By default, mode is
                             calculated along axis 0.
dropna        True or False  If True, NaN/null values are not considered
                             when computing the result.
numeric_only  True or False  If True, it includes only int, float and Boolean
                             columns for calculating mode.
Example: Program to calculate mode subject wise.
Note: In the above program, the value 50 appears 3 times in the Phy marks, so it
is displayed. In the Chem marks both 69 and 74 appear twice, so both are
displayed. In the Maths marks every value appears only once, so all the values
are displayed. NaN is used to fill the remaining cells in the Phy and Chem columns.
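The behaviour described in the note can be reproduced with a small sketch. The marks below are hypothetical values chosen only to match the note (50 three times in Phy, 69 and 74 twice each in Chem, no repeats in Maths); they are not the data of the original program.

```python
import pandas as pd

# Hypothetical marks chosen to match the note above
marks = pd.DataFrame(
    {
        "Phy": [50, 40, 50, 68, 50],
        "Chem": [69, 74, 92, 74, 69],
        "Maths": [75, 42, 99, 90, 70],
    },
    index=[11251, 11252, 11253, 11254, 11255],
)

# One column of modes per subject; shorter columns are padded with NaN
modes = marks.mode()
print(modes)
```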
4. mean( ) function: The mean( ) function returns the computed mean (average) of a
set of values.
Syntax:
DataFrame.mean(axis = None, skipna = None, numeric_only = None)
Parameters:
Parameter     Value          Description
axis          0 or 1         0 means index (one result per column) and 1 means
                             columns (one result per row). By default, mean is
                             calculated along axis 0.
skipna        True or False  If True, it excludes NaN/null values when
                             computing the result.
numeric_only  True or False  If True, it includes only int, float and Boolean
                             columns for calculating mean. If value is None, it
                             will attempt to use everything, then uses only
                             numeric data.
Example: Program to calculate mean student wise.
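A minimal sketch of this example, assuming the term1 marks from Sample Program 1. The name averages is illustrative.

```python
import pandas as pd

# Assumed data: term1 marks from Sample Program 1
term1 = pd.DataFrame(
    [[50, 70, 75], [40, 45, 42], [90, 92, 99], [68, 74, 90]],
    index=[11251, 11252, 11253, 11254],
    columns=["Phy", "Chem", "Maths"],
)

# axis=1 averages across the subjects, giving one mean per student
averages = term1.mean(axis=1)
print(averages)
```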
5. median( ) function: This function returns the middle number from a set of numbers.
Syntax:
DataFrame.median(axis = None, skipna = None, numeric_only = None)
Parameters:
Parameter     Value          Description
axis          0 or 1         0 means index (one result per column) and 1 means
                             columns (one result per row). By default, median
                             is calculated along axis 0.
skipna        True or False  If True, it excludes NaN/null values when
                             computing the result.
numeric_only  True or False  If True, it includes only int, float and Boolean
                             columns for calculating median. If value is None,
                             it will attempt to use everything, then uses only
                             numeric data.
Example: Program to find median student wise.
Note: To find the median, the values are first arranged in order (ascending or
descending) and then the middle value is taken. If the number of values is even,
the median is calculated as the average of the two middle values.
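A minimal sketch of this example, assuming the term1 marks from Sample Program 1. The extra Series at the end is a made-up set of four values, added only to show the even-count case from the note.

```python
import pandas as pd

# Assumed data: term1 marks from Sample Program 1
term1 = pd.DataFrame(
    [[50, 70, 75], [40, 45, 42], [90, 92, 99], [68, 74, 90]],
    index=[11251, 11252, 11253, 11254],
    columns=["Phy", "Chem", "Maths"],
)

# Three marks per student, so the median is the middle mark
medians = term1.median(axis=1)
print(medians)

# Hypothetical even-sized set: the median is the average of 50 and 60
even_median = pd.Series([40, 50, 60, 70]).median()
print(even_median)
```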
6. count( ) function: This function counts the non-NaN values for each row or column.
Syntax:
DataFrame.count(axis = 0, numeric_only = False)
Parameters:
Parameter     Value          Description
axis          0 or 1         0 means index (one count per column) and 1 means
                             columns (one count per row). By default, count
                             function is performed along axis 0.
numeric_only  True or False  If True, it includes only int, float and Boolean
                             columns for counting.
Example: Program to count the number of students who appeared in the exam in each
subject.
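A minimal sketch of this example. The marks are hypothetical, with NaN standing for a student who did not appear in that subject; the names marks and appeared are illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical marks; NaN marks a student who was absent for that subject
marks = pd.DataFrame(
    {
        "Phy": [50, np.nan, 90, 68],
        "Chem": [70, 45, np.nan, np.nan],
        "Maths": [75, 42, 99, 90],
    },
    index=[11251, 11252, 11253, 11254],
)

# count( ) ignores NaN, so it gives the number of students present per subject
appeared = marks.count()
print(appeared)
```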
7. sum( ) function: This function returns the sum of the values for the requested axis.
Syntax:
DataFrame.sum(axis = None, skipna = None, numeric_only = None,
min_count = 0)
Parameters:
Parameter     Value          Description
axis          0 or 1         0 means index (one result per column) and 1 means
                             columns (one result per row). By default, sum is
                             calculated along axis 0.
skipna        True or False  If True, it excludes NaN/null values when
                             computing the result.
numeric_only  True or False  If True, it includes only int, float and Boolean
                             columns for calculating sum. If value is None, it
                             will attempt to use everything, then uses only
                             numeric data.
min_count     int value,     The required number of valid values to perform
              default is 0   the operation.
Example: Program to find sum of marks for each student.
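A minimal sketch of this example, assuming the term1 marks from Sample Program 1. The name totals is illustrative.

```python
import pandas as pd

# Assumed data: term1 marks from Sample Program 1
term1 = pd.DataFrame(
    [[50, 70, 75], [40, 45, 42], [90, 92, 99], [68, 74, 90]],
    index=[11251, 11252, 11253, 11254],
    columns=["Phy", "Chem", "Maths"],
)

# axis=1 adds across the subjects, giving one total per student
totals = term1.sum(axis=1)
print(totals)
```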
OTHER FUNCTIONS:
8. info( ) function: The info( ) function is used to get basic information about the
DataFrame. It reports the type of the object, the index values, the number of rows,
the data columns, the number of non-null values in each column, the data type of each
column and the memory usage.
Syntax:
DataFrame.info( )
Example: Program to display information about DataFrame term1.
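A minimal sketch of this example, assuming term1 holds the marks from Sample Program 1.

```python
import pandas as pd

# Assumed data: term1 marks from Sample Program 1
term1 = pd.DataFrame(
    [[50, 70, 75], [40, 45, 42], [90, 92, 99], [68, 74, 90]],
    index=[11251, 11252, 11253, 11254],
    columns=["Phy", "Chem", "Maths"],
)

# info( ) prints its summary directly; it does not return a value
term1.info()
```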
9. head( ) function: The head( ) function is used to fetch the specified number of rows
from the top of the DataFrame. The syntax is as follows:
Syntax:
DataFrame_Object.head( [n] )
Here n is the number of rows to be displayed from the top. If n is not specified,
it displays the top 5 rows of the DataFrame.
Example: Program to display top 5 rows of the DataFrame.
10. tail( ) function: The tail( ) function is used to fetch the specified number of rows
from the bottom of the DataFrame. The syntax is as follows:
Syntax:
DataFrame_Object.tail( [n] )
Here n is the number of rows to be displayed from the bottom. If n is not specified,
it displays the bottom 5 rows of the DataFrame.
Example: Program to display bottom 3 rows of the DataFrame.
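A minimal sketch covering both head( ) and tail( ). The seven-row DataFrame marks is hypothetical data made up for this illustration.

```python
import pandas as pd

# Hypothetical marks of 7 students
marks = pd.DataFrame(
    {
        "Phy": [50, 40, 90, 68, 50, 77, 81],
        "Chem": [70, 45, 92, 74, 60, 66, 58],
    },
    index=[11251, 11252, 11253, 11254, 11255, 11256, 11257],
)

top_rows = marks.head()       # n not given, so the top 5 rows are returned
bottom_rows = marks.tail(3)   # the bottom 3 rows

print(top_rows)
print(bottom_rows)
```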
APPLYING FUNCTIONS ON A SUBSET OF DATAFRAME:
Sometimes you need to apply a function to a selected column, a row or a
subset of the DataFrame. To do this, first select the single row, single
column or subset and then call the function on the selection.
i. Applying Functions on a Single Column of a DataFrame:
Syntax:
DataFrame[Column_name].functionname
Example: Program to display minimum marks in Physics.
df1["Phy"].min( )
ii. Applying Functions on Multiple Columns of a DataFrame:
Syntax:
DataFrame[[col1, col2, col3…]].functionname
Example: Program to count number of values in Physics and Maths columns.
df1[["Phy", "Maths"]].count( )
iii. Applying Functions on a Single Row of a DataFrame:
Syntax:
DataFrame.loc[ row_index , : ].functionname
Example: Program to find sum of marks for Roll number 11254.
df1.loc[ 11254 , : ].sum( )
iv. Applying Functions on Multiple Rows of a DataFrame:
Syntax:
DataFrame.loc[ Starting_row_index : Ending_row_index , : ].functionname
Example: Program to find max marks for Roll number 11253 to 11256.
df1.loc[ 11253 : 11256 , : ].max( )
v. Applying Functions on a Subset of a DataFrame:
Syntax:
DataFrame.loc[ Starting_row_index : Ending_row_index ,
Starting_col_index : Ending_col_index ].functionname
Example: Program to find max marks from Roll number 11252 to 11257 in
the subjects of Chem and Maths.
df1.loc[ 11252 : 11257 , "Chem" : "Maths" ].max( )
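The five cases above can be tried together in one sketch. The DataFrame df1 below is hypothetical data for roll numbers 11251 to 11257, made up for this illustration; the result names are also illustrative.

```python
import pandas as pd

# Hypothetical marks of 7 students in 3 subjects
df1 = pd.DataFrame(
    {
        "Phy": [50, 40, 90, 68, 50, 77, 81],
        "Chem": [70, 45, 92, 74, 60, 66, 58],
        "Maths": [75, 42, 99, 90, 70, 85, 64],
    },
    index=[11251, 11252, 11253, 11254, 11255, 11256, 11257],
)

phy_min = df1["Phy"].min()                               # i. single column
two_counts = df1[["Phy", "Maths"]].count()               # ii. multiple columns
row_sum = df1.loc[11254, :].sum()                        # iii. single row
rows_max = df1.loc[11253:11256, :].max()                 # iv. multiple rows
subset_max = df1.loc[11252:11257, "Chem":"Maths"].max()  # v. subset

print(phy_min, row_sum)
print(two_counts)
print(rows_max)
print(subset_max)
```

With loc, both the starting and the ending labels of a slice are included in the selection.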
---- x ----