PANDAS & DATAFRAME REVISION NOTES
Class XII
DataFrame()
* 2-D: a collection of rows and columns (a Series, by contrast, is 1-D)
* Size mutable: after creation, new data, rows and columns can be added
* Value mutable: stored values can be changed, e.g. Df.iloc[2,2]=5 or Df.iloc[2,2]+=4 is allowed
* A DataFrame is displayed in tabular form with index, columns and values.
Q1. Can we add more values to the same DF?
Ans: Yes, it is size mutable.
Q2. Can we change values?
Ans: Yes, it is value mutable.
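The two kinds of mutability can be seen in a short sketch. The DataFrame `df` and its labels here are made up for illustration and are not part of the notes.

```python
import pandas as pd

# Hypothetical 3x3 DataFrame used only for this illustration
df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]],
                  index=['a', 'b', 'c'], columns=['X', 'Y', 'Z'])

# Size mutable: a new column and a new row can be added after creation
df['W'] = [10, 11, 12]
df.loc['d'] = [0, 0, 0, 0]

# Value mutable: a single cell can be overwritten or updated in place
df.iloc[2, 2] = 5      # row position 2, column position 2
df.iloc[2, 2] += 4     # the same cell is increased by 4

print(df.shape)
print(df)
```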
Creation
Empty:
import pandas as pd
S=pd.DataFrame()
Dictionary with list:
student=pd.DataFrame({'rno':[1,2,3],'name':['neha','karan','priya'],'marks':[43,47,39]},index=['a','b','c'])
Dictionary with dict:
student=pd.DataFrame({'rno':{'a':1,'b':2,'c':3},'name':{'a':'neha','b':'karan','c':'priya'}})
print(student)
Dictionary with Series:
student=pd.DataFrame({'rno':pd.Series([1,2,3]),'name':pd.Series(['neha','karan','priya'])})
print(student)
List:
student=pd.DataFrame([[1,'neha',43],[2,'karan',47],[3,'priya',39]],columns=['rno','name','marks'],index=['a','b','c'])
print(student)
Single value:
student=pd.DataFrame(5,columns=['rno','name','marks'],index=['a','b','c'])
print(student)
Numpy array:
import numpy as np
a=np.array([[1,'neha',43],[2,'karan',47],[3,'priya',39]])   # mixed values: NumPy stores every element as a string
student=pd.DataFrame(a,columns=['rno','name','marks'],index=['a','b','c'])
print(student)
Using formula:
s=pd.Series([1,2,3])
s1=pd.Series(['neha','karan','priya'])
student=pd.DataFrame({'rno':s+2,'name':s1+s1})
print(student)
* A formula is not possible with a plain list: lists do not support vector operations (list+2 is an error, and list+list only joins the two lists)
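The difference between a Series and a list in a formula can be checked directly. This is a minimal sketch; the variable `rno` is a name chosen only for the illustration.

```python
import pandas as pd

rno = [1, 2, 3]            # plain Python list
s = pd.Series(rno)         # the same data as a Series

print((s + 2).tolist())    # Series: 2 is added to every element
print(rno + rno)           # list: + only joins the two lists end to end

try:
    rno + 2                # list + number is not defined
except TypeError as err:
    print('TypeError:', err)
```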
Operators with DataFrame: Arithmetic (+, -, *, /, %, //, **), Relational (>, <, >=, <=, ==, !=), Logical (&, |, ~ ; Python's and, or, not do not work element-wise on a DataFrame)
Vector operation
* A DataFrame can be combined with a single value using operators; the single value is applied to every DataFrame value (this is a vector operation), e.g.
df*2 , df>3, df%2==0
df=pd.DataFrame([[4589,45000,44.89],[3500,56000,27.988],[4500,57000,1.865]],index=['delhi','bombay','chennai'],columns=['pop','avg','per'])
print(df)
print(df>3)
print(df['avg']>5000)
print(df[['avg','pop']]>5000)
* compare every DataFrame value; values that fail the test are shown as NaN
print(df[df>3])
* keep the rows whose avg value is above 50000
print(df[df['avg']>50000])
* the same row filter written with loc[]
print(df.loc[df['avg']>50000])
* filter on the avg column & display the avg column only
print(df.loc[(df['avg']>50000),'avg'])
* filter on the avg column & display the avg & per columns only
print(df.loc[(df['avg']>50000),['avg','per']])
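Conditions on several columns can be combined with the logical operators `&`, `|` and `~`. The sketch below reuses the `df` from the notes; the threshold values are chosen only for illustration.

```python
import pandas as pd

df = pd.DataFrame([[4589, 45000, 44.89], [3500, 56000, 27.988], [4500, 57000, 1.865]],
                  index=['delhi', 'bombay', 'chennai'], columns=['pop', 'avg', 'per'])

# AND: both conditions must hold; each condition needs its own parentheses
both = df[(df['avg'] > 50000) & (df['per'] > 10)]
# OR: at least one condition must hold
either = df[(df['pop'] > 4550) | (df['per'] < 2)]
# NOT: invert a condition
not_high = df[~(df['avg'] > 50000)]

print(both)
print(either)
print(not_high)
```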
Binary operation
* A DataFrame can be combined with another DataFrame using operators (values with matching index and column labels are combined according to the operator; unmatched labels give NaN).
import pandas as pd
df1=pd.DataFrame([[4,4,9],[3,2.9,3],[4,5,1]],index=['d','b','c'],columns=['X','Y','Z'])
df2=pd.DataFrame([[4,4,9],[3,2.9,5],[4,5,1]],index=['d','b','c'],columns=['X','Y','Z'])
print(df1)
print(df2)
print(df1+df2)
print(df1>df2)
* For an unmatched index or column label the result is NaN (df1 and df2 above share all labels, so no NaN appears here)
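To actually see NaN appear, the two DataFrames need at least one label that does not match. A minimal sketch with made-up values, where row 'c' exists only in `df1` and row 'e' only in `df2`:

```python
import pandas as pd

# Hypothetical data: the row labels differ in one place ('c' versus 'e')
df1 = pd.DataFrame([[4, 4], [3, 2], [4, 5]], index=['d', 'b', 'c'], columns=['X', 'Y'])
df2 = pd.DataFrame([[1, 1], [2, 2], [3, 3]], index=['d', 'b', 'e'], columns=['X', 'Y'])

total = df1 + df2   # rows 'd' and 'b' are added; 'c' and 'e' have no partner
print(total)
```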
Attributes: a DataFrame has some properties called attributes.
* They are used with the DataFrame name and the dot operator
* Brackets are not used
import pandas as pd
df=pd.DataFrame([[4,4,9],[3,2.9,3],[4,5,1]],index=['d','b','c'],columns=['X','Y','Z'])
print(df.index)
print(df.columns)
print(df.dtypes)
print(df.values)
print(df.shape)
print(df.size)
print(df.T)
print(df.axes)
print(df.empty)
print(df.ndim)
* The index and columns attributes can also be assigned to, which changes the row and column labels
print(df)
df.index=['m','n','k']
df.columns=['XX','YY','ZZ']
print(df)
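What a few of these attributes report can be summarised in a short sketch using the same `df` as in the notes; the comments describe the meaning of each attribute.

```python
import pandas as pd

df = pd.DataFrame([[4, 4, 9], [3, 2.9, 3], [4, 5, 1]],
                  index=['d', 'b', 'c'], columns=['X', 'Y', 'Z'])

print(df.shape)   # tuple of (number of rows, number of columns)
print(df.size)    # total number of values = rows * columns
print(df.ndim)    # number of axes; always 2 for a DataFrame
print(df.empty)   # True only when the DataFrame holds no values
print(df.T)       # transpose: rows become columns and columns become rows
```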
Slicing: display a particular part using [start:stop:step], iloc[], loc[]
[start:stop:step]
* for a positive step the last position taken is stop-1
* for a negative step the last position taken is stop+1
* loc[] slices by label and includes the stop label
(df below has index d,b,c and columns X,Y,Z)
Slicing for selection of rows
print(df)
* simple slicing df[start:stop:step] applies only to row selection
print(df[0:2:1])
print(df.iloc[0:2:1])
print(df.loc['d':'b'])
Slicing for selection of columns
print(df)
print(df.iloc[:,0:2:1])
print(df.loc[:,'X':'Y'])
Slicing for selection of rows & columns
print(df)
print(df.iloc[0:2:1,0:2:1])
print(df.loc['b':'c','X':'Y'])
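The handling of the stop value is the main difference between positional and label slicing. A minimal sketch on the same `df`; the variable names are chosen for the illustration.

```python
import pandas as pd

df = pd.DataFrame([[4, 4, 9], [3, 2.9, 3], [4, 5, 1]],
                  index=['d', 'b', 'c'], columns=['X', 'Y', 'Z'])

by_position = df.iloc[0:2]      # positions 0 and 1; stop position 2 is excluded
by_label = df.loc['d':'b']      # labels 'd' through 'b'; the stop label is included
reversed_rows = df.iloc[::-1]   # a negative step walks through the rows backwards

print(by_position)
print(by_label)
print(reversed_rows)
```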
Functions/Methods
* Methods are called with the '.' (dot) operator
* Brackets () are used
head()
* head() displays the first five rows (default); head(n) displays the first n rows, e.g. head(2) displays the first two.
import numpy as np
a=np.array([[1,'neha',43],[2,'karan',47],[3,'priya',39]])
student=pd.DataFrame(a,columns=['rno','name','marks'],index=['a','b','c'])
print(student)
print(student.head())
print(student.head(2))
print(student['name'].head(1))
print(student[0:2:1].head(1))
print(student.loc['a':'b','rno':'marks'].head(1))
tail()
* tail() displays the last five rows (default); tail(2) displays the last 2 rows.
print(student.tail())
print(student.tail(2))
count()
* displays the number of values in each column, excluding NaN values
import numpy as np
a=np.array([[1,'neha',43],[2,'karan',47],[3,'priya',39]])
student=pd.DataFrame(a,columns=['rno','name','marks'],index=['a','b','c'])
print(student)
print(student.count())
print(student['rno'].count())
print(student[['rno','marks']].count())
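The sample data above has no missing values, so every count is 3. The sketch below uses a small made-up DataFrame `marks` with one NaN to show the exclusion.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing mark
marks = pd.DataFrame({'rno': [1, 2, 3], 'marks': [43, np.nan, 39]},
                     index=['a', 'b', 'c'])

print(marks.count())            # non-NaN values in each column
print(marks.count(axis=1))      # non-NaN values in each row
print(marks['marks'].count())   # non-NaN values in one column
```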
max(), min(), sum()
print(student.max())
print(student.min())
print(student.sum())
Insert new row in a DataFrame
student.loc['d']=[4,'shipra',48]
print(student)
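Insert new col in a DataFrame
student['grace']=student['marks'].astype(int)+2   # marks came from a NumPy string array, so convert to int first
print(student)
Insert new col at any position using insert() function in a DataFrame
student.insert(1,'class',['ix','x','ix','x'])   # one value per row (4 rows after adding 'd')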
print(student)
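append() / concat()
* combine two DataFrames (rows of the second are added below the first)
* DataFrame.append() was removed in pandas 2.0; pd.concat() gives the same result
import pandas as pd
df=pd.DataFrame([[1,2,3,4],[10,20,34,42]],index=['x','y'])
df1=pd.DataFrame([[5,6,7,3],[11,21,31,14]],index=['x','y'])
df3=pd.concat([df,df1])   # older pandas: df3=df.append(df1)
print(df3)
df3=pd.concat([df,df1],ignore_index=True)   # older pandas: df3=df.append(df1,ignore_index=True)
print(df3)
* ignore_index=True ignores the existing index labels and provides the default index 0,1,2,...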
sort_values()
* Used to arrange rows in ascending/descending order according to the values of a column
print(student)
student.sort_values('marks',inplace=True)   # ascending order (default)
print(student)
student.sort_values('marks',ascending=False,inplace=True)   # descending order
print(student)
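Without inplace=True, sort_values() leaves the original DataFrame untouched and returns a sorted copy. A minimal sketch on the sample student data; the name `ranked` is chosen for the illustration.

```python
import pandas as pd

student = pd.DataFrame([[1, 'neha', 43], [2, 'karan', 47], [3, 'priya', 39]],
                       columns=['rno', 'name', 'marks'], index=['a', 'b', 'c'])

ranked = student.sort_values('marks', ascending=False)   # sorted copy, highest marks first

print(ranked)
print(student)   # the original row order is unchanged
```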
Three ways to remove/delete data: drop(), pop(), del
* to remove a column (use any one of these; the same column can be removed only once)
student.pop('marks')
print(student)
del student['marks']
print(student)
student.drop('marks',axis=1,inplace=True)
print(student)
* to remove a row
student.drop('a',axis=0,inplace=True)
print(student)
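The three ways differ in what they return and whether they change the original. The sketch below rebuilds the sample data for each method through a helper `make_student()`, which is a hypothetical function written only for this illustration.

```python
import pandas as pd

def make_student():
    """Build a fresh copy of the sample data for each removal method."""
    return pd.DataFrame([[1, 'neha', 43], [2, 'karan', 47], [3, 'priya', 39]],
                        columns=['rno', 'name', 'marks'], index=['a', 'b', 'c'])

s1 = make_student()
popped = s1.pop('marks')         # removes the column and returns it as a Series

s2 = make_student()
del s2['marks']                  # removes the column and returns nothing

s3 = make_student()
s4 = s3.drop('marks', axis=1)    # without inplace=True a new DataFrame is returned; s3 is unchanged

s5 = make_student().drop('a', axis=0)   # axis=0 removes a row instead of a column

print(popped)
print(s4)
print(s5)
```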
rename(), reindex(): functions to make changes to the indices
rename()
import numpy as np
df=pd.DataFrame(data=[[101,'Priya',30000,np.nan],[102,'Shipra',45000,np.nan],[103,'Karan',40000,0]],
columns=['Id','Name','Sal','Bonus'],index=['x','y','z'])
* It changes the name of a column label or a row index label in a DataFrame.
* axis 0 is for rows and axis 1 is for columns
* It takes a mapping in dictionary form {oldname:newname} together with an axis (0 or 1), or the keyword index= / columns=
For columns:
Df=Df.rename({oldname:newname},axis=1)
Df=Df.rename(columns={oldname:newname})
Df.rename({oldname:newname},axis=1,inplace=True)
For rows:
Df=Df.rename({oldname:newname},axis=0)
Df=Df.rename(index={oldname:newname})
Df.rename({oldname:newname},axis=0,inplace=True)
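Applied to the employee data from this section, the general forms above might look as follows. The new labels 'Salary' and 'first' are made up for the illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(data=[[101, 'Priya', 30000, np.nan],
                        [102, 'Shipra', 45000, np.nan],
                        [103, 'Karan', 40000, 0]],
                  columns=['Id', 'Name', 'Sal', 'Bonus'], index=['x', 'y', 'z'])

# Rename one column and one row label in a single call; df itself is not changed
renamed = df.rename(columns={'Sal': 'Salary'}, index={'x': 'first'})

print(renamed)
```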
reindex()
* Changes the order of existing rows/columns
* Creates new row/column labels (their values are filled with NaN)
* Deletes row/column labels that are left out of the new list
* reindex() has no inplace argument, so the result must be assigned back
For rows
df=df.reindex(index=['y','z','x'])
df=df.reindex(['x','z','y'],axis=0)
df=df.reindex(index=['y','z'])
For columns
df=df.reindex(columns=['Name','Sal','Bonus','Id'])
df=df.reindex(['Name','Sal','Bonus','Id'],axis=1)
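The three uses of reindex() (reorder, create, delete) can be sketched on the same employee data. The new row label 'w' and the result names are assumptions made for the illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(data=[[101, 'Priya', 30000, np.nan],
                        [102, 'Shipra', 45000, np.nan],
                        [103, 'Karan', 40000, 0]],
                  columns=['Id', 'Name', 'Sal', 'Bonus'], index=['x', 'y', 'z'])

reordered = df.reindex(index=['z', 'x', 'y'])        # same rows in a new order
extended = df.reindex(index=['x', 'y', 'z', 'w'])    # 'w' is new, so its row is all NaN
trimmed = df.reindex(columns=['Id', 'Name'])         # columns left out are dropped

print(reordered)
print(extended)
print(trimmed)
```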
Boolean Indexing
* The index labels themselves are True/False, and rows are selected by these labels
import pandas as pd
student=pd.DataFrame([[1,'neha',43],[2,'karan',47],[3,'priya',39]],columns=['rno','name','marks'],index=[True,False,True])
print(student.loc[True])
Variations of loc[]
student=pd.DataFrame([[1,'neha',43],[2,'karan',47],[3,'priya',39]],columns=['rno','name','marks'],index=['a','b','c'])
# when loc contains single index label
print(student.loc['a'])
# when loc contains column label
print(student.loc[:,'name'])
print(student.iloc[0])
print(student.iloc[:,1])
# loc is also used for conditional display
print(student.loc[student['marks']>45,'rno'])
# loc is used to add a new row
student.loc['d']=0
# for label slicing
print(student.loc['a':'c'])
Key Points
• DataFrame() is the constructor function for a DataFrame in the pandas library.
• The letters D and F are always capital in DataFrame.
• DataFrame functions are called with the dot operator.
• Axis 0 for rows and axis 1 for columns.
Q1. Name any three attributes of DF.
Ans: size,shape,index
Q2. Name any two functions.
Ans: head(),tail(),count()
Q3. What is the difference between attributes and functions?
Ans: Attributes are used without brackets; functions are called with brackets.
Attributes show the properties of a DataFrame, whereas functions operate on the DataFrame's data.