Revision Notes DataFrame XII IP

Uploaded by

pearlkumbhat7

PANDAS & DATAFRAME REVISION NOTES

Class XII
DataFrame()
* 2-D: a collection of rows and columns.
* Size mutable: new data, rows and columns can be added after creation.
* Value mutable: stored values can be changed in place, e.g. Df[0:2]=5 or Df[0:2]+=4.
* A DataFrame is displayed in tabular form with its index, column labels and values.
Q1. Can we add more values in same DF?
Ans: Yes, size mutable.
Q2. Can we change values?
Ans: Yes, value mutable.
Creation
import pandas as pd

Empty
S=pd.DataFrame()

Dictionary with list
student=pd.DataFrame({'rno':[1,2,3],'name':['neha','karan','priya'],'marks':[43,47,39]},index=['a','b','c'])

Dictionary with dict
student=pd.DataFrame({'rno':{'a':1,'b':2,'c':3},'name':{'a':'neha','b':'karan','c':'priya'}})
print(student)

Dictionary with Series
student=pd.DataFrame({'rno':pd.Series([1,2,3]),'name':pd.Series(['neha','karan','priya'])})
print(student)

List
student=pd.DataFrame([[1,'neha',43],[2,'karan',47],[3,'priya',39]],columns=['rno','name','marks'],index=['a','b','c'])
print(student)

Single value
student=pd.DataFrame(5,columns=['rno','name','marks'],index=['a','b','c'])
print(student)

Numpy array
import numpy as np
a=np.array([[1,'neha',43],[2,'karan',47],[3,'priya',39]])
student=pd.DataFrame(a,columns=['rno','name','marks'],index=['a','b','c'])
print(student)
* a NumPy array built from mixed numbers and text stores every value as a string, so 'rno' and 'marks' hold text here

Using formula
s=pd.Series([1,2,3])
s1=pd.Series(['neha','karan','priya'])
student=pd.DataFrame({'rno':s+2,'name':s1+s1})
print(student)

* a formula is not possible with a plain list: a list does not support vector operations
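The difference between a Series and a plain list in a formula can be checked with a short sketch. The variable names below (rno, name, list_supports_formula) are illustrative and not part of the notes.

```python
import pandas as pd

# Two Series with the default index 0, 1, 2
rno = pd.Series([1, 2, 3])
name = pd.Series(['neha', 'karan', 'priya'])

# A Series supports vector operations, so the formula is applied to every element
student = pd.DataFrame({'rno': rno + 2, 'name': name + name})
print(student)

# A plain list does not: list + number raises TypeError
try:
    [1, 2, 3] + 2
    list_supports_formula = True
except TypeError:
    list_supports_formula = False
print(list_supports_formula)
```

To use a formula on list data, convert the list to a Series first with pd.Series(list).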
Operators with DataFrame
Arithmetic: +, -, *, /, %, //, **
Relational: >, <, >=, <=, ==
Logical: &, |, ~ (element-wise and, or, not; the Python keywords and/or/not do not work on a whole DataFrame)

Vector operation
* A DataFrame can be combined with a single value using an operator. The single value is applied to every value of the DataFrame; this is a vector operation.
df=pd.DataFrame([[4589,45000,44.89],[3500,56000,27.988],[4500,57000,1.865]],index=['delhi','bombay','chennai'],columns=['pop','avg','per'])
Examples: df*2 , df>3, df%2==0

print(df)

print(df>3)

print(df['avg']>5000)

print(df[['avg','pop']]>5000)

* keep the values greater than 3 from the whole DataFrame (other values are shown as NaN)
print(df[df>3])

* select the rows where avg is greater than 50000
print(df[df['avg']>50000])

* the same selection using loc
print(df.loc[df['avg']>50000])

* select the rows where avg is greater than 50000 and display the avg column only
print(df.loc[(df['avg']>50000),'avg'])

* select the rows where avg is greater than 50000 and display the avg and per columns only
print(df.loc[(df['avg']>50000),['avg','per']])
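Conditions can also be combined. The sketch below reuses the city data from these notes; the result names (doubled, both, either, not_high) are illustrative only. Each condition needs its own brackets, and &, | and ~ are used in place of and, or and not.

```python
import pandas as pd

df = pd.DataFrame([[4589, 45000, 44.89], [3500, 56000, 27.988], [4500, 57000, 1.865]],
                  index=['delhi', 'bombay', 'chennai'],
                  columns=['pop', 'avg', 'per'])

# Vector operation: every value is multiplied by 2
doubled = df * 2

# Both conditions must be True for a row to be selected
both = df[(df['avg'] > 50000) & (df['pop'] > 4000)]

# At least one condition must be True
either = df[(df['per'] > 40) | (df['pop'] < 4000)]

# ~ reverses a condition: rows where avg is NOT greater than 50000
not_high = df[~(df['avg'] > 50000)]

print(both)
print(either)
print(not_high)
```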

Binary operation
* A DataFrame can be combined with another DataFrame using an operator. Values with matching row and column labels are combined according to the operator; labels present in only one DataFrame give NaN.
import pandas as pd
df1=pd.DataFrame([[4,4,9],[3,2.9,3],[4,5,1]],index=['d','b','c'],columns=['X','Y','Z'])
df2=pd.DataFrame([[4,4,9],[3,2.9,5],[4,5,1]],index=['d','b','c'],columns=['X','Y','Z'])

print(df1)
print(df2)
print(df1+df2)
print(df1>df2)

* df1 and df2 above have the same labels, so no NaN appears; an arithmetic operation gives NaN for an unmatched index
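The NaN behaviour for unmatched labels can be seen with two small DataFrames whose indexes only partly overlap. This is a sketch with made-up data; the add() call with fill_value is an extra option not covered in the notes.

```python
import pandas as pd

df1 = pd.DataFrame({'X': [1, 2], 'Y': [3, 4]}, index=['a', 'b'])
df2 = pd.DataFrame({'X': [10, 20], 'Y': [30, 40]}, index=['b', 'c'])

# Only row 'b' exists in both, so rows 'a' and 'c' become NaN
total = df1 + df2
print(total)

# add() with fill_value=0 treats a missing row as 0 instead of giving NaN
filled = df1.add(df2, fill_value=0)
print(filled)
```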

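Attributes
* A DataFrame has some properties called attributes.
* An attribute is used with the DataFrame name and the dot operator.
* Brackets are not used.
import pandas as pd
df=pd.DataFrame([[4,4,9],[3,2.9,3],[4,5,1]],index=['d','b','c'],columns=['X','Y','Z'])

print(df.index)      # row labels
print(df.columns)    # column labels
print(df.dtypes)     # data type of each column
print(df.values)     # data as a 2-D NumPy array
print(df.shape)      # (number of rows, number of columns)
print(df.size)       # total number of values
print(df.T)          # transpose: rows and columns interchanged
print(df.axes)       # list holding the row index and the column index
print(df.empty)      # True if the DataFrame has no values
print(df.ndim)       # number of dimensions (2 for a DataFrame)

* The index and columns attributes are also used to assign or change the labels.
print(df)
df.index=['m','n','k']
df.columns=['XX','YY','ZZ']
print(df)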
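The values returned by the main attributes can be checked on the same 3 x 3 DataFrame. A small sketch; the name transposed is illustrative.

```python
import pandas as pd

df = pd.DataFrame([[4, 4, 9], [3, 2.9, 3], [4, 5, 1]],
                  index=['d', 'b', 'c'], columns=['X', 'Y', 'Z'])

print(df.shape)   # (rows, columns) as a tuple
print(df.size)    # rows * columns
print(df.ndim)    # a DataFrame always has 2 dimensions
print(df.empty)   # False here because df holds data

# T interchanges rows and columns: old column labels become the row index
transposed = df.T
print(transposed)
```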

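Slicing
* Used to display a particular part of a DataFrame using [start:stop:step], iloc[] and loc[].
* Forms: [start:stop:step] for rows only; [start:stop:step, start:stop:step] for rows and columns with iloc[] and loc[].
* For a positive step the slice ends at stop-1; for a negative step it ends at stop+1 (the stop position itself is excluded).
* With loc[] the slice uses labels and the stop label is included.

df=pd.DataFrame([[4,4,9],[3,2.9,3],[4,5,1]],index=['d','b','c'],columns=['X','Y','Z'])
print(df)

Selection of rows
* simple slicing df[start:stop:step] applies only to row selection
print(df[0:2:1])
print(df.iloc[0:2:1])
print(df.loc['d':'b'])

Selection of columns
print(df.iloc[:,0:2:1])
print(df.loc[:,'X':'Y'])

Selection of rows and columns
print(df.iloc[0:2:1,0:2:1])
print(df.loc['b':'c','X':'Y'])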
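The stop rules for iloc[] (position, stop excluded) and loc[] (label, stop included) can be compared side by side. A minimal sketch on the same DataFrame; the result names are illustrative.

```python
import pandas as pd

df = pd.DataFrame([[4, 4, 9], [3, 2.9, 3], [4, 5, 1]],
                  index=['d', 'b', 'c'], columns=['X', 'Y', 'Z'])

# iloc: positions 0 and 1 only, position 2 (the stop) is excluded
by_position = df.iloc[0:2, 0:2]

# loc: labels 'd' to 'b' and 'X' to 'Y', the stop labels are included
by_label = df.loc['d':'b', 'X':'Y']

# Negative step: starts at position 2 and ends at stop+1, i.e. position 1
backwards = df.iloc[2:0:-1]

print(by_position)
print(by_label)
print(backwards)
```

Both by_position and by_label select the same rows and columns here, even though the stop values look different.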

Functions/Methods
* Methods are called with the '.' (dot) operator.
* Brackets are used.

head()
* head() displays the first five rows (default); head(n) displays the first n rows.
* head(2) displays the first two rows.
import numpy as np
a=np.array([[1,'neha',43],[2,'karan',47],[3,'priya',39]])
student=pd.DataFrame(a,columns=['rno','name','marks'],index=['a','b','c'])

print(student)

print(student.head())

print(student.head(2))

print(student['name'].head(1))

print(student[0:2:1].head(1))

print(student.loc['a':'b','rno':'marks'].head(1))

tail()
* tail() displays the last five rows (default).
* tail(2) displays the last two rows.
print(student.tail())

print(student.tail(2))
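The default of five rows is easier to see on a DataFrame with more than five rows. A small sketch with made-up data (a single column n holding 1 to 8).

```python
import pandas as pd

df = pd.DataFrame({'n': range(1, 9)})   # eight rows, values 1 to 8

first_five = df.head()    # no argument: first five rows
first_two = df.head(2)    # first two rows
last_two = df.tail(2)     # last two rows

print(first_five)
print(last_two)
```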
count()
* displays the number of values in each column, excluding NaN values
import numpy as np
a=np.array([[1,'neha',43],[2,'karan',47],[3,'priya',39]],dtype=object)   # dtype=object keeps the numbers as numbers instead of text
student=pd.DataFrame(a,columns=['rno','name','marks'],index=['a','b','c'])

print(student)

print(student.count())

print(student['rno'].count())

print(student[['rno','marks']].count())

max(), min(), sum()
print(student.max())

print(student.min())

print(student.sum())
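The effect of NaN on count() only shows when the data has a missing value. The sketch below uses made-up marks with one NaN; axis=1 is an extra option that counts along each row.

```python
import numpy as np
import pandas as pd

marks = pd.DataFrame({'maths': [43, np.nan, 39], 'science': [40, 45, 50]},
                     index=['a', 'b', 'c'])

per_column = marks.count()        # NaN is not counted
per_row = marks.count(axis=1)     # count along each row instead
totals = marks.sum()              # NaN is skipped while adding
highest = marks.max()

print(per_column)
print(per_row)
print(totals)
```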

Insert new row in a DataFrame
student.loc['d']=[4,'shipra',48]
print(student)

Insert new column in a DataFrame
student['grace']=student['marks']+2   # 'marks' must hold numbers, not text
print(student)

Insert new column at any position using the insert() function
student.insert(1,'class',['ix','x','ix','x'])   # one value per row; student now has four rows
print(student)
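The three insert steps can be run together on a fresh DataFrame. A minimal sketch, assuming the student data is created from a dictionary of lists so that marks holds numbers.

```python
import pandas as pd

student = pd.DataFrame({'rno': [1, 2, 3],
                        'name': ['neha', 'karan', 'priya'],
                        'marks': [43, 47, 39]},
                       index=['a', 'b', 'c'])

# New row: the label 'd' does not exist yet, so loc adds it
student.loc['d'] = [4, 'shipra', 48]

# New column at the end, calculated from an existing column
student['grace'] = student['marks'] + 2

# New column at position 1 (second column); needs one value per row
student.insert(1, 'class', ['ix', 'x', 'ix', 'x'])

print(student)
```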

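append()
* combines two DataFrames by adding the rows of the second below the rows of the first
import pandas as pd
df=pd.DataFrame([[1,2,3,4],[10,20,34,42]],index=['x','y'])
df1=pd.DataFrame([[5,6,7,3],[11,21,31,14]],index=['x','y'])
df3=df.append(df1)
print(df3)

df3=df.append(df1,ignore_index=True)
* ignore_index=True ignores the index of df and df1 and provides the default index 0,1,2,...

* append() was removed in pandas 2.0; in newer versions the same result is given by concat():
df3=pd.concat([df,df1])
df3=pd.concat([df,df1],ignore_index=True)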
sort_values()
* Used to arrange data in ascending/descending order according to the values of a column.
print(student)

student.sort_values('marks',inplace=True)   # ascending order (default)

print(student)

student.sort_values('marks',ascending=False,inplace=True)   # descending order
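sort_values() can also sort on more than one column. The sketch below uses made-up data; passing a list of columns and a list of ascending flags is an extra option beyond the single-column form in the notes.

```python
import pandas as pd

student = pd.DataFrame({'name': ['neha', 'karan', 'priya', 'shipra'],
                        'class': ['x', 'ix', 'x', 'ix'],
                        'marks': [50, 47, 39, 48]})

# Without inplace=True a new sorted DataFrame is returned; student stays as it is
by_marks = student.sort_values('marks', ascending=False)

# Sort by class (ascending), then by marks (descending) inside each class
by_class_then_marks = student.sort_values(['class', 'marks'], ascending=[True, False])

print(by_marks)
print(by_class_then_marks)
```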

Three ways to remove/delete data: drop(), pop(), del

* to remove a column (the three statements below are alternatives; use any one, because the column no longer exists after the first)
student.pop('marks')
print(student)

del student['marks']
print(student)

student.drop('marks',axis=1,inplace=True)
print(student)

* to remove a row

student.drop('a',axis=0,inplace=True)
print(student)
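The three ways differ in what they return. A minimal sketch with made-up data, using a different column for each way so that all three can run in one program.

```python
import pandas as pd

student = pd.DataFrame({'rno': [1, 2, 3],
                        'name': ['neha', 'karan', 'priya'],
                        'marks': [43, 47, 39],
                        'grace': [2, 2, 2]},
                       index=['a', 'b', 'c'])

# pop() removes the column from student and returns it as a Series
removed = student.pop('marks')

# del removes the column and returns nothing
del student['grace']

# drop() without inplace=True returns a new DataFrame; student is unchanged
without_name = student.drop('name', axis=1)
without_a = student.drop('a', axis=0)

print(removed)
print(student)
print(without_a)
```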

rename(), reindex()
* functions used to make changes to the row index and column labels

rename()
import numpy as np
df=pd.DataFrame(data=[[101,'Priya',30000,np.nan],[102,'Shipra',45000,np.nan],[103,'Karan',40000,0]],columns=['Id','Name','Sal','Bonus'],index=['x','y','z'])

* It changes the name of a column label or row index label in a DataFrame.
* axis 0 is for rows and axis 1 is for columns.
* It takes a mapping in dictionary form {oldname:newname}, given either with axis (0 or 1) or through the index= or columns= argument.
For columns:
Df=Df.rename({oldname:newname},axis=1)
Df=Df.rename(columns={oldname:newname})
Df.rename({oldname:newname},axis=1,inplace=True)

For rows:
Df=Df.rename({oldname:newname},axis=0)
Df=Df.rename(index={oldname:newname})
Df.rename({oldname:newname},axis=0,inplace=True)
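The general forms above can be tried with real labels. A small sketch with made-up data; the new labels 'Salary' and 'r1' are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Id': [101, 102], 'Sal': [30000, 45000]}, index=['x', 'y'])

# Rename one column and one row label in a single call
renamed = df.rename(columns={'Sal': 'Salary'}, index={'x': 'r1'})

# The same column rename written with axis=1
renamed_axis = df.rename({'Sal': 'Salary'}, axis=1)

print(renamed)
print(df)   # df keeps its old labels because inplace=True was not used
```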

reindex()
* changes the order of existing rows/columns
* creates new row/column labels (their values are NaN)
* deletes the row/column labels that are left out
* reindex() has no inplace argument; assign the result back to the DataFrame
For rows
df=df.reindex(index=['y','z','x'])
df=df.reindex(['x','z','y'],axis=0)
df=df.reindex(index=['y','z'])
For columns
df=df.reindex(columns=['Name','Sal','Bonus','Id'])
df=df.reindex(['Name','Sal','Bonus','Id'],axis=1)
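The three uses of reindex() (reorder, create, delete) can be seen in one short sketch. The data and the new labels 'w' and 'Bonus' are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Id': [101, 102, 103], 'Sal': [30000, 45000, 40000]},
                  index=['x', 'y', 'z'])

# Reorder rows and leave out 'y', which deletes that row from the result
reordered = df.reindex(index=['z', 'x'])

# 'w' is a new label, so its row is created and filled with NaN
extended = df.reindex(index=['x', 'y', 'z', 'w'])

# Reorder columns and create a new 'Bonus' column filled with NaN
columns_changed = df.reindex(columns=['Sal', 'Id', 'Bonus'])

print(reordered)
print(extended)
print(columns_changed)
```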

Boolean Indexing
* The row index itself holds the values True/False, and rows are selected by giving True or False to loc[].
import pandas as pd
student=pd.DataFrame([[1,'neha',43],[2,'karan',47],[3,'priya',39]],columns=['rno','name','marks'],index=[True,False,True])
print(student.loc[True])

Variations of loc[]

student=pd.DataFrame([[1,'neha',43],[2,'karan',47],[3,'priya',39]],columns=['rno','name','marks'],index=['a','b','c'])
# when loc contains a single index label
print(student.loc['a'])

# when loc contains a column label
print(student.loc[:,'name'])

# the same selections by position using iloc
print(student.iloc[0])
print(student.iloc[:,1])

# loc also used for conditional display
print(student.loc[student['marks']>45,'rno'])

# loc used to add a new row
student.loc['d']=0

# for label slicing
print(student.loc['a':'c'])

Key Points
• DataFrame() is the constructor of the DataFrame class in the pandas library.
• The letters D and F are always capital in DataFrame.
• DataFrame functions are called with the dot operator.
• Axis 0 for rows and axis 1 for columns.

Q1. Name any three attributes of a DataFrame.
Ans: size, shape, index

Q2. Name any two functions.
Ans: head(), tail(), count()

Q3. What is the difference between attributes and functions?
Ans: Attributes are used without brackets; functions are called with brackets.
Attributes show the properties of a DataFrame, but functions operate on the DataFrame's data.
