XII IP Resource Material - DataFrame

ip resource material dataframe

Uploaded by

doodhwala894


Resource Material - Informatics Practices (XII )

Chapter 2 - Data Handling using Pandas – I

DATAFRAME

DATAFRAME - It is a two-dimensional object that is useful in representing data in the form of
rows and columns. It is similar to a spreadsheet or an SQL table, and it is the most commonly
used pandas object. The data in a DataFrame is aligned in a tabular fashion in rows and columns,
so it has both row labels and column labels. Each column can hold a different type of value,
such as numeric, string, boolean, etc. Once we store the data in a DataFrame, we can perform
various operations that are useful in analyzing and understanding the data.

For example:

ID   NAME     DEPT        SEX  EXPERIENCE
101  JOHN     ENT         M    12
104  SMITH    ORTHOPEDIC  M    5
107  GEORGE   CARDIOLOGY  M    10
109  LARA     SKIN        F    3
113  GEORGE   MEDICINE    F    9
115  JOHNSON  ORTHOPEDIC  M    10

The above table describes data of doctors in the form of rows and columns. Here vertical subsets
are columns and horizontal subsets are rows. The column labels are ID, NAME, DEPT, SEX and
EXPERIENCE, and the row labels are 101, 104, 107, 109, 113 and 115.
Properties of DataFrame
1. A Dataframe has axes (indices)-
● Row Index (axis=0)
● Column Index (axis=1)
2. It is similar to a spreadsheet, whose row index is called index and column index is called
column name.
3. A Dataframe contains Heterogeneous data.
4. A Dataframe Size is Mutable.
5. A Dataframe Data is Mutable
Difference between DataFrames and Numpy Array
The basic difference between a NumPy array and a DataFrame is that a NumPy array contains
homogeneous data while a DataFrame can contain heterogeneous data.
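
The difference above can be sketched in a few lines. This is only an illustrative sketch: the two-row data is an assumption borrowed from the doctors table, and the names arr and df are hypothetical.

```python
import numpy as np
import pandas as pd

# A NumPy array has a single dtype: mixing numbers and strings
# turns every element into a string.
arr = np.array([[101, "JOHN"], [104, "SMITH"]])
print(arr.dtype)

# A DataFrame keeps a separate dtype for each column.
df = pd.DataFrame({"ID": [101, 104], "NAME": ["JOHN", "SMITH"]})
print(df.dtypes)
```

Here the ID column stays numeric in the DataFrame, whereas in the array the IDs have become strings.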

A data frame can be created using any of the following-

1. Series
2. Lists
3. Dictionary
4. A numpy 2D array

SYNTAX:
import pandas as pd

pd.DataFrame(data, index, columns)

where
data: takes various forms such as Series, list, constants/scalar values, dictionary or another
dataframe.
index: specifies the index/row labels to be used for the resulting frame. The labels must be
hashable (and are normally unique), and there must be one label per row of data. Default is
np.arange(n), i.e. 0, 1, ..., n-1, if no index is passed.
columns: specifies the column labels to be used for the resulting frame. The labels must be
hashable (and are normally unique), and there must be one label per column of data. Default is
np.arange(n) if no column labels are passed.
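
As a small sketch of how the index and columns arguments behave (the data and the labels r1 and r2 are assumptions made up for illustration):

```python
import pandas as pd

data = [[101, "JOHN", 12], [104, "SMITH", 5]]

# No index/columns passed: both default to 0, 1, 2, ...
default_df = pd.DataFrame(data)
print(default_df)

# Explicit row and column labels.
labelled_df = pd.DataFrame(data,
                           index=["r1", "r2"],
                           columns=["ID", "NAME", "EXPERIENCE"])
print(labelled_df)
```

The number of labels in index must match the number of rows, and the number of labels in columns must match the number of columns; otherwise pandas raises an error.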

Creating an Empty Dataframe

An empty dataframe can be created as follows:

EXAMPLE 1
import pandas as pd
dfempty = pd.DataFrame()
print (dfempty)

OUTPUT:

Empty DataFrame
Columns: []
Index: []

Creating Dataframe from List

import pandas as pd
list1=[10,20,30,40,50]
df = pd.DataFrame(list1)
print(df)

Output
    0
0  10
1  20
2  30
3  40
4  50

How to create Dataframe From Series
Program-

import pandas as pd
s = pd.Series(['a','b','c','d'])
df=pd.DataFrame(s)
print(df)

When a single series is used to create a dataframe, the elements of the series become the
elements of a column in the dataframe.

Output-
   0
0  a
1  b
2  c
3  d

The default column name is 0.

Creating Dataframe from more than one Series

import pandas as pd
s=pd.Series([1,2,3,4,5])
s1=pd.Series([10,20,30,40,50])
df = pd.DataFrame([s,s1])
print(df)

When two or more series are provided as a list argument to a dataframe, the elements of each
series become a row in the resultant dataframe.

Output
    0   1   2   3   4
0   1   2   3   4   5
1  10  20  30  40  50

Creating a DataFrame from Dictionary of Series

import pandas as pd
s=pd.Series([1,2,3,4,5])
s1=pd.Series([60,80,50,89,85])
df = pd.DataFrame({'Roll No':s,'Marks':s1})
print(df)

Output
   Roll No  Marks
0        1     60
1        2     80
2        3     50
3        4     89
4        5     85

A dictionary of Series can be passed to form a DataFrame. The resultant index is the union
of all the Series indexes passed.

Creating a DataFrame from List of Dictionaries

import pandas as pd
dic=[{'Name':'Rajat','Sname':'Sehgal'},
{'Name':'Rajesh','Sname':'Dabur'}, {'Name':'Rishi'}]
df = pd.DataFrame(dic)
print(df)

Output
     Name   Sname
0   Rajat  Sehgal
1  Rajesh   Dabur
2   Rishi     NaN

Here, the dictionary keys are treated as column labels and row labels take default values
starting from zero. The values in each dictionary form one row, so the number of rows is equal
to the number of dictionaries present in the list. There are three rows in the above dataframe
as there are three dictionaries in the list. In the third row the value corresponding to
key Sname is NaN because the Sname key is missing in the third dictionary.

Creating a DataFrame from List of Lists

import pandas as pd
l=[[101,'Rajat'],[102,'Rajesh'],[103,'Rishi'],[104,'Sanjay']]
df = pd.DataFrame(l, columns=['Rollno','Name'])
print(df)

Output
   Rollno    Name
0     101   Rajat
1     102  Rajesh
2     103   Rishi
3     104  Sanjay

Row labels can also be supplied with the index argument:
df = pd.DataFrame(l, index=['I','II','III','IV'], columns=['Rollno','Name'])

Creation of DataFrame from Dictionary of Lists

import pandas as pd
d1={"Rollno":[1,2,3], "Total":[350.5,400,420], "Percentage":[70,80,84]}
df1=pd.DataFrame(d1)
print(df1)

OUTPUT
   Rollno  Total  Percentage
0       1  350.5          70
1       2  400.0          80
2       3  420.0          84

Here, the dictionary keys are treated as column labels and row labels take default values
starting from zero.

Creation of DataFrame from NumPy ndarrays

# importing the modules
import pandas as pd
import numpy as np

# creating the Numpy array

array = np.array([[1, 1, 1], [2, 4, 8], [3, 9, 27], [4, 16, 64], [5, 25, 125],
[6, 36, 216], [7, 49, 343]])

# creating a list of index names

index_values = ['first', 'second', 'third', 'fourth', 'fifth', 'sixth', 'seventh']

# creating a list of column names

column_values = ['number', 'squares', 'cubes']

# creating the dataframe

df = pd.DataFrame(data = array, index = index_values, columns = column_values)
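# displaying the dataframe
print(df)
Output
         number  squares  cubes
first         1        1      1
second        2        4      8
third         3        9     27
fourth        4       16     64
fifth         5       25    125
sixth         6       36    216
seventh       7       49    343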

DATAFRAME ATTRIBUTES
A dataframe attribute is any piece of information related to the dataframe object, such as its
size, datatype, etc. Below are some of the attributes of a dataframe object (consider the
dataframe df1 defined below for all the examples):

Rollno Total Percentage

0 1 350.5 70
1 2 400.0 80
2 3 420.0 84
3 4 356.0 80
4 5 434.0 87
5 6 398.0 79

1. df1.size
Return an int representing the number of elements in given dataframe.
print(df1.size)

OUTPUT
18
(6 rows X 3 columns =18)

2. df1.shape
Return a tuple representing the dimensions of the DataFrame.
print(df1.shape)

OUTPUT
(6, 3)

3. df1.axes
Return a list representing the axes of the DataFrame.
print(df1.axes)

OUTPUT
[RangeIndex(start=0, stop=6, step=1), Index(['Rollno', 'Total', 'Percentage'],
dtype='object')]

4. df1.ndim
Return an int representing the number of axes / array dimensions
print(df1.ndim)
OUTPUT
2

5. df1.columns
The column labels of the DataFrame
print(df1.columns)
OUTPUT
Index(['Rollno', 'Total', 'Percentage'], dtype='object')

6. df1.values
Return a Numpy representation of the DataFrame.
print(df1.values)
OUTPUT
[[ 1. 350.5 70. ]
[ 2. 400. 80. ]
[ 3. 420. 84. ]
[ 4. 356. 80. ]
[ 5. 434. 87. ]
[ 6. 398. 79. ]]

7. df1.empty
Indicator whether DataFrame is empty.
print(df1.empty)
OUTPUT:
False

8. df1.index
In a pandas DataFrame the row labels are called the index. To get the index labels
separately, use the "index" attribute of the DataFrame.
print(df1.index)
OUTPUT:
RangeIndex(start=0, stop=6, step=1)
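
The attributes listed above can be tried together in one short sketch. It rebuilds df1 from the table at the start of this part; the default row index 0 to 5 is assumed.

```python
import pandas as pd

df1 = pd.DataFrame({"Rollno": [1, 2, 3, 4, 5, 6],
                    "Total": [350.5, 400, 420, 356, 434, 398],
                    "Percentage": [70, 80, 84, 80, 87, 79]})

print(df1.size)     # number of elements (rows x columns)
print(df1.shape)    # (rows, columns)
print(df1.ndim)     # number of axes
print(df1.empty)    # True only if the dataframe has no elements
print(df1.columns)  # column labels
print(df1.index)    # row labels
print(df1.values)   # data as a NumPy array
```

Note that shape, size, ndim and the others are attributes, not methods, so they are written without parentheses.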
ROW/COLUMN OPERATIONS

SELECTING A PARTICULAR COLUMN

To access the data of a column, we can mention the column name as a subscript,
e.g. df['empid']. This can also be done by using df.empid. To access multiple columns we can
write df[[col1, col2, ...]]

Example:

import pandas as pd
dict={'BS':[80,98,100,65,72],'ACC':[88,67,93,50,90],
'ECO':[100,75,89,40,96],'IP':[100,98,92,80,86]}
df5=pd.DataFrame(dict,index=['Ammu','Achu','Manu','Anu','Abu'])
print(df5)
Output:
       BS  ACC  ECO   IP
Ammu   80   88  100  100
Achu   98   67   75   98
Manu  100   93   89   92
Anu    65   50   40   80
Abu    72   90   96   86

Note: To select/display a particular column, such as BS, we write it as follows:

● Selecting / Accessing a column

Syntax :
<dataframe object>[<column name>] or <dataframe object>.<column name>
○ In the dot notation, do not put quotation marks around the column name.
print(df5.BS)
or
print(df5['BS'])

● Selecting / Accessing multiple columns

Syntax :
<dataframe object>[[<column name>, <column name>, ...]]
○ Columns appear in the order in which the column names are given in the list inside the
square brackets.
print(df5[['BS','IP']])
● Selecting / Accessing a subset from a DataFrame using Row/Column names
<dataframe object>.loc[<start row>:<end row>, <start column>:<end column>]

○ To access a row:
<dataframe object>.loc[<row label>, : ]
Note: Make sure not to miss the colon after the comma.

○ To access multiple rows:
<dataframe object>.loc[<start row>:<end row> , : ]
Python returns all rows falling between the start row and the end row, including the start
row and the end row themselves.

print(df5.loc['Ammu':'Manu', : ])

Note: Make sure not to miss the colon after the comma.

○ To access selective columns:
<dataframe object>.loc[ : , <start column> : <end column>]
Lists all columns from the start column to the end column, both included.
print(df5.loc[:,'ACC':'IP'])

Note: Make sure not to miss the colon before the comma.

○ To access a range of columns from a range of rows:
<dataframe object>.loc[<start row> : <end row>, <start column> : <end column>]
print(df5.loc['Manu':'Abu','ACC':'ECO'])

● Selecting / Accessing a subset from a DataFrame using numeric Row/Column index/position
Sometimes a dataframe object does not have row or column labels, or we may not remember
them. In that case we can use iloc to extract a subset from the dataframe.

<dataframe object>.iloc[<start row index> : <end row index>, <start column index> : <end column index>]

When we use iloc, the end index is excluded.

print(df5.iloc[1:3,1:3])
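
The difference in how loc and iloc treat the end of a slice can be seen side by side in the following sketch. It rebuilds df5 from the example above; the names marks, by_label and by_position are hypothetical.

```python
import pandas as pd

marks = {'BS': [80, 98, 100, 65, 72], 'ACC': [88, 67, 93, 50, 90],
         'ECO': [100, 75, 89, 40, 96], 'IP': [100, 98, 92, 80, 86]}
df5 = pd.DataFrame(marks, index=['Ammu', 'Achu', 'Manu', 'Anu', 'Abu'])

# loc works with labels and includes the end label.
by_label = df5.loc['Achu':'Manu', 'ACC':'ECO']

# iloc works with positions and excludes the end position.
by_position = df5.iloc[1:3, 1:3]

print(by_label)
print(by_position)
print(by_label.equals(by_position))
```

Both statements select the same two rows (Achu, Manu) and the same two columns (ACC, ECO), even though the label slice names its last element and the position slice stops one before its end value.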

● Selecting / Accessing individual value

Give the row label in square brackets after the column name:
<dataframe object>.<column>[<row label>]
print(df5.ACC['Achu'])
Output: 67
or, by numeric position:
print(df5.ACC.iloc[1])
Note: Older material writes the positional form as df5.ACC[1]. Recent pandas versions
deprecate plain integer positions in square brackets on a Series that has string labels, so
iloc is the safer form.

● Assigning / Modifying Data Values in DataFrame

○ To change or add a column
<dataframe object>[<column name>]=<new value>
If the given column name does not exist in the dataframe, a new column with that name is
added.
df5['ENG']=60
print(df5)
If the column should have a different value in each row, assign the values for the rows in
the form of a list.
df5['ENG']=[50,60,40,30,70]

A column can also be added or changed with loc:

<dataframe object>.loc[ : ,<column name>]=value
df5.loc[ : ,'ENG']=60
print(df5)

○ To change or add a row

<dataframe object>.loc[<row label> , : ]=value
df5.loc['Sabu', : ]=50
print(df5)
If there is no row with the given row label, a new row with that label is added and the
given value is assigned to all its columns.

○ To change or modify a single data value

<dataframe object>.loc[<row label>, <column name>] = value
df5.loc['Ammu','BS']=100
print(df5)
or, by numeric position (row 0, column 0 is the BS value of Ammu):
df5.iloc[0,0]=100
print(df5)
Note: The older form df5.BS['Ammu']=100 is a chained assignment. Recent pandas versions warn
about it, and it may not update the original dataframe, so loc or iloc is preferred.
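
The ways of changing values described above can be tried on a small dataframe. This is a sketch on a two-row version of df5 (an assumption made to keep the output short); it uses loc and iloc for single values because those forms do not depend on chained assignment.

```python
import pandas as pd

df5 = pd.DataFrame({'BS': [80, 98], 'ACC': [88, 67]}, index=['Ammu', 'Achu'])

# Change a single value by label, then another one by position.
df5.loc['Ammu', 'BS'] = 100
df5.iloc[1, 1] = 70

# Add a new row: the label 'Sabu' does not exist yet, so it is created.
df5.loc['Sabu'] = [50, 50]

# Add a new column with one value per row.
df5['ENG'] = [50, 60, 40]

print(df5)
```

If the label already exists, the same loc statement overwrites the existing row or value instead of adding a new one.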
● Deleting columns in DataFrame
○ We can use the del statement to delete a column.
del <dataframe object>[<column name>]
e.g.: del df5['ENG']
○ We can also use drop() to delete a column. By default axis=0 (rows), so axis=1 or the
columns argument must be given to delete columns.
<dataframe object> = <dataframe object>.drop([<column name or index>],axis=1)
or
<dataframe object> = <dataframe object>.drop(columns=[<column names or indices>])
df5=df5.drop(['ECO'], axis=1)
or
df5=df5.drop(columns=['ECO','IP'])
○ We can use pop() to delete a column. The deleted column is returned as a Series object.
bstud=df5.pop('BS')
print(bstud)
● Deleting rows in DataFrame
<dataframe object>=<dataframe object>.drop([index or sequence of index], axis=0)
df5=df5.drop(['Ammu','Achu'])
or
df5=df5.drop(index=['Ammu','Achu'])
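
The deletion methods above differ in whether they change the dataframe directly. A minimal sketch, using a three-row version of df5 as an assumption:

```python
import pandas as pd

df5 = pd.DataFrame({'BS': [80, 98, 100], 'ACC': [88, 67, 93],
                    'ECO': [100, 75, 89], 'IP': [100, 98, 92]},
                   index=['Ammu', 'Achu', 'Manu'])

# drop() returns a new dataframe; df5 itself still has the ECO column.
without_eco = df5.drop(columns=['ECO'])
print(list(df5.columns))

# del and pop() change df5 directly; pop() also returns the removed column.
del df5['IP']
bs = df5.pop('BS')
print(bs)

# Dropping a row by label; the result is assigned back to df5.
df5 = df5.drop(index=['Ammu'])
print(df5)
```

This is why statements using drop() are normally written with an assignment, as in df5=df5.drop(...).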

Iterating over a DataFrame

● Using the iterrows() function
○ The method <DF>.iterrows() views a dataframe in the form of horizontal subsets, i.e.
row-wise.
○ Each horizontal subset is in the form (row index, Series), where the Series contains
all column values for that row index.
○ We can iterate over a Series object just as we iterate over other sequences.

import pandas as pd
dict={'BS':[80,98],'ACC':[88,67]}
df5=pd.DataFrame(dict,index=['Ammu','Achu'])
print(df5,"\n")
for (row,rowseries) in df5.iterrows():
    print("Row index:",row)
    print("containing")
    i=0
    for val in rowseries:
        print("At position ",i,":",val)
        i=i+1
    print()

● Using the items() function (called iteritems() in older pandas versions)

○ The method <DF>.items() views a dataframe in the form of vertical subsets, i.e.
column-wise. iteritems() was removed in pandas 2.0; items() works in both old and new versions.
○ Each vertical subset is in the form (column index, Series), where the Series contains all
row values for that column index.
import pandas as pd
dict={'BS':[80,98],'ACC':[88,67]}
df5=pd.DataFrame(dict,index=['Ammu','Achu'])
print(df5,"\n")
for (column,columnseries) in df5.items():
    print("Column index:",column)
    print("containing")
    i=0
    for val in columnseries:
        print("At row ",i,":",val)
        i=i+1
    print()
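
Row-wise and column-wise iteration are often used to compute one result per row or per column. The following sketch totals each row and finds the maximum of each column; the dictionaries row_totals and col_max are hypothetical names used only for this illustration.

```python
import pandas as pd

df5 = pd.DataFrame({'BS': [80, 98], 'ACC': [88, 67]}, index=['Ammu', 'Achu'])

# Row-wise: each step gives (row label, Series of that row's values).
row_totals = {}
for label, row in df5.iterrows():
    row_totals[label] = int(row.sum())

# Column-wise: each step gives (column label, Series of that column's values).
col_max = {}
for name, col in df5.items():
    col_max[name] = int(col.max())

print(row_totals)
print(col_max)
```

For simple totals like these, pandas also offers direct methods such as df5.sum(axis=1) and df5.max(), which avoid the explicit loop.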

● Head and Tail Functions

○ head()
<DF>.head(n=5)
Retrieves the top 5 rows of a dataframe by default.
We can change the number of rows by specifying a value for n.
df5.head(5)
df5.head(2)
○ tail()
<DF>.tail(n=5)
Retrieves the bottom 5 rows of a dataframe by default.
We can change the number of rows by specifying a value for n.
df5.tail(5)
df5.tail(2)
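
A quick sketch of head() and tail() on a ten-row dataframe (the single column n is an assumption made up for the example):

```python
import pandas as pd

df = pd.DataFrame({'n': range(1, 11)})

print(df.head())    # first 5 rows, because n defaults to 5
print(df.head(3))   # first 3 rows
print(df.tail(2))   # last 2 rows
```

Both functions return a new dataframe and keep the original row labels, so the rows returned by tail(2) are still labelled 8 and 9.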

● Renaming index / column labels

○ rename() renames the existing index or column labels in a dataframe/series.
○ The old and new index/column labels are provided in the form of a dictionary, where the
keys are the old index/column labels and the values are the new names for the same.
Syntax:
<DF>.rename(index=None, columns=None, inplace=False)
where index and columns are dictionary-like.
inplace is a boolean, False by default (which returns a new dataframe with the renamed
index/column labels).
If True, the changes are made in the current dataframe.

import pandas as pd
dict={'p_id':[101,102],'p_name':['Hard disk','Pen Drive']}
df=pd.DataFrame(dict)
print(df,"\n")

df.rename(columns={'p_id':'Product_ID','p_name':'product_name'},inplace=True)
or
df=df.rename(columns={'p_id':'Product_ID','p_name':'product_name'})
print(df)
○ Columns can also be renamed by using the columns attribute of the dataframe.
import pandas as pd
dict={'p_id':[101,102],'p_name':['Hard disk','Pen Drive']}
df=pd.DataFrame(dict)
df.columns=['Product_ID','product_name']
print(df,"\n")
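
The effect of the inplace parameter can be seen in a short sketch. It reuses the product dataframe from above; the new labels Product_ID and first are assumptions chosen for the example.

```python
import pandas as pd

df = pd.DataFrame({'p_id': [101, 102], 'p_name': ['Hard disk', 'Pen Drive']})

# inplace=False (the default): a renamed copy is returned, df is unchanged.
renamed = df.rename(columns={'p_id': 'Product_ID'})
print(list(df.columns))
print(list(renamed.columns))

# inplace=True: df itself is changed and nothing is returned.
df.rename(index={0: 'first'}, inplace=True)
print(df)
```

Labels that are not mentioned in the dictionary, such as p_name and row 1 here, are left as they are.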

● Boolean indexing
○ Like default indexing (0, 1, 2, ...) or labelled indexing, there is one more way to
index: Boolean indexing (setting the row index to True/False).
○ This helps in displaying the rows of a DataFrame according to True or False, as
specified in the command.
import pandas as pd
dict={'p_id':[101,102,103],'p_name':['Hard disk','Pen Drive','Camera']}
df=pd.DataFrame(dict)
df.index=[True,False,True]
print(df,"\n")
print(df.loc[True])
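
Boolean values are more commonly produced by a condition than typed in as an index. The following sketch shows that related technique, selecting rows with a Boolean mask; the price column and its values are assumptions added for illustration.

```python
import pandas as pd

df = pd.DataFrame({'p_id': [101, 102, 103],
                   'price': [4500, 600, 15000]})

# A comparison on a column gives a Series of True/False values (a mask).
mask = df['price'] > 1000
print(mask)

# Passing the mask in square brackets keeps only the rows marked True.
costly = df[mask]
print(costly)

# Conditions are combined with & (and) or | (or), each inside parentheses.
mid_range = df[(df['price'] >= 600) & (df['price'] <= 5000)]
print(mid_range)
```

The Python keywords and / or cannot be used to combine such conditions, because they do not work element by element on a Series.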

Questions / Answers
1. Give the output:
import pandas as pd
Dic={'empno':[101,102,103,104,105,106],'grade':['a','b','a','c','b','c'],
'dept':['sales','pur','mar','sales','pur','mar']}
df=pd.DataFrame(Dic)
print(df.head(3))
Output:
empno grade dept
0 101 a sales
1 102 b pur
2 103 a mar

2. What will be the output of df.iloc[3:7,3:6]?

Answer:
It will display the rows with index 3 to 6 and columns with index 3 to 5 in a dataframe
‘df’.
3. Write a python program to create a data frame with headings (CS and IP) from the
list given below-
[[79,92],[86,96],[85,91],[80,99]]
Answer:
import pandas as pd
l=[[79,92],[86,96],[85,91],[80,99]]
df=pd.DataFrame(l,columns=['CS','IP'])
print(df)
4. Write python statement to delete the 3rd and 5th rows from dataframe df.
Answer:
df1=df.drop(index=[2,4],axis=0)
or
df1=df.drop([2,4])

5. Carefully observe the following code:

import pandas as pd
Year1={'Q1':5000,'Q2':8000,'Q3':12000,'Q4': 18000}
Year2={'A' :13000,'B':14000,'C':12000}
totSales={1:Year1,2:Year2}
df=pd.DataFrame(totSales)
print(df)
Answer the following:
i. List the index of the DataFrame df
ii. List the column names of DataFrame df.
Answer: i. The index labels of df will include Q1,Q2,Q3,Q4,A,B,C
ii. The column names of df will be: 1,2
6. Consider the given DataFrame ‘Stock’:
Name Price
0 Nancy Drew 150
1 Hardy boys 180
2 Diary of a wimpy kid 225
3 Harry Potter 500
Write suitable Python statements for the following:
a) Add a column called Special_Price with the following data: [135,150,200,440].
b) Add a new book named ‘The Secret' having price 800.
c) Remove the column Special_Price.
Answer:
a) Stock['Special_Price']=[135,150,200,440]
b) Stock.loc[4]=['The Secret',800]
c) Stock=Stock.drop('Special_Price',axis=1)
7. Mr. Som, a data analyst has designed the DataFrame df that contains data about
Computer Olympiad with ‘CO1’, ‘CO2’, ‘CO3’, ‘CO4’, ‘CO5’ as indexes shown
below. Answer the following questions:

A. Predict the output of the following python statement: i. df.shape ii. df[2:4]
B. Write Python statement to display the data of Topper column of indexes CO2 to CO4.
C. Write Python statement to compute and display the difference of data of Tot_students
column and First_Runnerup column of the above given DataFrame.

Answer:
A: i. (5,4)
ii. School tot_students Topper First_Runner_up
CO3 GPS 20 18 2
CO4 MPS 18 10 8
B. print(df.loc['CO2': 'CO4', 'Topper'])
C. print(df.Tot_students-df.First_Runnerup)
8. Write a Python code to create a DataFrame with appropriate column headings from
the list given below:
[[101,'Gurman',98],[102,'Rajveer',95],[103,'Samar' ,96],[104,'Yuvraj',88]]
Answer:

import pandas as pd
data=[[101,'Gurman',98],[102,'Rajveer',95],[103,'Samar' ,96], [104,'Yuvraj',88]]
df=pd.DataFrame(data,columns=['Rno','Name', 'Marks'])

Case study questions:

1. Consider the following DataFrame df and answer questions

I. Write code to delete column B

II. Write the output of the below code
print(df.tail(2))
III. Write code to delete row salary
IV. Change the value of column A to 100
V. Change the value of DEPT of B to MECH
VI. Display DEPT and SALARY of column A and B
VII. Write code to rename column 'A' to 'D' which will not affect the original
dataframe
VIII. Write code to add a column E with values [CS, 104, XYZ, 300000]
IX. Write code to add a row COMM with values [3000,4000,5000]
X. Write code to rename DEPT to DEPARTMENT which will affect the original
dataframe
XI. Write code to display DEPT in A
a) print(df.A['DEPT'])
b) print(df['A','DEPT'])
c) print(df.iloc[1:2,1:2])
XII. Write the output of the statement print(len(df))
i. 3 ii. 4 iii. (4,3) iv. (3,4)

Answer:
i. del df['B']
ii. A B C
ENAME ABC PQR LMN
SALARY 200000 100000 20000
iii. df=df.drop(['SALARY'],axis=0)
iv. df['A']=100
v. df.B['DEPT']='MECH'
vi. print(df.loc[['DEPT','SALARY'],["A","B"]])
vii. df.rename(columns={"A":"D"},inplace=False)
viii.df['E']=["CS",104,"XYZ",300000]
ix. df.loc['COMM']=[3000,4000,5000]
x. df.rename(index={"DEPT":"DEPARTMENT"},inplace=True)
xi. print(df.A['DEPT'])
xii. 4

2. Consider the following Data Frame df and answer questions

I. Display details of city delhi and chennai
II. Display hospitals in delhi
III. Display shape of dataframe
IV. Change the population in kolkatta as 50
V. Rename the row label population as "pop"

Answers:
I. print(df[['delhi','chennai']])
II. print(df.delhi['hospitals'])
III. print(df.shape)
IV. df.kolkatta['population']=50
V. df.rename(index={"population":"pop"},inplace=True)

3. Consider the following Data Frame df and answer questions

I. Display the cities whose population is >= 20

II. Write command to set all values of df as 0

III. Display the df with rows in the reverse order

IV. Display the df with only columns in the reverse order
V. Display the df with rows & columns in the reverse order
Answer:

I. print(df[df.population>=20])
II. df[:]=0
III. print(df.iloc[::-1])
IV. print(df.iloc[:,::-1])
V. print(df.iloc[::-1,::-1])

4. What is the purpose of the following statements-

I. df.columns
II. df.iloc[ : , :-5]
III. df[2:8]
IV. df[ :]
V. df.iloc[ : -4 , : ]

Answers:
I. It displays the names of the columns of the DataFrame.
II. It will display all columns except the last 5 columns.
III. It displays all columns for the rows at positions 2 to 7.
IV. It will display the entire dataframe with all rows and columns.
V. It will display all rows except the last four rows.

5. Mr. Ankit is working in an organisation as data analyst. He uses Python Pandas and
Matplotlib for the same. He got a dataset of the passengers for the year 2010 to 2012
for January, March and December. His manager wants certain information from him,
but he is facing some problems. Help him by answering few questions given below:

Code to create the above data frame:

import pandas as ____________ #Statement 1
data={"Year":[2010,2010,2012,2010,2012],"Month":["Jan","Mar","Jan","Dec","Dec"]
,"Passengers":[25,50,35,55,65]}
df=pd.____________________(data) #Statement 2
print(df)

1. Choose the right code from the following for statement 1.

i. pd ii. df iii. data iv. P
Answer: i. pd
2. Choose the right code from the following for the statement 2.
i. Dataframe ii. DataFrame iii. Series iv Dictionary
Answer: ii. DataFrame
3. Choose the correct statement/ method for the required output: (5,3)
i. df.index ii. df.shape() iii. df.shape iv. df.size
Answer: (iii) df.shape

4. He wants to print the details of "January" month along with the number of
passengers, Identify the correct statement:

a) df.loc[['Month','Passengers']][df['Month']=='Jan']
b) df[['Month','Passengers']][df['Month']=='Jan']
c) df.iloc[['Month','Passengers']][df['Month']=='Jan']
d) df(['Month','Passengers']][df['Month']=='Jan')

Answer: (b) df[['Month','Passengers']][df['Month']=='Jan']

6. Sanyukta is the event incharge in a school. One of her students gave her a
suggestion to use Python Pandas and Matplotlib for analysing and visualising the
data, respectively. She has created a DataFrame “SportsDay” to keep track of the
number of First, Second and Third prizes won by different houses in various
events.

Write Python commands to do the following:

I. Display the house names where the number of Second Prizes are in the range of
12 to 20.

a. df['Name'][(df['Second']>=12) and (df['Second']<=20)]

b. df[Name][(df['Second']>=12) & (df['Second']<=20)]
c. df['Name'][(df['Second']>=12) & (df['Second']<=20)]
d. df[(df['Second']>=12) & (df['Second']<=20)]
Answer: c. df['Name'][(df['Second']>=12) & (df['Second']<=20)]
II. Display all the records in the reverse order.
a. print(df[::1]) b. print(df.iloc[::-1]) c. print(df[-1:]+df[:-1]) d. print(df.reverse())
Answer: b. print(df.iloc[::-1])
iii. Display the bottom 3 records.
a. df.last(3) b. df.bottom(3) c. df.next(3) d. df.tail(3)
Answer: d. df.tail(3)
iv. Choose the correct output for the given statements:
x=df.columns[:1]
print(x)
a. 0 b. Name c. First d. Error
Answer: b. Name
v. Which command will give the output 24:
a. print(df.size) b. print(df.shape) c. print(df.index) d. print(df.axes)
Answer: a. df.size

Multiple Choice Questions

1. Mr. Ankit wants to change the index of the Data Frame and the output for the
same is given below. Identify the correct statement to change the index

a) df.index[]=["Air India","Indigo","Spicejet","Jet","Emirates"]
b) df.index["Air India","Indigo","Spicejet","Jet","Emirates"]
c) df.index=["Air India","Indigo","Spicejet","Jet","Emirates"]
d) df.index()=["Air India","Indigo","Spicejet","Jet","Emirates"]
Answer: (c) df.index=["Air India","Indigo","Spicejet","Jet","Emirates"]
6. To display the 3rd, 4th and 5th columns from the 6th to 9th rows of a dataframe you
can write
a) DF.loc[6:9, 3:5] b) DF.loc[6:10, 3:6] c) DF.iloc[6:10, 3:6] d) DF.iloc[6:9,
3:5]
Answer: c) DF.iloc[6:10, 3:6]
8. We can add a new row to a DataFrame using the _____________ method
(i) rloc[ ] (ii) loc[ ] (iii)iloc[ ] (iv)None of the above
Answer: (ii) loc[ ]
9. The head() function of a dataframe will display how many rows from the top if no
parameter is passed?
(i) 1 (ii) 3 (iii) 5 (iv) None of these
Answer : (iii) 5
10. Which function is used to find values from a DataFrame D using the index number?
a) D.loc b) D.iloc c) D.index d) None of these
Answer: b) D.iloc
11. In a DataFrame, Axis= 0 represents the elements
a.rows b.columns c.both d.None of these.
Answer: a.rows
12. In DataFrame, by default new column added as the _____________ column
(i) First (Left Side) (ii) Second (iii)Last (Right Side) (iv) Any where in
dataframe
Answer: (iii)Last (Right Side)
13. Which of the following is correct Features of DataFrame?
a. Potentially columns are of different types
b. Can Perform Arithmetic operations on rows and columns
c. Labeled axes (rows and columns)
d. All of the above
Answer: d. All of the above
14. When we create a DataFrame from a List of Dictionaries, then the number of columns in
the DataFrame is equal to the _______
a. maximum number of keys in first dictionary of the list
b. maximum number of different keys in all dictionaries of the list
c. maximum number of dictionaries in the list
d. None of the above
Answer: b. maximum number of different keys in all dictionaries of the list
15. When we create DataFrame from List of Dictionaries, then dictionary keys will
become ______
(i) Column labels (ii) Row labels (iii) Both of the above (iv) None of the above
Answer: (i) Column labels
16. Which method is used to access vertical subset of a dataframe?
(i) iterrows() (ii) iteritems() (iii) itercolumns() (iv) itercols()
Answer: (ii) iteritems()
17. Write statement to transpose dataframe DF.
(i) DF.t (ii) DF.transpose (iii)DF.T (iv)DF.T( )
Answer: (iii)DF.T
18. In DataFrame, by default new column added as the _____________ column
a. First (Left Side) b. Second c. Last (Right Side) d. Any where in dataframe
ANS: Last (Right Side)
19. We can add a new row to a DataFrame using the _____________ method
(i) rloc[ ] (ii) loc[ ] (iii) iloc[ ] (iv) None of the above
ANS: (ii) loc[ ]
20. Which among the following options can be used to create a DataFrame in Pandas ?
(a) A scalar value (b) An ndarray (c) A python dict (d) All of these
ANS:- (d) All of these
21. Write short code to show the information having city='Delhi' from dataframe
SHOP.
(a) print(SHOP[City=='Delhi']) (b) print(SHOP[SHOP.City=='Delhi'])
(c) print(SHOP[SHOP.'City'=='Delhi']) (d) print(SHOP[SHOP[City]=='Delhi'])
ANS: (b) print(SHOP[SHOP.City=='Delhi'])
22. Which of the following commands is used to install pandas?
(i)pip install python –pandas (ii)pip install pandas (iii)python install python
(iv)python install pandas
ANS: (ii) pip install pandas
23. Which attribute of a dataframe is used to get number of axis?
a. T b. ndim c. empty d. shape
ANS: b. ndim
24. Display first row of dataframe ‘DF’
(i) print(DF.head(1)) (ii) print(DF[0 : 1]) (iii)print(DF.iloc[0 : 1]) (iv)All of the above
ANS: (iv)All of the above
25. To delete a column from a DataFrame, you may use statement.
(a) remove (b) del (c) drop (d) cancel statement.
ANS:- (b) del & (c) drop
26. In given code dataframe ‘Df1’ has ________ rows and _______ columns
import pandas as pd
dict= [{'a':10, 'b':20}, {'a':5, 'b':10, 'c':20},{'a':7, 'd':10, 'e':20}]
Df1 = pd.DataFrame(dict)
(i) 3, 3 (ii) 3, 4 (iii)3, 5 (iv)None of the above
ANS: (iii)3, 5
27. In the following statement, if column ‘mark’ already exists in the DataFrame ‘Df1’
then the assignment statement will __________ Df1['mark'] = [95,98,100] #There
are only three rows in DataFrame Df1
(i) Return error (ii) Replace the already existing values.
(iii)Add new column (iv)None of the above
ANS: (ii) Replace the already existing values.
28. Which of the following statement is false:
i. DataFrame is size mutable ii. DataFrame is value mutable
iii. DataFrame is immutable iv. DataFrame is capable of holding multiple types of data
ANS:- iii. DataFrame is immutable
29. To delete a row, the parameter axis of function drop( ) is assigned the value
______________
(i) 0 (ii) 1 (iii) 2 (iv) 3
ANS: (i) 0
30. Write code to delete rows those getting 5000 salary.
(a) df=df.drop[salary==5000] (b) df=df[df.salary!=5000]
(c) df.drop[df.salary==5000,axis=0] (d) df=df.drop[salary!=5000]
ANS: (b) df=df[df.salary!=5000]
31. DF1.loc[ ] method is used to ______ # DF1 is a DataFrame
(i) Add new row in a DataFrame ‘DF1’ (ii) To change the data values of a row to a
particular value (iii)Both of the above (iv)None of the above
ANS: (iii)Both of the above
32. To iterate over horizontal subsets of a dataframe, ______ function may be used.
(a) iterate( ) (b) iterrows( ) (c) itercols( ) (d) iteritems( )
ANS:- (b) iterrows( )
33. Write code to delete the row whose index value is A1 from dataframe df.
(a) df=df.drop('A1') (b) df=df.drop(index='A1') (c) df=df.drop('A1,axis=index')
(d) df=df.del('A1')
ANS: (a) df=df.drop('A1') (option (b) is also valid pandas syntax)
34. A two-dimension labeled array that is an ordered collection of columns to store
heterogeneous data type is
i. Series ii. Numpy array iii.Dataframe iv. Panel
ANS:- iii. Dataframe
35. In Pandas _______________ is used to store data in multiple columns.
(i)Series (ii) DataFrame (iii) Both of the above (iv) None of the above
ANS: (ii) DataFrame
36. What is dataframe?
a. 2 D array with heterogeneous data b. 1 D array with homogeneous data
c. 2 D array with homogeneous data d. 1 D array with heterogeneous data
ANS: a. 2 D array with heterogeneous data
37. In a DataFrame, Axis= 1 represents the_____________ elements
(a) Row (b) Column (c) True (d) False
ANS: (b) Column
38. Which of the following is not an attribute of a DataFrame Object ?
a. index b. Index c. size d. values
ANS: b. Index
39. To get top 5 rows of a dataframe, you may use
(a) head( ) (b) head(5) (c) top( ) (d) top(5)
ANS:- (a) head( ) , b) head(5)
40. In a DataFrame, Axis= 1 represents the_____________ elements
(a) Row (b) Column (c) True (d) False
ANS: (b) Column
41. NaN stands for:
a. Not a Number b. None and None c. Null and Null d. None a Number
ANS: a. Not a Number
42. The following code create a dataframe named ‘Df1’ with _______________
columns.
import pandas as pd
Df1 = pd.DataFrame([10,20,30] )
(i) 1 (ii) 2 (iii) 3 (iv) 4
ANS: (i) 1
43. Write the single line command to delete the column “marks” from dataframe df
using drop function.
(a) df=df.drop(col='marks') (b) df=df.drop('marks',axis=col)
(c) df=df.drop('marks',axis=0) (d) df=df.drop('marks',axis=1)
ANS: (d) df=df.drop('marks',axis=1)
44. The following statement will _________
df = df.drop(['Name', 'Class', 'Rollno'], axis = 1) #df is a DataFrame object
a. delete three columns having labels ‘Name’, ‘Class’ and ‘Rollno’
b. delete three rows having labels ‘Name’, ‘Class’ and ‘Rollno’
c. delete any three columns
d. return error
ANS:- a. delete three columns having labels ‘Name’, ‘Class’ and ‘Rollno’
45. Difference between loc() and iloc().
a. Both are Label indexed based functions.
b. Both are Integer position-based functions.
c. loc() is label based function and iloc() integer position based function.
d. loc() is integer position based function and iloc() index position based function.
ANS: c. loc() is label based function and iloc() integer position based function.
46. Which command will be used to delete the 3rd and 5th rows of the data frame? Assume
the data frame name is DF.
a. DF.drop([2,4],axis=0) b. DF.drop([2,4],axis=1) c. DF.drop([3,5],axis=1) d. DF.drop([3,5])
ANS: a DF.drop([2,4],axis=0)
47. Assuming the given structure, which command will give us the given output:
Output Required: (3,4)

a. print(df.shape()) b. print(df.shape) c. print(df.size) d. print(df.size()).

ANS: b. print(df.shape)
48. Write the output of the given command: df1.loc[:0,'Name'] Consider the given
dataframe.
EmpCode Name Desig
0 1405 VINAY Clerk
1 1985 MANISH Works Manager
2 1636 SMINA Sales Manager
3 1689 RINU Clerk
a. 0 1405 VINAY Clerk b. VINAY c. Works Manager d. Clerk
ANS : VINAY
49. Which of the following can be used to specify the data while creating a DataFrame?
i. Series ii. List of Dictionaries iii. Structured ndarray iv. All of these
ANS: iv. All of these

ASSERTION AND REASONING based questions. Mark the correct choice as

a) Both A and R are true and R is the correct explanation for A
b) Both A and R are true and R is not the correct explanation for A
c) A is True but R is False
d) A is false but R is True
1. Assertion (A):- DataFrame has both a row and column index.
Reasoning (R): - A DataFrame is a two-dimensional labelled data structure like a table
of MySQL.
Answer: a
2. Assertion (A): The rename function of Data Frame does not rename the columns of the
original data frame, but instead returns a dataframe with updated column names.
Reasoning (R): Default value of inplace parameter in rename function is False.
Answer: a
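The behaviour described in this assertion can be observed with a small sketch. The column names and values are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Asha'], 'marks': [88]})

# inplace defaults to False, so rename() returns a new DataFrame
renamed = df.rename(columns={'marks': 'Score'})

print(list(df.columns))       # ['Name', 'marks']  (original unchanged)
print(list(renamed.columns))  # ['Name', 'Score']
```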
3. Assertion (A): loc is used to extract a subset of a data frame.
Reasoning (R): Transpose of a dataframe df can be obtained using df.T
Answer: b
4. Assertion (A): DataFrame has both a row and column index.
Reasoning (R): .loc is a label-based selection method used to select specific row(s) or
column(s).
Answer: a
5. Assertion (A): When DataFrame is created by using Dictionary, keys of dictionary are set
as columns of DataFrame.
Reasoning (R):- Boolean Indexing helps us to select the data from the DataFrames using
a boolean vector.
Answer: b
6. Assertion (A):- While creating a dataframe with a nested or 2D dictionary, pandas
interprets the outer dict keys as the columns and the inner keys as the row indices.
Reasoning (R):- A column can be deleted using remove command
Answer: (c)
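A minimal sketch of a DataFrame built from a nested dictionary, using made-up subject names and marks. It also deletes a column with del, one of the standard ways to do so; a DataFrame has no remove() method.

```python
import pandas as pd

# Outer keys -> column labels, inner keys -> row labels
data = {
    'Maths': {'Asha': 90, 'Ravi': 75},
    'Science': {'Asha': 85, 'Ravi': 80},
}
df = pd.DataFrame(data)

print(list(df.columns))  # ['Maths', 'Science']
print(list(df.index))    # ['Asha', 'Ravi']

# Delete a column in place with del (df.drop or df.pop also work)
del df['Science']
print(list(df.columns))  # ['Maths']
```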
7. Assertion (A) : Pandas is an open source Python library which offers high performance,
easy-to-use data structures and data analysis tools.
Reason (R) : Professionals and developers are using the pandas library in data science
and machine learning.
Answer: (a)
8. Assertion (A): The drop() function removes data from a DataFrame temporarily.
Reasoning (R): The axis parameter is compulsory with the drop() function.
Answer: (c)
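How drop() behaves with its default arguments can be sketched as follows, using a hypothetical one-column DataFrame. In pandas, axis defaults to 0 (rows) and inplace defaults to False.

```python
import pandas as pd

df = pd.DataFrame({'marks': [88, 92, 79]})

# axis is omitted, so the label 1 is treated as a row label (axis=0)
dropped = df.drop(1)

print(list(dropped.index))  # [0, 2]
print(len(df))              # 3  (the original DataFrame is unchanged)
```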
9. Assertion (A): In Python pandas, the at attribute is used to select or access multiple
values from a DataFrame.
Reasoning (R): In Python pandas, the loc attribute is used to select or access single or
multiple value(s) from a DataFrame.
Answer: (d)
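A short sketch contrasting at and loc, with hypothetical row labels r1 to r3. at accesses exactly one value by row and column label, while loc can return one value or many.

```python
import pandas as pd

df = pd.DataFrame(
    {'Name': ['Asha', 'Ravi', 'Meena'], 'marks': [88, 92, 79]},
    index=['r1', 'r2', 'r3'],
)

single = df.at['r2', 'marks']            # exactly one value
subset = df.loc[['r1', 'r3'], 'marks']   # several values, returned as a Series

print(single)           # 92
print(subset.tolist())  # [88, 79]
```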
10. Assertion (A): Nidhi has created a DataFrame Df1.

She can expand or delete any row /column in this dataframe.

Reasoning(R): In python DataFrame objects can be concatenated or merged
Answer: (a)
11. Assertion (A): Boolean indexing is a type of indexing.
Reasoning (R): DataFrame.loc[False] can be used to select the rows whose index value
is False.
Answer: (a)
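As an illustration of Boolean indexing, the sketch below uses the common mask form, where a Boolean Series selects the rows for which it is True. The data and the threshold of 80 are assumptions made for this example.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Asha', 'Ravi', 'Meena'], 'marks': [88, 75, 91]})

# A Boolean Series with one True/False value per row
mask = df['marks'] > 80

# Both forms keep only the rows where the mask is True
selected = df[mask]
selected_loc = df.loc[mask]

print(mask.tolist())              # [True, False, True]
print(selected['Name'].tolist())  # ['Asha', 'Meena']
```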
12. Assertion (A): import pandas as pd is used to import the pandas library.
Reasoning (R): pandas is a Python library, so it must be imported before its functions can be used.
Answer: (a)

