XII IP Resource Material - DataFrame

ip resource material dataframe

Uploaded by

doodhwala894
Resource Material - Informatics Practices (XII )

Chapter 2 - Data Handling using Pandas – I

DATAFRAME

DATAFRAME - A DataFrame is a two-dimensional object that represents data in the form of
rows and columns. It is similar to a spreadsheet or an SQL table, and it is the most commonly
used pandas object. The data in a DataFrame is arranged in a tabular fashion in rows and columns,
so it has both row labels and column labels. Each column can hold a different type of value,
such as numeric, string or boolean. Once the data is stored in a DataFrame, we can perform
various operations on it that are useful in analysing and understanding the data.

For example:

ID    NAME      DEPT         SEX   EXPERIENCE
101   JOHN      ENT          M     12
104   SMITH     ORTHOPEDIC   M     5
107   GEORGE    CARDIOLOGY   M     10
109   LARA      SKIN         F     3
113   GEORGE    MEDICINE     F     9
115   JOHNSON   ORTHOPEDIC   M     10

The above table describes data of doctors in the form of rows and columns. Here the vertical
subsets are columns and the horizontal subsets are rows. The column labels are ID, NAME, DEPT,
SEX and EXPERIENCE, and the row labels are 101, 104, 107, 109, 113 and 115.
Properties of DataFrame
1. A Dataframe has axes (indices)-
● Row Index (axis=0)
● Column Index (axis=1)
2. It is similar to a spreadsheet, whose row index is called index and column index is called
column name.
3. A Dataframe contains Heterogeneous data.
4. A Dataframe Size is Mutable.
5. A Dataframe Data is Mutable
Difference between DataFrames and NumPy Array
The basic difference between a NumPy array and a DataFrame is that a NumPy array contains
homogeneous data (all elements share one data type), while a DataFrame can contain
heterogeneous data (each column can have its own data type).
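
The difference can be seen by storing the same mixed values in both structures. The following is a minimal sketch; the names arr, df, num and txt are made up for this illustration and do not come from the material above.

```python
import numpy as np
import pandas as pd

# A NumPy array holds a single data type: mixing numbers and text
# turns every element into a string.
arr = np.array([[1, 'a'], [2, 'b']])
print(arr.dtype.kind)   # 'U' stands for a Unicode string type

# A DataFrame keeps a separate data type for each column.
df = pd.DataFrame({'num': [1, 2], 'txt': ['a', 'b']})
print(df.dtypes)
```

Here the number 1 is stored as the text '1' inside the array, while the num column of the DataFrame stays numeric.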

A data frame can be created using any of the following-


1. Series
2. Lists
3. Dictionary
4. A numpy 2D array

SYNTAX:
import pandas as pd

pd.DataFrame(data, index, columns)

where
data: takes various forms like series, list, constants/scalar values, dictionary, another
dataframe.
index: specifies the index/row labels to be used for the resulting frame. The labels must be
hashable, with one label per row of data. Default is np.arange(n), i.e. 0, 1, ..., n-1, if no
index is passed.
columns: specifies the column labels to be used for the resulting frame. The labels must be
hashable, with one label per column of data. Default is np.arange(n) if no column labels are
passed.
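
As an illustration of the three arguments, the sketch below builds the same data once with default labels and once with explicit labels. The variable names and the labels 'r1' and 'r2' are assumptions chosen only for this example.

```python
import pandas as pd

data = [[101, 'JOHN'], [104, 'SMITH']]

# No index/columns given: both default to 0, 1, 2, ...
df_default = pd.DataFrame(data)
print(list(df_default.index), list(df_default.columns))

# Explicit row labels and column labels
df_labelled = pd.DataFrame(data, index=['r1', 'r2'], columns=['ID', 'NAME'])
print(df_labelled)
```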

Creating an Empty Dataframe


An empty dataframe can be created as follows:

EXAMPLE 1
import pandas as pd
dfempty = pd.DataFrame()
print (dfempty)

OUTPUT:

Empty DataFrame
Columns: []
Index: []

Creating Dataframe from List

import pandas as pd
list1=[10,20,30,40,50]
df = pd.DataFrame(list1)
print(df)

Output
    0
0  10
1  20
2  30
3  40
4  50

How to create Dataframe From Series
Program-

import pandas as pd
s = pd.Series(['a','b','c','d'])
df=pd.DataFrame(s)
print(df)

When a single series is used to create a dataframe, the elements of the series become the
elements of the column in the dataframe.

Output-
   0
0  a
1  b
2  c
3  d

The default column name is 0.

Creating Dataframe from two or more Series

import pandas as pd
s=pd.Series([1,2,3,4,5])
s1=pd.Series([10,20,30,40,50])
df = pd.DataFrame([s,s1])
print(df)

When two or more series are provided as a list argument to a dataframe, the elements of each
series become a row in the resultant dataframe.

Output
    0   1   2   3   4
0   1   2   3   4   5
1  10  20  30  40  50

Creating a DataFrame from Dictionary of Series

import pandas as pd
s=pd.Series([1,2,3,4,5])
s1=pd.Series([60,80,50,89,85])
df = pd.DataFrame({'Roll No':s,'Marks':s1})
print(df)

A dictionary of Series can be passed to form a DataFrame. The resultant index is the union
of all the series indexes passed.

Output
   Roll No  Marks
0        1     60
1        2     80
2        3     50
3        4     89
4        5     85

Creating a DataFrame from List of Dictionaries

import pandas as pd
dic=[{'Name':'Rajat','Sname':'Sehgal'},
{'Name':'Rajesh','Sname':'Dabur'}, {'Name':'Rishi'}]
df = pd.DataFrame(dic)
print(df)

Here, the dictionary keys are treated as column labels and row labels take default values
starting from zero. The values in each dictionary form one row, so the number of rows is equal
to the number of dictionaries present in the list. There are three rows in the above dataframe
as there are three dictionaries in the list. In the third row the value corresponding to key
Sname is NaN because the Sname key is missing in the third dictionary.

Output
     Name   Sname
0   Rajat  Sehgal
1  Rajesh   Dabur
2   Rishi     NaN

Creating a DataFrame from List of Lists

import pandas as pd
l=[[101,'Rajat'],[102,'Rajesh'],[103,'Rishi'],[104,'Sanjay']]
df = pd.DataFrame(l, columns=['Rollno','Name'])
print(df)

Output
   Rollno    Name
0     101   Rajat
1     102  Rajesh
2     103   Rishi
3     104  Sanjay

Row labels can also be given by passing the index argument:
df = pd.DataFrame(l,index=['I','II','III','IV'],
columns=['Rollno','Name'])

Creation of DataFrame from Dictionary of Lists

import pandas as pd
d1={"Rollno":[1,2,3], "Total":[350.5,400,420], "Percentage":[70,80,84]}
df1=pd.DataFrame(d1)
print(df1)

Here, the dictionary keys are treated as column labels and row labels take default values
starting from zero.

OUTPUT
   Rollno  Total  Percentage
0       1  350.5          70
1       2  400.0          80
2       3  420.0          84

Creation of DataFrame from NumPy ndarrays


# importing the modules
import pandas as pd
import numpy as np

# creating the NumPy array
array = np.array([[1, 1, 1], [2, 4, 8], [3, 9, 27], [4, 16, 64], [5, 25, 125],
[6, 36, 216], [7, 49, 343]])

# creating a list of index names
index_values = ['first', 'second', 'third', 'fourth', 'fifth', 'sixth', 'seventh']

# creating a list of column names
column_values = ['number', 'squares', 'cubes']

# creating the dataframe
df = pd.DataFrame(data = array, index = index_values, columns = column_values)
# displaying the dataframe
print(df)

Output
         number  squares  cubes
first         1        1      1
second        2        4      8
third         3        9     27
fourth        4       16     64
fifth         5       25    125
sixth         6       36    216
seventh       7       49    343

DATAFRAME ATTRIBUTES
A dataframe attribute is any piece of information related to the dataframe object, such as its
size or data types. Below are some of the attributes of the dataframe object (consider the
dataframe df1 defined below for all the examples):

   Rollno  Total  Percentage
0       1  350.5          70
1       2  400.0          80
2       3  420.0          84
3       4  356.0          80
4       5  434.0          87
5       6  398.0          79

1. df1.size
Return an int representing the number of elements in given dataframe.
print(df1.size)

OUTPUT
18
(6 rows X 3 columns =18)

2. df1.shape
Return a tuple representing the dimensions of the DataFrame.
print(df1.shape)

OUTPUT
(6, 3)

3. df1.axes
Return a list representing the axes of the DataFrame.
print(df1.axes)

OUTPUT
[RangeIndex(start=0, stop=6, step=1), Index(['Rollno', 'Total', 'Percentage'],
dtype='object')]

4. df1.ndim
Return an int representing the number of axes / array dimensions
print(df1.ndim)
OUTPUT
2

5. df1.columns
The column labels of the DataFrame
print(df1.columns)
OUTPUT
Index(['Rollno', 'Total', 'Percentage'], dtype='object')

6. df1.values
Return a Numpy representation of the DataFrame.
print(df1.values)
OUTPUT
[[ 1. 350.5 70. ]
[ 2. 400. 80. ]
[ 3. 420. 84. ]
[ 4. 356. 80. ]
[ 5. 434. 87. ]
[ 6. 398. 79. ]]

7. df1.empty
Indicator whether DataFrame is empty.
print(df1.empty)
OUTPUT:
False

8. df1.index
In a pandas DataFrame the row labels are called the index. To get the index labels
separately, use the "index" attribute of the DataFrame.
print(df1.index)
OUTPUT:
RangeIndex(start=0, stop=6, step=1)
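
The attributes above can be tried together in one short program. This sketch assumes df1 holds the six rows shown at the start of this section and uses the default row labels 0 to 5.

```python
import pandas as pd

df1 = pd.DataFrame({"Rollno": [1, 2, 3, 4, 5, 6],
                    "Total": [350.5, 400, 420, 356, 434, 398],
                    "Percentage": [70, 80, 84, 80, 87, 79]})

print(df1.size)            # number of elements: rows x columns
print(df1.shape)           # (rows, columns)
print(df1.ndim)            # number of axes
print(df1.empty)           # True only when there is no data
print(list(df1.columns))   # column labels as a plain list
print(list(df1.index))     # row labels as a plain list
print(df1.values.dtype)    # one common type for the NumPy representation
```

Because the Total column contains decimal values, df1.values converts every element to a floating point number, which is why the integers appear as 1., 70. and so on in the output of df1.values.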
ROW/COLUMN OPERATIONS

SELECTING A PARTICULAR COLUMN

To access a column's data, we can mention the column name as a subscript,
e.g. df['empid']. This can also be done by using df.empid. To access multiple columns we can
write df[[col1, col2, ...]]

Example:

import pandas as pd
dict={'BS':[80,98,100,65,72],'ACC':[88,67,93,50,90],
'ECO':[100,75,89,40,96],'IP':[100,98,92,80,86]}
df5=pd.DataFrame(dict,index=['Ammu','Achu','Manu','Anu','Abu'])
print(df5)
Output:
       BS  ACC  ECO   IP
Ammu   80   88  100  100
Achu   98   67   75   98
Manu  100   93   89   92
Anu    65   50   40   80
Abu    72   90   96   86

Note: Now if we want to select/display a particular column, such as BS, then we will write it
as follows:

● Selecting / Accessing a column

Syntax :
<dataframe object>[<column name>] Or <dataframe object>.<column name>
○ In the dot notation make sure not to put any quotation marks around the column
name.
print(df5.BS)
or
print(df5['BS'])

● Selecting / Accessing multiple columns

Syntax :
<dataframe object>[[<column name>,<column name>,…….]]
○ Columns appear in the order of column names given in the list inside square
brackets.
print(df5[['BS','IP']])

● Selecting / Accessing a subset from a DataFrame using Row/Column names
<dataframe object>.loc[<start row>:<end row>,<start column>:<end column>]

○ To access a row:
<dataframe object>.loc[<row label>, : ]
Note: Make sure not to miss the colon after comma

○ To access multiple rows:
<dataframe object>.loc[<start row>:<end row> , : ]
pandas will return all rows falling between start row and end row, including the start
row and the end row.

print(df5.loc['Ammu':'Manu', : ])

Note: Make sure not to miss the colon after comma.

○ To access selective columns:
<dataframe object>.loc[ : , <start column> : <end column>]
Lists all columns falling between start and end column, both included.
print(df5.loc[:,'ACC':'IP'])

Note: Make sure not to miss the colon before comma.

○ To access range of columns from a range of rows:
<dataframe object>.loc[<start row> : <end row>,
<start column> : <end column>]
print(df5.loc['Manu':'Abu','ACC':'ECO'])

● Selecting / Accessing a subset from a DataFrame using Row/Column numeric
Index/position
Sometimes our dataframe object does not contain row or column labels, or we may
not remember them. In that case we can use iloc to extract a subset from the dataframe.

<dataframe object>.iloc[<start row index> : <end row index>, <start column index>
: <end column index>]

When we use iloc, the end index is excluded.

print(df5.iloc[1:3,1:3])
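
The different treatment of the end point in loc and iloc can be checked by selecting the same block of df5 in both ways. This is a minimal sketch using the df5 data from the example above; the names marks, by_label and by_position are chosen only for this illustration.

```python
import pandas as pd

marks = {'BS': [80, 98, 100, 65, 72], 'ACC': [88, 67, 93, 50, 90],
         'ECO': [100, 75, 89, 40, 96], 'IP': [100, 98, 92, 80, 86]}
df5 = pd.DataFrame(marks, index=['Ammu', 'Achu', 'Manu', 'Anu', 'Abu'])

# loc works on labels: the end labels 'Manu' and 'ECO' are included
by_label = df5.loc['Achu':'Manu', 'ACC':'ECO']

# iloc works on positions: the end positions 3 and 3 are excluded
by_position = df5.iloc[1:3, 1:3]

print(by_label)
print(by_position)
print(by_label.equals(by_position))   # both select rows Achu, Manu and columns ACC, ECO
```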

● Selecting / Accessing individual value

(i) Give the row label in square brackets after the column name:
<dataframe object>.<column>[<row label>]
print(df5.ACC['Achu'])        # prints 67
or, using the numeric position of the row:
print(df5.ACC.iloc[1])
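
A single value can also be read by naming the row and the column together in one step. The sketch below, on a two-row version of df5, uses the standard pandas accessors loc, at, iloc and iat; it is offered as an alternative way of writing the lookup, not as the method these notes prescribe.

```python
import pandas as pd

df5 = pd.DataFrame({'BS': [80, 98], 'ACC': [88, 67]}, index=['Ammu', 'Achu'])

print(df5.loc['Achu', 'ACC'])   # row label, column label
print(df5.at['Achu', 'ACC'])    # same lookup, meant for exactly one value
print(df5.iloc[1, 1])           # row position, column position
print(df5.iat[1, 1])            # same lookup by position, for exactly one value
```

All four statements print 67, the ACC marks of Achu.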

● Assigning / Modifying Data Values in DataFrame

○ To change or add a column
<dataframe object>[<column name>]=<new value>
If the given column name does not exist in the dataframe then a new column with that name
is added.
df5['ENG']=60
print(df5)
If you want to add a column that has different values for all its rows, then we
can assign the data values for each row of the column in the form of a list.
df5['ENG']=[50,60,40,30,70]

There are some other ways of adding a column to a dataframe.
<dataframe object>.loc[ : ,<column name>]=value
df5.loc[ : ,'ENG']=60
print(df5)

○ To change or add a row
<dataframe object>.loc[rowname , : ]=value
df5.loc['Sabu', : ]=50
print(df5)
If there is no row with such a row label, a new row with this row label is added and
the given value is assigned to all its columns.

○ To change or modify a single data value
<dataframe object>.loc[<row label>, <column name>] = value
df5.loc['Ammu','BS']=100
print(df5)
or, using numeric positions:
df5.iloc[0,0]=100
print(df5)
The older form <dataframe object>.<column>[<row label>] = value (e.g. df5.BS['Ammu']=100) is
chained assignment; recent pandas versions warn about it and it may not update the dataframe.
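
The assignments described in this section can be run in sequence on a small dataframe. The following sketch uses a two-row version of df5 and writes single values through loc and iloc, which update the dataframe directly.

```python
import pandas as pd

df5 = pd.DataFrame({'BS': [80, 98], 'ACC': [88, 67]}, index=['Ammu', 'Achu'])

df5['ENG'] = [50, 60]          # add a column with one value per row
df5.loc['Sabu', :] = 50        # add a row; 50 is stored in every column
df5.loc['Ammu', 'BS'] = 100    # change one value using labels
df5.iloc[1, 1] = 70            # change one value using positions (row Achu, column ACC)

print(df5)
```

After these statements the dataframe has three rows and three columns.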
● Deleting columns in DataFrame
○ We can use the del statement to delete a column
del <dataframeobject>[<column name>]
e.g.: del df5['ENG']
○ We can also use drop() to delete a column. By default axis=0 (rows), so axis=1 has to be
given to delete columns.
<dataframe object> = <dataframeobject>.drop([<column name or index>],axis=1)
Or
<dataframe object> = <dataframeobject>.drop(columns=[<column names or
indices>])
df5=df5.drop(['ECO'], axis =1)
or
df5=df5.drop(columns=['ECO','IP'])
○ We can use pop() to delete a column. The deleted column will be returned as a Series
object.
bstud=df5.pop('BS')
print(bstud)
● Deleting rows in DataFrame
<dataframe object>=<dataframe object>.drop([index or sequence of index], axis=0)
df5=df5.drop(['Ammu','Achu'])
or
df5=df5.drop(index=['Ammu','Achu'])
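
The three ways of deleting differ in whether they change the dataframe itself. A short sketch on a three-row version of df5 (the names without_eco, bs and remaining_rows are made up for this illustration):

```python
import pandas as pd

df5 = pd.DataFrame({'BS': [80, 98, 100], 'ACC': [88, 67, 93],
                    'ECO': [100, 75, 89], 'IP': [100, 98, 92]},
                   index=['Ammu', 'Achu', 'Manu'])

# drop() returns a new dataframe; df5 itself still has the ECO column
without_eco = df5.drop(columns=['ECO'])

# pop() removes the column from df5 and hands it back as a Series
bs = df5.pop('BS')

# del removes the column from df5 and returns nothing
del df5['IP']

# dropping a row also returns a new dataframe
remaining_rows = df5.drop(index=['Ammu'])

print(list(without_eco.columns))
print(list(df5.columns))
print(list(remaining_rows.index))
```

This is why the result of drop() is assigned back to the dataframe in the statements above, while pop() and del are used on their own.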

Iterating over a DataFrame

● Using the iterrows() function
○ The method <DF>.iterrows() views a dataframe in the form of horizontal subsets, i.e.
row-wise.
○ Each horizontal subset is in the form of (row index, Series) where the Series contains
all column values for that row index.
○ We can iterate over a Series object just as we iterate over other sequences.

import pandas as pd
dict={'BS':[80,98],'ACC':[88,67]}
df5=pd.DataFrame(dict,index=['Ammu','Achu'])
print(df5,"\n")
for (row,rowseries) in df5.iterrows():
    print("Row index:",row)
    print("containing")
    i=0
    for val in rowseries:
        print("At position ",i,":",val)
        i=i+1
    print()

● Using the iteritems() function

○ The method <DF>.iteritems() views a dataframe in the form of vertical subsets, i.e.
column-wise. (iteritems() was removed in pandas 2.0; in current versions the same method
is called items().)
○ Each vertical subset is in the form of (col-index, Series) where the Series contains all
row values for that column index.
import pandas as pd
dict={'BS':[80,98],'ACC':[88,67]}
df5=pd.DataFrame(dict,index=['Ammu','Achu'])
print(df5,"\n")
for (column,columnseries) in df5.items():    # df5.iteritems() in pandas versions before 2.0
    print("Column index:",column)
    print("containing")
    i=0
    for val in columnseries:
        print("At row ",i,":",val)
        i=i+1
    print()
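
A typical use of column-wise iteration is to compute something for each column. The sketch below totals each subject with DataFrame.items(), the method name available in current pandas; the dictionary name column_totals is an assumption made for this example.

```python
import pandas as pd

df5 = pd.DataFrame({'BS': [80, 98], 'ACC': [88, 67]}, index=['Ammu', 'Achu'])

# Each step gives the column label and the column as a Series
column_totals = {}
for column, columnseries in df5.items():
    column_totals[column] = int(columnseries.sum())

print(column_totals)
```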

● Head and Tail Functions

○ head()
<DF>.head([n=5])
To retrieve the top 5 rows of a dataframe.
We can change the number of rows by specifying a value for n.
df5.head(5)
df5.head(2)
○ tail()
<DF>.tail([n=5])
To retrieve the bottom 5 rows of a dataframe.
We can change the number of rows by specifying a value for n.
df5.tail(5)
df5.tail(2)

● Renaming index / column labels

○ rename() renames the existing index or column labels in a dataframe/series.
○ The old and new index/column labels are to be provided in the form of a
dictionary where the keys are the old index/column labels and the values are the new
names for the same.
Syntax:
<DF>.rename(index=None, columns=None, inplace=False)
where index and columns are dictionary-like.
inplace is a boolean, by default False (which returns a new dataframe with
renamed index/labels).
If True then the changes are made in the current dataframe.

import pandas as pd
dict={'p_id':[101,102],'p_name':['Hard disk','Pen Drive']}
df=pd.DataFrame(dict)
print(df,"\n")

df.rename(columns={'p_id':'Product_ID','p_name':'product_name'},inplace=True)
or
df=df.rename(columns={'p_id':'Product_ID','p_name':'product_name'})
print(df)
○ Columns can also be renamed by using the columns attribute of the dataframe.
import pandas as pd
dict={'p_id':[101,102],'p_name':['Hard disk','Pen Drive']}
df=pd.DataFrame(dict)
df.columns=['Product_ID','product_name']
print(df,"\n")

● Boolean indexing
○ Like default indexing (0,1,2…) or labelled indexing, there is one more way to index:
Boolean indexing (setting the row index to True/False etc.).
○ This helps in displaying the rows of a DataFrame according to True or False, as
specified in the command.
import pandas as pd
dict={'p_id':[101,102,103],'p_name':['Hard disk','Pen Drive','Camera']}
df=pd.DataFrame(dict)
df.index=[True,False,True]
print(df,"\n")
print(df.loc[True])
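
Boolean values are also commonly used without changing the index: a condition on a column produces a True/False Series (a mask), and the dataframe keeps only the rows where the mask is True. This sketch reuses the product data from the example above; the names mask and selected are chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({'p_id': [101, 102, 103],
                   'p_name': ['Hard disk', 'Pen Drive', 'Camera']})

# The comparison gives one True/False value per row
mask = df['p_id'] > 101
print(mask.tolist())

# Only the rows where the mask is True are kept
selected = df[mask]
print(selected)
```

This is the form used in several answers later in this material, for example df[df.population>=20].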

Questions / Answers
1. Give the output:
import pandas as pd
Dic={'empno':[101,102,103,104,105,106],'grade':['a','b','a','c','b','c'],'dept':
['sales','pur','mar','sales','pur','mar']}
df=pd.DataFrame(Dic)
print(df.head(3))
Output:
   empno  grade   dept
0    101      a  sales
1    102      b    pur
2    103      a    mar

2. What will be the output of df.iloc[3:7,3:6]?
Answer:
It will display the rows at positions 3 to 6 and the columns at positions 3 to 5 of the
dataframe 'df'.
3. Write a python program to create a data frame with headings (CS and IP) from the
list given below-
[[79,92],[86,96],[85,91],[80,99]]
Answer:
import pandas as pd
l=[[79,92],[86,96],[85,91],[80,99]]
df=pd.DataFrame(l,columns=['CS','IP'])
print(df)
4. Write python statement to delete the 3rd and 5th rows from dataframe df.
Answer:
df1=df.drop(index=[2,4],axis=0)
or
df1=df.drop([2,4])

5. Carefully observe the following code:


import pandas as pd
Year1={'Q1':5000,'Q2':8000,'Q3':12000,'Q4': 18000}
Year2={'A' :13000,'B':14000,'C':12000}
totSales={1:Year1,2:Year2}
df=pd.DataFrame(totSales)
print(df)
Answer the following:
i. List the index of the DataFrame df
ii. List the column names of DataFrame df.
Answer: i. The index labels of df will include Q1,Q2,Q3,Q4,A,B,C
ii. The column names of df will be: 1,2
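
The answer can be confirmed by running the code and inspecting the labels. In this sketch the index is sorted before printing, because the order in which pandas lists the combined labels is not assumed here.

```python
import pandas as pd

Year1 = {'Q1': 5000, 'Q2': 8000, 'Q3': 12000, 'Q4': 18000}
Year2 = {'A': 13000, 'B': 14000, 'C': 12000}
totSales = {1: Year1, 2: Year2}
df = pd.DataFrame(totSales)

print(sorted(df.index))     # inner keys of both dictionaries become row labels
print(list(df.columns))     # outer keys become column labels
print(df.loc['Q1', 2])      # NaN: Year2 has no value for Q1
```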
6. Consider the given DataFrame 'Stock':
                   Name  Price
0            Nancy Drew    150
1            Hardy boys    180
2  Diary of a wimpy kid    225
3          Harry Potter    500
Write suitable Python statements for the following:
a) Add a column called Special_Price with the following data: [135,150,200,440].
b) Add a new book named 'The Secret' having price 800.
c) Remove the column Special_Price.
Answer:
a) Stock['Special_Price']=[135,150,200,440]
b) Stock.loc[4]=['The Secret',800]
(supply a third value as well if the Special_Price column has already been added)
c) Stock=Stock.drop('Special_Price',axis=1)
7. Mr. Som, a data analyst has designed the DataFrame df that contains data about
Computer Olympiad with ‘CO1’, ‘CO2’, ‘CO3’, ‘CO4’, ‘CO5’ as indexes shown
below. Answer the following questions:

A. Predict the output of the following python statement: i. df.shape ii. df[2:4]
B. Write Python statement to display the data of Topper column of indexes CO2 to CO4.
C. Write Python statement to compute and display the difference of data of Tot_students
column and First_Runnerup column of the above given DataFrame.

Answer:
A: i. (5,4)
ii. School tot_students Topper First_Runner_up
CO3 GPS 20 18 2
CO4 MPS 18 10 8
B. print(df.loc['CO2': 'CO4', 'Topper'])
C. print(df.Tot_students-df.First_Runnerup)
8. Write a Python code to create a DataFrame with appropriate column headings from
the list given below:
[[101,'Gurman',98],[102,'Rajveer',95],[103,'Samar' ,96],[104,'Yuvraj',88]]
Answer:

import pandas as pd
data=[[101,'Gurman',98],[102,'Rajveer',95],[103,'Samar' ,96], [104,'Yuvraj',88]]
df=pd.DataFrame(data,columns=['Rno','Name', 'Marks'])

Case study questions:


1. Consider the following DataFrame df and answer questions

I. Write code to delete column B


II. Write the output of the below code
print(df.tail(2))
III. Write code to delete row salary
IV. Change the value of column A to 100
V. Change the value of DEPT of B to MECH
VI. Display DEPT and SALARY of column A and B
VII. Write code to rename column 'A' to 'D' which will not affect the original
dataframe
VIII. Write code to add a column E with values [CS, 104, XYZ, 300000]
IX. Write code to add a row COMM with values [3000,4000,5000]
X. Write code to rename DEPT to DEPARTMENT which will affect the original
dataframe
XI. Write code to display DEPT in A
a) print(df.A['DEPT'])
b) print(df['A','DEPT'])
c) print(df.iloc[1:2,1:2])
XII. Write the output of the statement print(len(df))
i. 3 ii. 4 iii. (4,3) iv. (3,4)

Answer:
i. del df['B']
ii.
             A       B      C
ENAME      ABC     PQR    LMN
SALARY  200000  100000  20000
iii. df=df.drop(['SALARY'],axis=0)
iv. df['A']=100
v. df.B['DEPT']='MECH'
vi. print(df.loc[['DEPT','SALARY'],["A","B"]])
vii. df.rename(columns={"A":"D"},inplace=False)
viii.df['E']=["CS",104,"XYZ",300000]
ix. df.loc['COMM']=[3000,4000,5000]
x. df.rename(index={"DEPT":"DEPARTMENT"},inplace=True)
xi. a) print(df.A['DEPT'])
xii. 4

2. Consider the following Data Frame df and answer questions


I. Display details of city delhi and chennai
II. Display hospitals in delhi
III. Display shape of dataframe
IV. Change the population in kolkatta as 50
V. Rename the column population as “pop”

Answers:
I. print(df[['delhi','chennai']])
II. print(df.delhi['hospitals'])
III. print(df.shape)
IV. df.kolkatta['population']=50
V. df.rename(index={"population":"pop"},inplace=True)

3. Consider the following Data Frame df and answer questions

I. Display the cities whose population is >= 20
II. Write command to set all values of df as 0
III. Display the df with rows in the reverse order
IV. Display the df with only columns in the reverse order
V. Display the df with rows & columns in the reverse order
Answer:

I. print(df[df.population>=20])
II. df[:]=0
III. print(df.iloc[::-1])
IV. print(df.iloc[:,::-1])
V. print(df.iloc[::-1,::-1])

4. What is the purpose of the following statements?

I. df.columns
II. df.iloc[ : , :-5]
III. df[2:8]
IV. df[ :]
V. df.iloc[ : -4 , : ]

Answers:
I. It displays the names of columns of the Dataframe.
II. It will display all columns except the last 5 columns.
III. It displays all columns of the rows at positions 2 to 7.
IV. It will display entire dataframe with all rows and columns.
V. It will display all rows except the last four rows.

5. Mr. Ankit is working in an organisation as data analyst. He uses Python Pandas and
Matplotlib for the same. He got a dataset of the passengers for the year 2010 to 2012
for January, March and December. His manager wants certain information from him,
but he is facing some problems. Help him by answering few questions given below:

Code to create the above data frame:


import pandas as ____________ #Statement 1
data={"Year":[2010,2010,2012,2010,2012],"Month":["Jan","Mar","Jan","Dec","Dec"]
,"Passengers":[25,50,35,55,65]}
df=pd.____________________(data) #Statement 2
print(df)

1. Choose the right code from the following for statement 1.


i. pd ii. df iii. data iv. P
Answer: i. pd
2. Choose the right code from the following for the statement 2.
i. Dataframe ii. DataFrame iii. Series iv Dictionary
Answer: ii. DataFrame
3. Choose the correct statement/ method for the required output: (5,3)
i. df.index ii. df.shape() iii. df.shape iv. df.size
Answer: (iii) df.shape

4. He wants to print the details of "January" month along with the number of
passengers, Identify the correct statement:

a) df.loc[['Month','Passengers']][df['Month']=='Jan']
b) df[['Month','Passengers']][df['Month']=='Jan']
c) df.iloc[['Month','Passengers']][df['Month']=='Jan']
d) df(['Month','Passengers']][df['Month']=='Jan')

Answer: (b) df[['Month','Passengers']][df['Month']=='Jan']
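
The same selection can also be written as a single loc call that names the row condition and the columns together. This is a sketch of an equivalent form using the data from Statement 1 and 2 above, not the form expected in the question; the names jan and chained are chosen for illustration.

```python
import pandas as pd

data = {"Year": [2010, 2010, 2012, 2010, 2012],
        "Month": ["Jan", "Mar", "Jan", "Dec", "Dec"],
        "Passengers": [25, 50, 35, 55, 65]}
df = pd.DataFrame(data)

# Rows where Month is Jan, restricted to two columns, in one step
jan = df.loc[df['Month'] == 'Jan', ['Month', 'Passengers']]
print(jan)

# The form from the question: pick the columns first, then filter the rows
chained = df[['Month', 'Passengers']][df['Month'] == 'Jan']
print(jan.equals(chained))
```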


6. Sanyukta is the event incharge in a school. One of her students gave her a
suggestion to use Python Pandas and Matplotlib for analysing and visualising the
data, respectively. She has created a DataFrame "SportsDay" to keep track of the
number of First, Second and Third prizes won by different houses in various
events.

Write Python commands to do the following:

I. Display the house names where the number of Second Prizes is in the range of
12 to 20.
a. df['Name'][(df['Second']>=12) and (df['Second']<=20)]
b. df[Name][(df['Second']>=12) & (df['Second']<=20)]
c. df['Name'][(df['Second']>=12) & (df['Second']<=20)]
d. df[(df['Second']>=12) & (df['Second']<=20)]
Answer: c. df['Name'][(df['Second']>=12) & (df['Second']<=20)]
II. Display all the records in the reverse order.
a. print(df[::1]) b. print(df.iloc[::-1]) c. print(df[-1:]+df[:-1]) d. print(df.reverse())
Answer: b. print(df.iloc[::-1])
III. Display the bottom 3 records.
a. df.last(3) b. df.bottom(3) c. df.next(3) d. df.tail(3)
Answer: d. df.tail(3)
IV. Choose the correct output for the given statements:
x=df.columns[:1]
print(x)
a. 0 b. Name c. First d. Error
Answer: b. Name
V. Which command will give the output 24:
a. print(df.size) b. print(df.shape) c. print(df.index) d. print(df.axes)
Answer: a. print(df.size)
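
Question I depends on combining two conditions with & rather than and. The sketch below uses a small made-up dataframe (the house names and prize counts are hypothetical, since the SportsDay data is not reproduced here) to show that and raises an error while & gives the expected rows.

```python
import pandas as pd

# Hypothetical data standing in for the SportsDay dataframe
df = pd.DataFrame({'Name': ['Red', 'Blue', 'Green', 'Yellow'],
                   'Second': [10, 12, 20, 25]})

# 'and' tries to treat a whole Series as one True/False value and fails
and_failed = False
try:
    (df['Second'] >= 12) and (df['Second'] <= 20)
except ValueError:
    and_failed = True
print(and_failed)

# '&' compares row by row; each condition needs its own parentheses
in_range = (df['Second'] >= 12) & (df['Second'] <= 20)
print(df['Name'][in_range].tolist())

# between() expresses the same inclusive range in one call
print(df.loc[df['Second'].between(12, 20), 'Name'].tolist())
```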

Multiple Choice Questions

1. Mr. Ankit wants to change the index of the Data Frame and the output for the
same is given below. Identify the correct statement to change the index

a) df.index[]=["Air India","Indigo","Spicejet","Jet","Emirates"]
b) df.index["Air India","Indigo","Spicejet","Jet","Emirates"]
c) df.index=["Air India","Indigo","Spicejet","Jet","Emirates"]
d) df.index()=["Air India","Indigo","Spicejet","Jet","Emirates"]
Answer: (c) df.index=["Air India","Indigo","Spicejet","Jet","Emirates"]
6. To display the 3rd, 4th and 5th columns from the 6th to 9th rows of a dataframe you
can write
a) DF.loc[6:9, 3:5] b) DF.loc[6:10, 3:6] c) DF.iloc[6:10, 3:6] d) DF.iloc[6:9,
3:5]
Answer: c) DF.iloc[6:10, 3:6]
8. We can add a new row to a DataFrame using the _____________ method
(i) rloc[ ] (ii) loc[ ] (iii)iloc[ ] (iv)None of the above
Answer: (ii) loc[ ]
9. The head() function of a dataframe will display how many rows from the top if no
parameter is passed?
(i) 1 (ii) 3 (iii) 5 (iv) None of these
Answer : (iii) 5
10. Which function is used to find values from a DataFrame D using the index number?
a) D.loc b) D.iloc c) D.index d) None of these
Answer: b) D.iloc
11. In a DataFrame, Axis= 0 represents the elements
a.rows b.columns c.both d.None of these.
Answer: a.rows
12. In DataFrame, by default new column added as the _____________ column
(i) First (Left Side) (ii) Second (iii)Last (Right Side) (iv) Any where in
dataframe
Answer: (iii)Last (Right Side)
13. Which of the following are correct features of a DataFrame?
a. Potentially columns are of different types
b. Can Perform Arithmetic operations on rows and columns
c. Labeled axes (rows and columns)
d. All of the above
Answer: d. All of the above
14. When we create a DataFrame from a List of Dictionaries, then the number of columns in
the DataFrame is equal to the _______
a. maximum number of keys in first dictionary of the list
b. maximum number of different keys in all dictionaries of the list
c. maximum number of dictionaries in the list
d. None of the above
Answer: b. maximum number of different keys in all dictionaries of the list
15. When we create DataFrame from List of Dictionaries, then dictionary keys will
become ______
(i) Column labels (ii) Row labels (iii) Both of the above (iv) None of the above
Answer: (i) Column labels
16. Which method is used to access vertical subset of a dataframe?
(i) iterrows() (ii) iteritems() (iii) itercolumns() (iv) itercols()
Answer: (ii) iteritems() (named items() in pandas 2.0 and later)
17. Write statement to transpose dataframe DF.
(i) DF.t (ii) DF.transpose (iii)DF.T (iv)DF.T( )
Answer: (iii)DF.T
18. In DataFrame, by default new column added as the _____________ column
a. First (Left Side) b. Second c. Last (Right Side) d. Any where in dataframe
ANS: Last (Right Side)
19. We can add a new row to a DataFrame using the _____________ method
(i) rloc[ ] (ii) loc[ ] (iii) iloc[ ] (iv) None of the above
ANS: (ii) loc[ ]
20. Which among the following options can be used to create a DataFrame in Pandas ?
(a) A scalar value (b) An ndarray (c) A python dict (d) All of these
ANS:- (d) All of these
21. Write short code to show the information having city="Delhi" from dataframe
SHOP.
(a) print(SHOP[City=='Delhi']) (b) print(SHOP[SHOP.City=='Delhi'])
(c) print(SHOP[SHOP.'City'=='Delhi']) (d) print(SHOP[SHOP[City]=='Delhi'])
ANS: (b) print(SHOP[SHOP.City=='Delhi'])
22. Which of the following commands is used to install pandas?
(i)pip install python –pandas (ii)pip install pandas (iii)python install python
(iv)python install pandas
ANS: (ii) pip install pandas
23. Which attribute of a dataframe is used to get the number of axes?
a. T b. ndim c. empty d. shape
ANS: b. ndim
24. Display first row of dataframe ‘DF’
(i) print(DF.head(1)) (ii) print(DF[0 : 1]) (iii)print(DF.iloc[0 : 1]) (iv)All of the above
ANS: (iv)All of the above
25. To delete a column from a DataFrame, you may use the ______ statement.
(a) remove (b) del (c) drop (d) cancel
ANS:- (b) del & (c) drop
26. In given code dataframe ‘Df1’ has ________ rows and _______ columns
import pandas as pd
dict= [{'a':10, 'b':20}, {'a':5, 'b':10, 'c':20},{'a':7, 'd':10, 'e':20}]
Df1 = pd.DataFrame(dict)
(i) 3, 3 (ii) 3, 4 (iii)3, 5 (iv)None of the above
ANS: (iii)3, 5
27. In the following statement, if column 'mark' already exists in the DataFrame 'Df1'
then the assignment statement will __________
Df1['mark'] = [95,98,100] # There are only three rows in DataFrame Df1
(i) Return error (ii) Replace the already existing values.
(iii)Add new column (iv)None of the above
ANS: (ii) Replace the already existing values.
28. Which of the following statement is false:
i. DataFrame is size mutable ii. DataFrame is value mutable
iii. DataFrame is immutable iv. DataFrame is capable of holding multiple types of data
ANS:- iii. DataFrame is immutable
29. To delete a row, the parameter axis of function drop( ) is assigned the value
______________
(i) 0 (ii) 1 (iii) 2 (iv) 3
ANS: (i) 0
30. Write code to delete the rows of those getting a salary of 5000.
(a) df=df.drop[salary==5000] (b) df=df[df.salary!=5000]
(c) df.drop[df.salary==5000,axis=0] (d) df=df.drop[salary!=5000]
ANS: (b) df=df[df.salary!=5000]
31. DF1.loc[ ] method is used to ______ # DF1 is a DataFrame
(i) Add new row in a DataFrame ‘DF1’ (ii) To change the data values of a row to a
particular value (iii)Both of the above (iv)None of the above
ANS: (iii)Both of the above
32. To iterate over horizontal subsets of a dataframe, the ______ function may be used.
(a) iterate( ) (b) iterrows( ) (c) itercols( ) (d) iteritems( )
ANS:- (b) iterrows( )
33. Write code to delete the row whose index value is A1 from dataframe df.
(a) df=df.drop('A1') (b) df=df.drop(index='A1') (c) df=df.drop('A1,axis=index')
(d) df=df.del('A1')
ANS: (a) df=df.drop('A1') (option (b) is also a valid statement)
34. A two-dimension labeled array that is an ordered collection of columns to store
heterogeneous data type is
i. Series ii. Numpy array iii.Dataframe iv. Panel
ANS:- iii. Dataframe
35. In Pandas _______________ is used to store data in multiple columns.
(i)Series (ii) DataFrame (iii) Both of the above (iv) None of the above
ANS: (ii) DataFrame
36. What is dataframe?
a. 2 D array with heterogeneous data b. 1 D array with homogeneous data
c. 2 D array with homogeneous data d. 1 D array with heterogeneous data
ANS: a. 2 D array with heterogeneous data
37. In a DataFrame, Axis= 1 represents the_____________ elements
(a) Row (b) Column (c) True (d) False
ANS: (b) Column
38. Which of the following is not an attribute of a DataFrame Object ?
a. index b. Index c. size d. values
ANS: b. Index
39. To get top 5 rows of a dataframe, you may use
(a) head( ) (b) head(5) (c) top( ) (d) top(5)
ANS:- (a) head( ) , b) head(5)
40. In a DataFrame, Axis= 1 represents the_____________ elements
(a) Row (b) Column (c) True (d) False
ANS: (b) Column
41. NaN stands for:
a. Not a Number b. None and None c. Null and Null d. None a Number
ANS: a. Not a Number
42. The following code create a dataframe named ‘Df1’ with _______________
columns.
import pandas as pd
Df1 = pd.DataFrame([10,20,30] )
(i) 1 (ii) 2 (iii) 3 (iv) 4
ANS: (i) 1
43. Write the single line command to delete the column “marks” from dataframe df
using drop function.
(a) df=df.drop(col='marks') (b) df=df.drop('marks',axis=col)
(c) df=df.drop('marks',axis=0) (d) df=df.drop('marks',axis=1)
ANS: (d) df=df.drop('marks',axis=1)
44. The following statement will _________
df = df.drop(['Name', 'Class', 'Rollno'], axis = 1) #df is a DataFrame object
a. delete three columns having labels ‘Name’, ‘Class’ and ‘Rollno’
b. delete three rows having labels ‘Name’, ‘Class’ and ‘Rollno’
c. delete any three columns
d. return error
ANS:- a. delete three columns having labels ‘Name’, ‘Class’ and ‘Rollno’
45. Difference between loc() and iloc().
a. Both are Label indexed based functions.
b. Both are Integer position-based functions.
c. loc() is label based function and iloc() integer position based function.
d. loc() is integer position based function and iloc() index position based function.
ANS: c. loc() is label based function and iloc() integer position based function.
46. Which command will be used to delete the 3rd and 5th rows of the data frame, assuming
the data frame name is DF?
a. DF.drop([2,4],axis=0) b. DF.drop([2,4],axis=1) c. DF.drop([3,5],axis=1) d. DF.drop([3,5])
ANS: a DF.drop([2,4],axis=0)
47. Assuming the given structure, which command will give us the given output:
Output Required: (3,4)

a. print(df.shape()) b. print(df.shape) c. print(df.size) d. print(df.size()).


ANS: b. print(df.shape)
48. Write the output of the given command: df1.loc[:0,'Name'] Consider the given
dataframe.
EmpCode Name Desig
0 1405 VINAY Clerk
1 1985 MANISH Works Manager
2 1636 SMINA Sales Manager
3 1689 RINU Clerk
a. 0 1405 VINAY Clerk b. VINAY c. Works Manager d. Clerk
ANS: b. VINAY
49. Which of the following can be used to specify the data while creating a DataFrame?
i. Series ii. List of Dictionaries iii. Structured ndarray iv. All of these
ANS: iv. All of these

ASSERTION AND REASONING based questions. Mark the correct choice as


a) Both A and R are true and R is the correct explanation for A
b) Both A and R are true and R is not the correct explanation for A
c) A is True but R is False
d) A is false but R is True
1. Assertion (A):- DataFrame has both a row and column index.
Reasoning (R): - A DataFrame is a two-dimensional labelled data structure like a table
of MySQL.
Answer: a
2. Assertion (A): The rename function of Data Frame does not rename the columns of the
original data frame, but instead returns a dataframe with updated column names.
Reasoning (R): Default value of inplace parameter in rename function is False.
Answer: a
3. Assertion (A): loc is used to extract a subset of a data frame.
Reasoning (R): Transpose of a dataframe df can be obtained using df.T
Answer: b
4. Assertion (A): DataFrame has both a row and column index.
Reasoning (R): .loc() is a label based data selecting method to select a specific row(s) or
column(s) which we want to select.
Answer: a
5. Assertion (A): When DataFrame is created by using Dictionary, keys of dictionary are set
as columns of DataFrame.
Reasoning (R):- Boolean Indexing helps us to select the data from the DataFrames using
a boolean vector.
Answer: b
6. Assertion (A):- While creating a dataframe with a nested or 2D dictionary, Python
interprets the outer dict keys as the columns and the inner keys as the row indices.
Reasoning (R):- A column can be deleted using remove command
Answer: (c)
7. Assertion (A) : Pandas is an open source Python library which offers high performance,
easy-to-use data structures and data analysis tools.
Reason (R) : Professionals and developers are using the pandas library in data science
and machine learning.
Answer: (a)
8. Assertion (A): drop() function removes data from a Dataframe temporarily.
Reasoning (R): Axis parameter is compulsory with drop() function.
Answer: (c) (the axis parameter is optional; it defaults to 0)
9. Assertion(A): In python pandas at attribute is to select or access multiple values from
data frame.
Reasoning(R): In python pandas, loc attribute is used to select or access a single/multiple
value(s) from dataframe.
Answer (d)
10.Assertion (A): Nidhi has created a dataframe Df1

She can expand or delete any row /column in this dataframe.


Reasoning(R): In python DataFrame objects can be concatenated or merged
Answer: (a)
11.Assertion (A): Boolean indexing is a type of indexing.
Reasoning (R) : DataFrame.loc[False] can be used to find the rows
where the index value is False
Answer: (a)
12.Assertion (A) : import pandas as pd is used to import pandas library.
Reason (R) : It is a python library so it is to be imported for using its function.
Answer: (a)
