XII IP Resource Material - DataFrame
DATAFRAME
For example: ID
SYNTAX:
import pandas as pd
pd.DataFrame(data, index, columns)
where
data: takes various forms such as a Series, list, constant/scalar value, dictionary, or another
DataFrame.
index: specifies the index/row labels to be used for the resulting frame. The labels are hashable
(and normally unique), one per row of data. Default is np.arange(n) if no index is passed.
columns: specifies the column labels to be used for the resulting frame. The labels are hashable
(and normally unique), one per column of data. Default is np.arange(n) if no column labels are passed.
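The parameters described above can be seen together in a short sketch. The data values and labels below are made up for illustration; only pd.DataFrame and its data, index and columns parameters come from pandas itself.

```python
import pandas as pd

# Hypothetical sample data: marks in two subjects for three students.
data = [[80, 88], [98, 67], [100, 93]]

# Explicit row labels (index) and column labels (columns).
df = pd.DataFrame(data, index=['Ammu', 'Achu', 'Manu'], columns=['BS', 'ACC'])
print(df)

# Without index/columns, pandas falls back to default integer labels.
df_default = pd.DataFrame(data)
print(df_default.index.tolist())    # [0, 1, 2]
print(df_default.columns.tolist())  # [0, 1]
```

The number of labels must match the data: three row labels for three rows, two column labels for two columns.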
EXAMPLE 1
import pandas as pd
dfempty = pd.DataFrame()
print (dfempty)
OUTPUT:
Empty DataFrame
Columns: []
Index: []
import pandas as pd
list1=[10,20,30,40,50]
df = pd.DataFrame(list1)
print(df)
Output
0
0 10
1 20
2 30
3 40
4 50
How to create a DataFrame from a Series
Program-
import pandas as pd
s = pd.Series(['a','b','c','d'])
df=pd.DataFrame(s)
print(df)
When a single series is used to create a dataframe, the elements of the series
become the elements of the column in the dataframe.
Output-
   0
0  a
1  b
2  c
3  d
The default column name is 0.
import pandas as pd
s=pd.Series([1,2,3,4,5])
s1=pd.Series([10,20,30,40,50])
df = pd.DataFrame([s,s1])
print(df)
When two or more series are provided as an argument to a dataframe, the elements of the
series become the rows in the resultant dataframe.
Output
    0   1   2   3   4
0   1   2   3   4   5
1  10  20  30  40  50
import pandas as pd
s=pd.Series([1,2,3,4,5])
s1=pd.Series([60,80,50,89,85])
df = pd.DataFrame({'Roll No':s,'Marks':s1})
print(df)
A dictionary of Series can be passed to form a DataFrame. The resultant index is the union
of all the series indexes passed.
Output
   Roll No  Marks
0        1     60
1        2     80
2        3     50
3        4     89
4        5     85
Creating a DataFrame from List of Dictionaries
import pandas as pd
dic=[{'Name':'Rajat','Sname':'Sehgal'},
{'Name':'Rajesh','Sname':'Dabur'}, {'Name':'Rishi'}]
df = pd.DataFrame(dic)
print(df)
Here, the dictionary keys are treated as column labels and row labels take default
values starting from zero. The values of each dictionary form one row, so the number of
rows is equal to the number of dictionaries present in the list. There are three rows in the
resulting dataframe as there are three dictionaries in the list. In the third row the value
corresponding to key Sname is NaN because the Sname key is missing in the third dictionary.
Output
     Name   Sname
0   Rajat  Sehgal
1  Rajesh   Dabur
2   Rishi     NaN
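The earlier remark that a dictionary of Series produces the union of the series indexes is easier to see when the indexes differ. A minimal sketch, assuming two made-up Series (s_roll and s_marks are illustrative names) whose labels only partly overlap:

```python
import pandas as pd

# Two Series with different (partly overlapping) indexes.
s_roll = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
s_marks = pd.Series([60, 80], index=['b', 'c'])

df_union = pd.DataFrame({'Roll No': s_roll, 'Marks': s_marks})
print(df_union)

# The row labels are the union of both indexes; label 'a' has no
# entry in s_marks, so pandas fills that cell with NaN.
print(df_union.index.tolist())
```

As with the missing Sname key in the list-of-dictionaries example, a value that is absent for some label shows up as NaN.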
import pandas as pd
l=[[101,'Rajat'],[102,'Rajesh'],[103,'Rishi'],[104,'Sanjay']]
df = pd.DataFrame(l, columns=['Rollno','Name'])
print(df)
Output
   Rollno    Name
0     101   Rajat
1     102  Rajesh
2     103   Rishi
3     104  Sanjay
Row labels can also be supplied through the index parameter:
df = pd.DataFrame(l,index=['I','II','III','IV'], columns=['Rollno','Name'])
import pandas as pd
d1={"Rollno":[1,2,3], "Total":[350.5,400,420], "Percentage":[70,80,84]}
df1=pd.DataFrame(d1)
print(df1)
Here, the dictionary keys are treated as column labels and row labels take default values
starting from zero.
OUTPUT
   Rollno  Total  Percentage
0       1  350.5          70
1       2  400.0          80
2       3  420.0          84
DATAFRAME ATTRIBUTES
A dataframe attribute is any information related to the dataframe object, such as its
size or datatype. Below are some of the attributes of the dataframe object (the examples use a
dataframe df1 with six rows, row labels 1 to 6, and the columns Rollno, Total and Percentage):
1. df1.size
Return an int representing the number of elements in given dataframe.
print(df1.size)
OUTPUT
18
(6 rows X 3 columns =18)
2. df1.shape
Return a tuple representing the dimensions of the DataFrame.
print(df1.shape)
OUTPUT
(6, 3)
3. df1.axes
Return a list representing the axes of the DataFrame.
print(df1.axes)
OUTPUT
[Int64Index([1, 2, 3, 4, 5, 6], dtype='int64'), Index(['Rollno', 'Total', 'Percentage'],
dtype='object')]
4. df1.ndim
Return an int representing the number of axes / array dimensions
print(df1.ndim)
OUTPUT
2
5. df1.columns
The column labels of the DataFrame
print(df1.columns)
OUTPUT
Index(['Rollno', 'Total', 'Percentage'], dtype='object')
6. df1.values
Return a Numpy representation of the DataFrame.
print(df1.values)
OUTPUT
[[ 1. 350.5 70. ]
[ 2. 400. 80. ]
[ 3. 420. 84. ]
[ 4. 356. 80. ]
[ 5. 434. 87. ]
[ 6. 398. 79. ]]
7. df1.empty
Indicator whether DataFrame is empty.
print(df1.empty)
OUTPUT:
False
8. df1.index
In a pandas DataFrame the row labels are called the index. To get the index labels
separately, use the DataFrame "index" attribute.
print(df1.index)
OUTPUT:
Index([1,2,3,4,5,6])
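The six-row df1 used in the attribute examples can be rebuilt from the outputs shown above. The sketch below is a reconstruction under that assumption (values taken from the df1.values output, row labels 1 to 6 from the df1.axes output), not necessarily the original definition.

```python
import pandas as pd

# Reconstructed from the printed outputs; treat the exact definition as an assumption.
d1 = {"Rollno": [1, 2, 3, 4, 5, 6],
      "Total": [350.5, 400, 420, 356, 434, 398],
      "Percentage": [70, 80, 84, 80, 87, 79]}
df1 = pd.DataFrame(d1, index=[1, 2, 3, 4, 5, 6])

print(df1.size)             # 18  (6 rows x 3 columns)
print(df1.shape)            # (6, 3)
print(df1.ndim)             # 2
print(df1.empty)            # False
print(list(df1.columns))    # ['Rollno', 'Total', 'Percentage']
print(list(df1.index))      # [1, 2, 3, 4, 5, 6]
```

The exact text printed for df1.index and df1.axes depends on the pandas version, so only the labels themselves are shown here.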
ROW/COLUMN OPERATIONS
To access a column's data, mention the column name as a subscript,
e.g. df['empid']. This can also be done by using df.empid. To access multiple columns we can
write df[['col1', 'col2', ...]]
Example:
import pandas as pd
dict={'BS':[80,98,100,65,72],'ACC':[88,67,93,50,90],
'ECO':[100,75,89,40,96],'IP':[100,98,92,80,86]}
df5=pd.DataFrame(dict,index=['Ammu','Achu','Manu','Anu','Abu'])
print(df5)
Output:
Note: Now if we want to select/display a particular column empid then we will write it as
follows:
To access a row:
<dataframe object>.loc[<row label>, : ]
Note: Make sure not to miss the colon after comma
print(df5.loc['Ammu':'Manu', : ])
<dataframe object>.iloc[<start row index> : <end row index>, <start column index>
: <end column index>]
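The loc and iloc forms above can be compared on the df5 marks data defined earlier. A minimal sketch (the names marks, by_label, by_position and single are illustrative): a label slice with loc includes both end labels, while a position slice with iloc excludes the end position.

```python
import pandas as pd

marks = {'BS': [80, 98, 100, 65, 72], 'ACC': [88, 67, 93, 50, 90],
         'ECO': [100, 75, 89, 40, 96], 'IP': [100, 98, 92, 80, 86]}
df5 = pd.DataFrame(marks, index=['Ammu', 'Achu', 'Manu', 'Anu', 'Abu'])

# loc: label based, both 'Ammu' and 'Manu' are included (3 rows).
by_label = df5.loc['Ammu':'Manu', :]
print(by_label)

# iloc: position based, end positions excluded (rows 0-1, columns 1-2).
by_position = df5.iloc[0:2, 1:3]
print(by_position)

# A single cell by row label and column label.
single = df5.loc['Anu', 'IP']
print(single)  # 80
```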
import pandas as pd
dict={'BS':[80,98],'ACC':[88,67]}
df5=pd.DataFrame(dict,index=['Ammu','Achu'])
print(df5,"\n")
for (row,rowseries) in df5.iterrows():
    print("Row index:",row)
    print("containing")
    i=0
    for val in rowseries:
        print("At position ",i,":",val)
        i=i+1
    print()
import pandas as pd
dict={'p_id':[101,102],'p_name':['Hard disk','Pen Drive']}
df=pd.DataFrame(dict)
print(df,"\n")
df.rename(columns={'p_id':'Product_ID','p_name':'product_name'},inplace=True)
or
df=df.rename(columns={'p_id':'Product_ID','p_name':'product_name'})
print(df)
Columns can also be renamed by using the columns attribute of dataframe.
import pandas as pd
dict={'p_id':[101,102],'p_name':['Hard disk','Pen Drive']}
df=pd.DataFrame(dict)
df.columns=['Product_ID','product_name']
print(df,"\n")
Boolean indexing
Like default indexing (0,1,2…) or labeled indexing, there is one more way to index:
Boolean indexing (setting the row index to True/False). This helps in displaying the rows
of a DataFrame according to True or False, as specified in the command.
import pandas as pd
dict={'p_id':[101,102,103],'p_name':['Hard disk','Pen Drive','Camera']}
df=pd.DataFrame(dict)
df.index=[True,False,True]
print(df,"\n")
print(df.loc[True])
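Boolean indexing is also commonly done with a condition instead of a True/False row index: a comparison on a column gives a Series of True/False values, and passing that Series inside df[...] keeps only the rows marked True. A minimal sketch on the same product data; the names products, mask and selected are illustrative.

```python
import pandas as pd

products = {'p_id': [101, 102, 103],
            'p_name': ['Hard disk', 'Pen Drive', 'Camera']}
df = pd.DataFrame(products)

# The comparison produces one True/False value per row.
mask = df['p_id'] > 101
print(mask.tolist())   # [False, True, True]

# Only rows where the mask is True are kept.
selected = df[mask]
print(selected)
```

The same pattern is what an expression such as df[df.salary != 5000] relies on.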
Questions / Answers
1. Give the output:
import pandas as pd
Dic={'empno':[101,102,103,104,105,106],'grade':['a','b','a','c','b','c'], 'dept':
['sales','pur','mar','sales','pur','mar']}
df=pd.DataFrame(Dic)
print(df.head(3))
Output:
empno grade dept
0 101 a sales
1 102 b pur
2 103 a mar
A. Predict the output of the following python statement: i. df.shape ii. df[2:4]
B. Write Python statement to display the data of Topper column of indexes CO2 to CO4.
C. Write Python statement to compute and display the difference of data of Tot_students
column and First_Runnerup column of the above given DataFrame.
Answer:
A: i. (5,4)
ii. School tot_students Topper First_Runner_up
CO3 GPS 20 18 2
CO4 MPS 18 10 8
B. print(df.loc['CO2': 'CO4', 'Topper'])
C. print(df.Tot_students-df.First_Runnerup)
8. Write a Python code to create a DataFrame with appropriate column headings from
the list given below:
[[101,'Gurman',98],[102,'Rajveer',95],[103,'Samar' ,96],[104,'Yuvraj',88]]
Answer:
import pandas as pd
data=[[101,'Gurman',98],[102,'Rajveer',95],[103,'Samar' ,96], [104,'Yuvraj',88]]
df=pd.DataFrame(data,columns=['Rno','Name', 'Marks'])
Answer:
i. del df['A']
ii. A B C
ENAME ABC PQR LMN
SALARY 200000 100000 20000
iii. df=df.drop(['SALARY'],axis=0)
iv. df['A']=100
v. df.B['DEPT']='MECH'
vi. print(df.loc[['DEPT','SALARY'],["A","B"]])
vii. df.rename(columns={"A":"D"},inplace=False)
viii.df['E']=["CS",104,"XYZ",300000]
ix. df.loc['COMM']=[3000,4000,5000]
x. df.rename(index={"DEPT":"DEPARTMENT"},inplace=True)
xi. print(df.A['DEPT'])
xii. 4
Answers:
I. print(df[['delhi','chennai']])
II. print(df.delhi['hospitals'])
III. print(df.shape)
IV. df.kolkatta['population']=50
V. df.rename(index={"population":"pop"},inplace=True)
I. print(df[df.population>=20])
I. df[:]=0
II. print(df.iloc[::-1])
III. print(df.iloc[:,::-1])
IV. print(df.iloc[::-1,::-1])
Answers:
1. It displays the names of columns of the Dataframe.
2. It will display all columns except the last 5 columns.
3. It displays all columns with row index 2 to 7.
4. It will display entire dataframe with all rows and columns.
5. It will display all rows except the last four rows.
5. Mr. Ankit is working in an organisation as data analyst. He uses Python Pandas and
Matplotlib for the same. He got a dataset of the passengers for the year 2010 to 2012
for January, March and December. His manager wants certain information from him,
but he is facing some problems. Help him by answering few questions given below:
4. He wants to print the details of "January" month along with the number of
passengers, Identify the correct statement:
a) df.loc[['Month','Passengers']][df['Month']=='Jan']
b) df[['Month','Passengers']][df['Month']=='Jan']
c) df.iloc[['Month','Passengers']][df['Month']=='Jan']
d) df(['Month','Passengers']][df['Month']=='Jan')
I. Display the house names where the number of Second Prizes are in the range of
12 to 20.
1. Mr. Ankit wants to change the index of the Data Frame and the output for the
same is given below. Identify the correct statement to change the index
a) df.index[]=["Air India","Indigo","Spicejet","Jet","Emirates"]
b) df.index["Air India","Indigo","Spicejet","Jet","Emirates"]
c) df.index=["Air India","Indigo","Spicejet","Jet","Emirates"]
d) df.index()=["Air India","Indigo","Spicejet","Jet","Emirates"]
Answer: (c) df.index=["Air India","Indigo","Spicejet","Jet","Emirates"]
6. To display the 3rd, 4th and 5th columns from the 6th to 9th rows of a dataframe you
can write
a) DF.loc[6:9, 3:5] b) DF.loc[6:10, 3:6] c) DF.iloc[6:10, 3:6] d) DF.iloc[6:9,
3:5]
Answer: c) DF.iloc[6:10, 3:6]
8. We can add a new row to a DataFrame using the _____________ method
(i) rloc[ ] (ii) loc[ ] (iii)iloc[ ] (iv)None of the above
Answer: (ii) loc[ ]
9. The head() function of a dataframe will display how many rows from the top if no
parameter is passed?
(i) 1 (ii) 3 (iii) 5 (iv) None of these
Answer : (iii) 5
10. Which function is used to find values from a DataFrame D using the index number?
a) D.loc b) D.iloc c) D.index d) None of these
Answer: b) D.iloc
11. In a DataFrame, Axis= 0 represents the elements
a.rows b.columns c.both d.None of these.
Answer: a.rows
12. In DataFrame, by default new column added as the _____________ column
(i) First (Left Side) (ii) Second (iii)Last (Right Side) (iv) Any where in
dataframe
Answer: (iii)Last (Right Side)
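The two answers above (a new row through loc[ ], and a new column landing on the right) can be checked with a short sketch. The student data is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Rollno': [101, 102], 'Name': ['Rajat', 'Rajesh']})

# Assigning to a row label that does not exist yet adds a new row.
df.loc[2] = [103, 'Rishi']

# Assigning to a column name that does not exist yet adds it as the last column.
df['Marks'] = [90, 85, 88]

print(df)
print(list(df.columns))   # ['Rollno', 'Name', 'Marks']
```

If the row label or column name already exists, the same assignments overwrite the existing values instead of adding anything.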
13. Which of the following are correct features of a DataFrame?
a. Potentially columns are of different types
b. Can Perform Arithmetic operations on rows and columns
c. Labeled axes (rows and columns)
d. All of the above
Answer: d. All of the above
14. When we create a DataFrame from a List of Dictionaries, the number of columns in the
DataFrame is equal to the _______
a. maximum number of keys in first dictionary of the list
b. maximum number of different keys in all dictionaries of the list
c. maximum number of dictionaries in the list
d. None of the above
Answer: b. maximum number of different keys in all dictionaries of the list
15. When we create DataFrame from List of Dictionaries, then dictionary keys will
become ______
(i) Column labels (ii) Row labels (iii) Both of the above (iv) None of the above
Answer: (i) Column labels
16. Which method is used to access vertical subset of a dataframe?
(i) iterrows() (ii) iteritems() (iii) itercolumns() (iv) itercols()
Answer: (ii) iteritems() (removed in pandas 2.0; current versions use items() instead)
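Iterating over vertical subsets means visiting the DataFrame one column at a time. A minimal sketch using items(), which is available in both older and current pandas versions; the marks data and the name column_totals are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'BS': [80, 98], 'ACC': [88, 67]}, index=['Ammu', 'Achu'])

# Each step yields a (column label, column as a Series) pair.
column_totals = {}
for label, column in df.items():
    column_totals[label] = int(column.sum())

print(column_totals)   # {'BS': 178, 'ACC': 155}
```

iterrows() works the same way for horizontal subsets, yielding (row label, row as a Series) pairs.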
17. Write statement to transpose dataframe DF.
(i) DF.t (ii) DF.transpose (iii)DF.T (iv)DF.T( )
Answer: (iii)DF.T
18. In DataFrame, by default new column added as the _____________ column
a. First (Left Side) b. Second c. Last (Right Side) d. Any where in dataframe
ANS: Last (Right Side)
19. We can add a new row to a DataFrame using the _____________ method
(i) rloc[ ] (ii) loc[ ] (iii) iloc[ ] (iv) None of the above
ANS: (ii) loc[ ]
20. Which among the following options can be used to create a DataFrame in Pandas ?
(a) A scalar value (b) An ndarray (c) A python dict (d) All of these
ANS:- (d) All of these
21. Write short code to show the information having city="Delhi" from dataframe
SHOP.
(a) print(SHOP[City=='Delhi']) (b) print(SHOP[SHOP.City=='Delhi'])
(c) print(SHOP[SHOP.'City'=='Delhi']) (d) print(SHOP[SHOP[City]=='Delhi'])
ANS: (b) print(SHOP[SHOP.City=='Delhi'])
22. Which of the following commands is used to install pandas?
(i)pip install python –pandas (ii)pip install pandas (iii)python install python
(iv)python install pandas
ANS: (ii) pip install pandas
23. Which attribute of a dataframe is used to get the number of axes?
a. T b. ndim c. empty d. shape
ANS: b. ndim
24. Display first row of dataframe ‘DF’
(i) print(DF.head(1)) (ii) print(DF[0 : 1]) (iii)print(DF.iloc[0 : 1]) (iv)All of the above
ANS: (iv)All of the above
25. To delete a column from a DataFrame, you may use statement.
(a) remove (b) del (c) drop (d) cancel statement.
ANS:- (b) del & (c) drop
26. In given code dataframe ‘Df1’ has ________ rows and _______ columns
import pandas as pd
dict= [{'a':10, 'b':20}, {'a':5, 'b':10, 'c':20},{'a':7, 'd':10, 'e':20}]
Df1 = pd.DataFrame(dict)
(i) 3, 3 (ii) 3, 4 (iii)3, 5 (iv)None of the above
ANS: (iii)3, 5
27. In the following statement, if column ‘mark’ already exists in the DataFrame ‘Df1’
then the assignment statement will __________ Df1['mark'] = [95,98,100] #There
are only three rows in DataFrame Df1
(i) Return error (ii) Replace the already existing values.
(iii)Add new column (iv)None of the above
ANS: (ii) Replace the already existing values.
28. Which of the following statements is false:
i. DataFrame is size mutable ii. DataFrame is value mutable
iii. DataFrame is immutable iv. DataFrame is capable of holding multiple types of data
ANS:- iii. DataFrame is immutable
29. To delete a row, the parameter axis of function drop( ) is assigned the value
______________
(i) 0 (ii) 1 (iii) 2 (iv) 3
ANS: (i) 0
30. Write code to delete the rows of those getting a salary of 5000.
(a) df=df.drop[salary==5000] (b) df=df[df.salary!=5000]
(c) df.drop[df.salary==5000,axis=0] (d) df=df.drop[salary!=5000]
ANS: (b) df=df[df.salary!=5000]
31. DF1.loc[ ] method is used to ______ # DF1 is a DataFrame
(i) Add new row in a DataFrame ‘DF1’ (ii) To change the data values of a row to a
particular value (iii)Both of the above (iv)None of the above
ANS: (iii)Both of the above
32. To iterate over horizontal subsets of dataframe,
(a) iterate( ) (b) iterrows( ) function may be used. (c) itercols( ) (d) iteritems( )
ANS:- (b) iterrows( ) function may be used.
33. Write code to delete the row whose index value is A1 from dataframe df.
(a) df=df.drop('A1') (b) df=df.drop(index='A1') (c) df=df.drop('A1,axis=index')
(d) df=df.del('A1')
ANS: (a) df=df.drop('A1')
34. A two-dimension labeled array that is an ordered collection of columns to store
heterogeneous data type is
i. Series ii. Numpy array iii.Dataframe iv. Panel
ANS:- iii. Dataframe
35. In Pandas _______________ is used to store data in multiple columns.
(i)Series (ii) DataFrame (iii) Both of the above (iv) None of the above
ANS: (ii) DataFrame
36. What is dataframe?
a. 2 D array with heterogeneous data b. 1 D array with homogeneous data
c. 2 D array with homogeneous data d. 1 D array with heterogeneous data
ANS: a. 2 D array with heterogeneous data
37. In a DataFrame, Axis= 1 represents the_____________ elements
(a) Row (b) Column (c) True (d) False
ANS: (b) Column
38. Which of the following is not an attribute of a DataFrame Object ?
a. index b. Index c. size d. value
ANS: b. Index
39. To get top 5 rows of a dataframe, you may use
(a) head( ) (b) head(5) (c) top( ) (d) top(5)
ANS:- (a) head( ) , b) head(5)
40. In a DataFrame, Axis= 1 represents the_____________ elements
(a) Row (b) Column (c) True (d) False
ANS: (b) Column
41. NaN stands for:
a. Not a Number b. None and None c. Null and Null d. None a Number
ANS: a. Not a Number
42. The following code creates a dataframe named 'Df1' with _______________
columns.
import pandas as pd
Df1 = pd.DataFrame([10,20,30] )
(i) 1 (ii) 2 (iii) 3 (iv) 4
ANS: (i) 1
43. Write the single line command to delete the column "marks" from dataframe df
using drop function.
(a) df=df.drop(col='marks') (b) df=df.drop('marks',axis=col)
(c) df=df.drop('marks',axis=0) (d) df=df.drop('marks',axis=1)
ANS: (d) df=df.drop('marks',axis=1)
44. The following statement will _________
df = df.drop(['Name', 'Class', 'Rollno'], axis = 1) #df is a DataFrame object
a. delete three columns having labels ‘Name’, ‘Class’ and ‘Rollno’
b. delete three rows having labels ‘Name’, ‘Class’ and ‘Rollno’
c. delete any three columns
d. return error
ANS:- a. delete three columns having labels ‘Name’, ‘Class’ and ‘Rollno’
45. Difference between loc() and iloc().
a. Both are Label indexed based functions.
b. Both are Integer position-based functions.
c. loc() is label based function and iloc() integer position based function.
d. loc() is integer position based function and iloc() index position based function.
ANS: c. loc() is label based function and iloc() integer position based function.
46. Which command will be used to delete the 3rd and 5th rows of the data frame? Assume
the data frame is named DF and has the default index.
a. DF.drop([2,4],axis=0) b. DF.drop([2,4],axis=1) c. DF.drop([3,5],axis=1) d. DF.drop([3,5])
ANS: a. DF.drop([2,4],axis=0)
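The drop() questions above can be tried out on a small frame. A minimal sketch with made-up data, assuming the default index 0 to 4 so that the 3rd and 5th rows carry the labels 2 and 4; rows_dropped and cols_dropped are illustrative names.

```python
import pandas as pd

DF = pd.DataFrame({'Name': ['A', 'B', 'C', 'D', 'E'],
                   'Marks': [10, 20, 30, 40, 50],
                   'Rollno': [1, 2, 3, 4, 5]})

# axis=0 drops rows by label: labels 2 and 4 are the 3rd and 5th rows.
rows_dropped = DF.drop([2, 4], axis=0)
print(rows_dropped)

# axis=1 drops columns by label.
cols_dropped = DF.drop(['Marks'], axis=1)
print(cols_dropped)

# drop() returns a new DataFrame; DF itself is unchanged unless reassigned.
print(DF.shape)   # (5, 3)
```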
47. Assuming the given structure, which command will give us the given output:
Output Required: (3,4)