1
Data Frames
Pandas library
1-Pandas library
2-Notepade.csv
3-Data frames
4-Reading from excel file and .csv file
5-The writer function
Reading data from excel file
You have to create this excel file, then read it and display
it in the output using python
Book4.xlsx excel file
a b c
0 1 5 3
1 1 6 8
2 2 3 7
3 3 1 8
Code to read the file, from excel
Page 1
2
You need to import some libraries that are
shown below. To install these libraries use
the pip3
#use pip3 in command window to install
# 1- pandas 2- openpyxl
# pip3 install pandas
# pip3 install openyxl
Example
Example Write the script
below to read
For excel file
import pandas as pd
df = pd.read_excel('book4.xlsx')
print(df)
print('done')
Page 2
3
###################################
For notepad with .csv extension
Create notepad commas separated data
as shown below.
When you save it by going to all
files, then add the extension .csc
Example The code to read notepad.csv
import pandas as pd
df = pd.read_csv('data2.csv')
print(df)
print('done')
Page 3
4
Data Frames
Python dataframe is a data structure constructed with rows and columns,
similar to a database or Excel spreadsheet. It consists of a dictionary of lists in
which the list each have their own identifiers or keys, such as “last name” or
“food group.”
—------------
Example
d={"Duration":{
"0":60,
"1":60,
"2":60,
"3":45,
},
"Maxpulse":{
"0":130,
"1":145,
"2":135,
"3":175,
},
"Calories":{
"0":409.1,
"1":479.0,
"2":340.0,
"3":282.4,
}
}
import json
with open("dd.json", 'w') as fout:
json_dumps_str = json.dumps(d, indent=4)
print(json_dumps_str, file=fout)
import pandas as pd
df = pd.read_json('dd.json')
Page 4
5
print(df.to_string())
Output
Duration Maxpulse Calories
0 60 130 409.1
1 60 145 479.0
2 60 135 340.0
3 45 175 282.4
To create a dataframe you must first create a dictionary.
These are the dictionary methods we use
Method Usage
Values( Return a list of all values in the dictionary
)
Update( Updates the dictionary with the specified key-value pairs
)
setdefa Returns the value of the specified key. If the key does not exist
ult() insert the key, with the specified value
Page 5
6
clear() Removes all the elements from the dictionary
keys() Returns a list containing the keys of the dictionary
pop() Removes the element with the specified key
popitem Removes the last inserted key-value pair
()
get() Returns the value of the specified key
items() Returns a list containing a tuple for each key value pair
copy() Returns a copy of the dictionary
Page 6
7
fromkey Returns a dictionary with the specified keys and value
s()
This how we create a
dictionary and print header
import pandas as pd
import numpy as np
df={
'First name':['Ali','Fred'],
'Last name':['Baba','Json'],
'Visit':['march,19','March,25'],
Page 7
8
'Leave':['April,20,Aprile,26']
}
# Get the list of all column names from headers
for column_headers in df:
print(column_headers, end='\t')
Page 8
9
Example:column by column
generate a data frame
import pandas as pd
import numpy as np
technologies= {
'Courses':["Spark","PySpark","Hadoop","Python","Pandas"],
'Fee' :[22000,25000,23000,24000,26000],
'Duration':['30days','50days','30days', None,np.nan],
'Discount':[1000,2300,1000,1200,2500]
}
df = pd.DataFrame(technologies)
print(df)
#notice the column generated in the output
# To Get the list of all column
names from headers
column_headers = list(df.columns.values)
print("The Column Header :", column_headers)
Output:
The Column Header : ['Courses', 'Fee', 'Duration', 'Discount']
Page 9
10
#Example Get the list of all column names
from headers
column_headers = df.columns.values.tolist()
print("The Column Header :", column_headers)
output
The Column Header : ['Courses', 'Fee', 'Duration', 'Discount']
Page 10
11
To reset, and delete
rows
import pandas as pd
# Create DataFrame from dict
df =
pd.DataFrame({'Courses':['Spark','PySpark','Java','PHP'
],
'Fee':[20000,20000,15000,10000],
'Duration':['35days','35days','40days','30days']})
print(df)
df=df.drop([2])
print(df)
df2=df.reset_index()
print(df2)
output
Courses Fee Duration
0 Spark 20000 35days
1 PySpark 20000 35days
2 Java 15000 40days
3 PHP 10000 30days
Courses Fee Duration
0 Spark 20000 35days
1 PySpark 20000 35days
3 PHP 10000 30days
index Courses Fee Duration
0 0 Spark 20000 35 days
1 1 PySpark 20000 35 days
Page 11
12
2 3 PHP 10000 30 days
#Use .scv file to drop a colun
import pandas as pd
df = pd.read_csv('data2.csv')
print(df)
df2=df.drop([2])
print(df2)
print('done')
a b c
0 1 2 3
1 2 5 7
2 3 8 3
a b c
0 1 2 3
1 2 5 7
Done
Page 12
13
0 1 5 3
1 1 6 8
2 2 3 7
3 3 1 8
a b c
0 1 5 3
1 1 6 8
3 3 1 8
Example Write Excel with Python
Pandas. Excel file will be created
You can write any data (lists, strings, numbers etc) to Excel, by first converting
it into a Pandas DataFrame and then writing the DataFrame to Excel.
To export a Pandas DataFrame as an Excel file (extension: .xlsx, .xls), use
the to_excel() method.
Page 13
14
Install the following library
$ pip install xlwt
$ pip install openpyxl
Importing openpyxl is required if you want to append it to an existing Excel file
described at the end.
import pandas as pd
import openpyxl
df = pd.DataFrame([[11, 21, 31], [12, 22, 32], [31, 32, 33]],
index=['one', 'two', 'three'], columns=['a', 'b', 'c'])
print(df)
# a b c
# one 11 21 31
# two 12 22 32
# three 31 32 33
You can specify a path as the first argument of the to_excel() method.
Note: that the data in the original file is deleted when overwriting.
The argument new_sheet_name is the name of the sheet. If omitted, it will be
named Sheet1.
Output is the excel file that is created
Page 14
15
import pandas as pd
import openpyxl
df = pd.DataFrame([[11, 21, 31], [12, 22, 32], [31, 32,
33]],
index=['one', 'two', 'three'],
columns=['a', 'b', 'c'])
print(df)
#writing to excel
df.to_excel('pandas_to_excel.xlsx',
sheet_name='new_sheet_name')
#If you do not need to write index (row name), columns
(column name),
#the argument index, columns is False
#df.to_excel('xxx_no_index_header.xlsx', index=False,
header=False)
#then use the ExcelWriter() function like
this:
Page 15
16
with pd.ExcelWriter('pandas_to_excel.xlsx') as writer:
df.to_excel(writer, sheet_name='sheet1')
df.to_excel(writer, sheet_name='sheet2')
#You don’t need to call writer.save(), writer.close()
within the blocks.
Page 16