Loading Pandas

The document provides a comprehensive guide on using the Pandas library in Python for data manipulation, including reading files, creating Series and DataFrames, performing arithmetic operations, and querying data. It also covers data preprocessing techniques, handling missing values, and visualizing data using Matplotlib and Seaborn. Key functionalities such as indexing, slicing, and plotting are illustrated with code examples.

Uploaded by

Ali

How to read a file

import pandas as pd
# Raw string so the backslash in the Windows-style path is not read as an escape (\f)
df = pd.read_csv(r'file_location\filename.txt', delimiter="\t")
or

data = pd.read_csv('output_list.txt', sep=" ", header=None)

# The list must contain exactly one name per column in the file
data.columns = ["a", "b", "c", "etc."]
or

df = pd.read_csv('output_list.txt', sep=" ", header=None, names=["a", "b", "c"])
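The read_csv options above can be tried without any file on disk. The following is a minimal sketch that reads small in-memory text samples; the sample data and the names raw, tab_raw and df_tab are made up for the illustration.

```python
import io
import pandas as pd

# Hypothetical space-separated sample with no header row
raw = "1 2 3\n4 5 6\n"

# header=None tells pandas the first line is data, not column names;
# names= supplies the column labels in the same call
df = pd.read_csv(io.StringIO(raw), sep=" ", header=None, names=["a", "b", "c"])
print(df)

# Tab-delimited text is read the same way with a different separator;
# here the first line is used as the header
tab_raw = "x\ty\n10\t20\n"
df_tab = pd.read_csv(io.StringIO(tab_raw), delimiter="\t")
print(df_tab.columns.tolist())
```

Passing names= in the call and assigning data.columns afterwards give the same labels; the first form simply avoids a second statement.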

Pandas
We're going to deepen our investigation into how Python can be used to manipulate,
clean, and query data by looking at the pandas data toolkit.

• The pandas Series

• The pandas Series is the base data structure of pandas. A Series is similar to a NumPy
array, but it differs by having an index, which allows for much richer lookup of
items instead of just a zero-based array index value.

import pandas as pd
d=pd.Series([11,12,13,14])
d
• Multiple items can be retrieved by specifying their labels in a Python list.

import pandas as pd
d[[1,3]]
Pandas: Series

d=pd.Series([11,12,13,14],index=['a','b','c','d'])
d[['a','b']]       # select by index label
d.iloc[[0,1]]      # select the same items by 0-based position

• We can examine the index of a Series using the index property:

d.index
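To make the difference between labels and positions concrete, here is a small sketch that looks up the same two items both ways and then inspects the index. The variable names s, by_label and by_position are chosen for the example only.

```python
import pandas as pd

s = pd.Series([11, 12, 13, 14], index=["a", "b", "c", "d"])

by_label = s.loc[["a", "b"]]     # lookup by index label
by_position = s.iloc[[0, 1]]     # lookup by 0-based position

# Both lookups return the same two items, labels included
print(by_label.equals(by_position))

# The index itself is available as a property and is not modified by reading it
print(s.index.tolist())
```

Using .loc and .iloc explicitly avoids any ambiguity about whether a list of integers means labels or positions.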

• Two Series objects can be combined with an arithmetic operation; values are matched by index label

d1=pd.Series([11,12,13,14],index=['a','b','c','d'])
d2=pd.Series([1,2,3,5],index=['a','b','c','d'])
diff=d1-d2
print(diff)

diff.mean()
diff
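The subtraction above uses two Series with identical labels. As a sketch of what happens when the labels only partly overlap (the Series s1 and s2 are invented for this example), pandas pairs values by label and leaves NaN where one side has no match:

```python
import pandas as pd

s1 = pd.Series([11, 12, 13], index=["a", "b", "c"])
s2 = pd.Series([1, 2, 3], index=["b", "c", "d"])

# Values are paired by label, not by position;
# labels present on only one side ("a" and "d") produce NaN
diff = s1 - s2
print(diff)
```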
Pandas: DataFrame

• A pandas Series can only have a single value associated with each index label.

• To have multiple values per index label we can use a DataFrame. A DataFrame
represents one or more Series objects aligned by index label.

• Each Series will be a column in the DataFrame, and each column can have an
associated name.

d1=pd.Series([11,12,13,14],index=['a','b','c','d'])
d2=pd.Series([1,2,3,4],index=['a','b','c','d'])
temp_df=pd.DataFrame({'value1':d1,'value2':d2})
temp_df
• Columns in a DataFrame object can be accessed using an array indexer with the name
of the column or a list of column names

temp_df['value1']
temp_df[['value1','value2']]
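One detail of the two access forms is the type that comes back. The sketch below rebuilds the same small DataFrame and checks it; the names one and several are illustrative.

```python
import pandas as pd

d1 = pd.Series([11, 12, 13, 14], index=["a", "b", "c", "d"])
d2 = pd.Series([1, 2, 3, 4], index=["a", "b", "c", "d"])
temp_df = pd.DataFrame({"value1": d1, "value2": d2})

one = temp_df["value1"]        # a single name returns a Series
several = temp_df[["value1"]]  # a list of names returns a DataFrame, even with one name

print(type(one).__name__, type(several).__name__)
```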
Pandas: DataFrame

• Passing a list to the [] operator of a DataFrame retrieves the specified columns,
whereas passing a list to a Series returns rows.

• A new column can be added to a DataFrame simply by assigning another Series to a
column using the array indexer [] notation

g=temp_df['value1']-temp_df['value2']
print(g)
temp_df['diff']=temp_df['value1']-temp_df['value2']
temp_df
• The names of the columns in a DataFrame are accessible via the columns
property
temp_df.columns
Pandas: DataFrame

• DataFrame and Series objects can be sliced to retrieve specific rows

temp_df[0:3]

temp_df.value1[0:3]

• Entire rows from a DataFrame can be retrieved using the .loc and .iloc properties.
.loc looks rows up by index label, whereas .iloc uses the 0-based position.

temp_df.loc['a']

temp_df.iloc[0]

# Rows at positions 1 and 3, then a single column (temp_df has only four rows)
temp_df.iloc[[1,3]].value1
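The distinction between .loc and .iloc is easiest to see when the index labels are integers that differ from the positions. This is a small sketch with made-up labels 10, 20 and 30:

```python
import pandas as pd

df = pd.DataFrame({"value1": [11, 12, 13]}, index=[10, 20, 30])

by_label = df.loc[10]      # the row whose index label is 10
by_position = df.iloc[0]   # the first row, whatever its label

print(by_label["value1"], by_position["value1"])

# No row carries the label 0, so df.loc[0] would raise a KeyError here
print(0 in df.index)
```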
Pandas: DataFrame
• Loading data from files into a DataFrame

import pandas as pd
df2 = pd.read_excel('2010.xlsx')
# or, if the same data is stored as CSV:
df2 = pd.read_csv('2010.csv')

• The following expression returns a Boolean Series that marks which values in the IMO column are greater than 7

df2.IMO>7

• Get the type of a value in a column

type(df2.IMO[0])
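Pandas: DataFrame
• To transpose a DataFrame (swap rows and columns), use the T attribute

df2_t = df2.T   # kept in a separate variable so df2 retains its original layout

• Selecting rows by label (in the transposed frame, the original column names are the row labels)

df2_t.loc[['IYR','IMO']]

df2_t.loc['IYR'][0]
Pandas: DataFrame
• Delete data from a DataFrame using drop for rows or del for columns

df2_t = df2_t.drop('IYR')   # drop the row labeled 'IYR'; drop returns a new DataFrame
del df2['IMO']              # delete the column 'IMO' in place
# Alternative to del: axis=1 targets columns (the default axis=0 targets rows)
# df2 = df2.drop(['IMO'], axis=1)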
• Add a column to a DataFrame

df2['IMO']=0

• Read a column and update it in place

df2['IMO']=df2['IMO']+2
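Query for DataFrames
• To find accidents that occurred in a month greater than 6, first build a Boolean mask:

df2['IMO']>6

• Then apply the mask. The where method keeps every row and fills rows that fail the condition with NaN; Boolean indexing returns only the matching rows and lets several conditions be combined with &:

dfbigger=df2.where(df2['IMO']>6)

dfbigger=df2[(df2['IMO']>6) & (df2['DAY']>10)]

dfbigger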
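The two query styles above give differently shaped results. The following sketch uses a tiny invented table with the same column names (IMO, DAY) to show the contrast; the values are not taken from the accident dataset.

```python
import pandas as pd

acc = pd.DataFrame({"IMO": [3, 7, 12], "DAY": [5, 20, 8]})

mask = acc["IMO"] > 6

# where keeps the original shape; rows that fail the condition become NaN
masked = acc.where(mask)

# Boolean indexing drops failing rows; conditions are combined with &
filtered = acc[mask & (acc["DAY"] > 10)]

print(masked)
print(filtered)
```

Each condition needs its own parentheses because & binds more tightly than the comparison operators.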

• Set or reset the index of a DataFrame

dfbigger=dfbigger.set_index('IYR')
print(dfbigger)
dfbigger=dfbigger.reset_index()   # moves IYR back to an ordinary column
dfbigger
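As a quick check that set_index and reset_index undo each other, here is a sketch on a small made-up frame; the names indexed and restored are illustrative.

```python
import pandas as pd

acc = pd.DataFrame({"IYR": [2010, 2010, 2011], "IMO": [7, 8, 9]})

indexed = acc.set_index("IYR")     # IYR becomes the row index and leaves the columns
restored = indexed.reset_index()   # IYR returns as the first column

print(indexed.index.name, list(indexed.columns))
print(restored.equals(acc))
```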
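DataFrames: Preprocessing
• Count non-NA cells for each column (axis=0) or row (axis=1)

df2.count(axis=0, numeric_only=False)

• Get numeric columns or object columns

df2.dtypes
df2.select_dtypes(include='number').columns   # public replacement for the private _get_numeric_data()
df2.select_dtypes(include=['object'])

df4=df2.select_dtypes(include=['object'])
df2[~df2.isin(df4)]   # object-typed cells are masked as NaN; numeric cells are kept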
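A compact way to split a table into numeric and non-numeric columns is sketched below. The frame and its column names (STATE, SPEED) are invented for the example, and exclude= is used for the text columns so the sketch does not depend on how a given pandas version labels string data.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "IYR": [2010, 2011],
    "STATE": ["TX", "CA"],
    "SPEED": [35.5, 40.0],
})

# include=[np.number] selects integer and float columns
numeric_cols = df.select_dtypes(include=[np.number]).columns.tolist()

# exclude=[np.number] selects everything else
other_cols = df.select_dtypes(exclude=[np.number]).columns.tolist()

print(numeric_cols, other_cols)
```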

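• Find empty cells and replace them with NaN

Signature as documented in older pandas releases:
DataFrame.replace(to_replace=None, value=None, inplace=False, limit=None, regex=False, method='pad')

import numpy as np
# Anchor the pattern so only empty or whitespace-only cells match
df2 = df2.replace(r'^\s*$',np.nan, regex=True)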
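The following sketch shows the blank-to-NaN replacement on a small invented column. It assumes the goal is to catch cells that are empty or contain only whitespace, so the pattern is anchored with ^ and $; an unanchored pattern such as r'\s' can also match cells that merely contain a space.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"city": ["New York", " ", ""], "n": [1, 2, 3]})

# ^\s*$ matches only strings made up entirely of whitespace (or nothing at all)
cleaned = df.replace(r"^\s*$", np.nan, regex=True)

print(cleaned["city"].isna().tolist())
```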

Plot

• matplotlib.pyplot is a collection of command-style functions that make matplotlib
work like MATLAB
import matplotlib.pyplot as plt
plt.plot([1,2,3], [1,2,3], 'go-', linewidth=2)
plt.plot([1,2,3], [1,4,9], 'rs', markersize=14)
plt.show()

• Another sample

import numpy as np
import matplotlib.pyplot as plt
t = np.arange(0., 5., 0.2)
plt.plot(t, t, 'r--', t, t**2, 'bs', t, t**3, 'g^')
plt.title('some values')
plt.xlabel('x')
plt.ylabel('y')
plt.legend(['t','t**2','t**3'])
plt.show()
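Plot

• Another sample

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
t=np.arange(0,5,0.2)
df=pd.DataFrame({0:t , 1:t**1.5 , 2:t**2 , 3:t**2.5 , 4:t**3})
legend_labels=['Solid' , 'Dashed' , 'Dotted' , 'Dot-dashed' , 'Points']

# One style string per column; 'k.' draws points to match the 'Points' label
df.plot(style=['r-','g--', 'b:', 'm-.' , 'k.'])

plt.legend(legend_labels )
plt.show()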
matplotlib.pyplot.subplots

matplotlib.pyplot.subplots returns a Figure instance and either a single Axes or an array of Axes,
depending on the number of subplots requested.
matplotlib.pyplot.subplots(nrows=1, ncols=1, sharex=False, sharey=False, ...)
import matplotlib.pyplot as plt
import numpy as np

# Simple data to display in various forms

x = np.linspace(0, 2 * np.pi, 400)
y = np.sin(x ** 2)

plt.close('all')

# Just a figure and one subplot

f, ax = plt.subplots()
ax.plot(x, y)
ax.set_title('Simple plot')
plt.show()
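What plt.subplots hands back depends on the grid requested. The sketch below checks the three common cases; it selects the non-interactive Agg backend, which is an assumption made here so the code runs without a display, and the variable names are illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

fig1, ax = plt.subplots()         # one subplot: a single Axes object
fig2, axarr = plt.subplots(2)     # two rows: a 1-D array of Axes
fig3, grid = plt.subplots(2, 2)   # 2 x 2 grid: a 2-D array of Axes

print(type(ax).__name__, axarr.shape, grid.shape)
plt.close("all")
```

In a 1-D result the axes are addressed as axarr[0], axarr[1]; in a 2-D result they are addressed as grid[row, col].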
matplotlib.pyplot.subplots

• A scatter plot displays the relationship between a pair of variables

• Define two subplots that share the x axis

f, axarr = plt.subplots(2, sharex=True)
axarr[0].plot(x, y)
axarr[0].set_title('Sharing X axis')
axarr[1].scatter(x, y)
plt.show()

• Define two subplots in one row

f, (ax1, ax2) = plt.subplots(1, 2, sharey=True)

ax1.plot(x, y)
ax1.set_title('Sharing Y axis')
ax2.scatter(x, y)
plt.show()
matplotlib.pyplot.subplots

• Define three subplots sharing both x and y axes

f, (ax1, ax2, ax3) = plt.subplots(3, sharex=True, sharey=True)

ax1.plot(x, y)
ax1.set_title('Sharing both axes')
ax2.scatter(x, y)
ax3.scatter(x, 2 * y ** 2 - 1, color='r')
plt.show()
matplotlib.pyplot.subplots

• Define four axes, returned as a 2-D array

f, axarr = plt.subplots(2, 2)
axarr[0, 0].plot(x, y)
axarr[0, 0].set_title('Axis [0,0]')
axarr[0, 1].scatter(x, y)
axarr[0, 1].set_title('Axis [0,1]')
axarr[1, 0].plot(x, y ** 2)
axarr[1, 0].set_title('Axis [1,0]')
axarr[1, 1].scatter(x, y ** 2)
axarr[1, 1].set_title('Axis [1,1]')
plt.show()
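Calculate correlation with the seaborn package

1- Load the data
colNames = ["Age", "type_employer", "fnlwgt", "Education", "Education-Num", "Martial","Occupation",
"Relationship", "Race", "Sex", "Capital Gain", "Capital Loss",
"H-per-week", "Country", "Label"]
data = pd.read_csv("adult-data.txt", names=colNames,delimiter=',',header=None)
data

2- Install seaborn (run in a shell, not in Python): conda install seaborn

3- Plot the correlation matrix as a heatmap
import seaborn as sns
import matplotlib.pyplot as plt
# %matplotlib inline is an IPython/Jupyter magic; omit it in a plain Python script
%matplotlib inline
# numeric_only=True restricts the correlation to the numeric columns
sns.heatmap(data.corr(numeric_only=True))
plt.show()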
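The matrix that the heatmap draws can be inspected on its own. This sketch builds a tiny invented table that reuses three of the column names above and computes the correlation of its numeric columns; it assumes a pandas release that accepts the numeric_only argument of corr.

```python
import pandas as pd

data = pd.DataFrame({
    "Age": [25, 35, 45, 55],
    "H-per-week": [20, 30, 40, 50],
    "Sex": ["M", "F", "F", "M"],
})

# Text columns such as Sex are left out of the correlation matrix
corr = data.corr(numeric_only=True)
print(corr)
```

Because the two numeric columns in this toy table rise in lockstep, their correlation is 1; real data will give values between -1 and 1.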
• Read data
from sklearn import preprocessing
import numpy as np
import pandas as pd
df2 = pd.read_excel('2010.xlsx')
df2

• Show numeric columns

df2.select_dtypes(include=[np.number])

• Replace empty cells with NaN

df2 = df2.replace(r'^\s*$',np.nan, regex=True)
• Drop columns in which every value is missing
df2=df2.dropna(axis='columns', how='all')
#df2.isnull().mean()
# Fill the remaining gaps in numeric columns with each column's mean
df2.fillna(df2.mean(numeric_only=True),inplace=True)

• Keep only the columns whose fraction of missing values is below a threshold of 0.5

#df2.columns[df2.isnull().mean() < 0.8]

df2=df2[df2.columns[df2.isnull().mean() < 0.5]]
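The threshold filter works because the mean of a Boolean column is the fraction of True values. A minimal sketch on an invented frame (column names full, sparse and half are illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "full": [1, 2, 3, 4],
    "sparse": [np.nan, np.nan, np.nan, 4],
    "half": [1, np.nan, 3, np.nan],
})

# isnull() gives True/False per cell; mean() turns that into a missing fraction per column
missing_fraction = df.isnull().mean()

# Keep columns with strictly less than half of their values missing
kept = df[df.columns[missing_fraction < 0.5]]

print(missing_fraction)
print(list(kept.columns))
```

Note that the comparison is strict, so a column with exactly half of its values missing is dropped.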
Find Missing Values

• Now let's see whether we have any missing values

df2.isnull()
df2.notnull()
df2.isnull()[15:20]
• It is possible to drop rows that contain NaN values:

df2 = df2.dropna()                          # drop rows with at least one NaN
df2=df2.dropna(axis='index', how='all')     # drop only rows in which every value is NaN
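The two dropna variants can remove very different numbers of rows. The sketch below compares them on a small invented frame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, np.nan, np.nan], "b": [2, 5, np.nan]})

any_dropped = df.dropna()            # default how='any': a single NaN removes the row
all_dropped = df.dropna(how="all")   # only rows that are entirely NaN are removed

print(len(any_dropped), len(all_dropped))
```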

• If a column such as IMO2 is entirely NaN, we can drop it:

df2 = df2.drop(['IMO2'], axis=1)

• Show the number of null values in each column

df2.isnull().sum()
Delete or Replace Missing Values

• Fill the NaN values in every numeric column with that column's mean

df2.fillna(df2.mean(numeric_only=True),inplace=True)

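• Suppose the IYR value is NaN for some accidents in our dataset. Let's
replace those NaN values with the mean of the column.

# Set a few IYR values to NaN for demonstration (rows at positions 1, 2 and 3)
df2.loc[df2.index[[1, 2, 3]], 'IYR'] = np.nan
# or, by row label: df2.loc[[0, 11, 12, 13, 14, 15, 16], 'IYR'] = np.nan

df2=df2.fillna({'IYR': df2['IYR'].mean()})
df2[1:10]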
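Filling one column with its own mean can be checked on a few rows. This sketch uses invented values under the same column names; passing a dict to fillna limits the fill to the listed columns.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"IYR": [2010.0, np.nan, 2012.0], "IMO": [1, 2, 3]})

# mean() skips NaN by default, so the fill value is the mean of the known years
filled = df.fillna({"IYR": df["IYR"].mean()})

print(filled)
```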
