1. Installation and Importing
Installing: pip install pandas
Importing convention: import pandas as pd
2. Reading and Writing Data
Reading data: df = pd.read_csv('filename.csv')
# extends to other formats such as JSON and Excel using pd.read_json, pd.read_excel, etc.
Writing data: df.to_csv('filename.csv')
# extends to other formats using df.to_json, df.to_excel, etc.
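A minimal read/write round trip (the file name data.csv is just an example):
import pandas as pd

df = pd.DataFrame({'name': ['a', 'b'], 'id': [1, 2]})
df.to_csv('data.csv', index=False)  # index=False skips writing the row index as a column
df2 = pd.read_csv('data.csv')       # reads the file back into a DataFrame
print(df2)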
3. Series and Dataframes
a. Creating a series
pd.Series(['a', 'b', 'c'])
b. Creating a dataframe
Row oriented: pd.DataFrame([['a', 1], ['b', 2]], columns=['name', 'id'])
Column oriented: pd.DataFrame({'name': ['a', 'b'], 'id': [1, 2]})
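Both orientations build the same dataframe; a quick check:
import pandas as pd

df_rows = pd.DataFrame([['a', 1], ['b', 2]], columns=['name', 'id'])  # list of rows
df_cols = pd.DataFrame({'name': ['a', 'b'], 'id': [1, 2]})            # dict of columns
print(df_rows.equals(df_cols))  # True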
4. Info Extraction
Shape (returns a tuple representing the dimensionality of the DataFrame): df.shape
# e.g. (2, 3) for 2 rows and 3 columns
Head (first n rows, default 5): df.head(n)
Tail (last n rows, default 5): df.tail(n)
Info (returns a summary of all columns: dtypes and non-null counts): df.info()
Describe (gives statistical information about the data): df.describe()
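A quick sketch of these inspectors on a tiny frame (the column names are made up):
import pandas as pd

df = pd.DataFrame({'age': [25, 32, 41], 'score': [88.0, 92.5, 79.0]})
print(df.shape)       # (3, 2): 3 rows, 2 columns
print(df.head(2))     # first 2 rows
df.info()             # dtypes and non-null counts per column
print(df.describe())  # count, mean, std, min, quartiles, max per numeric column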
5. Accessing and Indexing
a. Directly accessing columns and rows, as well as both together
Accessing a row:
df.loc[ei]   # ei here is the explicit (label-based) index
df.iloc[ii]  # ii here is the implicit (position-based) index
Accessing a column:
df['column_name']     # for a single column
df[['col1', 'col2']]  # for multiple columns
b. Slicing
Row:
df.loc[1:3]   # 1 and 3 are explicit indices here; loc slices include the endpoint
df.iloc[2:4]  # 2 and 4 are implicit indices here; iloc slices exclude the endpoint
Column: df.loc[:, 'a':'b']
Both rows and columns: df.loc[1:3, 'a':'b']  # 1 and 3 are explicit indices here
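A small sketch of the endpoint difference between loc and iloc:
import pandas as pd

df = pd.DataFrame({'a': [10, 20, 30, 40, 50]})
print(df.loc[1:3])   # label-based: rows 1, 2 and 3 (endpoint included)
print(df.iloc[2:4])  # position-based: rows 2 and 3 (endpoint excluded)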
c. Feature exploration (masking, filtering)
Masking (creates a boolean mask based on our required condition):
df['col'] > value
# E.g. df['age'] > 30
Filtering (filters data based on conditions):
df.loc[(df['col1'] == val1) & (df['col2'] == val2)]
# E.g. df.loc[(df['month'] == 'January') & (df['year'] == '2022')]
# keeps only the data for January 2022
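A mask-then-filter sketch (the column names are assumptions):
import pandas as pd

df = pd.DataFrame({'age': [25, 35, 45], 'city': ['NY', 'LA', 'NY']})
mask = df['age'] > 30                       # boolean Series: [False, True, True]
print(df.loc[mask & (df['city'] == 'NY')])  # rows where both conditions hold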
6. Dataframe Manipulation
a. Adding a new row/column
Row: df.loc[new_label] = values
# E.g. df.loc[len(df.index)] = ['a', 1]
# this adds a row at the end of the dataframe
Column: df['new_col'] = data
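A short sketch of both additions:
import pandas as pd

df = pd.DataFrame({'name': ['a'], 'id': [1]})
df.loc[len(df.index)] = ['b', 2]  # appends a row at the next integer label
df['flag'] = [True, False]        # adds a new column
print(df)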
b. Deleting a row/column
Row: df.drop(labels, axis=0)
# E.g. df.drop(3, axis=0)
# here 3 is the explicit index; axis=0 selects rows
Column: df.drop('col_name', axis=1)
c. Renaming a column
Column: df.rename({'old_name': 'new_name'}, axis=1)
Row: df.index = new_indices
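Note that the axis argument sits outside the mapping; a quick check of the call:
import pandas as pd

df = pd.DataFrame({'old_name': [1, 2]})
df = df.rename({'old_name': 'new_name'}, axis=1)  # mapping first, then axis=1 for columns
print(df.columns)  # Index(['new_name'], dtype='object')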
d. Duplicates and dropping duplicates
i. Find duplicate rows
df.duplicated(subset=None, keep='first')
# subset can be used to specify certain column(s) for identifying the duplicates
# keep determines which duplicates to mark:
#   first: mark duplicates as True except for the first occurrence
#   last: mark duplicates as True except for the last occurrence
#   False: mark all duplicates as True
# returns a boolean Series with each duplicate row marked as True
ii. Drop duplicate values
df.drop_duplicates(subset=None, keep='first')
# parameters have the same meaning as in df.duplicated, except here the rows marked as duplicates are dropped
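A small sketch of both calls:
import pandas as pd

df = pd.DataFrame({'name': ['a', 'b', 'a'], 'id': [1, 2, 1]})
print(df.duplicated())                  # [False, False, True]: row 2 repeats row 0
print(df.drop_duplicates())             # keeps the first occurrence of each row
print(df.drop_duplicates(keep='last'))  # keeps the last occurrence instead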
7. Operations
a. Sorting
df.sort_values(['col1'], ascending=[True])
b. Built-in ops
● Built-in ops such as mean, min, max, etc.
● E.g. df['col1'].min(), df['col1'].count(), etc.
c. Apply
Applies a function along one of the axes of the dataframe:
df['col'].apply(function)
E.g. data[['revenue', 'budget']].apply(np.sum, axis=1)
# sums the values of revenue and budget across each row
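A sketch combining the three operations (np is numpy; the column names are made up):
import numpy as np
import pandas as pd

data = pd.DataFrame({'revenue': [100, 50], 'budget': [80, 60]})
print(data.sort_values(['revenue'], ascending=[True]))    # sorts rows by revenue
print(data['revenue'].min(), data['revenue'].count())     # built-in aggregations
print(data[['revenue', 'budget']].apply(np.sum, axis=1))  # row-wise sums: 180, 110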
8. Joins
a. Concat
pd.concat([df1, df2], axis=0)
# axis=0 stacks the frames vertically; use axis=1 to concatenate horizontally
b. Merge
df1.merge(df2, on='foreign_key', how='type_of_join')
● Optional: left_on and right_on (when the key columns have different names)
● E.g. df1.merge(df2, on='id', how='inner')
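A sketch of both joins on tiny made-up frames:
import pandas as pd

df1 = pd.DataFrame({'id': [1, 2], 'name': ['a', 'b']})
df2 = pd.DataFrame({'id': [2, 3], 'score': [90, 80]})
print(pd.concat([df1, df2], axis=0))         # stacks rows; unmatched columns become NaN
print(df1.merge(df2, on='id', how='inner'))  # keeps only id=2, present in both frames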
9. Groupby
Grouping with a single aggregate: df.groupby('group_col_name')['col(s)'].aggregate_function()
E.g. df.groupby('director_name')['title'].count()
# finds the number of titles per director
Grouping with multiple aggregates: df.groupby(['group_col_name'])['col'].aggregate(['func1', 'func2'])
E.g. df.groupby(['director_name'])['year'].aggregate(['min', 'max'])
# finds the first and most recent year of movies made by each director
Group-based filtering: df.groupby('group_col_name').filter(function returning a boolean per group)
E.g. data.groupby('director_name').filter(lambda x: x['budget'].max() >= 100)
# keeps all rows of those directors whose maximum budget is at least 100 million
Group-based apply: df.groupby('group_col_name').apply(function)
E.g.
def func(x):
    x['risky'] = x['budget'] - x['revenue'].mean() >= 0
    return x
data_risky = data.groupby('director_name').apply(func)
# finds movies whose budget is higher than their director's average revenue
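A runnable sketch of the first three groupby patterns (tiny made-up data):
import pandas as pd

data = pd.DataFrame({'director_name': ['X', 'X', 'Y'],
                     'title': ['t1', 't2', 't3'],
                     'year': [2001, 2010, 2005],
                     'budget': [120, 80, 50]})
print(data.groupby('director_name')['title'].count())                     # titles per director
print(data.groupby(['director_name'])['year'].aggregate(['min', 'max']))  # first/last year
print(data.groupby('director_name').filter(lambda x: x['budget'].max() >= 100))  # only X's rows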
10. Cleaning our data
a. None and NaN
● NaN is used for columns with numbers as their values
● None is used for columns with non-number entries (e.g. string, object type, etc.)
● Check for null values using isna()
○ E.g. df.isna() # returns a dataframe with True/False for null values in each element's position
○ df.isna().sum() # returns the number of null values per column; modify with df.isna().sum(axis=1) for each row's null count
○ df.isna().sum().sum() # returns the total number of null values
b. Filling null values
df.fillna(n) # fills null values with the value n
c. Dropping null values
df.dropna(axis=0)
# default axis=0 drops rows; use axis=1 for columns
# drops rows/columns with even a single missing value
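A sketch of the null-handling calls:
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1.0, np.nan, 3.0], 'b': [None, 'x', 'y']})
print(df.isna().sum())        # nulls per column: a -> 1, b -> 1
print(df.isna().sum().sum())  # total number of nulls: 2
print(df.fillna(0))           # replaces every null with 0
print(df.dropna(axis=0))      # keeps only rows with no missing values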
11. Data Restructuring
Melt (converts a dataframe from wide to long format):
pd.melt(df, id_vars=['list of columns'])
E.g. pd.melt(data, id_vars=['Date', 'Parameter', 'Drug_Name'])
# melts all the columns except the ones mentioned in the id_vars list
Pivot (the opposite of melt; converts a dataframe from long to wide format and outputs a multi-index dataframe):
df.pivot(index=['list of columns'], columns='col_name', values='col_name')
E.g. data_melt.pivot(index=['Date', 'Drug_Name', 'Parameter'], columns='time', values='reading')
# keeps the index columns mentioned as constant, while making new columns from the "time" column, whose values will be the ones in the "reading" column
Cut (bins continuous data into categorical groups):
df['new_cat_column'] = pd.cut(df['continuous_col'], bins=bin_values, labels=label_values)
E.g. data_tidy['temp_cat'] = pd.cut(data_tidy['Temperature'], bins=temp_points, labels=temp_labels)
# bins the Temperature column into the respective bins, labelled as per temp_labels
Shift (shifts the values of rows/columns):
df['col'].shift(periods=n, axis=0)
E.g. df['Marks'].shift(periods=1, axis=0)
# shifts the values of the Marks column by one: the first row's value becomes NaN, the second row takes the first row's value, and so on
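A melt/pivot round trip on tiny made-up data (a list-valued pivot index needs pandas >= 1.1):
import pandas as pd

wide = pd.DataFrame({'Date': ['d1', 'd2'], 'Drug_Name': ['x', 'y'],
                     'morning': [1, 2], 'evening': [3, 4]})
long = pd.melt(wide, id_vars=['Date', 'Drug_Name'],
               var_name='time', value_name='reading')  # wide -> long
back = long.pivot(index=['Date', 'Drug_Name'],
                  columns='time', values='reading')    # long -> wide again
print(long)
print(back)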
12. Misc Topics
a. Datetime
i. Convert to a datetime object: pd.to_datetime(df['col'])
ii. Extracting information
df['col'][0].year
# extracts the year for the 0th-index value (0 here is the implicit index)
# use .month and .day for the respective data
df['col'].dt.year
# extracts the year for the whole column (all the datetime values)
df['col'][0].strftime('%m%Y')
# formats the selected datetime (the 0th-index value here) into the required format (month and year in this case; note that %m is month, while %M is minutes)
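A quick datetime sketch:
import pandas as pd

df = pd.DataFrame({'col': ['2022-01-15', '2023-06-30']})
df['col'] = pd.to_datetime(df['col'])  # converts strings to datetime64
print(df['col'].dt.year)               # 2022, 2023
print(df['col'][0].strftime('%m%Y'))   # '012022': month, then year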
b. String functions
We can use .str to apply string functions to any column:
df['col'].str.function()
E.g.
i. data_tidy['Date'].str.split('-')
# splits the "Date" column into elements separated by "-"
ii. data_tidy.loc[data_tidy['Drug_Name'].str.contains('hydrochloride')]
# keeps only the rows whose Drug_Name contains the string "hydrochloride"
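A sketch of the .str accessor (the data is made up):
import pandas as pd

data_tidy = pd.DataFrame({'Date': ['01-02-2022', '15-03-2022'],
                          'Drug_Name': ['x hydrochloride', 'y citrate']})
print(data_tidy['Date'].str.split('-'))  # lists like ['01', '02', '2022']
print(data_tidy.loc[data_tidy['Drug_Name'].str.contains('hydrochloride')])  # matching rows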