KEMBAR78
Pandas Cheatsheet | PDF | Computer Data | Data
0% found this document useful (0 votes)
51 views11 pages

Pandas Cheatsheet

The document provides a comprehensive overview of various pandas functions for data manipulation and analysis in Python. It covers reading and writing data from different file formats, viewing and selecting data, handling missing values, sorting, grouping, merging DataFrames, and performing statistical operations. Additionally, it includes plotting functions for visualizing data.

Uploaded by

AVISHEKH GANGULY
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
51 views11 pages

Pandas Cheatsheet

The document provides a comprehensive overview of various pandas functions for data manipulation and analysis in Python. It covers reading and writing data from different file formats, viewing and selecting data, handling missing values, sorting, grouping, merging DataFrames, and performing statistical operations. Additionally, it includes plotting functions for visualizing data.

Uploaded by

AVISHEKH GANGULY
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 11

Tajamul Khan

@Tajamulkhann
pd.read_csv(filename): Read data from a
CSV file.
pd.read_table(filename): Read data from
a delimited text file.
pd.read_excel(filename): Read data from
an Excel file.
pd.read_sql(query, connection_object):
Read data from a SQL table/database.
pd.read_json(json_string): Read data
from a JSON formatted string, URL, or file.
pd.read_html(url): Parse an HTML URL,
string, or file to extract tables to a list of
DataFrames.
pd.DataFrame(dict): Create a DataFrame
from a dictionary (keys as column names,
values as lists).
df.to_csv(filename): Write to a CSV file.
df.to_excel(filename): Write to an Excel
file.
df.to_sql(table_nm, connection_object):
Write to a SQL table.
df.to_json(filename): Write to a file in
JSON format.

@Tajamulkhann
df.head(): View the first 5 rows of
the DataFrame.
df.tail(): View the last 5 rows of the
DataFrame.
df.sample(): View the random 5
rows of the DataFrame.
df.shape: Get the dimensions of the
DataFrame.
df.info(): Get a concise summary of
the DataFrame.
df.describe(): Summary statistics
for numerical columns.
df.dtypes: Check data types of
columns.
df.columns: List column names.
df.index: Display the index range.

@Tajamulkhann
df['column']: Select a single column.
df[['col1', 'col2']]: Select multiple
columns.
df.iloc[0]: Select the first row by
position.
df.loc[0]: Select the first row by
index label.
df.iloc[0, 0]: Select a specific
element by position.
df.loc[0, 'column']: Select a specific
element by label.
df[df['col'] > 5]: Filter rows where
column > 5.
df.iloc[0:5, 0:2]: Slice rows and
columns.
df.set_index('column'): Set a
column as the index.

@Tajamulkhann
df.isnull(): Check for null values.
df.notnull(): Check for non-null
values.
df.dropna(): Drop rows with null
values.
df.fillna(value): Replace null values
with a specific value.
df.replace(1, 'one'): Replace specific
values.
df.rename(columns={'old': 'new'}):
Rename columns.
df.astype('int'): Change data type of
a column.
df.drop_duplicates(): Remove
duplicate rows.
df.reset_index(): Reset the index.

@Tajamulkhann
df.sort_values('col'): Sort by column
in ascending order.
df.sort_values('col',
ascending=False): Sort by column in
descending order.
df.sort_values(['col1', 'col2'],
ascending=[True, False]): Sort by
multiple columns.
df[df['col'] > 5]: Filter rows based on
condition.
df.query('col > 5'): Filter using a
query string.
df.sample(5): Randomly select 5
rows.
df.nlargest(3, 'col'): Get top 3 rows
by column.
df.nsmallest(3, 'col'): Get bottom 3
rows by column.
df.filter(like='part'): Filter columns
by substring.

@Tajamulkhann
df.groupby('col'): Group by a
column.
df.groupby('col').mean(): Mean of
groups.
df.groupby('col').sum(): Sum of
groups.
df.groupby('col').count(): Count
non-null values in groups.
df.groupby('col')['other_col'].max():
Max value in another column for
groups.
df.pivot_table(values='col',
index='group', aggfunc='mean'):
Create a pivot table.
df.agg({'col1': 'mean', 'col2': 'sum'}):
Aggregate multiple columns.
df.apply(np.mean): Apply a function
to columns.
df.transform(lambda x: x + 10):
Transform data column-wise.

@Tajamulkhann
pd.concat([df1, df2]): Concatenate
DataFrames vertically.
pd.concat([df1, df2], axis=1):
Concatenate DataFrames
horizontally.
df1.merge(df2, on='key'): Merge two
DataFrames on a key.
df1.join(df2): SQL-style join.
df1.append(df2): Append rows of
one DataFrame to another.
pd.merge(df1, df2, how='outer',
on='key'): Outer join.
pd.merge(df1, df2, how='inner',
on='key'): Inner join.
pd.merge(df1, df2, how='left',
on='key'): Left join.
pd.merge(df1, df2, how='right',
on='key'): Right join.

@Tajamulkhann
df.mean(): Column-wise mean.
df.median(): Column-wise
median.
df.std(): Column-wise standard
deviation.
df.var(): Column-wise variance.
df.sum(): Column-wise sum.
df.min(): Column-wise
minimum.
df.max(): Column-wise
maximum.
df.count(): Count of non-null
values per column.
df.corr(): Correlation matrix.

@Tajamulkhann
df.plot(kind='line'): Line plot.
df.plot(kind='bar'): Vertical bar
plot.
df.plot(kind='barh'): Horizontal
bar plot.
df.plot(kind='hist'): Histogram.
df.plot(kind='box'): Box plot.
df.plot(kind='kde'): Kernel
density estimation plot.
df.plot(kind='pie', y='col'): Pie
chart.
df.plot.scatter(x='c1', y='c2'):
Scatter plot.
df.plot(kind='area'): Area plot.

@Tajamulkhann
Follow for more!

You might also like