Tajamul Khan
@Tajamulkhann
pd.read_csv(filename): Read data from a
CSV file.
pd.read_table(filename): Read data from
a delimited text file.
pd.read_excel(filename): Read data from
an Excel file.
pd.read_sql(query, connection_object):
Read data from a SQL table/database.
pd.read_json(path_or_buf): Read data
from a JSON file, URL, or file-like
object (wrap a raw JSON string in
io.StringIO).
pd.read_html(url): Parse an HTML URL,
string, or file to extract tables to a
list of DataFrames.
pd.DataFrame(dict): Create a DataFrame
from a dictionary (keys as column
names, values as lists).
df.to_csv(filename): Write to a CSV
file.
df.to_excel(filename): Write to an
Excel file.
df.to_sql(table_nm, connection_object):
Write to a SQL table.
df.to_json(filename): Write to a file
in JSON format.
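
A minimal round-trip sketch of the read/write calls above. It assumes an in-memory buffer in place of a real filename, and the column names and values are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical data: dictionary keys become column names, lists become values.
df = pd.DataFrame({"name": ["Ana", "Ben", "Cy"], "score": [90, 85, 77]})

# With no path given, to_csv returns the CSV text instead of writing a file.
# index=False leaves the row index out of the output.
csv_text = df.to_csv(index=False)

# A StringIO buffer stands in for a filename when reading back.
df_back = pd.read_csv(io.StringIO(csv_text))

# JSON round trip: wrap the JSON string in StringIO before reading it.
json_text = df.to_json(orient="records")
df_json = pd.read_json(io.StringIO(json_text))

print(df_back.equals(df))
```

Passing a path such as 'data.csv' (a placeholder name) to to_csv and read_csv works the same way, but writes to and reads from disk.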
df.head(): View the first 5 rows
of the DataFrame.
df.tail(): View the last 5 rows
of the DataFrame.
df.sample(): View one random row
of the DataFrame (pass n for more).
df.shape: Get the dimensions of
the DataFrame.
df.info(): Get a concise summary
of the DataFrame.
df.describe(): Summary
statistics for numerical
columns.
df.dtypes: Check data types of
columns.
df.columns: List column names.
df.index: Get the row index
(the row labels).
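
A short sketch of the inspection calls above on a small made-up DataFrame. The column names are assumptions chosen for illustration, and random_state is set only to make the sampled row repeatable.

```python
import pandas as pd

# Hypothetical eight-row frame with one text and one numeric column.
df = pd.DataFrame({
    "city": list("ABCDEFGH"),
    "temp": [21.5, 19.0, 25.1, 18.3, 22.2, 20.0, 23.4, 17.8],
})

first = df.head()                    # first 5 rows by default
last_two = df.tail(2)                # pass n to change the row count
one_row = df.sample(random_state=0)  # a single random row by default
rows, cols = df.shape                # (number of rows, number of columns)
stats = df.describe()                # numeric columns only by default

print(rows, cols)
print(df.dtypes)
```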
df['column']: Select a single
column.
df[['col1', 'col2']]: Select
multiple columns.
df.iloc[0]: Select the first row
by position.
df.loc[0]: Select the row whose
index label is 0.
df.iloc[0, 0]: Select a specific
element by position.
df.loc[0, 'column']: Select a
specific element by label.
df[df['col'] > 5]: Filter rows
where column > 5.
df.iloc[0:5, 0:2]: Slice rows
and columns.
df.set_index('column'): Set a
column as the index.
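
The difference between position (iloc) and label (loc) only shows up when the index is not 0, 1, 2, and so on. The sketch below assumes a made-up DataFrame whose labels are deliberately out of positional order.

```python
import pandas as pd

df = pd.DataFrame(
    {"col": [3, 8, 12], "other": ["x", "y", "z"]},
    index=[10, 0, 20],  # labels chosen so that position and label disagree
)

by_position = df.iloc[0]          # first row in order (label 10)
by_label = df.loc[0]              # row labelled 0 (the second row)
cell = df.loc[20, "col"]          # single element by row and column label
filtered = df[df["col"] > 5]      # boolean mask keeps matching rows
block = df.iloc[0:2, 0:1]         # rows 0-1, column 0; end positions excluded
indexed = df.set_index("other")   # returns a new DataFrame; df is unchanged

print(by_position["col"], by_label["col"])
```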
df.isnull(): Check for null
values.
df.notnull(): Check for non-null
values.
df.dropna(): Drop rows with null
values.
df.fillna(value): Replace null
values with a specific value.
df.replace(1, 'one'): Replace
specific values.
df.rename(columns={'old':
'new'}): Rename columns.
df['col'].astype('int'): Change
the data type of a column
(df.astype('int') converts every
column).
df.drop_duplicates(): Remove
duplicate rows.
df.reset_index(): Reset the
index.
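
A sketch that chains the cleaning calls above, assuming a small made-up DataFrame with one missing value per column. Each call returns a new object, so the results are assigned to new names.

```python
import numpy as np
import pandas as pd

# Hypothetical data with a missing number and a missing label.
df = pd.DataFrame({
    "old": [1.0, np.nan, 1.0, 4.0],
    "tag": ["a", "b", "a", None],
})

null_counts = df.isnull().sum()   # number of nulls per column
dropped = df.dropna()             # keep only rows with no nulls

# A dict gives a different fill value for each column.
filled = df.fillna({"old": 0.0, "tag": "missing"})

renamed = filled.rename(columns={"old": "new"})
renamed["new"] = renamed["new"].astype("int")  # convert a single column

# drop=True discards the old index instead of keeping it as a column.
deduped = renamed.drop_duplicates().reset_index(drop=True)

print(deduped)
```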
df.sort_values('col'): Sort by
column in ascending order.
df.sort_values('col',
ascending=False): Sort by column
in descending order.
df.sort_values(['col1', 'col2'],
ascending=[True, False]): Sort
by multiple columns.
df[df['col'] > 5]: Filter rows
based on condition.
df.query('col > 5'): Filter
using a query string.
df.sample(5): Randomly select 5
rows.
df.nlargest(3, 'col'): Get top 3
rows by column.
df.nsmallest(3, 'col'): Get
bottom 3 rows by column.
df.filter(like='part'): Filter
columns by substring.
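
The sorting and filtering calls above, sketched on a made-up DataFrame. The column names, including 'part_a', are assumptions chosen so that filter(like='part') has something to match.

```python
import pandas as pd

df = pd.DataFrame({
    "col": [7, 2, 9, 4],
    "part_a": [1, 1, 2, 2],
    "label": list("wxyz"),
})

desc = df.sort_values("col", ascending=False)

# Sort by part_a ascending, then by col descending within each part_a value.
multi = df.sort_values(["part_a", "col"], ascending=[True, False])

masked = df[df["col"] > 5]        # boolean-mask filter
queried = df.query("col > 5")     # same filter written as a query string

top2 = df.nlargest(2, "col")      # rows with the two largest col values
part_cols = df.filter(like="part")  # columns whose name contains 'part'

print(multi)
```

The mask and the query string select the same rows; the query form is mainly a readability choice when conditions get long.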
df.groupby('col'): Group by a
column.
df.groupby('col').mean(): Mean
of groups (pass numeric_only=True
if non-numeric columns remain).
df.groupby('col').sum(): Sum of
groups.
df.groupby('col').count(): Count
non-null values in groups.
df.groupby('col')['other_col'].max():
Max value in another column for
groups.
df.pivot_table(values='col',
index='group', aggfunc='mean'):
Create a pivot table.
df.agg({'col1': 'mean', 'col2':
'sum'}): Aggregate multiple
columns.
df.apply(np.mean): Apply a
function to columns.
df.transform(lambda x: x + 10):
Transform data column-wise.
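
A sketch of the grouping and aggregation calls above, assuming a made-up DataFrame with a 'group' column to split on. Selecting the value column before aggregating keeps non-numeric columns out of the calculation.

```python
import pandas as pd

df = pd.DataFrame({
    "group": ["a", "a", "b", "b"],
    "col": [1.0, 3.0, 10.0, 20.0],
    "other_col": [5, 6, 7, 8],
})

means = df.groupby("group")["col"].mean()        # one mean per group
maxes = df.groupby("group")["other_col"].max()   # max of another column

# A pivot table with one index column gives the same per-group means.
pivot = df.pivot_table(values="col", index="group", aggfunc="mean")

# A dict maps each column to its own aggregation.
summary = df.agg({"col": "mean", "other_col": "sum"})

# transform keeps the original shape, unlike an aggregation.
shifted = df[["col", "other_col"]].transform(lambda x: x + 10)

print(means)
```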
pd.concat([df1, df2]):
Concatenate DataFrames
vertically.
pd.concat([df1, df2], axis=1):
Concatenate DataFrames
horizontally.
df1.merge(df2, on='key'): Merge
two DataFrames on a key.
df1.join(df2): Join on the index
(left join by default).
pd.concat([df1, df2],
ignore_index=True): Append rows of
one DataFrame to another
(df1.append was removed in
pandas 2.0).
pd.merge(df1, df2, how='outer',
on='key'): Outer join.
pd.merge(df1, df2, how='inner',
on='key'): Inner join.
pd.merge(df1, df2, how='left',
on='key'): Left join.
pd.merge(df1, df2, how='right',
on='key'): Right join.
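
The join types above differ only in which keys survive. The sketch below assumes two made-up DataFrames that share keys 2 and 3; the value column names are hypothetical.

```python
import pandas as pd

df1 = pd.DataFrame({"key": [1, 2, 3], "left_val": ["a", "b", "c"]})
df2 = pd.DataFrame({"key": [2, 3, 4], "right_val": ["x", "y", "z"]})

inner = pd.merge(df1, df2, how="inner", on="key")  # keys in both frames
outer = pd.merge(df1, df2, how="outer", on="key")  # keys in either frame
left = df1.merge(df2, how="left", on="key")        # every key from df1

# join aligns on the index, so move the key into the index first.
joined = df1.set_index("key").join(df2.set_index("key"))

# pd.concat stacks rows; ignore_index=True renumbers the result from 0.
stacked = pd.concat([df1, df1], ignore_index=True)

print(outer)
```

Keys without a match on the other side get missing values in the columns that come from that side.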
df.mean(): Column-wise mean.
df.median(): Column-wise
median.
df.std(): Column-wise standard
deviation.
df.var(): Column-wise
variance.
df.sum(): Column-wise sum.
df.min(): Column-wise minimum.
df.max(): Column-wise maximum.
df.count(): Count of non-null
values per column.
df.corr(numeric_only=True):
Correlation matrix of the numeric
columns.
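
A sketch of the column-wise statistics above on made-up data. It assumes the frame contains a text column, which is dropped with select_dtypes first so the numeric reductions do not fail on it.

```python
import pandas as pd

df = pd.DataFrame({
    "x": [1.0, 2.0, 3.0, 4.0],
    "y": [2.0, 4.0, 6.0, 8.0],
    "name": list("abcd"),
})

numeric = df.select_dtypes("number")  # keep only the numeric columns

means = numeric.mean()
medians = numeric.median()
stds = numeric.std()      # sample standard deviation (ddof=1) by default
counts = df.count()       # non-null values per column, any dtype
corr = numeric.corr()     # pairwise Pearson correlation by default

print(means)
print(corr)
```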
df.plot(kind='line'): Line
plot.
df.plot(kind='bar'): Vertical
bar plot.
df.plot(kind='barh'):
Horizontal bar plot.
df.plot(kind='hist'):
Histogram.
df.plot(kind='box'): Box
plot.
df.plot(kind='kde'): Kernel
density estimation plot.
df.plot(kind='pie', y='col'):
Pie chart.
df.plot.scatter(x='c1',
y='c2'): Scatter plot.
df.plot(kind='area'): Area
plot.
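
A minimal plotting sketch, assuming matplotlib is installed (pandas uses it as its default plotting backend). The 'Agg' backend and the in-memory buffer are choices made here so the example runs without a display or an output file; the data is made up.

```python
import io

import matplotlib

matplotlib.use("Agg")  # non-interactive backend, no window needed

import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({"c1": [1, 2, 3, 4], "c2": [2, 3, 5, 8]})

# Each plot call returns a matplotlib Axes object.
ax_line = df.plot(kind="line", title="Line plot")  # one line per column
ax_scatter = df.plot.scatter(x="c1", y="c2")

# Save the scatter figure as PNG bytes into a buffer instead of a file.
buffer = io.BytesIO()
ax_scatter.figure.savefig(buffer, format="png")

n_lines = len(ax_line.lines)
x_label = ax_scatter.get_xlabel()

plt.close("all")  # release the figures
```

In a notebook or script with a display, plt.show() would render the figures instead of saving them.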