Pandas Notes for Machine Learning
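pandas is a Python library for data manipulation and analysis, widely used in Machine Learning
projects.
This document lists the most commonly used pandas methods with short explanations and examples.
The examples assume import pandas as pd and an existing DataFrame named df; file, table, and column names are placeholders.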
Data Loading
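pd.read_csv() - Read a CSV file into a DataFrame
Example: df = pd.read_csv('data.csv')
pd.read_excel() - Read an Excel sheet into a DataFrame
Example: df = pd.read_excel('data.xlsx', sheet_name='Sheet1')
pd.read_json() - Read a JSON file into a DataFrame
Example: df = pd.read_json('data.json')
pd.read_sql() - Read the result of a SQL query or a whole table (needs an open database connection, here conn)
Example: df = pd.read_sql('SELECT * FROM table_name', conn)
pd.read_html() - Read HTML tables (returns a list of DataFrames, one per table found)
Example: tables = pd.read_html('page.html')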
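The loading functions above can be tried without any files on disk. The sketch below is only an illustration: the sample data, the in-memory SQLite database, and the table name people are all made up for the example.

```python
import io
import sqlite3

import pandas as pd

# CSV: read_csv accepts a path or any file-like object.
csv_text = "name,age\nAna,31\nBen,27\n"
df_csv = pd.read_csv(io.StringIO(csv_text))

# JSON: a list of records becomes one row per record.
json_text = '[{"name": "Ana", "age": 31}, {"name": "Ben", "age": 27}]'
df_json = pd.read_json(io.StringIO(json_text))

# SQL: an in-memory SQLite database stands in for a real server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, age INTEGER)")
conn.executemany("INSERT INTO people VALUES (?, ?)", [("Ana", 31), ("Ben", 27)])
df_sql = pd.read_sql("SELECT * FROM people", conn)
conn.close()

print(df_csv.shape, df_json.shape, df_sql.shape)
```

In a real project the StringIO objects would be replaced by file paths and the SQLite connection by a connection to the project's database.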
Data Inspection
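df.head(n) - Display the first n rows (default 5)
Example: df.head(10)
df.tail(n) - Display the last n rows (default 5)
Example: df.tail(10)
df.info() - Summary of the DataFrame (column dtypes, non-null counts, memory usage)
Example: df.info()
df.describe() - Statistical summary of the numeric columns
Example: df.describe()
df.shape - Tuple of (number of rows, number of columns); an attribute, so no parentheses
Example: df.shape
df.columns - Column names; an attribute, so no parentheses
Example: df.columns
df.dtypes - Data type of each column; an attribute, so no parentheses
Example: df.dtypes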
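A first look at a new dataset usually combines several of the inspection tools above. The following is a minimal sketch on a small made-up DataFrame; the column names age, income, and city are assumptions for illustration only.

```python
import pandas as pd

df = pd.DataFrame({
    "age": [25, 32, 47, 51],
    "income": [40000.0, 52000.0, 61000.0, 58000.0],
    "city": ["Pune", "Delhi", "Pune", "Mumbai"],
})

first_two = df.head(2)            # first 2 rows
last_one = df.tail(1)             # last row
n_rows, n_cols = df.shape         # attribute, not a method call
column_names = list(df.columns)   # attribute, not a method call
stats = df.describe()             # numeric columns only by default

df.info()                         # prints dtypes and non-null counts
print(df.dtypes)
print(stats)
```

Note that describe() skips the text column city here, because by default it summarizes only numeric columns.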
Data Cleaning
df.isnull() - Flag missing values (True where a value is missing)
Example: df.isnull().sum()  # missing values per column
df.dropna() - Remove rows that contain missing values
Example: df = df.dropna()
df.fillna(value) - Replace missing values with value
Example: df = df.fillna(0)
df.duplicated() - Flag rows that repeat an earlier row
Example: df.duplicated().sum()  # number of duplicate rows
df.drop_duplicates() - Remove duplicate rows
Example: df = df.drop_duplicates()
df.rename() - Rename columns
Example: df = df.rename(columns={'old_name': 'new_name'})
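The cleaning methods above return new DataFrames, so they chain naturally. Here is a small sketch on made-up data; filling missing ages with the median is just one possible choice, not a recommendation for every dataset.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Age": [25.0, np.nan, 47.0, 47.0],
    "City": ["Pune", "Delhi", "Mumbai", "Mumbai"],
})

missing_per_column = df.isnull().sum()       # count of NaN in each column
n_duplicates = int(df.duplicated().sum())    # last row repeats the third one

cleaned = (
    df.drop_duplicates()                         # drop the repeated row
      .fillna({"Age": df["Age"].median()})       # fill missing ages (assumed strategy)
      .rename(columns={"Age": "age", "City": "city"})
)

dropped = df.dropna()                        # alternative: discard incomplete rows

print(missing_per_column)
print(cleaned)
```

None of these calls modify df in place; the result has to be assigned to a name to be kept.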
Data Selection
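df['column'] - Select a single column (returns a Series)
Example: df['column']
df[['column1', 'column2']] - Select multiple columns (returns a DataFrame)
Example: df[['column1', 'column2']]
df.loc[] - Select rows and columns by label (slice end is included)
Example: df.loc[0:5, ['column1', 'column2']]
df.iloc[] - Select rows and columns by integer position (slice end is excluded)
Example: df.iloc[0:5, 0:2]
df.at[] - Access a single value by row and column label
Example: df.at[0, 'column']
df.iat[] - Access a single value by row and column integer position
Example: df.iat[0, 1]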
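The difference between label-based and position-based selection is easiest to see with a non-numeric index. The sketch below uses a made-up DataFrame with row labels a, b, and c.

```python
import pandas as pd

df = pd.DataFrame(
    {"age": [25, 32, 47], "income": [40000, 52000, 61000]},
    index=["a", "b", "c"],
)

ages = df["age"]                    # single column -> Series
subset = df[["age", "income"]]      # list of columns -> DataFrame

by_label = df.loc["a":"b", "age"]   # label slice: end label "b" is included
by_position = df.iloc[0:1, 0]       # position slice: end position 1 is excluded

one_by_label = df.at["b", "income"]  # single value by labels
one_by_position = df.iat[2, 0]       # single value by integer positions

print(by_label)
print(by_position)
```

at and iat do the same job as loc and iloc for a single cell, and are meant for fast scalar access.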
Data Filtering
df[df['column'] > value] - Filter rows based on a condition
Example: df[df['column'] > 10]
df.query() - Filter rows with a condition written as a string
Example: df.query('column > 10')
df[df['column'].isin()] - Keep rows whose value is in a given list
Example: df[df['column'].isin(['a', 'b'])]
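The three filtering styles above can be compared side by side. This is a minimal sketch on made-up data; it also shows how to combine two conditions, which requires & and parentheses rather than the Python keyword and.

```python
import pandas as pd

df = pd.DataFrame({
    "age": [25, 32, 47, 51],
    "city": ["Pune", "Delhi", "Pune", "Mumbai"],
})

over_30 = df[df["age"] > 30]                        # boolean mask
over_30_query = df.query("age > 30")                # same filter as a string
in_cities = df[df["city"].isin(["Pune", "Mumbai"])]  # membership test

# Combining conditions: wrap each one in parentheses and use & (and) or | (or).
combined = df[(df["age"] > 30) & (df["city"] == "Pune")]

print(over_30)
print(combined)
```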
Data Transformation
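df.apply() - Apply a custom function to each column (or to each row with axis=1)
Example: df.apply(lambda col: col.max() - col.min())
df.map() - Apply a function to each element (DataFrame.map needs pandas 2.1 or later; older versions use df.applymap(); Series.map works on a single column)
Example: df.map(lambda x: x * 2)
df.replace() - Replace values
Example: df = df.replace('old_value', 'new_value')
df.astype() - Change data type
Example: df['column'] = df['column'].astype('float64')
df.sort_values() - Sort rows by the values in one or more columns
Example: df = df.sort_values(by='column', ascending=False)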
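In ML preprocessing these transformations often appear together: fixing a dtype, encoding a category, deriving a feature, and sorting. The sketch below is one possible sequence on made-up data; the column names and the 0/1 encoding are assumptions for illustration. It uses Series.map on a single column, which is available in all pandas versions.

```python
import pandas as pd

df = pd.DataFrame({
    "age": ["25", "32", "47"],          # numbers stored as text
    "gender": ["M", "F", "F"],
    "income": [40000.0, 52000.0, 61000.0],
})

df["age"] = df["age"].astype(int)                         # text -> integer
df["gender_code"] = df["gender"].map({"M": 0, "F": 1})    # element-wise lookup
df["income_k"] = df["income"].apply(lambda x: x / 1000)   # custom function per value
df = df.replace({"gender": {"M": "male", "F": "female"}})  # replace within one column
df = df.sort_values(by="income", ascending=False)         # highest income first

print(df)
```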
Grouping and Aggregation
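df.groupby() - Group rows by the values in one or more columns
Example: df.groupby('column1')['column2'].mean()
df.agg() - Aggregate data with one or more functions
Example: df.agg({'column': ['min', 'max']})
df.pivot_table() - Create pivot tables
Example: df.pivot_table(values='column3', index='column1', columns='column2', aggfunc='mean')
pd.crosstab() - Cross tabulation (a top-level pandas function, not a DataFrame method)
Example: pd.crosstab(df['column1'], df['column2'])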
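The grouping tools above answer slightly different questions about the same data. Here is a small sketch on a made-up table of incomes by city and gender.

```python
import pandas as pd

df = pd.DataFrame({
    "city": ["Pune", "Pune", "Delhi", "Delhi"],
    "gender": ["M", "F", "F", "F"],
    "income": [40, 60, 50, 70],
})

# One statistic per group.
mean_income = df.groupby("city")["income"].mean()

# Several statistics per group.
summary = df.groupby("city")["income"].agg(["min", "max"])

# Mean income for every city/gender combination (NaN where no rows exist).
pivot = df.pivot_table(values="income", index="city", columns="gender", aggfunc="mean")

# Row counts for every city/gender combination (0 where no rows exist).
counts = pd.crosstab(df["city"], df["gender"])

print(mean_income)
print(pivot)
print(counts)
```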
Merging and Joining
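pd.concat() - Concatenate DataFrames along rows (default) or columns
Example: pd.concat([df1, df2], ignore_index=True)
pd.merge() - Merge DataFrames on key columns, like a SQL join
Example: pd.merge(df1, df2, on='key', how='inner')
df.join() - Join DataFrames on index
Example: df1.join(df2, how='left')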
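The effect of the how argument is easiest to see when the two tables do not match perfectly. The sketch below uses made-up users, orders, and scores tables; all names are hypothetical.

```python
import pandas as pd

users = pd.DataFrame({"user_id": [1, 2, 3], "name": ["Ana", "Ben", "Cy"]})
orders = pd.DataFrame({"user_id": [1, 1, 3, 4], "amount": [10, 20, 30, 40]})

# Inner merge: only user_ids present in both tables.
inner = pd.merge(users, orders, on="user_id", how="inner")

# Left merge: every user is kept; users without orders get NaN in amount.
left = pd.merge(users, orders, on="user_id", how="left")

# Concatenation: stack rows and rebuild the index.
stacked = pd.concat([users, users], ignore_index=True)

# Join: align on the index instead of on a column.
scores = pd.DataFrame({"score": [0.9, 0.4]}, index=[1, 2])
joined = users.set_index("user_id").join(scores, how="left")

print(inner)
print(left)
print(joined)
```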
Data Visualization
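df.plot() - Simple plotting (uses matplotlib, which must be installed)
Example: df.plot(x='column1', y='column2', kind='line')
df.hist() - Histogram of each numeric column
Example: df.hist(bins=20)
df.boxplot() - Box plot
Example: df.boxplot(column='column')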
Exporting Data
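df.to_csv() - Export to CSV
Example: df.to_csv('output.csv', index=False)
df.to_excel() - Export to Excel
Example: df.to_excel('output.xlsx', index=False)
df.to_json() - Export to JSON
Example: df.to_json('output.json', orient='records')
df.to_sql() - Export to a SQL table (needs an open database connection, here conn)
Example: df.to_sql('table_name', conn, if_exists='replace', index=False)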
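The export methods above can be checked without writing files: to_csv and to_json return a string when no path is given, and to_sql can write to an in-memory SQLite database. The data and the table name people below are made up for this sketch.

```python
import json
import sqlite3

import pandas as pd

df = pd.DataFrame({"name": ["Ana", "Ben"], "age": [31, 27]})

# With no path, to_csv and to_json return the text instead of writing a file.
csv_text = df.to_csv(index=False)
json_text = df.to_json(orient="records")
records = json.loads(json_text)

# Write to a SQL table, then read it back to confirm the round trip.
conn = sqlite3.connect(":memory:")
df.to_sql("people", conn, index=False, if_exists="replace")
round_trip = pd.read_sql("SELECT * FROM people", conn)
conn.close()

print(csv_text)
print(records)
print(round_trip)
```

Passing index=False keeps the row index out of the output, which is usually what is wanted when the index is just 0, 1, 2, and so on.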
Conclusion
These are the essential pandas methods used in Machine Learning projects.
Practice them on different datasets to gain confidence. Happy Learning!