📘 Pandas Tutorial: From Beginner to Advanced
📌 Table of Contents
1. Introduction
2. Installing Pandas
3. Getting Started
4. Data Structures in Pandas
○ Series
○ DataFrame
5. Basic Operations
6. Indexing and Selecting Data
7. Handling Missing Data
8. Data Manipulation
9. GroupBy Operations
10. Merging & Joining
11. Time Series
12. Input/Output (I/O)
13. Visualization
14. Advanced Topics
15. Conclusion
🔍 1. Introduction
pandas is an open-source Python library that provides data structures and data analysis tools
for working with structured data (like Excel spreadsheets, SQL tables, CSV files, etc.).
💻 2. Installing Pandas
Install via pip:
pip install pandas
Or from a Jupyter notebook cell (%pip installs into the running kernel's environment):
%pip install pandas
🚀 3. Getting Started
Import pandas:
import pandas as pd
Check version:
print(pd.__version__)
📦 4. Data Structures in Pandas
🧵 Series (1D)
s = pd.Series([1, 3, 5, None, 6, 8])
print(s)
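The None in the list above is worth a closer look. As a minimal sketch, the snippet below shows how pandas stores it as NaN, which in turn makes the whole Series floating point:

```python
import pandas as pd

# None is stored as NaN, which forces the integer values to float64.
s = pd.Series([1, 3, 5, None, 6, 8])

print(s.dtype)         # float64
print(s.isna().sum())  # 1 missing value
print(s.mean())        # NaN is skipped by default: 23 / 5 = 4.6
```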
📊 DataFrame (2D)
data = {
'Name': ['Alice', 'Bob', 'Charlie'],
'Age': [25, 30, 35],
'City': ['NY', 'LA', 'Chicago']
}
df = pd.DataFrame(data)
print(df)
⚙️ 5. Basic Operations
df.head() # First 5 rows
df.tail(2) # Last 2 rows
df.shape # (rows, columns)
df.info() # Column dtypes, non-null counts, memory usage
df.describe() # Statistical summary
df.columns # Column names
df.dtypes # Data types
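To see what a few of these calls return, here is a small self-contained sketch that rebuilds the sample DataFrame from section 4:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['NY', 'LA', 'Chicago'],
})

print(df.shape)          # (3, 3)
print(list(df.columns))  # ['Name', 'Age', 'City']

# describe() summarizes numeric columns only by default, so just 'Age' here.
stats = df.describe()
print(stats.loc['mean', 'Age'])  # 30.0
```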
🔍 6. Indexing and Selecting Data
df['Name'] # Select column
df[['Name', 'City']] # Multiple columns
df.loc[0] # Row by index label
df.iloc[1] # Row by integer position
df.iloc[0:2] # Rows at positions 0 and 1 (end excluded)
df[df['Age'] > 25] # Conditional filter
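With the default 0, 1, 2 index, loc and iloc look interchangeable. The sketch below uses a hypothetical non-default index (10, 20, 30) to make the difference visible, and also shows one way to combine filter conditions:

```python
import pandas as pd

df = pd.DataFrame(
    {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]},
    index=[10, 20, 30],  # hypothetical non-default labels
)

print(df.loc[10, 'Name'])  # label 10 -> Alice
print(df.iloc[1]['Name'])  # position 1 -> Bob
print(len(df.loc[10:20]))  # label slices include the end -> 2 rows
print(len(df.iloc[0:2]))   # position slices exclude the end -> 2 rows

# Combine conditions with & and |, wrapping each one in parentheses.
older = df[(df['Age'] > 25) & (df['Name'] != 'Charlie')]
print(older['Name'].tolist())  # ['Bob']
```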
❓ 7. Handling Missing Data
df.isnull() # Detect missing values
df.dropna() # Drop rows with NaN
df.fillna(0) # Fill NaNs with 0
df['Age'].fillna(df['Age'].mean()) # Fill with mean
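The sample DataFrame has no gaps, so the sketch below assumes a variant with one missing age. It also shows that fillna returns a new object, which has to be assigned back if the result should be kept:

```python
import pandas as pd

# Hypothetical variant of the sample data with one missing age.
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, None, 35],
})

missing_before = df['Age'].isnull().sum()
print(missing_before)  # 1

# fillna returns a new Series; assign it back to keep the result.
df['Age'] = df['Age'].fillna(df['Age'].mean())
print(df['Age'].tolist())  # [25.0, 30.0, 35.0]
```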
🛠️ 8. Data Manipulation
Add new column
df['Salary'] = [50000, 60000, 70000]
Rename columns
df.rename(columns={'Name': 'FullName'}) # Returns a renamed copy; df keeps 'Name' for the later examples
Drop columns/rows
df.drop('Salary', axis=1) # Drop column
df.drop([0, 1], axis=0) # Drop rows
Sorting
df.sort_values('Age', ascending=False)
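Most of these methods return a new DataFrame rather than changing df, so they can be chained. A minimal sketch, using a trimmed copy of the sample data:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
})
df['Salary'] = [50000, 60000, 70000]

# Each call returns a new DataFrame, so df itself stays unchanged.
result = (
    df.rename(columns={'Name': 'FullName'})
      .drop(columns='Salary')
      .sort_values('Age', ascending=False)
)

print(list(result.columns))         # ['FullName', 'Age']
print(result['FullName'].tolist())  # ['Charlie', 'Bob', 'Alice']
print(list(df.columns))             # ['Name', 'Age', 'Salary']
```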
🔄 9. GroupBy Operations
grouped = df.groupby('City')
grouped['Age'].mean()
df.groupby('City').agg({'Age': 'mean', 'Salary': 'sum'})
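Every city in the sample data appears only once, which makes grouping hard to see. The sketch below assumes a small made-up dataset with repeated cities:

```python
import pandas as pd

# Hypothetical data with repeated cities so the groups are non-trivial.
df = pd.DataFrame({
    'City': ['NY', 'LA', 'NY', 'LA'],
    'Age': [25, 30, 35, 40],
    'Salary': [50000, 60000, 70000, 80000],
})

# One row per city: mean of Age, sum of Salary.
summary = df.groupby('City').agg({'Age': 'mean', 'Salary': 'sum'})
print(summary)

print(summary.loc['NY', 'Age'])     # (25 + 35) / 2 = 30.0
print(summary.loc['LA', 'Salary'])  # 60000 + 80000 = 140000
```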
🔗 10. Merging, Joining, Concatenating
Concatenate
pd.concat([df1, df2]) # Vertical
pd.concat([df1, df2], axis=1) # Horizontal
Merge
pd.merge(df1, df2, on='id')
Join
df1.join(df2.set_index('id'), on='id', how='left') # join matches df1['id'] against df2's index
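df1 and df2 are not defined anywhere in this tutorial. The sketch below assumes two small frames that share an 'id' column, and compares an inner merge, a left merge, and the equivalent join:

```python
import pandas as pd

# df1 and df2 are hypothetical frames sharing an 'id' column.
df1 = pd.DataFrame({'id': [1, 2, 3], 'name': ['Alice', 'Bob', 'Charlie']})
df2 = pd.DataFrame({'id': [2, 3, 4], 'salary': [60000, 70000, 80000]})

inner = pd.merge(df1, df2, on='id')             # ids present in both frames
left = pd.merge(df1, df2, on='id', how='left')  # every id from df1

# join matches against the other frame's index, so move 'id' there first.
joined = df1.join(df2.set_index('id'), on='id', how='left')

print(inner['id'].tolist())           # [2, 3]
print(left['salary'].isna().sum())    # 1 (id 1 has no salary)
print(joined['salary'].isna().sum())  # 1
```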
⏰ 11. Time Series
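dates = pd.date_range('2023-01-01', periods=6)
ts = pd.DataFrame({'value': range(6)}, index=dates) # Separate name so df from earlier sections is kept
ts.resample('D').sum() # Daily resample
ts.loc['2023'] # Select rows by year (use .loc; ts['2023'] looks for a column)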
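Resampling to the frequency the data already has changes nothing. As a sketch, the example below groups the same six daily values into 2-day bins and selects rows by date string through .loc:

```python
import pandas as pd

dates = pd.date_range('2023-01-01', periods=6)  # 2023-01-01 to 2023-01-06, daily
ts = pd.DataFrame({'value': range(6)}, index=dates)

# Downsample to 2-day bins and sum each bin: 0+1, 2+3, 4+5.
two_day = ts.resample('2D').sum()
print(two_day['value'].tolist())  # [1, 5, 9]

# Partial-string indexing on a DatetimeIndex goes through .loc.
print(len(ts.loc['2023']))  # 6

# String date slices include both endpoints.
print(len(ts.loc['2023-01-02':'2023-01-04']))  # 3
```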
💾 12. Input / Output (I/O)
Read files
pd.read_csv('data.csv')
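pd.read_excel('data.xlsx') # Needs an Excel engine such as openpyxl
pd.read_json('data.json')
pd.read_sql(query, connection) # query: SQL string; connection: an open database connection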
Write files
df.to_csv('output.csv', index=False)
df.to_excel('output.xlsx') # Needs an Excel engine such as openpyxl
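The file names above are placeholders. To try a CSV round trip without touching the disk, one option is an in-memory buffer, as in this sketch:

```python
import io
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})

# An in-memory text buffer stands in for a file on disk.
buffer = io.StringIO()
df.to_csv(buffer, index=False)  # index=False leaves out the row labels
csv_text = buffer.getvalue()
print(csv_text)

# Rewind the buffer and read the same text back into a DataFrame.
buffer.seek(0)
restored = pd.read_csv(buffer)
print(restored['Age'].tolist())  # [25, 30]
```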
📈 13. Visualization
Basic plotting (uses Matplotlib):
import matplotlib.pyplot as plt
df['Age'].plot(kind='hist')
df.plot(x='Name', y='Salary', kind='bar')
plt.show()
🧠 14. Advanced Topics
● Pivot Tables
df.pivot_table(values='Salary', index='City', aggfunc='mean')
● Apply functions
df['Age'].apply(lambda x: x + 1)
● Categorical data
df['City'] = df['City'].astype('category')
● MultiIndexing
df.set_index(['City', 'Name'], inplace=True)
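The four techniques above can be tried together on one small frame. This sketch assumes made-up data with repeated cities (the fourth row, 'Dana', is not in the original sample):

```python
import pandas as pd

# Hypothetical data; 'Dana' is added so each city has two rows.
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'Dana'],
    'City': ['NY', 'LA', 'NY', 'LA'],
    'Age': [25, 30, 35, 40],
    'Salary': [50000, 60000, 70000, 80000],
})

# Pivot table: mean salary per city.
pivot = df.pivot_table(values='Salary', index='City', aggfunc='mean')
print(pivot.loc['NY', 'Salary'])  # 60000.0

# apply runs a function on each element of the Series.
older = df['Age'].apply(lambda x: x + 1)
print(older.tolist())  # [26, 31, 36, 41]

# A two-level index allows lookups by (City, Name).
indexed = df.set_index(['City', 'Name'])
print(indexed.loc[('NY', 'Charlie'), 'Age'])  # 35

# Categorical columns store each distinct value once.
df['City'] = df['City'].astype('category')
print(list(df['City'].cat.categories))  # ['LA', 'NY']
```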
✅ 15. Conclusion
pandas is a core library for data science and data analysis work in Python. It handles:
● Data cleaning
● Transformation
● Aggregation
● Input/output
● Time-series and categorical data