Pandas Library - Beginner Guide
Step 1: What is Pandas?
Pandas is a Python library for data manipulation and analysis. It lets you work with
data in a table-like format, similar to an Excel sheet or a SQL table.
You mainly work with two data structures in Pandas:
- Series: 1D labeled array (like a single column)
- DataFrame: 2D labeled data (like a full table)
Step 2: Installing and Importing Pandas
# Install it if it is not already installed.
# In a terminal, run: pip install pandas
# In a notebook cell, prefix the command with "!":
!pip install pandas
# Import it
import pandas as pd
Step 3: Creating Data Structures
1. Series (1D array):
import pandas as pd
data = [10, 20, 30, 40]
s = pd.Series(data)
print(s)
With labels:
s = pd.Series(data, index=['a', 'b', 'c', 'd'])
print(s)
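As a small sketch of what the labels are for, the example below looks up values by label and by position. The variable names (by_label, by_position, subset) are illustrative only.

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40], index=['a', 'b', 'c', 'd'])

by_label = s['b']        # look up a value by its label
by_position = s.iloc[1]  # look up the same value by integer position
subset = s[['a', 'd']]   # select several labels at once (returns a Series)

print(by_label, by_position)
print(subset)
```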
2. DataFrame (2D Table):
From dictionary:
data = {'Name': ['Mayur', 'Aniket', 'Jayesh'], 'Marks': [85, 90, 78]}
df = pd.DataFrame(data)
print(df)
From list of lists:
data = [['Mayur', 85], ['Aniket', 90], ['Jayesh', 78]]
df = pd.DataFrame(data, columns=['Name', 'Marks'])
print(df)
Step 4: Reading Data from File
# CSV File
df = pd.read_csv('your_file.csv')
print(df.head()) # First 5 rows
print(df.tail()) # Last 5 rows
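If you do not have a CSV file at hand, the sketch below reads the same kind of data from an in-memory string instead. The sample text is made up for illustration; pd.read_csv accepts a file path or a file-like object such as io.StringIO.

```python
import io
import pandas as pd

# Stand-in for the contents of a CSV file (made-up sample data).
csv_text = "Name,Marks\nMayur,85\nAniket,90\nJayesh,78\n"

# The first line of the text is used as the column names.
df = pd.read_csv(io.StringIO(csv_text))

print(df.head(2))  # first 2 rows
print(df.tail(1))  # last row
```

Both head() and tail() take an optional number of rows; without it they default to 5.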
Step 5: Exploring the Dataset
print(df.shape) # (rows, columns)
print(df.columns) # Column names
df.info() # Data types and non-null counts (info() prints by itself and returns None)
print(df.describe()) # Summary stats for numerical columns
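The results of these calls are ordinary Python and pandas objects, so you can also store them and use them later. A minimal sketch, using the small Name/Marks table from Step 3:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Mayur', 'Aniket', 'Jayesh'],
                   'Marks': [85, 90, 78]})

n_rows, n_cols = df.shape        # shape is a (rows, columns) tuple
column_names = list(df.columns)  # column labels as a plain list

stats = df.describe()            # by default only numeric columns are summarized
mean_marks = stats.loc['mean', 'Marks']

print(n_rows, n_cols, column_names)
print(mean_marks)
```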
Step 6: Selecting Data
# Single column
df['Name'] # Returns a Series
# Multiple columns
df[['Name', 'Marks']] # Returns a DataFrame
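# Access rows using iloc (by integer position)
df.iloc[0] # First row
df.iloc[1:3] # Rows at positions 1 and 2 (the end position is excluded)
# Access using loc (by index label and column name)
df.loc[0, 'Name'] # Value in the 'Name' column of the row labeled 0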
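The difference between iloc and loc is easier to see when the row labels are not the default 0, 1, 2. The sketch below assumes made-up labels 's1' to 's3' purely for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Mayur', 'Aniket', 'Jayesh'],
                   'Marks': [85, 90, 78]},
                  index=['s1', 's2', 's3'])  # hypothetical row labels

first_row = df.iloc[0]             # row at position 0, returned as a Series
by_position = df.iloc[0:2]         # positions 0 and 1 (end position excluded)
by_label = df.loc['s1':'s2']       # labels 's1' through 's2' (end label included)
one_value = df.loc['s2', 'Marks']  # single cell: row label, then column name

print(by_position)
print(by_label)
print(one_value)
```

Here both slices return the same two rows, but for different reasons: the position slice stops before its end, while the label slice includes it.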
Step 7: Filtering Rows
df[df['Marks'] > 80] # Students with marks > 80
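Under the hood, the comparison produces a Series of True/False values, and only the rows marked True are kept. A short sketch, which also shows one way to combine two conditions (the 80 and 90 thresholds are arbitrary):

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Mayur', 'Aniket', 'Jayesh'],
                   'Marks': [85, 90, 78]})

mask = df['Marks'] > 80  # boolean Series, one True/False per row
high = df[mask]          # keeps only the rows where mask is True

# Combine conditions with & (and) or | (or).
# Each condition needs its own parentheses.
in_range = df[(df['Marks'] > 80) & (df['Marks'] < 90)]

print(high)
print(in_range)
```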
Step 8: Adding New Column
df['Grade'] = ['A', 'A+', 'B'] # The list needs exactly one value per row
print(df)
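A new column can also be computed from an existing one instead of being typed in by hand. In the sketch below, the 'Passed' column and the pass mark of 80 are assumptions made up for the example.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Mayur', 'Aniket', 'Jayesh'],
                   'Marks': [85, 90, 78]})

# Hypothetical rule: a student passes with 80 marks or more.
df['Passed'] = df['Marks'] >= 80

print(df)
```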
Step 9: Handling Missing Data
df.isnull() # Shows True for missing values
df.dropna() # Returns a copy without the rows that contain missing values
df.fillna(0) # Returns a copy with missing values replaced by 0
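The sketch below puts these three calls together on a small table with one missing mark (the data is made up). Note that dropna() and fillna() return new DataFrames, so the results are stored in new variables and the original df stays unchanged.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Name': ['Mayur', 'Aniket', 'Jayesh'],
                   'Marks': [85, np.nan, 78]})  # np.nan marks a missing value

missing_per_column = df.isnull().sum()  # count of missing values per column
dropped = df.dropna()                   # copy without the incomplete row
filled = df.fillna({'Marks': 0})        # copy with missing marks set to 0

print(missing_per_column)
print(dropped)
print(filled)
```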
Step 10: Sorting
df.sort_values(by='Marks', ascending=False) # Highest marks first; returns a new sorted DataFrame
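Because sorting returns a new DataFrame, you usually assign the result to a variable. A minimal sketch; the reset_index(drop=True) step is optional and simply renumbers the rows 0, 1, 2 after sorting.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Mayur', 'Aniket', 'Jayesh'],
                   'Marks': [85, 90, 78]})

# Sort by marks, highest first, then renumber the rows.
df_sorted = df.sort_values(by='Marks', ascending=False).reset_index(drop=True)

print(df_sorted)
print(df)  # the original order is untouched
```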
Optional but Useful Topics (if you have time):
1. Group By (Grouping and Aggregating)
Useful for summarizing data.
df.groupby('Grade')['Marks'].mean()
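Grouping is more informative when a group has more than one row. The sketch below uses made-up data with two students per grade and also shows agg(), which computes several summaries at once.

```python
import pandas as pd

# Made-up sample data: two rows per grade.
df = pd.DataFrame({'Grade': ['A', 'A', 'B', 'B'],
                   'Marks': [85, 90, 78, 70]})

means = df.groupby('Grade')['Marks'].mean()  # one mean per grade

# Several aggregations at once: one row per grade, one column per statistic.
summary = df.groupby('Grade')['Marks'].agg(['mean', 'max', 'count'])

print(means)
print(summary)
```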
2. Apply Function (Custom Functions on Data)
Apply a lambda or custom function to a column.
df['Bonus'] = df['Marks'].apply(lambda x: x + 5)
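For simple arithmetic, plain column operations give the same result as apply and are usually faster. apply is most useful when the logic does not fit in one expression, as in the sketch below. The to_grade function and its thresholds are hypothetical, chosen only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Mayur', 'Aniket', 'Jayesh'],
                   'Marks': [85, 90, 78]})

# Two ways to add 5 to every mark; both produce the same column.
df['Bonus'] = df['Marks'].apply(lambda x: x + 5)
df['BonusDirect'] = df['Marks'] + 5


def to_grade(marks):
    """Hypothetical grading rule, used only for this example."""
    if marks >= 90:
        return 'A+'
    if marks >= 80:
        return 'A'
    return 'B'


# apply calls to_grade once for each value in the column.
df['Grade'] = df['Marks'].apply(to_grade)

print(df)
```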
3. Merging / Joining DataFrames
Similar to SQL joins.
pd.merge(df1, df2, on='ID', how='inner') # df1 and df2 are two DataFrames that share an 'ID' column
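To make the join runnable, the sketch below builds two small made-up tables (students and marks are hypothetical names) that share an 'ID' column, then compares an inner join with a left join.

```python
import pandas as pd

# Made-up sample tables sharing an 'ID' column.
students = pd.DataFrame({'ID': [1, 2, 3],
                         'Name': ['Mayur', 'Aniket', 'Jayesh']})
marks = pd.DataFrame({'ID': [2, 3, 4],
                      'Marks': [90, 78, 66]})

# Inner join: keeps only the IDs found in both tables.
inner = pd.merge(students, marks, on='ID', how='inner')

# Left join: keeps every row of students; marks with no match become NaN.
left = pd.merge(students, marks, on='ID', how='left')

print(inner)
print(left)
```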
4. Pivot Tables
For advanced summarization.
df.pivot_table(values='Marks', index='Grade', aggfunc='mean')
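A pivot table becomes more useful with a second dimension. The sketch below assumes an extra 'Subject' column, which is not part of the earlier examples, and spreads its values across the columns of the result.

```python
import pandas as pd

# Made-up sample data; the 'Subject' column is an assumption for this example.
df = pd.DataFrame({'Grade': ['A', 'A', 'B', 'B'],
                   'Subject': ['Math', 'Science', 'Math', 'Science'],
                   'Marks': [85, 90, 78, 70]})

# Rows are grades, columns are subjects, cells hold the mean marks.
pivot = df.pivot_table(values='Marks', index='Grade',
                       columns='Subject', aggfunc='mean')

print(pivot)
```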
5. Exporting to File
Save the modified DataFrame to CSV or Excel.
df.to_csv('output.csv', index=False)
df.to_excel('output.xlsx', index=False) # Needs an Excel writer package such as openpyxl installed
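One way to check an export is to read the file back in. The sketch below writes a CSV to a temporary folder (so nothing is left on disk) and reloads it; the temporary folder is just a convenience for the example, and in practice you would pass your own file path.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({'Name': ['Mayur', 'Aniket', 'Jayesh'],
                   'Marks': [85, 90, 78]})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'output.csv')
    df.to_csv(path, index=False)  # index=False leaves out the row numbers
    restored = pd.read_csv(path)  # read the file back to check it

print(restored)
```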