Perfect! Let’s start with the first section: Pandas Basics. I’ll cover the key topics, explanations, and example usage so you can revise efficiently.
1️⃣ Pandas Basics
a) Series and DataFrames
Series: 1D labeled array
import pandas as pd
s = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
print(s['b']) # Output: 20
DataFrame: 2D labeled data structure (like Excel table)
data = {'Name': ['Jay', 'Ali'], 'Age': [23, 25]}
df = pd.DataFrame(data)
print(df)
b) Reading and Writing Data
Read data:
df = pd.read_csv('file.csv')
df = pd.read_excel('file.xlsx')
df = pd.read_json('file.json')
Write data:
df.to_csv('output.csv', index=False)
df.to_excel('output.xlsx', index=False)
df.to_json('output.json')
c) Inspecting Data
df.head(n) → first n rows
df.tail(n) → last n rows
df.info() → data types, non-null counts
df.shape → (rows, columns)
df.describe() → summary statistics
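For a quick run-through, here’s a minimal sketch on a toy DataFrame (the data is just a placeholder):
import pandas as pd
df = pd.DataFrame({'Name': ['Jay', 'Ali', 'Mia'], 'Age': [23, 25, 22]})
print(df.head(2)) # first 2 rows
print(df.shape) # (3, 2)
df.info() # column dtypes and non-null counts
print(df.describe()) # count, mean, std, min, quartiles, max for numeric columns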
d) Selecting Rows and Columns
Select a column: df['Name'] or df.Name (attribute access only works when the column name is a valid Python identifier)
Select multiple columns: df[['Name', 'Age']]
Select rows by index: df.iloc[0] → first row
Select rows by label: df.loc[0] → the row labeled 0 (the first row when the index is the default RangeIndex)
e) Adding, Renaming, Dropping Columns
Add a column:
df['Salary'] = [50000, 60000]
Rename a column:
df.rename(columns={'Salary': 'Income'}, inplace=True)
Drop column/row:
df.drop('Income', axis=1, inplace=True) # drop column
df.drop(0, axis=0, inplace=True) # drop row
This covers all the basic operations you must know before moving to intermediate topics.
I can now move to 2️⃣ Indexing and Selection, which comes up frequently in interviews, with examples and tricks.
Do you want me to continue with that?
Great! Let’s move on to 2️⃣ Indexing and Selection in Pandas—this is very important for
interviews.
2️⃣ Indexing and Selection
a) Boolean Indexing / Filtering
Select rows based on conditions:
import pandas as pd
data = {'Name': ['Jay', 'Ali', 'Mia'], 'Age': [23, 25, 22]}
df = pd.DataFrame(data)
# Age greater than 23
df_filtered = df[df['Age'] > 23]
print(df_filtered)
b) Conditional Selection
Combine multiple conditions using & (and) / | (or):
df_filtered = df[(df['Age'] > 22) & (df['Name'] != 'Mia')]
print(df_filtered)
c) Using .isin()
Filter rows where a column’s value is in a list:
df_filtered = df[df['Name'].isin(['Jay', 'Mia'])]
d) Using .between()
Filter rows where values lie in a range:
df_filtered = df[df['Age'].between(22, 24)]
e) Using .query()
Filter using a query string (interviewers like this):
df_filtered = df.query('Age > 22 and Name != "Mia"')
f) .loc[] and .iloc[]
.loc[] → label-based selection:
df.loc[0, 'Name'] # row 0, column 'Name'
df.loc[:, ['Name', 'Age']] # all rows, selected columns
.iloc[] → integer index-based selection:
df.iloc[0, 1] # first row, second column
df.iloc[0:2, 0:2] # slice rows and columns
g) Setting and Resetting Index
Change the index of DataFrame:
df.set_index('Name', inplace=True)
df.reset_index(inplace=True)
h) MultiIndex / Hierarchical Indexing
Used when a DataFrame has multiple index levels:
arrays = [['A', 'A', 'B'], [1, 2, 1]]
index = pd.MultiIndex.from_arrays(arrays, names=('letter', 'number'))
df_multi = pd.DataFrame({'Value': [10, 20, 30]}, index=index)
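A minimal sketch of selecting from that MultiIndex, using the df_multi defined above:
print(df_multi.loc['A']) # all rows under letter 'A'
print(df_multi.loc[('A', 2)]) # the single row for letter 'A', number 2
print(df_multi.xs(1, level='number')) # cross-section: every row where number == 1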
✅ Key Interview Tips:
Know the difference between .loc and .iloc (see the sketch after these tips).
Practice filtering with multiple conditions.
.query() is a neat trick often appreciated in interviews.
Be ready for MultiIndex questions if you’re applying for data-heavy roles.
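Here is a minimal sketch of the .loc vs .iloc difference, using a non-default index so the two actually diverge (building on the df above):
df2 = df.set_index('Name') # index labels are now 'Jay', 'Ali', 'Mia'
print(df2.loc['Ali']) # label-based: selects by index label
print(df2.iloc[1]) # position-based: the same row, by position
# df2.loc[1] would raise a KeyError here, because 1 is not a label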
I can now continue with 3️⃣ Data Cleaning, which is another frequent interview topic.
Do you want me to continue?
Perfect! Let’s move on to 3️⃣ Data Cleaning in Pandas; it comes up constantly in interviews because real-world data is often messy.
3️⃣ Data Cleaning
a) Handling Missing Data
Detect missing values:
df.isnull() # Returns True for NaN values
df.notnull() # Returns True for non-NaN
Drop missing values:
df.dropna(axis=0, inplace=True) # Drop rows with NaN
df.dropna(axis=1, inplace=True) # Drop columns with NaN
Fill missing values (assign the result back rather than using inplace=True on a single column, which operates on a temporary copy and is deprecated in recent pandas):
df['Age'] = df['Age'].fillna(0) # Fill with constant
df['Age'] = df['Age'].fillna(df['Age'].mean()) # Fill with mean
b) Handling Duplicates
Check duplicates:
df.duplicated()
Drop duplicates:
df.drop_duplicates(inplace=True)
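In practice you often deduplicate on a subset of columns; a minimal sketch (the subset and keep values are just illustrative):
df.drop_duplicates(subset=['Name'], keep='first', inplace=True) # keep the first occurrence per Name
# keep='last' keeps the last occurrence; keep=False drops every duplicated row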
c) Type Conversion
Convert column types:
df['Age'] = df['Age'].astype(int)
df['Salary'] = df['Salary'].astype(float)
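If a column contains unparseable values, astype() raises an error; pd.to_numeric with errors='coerce' is the usual safe alternative. A minimal sketch with made-up data:
s = pd.Series(['23', '25x', '22'])
s_clean = pd.to_numeric(s, errors='coerce') # '25x' becomes NaN instead of raising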
d) String Operations
Many interview questions involve cleaning string columns:
df['Name'].str.upper() # Convert to uppercase
df['Name'].str.lower() # Convert to lowercase
df['Name'].str.contains('Jay') # Check if contains substring
df['Name'].str.replace(' ', '_') # Replace characters
e) Renaming Columns and Values
Rename columns:
df.rename(columns={'Age': 'Years'}, inplace=True)
Replace values in a column (again assigning back instead of using column-level inplace):
df['Department'] = df['Department'].replace({'HR': 'Human Resources'})
f) Removing Whitespace
Trim spaces from string columns:
df['Name'] = df['Name'].str.strip()
✅ Key Interview Tips:
Expect questions like: “How do you handle missing values?” or “How do you remove
duplicates?”
Practice filling NaN values using group-wise mean or median (see the sketch after these tips).
Be ready to clean string columns using str methods.
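A minimal sketch of group-wise filling, assuming a DataFrame with Department and Salary columns:
# Fill each missing Salary with the mean of its own Department
df['Salary'] = df['Salary'].fillna(df.groupby('Department')['Salary'].transform('mean'))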
I can now continue with 4️⃣ Data Transformation, which includes sorting, grouping, merging,
and reshaping—another very important topic.
Do you want me to continue?
Great! Let’s move on to 4️⃣ Data Transformation, a core topic for interviews.
4️⃣ Data Transformation
a) Sorting
Sort by values:
df.sort_values(by='Age', ascending=True, inplace=True)
Sort by index:
df.sort_index(ascending=False, inplace=True)
b) Aggregation / Grouping
Group by column and aggregate:
df.groupby('Department')['Salary'].mean()
df.groupby(['Department', 'Gender'])['Salary'].sum()
Using multiple aggregation functions:
df.groupby('Department')['Salary'].agg(['mean', 'max', 'min'])
c) Pivot Tables
Reshape data for aggregation:
pd.pivot_table(df, index='Department', columns='Gender', values='Salary', aggfunc='mean')
Melt → Convert wide data to long format:
pd.melt(df, id_vars=['Name'], value_vars=['Math', 'Science'])
d) Apply, Map, Applymap, Lambda Functions
Apply → Apply function to rows/columns:
df['Bonus'] = df['Salary'].apply(lambda x: x*0.1)
Map → Map values in a Series:
df['Department'] = df['Department'].map({'HR': 'Human Resources'})
Applymap → Apply a function element-wise to an entire DataFrame (deprecated since pandas 2.1 in favor of DataFrame.map):
df.applymap(lambda x: x*2 if isinstance(x, int) else x)
e) Merging and Joining
Merge two DataFrames (like SQL JOIN):
pd.merge(df1, df2, on='EmployeeID', how='inner') # inner, left, right, outer
Concatenate DataFrames:
pd.concat([df1, df2], axis=0) # vertical
pd.concat([df1, df2], axis=1) # horizontal
Join → Join on index:
df1.join(df2, how='left')
f) Reshaping
Stack / Unstack → Pivot rows into columns and vice versa:
df_stack = df.stack()
df_unstack = df_stack.unstack()
✅ Key Interview Tips:
Grouping and pivot tables are very commonly asked.
Know the difference between merge, concat, and join.
Practice applying functions using lambda for quick transformations.
Be prepared for real-world tasks, like computing average sales per category or finding the top 3 performers per department (a sketch follows below).
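For instance, a minimal sketch of “top 3 performers per department”, assuming Department and Salary columns:
# Largest 3 salaries within each department
top3 = df.groupby('Department')['Salary'].nlargest(3)
# Or keep whole rows: sort by Salary, then take the first 3 of each group
top3_rows = df.sort_values('Salary', ascending=False).groupby('Department').head(3)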
I can now continue with 5️⃣ Statistical & Analytical Functions, which includes summary
statistics, correlation, value counts, etc.—very commonly asked in interviews.
Do you want me to continue?
Perfect! Let’s move on to 5️⃣ Statistical & Analytical Functions in Pandas—these are very
frequently asked in interviews.
5️⃣ Statistical & Analytical Functions
a) Summary Statistics
Basic stats on numeric columns:
df['Salary'].mean() # Mean
df['Salary'].median() # Median
df['Salary'].sum() # Sum
df['Salary'].min() # Minimum
df['Salary'].max() # Maximum
df['Salary'].count() # Non-null count
Descriptive statistics for all numeric columns:
df.describe()
b) Correlation & Covariance
Correlation → Measures the linear relationship between columns:
df.corr(numeric_only=True) # Pearson by default; numeric_only skips non-numeric columns
Covariance → Measures how two variables change together:
df.cov(numeric_only=True)
c) Value Counts
Count occurrences of each value in a column:
df['Department'].value_counts()
Include missing values:
df['Department'].value_counts(dropna=False)
d) Unique and Distinct Values
Get unique values:
df['Department'].unique()
Count unique values:
df['Department'].nunique()
e) Ranking & Sorting Within Groups
Rank values:
df['SalaryRank'] = df['Salary'].rank(ascending=False)
Sort within groups:
df.groupby('Department')['Salary'].rank(ascending=False)
f) Cumulative & Rolling Functions
Cumulative sum/product:
df['CumSum'] = df['Salary'].cumsum()
Rolling average:
df['RollingAvg'] = df['Salary'].rolling(window=3).mean()
✅ Key Interview Tips:
Questions often involve finding top/bottom values, averages, and counts.
Be ready to combine groupby with aggregation for real-world scenarios.
Correlation and covariance may be asked for basic data analysis understanding.
Practice value_counts, unique, and nunique for categorical data questions.
I can now continue with 6️⃣ Time Series in Pandas, which is highly relevant for data roles and often comes up in interviews.
Do you want me to continue?
Great! Let’s move on to 6️⃣ Time Series in Pandas, which is very important for roles involving
temporal data.
6️⃣ Time Series in Pandas
a) DateTime Conversion
Convert strings to datetime objects:
df['Date'] = pd.to_datetime(df['Date'])
Extract datetime components:
df['Year'] = df['Date'].dt.year
df['Month'] = df['Date'].dt.month
df['Day'] = df['Date'].dt.day
df['Weekday'] = df['Date'].dt.weekday # 0=Monday, 6=Sunday
df['DayName'] = df['Date'].dt.day_name()
b) Setting Date as Index
Makes time series operations easier:
df.set_index('Date', inplace=True)
c) Resampling
Aggregate data at different time frequencies:
df.resample('M').mean() # Monthly mean ('M' is deprecated in favor of 'ME' from pandas 2.2)
df.resample('W').sum() # Weekly sum
Frequency codes:
D → day, W → week, M → month, Q → quarter, Y → year, H → hour
d) Shifting / Lagging
Shift data to compare with previous values:
df['PrevMonthSales'] = df['Sales'].shift(1)
df['Diff'] = df['Sales'] - df['PrevMonthSales']
e) Rolling / Moving Windows
Compute rolling statistics:
df['RollingAvg'] = df['Sales'].rolling(window=3).mean()
df['RollingSum'] = df['Sales'].rolling(window=3).sum()
Useful for smoothing and trend analysis.
f) Time-based Filtering
Filter by date ranges:
df['2025-01-01':'2025-03-31'] # Select rows between dates
df[df.index.year == 2025] # Select specific year
df[df.index.month == 8] # Select specific month
✅ Key Interview Tips:
Expect questions like:
“How do you resample daily data to monthly?”
“How do you calculate moving averages?”
“How do you extract the day, month, or year from a date column?”
Practice lag/shift and rolling functions; they’re common in time series interviews.
Setting datetime as index is crucial for efficient time series operations.
I can now continue with 7️⃣ Visualization with Pandas, which is simpler but often comes up in interviews for quick plotting and insights.
Do you want me to continue?