Pandas

Perfect! Let’s start from the first section: Basics of Pandas.

I’ll cover the key topics, explanations, and example usages so you can revise efficiently.

1️⃣ Pandas Basics

a) Series and DataFrames

 Series: 1D labeled array

 import pandas as pd

 s = pd.Series([10, 20, 30], index=['a', 'b', 'c'])

 print(s['b']) # Output: 20

 DataFrame: 2D labeled data structure (like Excel table)

 data = {'Name': ['Jay', 'Ali'], 'Age': [23, 25]}

 df = pd.DataFrame(data)

 print(df)

b) Reading and Writing Data

 Read data:

 df = pd.read_csv('file.csv')

 df = pd.read_excel('file.xlsx')

 df = pd.read_json('file.json')

 Write data:

 df.to_csv('output.csv', index=False)

 df.to_excel('output.xlsx', index=False)

 df.to_json('output.json')

c) Inspecting Data

 df.head(n) → first n rows

 df.tail(n) → last n rows

 df.info() → data types, non-null counts


 df.shape → (rows, columns)

 df.describe() → summary statistics

d) Selecting Rows and Columns

 Select a column: df['Name'] or df.Name

 Select multiple columns: df[['Name', 'Age']]

 Select rows by position: df.iloc[0] → first row

 Select rows by label: df.loc[0] → first row (works only while the index labels are the default integers)
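The .loc / .iloc difference is easiest to see on a DataFrame whose index labels differ from the row positions; a minimal sketch (the data here is made up for illustration):

```python
import pandas as pd

# Index labels 10/20 are deliberately different from positions 0/1
df = pd.DataFrame({'Name': ['Jay', 'Ali'], 'Age': [23, 25]}, index=[10, 20])

print(df.loc[10, 'Name'])  # label-based -> 'Jay'
print(df.iloc[0, 0])       # position-based -> 'Jay' (same row, reached by position)
print(df.iloc[1, 1])       # -> 25
```

With the default RangeIndex the two happen to coincide, which is why df.loc[0] and df.iloc[0] often look interchangeable in toy examples.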

e) Adding, Renaming, Dropping Columns

 Add a column:

 df['Salary'] = [50000, 60000]

 Rename a column:

 df.rename(columns={'Salary': 'Income'}, inplace=True)

 Drop column/row:

 df.drop('Income', axis=1, inplace=True) # drop column

 df.drop(0, axis=0, inplace=True) # drop row

This covers all the basic operations you must know before moving to intermediate topics.

I can now move to 2️⃣ Indexing and Selection, which is heavily asked in interviews, with
examples and tricks.

Do you want me to continue with that?

Great! Let’s move on to 2️⃣ Indexing and Selection in Pandas—this is very important for
interviews.

2️⃣ Indexing and Selection

a) Boolean Indexing / Filtering


 Select rows based on conditions:

import pandas as pd

data = {'Name': ['Jay', 'Ali', 'Mia'], 'Age': [23, 25, 22]}

df = pd.DataFrame(data)

# Age greater than 23

df_filtered = df[df['Age'] > 23]

print(df_filtered)

b) Conditional Selection

 Combine multiple conditions using & (and) / | (or):

df_filtered = df[(df['Age'] > 22) & (df['Name'] != 'Mia')]

print(df_filtered)

c) Using .isin()

 Filter rows where a column’s value is in a list:

df_filtered = df[df['Name'].isin(['Jay', 'Mia'])]

d) Using .between()

 Filter rows where values lie in a range:

df_filtered = df[df['Age'].between(22, 24)]

e) Using .query()

 Filter using a query string (interviewers like this):

df_filtered = df.query('Age > 22 and Name != "Mia"')


f) .loc[] and .iloc[]

 .loc[] → label-based selection:

df.loc[0, 'Name'] # row 0, column 'Name'

df.loc[:, ['Name', 'Age']] # all rows, selected columns

 .iloc[] → integer index-based selection:

df.iloc[0, 1] # first row, second column

df.iloc[0:2, 0:2] # slice rows and columns

g) Setting and Resetting Index

 Change the index of DataFrame:

df.set_index('Name', inplace=True)

df.reset_index(inplace=True)

h) MultiIndex / Hierarchical Indexing

 Used when DataFrame has multiple levels of index:

arrays = [['A', 'A', 'B'], [1, 2, 1]]

index = pd.MultiIndex.from_arrays(arrays, names=('letter', 'number'))

df_multi = pd.DataFrame({'Value': [10, 20, 30]}, index=index)
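Selecting from the MultiIndex built above is the part interviewers usually probe; a short sketch using the same toy data:

```python
import pandas as pd

arrays = [['A', 'A', 'B'], [1, 2, 1]]
index = pd.MultiIndex.from_arrays(arrays, names=('letter', 'number'))
df_multi = pd.DataFrame({'Value': [10, 20, 30]}, index=index)

# All rows under the outer label 'A'
outer = df_multi.loc['A']

# One cell, addressed by a full (letter, number) label tuple
single = df_multi.loc[('B', 1), 'Value']
print(outer)
print(single)
```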

✅ Key Interview Tips:

 Know the difference between .loc and .iloc.

 Practice filtering with multiple conditions.

 .query() is a neat trick often appreciated in interviews.

 Be ready for MultiIndex questions if you’re applying for data-heavy roles.


I can now continue with 3️⃣ Data Cleaning, which is another frequent interview topic.

Do you want me to continue?

Perfect! Let’s move on to 3️⃣ Data Cleaning in Pandas—this is heavily asked in interviews
because real-world data is often messy.

3️⃣ Data Cleaning

a) Handling Missing Data

 Detect missing values:

df.isnull() # Returns True for NaN values

df.notnull() # Returns True for non-NaN

 Drop missing values:

df.dropna(axis=0, inplace=True) # Drop rows with NaN

df.dropna(axis=1, inplace=True) # Drop columns with NaN

 Fill missing values:

df['Age'] = df['Age'].fillna(0) # Fill with constant

df['Age'] = df['Age'].fillna(df['Age'].mean()) # Fill with mean

(Assigning back is safer than fillna(..., inplace=True) on a selected column, which triggers a chained-assignment warning in recent pandas.)

b) Handling Duplicates

 Check duplicates:

df.duplicated()

 Drop duplicates:

df.drop_duplicates(inplace=True)

c) Type Conversion

 Convert column types:

df['Age'] = df['Age'].astype(int)

df['Salary'] = df['Salary'].astype(float)

d) String Operations

 Many interview questions involve cleaning string columns:

df['Name'].str.upper() # Convert to uppercase

df['Name'].str.lower() # Convert to lowercase

df['Name'].str.contains('Jay') # Check if contains substring

df['Name'].str.replace(' ', '_') # Replace characters

e) Renaming Columns and Values

 Rename columns:

df.rename(columns={'Age': 'Years'}, inplace=True)

 Replace values in a column:

df['Department'] = df['Department'].replace({'HR': 'Human Resources'})

f) Removing Whitespace

 Trim spaces from string columns:

df['Name'] = df['Name'].str.strip()

✅ Key Interview Tips:

 Expect questions like: “How do you handle missing values?” or “How do you remove
duplicates?”

 Practice filling NaN values using group-wise mean or median.

 Be ready to clean string columns using str methods.
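The group-wise fill mentioned in the tips is usually done with groupby(...).transform; a minimal sketch with made-up data:

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'Department': ['HR', 'HR', 'IT', 'IT'],
    'Salary': [100.0, np.nan, 200.0, np.nan],
})

# Fill each NaN with the mean salary of its own department:
# transform('mean') returns a Series aligned to the original rows
df['Salary'] = df['Salary'].fillna(
    df.groupby('Department')['Salary'].transform('mean')
)
print(df)
```

The key point is that transform keeps the original shape, so the result lines up row-for-row with the column being filled.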


I can now continue with 4️⃣ Data Transformation, which includes sorting, grouping, merging,
and reshaping—another very important topic.

Do you want me to continue?

Great! Let’s move on to 4️⃣ Data Transformation, a core topic for interviews.

4️⃣ Data Transformation

a) Sorting

 Sort by values:

df.sort_values(by='Age', ascending=True, inplace=True)

 Sort by index:

df.sort_index(ascending=False, inplace=True)

b) Aggregation / Grouping

 Group by column and aggregate:

df.groupby('Department')['Salary'].mean()

df.groupby(['Department', 'Gender'])['Salary'].sum()

 Using multiple aggregation functions:

df.groupby('Department')['Salary'].agg(['mean', 'max', 'min'])

c) Pivot Tables

 Reshape data for aggregation:

pd.pivot_table(df, index='Department', columns='Gender', values='Salary', aggfunc='mean')

 Melt → Convert wide data to long format:


pd.melt(df, id_vars=['Name'], value_vars=['Math', 'Science'])

d) Apply, Map, Applymap, Lambda Functions

 Apply → Apply function to rows/columns:

df['Bonus'] = df['Salary'].apply(lambda x: x*0.1)

 Map → Map values in a Series:

df['Department'] = df['Department'].map({'HR': 'Human Resources'})

 Applymap → Apply function element-wise to an entire DataFrame (deprecated in pandas 2.1+ in favour of DataFrame.map):

df.applymap(lambda x: x*2 if isinstance(x, int) else x)

e) Merging and Joining

 Merge two DataFrames (like SQL JOIN):

pd.merge(df1, df2, on='EmployeeID', how='inner') # inner, left, right, outer

 Concatenate DataFrames:

pd.concat([df1, df2], axis=0) # vertical

pd.concat([df1, df2], axis=1) # horizontal

 Join → Join on index:

df1.join(df2, how='left')
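Since the snippets above assume df1 and df2 already exist, here is a self-contained comparison (column names and data are illustrative):

```python
import pandas as pd

df1 = pd.DataFrame({'EmployeeID': [1, 2, 3], 'Name': ['Jay', 'Ali', 'Mia']})
df2 = pd.DataFrame({'EmployeeID': [2, 3, 4], 'Salary': [50000, 60000, 70000]})

# merge: SQL-style join on a column; 'inner' keeps only matching keys
merged = pd.merge(df1, df2, on='EmployeeID', how='inner')

# concat: stacks DataFrames; axis=0 appends rows, aligning on column names
stacked = pd.concat([df1, df2], axis=0)

# join: aligns on the index, so set the key as index first
joined = df1.set_index('EmployeeID').join(df2.set_index('EmployeeID'), how='left')
print(merged)
```

In short: merge joins on columns, join joins on the index, and concat simply stacks without any key matching.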

f) Reshaping

 Stack / Unstack → Pivot rows into columns and vice versa:

df_stack = df.stack()

df_unstack = df_stack.unstack()

✅ Key Interview Tips:

 Grouping and pivot tables are very commonly asked.


 Know the difference between merge, concat, and join.

 Practice applying functions using lambda for quick transformations.

 Be prepared for real-world tasks, like computing average sales per category, or top 3
performers per department.
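The "top N per group" task mentioned above is a classic; one common pattern is sort-then-head per group (data made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    'Department': ['HR', 'HR', 'HR', 'IT', 'IT'],
    'Name': ['A', 'B', 'C', 'D', 'E'],
    'Salary': [40, 60, 50, 90, 80],
})

# Top 2 earners per department: sort by salary, then keep the
# first rows of each group
top2 = (df.sort_values('Salary', ascending=False)
          .groupby('Department')
          .head(2))
print(top2)
```

groupby(...).head(n) keeps the first n rows of each group without collapsing them, which is exactly what "top performers per department" needs.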

I can now continue with 5️⃣ Statistical & Analytical Functions, which includes summary
statistics, correlation, value counts, etc.—very commonly asked in interviews.

Do you want me to continue?

Perfect! Let’s move on to 5️⃣ Statistical & Analytical Functions in Pandas—these are very
frequently asked in interviews.

5️⃣ Statistical & Analytical Functions

a) Summary Statistics

 Basic stats on numeric columns:

df['Salary'].mean() # Mean

df['Salary'].median() # Median

df['Salary'].sum() # Sum

df['Salary'].min() # Minimum

df['Salary'].max() # Maximum

df['Salary'].count() # Non-null count

 Descriptive statistics for all numeric columns:

df.describe()

b) Correlation & Covariance

 Correlation → Measures linear relationship between columns:


df.corr() # Pearson correlation by default; pass numeric_only=True if non-numeric columns exist

 Covariance → Measures how two variables change together:

df.cov() # likewise accepts numeric_only=True

c) Value Counts

 Count occurrences of each value in a column:

df['Department'].value_counts()

 Include missing values:

df['Department'].value_counts(dropna=False)

d) Unique and Distinct Values

 Get unique values:

df['Department'].unique()

 Count unique values:

df['Department'].nunique()

e) Ranking & Sorting Within Groups

 Rank values:

df['SalaryRank'] = df['Salary'].rank(ascending=False)

 Sort within groups:

df.groupby('Department')['Salary'].rank(ascending=False)

f) Cumulative & Rolling Functions

 Cumulative sum/product:

df['CumSum'] = df['Salary'].cumsum()

 Rolling average:

df['RollingAvg'] = df['Salary'].rolling(window=3).mean()

✅ Key Interview Tips:

 Questions often involve finding top/bottom values, averages, and counts.

 Be ready to combine groupby with aggregation for real-world scenarios.

 Correlation and covariance may be asked for basic data analysis understanding.

 Practice value_counts, unique, and nunique for categorical data questions.
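For the top/bottom-value questions, nlargest and nsmallest are more concise than sorting; a quick sketch with toy data:

```python
import pandas as pd

s = pd.Series([10, 40, 30, 20], index=['a', 'b', 'c', 'd'])

top2 = s.nlargest(2)      # two largest values, sorted descending
bottom1 = s.nsmallest(1)  # single smallest value
print(top2)
print(bottom1)
```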

I can now continue with 6️⃣ Time Series in Pandas, which is highly relevant for data roles and
often comes in interviews.

Do you want me to continue?

Great! Let’s move on to 6️⃣ Time Series in Pandas, which is very important for roles involving
temporal data.

6️⃣ Time Series in Pandas

a) DateTime Conversion

 Convert strings to datetime objects:

df['Date'] = pd.to_datetime(df['Date'])

 Extract datetime components:

df['Year'] = df['Date'].dt.year

df['Month'] = df['Date'].dt.month

df['Day'] = df['Date'].dt.day

df['Weekday'] = df['Date'].dt.weekday # 0=Monday, 6=Sunday

df['DayName'] = df['Date'].dt.day_name()

b) Setting Date as Index

 Makes time series operations easier:

df.set_index('Date', inplace=True)

c) Resampling

 Aggregate data at different time frequencies:

df.resample('M').mean() # Monthly mean

df.resample('W').sum() # Weekly sum

 Frequency codes:

o D → day, W → week, M → month, Q → quarter, Y → year, H → hour (pandas 2.2+ renames M/Q/Y to ME/QE/YE and H to h)

d) Shifting / Lagging

 Shift data to compare with previous values:

df['PrevMonthSales'] = df['Sales'].shift(1)

df['Diff'] = df['Sales'] - df['PrevMonthSales']

e) Rolling / Moving Windows

 Compute rolling statistics:

df['RollingAvg'] = df['Sales'].rolling(window=3).mean()

df['RollingSum'] = df['Sales'].rolling(window=3).sum()

 Useful for smoothing and trend analysis.

f) Time-based Filtering

 Filter by date ranges:

df['2025-01-01':'2025-03-31'] # Select rows between dates

df[df.index.year == 2025] # Select specific year


df[df.index.month == 8] # Select specific month

✅ Key Interview Tips:

 Expect questions like:

o “How do you resample daily data to monthly?”

o “How do you calculate moving averages?”

o “How do you extract the day, month, or year from a date column?”

 Practice lag/shift and rolling functions; they’re common in time series interviews.

 Setting datetime as index is crucial for efficient time series operations.

I can now continue with 7️⃣ Visualization with Pandas, which is simpler but often comes in
interviews for quick plotting and insights.

Do you want me to continue?
