Pandas

Perfect! Let’s start from the first section: Basics of Pandas.

I’ll cover the key topics, explanations, and example usages so you can revise efficiently.

1️⃣ Pandas Basics

a) Series and DataFrames

 Series: 1D labeled array

 import pandas as pd

 s = pd.Series([10, 20, 30], index=['a', 'b', 'c'])

 print(s['b']) # Output: 20

 DataFrame: 2D labeled data structure (like Excel table)

 data = {'Name': ['Jay', 'Ali'], 'Age': [23, 25]}

 df = pd.DataFrame(data)

 print(df)

b) Reading and Writing Data

 Read data:

 df = pd.read_csv('file.csv')

 df = pd.read_excel('file.xlsx')

 df = pd.read_json('file.json')

 Write data:

 df.to_csv('output.csv', index=False)

 df.to_excel('output.xlsx', index=False)

 df.to_json('output.json')

c) Inspecting Data

 df.head(n) → first n rows

 df.tail(n) → last n rows

 df.info() → data types, non-null counts


 df.shape → (rows, columns)

 df.describe() → summary statistics

d) Selecting Rows and Columns

 Select a column: df['Name'] or df.Name

 Select multiple columns: df[['Name', 'Age']]

 Select rows by position: df.iloc[0] → first row

 Select rows by label: df.loc[0] → first row (works only while the index labels are the default integers)
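The .loc / .iloc difference is easiest to see on a DataFrame whose index labels differ from the row positions; a minimal sketch (the data here is made up for illustration):

```python
import pandas as pd

# Index labels 10/20 are deliberately different from positions 0/1
df = pd.DataFrame({'Name': ['Jay', 'Ali'], 'Age': [23, 25]}, index=[10, 20])

print(df.loc[10, 'Name'])  # label-based -> 'Jay'
print(df.iloc[0, 0])       # position-based -> 'Jay' (same row, reached by position)
print(df.iloc[1, 1])       # -> 25
```

With the default RangeIndex the two happen to coincide, which is why df.loc[0] and df.iloc[0] often look interchangeable in toy examples.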

e) Adding, Renaming, Dropping Columns

 Add a column:

 df['Salary'] = [50000, 60000]

 Rename a column:

 df.rename(columns={'Salary': 'Income'}, inplace=True)

 Drop column/row:

 df.drop('Income', axis=1, inplace=True) # drop column

 df.drop(0, axis=0, inplace=True) # drop row

This covers all the basic operations you must know before moving to intermediate topics.

I can now move to 2️⃣ Indexing and Selection, which is heavily asked in interviews, with
examples and tricks.

Do you want me to continue with that?

Great! Let’s move on to 2️⃣ Indexing and Selection in Pandas—this is very important for
interviews.

2️⃣ Indexing and Selection

a) Boolean Indexing / Filtering


 Select rows based on conditions:

import pandas as pd

data = {'Name': ['Jay', 'Ali', 'Mia'], 'Age': [23, 25, 22]}

df = pd.DataFrame(data)

# Age greater than 23

df_filtered = df[df['Age'] > 23]

print(df_filtered)

b) Conditional Selection

 Combine multiple conditions using & (and) / | (or):

df_filtered = df[(df['Age'] > 22) & (df['Name'] != 'Mia')]

print(df_filtered)

c) Using .isin()

 Filter rows where a column’s value is in a list:

df_filtered = df[df['Name'].isin(['Jay', 'Mia'])]

d) Using .between()

 Filter rows where values lie in a range:

df_filtered = df[df['Age'].between(22, 24)]

e) Using .query()

 Filter using a query string (interviewers like this):

df_filtered = df.query('Age > 22 and Name != "Mia"')


f) .loc[] and .iloc[]

 .loc[] → label-based selection:

df.loc[0, 'Name'] # row 0, column 'Name'

df.loc[:, ['Name', 'Age']] # all rows, selected columns

 .iloc[] → integer index-based selection:

df.iloc[0, 1] # first row, second column

df.iloc[0:2, 0:2] # slice rows and columns

g) Setting and Resetting Index

 Change the index of DataFrame:

df.set_index('Name', inplace=True)

df.reset_index(inplace=True)

h) MultiIndex / Hierarchical Indexing

 Used when DataFrame has multiple levels of index:

arrays = [['A', 'A', 'B'], [1, 2, 1]]

index = pd.MultiIndex.from_arrays(arrays, names=('letter', 'number'))

df_multi = pd.DataFrame({'Value': [10, 20, 30]}, index=index)
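Selecting from the MultiIndex built above is the part interviewers usually probe; a short sketch using the same toy data:

```python
import pandas as pd

arrays = [['A', 'A', 'B'], [1, 2, 1]]
index = pd.MultiIndex.from_arrays(arrays, names=('letter', 'number'))
df_multi = pd.DataFrame({'Value': [10, 20, 30]}, index=index)

# All rows under the outer label 'A'
outer = df_multi.loc['A']

# One cell, addressed by a full (letter, number) label tuple
single = df_multi.loc[('B', 1), 'Value']
print(outer)
print(single)
```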

✅ Key Interview Tips:

 Know the difference between .loc and .iloc.

 Practice filtering with multiple conditions.

 .query() is a neat trick often appreciated in interviews.

 Be ready for MultiIndex questions if you’re applying for data-heavy roles.


I can now continue with 3️⃣ Data Cleaning, which is another frequent interview topic.

Do you want me to continue?

Perfect! Let’s move on to 3️⃣ Data Cleaning in Pandas—this is heavily asked in interviews
because real-world data is often messy.

3️⃣ Data Cleaning

a) Handling Missing Data

 Detect missing values:

df.isnull() # Returns True for NaN values

df.notnull() # Returns True for non-NaN

 Drop missing values:

df.dropna(axis=0, inplace=True) # Drop rows with NaN

df.dropna(axis=1, inplace=True) # Drop columns with NaN

 Fill missing values:

df['Age'] = df['Age'].fillna(0) # Fill with constant

df['Age'] = df['Age'].fillna(df['Age'].mean()) # Fill with mean

(Assigning back is safer than fillna(..., inplace=True) on a selected column, which triggers a chained-assignment warning in recent pandas.)

b) Handling Duplicates

 Check duplicates:

df.duplicated()

 Drop duplicates:

df.drop_duplicates(inplace=True)

c) Type Conversion

 Convert column types:

df['Age'] = df['Age'].astype(int)

df['Salary'] = df['Salary'].astype(float)

d) String Operations

 Many interview questions involve cleaning string columns:

df['Name'].str.upper() # Convert to uppercase

df['Name'].str.lower() # Convert to lowercase

df['Name'].str.contains('Jay') # Check if contains substring

df['Name'].str.replace(' ', '_') # Replace characters

e) Renaming Columns and Values

 Rename columns:

df.rename(columns={'Age': 'Years'}, inplace=True)

 Replace values in a column:

df['Department'] = df['Department'].replace({'HR': 'Human Resources'})

f) Removing Whitespace

 Trim spaces from string columns:

df['Name'] = df['Name'].str.strip()

✅ Key Interview Tips:

 Expect questions like: “How do you handle missing values?” or “How do you remove
duplicates?”

 Practice filling NaN values using group-wise mean or median.

 Be ready to clean string columns using str methods.
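The group-wise fill mentioned in the tips is usually done with groupby(...).transform; a minimal sketch with made-up data:

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'Department': ['HR', 'HR', 'IT', 'IT'],
    'Salary': [100.0, np.nan, 200.0, np.nan],
})

# Fill each NaN with the mean salary of its own department:
# transform('mean') returns a Series aligned to the original rows
df['Salary'] = df['Salary'].fillna(
    df.groupby('Department')['Salary'].transform('mean')
)
print(df)
```

The key point is that transform keeps the original shape, so the result lines up row-for-row with the column being filled.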


I can now continue with 4️⃣ Data Transformation, which includes sorting, grouping, merging,
and reshaping—another very important topic.

Do you want me to continue?

Great! Let’s move on to 4️⃣ Data Transformation, a core topic for interviews.

4️⃣ Data Transformation

a) Sorting

 Sort by values:

df.sort_values(by='Age', ascending=True, inplace=True)

 Sort by index:

df.sort_index(ascending=False, inplace=True)

b) Aggregation / Grouping

 Group by column and aggregate:

df.groupby('Department')['Salary'].mean()

df.groupby(['Department', 'Gender'])['Salary'].sum()

 Using multiple aggregation functions:

df.groupby('Department')['Salary'].agg(['mean', 'max', 'min'])

c) Pivot Tables

 Reshape data for aggregation:

pd.pivot_table(df, index='Department', columns='Gender', values='Salary', aggfunc='mean')

 Melt → Convert wide data to long format:


pd.melt(df, id_vars=['Name'], value_vars=['Math', 'Science'])

d) Apply, Map, Applymap, Lambda Functions

 Apply → Apply function to rows/columns:

df['Bonus'] = df['Salary'].apply(lambda x: x*0.1)

 Map → Map values in a Series:

df['Department'] = df['Department'].map({'HR': 'Human Resources'})

 Applymap → Apply function element-wise to an entire DataFrame (deprecated in pandas 2.1+ in favour of DataFrame.map):

df.applymap(lambda x: x*2 if isinstance(x, int) else x)

e) Merging and Joining

 Merge two DataFrames (like SQL JOIN):

pd.merge(df1, df2, on='EmployeeID', how='inner') # inner, left, right, outer

 Concatenate DataFrames:

pd.concat([df1, df2], axis=0) # vertical

pd.concat([df1, df2], axis=1) # horizontal

 Join → Join on index:

df1.join(df2, how='left')
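Since the snippets above assume df1 and df2 already exist, here is a self-contained comparison (column names and data are illustrative):

```python
import pandas as pd

df1 = pd.DataFrame({'EmployeeID': [1, 2, 3], 'Name': ['Jay', 'Ali', 'Mia']})
df2 = pd.DataFrame({'EmployeeID': [2, 3, 4], 'Salary': [50000, 60000, 70000]})

# merge: SQL-style join on a column; 'inner' keeps only matching keys
merged = pd.merge(df1, df2, on='EmployeeID', how='inner')

# concat: stacks DataFrames; axis=0 appends rows, aligning on column names
stacked = pd.concat([df1, df2], axis=0)

# join: aligns on the index, so set the key as index first
joined = df1.set_index('EmployeeID').join(df2.set_index('EmployeeID'), how='left')
print(merged)
```

In short: merge joins on columns, join joins on the index, and concat simply stacks without any key matching.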

f) Reshaping

 Stack / Unstack → Pivot rows into columns and vice versa:

df_stack = df.stack()

df_unstack = df_stack.unstack()

✅ Key Interview Tips:

 Grouping and pivot tables are very commonly asked.


 Know the difference between merge, concat, and join.

 Practice applying functions using lambda for quick transformations.

 Be prepared for real-world tasks, like computing average sales per category, or top 3
performers per department.
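The "top N per group" task mentioned above is a classic; one common pattern is sort-then-head per group (data made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    'Department': ['HR', 'HR', 'HR', 'IT', 'IT'],
    'Name': ['A', 'B', 'C', 'D', 'E'],
    'Salary': [40, 60, 50, 90, 80],
})

# Top 2 earners per department: sort by salary, then keep the
# first rows of each group
top2 = (df.sort_values('Salary', ascending=False)
          .groupby('Department')
          .head(2))
print(top2)
```

groupby(...).head(n) keeps the first n rows of each group without collapsing them, which is exactly what "top performers per department" needs.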

I can now continue with 5️⃣ Statistical & Analytical Functions, which includes summary
statistics, correlation, value counts, etc.—very commonly asked in interviews.

Do you want me to continue?

Perfect! Let’s move on to 5️⃣ Statistical & Analytical Functions in Pandas—these are very
frequently asked in interviews.

5️⃣ Statistical & Analytical Functions

a) Summary Statistics

 Basic stats on numeric columns:

df['Salary'].mean() # Mean

df['Salary'].median() # Median

df['Salary'].sum() # Sum

df['Salary'].min() # Minimum

df['Salary'].max() # Maximum

df['Salary'].count() # Non-null count

 Descriptive statistics for all numeric columns:

df.describe()

b) Correlation & Covariance

 Correlation → Measures linear relationship between columns:


df.corr() # Pearson correlation by default; pass numeric_only=True if non-numeric columns exist

 Covariance → Measures how two variables change together:

df.cov() # likewise accepts numeric_only=True

c) Value Counts

 Count occurrences of each value in a column:

df['Department'].value_counts()

 Include missing values:

df['Department'].value_counts(dropna=False)

d) Unique and Distinct Values

 Get unique values:

df['Department'].unique()

 Count unique values:

df['Department'].nunique()

e) Ranking & Sorting Within Groups

 Rank values:

df['SalaryRank'] = df['Salary'].rank(ascending=False)

 Sort within groups:

df.groupby('Department')['Salary'].rank(ascending=False)

f) Cumulative & Rolling Functions

 Cumulative sum/product:

df['CumSum'] = df['Salary'].cumsum()

 Rolling average:

df['RollingAvg'] = df['Salary'].rolling(window=3).mean()

✅ Key Interview Tips:

 Questions often involve finding top/bottom values, averages, and counts.

 Be ready to combine groupby with aggregation for real-world scenarios.

 Correlation and covariance may be asked for basic data analysis understanding.

 Practice value_counts, unique, and nunique for categorical data questions.
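For the top/bottom-value questions, nlargest and nsmallest are more concise than sorting; a quick sketch with toy data:

```python
import pandas as pd

s = pd.Series([10, 40, 30, 20], index=['a', 'b', 'c', 'd'])

top2 = s.nlargest(2)      # two largest values, sorted descending
bottom1 = s.nsmallest(1)  # single smallest value
print(top2)
print(bottom1)
```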

I can now continue with 6️⃣ Time Series in Pandas, which is highly relevant for data roles and
often comes in interviews.

Do you want me to continue?

Great! Let’s move on to 6️⃣ Time Series in Pandas, which is very important for roles involving
temporal data.

6️⃣ Time Series in Pandas

a) DateTime Conversion

 Convert strings to datetime objects:

df['Date'] = pd.to_datetime(df['Date'])

 Extract datetime components:

df['Year'] = df['Date'].dt.year

df['Month'] = df['Date'].dt.month

df['Day'] = df['Date'].dt.day

df['Weekday'] = df['Date'].dt.weekday # 0=Monday, 6=Sunday

df['DayName'] = df['Date'].dt.day_name()

b) Setting Date as Index

 Makes time series operations easier:

df.set_index('Date', inplace=True)

c) Resampling

 Aggregate data at different time frequencies:

df.resample('M').mean() # Monthly mean

df.resample('W').sum() # Weekly sum

 Frequency codes:

o D → day, W → week, M → month, Q → quarter, Y → year, H → hour (pandas 2.2+ renames M/Q/Y to ME/QE/YE and H to h)

d) Shifting / Lagging

 Shift data to compare with previous values:

df['PrevMonthSales'] = df['Sales'].shift(1)

df['Diff'] = df['Sales'] - df['PrevMonthSales']

e) Rolling / Moving Windows

 Compute rolling statistics:

df['RollingAvg'] = df['Sales'].rolling(window=3).mean()

df['RollingSum'] = df['Sales'].rolling(window=3).sum()

 Useful for smoothing and trend analysis.

f) Time-based Filtering

 Filter by date ranges:

df['2025-01-01':'2025-03-31'] # Select rows between dates

df[df.index.year == 2025] # Select specific year


df[df.index.month == 8] # Select specific month

✅ Key Interview Tips:

 Expect questions like:

o “How do you resample daily data to monthly?”

o “How do you calculate moving averages?”

o “How do you extract the day, month, or year from a date column?”

 Practice lag/shift and rolling functions; they’re common in time series interviews.

 Setting datetime as index is crucial for efficient time series operations.

I can now continue with 7️⃣ Visualization with Pandas, which is simpler but often comes in
interviews for quick plotting and insights.

Do you want me to continue?
