Python Interview Questions for Business Analyst Role (CarInfo)
Q: How do you load a large CSV file using Python and filter rows based on a condition?
A: Use pandas:
'''python
import pandas as pd
df = pd.read_csv('data.csv')
filtered = df[df['column_name'] > 100]
'''
Q: What's the difference between .loc[] and .iloc[] in Pandas?
A: '.loc[]' is label-based indexing; '.iloc[]' is integer-position based. Example:
'''python
df.loc[1] # by label/index name
df.iloc[1] # by position
'''
Q: How would you group data by a column and calculate aggregate metrics like sum or mean?
A: '''python
df.groupby('category_column')['value_column'].sum()
df.groupby('category_column')['value_column'].mean()
'''
Q: How can you merge two dataframes in Pandas? What types of joins are available?
A: Use 'pd.merge()'. Joins: inner, outer, left, right.
'''python
pd.merge(df1, df2, on='id', how='left')
'''
Q: Explain how you would create a new column based on conditions from other columns.
A: '''python
df['new_col'] = df['col1'].apply(lambda x: 'High' if x > 100 else 'Low')
'''
Q: Can you pivot and unpivot a table using Pandas? When would you do that?
A: 'pivot_table()' reshapes data; use 'melt()' to unpivot.
'''python
pd.pivot_table(df, values='val', index='row', columns='col')
df.melt(id_vars=['id'])
'''
Q: How do you handle missing, duplicate, or outlier values in a dataset?
A: '''python
df.dropna(), df.fillna(0), df.duplicated(), df.drop_duplicates()
# Outliers: use IQR or Z-score
'''
Q: How would you convert a column from string to datetime, and extract parts like day, month?
A: '''python
df['date'] = pd.to_datetime(df['date_str'])
df['day'] = df['date'].dt.day
'''
Q: How can you detect and remove outliers using IQR or Z-score in Python?
A: '''python
Q1 = df['col'].quantile(0.25)
Q3 = df['col'].quantile(0.75)
IQR = Q3 - Q1
filtered = df[(df['col'] >= Q1 - 1.5 * IQR) & (df['col'] <= Q3 + 1.5 * IQR)]
'''
Q: You're asked to replicate a SQL GROUP BY + HAVING clause in Pandas. How would you do it?
A: '''python
grouped = df.groupby('group_col').agg({'val_col': 'sum'})
filtered = grouped[grouped['val_col'] > 100]
'''
Q: How can you perform a window function (like moving average) in Pandas?
A: '''python
df['moving_avg'] = df['value'].rolling(window=3).mean()
'''
Q: Have you ever automated a reporting process in Python? How did you schedule or trigger it?
A: Use 'schedule', 'APScheduler', or cron jobs. Write a script and schedule it with cron (Linux) or Task
Scheduler (Windows).
Q: How would you connect to a SQL database and run a query using Python?
A: '''python
import mysql.connector
conn = mysql.connector.connect(...)
cursor = conn.cursor()
cursor.execute('SELECT * FROM table')
'''
Q: How would you send automated email reports or alert messages from a Python script?
A: Use 'smtplib' or libraries like 'yagmail'. Example:
'''python
import smtplib
# setup SMTP and send
'''
Q: Write a function to calculate the percentage change between two values.
A: '''python
def percent_change(old, new):
return ((new - old) / old) * 100
'''
Q: How do you read a JSON file and extract specific fields?
A: '''python
import json
with open('file.json') as f:
data = json.load(f)
print(data['key'])
'''
Q: What are Python list comprehensions? Give an example.
A: A concise way to create lists:
'''python
squares = [x**2 for x in range(10)]
'''
Q: A dashboard shows drop in user renewals. How will you use Python to investigate the reason?
A: Analyze churn data, compare previous periods, check feature usage, demographics, and user activity.
Q: You notice a spike in app installs but no rise in purchases. What Python scripts would you run to
analyze user behavior?
A: Cohort analysis, funnel drop-off analysis, session tracking, and check conversion ratios.
Q: You need to identify which car brand's insurance sales dropped in the last month. How will you do
this using Python + SQL?
A: Query monthly sales by brand using SQL, load into Pandas, and compare month-over-month sales using
grouping and diff().