5. What is the CRISP-DM framework?
Answer:
“CRISP-DM stands for Cross-Industry Standard Process for Data Mining. It has six stages:
1. Business Understanding
2. Data Understanding
3. Data Preparation
4. Modeling
5. Evaluation
6. Deployment
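It gives data projects a repeatable structure that stays aligned with business goals. The stages are iterative rather than strictly sequential, so findings in a later stage often send you back to an earlier one.”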
6. Explain the difference between supervised and unsupervised learning.
Answer:
“In supervised learning, the model learns from labeled data, where both inputs and outputs are known (e.g., regression, classification).
In unsupervised learning, the data has no labels, and the goal is to find patterns or structure, as in clustering or association rule mining.”
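The contrast can be made concrete with a toy example. The sketch below is a minimal illustration in plain Python, not a production approach: the data, the nearest-centroid classifier, and the two-cluster grouping routine (`fit_nearest_centroid`, `predict`, `two_means`) are all hypothetical names chosen for this example. The supervised part uses the labels to learn; the unsupervised part sees only the numbers.

```python
from statistics import mean

# Hypothetical toy data: (hours_studied, outcome). Labels are used only in the supervised case.
labeled = [(1.0, "fail"), (1.5, "fail"), (2.0, "fail"),
           (7.0, "pass"), (8.0, "pass"), (9.0, "pass")]


def fit_nearest_centroid(rows):
    """Supervised: learn one centroid (mean value) per known label."""
    by_label = {}
    for x, y in rows:
        by_label.setdefault(y, []).append(x)
    return {label: mean(xs) for label, xs in by_label.items()}


def predict(centroids, x):
    """Return the label whose centroid is closest to x."""
    return min(centroids, key=lambda label: abs(centroids[label] - x))


def two_means(xs, iterations=10):
    """Unsupervised: split unlabeled values into two clusters (k-means with k=2)."""
    centers = [min(xs), max(xs)]  # deterministic starting centers
    clusters = [[], []]
    for _ in range(iterations):
        clusters = [[], []]
        for x in xs:
            nearest = 0 if abs(x - centers[0]) <= abs(x - centers[1]) else 1
            clusters[nearest].append(x)
        # Keep the old center if a cluster ends up empty.
        centers = [mean(c) if c else centers[i] for i, c in enumerate(clusters)]
    return clusters


model = fit_nearest_centroid(labeled)
print(predict(model, 6.0))                   # label predicted for a new input
print(two_means([x for x, _ in labeled]))    # groups found without any labels
```

Note that the clustering step recovers two groups but cannot name them; attaching meaning such as "pass" or "fail" still requires labels or human interpretation.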
7. What’s the difference between a data analyst and a data scientist?
Answer:
A Data Analyst primarily interprets existing data to generate business reports, dashboards, and
insights.
A Data Scientist builds models using machine learning and predictive algorithms, dealing with both
structured and unstructured data to solve complex problems.
8. How do you handle missing data in a dataset?
Answer:
“I first explore the pattern of missingness: whether values are missing completely at random (MCAR), missing at random (MAR), or missing not at random (MNAR). Depending on the context, I might:
- Remove rows or columns with excessive missing values
- Use mean, median, or mode imputation
- Use more advanced techniques such as KNN or regression imputation
- Flag missing values as a separate category (for categorical variables)”
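Two of these options, median imputation and flagging a separate category, can be sketched in a few lines. This is a minimal illustration using only the Python standard library; the records, column names, and helper functions (`missing_share`, `impute_median`, `flag_missing_category`) are hypothetical. In practice the same steps would usually be done with a dataframe library.

```python
from statistics import median

# Hypothetical customer records; None marks a missing value.
rows = [
    {"age": 34, "income": 52000, "segment": "retail"},
    {"age": None, "income": 61000, "segment": None},
    {"age": 29, "income": None, "segment": "online"},
    {"age": 41, "income": 58000, "segment": "retail"},
]


def missing_share(rows, column):
    """Fraction of rows in which the column is missing."""
    return sum(r[column] is None for r in rows) / len(rows)


def impute_median(rows, column):
    """Return new rows with missing numeric values replaced by the observed median."""
    fill = median(r[column] for r in rows if r[column] is not None)
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]


def flag_missing_category(rows, column, label="Unknown"):
    """Return new rows with missing categorical values replaced by an explicit label."""
    return [{**r, column: label if r[column] is None else r[column]} for r in rows]


# Inspect how much is missing before deciding what to do.
for column in ("age", "income", "segment"):
    print(column, missing_share(rows, column))

cleaned = impute_median(impute_median(rows, "age"), "income")
cleaned = flag_missing_category(cleaned, "segment")
print(cleaned)
```

The helpers return new lists rather than modifying `rows`, so the raw data stays available for comparison. Simple imputation like this is easiest to justify when values are plausibly MCAR; under MNAR it can bias results.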
9. What are some common KPIs you might track for a retail business?
Answer:
- Customer Lifetime Value (CLV)
- Conversion Rate
- Average Order Value
- Customer Retention Rate
- Gross Margin
- Inventory Turnover
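Most of these KPIs reduce to simple ratios. The sketch below uses commonly cited definitions and made-up figures; the function name `retail_kpis`, its parameters, and the simplified CLV formula (average order value × purchases per year × expected years as a customer) are assumptions for illustration, and individual businesses often define these metrics differently.

```python
def retail_kpis(visits, orders, revenue, cogs, avg_inventory,
                customers_start, customers_end, new_customers,
                purchases_per_year, years_retained):
    """Compute common retail KPIs from aggregate figures for one period."""
    aov = revenue / orders
    return {
        # Share of store or site visits that ended in an order.
        "conversion_rate": orders / visits,
        "average_order_value": aov,
        # Share of revenue left after cost of goods sold (COGS).
        "gross_margin": (revenue - cogs) / revenue,
        # How many times the average inventory was sold through.
        "inventory_turnover": cogs / avg_inventory,
        # Share of starting customers still active at period end.
        "customer_retention_rate": (customers_end - new_customers) / customers_start,
        # Simplified CLV (assumption): ignores margin, discounting, and churn curves.
        "customer_lifetime_value": aov * purchases_per_year * years_retained,
    }


# Hypothetical figures for a single period.
kpis = retail_kpis(
    visits=20_000, orders=500, revenue=25_000.0, cogs=15_000.0,
    avg_inventory=5_000.0, customers_start=400, customers_end=450,
    new_customers=130, purchases_per_year=4, years_retained=3,
)
for name, value in kpis.items():
    print(f"{name}: {value:.3f}")
```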
10. How do you explain a complex model to a non-technical stakeholder?
Answer:
“I use analogies and visualizations to simplify the concept. For example, I compare a decision tree to a series of
yes/no questions leading to an outcome. I focus on the business impact rather than the technical details and often
use dashboards to present results.”
🔹 SECTION C: Tools-Based Questions
11. How proficient are you in Python? What libraries do you use?
Answer:
“I’m comfortable with Python and have used libraries like:
- Pandas and NumPy for data manipulation
- Matplotlib and Seaborn for visualization
- Scikit-learn for machine learning
- Statsmodels for statistical analysis
I’ve also used Jupyter Notebooks extensively for presenting work.”
12. How do you use SQL in data analysis?
Answer:
“I use SQL to query and manipulate relational databases. Tasks include joining tables, filtering rows,
aggregating data, and writing subqueries or window functions. For example, in a project analyzing sales, I used
SQL to extract monthly revenue trends by region.”
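A query of the kind described, monthly revenue by region, might look like the sketch below. It runs against an in-memory SQLite database through Python's built-in `sqlite3` module so it is self-contained; the `sales` table, its columns, and the sample rows are hypothetical, and `strftime` is SQLite's date function (other databases use different ones, such as `DATE_TRUNC`).

```python
import sqlite3

# In-memory database with a hypothetical sales table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (order_date TEXT, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("2024-01-05", "North", 120.0),
        ("2024-01-20", "North", 80.0),
        ("2024-01-11", "South", 50.0),
        ("2024-02-02", "North", 150.0),
        ("2024-02-15", "South", 70.0),
        ("2024-02-28", "South", 30.0),
    ],
)

# Aggregate revenue per month and region.
MONTHLY_REVENUE = """
SELECT strftime('%Y-%m', order_date) AS month,
       region,
       SUM(amount) AS revenue
FROM sales
GROUP BY strftime('%Y-%m', order_date), region
ORDER BY region, month
"""

monthly = conn.execute(MONTHLY_REVENUE).fetchall()
for month, region, revenue in monthly:
    print(month, region, revenue)
```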
13. What visualization tools do you know?
Answer:
“I’ve worked with Tableau, Power BI, and Matplotlib/Seaborn in Python. I use Tableau and Power BI to build interactive dashboards and data stories, and Matplotlib/Seaborn for charts during analysis, so stakeholders can quickly grasp trends, outliers, and KPIs.”
🔹 SECTION D: Scenario-Based / Case Questions
14. How would you approach a business problem like declining sales in a region?
Answer:
“I would:
1. Understand the business context – time frame, regions affected, product categories
2. Collect relevant data – sales, customer feedback, inventory, marketing
3. Perform exploratory analysis
4. Segment the problem – product-wise, geography-wise
5. Identify patterns or anomalies
6. Suggest data-backed solutions such as promotions, pricing, or product adjustments”
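Steps 3 to 5 can be sketched as a period-over-period comparison by segment, which shows where the decline is concentrated. The records, period labels, and the helper `change_by_segment` below are hypothetical, and a real analysis would also account for seasonality and sample size.

```python
from collections import defaultdict

# Hypothetical sales records: (period, region, category, revenue).
sales = [
    ("2024-Q1", "East", "Apparel", 100.0),
    ("2024-Q1", "East", "Footwear", 80.0),
    ("2024-Q1", "West", "Apparel", 90.0),
    ("2024-Q1", "West", "Footwear", 70.0),
    ("2024-Q2", "East", "Apparel", 60.0),
    ("2024-Q2", "East", "Footwear", 78.0),
    ("2024-Q2", "West", "Apparel", 92.0),
    ("2024-Q2", "West", "Footwear", 71.0),
]


def change_by_segment(records, before, after):
    """Percent revenue change per (region, category), worst segment first."""
    totals = defaultdict(lambda: {before: 0.0, after: 0.0})
    for period, region, category, revenue in records:
        if period in (before, after):
            totals[(region, category)][period] += revenue
    changes = {}
    for segment, t in totals.items():
        if t[before] > 0:  # skip segments with no baseline revenue
            changes[segment] = (t[after] - t[before]) / t[before] * 100
    return sorted(changes.items(), key=lambda item: item[1])


ranked = change_by_segment(sales, "2024-Q1", "2024-Q2")
for (region, category), pct in ranked:
    print(f"{region} / {category}: {pct:+.1f}%")
```

With this sample data the drop is concentrated in one region and category, which narrows the follow-up questions (pricing, stock availability, local competition) before any solution is proposed.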
15. How do you ensure data quality in your projects?
Answer:
- Cleaning and handling missing values
- Removing duplicates
- Validating data types and ranges
- Consistency checks across datasets
- Exploratory data analysis to detect anomalies
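Several of these checks can be automated. The sketch below covers duplicates, missing values, data types, and a range rule in plain Python; the order records, the `EXPECTED_TYPES` schema, and the `quality_report` helper are hypothetical, and cross-dataset consistency checks are not shown.

```python
from datetime import date

# Hypothetical order records with deliberate problems.
orders = [
    {"order_id": 1, "order_date": date(2024, 3, 1), "quantity": 2, "unit_price": 19.99},
    {"order_id": 2, "order_date": date(2024, 3, 2), "quantity": -1, "unit_price": 5.00},
    {"order_id": 2, "order_date": date(2024, 3, 2), "quantity": -1, "unit_price": 5.00},
    {"order_id": 3, "order_date": "2024-03-04", "quantity": 1, "unit_price": 12.50},
    {"order_id": 4, "order_date": date(2024, 3, 5), "quantity": None, "unit_price": 7.25},
]

# Expected type for each column (an assumed schema for this example).
EXPECTED_TYPES = {"order_id": int, "order_date": date, "quantity": int, "unit_price": float}


def quality_report(rows):
    """Return a list of (row_index, issue) pairs found in the rows."""
    issues = []
    seen_ids = set()
    for i, row in enumerate(rows):
        # Duplicate check on the key column.
        if row["order_id"] in seen_ids:
            issues.append((i, "duplicate order_id"))
        seen_ids.add(row["order_id"])
        # Missing-value and type checks for every column.
        for column, expected in EXPECTED_TYPES.items():
            value = row[column]
            if value is None:
                issues.append((i, f"missing {column}"))
            elif not isinstance(value, expected):
                issues.append((i, f"{column} is not {expected.__name__}"))
        # Range check: quantities must be positive.
        quantity = row["quantity"]
        if isinstance(quantity, int) and quantity <= 0:
            issues.append((i, "quantity out of range"))
    return issues


report = quality_report(orders)
for index, issue in report:
    print(f"row {index}: {issue}")
```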