A. Basic Conceptual Questions
1. What is Exploratory Data Analysis (EDA) and why is it important?
2. What is the difference between EDA and data cleaning?
3. What are the key steps involved in performing EDA?
4. Name the types of data (categorical, numerical) and explain how EDA differs for each.
5. What is the difference between univariate, bivariate, and multivariate analysis?
B. Descriptive Statistics Questions
6. How do mean, median, and mode help in understanding data?
7. What is standard deviation and what does it tell you?
8. When would you prefer using median over mean?
9. What does a high variance in a feature suggest?
10. How can skewness in data affect analysis?
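As a starting point for questions 6 to 10, the following minimal sketch compares the summary statistics on a small sample. The income values are invented for illustration, and the skewness measure used here (Pearson's second coefficient) is only one of several possible choices.

```python
from statistics import mean, median, mode, stdev

# Hypothetical annual incomes in thousands; the last value is an extreme case.
incomes = [30, 32, 35, 35, 38, 40, 250]

avg = mean(incomes)      # pulled upward by the extreme value
mid = median(incomes)    # robust to the extreme value
common = mode(incomes)   # most frequent value
spread = stdev(incomes)  # sample standard deviation

# Pearson's second skewness coefficient: positive when the mean sits
# above the median, which suggests a right-skewed distribution.
skew = 3 * (avg - mid) / spread

print(f"mean={avg:.1f} median={mid} mode={common} stdev={spread:.1f}")
print(f"skewness (Pearson 2nd coefficient)={skew:.2f}")
```

Here the mean is far above the median, which is the typical situation where the median is the more representative measure of the center.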
C. Data Cleaning & Handling Questions
11. How do you handle missing values in a dataset?
12. What is an outlier, and how can it be detected and treated?
13. What is the purpose of data normalization or standardization in EDA?
14. How do you treat duplicate records?
15. What are common data types in pandas, and why do they matter during EDA?
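Questions 11, 12, and 14 can be explored with a short sketch like the one below. The records are made up, median imputation and the 1.5 x IQR rule are just one common combination of choices, and in practice the same steps are often done with pandas rather than plain Python.

```python
from statistics import median, quantiles

# Hypothetical transaction records; None marks a missing amount.
rows = [
    {"id": 1, "amount": 20.0},
    {"id": 2, "amount": None},
    {"id": 3, "amount": 22.0},
    {"id": 3, "amount": 22.0},   # exact duplicate of the previous record
    {"id": 4, "amount": 25.0},
    {"id": 5, "amount": 21.0},
    {"id": 6, "amount": 500.0},  # suspiciously large
    {"id": 7, "amount": 23.0},
    {"id": 8, "amount": 24.0},
]

# Duplicates: keep the first occurrence of each (id, amount) pair.
seen = set()
deduped = []
for row in rows:
    key = (row["id"], row["amount"])
    if key not in seen:
        seen.add(key)
        deduped.append(row)

# Missing values: impute with the median of the observed amounts.
observed = [r["amount"] for r in deduped if r["amount"] is not None]
fill = median(observed)
cleaned = [
    {**r, "amount": fill if r["amount"] is None else r["amount"]}
    for r in deduped
]

# Outliers: flag amounts outside 1.5 * IQR from the quartiles.
amounts = [r["amount"] for r in cleaned]
q1, _, q3 = quantiles(amounts, n=4)
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [r for r in cleaned if not low <= r["amount"] <= high]

print("rows after dedup:", len(deduped))
print("imputed value:", fill)
print("outlier ids:", [r["id"] for r in outliers])
```

Whether a flagged record should be removed, capped, or kept depends on whether it is a data error or a genuine extreme value, which the code alone cannot decide.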
D. Visualization and Interpretation Questions
16. Which plot is best suited for showing the distribution of a single numerical variable?
17. How would you visualize the relationship between two numerical variables?
18. What is a boxplot and what insights can you derive from it?
19. How do heatmaps help in EDA?
20. When should you use pie charts and why are they generally discouraged?
E. Real-World Application Questions
21. If you were given a customer transaction dataset, what EDA steps would you perform?
22. In what ways can EDA help a business make data-driven decisions?
23. How would you use EDA to detect seasonality or trends in sales data?
24. What insights can you gain by analyzing time-based features like "Day of Week" or "Hour of Purchase"?
25. What role does correlation analysis play in EDA?
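For questions 24 and 25, a minimal sketch of a day-of-week breakdown and a Pearson correlation is shown below. The sales figures and the ad-spend column are hypothetical, and `pearson` is a helper written for this example rather than a library function.

```python
from collections import defaultdict
from datetime import date
from math import sqrt

# Hypothetical daily records: (date, units sold, ad spend).
sales = [
    (date(2024, 1, 1), 10, 100.0),   # Monday
    (date(2024, 1, 2), 12, 120.0),   # Tuesday
    (date(2024, 1, 6), 30, 310.0),   # Saturday
    (date(2024, 1, 7), 28, 290.0),   # Sunday
    (date(2024, 1, 8), 11, 105.0),   # Monday
    (date(2024, 1, 13), 32, 330.0),  # Saturday
]

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

# Derive a "Day of Week" feature and average the units sold per day.
units_by_day = defaultdict(list)
for day, units, _ in sales:
    units_by_day[DAY_NAMES[day.weekday()]].append(units)
avg_by_day = {name: sum(v) / len(v) for name, v in units_by_day.items()}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


r = pearson([u for _, u, _ in sales], [s for _, _, s in sales])

print(avg_by_day)
print(f"correlation between units and ad spend: {r:.3f}")
```

In this toy data the weekend averages are clearly higher than the weekday ones, and units sold move almost in lockstep with ad spend. A strong correlation like this still says nothing about which variable drives the other.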
F. Advanced Thinking Questions
26. How do you choose which features to visualize first?
27. Can visualizations be misleading? Give an example.
28. What are some pitfalls to avoid during EDA?
29. How would you perform EDA on a dataset with thousands of features (e.g., genomic data)?
30. What tools or libraries do you use for EDA in Python and why?