Q1. What is the aim of Data Visualization in Data Science?
a) Making data more confusing
b) Making data understandable (Correct)
c) Hiding data patterns
d) None of the above
Q2. What does Descriptive Statistics aim to do?
a) Predict future trends
b) Summarize data (Correct)
c) Classify data
d) None of the above
Q3. Which visual encoding property can be used to represent a region or area in a map or chart?
a) Color
b) Position
c) Size (Correct)
d) Shape
Q4. What is the primary objective of "feature engineering" in data analysis?
a) Preprocessing data for analysis
b) Generating additional data points
c) Enhancing the performance of machine learning models (Correct)
d) Data collection and retrieval
Q5. In the context of data collection, what does the term "bias" refer to?
a) The accuracy of the collected data
b) The extent to which collected data systematically deviates from the true population (Correct)
c) The variation in data values
d) The size of the dataset
Q6. Which industry has not seen significant applications of data science?
a) Finance
b) Education
c) Retail
d) Space Exploration (Correct)
Q7. How does the ROC-AUC score summarize model performance?
a) By measuring the area under the ROC curve (Correct)
b) By calculating model accuracy
c) By testing model assumptions
d) None of the above
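As a rough illustration of Q7, the area under the ROC curve equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The sketch below computes it that way on a tiny made-up dataset; the function name `roc_auc` and the labels and scores are assumptions for illustration, not part of the quiz.

```python
def roc_auc(labels, scores):
    """Share of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical labels and model scores.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(labels, scores)  # 3 of 4 pairs are ranked correctly
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking.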
Q8. In data visualization, what is the purpose of data encoding?
a) Transferring data between computers
b) Making data visually interpretable (Correct)
c) Encrypting sensitive data
d) Deleting unnecessary data
Q9. What is the role of the mode in Descriptive Statistics?
a) Measure of central tendency (Correct)
b) Measure of data spread
c) Measure of data variability
d) None of the above
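A minimal sketch for Q9, assuming a small made-up sample: Python's standard `statistics` module exposes the three common measures of central tendency, with the mode being the most frequent value.

```python
import statistics

# Hypothetical sample; the value 3 appears most often.
values = [1, 2, 3, 3, 3, 4, 10]

mode_value = statistics.mode(values)      # most frequent value
median_value = statistics.median(values)  # middle value after sorting
mean_value = statistics.mean(values)      # arithmetic average, pulled up by the 10
```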
Q10. How does Data Cleaning affect data analysis?
a) By reducing data quality
b) By improving data accuracy (Correct)
c) By increasing data complexity
d) None of the above
Q11. Which of the following is an example of nominal data?
a) Temperature in Celsius
b) Colors (e.g., red, blue, green) (Correct)
c) Age of individuals
d) Time of day
Q12. A positively skewed distribution has its tail pointing:
a) Left
b) Right (Correct)
c) Upward
d) Downward
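To make Q12 concrete, the sketch below computes a moment-based skewness for a made-up sample with one large value. The helper `skewness` and the data are assumptions for illustration. A positive result indicates a right tail, and in this sample the mean also sits above the median.

```python
import statistics

def skewness(values):
    """Population skewness: third central moment over variance ** 1.5."""
    mean = statistics.fmean(values)
    n = len(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical sample with a long right tail (the value 20).
data = [1, 2, 2, 3, 3, 3, 4, 20]
skew = skewness(data)
mean_value = statistics.fmean(data)
median_value = statistics.median(data)
```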
Q13. Which evaluation metric is most suitable for regression tasks when the data contain outliers?
a) Mean Absolute Error (MAE) (Correct)
b) Mean Squared Error (MSE)
c) R-squared (R2)
d) Root Mean Squared Error (RMSE)
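A small sketch for Q13, using made-up targets and predictions: because RMSE squares each error before averaging, a single outlier inflates it proportionally more than it inflates MAE. The function names and data are assumptions for illustration.

```python
import math

def mae(y_true, y_pred):
    """Mean Absolute Error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root Mean Squared Error."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))

# Hypothetical predictions, scored against clean targets and
# against targets where the last value is an outlier.
y_pred = [11, 11, 12, 12, 12]
y_clean = [10, 12, 11, 13, 12]
y_outlier = [10, 12, 11, 13, 50]

mae_clean, mae_outlier = mae(y_clean, y_pred), mae(y_outlier, y_pred)
rmse_clean, rmse_outlier = rmse(y_clean, y_pred), rmse(y_outlier, y_pred)

# Relative growth caused by the single outlier.
mae_growth = mae_outlier / mae_clean
rmse_growth = rmse_outlier / rmse_clean
```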
Q14. Which technique is used for handling missing data in a dataset?
a) Data Discretization
b) Data Normalization
c) Data Imputation (Correct)
d) Data Sampling
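One simple form of the imputation mentioned in Q14 is mean imputation, sketched below under the assumption that missing entries are marked with `None`. The helper `impute_mean` and the data are hypothetical; real projects often use median, model-based, or domain-specific imputation instead.

```python
import statistics

def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute when every value is missing")
    fill = statistics.fmean(observed)
    return [fill if v is None else v for v in values]

# Hypothetical column with two missing entries.
raw = [4.0, None, 6.0, None, 8.0]
imputed = impute_mean(raw)
```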
Q15. What is the term for the process of examining data to discover patterns, anomalies, or insights?
a) Recording
b) Data cleaning
c) Data preprocessing
d) Data exploration (Correct)
Q16. What is the first step in the data science process?
a) Data modeling
b) Data collection (Correct)
c) Data visualization
d) Data interpretation
Q17. When does multicollinearity occur in multiple regression?
a) When independent variables are highly correlated (Correct)
b) When the dependent variable has outliers
c) When there are too few data points
d) When residuals are normally distributed
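A rough sketch for Q17: with exactly two predictors, the variance inflation factor (VIF) reduces to 1 / (1 - r^2), where r is their Pearson correlation. The data and helper name below are assumptions for illustration; the second predictor is roughly twice the first, so the correlation is close to 1 and the VIF is very large.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical predictors: x2 is approximately 2 * x1.
x1 = [1, 2, 3, 4, 5]
x2 = [2.1, 3.9, 6.2, 8.0, 9.8]

r = pearson_r(x1, x2)
vif = 1.0 / (1.0 - r ** 2)  # valid shortcut for the two-predictor case only
```

A common rule of thumb treats a VIF above roughly 5 to 10 as a sign of problematic multicollinearity.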
Q18. What is the most suitable data encoding for representing data using a map?
a) Color (Correct)
b) Shape
c) Texture
d) Position
Q19. What is the purpose of a Confusion Matrix in model evaluation?
a) To create confusion among users
b) To visualize model performance (Correct)
c) To confuse model predictions
d) None of the above
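For Q19, a binary confusion matrix is just a count of the four possible (actual, predicted) combinations. The sketch below tallies them for made-up labels; the variable names and data are assumptions for illustration.

```python
from collections import Counter

# Hypothetical ground-truth labels and model predictions (1 = positive).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# Count each (actual, predicted) pair.
counts = Counter(zip(y_true, y_pred))
tp = counts[(1, 1)]  # true positives
tn = counts[(0, 0)]  # true negatives
fp = counts[(0, 1)]  # false positives
fn = counts[(1, 0)]  # false negatives

# Rows are actual classes, columns are predicted classes.
matrix = [[tn, fp],
          [fn, tp]]
accuracy = (tp + tn) / len(y_true)
```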
Q20. In a normal distribution, what percentage of data falls within one standard deviation of the mean?
a) Approximately 34%
b) Approximately 68% (Correct)
c) Approximately 95%
d) Approximately 99.7%
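The percentages in Q20 can be checked with the standard library's `statistics.NormalDist` (available from Python 3.8). This sketch assumes a standard normal distribution; the helper `share_within` is a hypothetical name.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)

def share_within(k):
    """Probability mass within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

within_one = share_within(1)    # roughly 68%
within_two = share_within(2)    # roughly 95%
within_three = share_within(3)  # roughly 99.7%
```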
Q21. What is the purpose of a Heat Map in Data Analysis?
a) To visualize correlations (Correct)
b) To plot distributions
c) To display frequencies
d) None of the above
Q22. Which of the following is a limitation of using variance as a measure of data spread?
a) It is sensitive to outliers (Correct)
b) It is not affected by the range of data values
c) It always results in a value of 1
d) It is difficult to calculate
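A short sketch for Q22, with made-up data: because variance squares each deviation from the mean, replacing one value with an outlier can change it by orders of magnitude.

```python
import statistics

# Hypothetical samples that differ only in the last value.
clean = [10, 11, 12, 13, 14]
with_outlier = [10, 11, 12, 13, 100]

var_clean = statistics.pvariance(clean)           # population variance
var_outlier = statistics.pvariance(with_outlier)  # dominated by the outlier
```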
Q23. In the Central Limit Theorem (CLT), what is the main benefit of having a large sample size?
a) It guarantees a perfect normal distribution
b) It eliminates the need for statistical analysis
c) It makes the sampling distribution of the sample mean closer to a normal distribution (Correct)
d) It reduces the sample mean to zero
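The effect described in Q23 can be simulated. The sketch below is an illustration under stated assumptions: it draws samples from an exponential distribution (strongly right-skewed) and measures the skewness of the resulting sample means for a small and a large sample size. The function names, seed, and replicate count are arbitrary choices.

```python
import random
import statistics

def skewness(values):
    """Population skewness: third central moment over variance ** 1.5."""
    mean = statistics.fmean(values)
    n = len(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

def sample_mean_skewness(sample_size, replicates=2000, seed=0):
    """Skewness of the distribution of sample means for a given sample size."""
    rng = random.Random(seed)
    means = []
    for _ in range(replicates):
        sample = [rng.expovariate(1.0) for _ in range(sample_size)]
        means.append(statistics.fmean(sample))
    return skewness(means)

skew_small = sample_mean_skewness(sample_size=2)
skew_large = sample_mean_skewness(sample_size=50)
```

A normal distribution has skewness 0, so the smaller value for the larger sample size reflects the sampling distribution moving closer to normal.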
Q24. Polynomial regression is used when:
a) There is a linear relationship between variables
b) Data is best described by a straight line
c) The relationship between variables is nonlinear (Correct)
d) There is no relationship between variables
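As a minimal sketch of Q24, a degree-2 polynomial regression can be fitted by ordinary least squares on the features 1, x, and x^2. The code below solves the normal equations directly in pure Python; the helper `fit_quadratic` and the synthetic data are assumptions for illustration, and numerical libraries are usually preferable in practice.

```python
def fit_quadratic(xs, ys):
    """Least-squares fit of y = a + b*x + c*x**2 via the normal equations."""
    # Each design-matrix row is [1, x, x^2].
    rows = [[1.0, x, x * x] for x in xs]
    # Augmented 3x4 system: (X^T X | X^T y).
    aug = [
        [sum(row[i] * row[j] for row in rows) for j in range(3)]
        + [sum(row[i] * y for row, y in zip(rows, ys))]
        for i in range(3)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda k: abs(aug[k][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for k in range(3):
            if k != col:
                factor = aug[k][col] / aug[col][col]
                aug[k] = [u - factor * v for u, v in zip(aug[k], aug[col])]
    return [aug[i][3] / aug[i][i] for i in range(3)]

# Synthetic, noise-free data generated from y = 1 + 2x + 3x^2.
xs = [-2, -1, 0, 1, 2, 3]
ys = [1 + 2 * x + 3 * x * x for x in xs]
a, b, c = fit_quadratic(xs, ys)
```

Because the data are exactly quadratic, the fitted coefficients should match the generating values up to floating-point error; a straight-line fit could not capture the curvature.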
Q25. In Bokeh, what is a "callback"?
a) A type of error message
b) A function that responds to events triggered by user interactions (Correct)
c) A data storage format
d) A type of data encoding
Q26. What does the term "Underfitting" mean in model evaluation?
a) Model is too complex
b) Model fits training data too closely
c) Model is too simple (Correct)
d) None of the above
Q27. What is the primary goal of data pre-processing?
a) To reduce the size of the dataset
b) To make data more visually appealing
c) To prepare data for analysis by addressing quality and structure issues (Correct)
d) To create new data from existing data