Data Science quiz questions
1. What is data science primarily concerned with?
- A) Collecting data
- B) Analyzing data
- C) Storing data
- D) Creating data
**Answer: B) Analyzing data**
2. Which of the following is NOT a step in the data science process?
- A) Data collection
- B) Data cleaning
- C) Data deletion
- D) Data analysis
**Answer: C) Data deletion**
3. Which measure of central tendency is the arithmetic average of a dataset?
- A) Mean
- B) Median
- C) Mode
- D) Range
**Answer: A) Mean**
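
The four options in question 3 can be compared on a small sample. This is a minimal sketch using Python's standard `statistics` module; the `values` list is a hypothetical sample chosen so that one extreme value pulls the mean away from the median.

```python
import statistics

# Hypothetical sample; the value 40 is deliberately extreme.
values = [2, 3, 3, 5, 7, 40]

mean = statistics.mean(values)      # arithmetic average: sum / count
median = statistics.median(values)  # middle of the sorted data
mode = statistics.mode(values)      # most frequent value
spread = max(values) - min(values)  # range: measures spread, not the centre

print(mean, median, mode, spread)
```

The mean, median, and mode all describe the centre of the data, while the range describes how widely the data are spread.
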
4. What is the process of converting raw data into a more structured format called?
- A) Data wrangling
- B) Data munging
- C) Data manipulation
- D) Data cleansing
**Answer: A) Data wrangling** (option B, data munging, is a common synonym for the same process)
5. Which programming language is widely used for data analysis and manipulation?
- A) Python
- B) Java
- C) C++
- D) HTML
**Answer: A) Python**
6. What type of analysis is used to identify patterns or relationships in data?
- A) Descriptive analysis
- B) Predictive analysis
- C) Inferential analysis
- D) Exploratory analysis
**Answer: D) Exploratory analysis**
7. Which type of data visualization is suitable for displaying trends over time?
- A) Scatter plot
- B) Histogram
- C) Line chart
- D) Pie chart
**Answer: C) Line chart**
8. What is the process of filling in missing data in a dataset called?
- A) Data integration
- B) Data imputation
- C) Data interpolation
- D) Data extrapolation
**Answer: B) Data imputation**
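
One simple form of imputation replaces each missing value with the mean of the observed values. The sketch below assumes missing entries are represented as `None`; the `impute_mean` helper and the `ages` column are hypothetical names used only for illustration.

```python
import statistics

def impute_mean(column):
    """Replace None entries with the mean of the observed values."""
    observed = [x for x in column if x is not None]
    fill = statistics.mean(observed)
    return [fill if x is None else x for x in column]

# Hypothetical column with two missing entries.
ages = [20, None, 30, 40, None]
filled = impute_mean(ages)
print(filled)
```

Mean imputation is only one option; median or model-based imputation may suit skewed data better.
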
9. What is the term for a statistical model used to predict future outcomes based on historical data?
- A) Descriptive model
- B) Predictive model
- C) Inferential model
- D) Exploratory model
**Answer: B) Predictive model**
10. Which of the following is NOT a supervised learning algorithm?
- A) Linear regression
- B) K-means clustering
- C) Decision tree
- D) Support vector machine
**Answer: B) K-means clustering**
11. Which metric measures the proportion of all predictions that a model gets right?
- A) Accuracy
- B) Precision
- C) Recall
- D) F1 score
**Answer: A) Accuracy**
12. Which method is used to evaluate the performance of a classification model?
- A) Confusion matrix
- B) Root mean squared error
- C) Mean absolute error
- D) R-squared
**Answer: A) Confusion matrix**
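
The metrics in questions 11 and 12 are closely related: accuracy, precision, recall, and F1 score can all be derived from the four cells of a binary confusion matrix. The following is a minimal sketch, assuming labels are encoded as 0 and 1 with 1 as the positive class; `confusion_counts` and the sample labels are hypothetical.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary classification task."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1  # predicted positive, actually positive
        elif p == positive:
            fp += 1  # predicted positive, actually negative
        elif t == positive:
            fn += 1  # predicted negative, actually positive
        else:
            tn += 1  # predicted negative, actually negative
    return tp, fp, fn, tn

# Hypothetical true labels and model predictions.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)   # share of all predictions that are right
precision = tp / (tp + fp)           # share of predicted positives that are right
recall = tp / (tp + fn)              # share of actual positives that are found
f1 = 2 * precision * recall / (precision + recall)
print(tp, fp, fn, tn, accuracy, precision, recall, f1)
```
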
13. Which technique is used to reduce the dimensionality of a dataset?
- A) Principal component analysis (PCA)
- B) Linear regression
- C) Random forest
- D) Gradient boosting
**Answer: A) Principal component analysis (PCA)**
14. What type of learning algorithm does not require labeled training data?
- A) Supervised learning
- B) Unsupervised learning
- C) Semi-supervised learning
- D) Reinforcement learning
**Answer: B) Unsupervised learning**
15. Which algorithm is used for finding frequent itemsets in transactional databases?
- A) Apriori
- B) K-nearest neighbors
- C) Naive Bayes
- D) Gradient descent
**Answer: A) Apriori**
16. What is the process of transforming a categorical variable into a set of binary indicator columns, one per category, called?
- A) Feature engineering
- B) One-hot encoding
- C) Label encoding
- D) Normalization
**Answer: B) One-hot encoding**
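
Options B and C in question 16 both turn categories into numbers, but in different ways. The sketch below contrasts them in plain Python; the helper names and the `colors` sample are hypothetical, and categories are ordered alphabetically as an assumption.

```python
def label_encode(values):
    """Map each category to a single integer code."""
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    return [index[v] for v in values], categories

def one_hot_encode(values):
    """Map each category to a binary indicator vector, one slot per category."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return rows, categories

# Hypothetical categorical column.
colors = ["red", "green", "blue", "green"]
labels, cats = label_encode(colors)
one_hot, _ = one_hot_encode(colors)
print(cats, labels, one_hot)
```

Label encoding implies an ordering between categories, which is why one-hot encoding is often preferred for nominal variables.
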
17. What is the term for an extreme value that falls far from the majority of other data points?
- A) Outlier
- B) Anomaly
- C) Noise
- D) Deviation
**Answer: A) Outlier**
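
One common rule of thumb flags a value as an outlier when it lies more than 1.5 times the interquartile range (IQR) beyond the quartiles. This is a minimal sketch of that rule, not the only definition of an outlier; it assumes Python 3.8 or later for `statistics.quantiles`, and the `readings` sample is hypothetical.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return values lying more than k * IQR outside the first/third quartile."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in values if x < low or x > high]

# Hypothetical readings with one extreme value.
readings = [10, 12, 11, 13, 12, 11, 95]
outliers = iqr_outliers(readings)
print(outliers)
```
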
18. Which method divides a dataset once into separate training and testing sets?
- A) Cross-validation
- B) Holdout method
- C) Stratified sampling
- D) Bootstrap method
**Answer: B) Holdout method**
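
The holdout method can be sketched as a single shuffled split. The example below is a minimal illustration in plain Python; the `holdout_split` helper, the 20% test fraction, and the fixed seed are assumptions made for the example.

```python
import random

def holdout_split(rows, test_fraction=0.2, seed=0):
    """Shuffle the rows once and hold out a fraction of them for testing."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical dataset of ten row identifiers.
data = list(range(10))
train_set, test_set = holdout_split(data)
print(len(train_set), len(test_set))
```

Cross-validation, by contrast, repeats this kind of split several times so that every row is used for testing once.
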
19. What is the process of rescaling numerical features to have zero mean and unit variance called?
- A) Standardization
- B) Normalization
- C) Min-max scaling
- D) Feature scaling
**Answer: A) Standardization**
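
Standardization and min-max scaling (options A and C) are easy to confuse. The sketch below shows both on the same data: standardization produces values with zero mean and unit variance, while min-max scaling maps values onto the range 0 to 1. It assumes the population standard deviation is used; the helper names and the `heights` sample are hypothetical.

```python
import statistics

def standardize(values):
    """Z-score scaling: subtract the mean, divide by the standard deviation."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)  # population standard deviation (assumption)
    return [(x - mu) / sigma for x in values]

def min_max_scale(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    return [(x - lo) / (hi - lo) for x in values]

# Hypothetical feature column.
heights = [150, 160, 170, 180, 190]
z = standardize(heights)
scaled = min_max_scale(heights)
print(z, scaled)
```
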
20. Which technique addresses overfitting by adding a penalty on model complexity to the loss function?
- A) Regularization
- B) Data augmentation
- C) Dropout
- D) Ensemble learning
**Answer: A) Regularization**
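
Regularization typically works by penalizing large model weights. As a minimal sketch, consider a one-feature linear fit with no intercept and an L2 (ridge) penalty: minimizing `sum((y - w*x)**2) + lam * w**2` gives the closed form `w = sum(x*y) / (sum(x*x) + lam)`. The `ridge_slope` helper and the sample data are hypothetical.

```python
def ridge_slope(xs, ys, lam):
    """Slope of a no-intercept 1-D least-squares fit with L2 penalty lam * w**2."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)

# Hypothetical data lying exactly on the line y = 2x.
xs = [1, 2, 3]
ys = [2, 4, 6]

unpenalized = ridge_slope(xs, ys, lam=0)   # ordinary least squares
penalized = ridge_slope(xs, ys, lam=14)    # the penalty shrinks the slope toward 0
print(unpenalized, penalized)
```

A larger `lam` shrinks the weight further, trading a little bias for lower variance.
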