Lecture 1: Introduction to Machine Learning - MCQ Study Guide
Key Concepts Explained Simply
What is Machine Learning?
Machine learning is teaching computers to learn from data without being
explicitly programmed. Instead of writing specific rules, we show the computer
examples and it learns patterns.
Types of Machine Learning
1. Supervised Learning
• What it is: Learning from labeled data (data with correct answers)
• Goal: Predict labels for new data
• Example: Predicting house prices based on features like size, location
• Common algorithms: Linear regression, logistic regression, decision
trees
2. Unsupervised Learning
• What it is: Learning from unlabeled data
• Goal: Find patterns or structure in data
• Example: Grouping customers with similar buying habits
• Common algorithms: K-means clustering, hierarchical clustering, DBSCAN
3. Semi-supervised Learning
• What it is: Learning from both labeled and unlabeled data
• When to use: When labeling data is expensive or time-consuming
4. Reinforcement Learning
• What it is: Learning through trial and error with rewards/penalties
• Example: Teaching a computer to play games
The Machine Learning Process
1. Define the problem: What are you trying to predict or understand?
2. Collect data: Gather relevant information
3. Prepare data: Clean, transform, and split into training/testing sets
4. Choose a model: Select an algorithm appropriate for your problem
5. Train the model: Feed the training data to the algorithm
6. Evaluate the model: Test on held-out data the model did not see during training
7. Tune the model: Adjust parameters to improve performance
8. Deploy the model: Use it in the real world
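The steps above can be sketched end to end in a few lines. This is only an illustration under stated assumptions: the data is synthetic (a made-up "price depends on size" relationship), the model is a simple least-squares line, and the helper names `fit_line` and `mse` are hypothetical. Tuning and deployment (steps 7 and 8) are left out.

```python
import random

def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def mse(actual, predicted):
    """Mean squared error between two equal-length lists."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

# Steps 1-2: define the problem and collect data (synthetic, assumed for illustration)
rng = random.Random(0)  # fixed seed so the run is repeatable
sizes = [rng.uniform(50, 200) for _ in range(100)]
prices = [3.0 * s + 20.0 + rng.gauss(0, 10) for s in sizes]

# Step 3: prepare data with a shuffled 70-30 train/test split
indices = list(range(len(sizes)))
rng.shuffle(indices)
cut = len(indices) * 7 // 10
train_idx, test_idx = indices[:cut], indices[cut:]
train_x = [sizes[i] for i in train_idx]
train_y = [prices[i] for i in train_idx]
test_x = [sizes[i] for i in test_idx]
test_y = [prices[i] for i in test_idx]

# Steps 4-5: choose a model (a straight line) and train it on the training set only
slope, intercept = fit_line(train_x, train_y)

# Step 6: evaluate on the held-out test set
test_mse = mse(test_y, [slope * x + intercept for x in test_x])
print(f"slope={slope:.2f}, intercept={intercept:.2f}, test MSE={test_mse:.1f}")
```

The key point is that the test set is set aside before training and used only for evaluation.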
MCQ Practice Questions
Question 1
Which type of machine learning would be most appropriate for clustering
customer segments without prior labels?
- A) Supervised Learning
- B) Unsupervised Learning
- C) Reinforcement Learning
- D) Transfer Learning
Answer: B) Unsupervised Learning
Explanation: Unsupervised learning is used when we don’t have labeled data
and want to find patterns or groups in the data. Clustering is a typical
unsupervised learning task.
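As a rough illustration of clustering without labels, here is a tiny one-dimensional k-means sketch. The spending figures are invented, `kmeans_1d` is a hypothetical helper, and the initialisation (centroids spread across the sorted values, which requires k of at least 2) is a simplifying assumption rather than how a library would do it.

```python
def kmeans_1d(values, k=2, iterations=10):
    """Minimal 1-D k-means for k >= 2; returns (centroids, clusters)."""
    ordered = sorted(values)
    # Assumed initialisation: k starting centroids spread across the sorted data
    centroids = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: each value joins its nearest centroid
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda c: abs(v - centroids[c]))
            clusters[nearest].append(v)
        # Update step: each centroid moves to the mean of its cluster
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical monthly spend per customer; note there are no labels anywhere
monthly_spend = [12, 15, 14, 10, 95, 102, 98, 110]
centroids, clusters = kmeans_1d(monthly_spend)
print(centroids)  # [12.75, 101.25]
print(clusters)   # [[12, 15, 14, 10], [95, 102, 98, 110]]
```

The algorithm discovers a low-spending group and a high-spending group on its own, which is the kind of structure unsupervised learning looks for.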
Question 2
In the machine learning process, what comes immediately after data
preprocessing?
- A) Data Collection
- B) Model Selection
- C) Model Deployment
- D) Model Evaluation
Answer: B) Model Selection
Explanation: The typical machine learning workflow is: Define problem →
Collect data → Preprocess data → Select model → Train model → Evaluate
model → Tune model → Deploy model.
Question 3
What is the main difference between classification and regression in
supervised learning?
- A) Classification uses neural networks while regression uses decision trees
- B) Classification predicts categorical outputs while regression predicts
continuous values
- C) Classification requires more data than regression
- D) Classification is supervised while regression is unsupervised
Answer: B) Classification predicts categorical outputs while regression predicts
continuous values
Explanation: Classification assigns data to categories (e.g., spam/not spam),
while regression predicts continuous numerical values (e.g., house prices).
Question 4
Which of the following is NOT a step in the machine learning process?
- A) Data Collection
- B) Feature Engineering
- C) Database Normalization
- D) Model Evaluation
Answer: C) Database Normalization
Explanation: Database normalization is a concept from database design, not
a standard step in the machine learning process.
Question 5
What does high variance in a machine learning model typically indicate?
- A) The model is underfitting the data
- B) The model is overfitting the data
- C) The model has perfect balance
- D) The model has too few parameters
Answer: B) The model is overfitting the data
Explanation: High variance means the model is too sensitive to the training
data and doesn’t generalize well to new data, which is characteristic of
overfitting.
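One way to see high variance in action is a model that simply memorizes its training data. The sketch below uses a 1-nearest-neighbor predictor on synthetic noisy data; the data-generating rule, the seed, and the helper names are all assumptions made for illustration.

```python
import random

def nearest_neighbor_predict(train_x, train_y, x):
    """Predict using the label of the single closest training point (1-NN)."""
    i = min(range(len(train_x)), key=lambda j: abs(train_x[j] - x))
    return train_y[i]

def mse(actual, predicted):
    """Mean squared error between two equal-length lists."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

rng = random.Random(1)  # fixed seed so the run is repeatable

def noisy_sample(n):
    """Synthetic data: y = 2x plus Gaussian noise (an assumed relationship)."""
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [2.0 * x + rng.gauss(0, 3) for x in xs]
    return xs, ys

train_x, train_y = noisy_sample(30)
test_x, test_y = noisy_sample(30)

# On training points the nearest neighbor is the point itself, so the error is zero
train_mse = mse(train_y, [nearest_neighbor_predict(train_x, train_y, x) for x in train_x])
# On unseen points the memorized noise hurts, so the error is larger
test_mse = mse(test_y, [nearest_neighbor_predict(train_x, train_y, x) for x in test_x])

print(f"train MSE={train_mse:.2f}, test MSE={test_mse:.2f}")
```

A perfect training score next to a clearly worse test score is the usual signature of overfitting.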
Question 6
Which of the following is a common evaluation metric for regression
problems?
- A) Accuracy
- B) Precision
- C) Mean Squared Error (MSE)
- D) F1 Score
Answer: C) Mean Squared Error (MSE)
Explanation: MSE measures the average squared difference between predicted
and actual values, making it appropriate for regression problems where we
predict continuous values.
Metric      Used For          Type
Accuracy    Classification    Categorical
Precision   Classification    Categorical
MSE         Regression        Numerical
F1 Score    Classification    Categorical
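To make the MSE definition concrete, here is a small sketch with three invented house prices (in thousands). The function name `mean_squared_error` and the numbers are assumptions for illustration only.

```python
def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

actual = [200, 250, 300]     # hypothetical true prices
predicted = [210, 240, 320]  # hypothetical model outputs

# Errors are -10, 10, -20; squared errors are 100, 100, 400; mean is 600 / 3
result = mean_squared_error(actual, predicted)
print(result)  # 200.0
```

Squaring means the one prediction that is off by 20 contributes four times as much as each prediction that is off by 10.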
Question 7
In machine learning, what does the term “feature” refer to?
- A) The algorithm used to train the model
- B) The output variable we’re trying to predict
- C) An input variable or attribute used for prediction
- D) The accuracy of the model
Answer: C) An input variable or attribute used for prediction
Explanation: Features are the input variables or attributes that the model
uses to make predictions. For example, in house price prediction, features might
include square footage, number of bedrooms, etc.
Calculation Problems
Problem 1: Train-Test Split
If you have a dataset with 1000 instances and use a 70-30 train-test
split, how many instances will be in your training set and test set?
Solution:
- Training set: 1000 × 0.7 = 700 instances
- Test set: 1000 × 0.3 = 300 instances
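The same arithmetic can be checked with a short sketch. The helper `split_sizes` is hypothetical, and giving the remainder to the test set is one reasonable convention among several.

```python
def split_sizes(n_instances, train_fraction):
    """Return (train_count, test_count) for a given training fraction."""
    n_train = round(n_instances * train_fraction)
    # The test set takes whatever is left, so the two counts always sum to n_instances
    return n_train, n_instances - n_train

sizes = split_sizes(1000, 0.7)
print(sizes)  # (700, 300)
```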
Problem 2: Accuracy Calculation
A classification model made the following predictions on a test set:
- True Positives (TP): 150
- True Negatives (TN): 200
- False Positives (FP): 50
- False Negatives (FN): 100
What is the accuracy of this model?
Solution:
Accuracy = (TP + TN) / (TP + TN + FP + FN)
Accuracy = (150 + 200) / (150 + 200 + 50 + 100)
Accuracy = 350 / 500 = 0.7 or 70%
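The calculation can be reproduced with a minimal sketch; the function name `accuracy` is just an illustrative choice.

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that were correct."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("no predictions to evaluate")
    return (tp + tn) / total

# Counts from Problem 2
acc = accuracy(tp=150, tn=200, fp=50, fn=100)
print(acc)  # 0.7
```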
Problem 3: Bias-Variance Tradeoff
If a model has high bias, which of the following is most likely true?
- A) The model is too complex and overfits the training data
- B) The model is too simple and underfits the training data
- C) The model has the right complexity and generalizes well
- D) The model needs more training data
Answer: B) The model is too simple and underfits the training data
Explanation: High bias indicates that the model makes strong assumptions
about the data and is too simple to capture the underlying patterns, leading to
underfitting.
High Bias = Model is too simple
• Effect: Ignores important patterns in the data
• Leads to: Underfitting, with poor performance on both training and test data
• Example: Using a straight line to fit curved data
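Fitting a straight line to curved data can be sketched as follows. The data here is an assumed noise-free parabola (y = x²), and `fit_line` and `mse` are hypothetical helpers; the point is only that the error stays high on both sets.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def mse(actual, predicted):
    """Mean squared error between two equal-length lists."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

# Curved data with no noise at all: a flexible enough model could fit it exactly
train_x = [-3, -2, -1, 0, 1, 2, 3]
train_y = [x ** 2 for x in train_x]
test_x = [-2.5, -1.5, -0.5, 0.5, 1.5, 2.5]
test_y = [x ** 2 for x in test_x]

slope, intercept = fit_line(train_x, train_y)
train_mse = mse(train_y, [slope * x + intercept for x in train_x])
test_mse = mse(test_y, [slope * x + intercept for x in test_x])

# The best straight line is flat (slope 0, intercept 4) and misses the curve entirely
print(slope, intercept)
print(f"train MSE={train_mse:.2f}, test MSE={test_mse:.2f}")
```

Unlike the overfitting case, the training error itself is large here, which is how underfitting is usually recognized.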
Key Formulas to Remember
1. Accuracy: (TP + TN) / (TP + TN + FP + FN)
2. Precision: TP / (TP + FP)
3. Recall: TP / (TP + FN)
4. F1 Score: 2 × (Precision × Recall) / (Precision + Recall)
5. Mean Squared Error: (1/n) × Σ(actual - predicted)²
6. Mean Absolute Error: (1/n) × Σ|actual - predicted|
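As a worked example of formulas 2 to 4, the sketch below reuses the counts from Problem 2 (TP = 150, FP = 50, FN = 100). The function names are illustrative, and returning 0.0 when a denominator is zero is an assumed convention.

```python
def precision(tp, fp):
    """Of everything predicted positive, the fraction that was truly positive."""
    return tp / (tp + fp) if (tp + fp) else 0.0

def recall(tp, fn):
    """Of everything truly positive, the fraction the model found."""
    return tp / (tp + fn) if (tp + fn) else 0.0

def f1_score(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if (p + r) else 0.0

p = precision(tp=150, fp=50)   # 150 / 200 = 0.75
r = recall(tp=150, fn=100)     # 150 / 250 = 0.6
f1 = f1_score(p, r)            # 0.9 / 1.35, roughly 0.667

print(f"precision={p:.3f}, recall={r:.3f}, F1={f1:.3f}")
```

Note that the same model scored 70% accuracy, while its recall is only 60%; the metrics answer different questions.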
Tips for MCQ Questions
1. Understand the terminology: Many MCQs test your understanding of
basic terms.
2. Know the algorithms: Be familiar with which algorithms are used for
which types of problems.
3. Remember the process: Know the steps of the machine learning workflow
in order.
4. Practice calculations: Be comfortable with simple calculations for
evaluation metrics.
5. Identify the type of problem: Determine if a problem is classification,
regression, clustering, etc.