MACHINE LEARNING QUESTION BANK
UNIT 1
1. Define Machine Learning. Explain key tasks in machine learning with examples.
2. Define Machine Learning. Discuss its importance, essential concepts, and key terminology with
appropriate examples.
3. What are the essential steps in developing a machine learning application?
4. Explain Supervised Learning with the help of the k-Nearest Neighbours (k-NN) algorithm.
5. Differentiate between binary classification and multi-label classification.
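Reference sketch for Q4: a minimal k-NN classifier in plain Python, assuming Euclidean distance and a simple majority vote. The function name `knn_classify` and the toy data are made up for illustration and do not come from any particular textbook or library.

```python
import math
from collections import Counter

def knn_classify(query, points, labels, k=3):
    """Return the majority label among the k training points nearest to query."""
    # Euclidean distance from the query to every training point
    distances = [math.dist(query, p) for p in points]
    # Indices of the k smallest distances
    nearest = sorted(range(len(points)), key=lambda i: distances[i])[:k]
    # Majority vote among the neighbours' labels
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Toy labelled training data (illustrative only)
points = [(1.0, 1.1), (1.0, 1.0), (0.0, 0.0), (0.0, 0.1)]
labels = ["A", "A", "B", "B"]

prediction = knn_classify((0.1, 0.2), points, labels, k=3)
print(prediction)
```

The training data consists of input-output pairs, which is what makes this a supervised method: the label of an unseen point is inferred from the labels of its nearest known neighbours.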
UNIT 2
6. Explain the process of normalizing numeric values and why it is important.
7. Explain, with an example, the process of creating scatter plots using Matplotlib.
8. Explain the construction of Decision Trees step-by-step, including tree plotting in Python.
9. What is Information Gain? How is it used in Decision Trees?
10. Explain Naïve Bayesian Decision Theory with an example.
11. How do you classify text using Naïve Bayes? Explain with an example.
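Reference sketch for Q6 and Q9: one possible way to compute min-max normalization and Information Gain in plain Python. The helper names (`min_max_normalize`, `entropy`, `information_gain`) and the toy weather data are assumptions made for illustration; entropy is taken here as Shannon entropy in bits.

```python
import math
from collections import Counter

def min_max_normalize(values):
    """Rescale a list of numbers to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(rows, labels, feature_index):
    """Entropy of the labels minus the weighted entropy after splitting on one feature."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature_index], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

# Toy data (illustrative only): features are (outlook, temperature)
rows = [("sunny", "hot"), ("sunny", "cool"), ("rainy", "hot"), ("rainy", "cool")]
labels = ["no", "no", "yes", "yes"]

scaled = min_max_normalize([10, 20, 40])
gain_outlook = information_gain(rows, labels, 0)
gain_temp = information_gain(rows, labels, 1)
print(scaled, gain_outlook, gain_temp)
```

In this toy set, splitting on outlook separates the classes perfectly while splitting on temperature tells us nothing, so a decision tree built by Information Gain would choose outlook as its first split.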
UNIT 3
12. What is the basic goal of a Support Vector Machine (SVM)?
13. Explain the concepts of Support Vectors, Hyperplane, and Margin in SVM.
14. Describe the working of an SVM classifier with a simple diagram.
15. Compare and contrast different SVM Kernels: Linear, Polynomial, and RBF kernels.
16. What are the advantages and disadvantages of SVM classifiers?
17. Explain the concept of hyperparameter tuning in SVM.
18. Describe the role of Kernel functions in SVM. Explain different types of Kernels with real-world
examples (Linear, Polynomial, RBF).
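Reference sketch for Q15 and Q18: the three kernels written out as plain Python functions, using their commonly stated textbook forms. The parameter names `degree`, `coef0`, and `gamma` and their default values are assumptions chosen for illustration, not settings prescribed by this question bank.

```python
import math

def linear_kernel(x, y):
    """K(x, y) = x . y"""
    return sum(a * b for a, b in zip(x, y))

def polynomial_kernel(x, y, degree=2, coef0=1.0):
    """K(x, y) = (x . y + coef0) ** degree"""
    return (linear_kernel(x, y) + coef0) ** degree

def rbf_kernel(x, y, gamma=0.5):
    """K(x, y) = exp(-gamma * ||x - y||^2)"""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

# Two illustrative points
x, y = (1.0, 2.0), (2.0, 0.0)

k_lin = linear_kernel(x, y)
k_poly = polynomial_kernel(x, y)
k_rbf = rbf_kernel(x, y)
print(k_lin, k_poly, k_rbf)
```

Each function returns a similarity score for a pair of points. The RBF kernel depends only on the distance between the points and equals 1 when they coincide, which is why it can model non-linear, locally shaped decision boundaries.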
UNIT 4
19. What is a Recommender System? Give examples.
20. What is a Recommender System? Discuss its types, importance, and real-life applications in
e-commerce and entertainment.
21. Explain Item-Based Collaborative Filtering. Describe the process with an example of a movie or
product recommendation system.
Differentiate between Content-Based Filtering and User-Based Collaborative Filtering.
22. What are the challenges or issues faced in Recommendation Systems?
23. Explain Item-Based Collaborative Filtering with an example.
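Reference sketch for Q21 and Q23: a minimal item-based collaborative filtering example in plain Python, assuming cosine similarity over co-rated users and a similarity-weighted average for prediction. The ratings, user names, movie titles, and function names are all hypothetical.

```python
import math

# Hypothetical ratings: user -> {item: rating}; a missing item means "not rated"
ratings = {
    "alice": {"Movie A": 5, "Movie B": 5, "Movie C": 1},
    "bob":   {"Movie A": 1, "Movie B": 1, "Movie C": 5},
    "carol": {"Movie A": 5, "Movie C": 1},
}

def item_similarity(item_a, item_b):
    """Cosine similarity between two items over the users who rated both."""
    common = [u for u in ratings if item_a in ratings[u] and item_b in ratings[u]]
    if not common:
        return 0.0
    a = [ratings[u][item_a] for u in common]
    b = [ratings[u][item_b] for u in common]
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def predict_rating(user, target_item):
    """Similarity-weighted average of the user's ratings on the items they have rated."""
    num = den = 0.0
    for item, rating in ratings[user].items():
        sim = item_similarity(target_item, item)
        num += sim * rating
        den += abs(sim)
    return num / den if den else None

# Carol has not rated Movie B; estimate her rating from similar items
predicted = predict_rating("carol", "Movie B")
print(round(predicted, 2))
```

Movie B is rated almost identically to Movie A by the users who saw both, so Carol's high rating for Movie A dominates the estimate. Real systems usually refine this with mean-centred (adjusted cosine) similarity and a limit on the number of neighbouring items.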
MCQ with Answers
1. Machine learning is a subfield of:
a) Biology
b) Computer Science
c) Physics
d) Chemistry
2. Which of the following is NOT a key task of machine learning?
a) Classification
b) Regression
c) Replication
d) Clustering
3. In supervised learning, the training data must have:
a) Only input features
b) No labels
c) Input-output pairs
d) Only outputs
4. k-Nearest Neighbours is an example of:
a) Unsupervised Learning
b) Reinforcement Learning
c) Supervised Learning
d) Deep Learning
5. Which step comes first in developing a machine learning application?
a) Model Training
b) Data Collection
c) Model Deployment
d) Model Evaluation
6. Multi-label classification deals with:
a) One label per instance
b) Multiple labels per instance
c) No labels
d) Only numerical labels
7. The machine learning task of predicting a discrete label is called:
a) Regression
b) Classification
c) Clustering
d) Dimensionality reduction
8. Which of these is an application of machine learning?
a) Handwriting recognition
b) Typing text
c) PowerPoint Presentation
d) Static Website
9. Binary classification predicts:
a) Multiple categories
b) A yes or no outcome
c) Only numerical output
d) A continuous range
10. The process of evaluating model accuracy is called:
a) Training
b) Testing
c) Clustering
d) Labeling
11. In machine learning, "feature" refers to:
a) The output variable
b) A property or characteristic of the data
c) A label
d) The target
12. Which distance measure is most commonly used by the k-NN algorithm?
a) Manhattan distance
b) Euclidean distance
c) Cosine similarity
d) Hamming distance
13. Which Python library is commonly used for creating scatter plots?
a) NumPy
b) Matplotlib
c) Scikit-learn
d) TensorFlow
14. Normalizing data is necessary because:
a) It improves plotting
b) It speeds up training and increases accuracy
c) It decorates the data
d) It is mandatory by Python
15. In a decision tree, splitting of nodes is done using:
a) Randomness
b) Entropy and Information Gain
c) Graphs
d) Neural Networks
16. Entropy measures:
a) Impurity (lack of homogeneity) of data
b) Model accuracy
c) Size of dataset
d) Speed of learning
17. Gini index is used in:
a) SVM
b) Decision Trees
c) Recommender Systems
d) Naïve Bayes
18. Information Gain is the reduction in:
a) Accuracy
b) Entropy
c) Size
d) Memory
19. A decision tree ends at:
a) An input node
b) A leaf node
c) A root node
d) A branch node
20. Naïve Bayes classifier assumes:
a) All features are dependent
b) Features are independent
c) Features are related
d) Features are random
21. Naïve Bayes is based on which principle?
a) Maximum Likelihood
b) Bayes' Theorem
c) Regression Theory
d) SVM Theory
22. The conditional probability P(A|B) is:
a) Probability of A given B has occurred
b) Probability of B given A has occurred
c) Probability of A and B together
d) Probability of neither A nor B
23. Which model works well for text classification problems?
a) SVM
b) Naïve Bayes
c) k-NN
d) Decision Tree
24. The "Tree construction" process in Decision Trees involves:
a) Random guessing
b) Repeated splitting based on attributes
c) Linear approximation
d) Merging clusters
25. In Python, a trained classifier can be stored (serialized) for later use with:
a) Matplotlib
b) Pickle
c) TensorFlow
d) NumPy
26. Which technique is best suited for classifying spam emails?
a) Decision Trees
b) Naïve Bayes
c) k-NN
d) Linear Regression
27. SVM stands for:
a) Super Vision Machine
b) Support Vector Machine
c) Simple Vector Machine
d) Support Vision Model
28. The goal of SVM is to:
a) Reduce accuracy
b) Maximize margin between classes
c) Minimize features
d) Increase entropy
29. A hyperplane is:
a) A random plane
b) A line separating classes
c) A plane maximizing margin
d) A point in space
30. Support Vectors are:
a) All points in dataset
b) Points lying closest to hyperplane
c) Points furthest from hyperplane
d) Outliers
31. Margin in SVM refers to:
a) The data size
b) Distance between support vectors and hyperplane
c) Distance between data points
d) Weight of a feature
32. Which of these is NOT a kernel in SVM?
a) Linear Kernel
b) Polynomial Kernel
c) RBF Kernel
d) Logistic Kernel
33. Radial Basis Function (RBF) is commonly used because:
a) It creates linear boundaries
b) It handles non-linear relationships
c) It reduces the size of the dataset
d) It increases noise
34. The kernel trick helps SVM to:
a) Perform linear classification
b) Implicitly map data to a higher-dimensional space
c) Minimize training time
d) Remove missing values
35. SVM can suffer when:
a) Data is linearly separable
b) Data is noisy
c) Data is normalized
d) Data is small
36. Hyperparameter C in SVM controls:
a) Size of dataset
b) Trade-off between margin and misclassification
c) Learning rate
d) Feature selection
37. Regularization in SVM helps to:
a) Overfit the model
b) Balance bias and variance
c) Reduce model accuracy
d) Increase noise
38. Bias-variance tradeoff affects:
a) Only training data
b) Only test data
c) Model generalization
d) Data normalization
39. In SVM, high bias indicates:
a) Underfitting
b) Overfitting
c) Noise
d) Balanced model
40. Pros of SVM include:
a) Works well with unstructured data
b) Efficient memory usage
c) Very simple model structure
d) Very fast training on huge datasets
41. Recommender systems are used to:
a) Delete user data
b) Suggest items users may like
c) Increase storage
d) Train models faster
42. In Content-Based Filtering, recommendations are made based on:
a) User similarity
b) Item features
c) Random choice
d) User IDs
43. User-Based Collaborative Filtering relies on:
a) Item similarities
b) User preferences
c) Random user selection
d) Content information
44. Cold start problem in recommender systems occurs when:
a) New items or users are introduced
b) Internet is slow
c) Server crashes
d) Algorithm is wrong
45. Item-based collaborative filtering is best when:
a) Items have many ratings
b) Few users exist
c) Items are unknown
d) Random predictions are acceptable
46. Collaborative filtering requires:
a) Detailed item descriptions
b) Ratings or interactions by users
c) No prior data
d) Only item names
47. One major issue in recommendation systems is:
a) Cold start
b) High entropy
c) Hyperplane optimization
d) Gini index selection
48. Matrix factorization is commonly used in:
a) Decision Trees
b) Collaborative Filtering
c) k-NN
d) Entropy Calculation
49. In recommender systems, implicit feedback is:
a) Given through ratings
b) Given through user behavior like clicks
c) Manually typed
d) Collected from surveys
50. Scalability is a challenge in recommender systems when:
a) The number of users/items grows very large
b) Dataset is empty
c) Single-user systems exist
d) Only few recommendations are needed