Machine Learning Viva Guide (Fully Expanded)
Module 1: Supervised Learning
Supervised learning is one of the main types of machine learning, in which the model is
trained on labeled data: each input data point has a corresponding correct output (label).
The model learns a mapping from inputs to outputs so that, given new inputs, it can
predict the corresponding outputs.
Example: Email classification as 'spam' or 'not spam'.
Regression vs Classification:
- Regression: Predicts continuous values (e.g., house prices).
- Classification: Predicts discrete classes (e.g., tumor type).
Distance-Based Methods: Classify or cluster data based on distance. Example: KNN.
K-Nearest Neighbors (KNN): Predicts the output for a new point from its 'K' closest training points (majority vote for classification, average for regression).
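A minimal NumPy sketch of KNN classification (the helper name knn_predict and the toy data are illustrative, not from a library):

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=3):
    """Predict the label of x by majority vote among its k nearest neighbors."""
    dists = np.linalg.norm(X_train - x, axis=1)       # Euclidean distance to each training point
    nearest = np.argsort(dists)[:k]                   # indices of the k closest points
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]                  # majority label

# Toy data: two well-separated clusters
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
              [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]])
y = np.array([0, 0, 0, 1, 1, 1])
print(knn_predict(X, y, np.array([0.3, 0.3])))  # → 0
print(knn_predict(X, y, np.array([4.8, 5.1])))  # → 1
```

Note that KNN has no training phase beyond storing the data; all work happens at prediction time.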
Decision Trees: Splits data based on features to predict output.
Naïve Bayes: Applies Bayes' theorem under the simplifying assumption that features are conditionally independent given the class.
Linear Models: Includes Linear Regression, Logistic Regression, and GLM.
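The linear-regression member of this family can be sketched with ordinary least squares in NumPy (the toy data are illustrative):

```python
import numpy as np

# Fit y ≈ w0 + w1*x by ordinary least squares.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x + 1.0                           # noiseless line: intercept 1, slope 2
A = np.column_stack([np.ones_like(x), x])   # design matrix with a bias column
w, *_ = np.linalg.lstsq(A, y, rcond=None)   # solve min ||A w - y||^2
print(w)  # → [1. 2.]
```

Logistic regression follows the same linear form but passes the output through a sigmoid to model class probabilities.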
Support Vector Machines (SVM): Finds the maximum-margin hyperplane separating the classes.
Kernel Methods: Projects data to higher dimensions for separability.
Multi-class Classification: Solves problems with more than two classes.
Module 2: Unsupervised Learning
Unsupervised learning finds patterns in data without labels.
K-means Clustering: Divides data into K groups minimizing intra-cluster distance.
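A compact sketch of Lloyd's algorithm for K-means (for simplicity the first k points seed the centers; real libraries use smarter initialization such as k-means++, and the toy data are illustrative):

```python
import numpy as np

def kmeans(X, k, iters=20):
    """Alternate two steps: assign points to nearest center, then recompute centers."""
    centers = X[:k].astype(float).copy()          # naive seeding: first k points
    for _ in range(iters):
        # Assignment step: label each point with its nearest center
        labels = np.argmin(np.linalg.norm(X[:, None] - centers, axis=2), axis=1)
        # Update step: move each center to the mean of its assigned points
        centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
    return labels, centers

X = np.array([[0.0, 0.0], [5.0, 5.0], [0.2, 0.1],
              [5.1, 4.9], [0.1, 0.2], [4.9, 5.2]])
labels, centers = kmeans(X, k=2)
print(labels)  # points from the same blob share a label
```

This sketch omits the empty-cluster check a production implementation needs.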
PCA: Reduces data dimensions while retaining variance.
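PCA can be sketched via the SVD of the centered data matrix (the helper name pca and the toy data are illustrative):

```python
import numpy as np

def pca(X, n_components):
    """Project X onto the top principal components (directions of maximum variance)."""
    Xc = X - X.mean(axis=0)                        # center the data
    # Right singular vectors of the centered data are the principal directions
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]                 # top-k directions
    return Xc @ components.T                       # reduced coordinates

X = np.array([[2.0, 2.1], [0.0, 0.1], [4.0, 3.9], [1.0, 1.2], [3.0, 2.8]])
Z = pca(X, 1)      # 2-D points compressed to 1-D along the main diagonal trend
print(Z.shape)     # → (5, 1)
```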
Kernel PCA: Non-linear dimensionality reduction.
Matrix Factorization: Predicts missing data, e.g., movie recommendations.
Generative Models: Generate new data based on learned distributions.
Module 3: Model Evaluation & Selection
Evaluation checks model generalization to unseen data.
Cross-Validation: Splits data into folds so every point is used for both training and testing, giving a more reliable performance estimate.
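K-fold cross-validation can be sketched as follows (the helper name kfold_scores and the trivial mean-predictor used in the usage example are illustrative):

```python
import numpy as np

def kfold_scores(X, y, fit, score, k=5, seed=0):
    """Each of the k folds serves once as the held-out test set; return the mean score."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), k)
    scores = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        model = fit(X[train], y[train])            # train on k-1 folds
        scores.append(score(model, X[test], y[test]))  # evaluate on the held-out fold
    return np.mean(scores)

# Usage: a trivial "model" that always predicts the training mean,
# scored by negative mean squared error.
X = np.arange(10, dtype=float).reshape(-1, 1)
y = np.ones(10)
result = kfold_scores(X, y,
                      fit=lambda X, y: y.mean(),
                      score=lambda m, X, y: -np.mean((y - m) ** 2),
                      k=5)
print(result)  # → 0.0 (a constant target is predicted perfectly)
```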
Ensemble Learning: Combines models for better performance.
Bagging: Trains models on bootstrap samples to reduce variance.
Boosting: Sequentially corrects errors to build stronger models.
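The bagging idea above can be sketched by averaging many high-variance base learners, each trained on a bootstrap resample (here a 1-nearest-neighbor regressor; the helper name bagged_predict and the toy data are illustrative):

```python
import numpy as np

def bagged_predict(X_train, y_train, x, n_models=25, seed=0):
    """Train many 1-NN 'models' on bootstrap resamples and average their predictions."""
    rng = np.random.default_rng(seed)
    preds = []
    for _ in range(n_models):
        boot = rng.integers(0, len(X_train), len(X_train))   # sample with replacement
        Xb, yb = X_train[boot], y_train[boot]
        nearest = np.argmin(np.linalg.norm(Xb - x, axis=1))  # 1-NN base learner
        preds.append(yb[nearest])
    return np.mean(preds)                                    # averaging reduces variance

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0.0, 1.0, 2.0, 3.0])
print(bagged_predict(X, y, np.array([1.1])))  # close to 1, smoother than a single 1-NN
```

Boosting differs in that the base learners are trained sequentially, each reweighting the examples the previous ones got wrong.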
Module 4: Sparse Modeling, Sequence Data & Deep Learning
Sparse Modeling: Uses only essential features to avoid overfitting.
Time-Series Data: Data indexed in time order; requires models and validation schemes that respect the temporal ordering (e.g., no random shuffling of past and future).
Deep Learning: Uses neural networks for complex data patterns.
Feature Representation Learning: Automatically extracts features from raw data.
Module 5: Scalable & Advanced ML Techniques
Scalable ML: Efficiently trains models on large data across machines.
Semi-supervised Learning: Combines small labeled with large unlabeled data.
Active Learning: Model queries for labels on uncertain data points.
Reinforcement Learning: Agent learns via rewards interacting with environment.
Bayesian Learning: Updates beliefs with new evidence using Bayes' theorem.
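Bayesian updating can be illustrated with the classic Beta-Binomial coin-flip example (the counts are hypothetical):

```python
# Beta prior on a coin's heads probability, updated with observed flips.
# The Beta distribution is conjugate to the Binomial likelihood, so the
# posterior is again a Beta with the counts simply added in.
alpha, beta = 1.0, 1.0          # uniform Beta(1, 1) prior: no initial preference
heads, tails = 7, 3             # hypothetical observed data
alpha += heads                  # posterior: Beta(1 + 7, 1 + 3)
beta += tails
posterior_mean = alpha / (alpha + beta)
print(posterior_mean)  # → 0.666... (i.e., 8 / 12)
```

More data shifts the posterior further from the prior, which is exactly the "updating beliefs with evidence" described above.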
Module 6: Recent Trends in ML
Transfer Learning: Fine-tunes pre-trained models on new tasks.
Federated Learning: Decentralized learning across user devices.
Explainable AI (XAI): Makes AI decisions understandable.
AutoML: Automates model building and tuning.
TinyML: Runs ML models on low-power edge devices.
Important Concepts
Bias: Error due to simplistic assumptions (underfitting).
Variance: Error due to model sensitivity to training data (overfitting).
Underfitting: Model too simple; poor on training and test data.
Overfitting: Model too complex; excellent on training but poor on test data.
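The underfitting/overfitting contrast can be demonstrated by fitting the same noisy points with a low- and a high-degree polynomial (the data are synthetic and illustrative):

```python
import numpy as np

# Fit 6 noisy points on a linear trend with a line (degree 1) and a
# degree-5 polynomial. The degree-5 fit interpolates every point, driving
# training error to ~0, but it also fits the noise and oscillates between
# points -- the classic overfitting signature.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 6)
y = 2.0 * x + rng.normal(0.0, 0.1, size=6)   # linear truth + noise

errs = {}
for degree in (1, 5):
    coeffs = np.polyfit(x, y, degree)
    errs[degree] = np.mean((np.polyval(coeffs, x) - y) ** 2)
print(errs[1], errs[5])  # degree-5 training error is (numerically) zero
```

Low training error alone therefore says nothing about generalization; that is what the held-out evaluation in Module 3 measures.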
Hyperplane: Decision boundary in multi-dimensional space separating classes.