Here is a complete and in-depth guide on Support Vector Machines
(SVM) tailored for your PG-DBDA Practical Machine Learning module. These
notes cover everything from basic concepts to exam-ready technical details, with
highlights for MCQs, theory, and lab viva questions.
🧠 SUPPORT VECTOR MACHINE (SVM) – COMPLETE NOTES
✅ 1. What is SVM?
Supervised learning algorithm used for both classification and
regression.
Finds the best separating hyperplane that maximizes the margin
between classes.
Effective in high-dimensional spaces and non-linear classification
(with kernel trick).
✅ 2. Key Concepts
Term | Meaning
Hyperplane | Decision boundary that separates the classes
Margin | Distance between the hyperplane and the nearest points (support vectors)
Support Vectors | Data points closest to the hyperplane; they determine its position/orientation
Maximum Margin | SVM chooses the hyperplane with the largest margin
✅ 3. SVM Objective (Hard Margin)
For linearly separable data:
\text{Minimize } \frac{1}{2}\|w\|^2 \quad \text{subject to } y_i(w \cdot x_i + b) \geq 1
Where:
w: weight vector
b: bias
x_i: data points
y_i \in \{-1, +1\}: class labels
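A minimal sketch of how this objective looks in scikit-learn (toy, made-up data; a very large C is used to approximate a hard margin): the learned w and b are exposed as coef_ and intercept_, and the support vectors can be inspected directly.

import numpy as np
from sklearn.svm import SVC

# Tiny, linearly separable toy data (made up for illustration)
X = np.array([[1, 2], [2, 3], [3, 3], [6, 5], [7, 8], [8, 8]])
y = np.array([-1, -1, -1, 1, 1, 1])

clf = SVC(kernel='linear', C=1e6)               # very large C ≈ hard margin
clf.fit(X, y)
w, b = clf.coef_[0], clf.intercept_[0]          # hyperplane: w · x + b = 0
print("w:", w, " b:", b)
print("support vectors:\n", clf.support_vectors_)
print("margin width:", 2 / np.linalg.norm(w))   # geometric margin = 2 / ||w||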
✅ 4. Soft Margin SVM (C parameter)
Used when data is not perfectly separable.
\text{Minimize } \frac{1}{2}\|w\|^2 + C \sum_i \xi_i \quad \text{subject to } y_i(w \cdot x_i + b) \geq 1 - \xi_i, \ \xi_i \geq 0
\xi_i: slack variable (margin violation)
C: regularization parameter
Large C → low bias, high variance (overfitting)
Small C → high bias, low variance (underfitting)
🧠 MCQ Tip: C is a penalty for misclassification.
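As a rough illustration of this trade-off (a sketch on synthetic, noisy data, not part of the formal notes): a small C tolerates more margin violations and keeps more support vectors, while a large C fits the training data more tightly.

from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Synthetic 2-D data with 10% label noise (illustrative only)
X, y = make_classification(n_samples=200, n_features=2, n_redundant=0,
                           n_clusters_per_class=1, flip_y=0.1, random_state=0)
for C in (0.01, 1, 100):
    clf = SVC(kernel='rbf', C=C, gamma='scale').fit(X, y)
    print(f"C={C}: {len(clf.support_)} support vectors, "
          f"train accuracy={clf.score(X, y):.2f}")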
✅ 5. Kernel Trick
SVM uses kernels to handle non-linear data by mapping to higher dimensions.
Common Kernels:
Kernel | Formula | Use when…
Linear | K(x, x') = x \cdot x' | Data is linearly separable
Polynomial | K(x, x') = (x \cdot x' + c)^d | Feature interactions are important
RBF (Gaussian) | K(x, x') = \exp(-\gamma \|x - x'\|^2) | Most commonly used
Sigmoid | K(x, x') = \tanh(\alpha x \cdot x' + c) | Rare; inspired by neural nets
γ (gamma): Controls shape of decision boundary in RBF
High γ → overfitting (tight boundaries)
Low γ → underfitting (loose boundaries)
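A small sketch of the kernel-trick idea with the RBF kernel: the kernel value is computed directly from the original features, without ever constructing the implicit high-dimensional mapping (uses sklearn's rbf_kernel helper; numbers are arbitrary).

import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

x1 = np.array([[1.0, 2.0]])
x2 = np.array([[2.0, 0.5]])
gamma = 0.5

manual = np.exp(-gamma * np.sum((x1 - x2) ** 2))   # K(x, x') = exp(-γ ||x - x'||^2)
library = rbf_kernel(x1, x2, gamma=gamma)[0, 0]
print(manual, library)                             # same value either way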
✅ 6. SVM for Regression (SVR)
Predicts a continuous value within an ε-margin of tolerance.
|y_i - f(x_i)| \leq \epsilon
Only points outside the ε-tube contribute to the loss function.
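A minimal SVR sketch (synthetic data, illustrative parameter values): because points inside the ε-tube contribute no loss, widening epsilon typically leaves fewer support vectors.

import numpy as np
from sklearn.svm import SVR

rng = np.random.RandomState(0)
X = np.sort(rng.uniform(0, 5, 80)).reshape(-1, 1)      # 80 points on a noisy sine curve
y = np.sin(X).ravel() + rng.normal(0, 0.1, 80)

for eps in (0.05, 0.2, 0.5):
    svr = SVR(kernel='rbf', C=1.0, epsilon=eps).fit(X, y)
    print(f"epsilon={eps}: {len(svr.support_)} support vectors")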
✅ 7. Advantages
✅ Effective in high-dimensional spaces
✅ Works well with clear margin of separation
✅ Memory efficient (uses only support vectors)
✅ Versatile (supports kernels)
✅ 8. Disadvantages
❌ Not suitable for very large datasets (slow training)
❌ Requires feature scaling
❌ Choosing right kernel and parameters is tricky
❌ Poor performance with noisy data and overlapping classes
✅ 9. SVM vs Logistic Regression
Feature | SVM | Logistic Regression
Type | Maximum margin classifier | Probabilistic classifier
Output | Class label only | Class probability
Works on linear data | ✅ Yes | ✅ Yes
Works on non-linear data | ✅ With kernel | ❌ Needs explicit feature transformation
Feature scaling required | ✅ Yes | ✅ Yes
Handles outliers | ❌ Sensitive | ✅ Moderately robust
✅ 10. Feature Scaling Required
SVM is distance-based → scale features using StandardScaler or
MinMaxScaler
Especially needed for RBF kernel
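In practice the scaler and the SVM are usually chained in a Pipeline so that scaling is learned only from the training data; a minimal sketch (X_train/X_test are assumed to exist):

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

svm_clf = make_pipeline(StandardScaler(), SVC(kernel='rbf', C=1.0, gamma='scale'))
# svm_clf.fit(X_train, y_train)      # scaling is fit on the training split only
# preds = svm_clf.predict(X_test)    # the same scaling is applied automatically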
✅ 11. Parameter Tuning
Parameter | Role
C | Regularization strength
γ | Kernel coefficient (RBF)
kernel | Type of kernel (linear, rbf, poly)
Use GridSearchCV or RandomizedSearchCV for tuning.
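A typical GridSearchCV sketch over C, γ and the kernel (the grid values below are illustrative, not prescribed):

from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

param_grid = {
    'C': [0.1, 1, 10, 100],
    'gamma': ['scale', 0.01, 0.1, 1],
    'kernel': ['linear', 'rbf'],
}
grid = GridSearchCV(SVC(), param_grid, cv=5, scoring='accuracy')
# grid.fit(X_train_scaled, y_train)          # scaled training data assumed to exist
# print(grid.best_params_, grid.best_score_)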
✅ 12. Scikit-learn SVM Example
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)                      # sample dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)         # fit scaler on training data only
X_test_scaled = scaler.transform(X_test)               # reuse the same scaling for test data
model = SVC(kernel='rbf', C=1.0, gamma='scale')
model.fit(X_train_scaled, y_train)
predictions = model.predict(X_test_scaled)
✅ 13. SVM Works Best When
Number of features > number of samples (text/NLP, genomics)
Clear margin between classes
Need high accuracy in binary classification
✅ 14. SVM Struggles When
Dataset is large (slow training)
Data is noisy or overlapping
Output probabilities are needed
Many irrelevant features (feature selection needed)
✅ 15. SVM Evaluation Metrics
Classification: Accuracy, Precision, Recall, F1, ROC-AUC
Regression (SVR): MSE, RMSE, MAE, R²
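For reference, a short sketch of computing the listed classification metrics with scikit-learn (y_test and predictions are assumed from the Section 12 example; macro averaging is used because iris has three classes):

from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# y_test and predictions come from the earlier SVC example
print("Accuracy :", accuracy_score(y_test, predictions))
print("Precision:", precision_score(y_test, predictions, average='macro'))
print("Recall   :", recall_score(y_test, predictions, average='macro'))
print("F1       :", f1_score(y_test, predictions, average='macro'))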
✅ 16. Common MCQ Points
Support vectors are data points that influence hyperplane
C controls regularization
Kernel trick avoids computing high-dimensional features explicitly (see the worked sketch after this list)
RBF kernel is most widely used
γ is used in RBF, polynomial and sigmoid kernels, not the linear kernel
Needs scaling (unlike Decision Trees)
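A tiny worked sketch of the kernel-trick point above (toy numbers, degree-2 polynomial kernel): the kernel value equals a dot product in a 6-dimensional feature space, but is computed without ever building that space.

import numpy as np

def phi(v):
    # Explicit degree-2 feature map whose dot product equals (x · z + 1)^2
    x1, x2 = v
    return np.array([1.0,
                     np.sqrt(2) * x1, np.sqrt(2) * x2,
                     x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])

x = np.array([1.0, 2.0])
z = np.array([0.5, -1.0])
explicit = phi(x) @ phi(z)       # dot product in the 6-D mapped space
kernel = (x @ z + 1) ** 2        # same value, computed directly in 2-D
print(explicit, kernel)          # both print 0.25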
✅ 17. SVM vs Decision Tree
Feature | SVM | Decision Tree
Non-linear data | ✅ With kernel | ✅ Yes
Feature scaling | ✅ Needed | ❌ Not needed
Interpretability | ❌ Low | ✅ High
Overfitting risk | Medium | High (if unpruned)
Handles outliers | ❌ Poor | ✅ Good
✅ 18. Applications of SVM
Handwritten digit recognition (MNIST)
Text classification (Spam vs Ham)
Bioinformatics (gene classification)
Image classification
Fraud detection
📝 Summary: When to Use SVM
Scenario | Use SVM?
Binary classification | ✅ Yes
Non-linear separation | ✅ Yes (with kernel)
Text classification (sparse) | ✅ Yes
Large dataset | ❌ No (training is slow)
Need probabilities | ❌ Use Logistic Regression
Interpretability needed | ❌ Poor