A8751 – Optimization Techniques in
Machine Learning
Course Overview:
Students will be able to understand and analyze how to deal with changing data. They will also be able to identify and interpret potential unintended effects in their projects, and to understand and define procedures to operationalize and maintain their applied machine learning models.
Edited by Mr. S. Srinivas Reddy, Asst. Professor
Module 2:
Linear Regression as an Optimization Problem
Based on Mathematics for Machine Learning by
Deisenroth et al.
Chapters Referenced:
Chapter 3 (Linear Models) & Chapter 7 (Probability &
Bayesian Models)
Syllabus
Module 1: Model Fitting and Error Measurement
Optimization Using Gradient Descent, Constrained Optimization and Lagrange Multipliers, Convex Optimization,
Data, Models, and Learning, Empirical Risk Minimization, Parameter Estimation, Probabilistic Modelling and
Inference Directed Graphical Models.
Module 2: Linear Regression as an Optimization Problem
Problem Formulation, Parameter Estimation, Bayesian Linear Regression, Maximum Likelihood as Orthogonal
Projection
Module 3: Dimensionality Reduction and Optimization
Problem Setting, Maximum Variance Perspective, Projection Perspective, Eigenvector Computation and Low-Rank
Approximations, PCA in High Dimensions, Key Steps of PCA in Practice, Latent Variable Perspective
Course Outcomes
A8751.1. Understand the fundamentals of model fitting, empirical risk minimization, and
optimization techniques including gradient descent and Lagrange multipliers.
A8751.2. Formulate linear regression as an optimization problem and apply parameter
estimation techniques including Bayesian and Maximum Likelihood methods.
A8751.3. Apply dimensionality reduction techniques such as PCA using optimization-based
approaches and understand the mathematical foundations of eigenvector computation.
A8751.4. Analyze unsupervised learning problems using Gaussian Mixture Models and the
Expectation Maximization algorithm for parameter estimation.
A8751.5. Evaluate and implement large-margin classifiers including Support Vector
Machines using primal and dual optimization frameworks and kernel methods.
TOPICS TO BE DISCUSSED ARE AS FOLLOWS:
Problem Formulation
Parameter Estimation
Bayesian Linear Regression
Maximum Likelihood as Orthogonal Projection
In the upcoming lecture, CSD Sec A/B/C students will learn to:
Understand how prediction problems can be
modeled using linear functions
Formulate the linear regression model
Define an optimization objective for learning
model parameters
Linear Model Representation
We assume that our model makes predictions using a linear equation.
Main Equation (Vector Form): ŷ = θᵀx, where x is the input feature vector and θ is the parameter vector. For a single input feature this reduces to ŷ = θ₀ + θ₁x.
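As a minimal sketch (assuming a NumPy setup and illustrative parameter values), the vector-form prediction is just a dot product between the parameters and an input that carries a leading 1 for the intercept:

```python
import numpy as np

# Illustrative values only: theta = [theta_0, theta_1], x = [1, x_1]
theta = np.array([1.0, 1.0])   # intercept and slope (assumed for the example)
x = np.array([1.0, 2.0])       # leading 1 multiplies the intercept theta_0

y_hat = theta @ x              # y_hat = theta^T x = theta_0 + theta_1 * x_1
print(y_hat)                   # 3.0
```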
Parameter Estimation
📚 Mathematics for Machine Learning by
Deisenroth et al.
🔖 Based on Chapter 3, Section 3.2 – Least
Squares Estimation
Simple Numerical Example
Let's take 2 data points for ease:

| x | y |
| --- | --- |
| 1 | 2 |
| 2 | 3 |

Assume the model to fit is a line: ŷ = θ₀ + θ₁x
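A quick check of this example with NumPy's least-squares solver (a sketch, not part of the original slide): with only two points the fitted line passes through both, giving θ₀ = 1 and θ₁ = 1.

```python
import numpy as np

# Design matrix with a column of ones for the intercept theta_0
X = np.array([[1.0, 1.0],
              [1.0, 2.0]])
y = np.array([2.0, 3.0])

# Least-squares fit of y ≈ X theta
theta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(theta)   # approximately [1. 1.], i.e. y_hat = 1 + 1*x
```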
What is Maximum Likelihood Estimation (MLE)?
Maximum Likelihood Estimation (MLE) is a statistical
method for estimating parameters of a model by
maximizing the probability (likelihood) of observing
the given data under that model.
In the context of linear regression, the goal is to
estimate the parameter vector θ that makes the
observed data most probable.
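Under the usual assumption of Gaussian noise with variance σ² (the same likelihood used later in the Bayesian section), maximizing the likelihood is equivalent to minimizing the sum of squared errors; a sketch of that standard argument:

```latex
% Likelihood of N independent observations under Gaussian noise
p(\mathbf{y}\mid \mathbf{X},\boldsymbol{\theta})
  = \prod_{n=1}^{N}\mathcal{N}\!\bigl(y_n \mid \mathbf{x}_n^\top\boldsymbol{\theta},\,\sigma^2\bigr)

% Negative log-likelihood: terms independent of theta are collected in "const."
-\log p(\mathbf{y}\mid \mathbf{X},\boldsymbol{\theta})
  = \frac{1}{2\sigma^2}\sum_{n=1}^{N}\bigl(y_n-\mathbf{x}_n^\top\boldsymbol{\theta}\bigr)^2 + \text{const.}

% Setting the gradient with respect to theta to zero gives the ML (least-squares) estimate
\boldsymbol{\theta}_{\mathrm{ML}} = \bigl(\mathbf{X}^\top\mathbf{X}\bigr)^{-1}\mathbf{X}^\top\mathbf{y}
```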
Overfitting in Linear Regression
When you use too many features (especially ones that are
not informative), linear regression starts to memorize
noise in the data, not just patterns.
Why Overfitting Happens:
When features ≥ samples,
the model has too much flexibility.
It can achieve zero training error, but will perform poorly
on new data (poor generalization).
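A minimal illustration of this on synthetic, purely random data (all values assumed for the sketch): with as many features as samples, ordinary least squares reaches near-zero training error yet fails on new data.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features = 10, 10           # features == samples: maximal flexibility

X_train = rng.normal(size=(n_samples, n_features))
y_train = rng.normal(size=n_samples)      # pure noise: there is no pattern to learn

theta, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)

X_test = rng.normal(size=(100, n_features))
y_test = rng.normal(size=100)

train_err = np.mean((X_train @ theta - y_train) ** 2)
test_err = np.mean((X_test @ theta - y_test) ** 2)
print(train_err, test_err)   # training error ~0, test error much larger
```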
Remedies for Overfitting:
1. Feature Selection:
Remove non-informative or noisy features.
Objective: Eliminate irrelevant or redundant input variables that do not contribute meaningfully to the output.
2. Regularization:
Penalize large coefficients to prevent the model from becoming overly sensitive to specific features.
Add penalty terms such as L2 (Ridge) or L1 (Lasso).
3. Bayesian Linear Regression:
Introduce prior distributions over the parameters to control model complexity.
4. Model Selection:
Use cross-validation to select the right model complexity.
2. Regularization in Linear Regression
(to be discussed further in OTML)
Regularization is used to prevent overfitting in linear
regression by penalizing large model coefficients.
This helps keep the
model simpler and more generalizable.
Why Regularization?
When the number of features is large or the features
are highly correlated*, the parameter estimates θ can
become unstable and the model may overfit the
training data.
Regularization adds a penalty term to the objective
function to control model complexity.
(*Highly correlated, in the context of linear regression, refers to the situation where two or more input variables (features) carry similar or redundant information.)
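As a sketch of L2 (Ridge) regularization, the penalized objective ||y − Xθ||² + λ||θ||² has the closed-form solution θ = (XᵀX + λI)⁻¹Xᵀy; the penalty weight lam below is an assumed illustrative value, and in practice the intercept is often left unpenalized.

```python
import numpy as np

def ridge_fit(X, y, lam=1.0):
    """Closed-form ridge regression: minimizes ||y - X theta||^2 + lam * ||theta||^2."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

# Tiny example reusing the earlier two-point data
X = np.array([[1.0, 1.0],
              [1.0, 2.0]])
y = np.array([2.0, 3.0])
print(ridge_fit(X, y, lam=0.1))   # parameter norm is pulled below the unregularized fit [1, 1]
```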
Key Assumptions of Linear Regression

| Assumption (property) | Simple meaning | Why it matters |
| --- | --- | --- |
| Linearity | Straight-line relationship | The relationship between the predictors X and the output y must be linear. In Bayesian regression, the model is still linear in the parameters, but uncertainty is added. |
| Independence | Data points don't affect each other | Each observation (data point) must be independent of the others. This avoids hidden biases or autocorrelation. |
| Equal Variance (Homoscedasticity) | Error is the same across all inputs | The variance of the errors should remain constant across all input values. This ensures fair treatment across all levels of input. |
| Normal Errors | Prediction mistakes follow a bell curve | Residuals (errors) are normally distributed. Important for inference: p-values, confidence intervals, etc. |
| No Multicollinearity | Inputs are not too similar | Predictors should not be too correlated with each other. In Bayesian regression, multicollinearity still increases posterior uncertainty. |
| Correct Model | No missing or extra variables | All relevant variables are included and irrelevant ones excluded. Missing key variables leads to bias; extra ones add noise. |
| No Endogeneity | Inputs and errors are separate | Input variables should not be correlated with the error term. Helps in producing trustworthy estimates of θ. |
If Variance Is Equal (Homoscedasticity)

| House | x: Size (sq ft) | y: Price (in lakhs) | ŷ: Predicted Price | Error e = y − ŷ |
| --- | --- | --- | --- | --- |
| A | 1000 | 50 | 52 | −2 |
| B | 1500 | 70 | 72 | −2 |
| C | 2000 | 90 | 92 | −2 |
If Variance Is Not Equal (Heteroscedasticity)

| House | x | y | ŷ | e |
| --- | --- | --- | --- | --- |
| A | 1000 | 50 | 52 | −2 |
| B | 1500 | 70 | 74 | −4 |
| C | 2000 | 90 | 95 | −5 |
If variance is not constant:
• Standard errors of coefficients become incorrect
• Confidence intervals and hypothesis tests become unreliable
What Is Multicollinearity?
• It means two or more predictors (input variables) are highly correlated.
• That is, they carry similar information.
• This makes it difficult to tell which variable is responsible for the effect on the target y.
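A minimal sketch of this effect on synthetic data (all values assumed): when a second feature is almost a copy of the first, the individual coefficients swing wildly under tiny perturbations of the targets, even though their sum, and hence the predictions, stays close to the true combined effect of 3.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
x1 = rng.normal(size=n)
x2 = x1 + 1e-3 * rng.normal(size=n)            # x2 carries almost the same information as x1
y = 3 * x1 + rng.normal(scale=0.1, size=n)     # the true combined effect is 3

X = np.column_stack([x1, x2])

for noise_scale in (0.0, 0.01):                # refit after a tiny perturbation of y
    y_perturbed = y + noise_scale * rng.normal(size=n)
    theta, *_ = np.linalg.lstsq(X, y_perturbed, rcond=None)
    print(theta, "sum:", theta.sum())          # individual weights vary a lot; their sum stays near 3
```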
Packing for a Trip
Scenario:
You're going on a 5-day trip and need to pack smartly because your
suitcase has a weight limit (just like a regularization constraint).
Lasso (L1): Pick a Few Essential Items
You say: "I'll take only the most important things — 2 pairs of jeans,
2 shirts, and skip formal shoes, books, gym gear..."
You pack fewer items, but each one is useful.
Your suitcase contains zero of many items, just as many coefficients become exactly zero.
Result: Simpler, lighter suitcase. Fewer items, more space — like
feature selection.
Ridge (L2): Take Everything, But Lighter
You say: "I want a little of everything, but will reduce
the size —
travel-sized shampoo, thin t-shirts, foldable shoes..."
You don't skip anything, but minimize everything.
Result: Everything fits, but in a compressed
form — like shrinking coefficients.
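A minimal sketch of this contrast using scikit-learn (assumed to be available) on synthetic data where only two of five features matter: Lasso tends to set the useless coefficients exactly to zero, while Ridge keeps all of them but shrinks their size.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = 2 * X[:, 0] + 3 * X[:, 1] + rng.normal(scale=0.1, size=100)   # only 2 features matter

lasso = Lasso(alpha=0.1).fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)

print("Lasso coefficients:", lasso.coef_)   # irrelevant coefficients driven to exactly 0
print("Ridge coefficients:", ridge.coef_)   # all coefficients kept, but shrunk
```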
Bayesian Linear Regression – Theory and Interpretation
In Bayesian Linear Regression, we make two main assumptions:
(a) Likelihood (how the data is generated)
We assume that each output yₙ is generated as yₙ = θᵀxₙ + εₙ with Gaussian noise εₙ ~ N(0, σ²), i.e. p(yₙ | xₙ, θ) = N(yₙ | θᵀxₙ, σ²).
(b) Prior on the parameters
We don't fix θ; instead we assume a prior distribution over it: p(θ) = N(θ | m₀, S₀).
Goal: Compute the posterior mean μ and posterior covariance Σ of θ given the observed data.
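A minimal sketch of the posterior computation, assuming a zero-mean isotropic prior (m₀ = 0, S₀ = α²I) and a known noise variance σ²; under these assumptions the closed-form result is Σ = (S₀⁻¹ + σ⁻²XᵀX)⁻¹ and μ = σ⁻²ΣXᵀy, and the numerical values below are illustrative only.

```python
import numpy as np

def bayes_linreg_posterior(X, y, sigma2=0.25, alpha2=1.0):
    """Posterior N(mu, Sigma) over theta for a N(0, alpha2*I) prior and Gaussian noise."""
    d = X.shape[1]
    S0_inv = np.eye(d) / alpha2                        # prior precision S0^{-1}
    Sigma = np.linalg.inv(S0_inv + X.T @ X / sigma2)   # posterior covariance
    mu = Sigma @ (X.T @ y / sigma2)                    # posterior mean (prior mean is zero)
    return mu, Sigma

# Reusing the two-point example: design matrix with a bias column
X = np.array([[1.0, 1.0],
              [1.0, 2.0]])
y = np.array([2.0, 3.0])
mu, Sigma = bayes_linreg_posterior(X, y)
print(mu)      # close to the ML solution [1, 1], but regularized by the prior
print(Sigma)   # remaining uncertainty about theta
```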
Maximum Likelihood as Orthogonal Projection
1. Background:
In Linear Regression, we aim to fit a line (or hyperplane) that best
represents the data.
Let:
X = matrix of input features (with each row as a data point)
y = vector of observed outputs (target values)
θ = parameter vector (weights of the model)
Maximum Likelihood (ML) Estimation as
Orthogonal Projection in the subject Optimization
Techniques in Machine Learning (OTML) arises
from a need to
connect statistical learning with geometric
intuition and optimization theory.
Why Do We Study Maximum Likelihood as Orthogonal Projection in
OTML?
Limitation of Purely Statistical Interpretation
•Maximum Likelihood Estimation (MLE) is traditionally taught as a
statistical method to estimate parameters that maximize the likelihood
of observing data.
However, this viewpoint:
• Lacks geometric intuition.
• Makes it hard for students to visualize the optimization process.
To overcome this, ML estimation in linear regression is interpreted as a
geometric projection—specifically, orthogonal projection of observed
outputs onto the column space of the input matrix.
Limitations that Force This Study

| Limitation in Standard ML Estimation | How Orthogonal Projection Helps |
| --- | --- |
| Hard to visualize likelihood maximization | Provides a geometric interpretation |
| Abstract algebra in cost minimization | Links to a physical projection of the data |
| Failure of least squares in high dimensions | Shows where the projection breaks down and needs regularization |
| Disconnection between vector calculus and learning | Bridges linear algebra, calculus, and ML optimization |
| Confusion about residuals and optimality | Projection shows residuals are orthogonal, satisfying optimality |
• Step 1: Matrix representation of the model, y ≈ Xθ
• Step 2: Normal equation setup, XᵀXθ = Xᵀy
• Step 3: Compute the parameter, θ = (XᵀX)⁻¹Xᵀy
• Step 4: Predict the output vector, ŷ = Xθ
• Step 5: Calculate the residual vector, e = y − ŷ
• Step 6: Orthogonality check, Xᵀe = 0 (the residual is orthogonal to the column space of X); a worked sketch of these steps follows below
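A minimal sketch of these six steps in NumPy, extending the earlier two-point example with an assumed third point (3, 5) so that the residual is nonzero; the final line confirms that the residual is (numerically) orthogonal to the columns of X.

```python
import numpy as np

# Step 1: matrix representation (bias column plus feature column)
X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
y = np.array([2.0, 3.0, 5.0])

# Steps 2-3: set up and solve the normal equations X^T X theta = X^T y
theta = np.linalg.solve(X.T @ X, X.T @ y)

# Step 4: predicted output vector = orthogonal projection of y onto the column space of X
y_hat = X @ theta

# Step 5: residual vector
residual = y - y_hat

# Step 6: orthogonality check: X^T e should be (numerically) zero
print(theta)            # approximately [0.33, 1.5]
print(residual)         # nonzero residuals
print(X.T @ residual)   # approximately [0. 0.]
```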