Linear Regression Models Guide

1. This document discusses linear models for regression, including linear basis function models using polynomials and Gaussians as basis functions.
2. It also covers maximum likelihood and least squares estimation for linear regression, as well as regularized least squares and its use of Lasso regularization.
3. Bayesian linear regression is introduced, including derivation of the posterior distribution and examples of how the posterior changes as more data is observed.

Uploaded by

longfei zhang


PATTERN RECOGNITION AND MACHINE LEARNING

CHAPTER 3: LINEAR MODELS FOR REGRESSION
Linear Basis Function Models (1)

Example: Polynomial Curve Fitting

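Determine the mapping (the model): $y(x, \mathbf{w}) = \sum_{j=0}^{M} w_j x^j$
Sum-of-Squares Error Function

Determine the objective function: $E(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^{N} \{ y(x_n, \mathbf{w}) - t_n \}^2$

Determine the $\mathbf{w}$ that minimizes the objective function.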
0th Order Polynomial
1st Order Polynomial
3rd Order Polynomial
9th Order Polynomial

To avoid over-fitting: (1) enlarge the training set; (2) add a penalty term.
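The effect of the polynomial order can be reproduced with a small experiment. The sketch below is illustrative only: the data (10 noisy samples of sin(2πx)), the noise level, and the helper names `fit_polynomial` and `rms_error` are assumptions, not taken from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy data: noisy samples of sin(2*pi*x) on [0, 1].
N = 10
x_train = np.linspace(0.0, 1.0, N)
t_train = np.sin(2 * np.pi * x_train) + rng.normal(scale=0.3, size=N)
x_test = np.linspace(0.0, 1.0, 100)
t_test = np.sin(2 * np.pi * x_test) + rng.normal(scale=0.3, size=100)

def fit_polynomial(x, t, order):
    """Least-squares weights for y(x, w) = sum_j w_j x^j."""
    Phi = np.vander(x, order + 1, increasing=True)  # columns x^0 .. x^order
    w, *_ = np.linalg.lstsq(Phi, t, rcond=None)
    return w

def rms_error(x, t, w):
    """Root-mean-square error of the polynomial with weights w."""
    y = np.vander(x, len(w), increasing=True) @ w
    return np.sqrt(np.mean((y - t) ** 2))

results = {}
for order in (0, 1, 3, 9):
    w = fit_polynomial(x_train, t_train, order)
    results[order] = (rms_error(x_train, t_train, w),
                      rms_error(x_test, t_test, w))
    print(f"order {order}: train RMS {results[order][0]:.3f}, "
          f"test RMS {results[order][1]:.3f}")
```

With 10 training points, the 9th-order polynomial has enough freedom to pass through every point, so its training error drops to essentially zero; comparing that with its test error is the usual way to see over-fitting.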

Linear Basis Function Models (2)
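Generally,

$y(\mathbf{x}, \mathbf{w}) = \sum_{j=0}^{M-1} w_j \phi_j(\mathbf{x}) = \mathbf{w}^T \boldsymbol{\phi}(\mathbf{x})$

where $\phi_j(\mathbf{x})$ are known as basis functions.

Typically, $\phi_0(\mathbf{x}) = 1$, so that $w_0$ acts as a bias.
In the simplest case, we use linear basis functions: $\phi_d(\mathbf{x}) = x_d$.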
Linear Basis Function Models (3)
Polynomial basis functions:

$\phi_j(x) = x^j$

These are global; a small change in x affects all basis functions.
Linear Basis Function Models (4)
Gaussian basis functions:

$\phi_j(x) = \exp\left\{ -\frac{(x - \mu_j)^2}{2 s^2} \right\}$

These are local; a small change in x only affects nearby basis functions. $\mu_j$ and $s$ control location and scale (width).
Linear Basis Function Models (5)
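Sigmoidal basis functions:

$\phi_j(x) = \sigma\left( \frac{x - \mu_j}{s} \right)$

where

$\sigma(a) = \frac{1}{1 + \exp(-a)}$

These are also local; a small change in x only affects nearby basis functions. $\mu_j$ and $s$ control location and scale (slope).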
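The three families of basis functions can be written as functions that map a vector of inputs to a design matrix with one column per basis function. This is a minimal sketch; the function names, the centres, and the scale values are illustrative assumptions.

```python
import numpy as np

def polynomial_basis(x, M):
    """Columns x^0, x^1, ..., x^(M-1)."""
    return np.vander(x, M, increasing=True)

def gaussian_basis(x, centres, s):
    """Columns exp(-(x - mu_j)^2 / (2 s^2)), one per centre mu_j."""
    return np.exp(-(x[:, None] - centres[None, :]) ** 2 / (2 * s ** 2))

def sigmoid_basis(x, centres, s):
    """Columns sigma((x - mu_j) / s) with the logistic sigmoid."""
    return 1.0 / (1.0 + np.exp(-(x[:, None] - centres[None, :]) / s))

# Assumed example inputs and centres.
x = np.linspace(-1.0, 1.0, 5)
centres = np.linspace(-1.0, 1.0, 3)

Phi_poly = polynomial_basis(x, 4)
Phi_gauss = gaussian_basis(x, centres, s=0.2)
Phi_sig = sigmoid_basis(x, centres, s=0.1)

print(Phi_poly.shape, Phi_gauss.shape, Phi_sig.shape)
```

Each row of a design matrix corresponds to one input and each column to one basis function, which is the layout the least-squares formulas later in the chapter expect.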
Maximum Likelihood and Least Squares (1)

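Assume observations from a deterministic function with added Gaussian noise:

$t = y(\mathbf{x}, \mathbf{w}) + \epsilon$, where $p(\epsilon \mid \beta) = \mathcal{N}(\epsilon \mid 0, \beta^{-1})$

(here $\beta^{-1}$ is the noise variance), which is the same as saying,

$p(t \mid \mathbf{x}, \mathbf{w}, \beta) = \mathcal{N}(t \mid y(\mathbf{x}, \mathbf{w}), \beta^{-1})$

Given observed inputs, $\mathbf{X} = \{\mathbf{x}_1, \dots, \mathbf{x}_N\}$, and targets, $\mathbf{t} = [t_1, \dots, t_N]^T$, we obtain the likelihood function

$p(\mathbf{t} \mid \mathbf{X}, \mathbf{w}, \beta) = \prod_{n=1}^{N} \mathcal{N}(t_n \mid \mathbf{w}^T \boldsymbol{\phi}(\mathbf{x}_n), \beta^{-1})$

The goal is to maximize this likelihood function.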
Maximum Likelihood and Least Squares (2)
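Taking the logarithm, we get

$\ln p(\mathbf{t} \mid \mathbf{w}, \beta) = \sum_{n=1}^{N} \ln \mathcal{N}(t_n \mid \mathbf{w}^T \boldsymbol{\phi}(\mathbf{x}_n), \beta^{-1}) = \frac{N}{2} \ln \beta - \frac{N}{2} \ln(2\pi) - \beta E_D(\mathbf{w})$

where

$E_D(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^{N} \{ t_n - \mathbf{w}^T \boldsymbol{\phi}(\mathbf{x}_n) \}^2$

is the sum-of-squares error.

The log likelihood takes this form only under Gaussian noise; in that case, minimizing the squared error and maximum likelihood estimation are equivalent.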

Maximum Likelihood and Least Squares (3)

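Computing the gradient and setting it to zero yields

$\nabla_{\mathbf{w}} \ln p(\mathbf{t} \mid \mathbf{w}, \beta) = \beta \sum_{n=1}^{N} \{ t_n - \mathbf{w}^T \boldsymbol{\phi}(\mathbf{x}_n) \} \boldsymbol{\phi}(\mathbf{x}_n)^T = 0$

Solving for w, we get

$\mathbf{w}_{ML} = (\boldsymbol{\Phi}^T \boldsymbol{\Phi})^{-1} \boldsymbol{\Phi}^T \mathbf{t}$

where $\boldsymbol{\Phi}^{\dagger} = (\boldsymbol{\Phi}^T \boldsymbol{\Phi})^{-1} \boldsymbol{\Phi}^T$ is the Moore-Penrose pseudo-inverse, and where $\boldsymbol{\Phi}$ is the $N \times M$ design matrix with elements $\Phi_{nj} = \phi_j(\mathbf{x}_n)$.

Each row of $\boldsymbol{\Phi}$ corresponds to one of the multiple (x, t) pairs.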
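The closed-form solution can be checked numerically. The following sketch assumes a straight-line model with made-up true parameters and noise level; it compares the normal-equation solution with NumPy's pseudo-inverse and verifies that the gradient vanishes at the solution.

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed toy data: t = -0.3 + 0.5 x + Gaussian noise.
N = 50
x = rng.uniform(0.0, 1.0, N)
t = -0.3 + 0.5 * x + rng.normal(scale=0.1, size=N)

# Design matrix with basis functions phi_0(x) = 1 and phi_1(x) = x.
Phi = np.column_stack([np.ones(N), x])

# Normal equations: (Phi^T Phi) w = Phi^T t.
w_normal = np.linalg.solve(Phi.T @ Phi, Phi.T @ t)

# Same solution through the Moore-Penrose pseudo-inverse.
w_pinv = np.linalg.pinv(Phi) @ t

# At the solution the residual is orthogonal to every column of Phi.
gradient = Phi.T @ (t - Phi @ w_normal)

print("w (normal equations):", w_normal)
print("w (pseudo-inverse):  ", w_pinv)
print("Phi^T residual:      ", gradient)
```

In practice `np.linalg.lstsq` or `np.linalg.pinv` is usually preferred over forming the inverse of Φ^T Φ explicitly, since both behave better when Φ^T Φ is close to singular.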

Geometry of Least Squares
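Consider

$\mathbf{y} = \boldsymbol{\Phi} \mathbf{w}_{ML} = [\boldsymbol{\varphi}_1, \dots, \boldsymbol{\varphi}_M] \mathbf{w}_{ML}$, with $\mathbf{y} \in \mathcal{S} \subseteq \mathcal{T}$ and $\mathbf{t} \in \mathcal{T}$.

The columns of $\boldsymbol{\Phi}$ act as a basis of the subspace, which is why they are called basis functions.

$\mathcal{T}$ is N-dimensional; $\mathcal{S}$ is M-dimensional.

$\mathcal{S}$ is spanned by $\boldsymbol{\varphi}_1, \dots, \boldsymbol{\varphi}_M$.
$\mathbf{w}_{ML}$ minimizes the distance between $\mathbf{t}$ and its orthogonal projection on $\mathcal{S}$, i.e. $\mathbf{y}$.

The vector $\mathbf{y}$ is a linear combination of the M column vectors.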
Sequential Learning
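Data items are considered one at a time (a.k.a. online learning); use stochastic (sequential) gradient descent:

$\mathbf{w}^{(\tau+1)} = \mathbf{w}^{(\tau)} + \eta \, (t_n - \mathbf{w}^{(\tau)T} \boldsymbol{\phi}(\mathbf{x}_n)) \, \boldsymbol{\phi}(\mathbf{x}_n)$

This is known as the least-mean-squares (LMS) algorithm. Issue: how to choose $\eta$?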
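A minimal sketch of the LMS update follows. The learning rate, the number of passes over the data, and the toy data set are assumptions chosen so that the iteration converges; they are not prescribed by the slides.

```python
import numpy as np

rng = np.random.default_rng(2)

# Assumed toy data: t = -0.3 + 0.5 x + Gaussian noise.
N = 200
x = rng.uniform(-1.0, 1.0, N)
t = -0.3 + 0.5 * x + rng.normal(scale=0.1, size=N)
Phi = np.column_stack([np.ones(N), x])

def lms(Phi, t, eta, epochs, rng):
    """Sequential gradient descent on the per-sample squared error."""
    w = np.zeros(Phi.shape[1])
    for _ in range(epochs):
        for n in rng.permutation(len(t)):   # visit samples in random order
            error = t[n] - w @ Phi[n]       # t_n - w^T phi(x_n)
            w = w + eta * error * Phi[n]    # LMS update
    return w

w_lms = lms(Phi, t, eta=0.05, epochs=20, rng=rng)
w_batch = np.linalg.lstsq(Phi, t, rcond=None)[0]

print("LMS weights:  ", w_lms)
print("batch weights:", w_batch)
```

With a fixed learning rate the weights keep fluctuating around the batch solution; a value that is too large makes the iteration diverge, and one that is too small makes it slow, which is the issue the slide raises.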
1st Order Polynomial
3rd Order Polynomial
9th Order Polynomial
Over-fitting

Root-Mean-Square (RMS) Error: $E_{RMS} = \sqrt{2 E(\mathbf{w}^{\star}) / N}$

Polynomial Coefficients
Regularized Least Squares (1)
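Consider the error function with an added penalty term:

$E_D(\mathbf{w}) + \lambda E_W(\mathbf{w})$

Data term + Regularization term

With the sum-of-squares error function and a quadratic regularizer, we get

$\frac{1}{2} \sum_{n=1}^{N} \{ t_n - \mathbf{w}^T \boldsymbol{\phi}(\mathbf{x}_n) \}^2 + \frac{\lambda}{2} \mathbf{w}^T \mathbf{w}$

which is minimized by

$\mathbf{w} = (\lambda \mathbf{I} + \boldsymbol{\Phi}^T \boldsymbol{\Phi})^{-1} \boldsymbol{\Phi}^T \mathbf{t}$

$\lambda$ is called the regularization coefficient.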
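The regularized solution is a one-line change to ordinary least squares. The sketch below fits a 9th-order polynomial to assumed toy data with a very small and a moderate regularization coefficient; `ridge_fit` and the chosen values of the coefficient are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)

def ridge_fit(Phi, t, lam):
    """Minimizer of 0.5 * ||t - Phi w||^2 + 0.5 * lam * ||w||^2."""
    M = Phi.shape[1]
    return np.linalg.solve(lam * np.eye(M) + Phi.T @ Phi, Phi.T @ t)

# Assumed toy data: 10 noisy samples of sin(2*pi*x), 9th-order polynomial.
x = np.linspace(0.0, 1.0, 10)
t = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=10)
Phi = np.vander(x, 10, increasing=True)

lam_small, lam_large = np.exp(-18.0), 1.0
w_small = ridge_fit(Phi, t, lam_small)
w_large = ridge_fit(Phi, t, lam_large)

# Gradient of the regularized error at the solution (should be ~0).
grad_large = Phi.T @ (Phi @ w_large - t) + lam_large * w_large

print("||w|| with small lambda:", np.linalg.norm(w_small))
print("||w|| with large lambda:", np.linalg.norm(w_large))
```

The weight norm shrinks as the coefficient grows, which is how the quadratic penalty suppresses the large oscillating coefficients of an over-fitted polynomial.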
Regularized Least Squares (1)

Is it true that or
Regularized Least Squares (2)
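With a more general regularizer, we have

$\frac{1}{2} \sum_{n=1}^{N} \{ t_n - \mathbf{w}^T \boldsymbol{\phi}(\mathbf{x}_n) \}^2 + \frac{\lambda}{2} \sum_{j=1}^{M} |w_j|^q$

Lasso (q = 1), Quadratic (q = 2)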
Regularized Least Squares (3)
Lasso tends to generate sparser solutions than a quadratic regularizer.
Understanding Lasso Regularizer

• Lasso regularizer plays the role of thresholding
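The thresholding behaviour is easiest to see in a special case. The sketch below assumes an orthonormal design matrix (Φ^T Φ = I), which is an assumption made here for illustration and not something the slides state. Under that assumption the problem separates per weight: with the penalty (λ/2) Σ|w_j| the Lasso solution is the least-squares weight soft-thresholded at λ/2, while the quadratic penalty only rescales each weight by 1/(1 + λ).

```python
import numpy as np

def soft_threshold(a, threshold):
    """Shrink a towards zero by `threshold`, clipping at zero."""
    return np.sign(a) * np.maximum(np.abs(a) - threshold, 0.0)

# Assumed least-squares weights under an orthonormal design (Phi^T Phi = I).
w_ls = np.array([3.0, -0.4, 0.1, -2.0])
lam = 1.0

# Lasso: minimizes 0.5 * (w - w_ls)^2 + 0.5 * lam * |w| per coordinate.
w_lasso = soft_threshold(w_ls, lam / 2)

# Quadratic: minimizes 0.5 * (w - w_ls)^2 + 0.5 * lam * w^2 per coordinate.
w_ridge = w_ls / (1.0 + lam)

print("least squares:", w_ls)
print("lasso:        ", w_lasso)
print("quadratic:    ", w_ridge)
```

Weights whose least-squares magnitude is below the threshold become exactly zero under the Lasso penalty, whereas the quadratic penalty shrinks every weight but leaves none at zero.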
Multiple Outputs (1)
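Analogously to the single output case we have:

$p(\mathbf{t} \mid \mathbf{x}, \mathbf{W}, \beta) = \mathcal{N}(\mathbf{t} \mid \mathbf{W}^T \boldsymbol{\phi}(\mathbf{x}), \beta^{-1} \mathbf{I})$

Given observed inputs, $\mathbf{X} = \{\mathbf{x}_1, \dots, \mathbf{x}_N\}$, and targets, $\mathbf{T} = [\mathbf{t}_1, \dots, \mathbf{t}_N]^T$, we obtain the log likelihood function

$\ln p(\mathbf{T} \mid \mathbf{X}, \mathbf{W}, \beta) = \frac{NK}{2} \ln\left( \frac{\beta}{2\pi} \right) - \frac{\beta}{2} \sum_{n=1}^{N} \| \mathbf{t}_n - \mathbf{W}^T \boldsymbol{\phi}(\mathbf{x}_n) \|^2$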
Multiple Outputs (2)
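Maximizing with respect to W, we obtain

$\mathbf{W}_{ML} = (\boldsymbol{\Phi}^T \boldsymbol{\Phi})^{-1} \boldsymbol{\Phi}^T \mathbf{T}$

If we consider a single target variable, $t_k$, we see that

$\mathbf{w}_k = (\boldsymbol{\Phi}^T \boldsymbol{\Phi})^{-1} \boldsymbol{\Phi}^T \mathbf{t}_k = \boldsymbol{\Phi}^{\dagger} \mathbf{t}_k$

where $\mathbf{t}_k = [t_{1k}, \dots, t_{Nk}]^T$, which is identical with the single output case.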
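The decoupling across target variables can be confirmed directly: solving for all outputs at once gives the same weights as solving for each output column separately. The data sizes and the quadratic basis in this sketch are assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)

# Assumed sizes: N samples, M basis functions, K target variables.
N, M, K = 40, 3, 2
x = rng.uniform(-1.0, 1.0, N)
Phi = np.column_stack([np.ones(N), x, x ** 2])      # N x M design matrix
W_true = rng.normal(size=(M, K))
T = Phi @ W_true + rng.normal(scale=0.1, size=(N, K))

Phi_pinv = np.linalg.pinv(Phi)                       # computed once

# All outputs at once: W_ML = pinv(Phi) T.
W_ml = Phi_pinv @ T

# One output at a time: w_k = pinv(Phi) t_k.
W_columns = np.column_stack([Phi_pinv @ T[:, k] for k in range(K)])

print(W_ml)
```

Because the pseudo-inverse depends only on the inputs, it needs to be computed once and can be shared by all K regression problems.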
The Bias-Variance Decomposition (1)
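Recall the expected squared loss,

$E[L] = \int \{ y(\mathbf{x}) - h(\mathbf{x}) \}^2 p(\mathbf{x}) \, d\mathbf{x} + \iint \{ h(\mathbf{x}) - t \}^2 p(\mathbf{x}, t) \, d\mathbf{x} \, dt$

where

$h(\mathbf{x}) = E[t \mid \mathbf{x}] = \int t \, p(t \mid \mathbf{x}) \, dt$

The second term of E[L] corresponds to the noise inherent in the random variable t.
What about the first term?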
The Bias-Variance Decomposition (2)
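Suppose we were given multiple data sets, each of size N. Any particular data set, D, will give a particular function y(x; D). We then have

$\{ y(\mathbf{x}; D) - h(\mathbf{x}) \}^2 = \{ y(\mathbf{x}; D) - E_D[y(\mathbf{x}; D)] + E_D[y(\mathbf{x}; D)] - h(\mathbf{x}) \}^2$
$= \{ y(\mathbf{x}; D) - E_D[y(\mathbf{x}; D)] \}^2 + \{ E_D[y(\mathbf{x}; D)] - h(\mathbf{x}) \}^2 + 2 \{ y(\mathbf{x}; D) - E_D[y(\mathbf{x}; D)] \} \{ E_D[y(\mathbf{x}; D)] - h(\mathbf{x}) \}$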
The Bias-Variance Decomposition (3)
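Taking the expectation over D yields

$E_D[\{ y(\mathbf{x}; D) - h(\mathbf{x}) \}^2] = \{ E_D[y(\mathbf{x}; D)] - h(\mathbf{x}) \}^2 + E_D[\{ y(\mathbf{x}; D) - E_D[y(\mathbf{x}; D)] \}^2]$

The first term, the squared bias, describes the gap between the average prediction and the true value.
The second term, the variance, describes how far individual predictions lie from their mean over data sets.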
The Bias-Variance Decomposition (4)
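Thus we can write

expected loss = (bias)$^2$ + variance + noise

where

(bias)$^2 = \int \{ E_D[y(\mathbf{x}; D)] - h(\mathbf{x}) \}^2 p(\mathbf{x}) \, d\mathbf{x}$
variance $= \int E_D[\{ y(\mathbf{x}; D) - E_D[y(\mathbf{x}; D)] \}^2] \, p(\mathbf{x}) \, d\mathbf{x}$
noise $= \iint \{ h(\mathbf{x}) - t \}^2 p(\mathbf{x}, t) \, d\mathbf{x} \, dt$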
Bias-Variance Tradeoff
The Bias-Variance Decomposition (5)
Example: 25 data sets from the sinusoidal, varying the degree of regularization, $\lambda$.
The Bias-Variance Decomposition (6)
Example: 25 data sets from the sinusoidal, varying the degree of regularization, $\lambda$.
The Bias-Variance Decomposition (7)
Example: 25 data sets from the sinusoidal, varying the degree of regularization, $\lambda$.
The Bias-Variance Trade-off
From these plots, we note that an over-regularized model (large $\lambda$) will have a high bias, while an under-regularized model (small $\lambda$) will have a high variance.
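The trade-off can be estimated by simulation. The sketch below is an illustration under stated assumptions: 100 data sets of 25 points each drawn from sin(2πx) with noise, 24 Gaussian basis functions plus a bias, and three arbitrary regularization values. None of these settings are given in the slide text.

```python
import numpy as np

rng = np.random.default_rng(5)

L, N = 100, 25                        # assumed: number of data sets, points per set
centres = np.linspace(0.0, 1.0, 24)   # assumed Gaussian centres
s = 0.1                               # assumed basis width

def design(x):
    """Bias column plus Gaussian basis functions."""
    gauss = np.exp(-(x[:, None] - centres[None, :]) ** 2 / (2 * s ** 2))
    return np.column_stack([np.ones_like(x), gauss])

x_grid = np.linspace(0.0, 1.0, 200)
h = np.sin(2 * np.pi * x_grid)        # the true regression function
Phi_grid = design(x_grid)

def bias_variance(lam):
    """Monte Carlo estimate of (bias^2, variance) averaged over x_grid."""
    preds = np.empty((L, x_grid.size))
    for l in range(L):
        x = rng.uniform(0.0, 1.0, N)
        t = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=N)
        Phi = design(x)
        A = lam * np.eye(Phi.shape[1]) + Phi.T @ Phi
        w = np.linalg.solve(A, Phi.T @ t)
        preds[l] = Phi_grid @ w
    mean_pred = preds.mean(axis=0)
    bias_sq = np.mean((mean_pred - h) ** 2)
    variance = np.mean(preds.var(axis=0))
    return bias_sq, variance

bias_small, var_small = bias_variance(1e-3)
bias_mid, var_mid = bias_variance(1e-1)
bias_large, var_large = bias_variance(1e1)

for name, b, v in [("small", bias_small, var_small),
                   ("medium", bias_mid, var_mid),
                   ("large", bias_large, var_large)]:
    print(f"{name:>6} lambda: bias^2 = {b:.4f}, variance = {v:.4f}")
```

The averages over the grid stand in for the integrals over p(x); a uniform input density on [0, 1] is assumed.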
Bayesian Linear Regression (1)
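Define a conjugate prior over w

$p(\mathbf{w}) = \mathcal{N}(\mathbf{w} \mid \mathbf{m}_0, \mathbf{S}_0)$

Combining this with the likelihood function and using results for marginal and conditional Gaussian distributions, gives the posterior

$p(\mathbf{w} \mid \mathbf{t}) = \mathcal{N}(\mathbf{w} \mid \mathbf{m}_N, \mathbf{S}_N)$

where

$\mathbf{m}_N = \mathbf{S}_N (\mathbf{S}_0^{-1} \mathbf{m}_0 + \beta \boldsymbol{\Phi}^T \mathbf{t})$
$\mathbf{S}_N^{-1} = \mathbf{S}_0^{-1} + \beta \boldsymbol{\Phi}^T \boldsymbol{\Phi}$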
Bayesian Linear Regression (2)
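A common choice for the prior is

$p(\mathbf{w}) = \mathcal{N}(\mathbf{w} \mid \mathbf{0}, \alpha^{-1} \mathbf{I})$

for which

$\mathbf{m}_N = \beta \mathbf{S}_N \boldsymbol{\Phi}^T \mathbf{t}$
$\mathbf{S}_N^{-1} = \alpha \mathbf{I} + \beta \boldsymbol{\Phi}^T \boldsymbol{\Phi}$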

Next we consider an example …
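A sketch of how the posterior sharpens as data arrive, using the zero-mean isotropic prior. The straight-line model, the true parameters (-0.3, 0.5), and the values of the prior precision and noise precision are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(6)

alpha, beta = 2.0, 25.0        # assumed prior precision and noise precision
a0, a1 = -0.3, 0.5             # assumed true intercept and slope

def posterior(Phi, t, alpha, beta):
    """Mean and covariance of p(w | t) for the prior N(w | 0, I / alpha)."""
    S_N_inv = alpha * np.eye(Phi.shape[1]) + beta * Phi.T @ Phi
    S_N = np.linalg.inv(S_N_inv)
    m_N = beta * S_N @ Phi.T @ t
    return m_N, S_N

x = rng.uniform(-1.0, 1.0, 20)
t = a0 + a1 * x + rng.normal(scale=1.0 / np.sqrt(beta), size=20)
Phi = np.column_stack([np.ones_like(x), x])

posteriors = {}
for n in (0, 1, 2, 20):
    # Use only the first n observations; n = 0 recovers the prior.
    m_N, S_N = posterior(Phi[:n], t[:n], alpha, beta)
    posteriors[n] = (m_N, S_N)
    print(f"n = {n:2d}: mean = {m_N}, trace(S_N) = {np.trace(S_N):.4f}")
```

The trace of the posterior covariance is used here as a rough scalar summary of uncertainty; it can only decrease as observations are added, since each one adds a positive semi-definite term to the precision matrix.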

Bayesian Linear Regression (3)
0 data points observed
Prior Data Space
Bayesian Linear Regression (4)
1 data point observed
Likelihood Posterior Data Space
Bayesian Linear Regression (5)
2 data points observed
Likelihood Posterior Data Space
Bayesian Linear Regression (6)
20 data points observed
Likelihood Posterior Data Space
Predictive Distribution (1)
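Predict t for new values of x by integrating over w:

$p(t \mid \mathbf{t}, \alpha, \beta) = \int p(t \mid \mathbf{w}, \beta) \, p(\mathbf{w} \mid \mathbf{t}, \alpha, \beta) \, d\mathbf{w} = \mathcal{N}(t \mid \mathbf{m}_N^T \boldsymbol{\phi}(\mathbf{x}), \sigma_N^2(\mathbf{x}))$

where

$\sigma_N^2(\mathbf{x}) = \frac{1}{\beta} + \boldsymbol{\phi}(\mathbf{x})^T \mathbf{S}_N \boldsymbol{\phi}(\mathbf{x})$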
Predictive Distribution (1)
How do we compute the predictive distribution?
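One way to compute it, sketched under assumptions: a zero-mean isotropic Gaussian prior with precision `alpha`, known noise precision `beta`, and 9 Gaussian basis functions on [0, 1]. The predictive mean is the posterior mean weight vector applied to the basis vector, and the predictive variance is the noise variance plus the weight uncertainty projected onto that basis vector. All numerical settings below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(7)

alpha, beta = 2.0, 25.0              # assumed prior and noise precisions
centres = np.linspace(0.0, 1.0, 9)   # 9 Gaussian basis functions
s = 0.1                              # assumed basis width

def design(x):
    return np.exp(-(x[:, None] - centres[None, :]) ** 2 / (2 * s ** 2))

def predictive(x_new, x, t):
    """Mean and variance of p(t_new | x_new, data) at each x_new."""
    Phi = design(x)
    S_N = np.linalg.inv(alpha * np.eye(centres.size) + beta * Phi.T @ Phi)
    m_N = beta * S_N @ Phi.T @ t
    Phi_new = design(x_new)
    mean = Phi_new @ m_N
    # 1/beta (noise) + phi(x)^T S_N phi(x) (uncertainty in w), row by row.
    var = 1.0 / beta + np.sum((Phi_new @ S_N) * Phi_new, axis=1)
    return mean, var

# Assumed sinusoidal data.
x = rng.uniform(0.0, 1.0, 25)
t = np.sin(2 * np.pi * x) + rng.normal(scale=1.0 / np.sqrt(beta), size=25)
x_new = np.linspace(0.0, 1.0, 50)

mean_1, var_1 = predictive(x_new, x[:1], t[:1])
mean_25, var_25 = predictive(x_new, x, t)

print("mean predictive std, 1 point:  ", np.sqrt(var_1).mean())
print("mean predictive std, 25 points:", np.sqrt(var_25).mean())
```

The predictive variance never falls below 1/β, and at every input it is no larger with 25 points than with 1, because the second data set contains the first.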
Predictive Distribution (2)
Example: Sinusoidal data, 9 Gaussian basis functions,
1 data point
Predictive Distribution (3)
Example: Sinusoidal data, 9 Gaussian basis functions,
2 data points
Predictive Distribution (4)
Example: Sinusoidal data, 9 Gaussian basis functions,
4 data points
Predictive Distribution (5)
Example: Sinusoidal data, 9 Gaussian basis functions,
25 data points
Salesman’s Problem
Given
• a consumer a described by a vector $\mathbf{x}$
• a product b to sell with base cost c
• the estimated price distribution of b in the mind of a, $\Pr(t \mid \mathbf{x}, \mathbf{w}) = \mathcal{N}(\mathbf{x}^T \mathbf{w}, \sigma^2)$

What price should we offer to a?

Salesman’s Problem
Bayesian Model Comparison (1)
How do we choose the ‘right’ model?

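Assume we want to compare models $M_i$, $i = 1, \dots, L$, using data D; this requires computing

$p(M_i \mid D) \propto p(M_i) \, p(D \mid M_i)$

Here $p(M_i \mid D)$ is the posterior, $p(M_i)$ is the prior, and $p(D \mid M_i)$ is the model evidence or marginal likelihood.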
Bayesian Model Comparison (3)
For a model with parameters w, we get the model evidence by marginalizing over w

$p(D \mid M_i) = \int p(D \mid \mathbf{w}, M_i) \, p(\mathbf{w} \mid M_i) \, d\mathbf{w}$
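For the linear-Gaussian model the marginalization can be done in closed form. The sketch below assumes a zero-mean isotropic Gaussian prior on w with precision `alpha` and Gaussian noise with precision `beta`; under those assumptions the targets are jointly Gaussian with zero mean and covariance I/β + ΦΦ^T/α, so the log evidence is a multivariate Gaussian log density. The polynomial models, the data, and the hyperparameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(8)

alpha = 5e-3                   # assumed prior precision on the weights
beta = 1.0 / 0.3 ** 2          # assumed noise precision (noise std 0.3)

# Assumed data: noisy samples of sin(2*pi*x).
N = 30
x = np.linspace(0.0, 1.0, N)
t = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=N)

def log_evidence(order):
    """ln p(t | model) for a polynomial of the given order.

    Marginalizing w ~ N(0, I/alpha) out of t = Phi w + noise gives
    t ~ N(0, C) with C = I/beta + Phi Phi^T / alpha.
    """
    Phi = np.vander(x, order + 1, increasing=True)
    C = np.eye(N) / beta + Phi @ Phi.T / alpha
    _, logdet = np.linalg.slogdet(C)
    quad = t @ np.linalg.solve(C, t)
    return -0.5 * (N * np.log(2 * np.pi) + logdet + quad)

log_ev = {order: log_evidence(order) for order in range(9)}
for order, value in log_ev.items():
    print(f"order {order}: log evidence = {value:.2f}")
```

Models that are too simple to follow the sinusoid score poorly because they cannot match the data, while the evidence stops rewarding additional polynomial terms once they no longer improve the fit.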
Bayesian Model Comparison (4)
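For a given model with a single parameter, w, consider the approximation

$p(D) = \int p(D \mid w) \, p(w) \, dw \simeq p(D \mid w_{MAP}) \frac{\Delta w_{posterior}}{\Delta w_{prior}}$

where the posterior is assumed to be sharply peaked.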
Bayesian Model Comparison (5)
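Taking logarithms, we obtain

$\ln p(D) \simeq \ln p(D \mid w_{MAP}) + \ln\left( \frac{\Delta w_{posterior}}{\Delta w_{prior}} \right)$

The second term is negative.

With M parameters, all assumed to have the same ratio $\Delta w_{posterior} / \Delta w_{prior}$, we get

$\ln p(D) \simeq \ln p(D \mid \mathbf{w}_{MAP}) + M \ln\left( \frac{\Delta w_{posterior}}{\Delta w_{prior}} \right)$

The second term is negative and linear in M.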

Bayesian Model Comparison (5)

The log evidence combines a data-matching term with a model-complexity penalty.

Bayesian Model Comparison (6)
Matching data and model complexity
What You Should Know
• Least squares and its maximum likelihood interpretation
• Sequential learning for regression
• Regularization and its effect
• Bayesian regression
• Predictive distribution
• Bias-variance trade-off
• Bayesian model selection
