Machine Learning Srihari
The Hessian Matrix
Sargur Srihari
Definitions of Gradient and Hessian
• The first derivative of a scalar function E(w) with respect to a vector w = [w1, w2]^T is a vector called the gradient of E(w)
  ∇E(w) = dE/dw = [∂E/∂w1, ∂E/∂w2]^T
  If there are M elements in the vector, then the gradient is an M x 1 vector
• The second derivative of E(w) is a matrix called the Hessian of E(w)
  H = ∇∇E(w) = d²E/dw² = [ ∂²E/∂w1²      ∂²E/∂w1∂w2 ]
                          [ ∂²E/∂w2∂w1   ∂²E/∂w2²   ]
  The Hessian is a matrix with M² elements
• The Jacobian is the matrix of first derivatives of a vector-valued function with respect to a vector
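A small concrete instance of the gradient and Hessian definitions above (the function E and the point w are hypothetical, chosen only so the code runs):

```python
import numpy as np

# Hypothetical example: E(w) = w1^2 * w2 + sin(w2), with w = [w1, w2]^T (M = 2)
w = np.array([1.0, 2.0])
w1, w2 = w

# Gradient: an M x 1 vector of first derivatives
grad_E = np.array([2 * w1 * w2,              # ∂E/∂w1
                   w1**2 + np.cos(w2)])      # ∂E/∂w2

# Hessian: an M x M symmetric matrix with M² elements
hess_E = np.array([[2 * w2,  2 * w1],        # ∂²E/∂w1²,    ∂²E/∂w1∂w2
                   [2 * w1, -np.sin(w2)]])   # ∂²E/∂w2∂w1,  ∂²E/∂w2²

print(grad_E.shape, hess_E.shape)            # (2,) (2, 2)
```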
Computing the Hessian using Backpropagation
• We have shown how backpropagation can be used to obtain the first derivatives of the error function with respect to the weights in a network
• Backpropagation can also be used to evaluate the second derivatives of the error
  ∂²E / (∂w_ji ∂w_lk)
• If all the weights and bias parameters are elements w_i of a single vector w, then the second derivatives form the elements H_ij of the Hessian matrix H, where i, j ∈ {1,…,W} and W is the total number of weights and biases
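A minimal sketch of collecting all weights and biases into a single vector w, assuming hypothetical layer sizes; only the flattening convention and the W x W shape of H matter here:

```python
import numpy as np

# Hypothetical two-layer network sizes; only the flattening convention matters.
D, n_hidden, K = 3, 4, 2
W1, b1 = np.zeros((n_hidden, D)), np.zeros(n_hidden)   # first-layer weights, biases
W2, b2 = np.zeros((K, n_hidden)), np.zeros(K)          # second-layer weights, biases

# Collect every weight and bias into one parameter vector w of length W
w = np.concatenate([W1.ravel(), b1, W2.ravel(), b2])
W = w.size                           # total number of weights and biases
H = np.zeros((W, W))                 # Hessian elements H_ij with i, j in {1, ..., W}

def unflatten(w):
    """Inverse mapping from the single vector w back to the parameter arrays."""
    i = 0
    W1 = w[i:i + n_hidden * D].reshape(n_hidden, D); i += n_hidden * D
    b1 = w[i:i + n_hidden];                          i += n_hidden
    W2 = w[i:i + K * n_hidden].reshape(K, n_hidden); i += K * n_hidden
    b2 = w[i:i + K]
    return W1, b1, W2, b2

print(W, H.shape)   # 26 (26, 26)
```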
Role of Hessian in Neural Computing
1. Several nonlinear optimization algorithms for neural networks
  • are based on second-order properties of the error surface
2. Basis for a fast procedure for retraining a network after a small change in the training data
3. Identifying the least significant weights
  • Network pruning requires the inverse of the Hessian
4. Bayesian neural networks
  • The Hessian plays a central role in the Laplace approximation
  • Its inverse is used to determine the predictive distribution for a trained network
  • Its eigenvalues determine the values of the hyperparameters
  • Its determinant is used to evaluate the model evidence
Evaluating the Hessian Matrix
• The full Hessian matrix can be difficult to compute in practice
  • quasi-Newton algorithms have been developed that use approximations to the Hessian
• Various approximation techniques have been used to evaluate the Hessian for a neural network
  • It can also be calculated exactly using an extension of backpropagation
• An important consideration is efficiency
  • With W parameters (weights and biases) the matrix has dimension W x W
  • Efficient methods have O(W²) complexity
Methods for evaluating the Hessian Matrix
• Diagonal Approximation
• Outer Product Approximation
• Inverse Hessian
• Finite Differences
• Exact Evaluation using Backpropagation
• Fast multiplication by the Hessian
Diagonal Approximation
• In many cases the inverse of the Hessian is needed
• If the Hessian is approximated by a diagonal matrix (i.e., the off-diagonal elements are zero), its inverse is trivially computed
• Complexity is O(W) rather than O(W²) for the full Hessian
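A minimal sketch of why the diagonal case is cheap; the diagonal entries below are made-up stand-ins for estimates of ∂²E/∂w_i²:

```python
import numpy as np

# Hypothetical diagonal entries h_ii; the diagonal approximation simply
# sets all off-diagonal elements of the Hessian to zero.
h_diag = np.array([2.5, 0.8, 1.7, 4.0])

# The inverse of a diagonal matrix is elementwise: cost O(W), no W x W inversion.
H_inv_diag = 1.0 / h_diag

# Applying the inverse Hessian to a vector is then just an elementwise product.
v = np.array([1.0, -2.0, 0.5, 3.0])
print(H_inv_diag * v)
```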
Outer Product Approximation
• Neural networks commonly use a sum-of-squares error function
  E = (1/2) Σ_{n=1}^{N} (y_n − t_n)²
• The Hessian matrix can then be written in the form
  H ≈ Σ_{n=1}^{N} b_n b_n^T
  where b_n = ∇y_n = ∇a_n
• The elements can be found in O(W²) steps
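A minimal sketch of accumulating the outer-product sum, assuming the vectors b_n are already available (random stand-ins below):

```python
import numpy as np

# Random stand-ins for the vectors b_n = ∇a_n (one row per data point);
# in practice each b_n would come from a backpropagation pass.
N, W = 100, 6
B = np.random.randn(N, W)

# H ≈ Σ_n b_n b_n^T, accumulated one outer product at a time
H = np.zeros((W, W))
for b_n in B:
    H += np.outer(b_n, b_n)

# The same sum written as a single matrix product
print(np.allclose(H, B.T @ B))   # True
```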
Inverse Hessian
• Use the outer-product approximation to obtain a computationally efficient procedure for approximating the inverse of the Hessian
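One standard way to exploit the outer-product form builds the inverse incrementally with the Sherman-Morrison identity; the sketch below assumes a starting value α·I (a small regularizer) and random stand-ins for the b_n:

```python
import numpy as np

# Random stand-ins for b_n, as on the previous slide; alpha*I is an assumed
# starting value that keeps the running matrix invertible.
N, W = 100, 6
B = np.random.randn(N, W)
alpha = 1e-3

# Fold in one outer product at a time with the Sherman-Morrison identity:
# (H + b b^T)^-1 = H^-1 - (H^-1 b)(H^-1 b)^T / (1 + b^T H^-1 b)
H_inv = np.eye(W) / alpha
for b in B:
    Hb = H_inv @ b
    H_inv -= np.outer(Hb, Hb) / (1.0 + b @ Hb)

# Agrees with directly inverting alpha*I + Σ_n b_n b_n^T (up to rounding error)
H = alpha * np.eye(W) + B.T @ B
print(np.allclose(H_inv, np.linalg.inv(H)))   # True
```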
Finite Differences
• By applying central differences to the first derivatives obtained by backpropagation, the complexity is reduced from O(W³) to O(W²)
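A minimal sketch of this scheme: central differences are applied to the first derivatives, with grad() a hypothetical stand-in for the gradient that backpropagation would supply:

```python
import numpy as np

# grad() stands in for one O(W) backpropagation pass; this example has the
# known constant Hessian [[2, 1], [1, 6]].
def grad(w):
    return np.array([2.0 * w[0] + w[1], w[0] + 6.0 * w[1]])

def hessian_fd(grad, w, eps=1e-5):
    """Central differences applied to the first derivatives: 2W gradient
    evaluations of O(W) each, hence O(W²) overall."""
    W = w.size
    H = np.zeros((W, W))
    for k in range(W):
        d = np.zeros(W); d[k] = eps
        # column k: ∂²E/(∂w_j ∂w_k) ≈ [∂E/∂w_j(w + ε e_k) − ∂E/∂w_j(w − ε e_k)] / 2ε
        H[:, k] = (grad(w + d) - grad(w - d)) / (2.0 * eps)
    return H

print(hessian_fd(grad, np.array([0.3, -1.2])))   # ≈ [[2, 1], [1, 6]]
```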
Exact Evaluation of the Hessian
• Uses an extension of backpropagation
• Complexity is O(W²)
Fast Multiplication by the Hessian
• Applications of the Hessian involve multiplication by the Hessian
• The vector v^T H has only W elements
• Instead of computing H as an intermediate step, find an efficient method to compute the product directly
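A simple way to approximate the product without forming H is a central difference of the gradient along the direction v (the exact R-operator procedure achieves the same O(W) cost); grad() below is again a hypothetical stand-in for the backprop gradient:

```python
import numpy as np

# grad() stands in for the backprop gradient; its (constant) Hessian here is [[2, 1], [1, 6]].
def grad(w):
    return np.array([2.0 * w[0] + w[1], w[0] + 6.0 * w[1]])

def hessian_vector_product(grad, w, v, eps=1e-5):
    """Approximate H v (equivalently (v^T H)^T, since H is symmetric) without
    ever forming H: a central difference of the gradient along v, i.e. two
    O(W) gradient passes."""
    return (grad(w + eps * v) - grad(w - eps * v)) / (2.0 * eps)

w = np.array([0.5, -1.0])
v = np.array([1.0, 2.0])
print(hessian_vector_product(grad, w, v))   # ≈ H @ v = [4.0, 13.0]
```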