CSC411: Optimization for Machine Learning
University of Toronto
September 20–26, 2018
based on slides by Eleni Triantafillou, Ladislav Rampasek, Jake Snell, Kevin
Swersky, Shenlong Wang, and others
Optimization
- Many machine learning problems reduce to finding the parameter setting that minimizes an objective function:
  θ* = argmin_θ f(θ)
- θ ∈ R^D: the parameters
- f : R^D → R: the objective (loss) function
- θ* is a global minimum if f(θ*) ≤ f(θ) for all θ
- θ* is a local minimum if f(θ*) ≤ f(θ) for all θ in a neighbourhood of θ*
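As a concrete illustration of the argmin notation (a toy example of my own, not from the slides), the sketch below takes f(θ) = (θ − 3)², whose minimizer is θ* = 3, and recovers it by brute force over a grid.

```python
import numpy as np

# Toy objective: f(theta) = (theta - 3)^2, minimized at theta* = 3.
f = lambda theta: (theta - 3.0) ** 2

thetas = np.linspace(-10.0, 10.0, 2001)    # grid of candidate parameters
theta_star = thetas[np.argmin(f(thetas))]  # brute-force argmin over the grid
print(theta_star)                          # approximately 3.0
```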
Optimization in Machine Learning
- Training data: a set of input–target pairs (x, y)
- A model with parameters θ defines a likelihood p(y | x, θ)
- A standard objective is the negative log-likelihood −log p(y | x, θ), summed over the training set
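To make the negative log-likelihood concrete, here is a minimal sketch (my own toy example, not from the slides) of the loss for a logistic-regression model p(y = 1 | x, θ) = σ(θᵀx) on a tiny synthetic dataset.

```python
import numpy as np

def nll(theta, X, y):
    """Negative log-likelihood of logistic regression: -sum_i log p(y_i | x_i, theta)."""
    logits = X @ theta
    p = 1.0 / (1.0 + np.exp(-logits))   # p(y = 1 | x, theta)
    eps = 1e-12                          # avoid log(0)
    return -np.sum(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

# Tiny synthetic dataset (hypothetical numbers, purely for illustration).
X = np.array([[1.0, 0.5], [1.0, -1.0], [1.0, 2.0]])
y = np.array([1.0, 0.0, 1.0])
theta = np.zeros(2)
print(nll(theta, X, y))   # loss at the initial parameters
```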
Gradients
- At a minimum θ*, the partial derivatives vanish:
  ∂f(θ*)/∂θ = 0
- Such a point θ* is called a stationary point
- In general this equation has no closed-form solution, so we search for θ* iteratively
- The gradient collects all the partial derivatives:
  ∇θ f = (∂f/∂θ_1, ∂f/∂θ_2, ..., ∂f/∂θ_D)
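For a concrete gradient, consider the least-squares objective f(θ) = ½∥Xθ − y∥² (a running toy example of my own, not prescribed by the slides); its gradient is ∇θ f = Xᵀ(Xθ − y), as the sketch below computes.

```python
import numpy as np

def f(theta, X, y):
    r = X @ theta - y
    return 0.5 * r @ r               # f(theta) = 0.5 * ||X theta - y||^2

def grad_f(theta, X, y):
    return X.T @ (X @ theta - y)     # gradient: X^T (X theta - y)

X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
y = np.array([1.0, 2.0, 3.0])
theta = np.zeros(2)
print(grad_f(theta, X, y))           # one partial derivative per parameter
```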
Gradient Descent
- Choose a learning rate η
- Initialize θ_0
- For t = 1 : T:
  - δ_t ← −η ∇θ f(θ_{t−1})
  - θ_t ← θ_{t−1} + δ_t
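A minimal NumPy sketch of this update rule, assuming the toy least-squares objective from the example above (the data, learning rate, and iteration count are illustrative choices, not from the slides):

```python
import numpy as np

def grad_f(theta, X, y):
    return X.T @ (X @ theta - y)     # gradient of 0.5 * ||X theta - y||^2

X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
y = np.array([1.0, 2.0, 3.0])

eta, T = 0.01, 500                   # learning rate and number of iterations (hypothetical)
theta = np.zeros(2)                  # initialize theta_0
for t in range(T):
    delta = -eta * grad_f(theta, X, y)   # delta_t <- -eta * grad f(theta_{t-1})
    theta = theta + delta                # theta_t <- theta_{t-1} + delta_t
print(theta)
```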
Gradient Descent with Line Search
- Instead of a fixed learning rate η, choose a step size η_t at every iteration
- Initialize θ_0
- For t = 1 : T:
  - Choose η_t such that f(θ_{t−1} − η_t ∇θ f(θ_{t−1})) < f(θ_{t−1})
  - δ_t ← −η_t ∇θ f(θ_{t−1})
  - θ_t ← θ_{t−1} + δ_t
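One standard way to pick an η_t satisfying the decrease condition above is backtracking: start from a large step and halve it until the objective goes down. Below is a sketch on the same toy least-squares setup; the backtracking constants are my own illustrative choices, not the slides' prescription.

```python
import numpy as np

def f(theta, X, y):
    r = X @ theta - y
    return 0.5 * r @ r

def grad_f(theta, X, y):
    return X.T @ (X @ theta - y)

X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
y = np.array([1.0, 2.0, 3.0])

theta = np.zeros(2)
for t in range(100):
    g = grad_f(theta, X, y)
    eta_t = 1.0                                   # start with a large step
    while f(theta - eta_t * g, X, y) >= f(theta, X, y) and eta_t > 1e-10:
        eta_t *= 0.5                              # backtrack until the objective decreases
    theta = theta - eta_t * g                     # theta_t <- theta_{t-1} + delta_t
print(theta, f(theta, X, y))
```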
Gradient Descent with Momentum
- Momentum parameter α ∈ [0, 1)
- Initialize θ_0
- Initialize δ_0 = 0
- For t = 1 : T:
  - δ_t ← −η ∇θ f(θ_{t−1}) + α δ_{t−1}
  - θ_t ← θ_{t−1} + δ_t
- α controls how much of the previous update is carried over into the current step
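A sketch of the momentum update on the same toy least-squares objective (η, α, and the data are illustrative choices):

```python
import numpy as np

def grad_f(theta, X, y):
    return X.T @ (X @ theta - y)        # gradient of 0.5 * ||X theta - y||^2

X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
y = np.array([1.0, 2.0, 3.0])

eta, alpha, T = 0.01, 0.9, 500          # learning rate, momentum in [0, 1), iterations
theta = np.zeros(2)                     # theta_0
delta = np.zeros(2)                     # delta_0 = 0
for t in range(T):
    delta = -eta * grad_f(theta, X, y) + alpha * delta   # delta_t <- -eta grad f + alpha delta_{t-1}
    theta = theta + delta                                # theta_t <- theta_{t-1} + delta_t
print(theta)
```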
When Do We Stop?
- Choose a learning rate η
- Initialize θ
- Repeat:
  - δ ← −η ∇θ f(θ)
  - θ ← θ + δ
- Until a stopping criterion is met, for example:
  - the objective stops changing: |f(θ_{t+1}) − f(θ_t)| < ϵ
  - the gradient is small: ∥∇θ f∥ < ϵ
  - a maximum number of iterations is reached
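A sketch of the same loop run until one of these criteria fires; the tolerance ϵ, learning rate, and toy objective are illustrative choices:

```python
import numpy as np

def f(theta, X, y):
    r = X @ theta - y
    return 0.5 * r @ r

def grad_f(theta, X, y):
    return X.T @ (X @ theta - y)

X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
y = np.array([1.0, 2.0, 3.0])

eta, eps, max_iters = 0.01, 1e-8, 100000
theta = np.zeros(2)
prev = f(theta, X, y)
for t in range(max_iters):
    g = grad_f(theta, X, y)
    if np.linalg.norm(g) < eps:              # stop when the gradient is small
        break
    theta = theta - eta * g
    curr = f(theta, X, y)
    if abs(curr - prev) < eps:               # stop when the objective barely changes
        break
    prev = curr
print(t, theta)
```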
Learning Rate (Step Size)
In gradient descent, the learning rate η is a hyperparameter we need to tune. Here are some things that can go wrong:
- η too small: slow progress
- η too large: oscillations
- η much too large: instability
Good values are typically between 0.001 and 0.1. You should do a grid search if you want good performance (e.g. try 0.1, 0.03, 0.01, ...).
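A sketch of that grid search: run gradient descent with each candidate learning rate on the toy least-squares objective and compare final losses (the grid, data, and iteration budget are illustrative choices):

```python
import numpy as np

def f(theta, X, y):
    r = X @ theta - y
    return 0.5 * r @ r

def grad_f(theta, X, y):
    return X.T @ (X @ theta - y)

def run_gd(eta, X, y, T=200):
    theta = np.zeros(X.shape[1])
    for _ in range(T):
        theta = theta - eta * grad_f(theta, X, y)
    return f(theta, X, y)

X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
y = np.array([1.0, 2.0, 3.0])

for eta in [0.1, 0.03, 0.01, 0.003, 0.001]:   # grid of candidate learning rates
    print(eta, run_gd(eta, X, y))             # too-large values blow up, too-small ones barely move
```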
Checking Gradients with Finite Differences
- How do we know that the gradient ∇θ f we derived and implemented is correct?
- Compare it against a numerical estimate, the central difference:
  ∂f/∂θ_i ≈ [f(θ_1, ..., θ_i + ϵ, ..., θ_D) − f(θ_1, ..., θ_i − ϵ, ..., θ_D)] / (2ϵ)
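A sketch of this central-difference check against the analytic gradient of the toy least-squares objective (ϵ, the data, and the test point are illustrative choices):

```python
import numpy as np

def f(theta, X, y):
    r = X @ theta - y
    return 0.5 * r @ r

def grad_f(theta, X, y):
    return X.T @ (X @ theta - y)

def numerical_grad(f, theta, eps=1e-5, **kwargs):
    g = np.zeros_like(theta)
    for i in range(theta.size):
        e = np.zeros_like(theta)
        e[i] = eps
        # central difference: (f(theta + eps e_i) - f(theta - eps e_i)) / (2 eps)
        g[i] = (f(theta + e, **kwargs) - f(theta - e, **kwargs)) / (2 * eps)
    return g

X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
y = np.array([1.0, 2.0, 3.0])
theta = np.array([0.3, -0.7])

print(grad_f(theta, X, y))                 # analytic gradient
print(numerical_grad(f, theta, X=X, y=y))  # should agree to several decimal places
```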
Stochastic Gradient Descent
Batch gradient descent moves directly downhill. SGD takes steps
in a noisy direction, but moves downhill on average.
(Figure: batch gradient descent vs. stochastic gradient descent.)
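A sketch of mini-batch SGD on a synthetic regression problem; the batch size, learning rate, and data generation are illustrative choices, not from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic linear-regression data (hypothetical): y = X w_true + noise.
N, D = 1000, 5
X = rng.normal(size=(N, D))
w_true = rng.normal(size=D)
y = X @ w_true + 0.1 * rng.normal(size=N)

def grad_minibatch(theta, Xb, yb):
    # Gradient of the mean squared-error loss on one mini-batch.
    return Xb.T @ (Xb @ theta - yb) / len(yb)

eta, batch_size, epochs = 0.1, 32, 20
theta = np.zeros(D)
for epoch in range(epochs):
    idx = rng.permutation(N)                      # reshuffle the data each epoch
    for start in range(0, N, batch_size):
        b = idx[start:start + batch_size]
        theta = theta - eta * grad_minibatch(theta, X[b], y[b])   # noisy step, downhill on average
print(np.linalg.norm(theta - w_true))             # should be small
```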
SGD Learning Rate
In stochastic training, the learning rate also influences the
fluctuations due to the stochasticity of the gradients.
Typical strategy:
- Use a large learning rate early in training so you can get close to the optimum
- Gradually decay the learning rate to reduce the fluctuations
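One common way to implement such a decay is a 1/t-style schedule; the constants below are my own illustrative choices, not values from the slides.

```python
def decayed_learning_rate(t, eta0=0.5, decay=0.01):
    """Simple 1/t-style decay: large steps early in training, smaller steps later."""
    return eta0 / (1.0 + decay * t)

# Example: the step size shrinks gradually over training iterations.
for t in [0, 100, 1000, 10000]:
    print(t, decayed_learning_rate(t))
```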
SGD Learning Rate
Warning: by reducing the learning rate, you reduce the
fluctuations, which can appear to make the loss drop suddenly.
But this can come at the expense of long-run performance.
SGD and Non-convex optimization
(Figure: a non-convex objective on which stochastic gradient descent updates escape a local minimum and reach the global minimum.)
Stochastic methods have a chance of escaping from bad minima.
Gradient descent with a small step size converges to the first minimum it finds.