To add Nesterov Adam algorithm description to documentation #63793
💊 CI failures summary and remediations
As of commit 8164076 (more details on the Dr. CI page):
🕵️ 2 new failures recognized by patterns
These CI failures do not appear to be due to upstream breakages.
c6068f0 to 84a21b6
Codecov Report
@@ Coverage Diff @@
## master #63793 +/- ##
==========================================
- Coverage 67.09% 67.07% -0.02%
==========================================
Files 692 691 -1
Lines 90579 90571 -8
==========================================
- Hits 60774 60753 -21
- Misses 29805 29818 +13
7071236 to 96b0de5
e18aff0 to 37be604
b949b7b to 217c910
torch/optim/nadam.py
t initialization is not needed
agreed :)
torch/optim/nadam.py
I would do t \psi so that it cannot be confused with \psi_t which is very similar when things are small.
nit: factor out \beta_1?
sounds good :)
torch/optim/nadam.py
nit: it is kind of an abuse of notation to use \mu_{t+1} here :D
yes, i agree, so i added one more line that is reflected in rendered version
217c910 to c0d4bfa
c0d4bfa to 8164076
ok
@iramazanli has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
@iramazanli merged this pull request in 9ccb929.
It has been discussed before that adding descriptions of optimization algorithms to the PyTorch core documentation may result in a nice optimization research tutorial. The tracking issue #63236 lists all the necessary algorithms, with links to the originally published papers.
In this PR we add a description of the Nesterov Adam (NAdam) algorithm to the documentation. For more details, we refer to the paper https://openreview.net/forum?id=OM0jvwB8jIp57ZJjtNEZ
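For readers who want to see the update rule in action, here is a minimal scalar sketch in plain Python. It is not the code in torch/optim/nadam.py. It assumes the commonly cited NAdam formulation with the momentum schedule \mu_t = \beta_1 (1 - 0.5 \cdot 0.96^{t \psi}), where the exponent is written as t \psi, as suggested in the review above. The function name `nadam_minimize`, the default hyperparameters, and the quadratic test objective are all hypothetical choices made for illustration.

```python
import math


def nadam_minimize(grad_fn, theta, steps=1000, lr=0.05,
                   beta1=0.9, beta2=0.999, eps=1e-8, psi=0.004):
    """Scalar NAdam sketch (assumed formulation, not the PyTorch source)."""
    m = 0.0            # first-moment (momentum) estimate
    v = 0.0            # second-moment estimate
    mu_product = 1.0   # running product mu_1 * ... * mu_t
    for t in range(1, steps + 1):
        g = grad_fn(theta)
        # Momentum schedule for the current and the next step.
        mu_t = beta1 * (1.0 - 0.5 * 0.96 ** (t * psi))
        mu_next = beta1 * (1.0 - 0.5 * 0.96 ** ((t + 1) * psi))
        mu_product *= mu_t
        # Exponential moving averages of the gradient and its square.
        m = beta1 * m + (1.0 - beta1) * g
        v = beta2 * v + (1.0 - beta2) * g * g
        # Nesterov-style bias-corrected first moment: look-ahead momentum
        # term plus a correction from the current gradient.
        m_hat = (mu_next * m / (1.0 - mu_product * mu_next)
                 + (1.0 - mu_t) * g / (1.0 - mu_product))
        v_hat = v / (1.0 - beta2 ** t)
        theta -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta


# Hypothetical objective f(x) = (x - 3)^2 with gradient 2 * (x - 3).
result = nadam_minimize(lambda x: 2.0 * (x - 3.0), theta=0.0)
print(result)
```

Tracking the running product of the \mu values separately from the next-step value \mu_{t+1} mirrors the notation concern raised in the review: the look-ahead term needs the product up to t+1, while the gradient term only needs it up to t.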