[doc][hackathon] To add AdamW Optimizer to the documentation #63252
Conversation
💊 CI failures summary and remediations (as of commit 9671c27, more details on the Dr. CI page): 1 failure not recognized by patterns.
Force-pushed from 128aaea to d985b28.
Force-pushed from d985b28 to fd9304f.
Codecov Report

@@            Coverage Diff             @@
##           master   #63252      +/-   ##
==========================================
- Coverage   66.76%   66.65%   -0.12%
==========================================
  Files         710      710
  Lines       92354    92395      +41
==========================================
- Hits        61658    61582      -76
- Misses      30696    30813     +117
Force-pushed from a711fbe to 409c1e7.
torch/optim/adamw.py
Outdated
Is that correct? We don't actually do weight decay this way for this one, no?
Yes, you're completely right!
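For readers following along, here is a minimal sketch of the distinction raised above, in the paper's notation (\gamma is the learning rate, \lambda the weight-decay coefficient, \hat{m}_t and \hat{v}_t the bias-corrected moment estimates); it is illustrative, not the exact text added in this PR:

```latex
% Adam with L2 regularization folds the decay term into the gradient:
g_t = \nabla f_t(\theta_{t-1}) + \lambda \theta_{t-1}

% AdamW decouples the decay from the gradient and applies it directly
% in the parameter update:
g_t = \nabla f_t(\theta_{t-1}), \qquad
\theta_t = \theta_{t-1} - \gamma \lambda \theta_{t-1}
         - \frac{\gamma \, \hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}
```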
torch/optim/adamw.py
Outdated
Where is the \lambda \theta_{t-1} coming from?
I think it shouldn't exist; you're right again!
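For reference, a sketch of the full decoupled update as described in the paper (https://arxiv.org/abs/1711.05101): the \lambda \theta_{t-1} term appears only in the parameter update, never in g_t.

```latex
\begin{aligned}
& m_0 \leftarrow 0, \quad v_0 \leftarrow 0 \\
& \textbf{for } t = 1, 2, \ldots \textbf{ do} \\
& \quad g_t \leftarrow \nabla_\theta f_t(\theta_{t-1}) \\
& \quad \theta_t \leftarrow \theta_{t-1} - \gamma \lambda \theta_{t-1} \\
& \quad m_t \leftarrow \beta_1 m_{t-1} + (1 - \beta_1)\, g_t \\
& \quad v_t \leftarrow \beta_2 v_{t-1} + (1 - \beta_2)\, g_t^2 \\
& \quad \hat{m}_t \leftarrow m_t / (1 - \beta_1^{\,t}) \\
& \quad \hat{v}_t \leftarrow v_t / (1 - \beta_2^{\,t}) \\
& \quad \theta_t \leftarrow \theta_t - \gamma\, \hat{m}_t / (\sqrt{\hat{v}_t} + \epsilon)
\end{aligned}
```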
Force-pushed from 45cdbf8 to 7b1b7ea.
torch/optim/adamw.py
Outdated
This is missing a multiplication by \gamma, no?
Yes, I added it. Thanks for pointing it out.
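To make the role of \gamma concrete, here is a minimal, self-contained sketch of one AdamW step. The function name `adamw_step` and its signature are invented for this example; this is not the actual `torch/optim/adamw.py` implementation. Note that the learning rate (\gamma) multiplies both the weight-decay term and the adaptive gradient term.

```python
import torch

def adamw_step(param, grad, exp_avg, exp_avg_sq, step, *,
               lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8, weight_decay=1e-2):
    """One decoupled-weight-decay (AdamW) update, written out for clarity."""
    # Decoupled weight decay: shrink the parameter directly, scaled by lr (gamma).
    param.mul_(1 - lr * weight_decay)

    # Exponential moving averages of the gradient and its square.
    exp_avg.mul_(beta1).add_(grad, alpha=1 - beta1)
    exp_avg_sq.mul_(beta2).addcmul_(grad, grad, value=1 - beta2)

    # Bias corrections for the first and second moment estimates.
    bias_correction1 = 1 - beta1 ** step
    bias_correction2 = 1 - beta2 ** step
    denom = (exp_avg_sq / bias_correction2).sqrt().add_(eps)

    # Adaptive step, also scaled by lr (gamma).
    param.addcdiv_(exp_avg, denom, value=-lr / bias_correction1)
    return param
```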
Force-pushed from 7b1b7ea to 9671c27.
ok
@iramazanli has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
@iramazanli merged this pull request in 5b21f17.
It has been discussed before that adding descriptions of optimization algorithms to the PyTorch core documentation could serve as a nice optimization research tutorial. In the tracking issue #63236 we listed all the necessary algorithms with links to the originally published papers.
In this PR we add a description of the AdamW algorithm to the documentation. For more details, we refer to the paper: https://arxiv.org/abs/1711.05101
cc @vincentqb @iramazanli
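For context, the optimizer this documentation describes is used as in the sketch below (the model and hyperparameter values are arbitrary, chosen only for illustration):

```python
import torch
from torch import nn

model = nn.Linear(10, 2)
# AdamW applies weight decay directly to the parameters (decoupled),
# rather than adding an L2 penalty to the gradients as plain Adam does.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-2)

for _ in range(5):
    optimizer.zero_grad()
    loss = model(torch.randn(4, 10)).pow(2).mean()
    loss.backward()
    optimizer.step()
```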