[doc][hackathon] To add Adagrad Optimizer to the documentation #63254
Conversation
💊 CI failures summary and remediations
As of commit 464a9fa (more details on the Dr. CI page):
🕵️ 2 new failures recognized by patterns
The following CI failures do not appear to be due to upstream breakages.
Force-pushed from 348a347 to 2d807b3.
Codecov Report

@@            Coverage Diff             @@
##           master   #63254      +/-   ##
==========================================
- Coverage   66.73%   66.33%   -0.41%
==========================================
  Files         695      703       +8
  Lines       90833    92223    +1390
==========================================
+ Hits        60618    61175     +557
- Misses      30215    31048     +833
Force-pushed from 46d58f3 to a3b12d1.
I don't see initial_accumulator_value and lr_decay? Also, should we mention what the behavior is when the gradient is sparse?
Let me know if you have feedback regarding these two points. Regarding the sparse behavior, I would be happy to follow your suggestion.
Is that done on purpose? It looks like the args doc needs to be updated. In any case, in the rendered version, the argument is present in the signature, so I think it should be documented properly.
I agree with your point that we might want to discourage people from using this argument.
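For context, here is a minimal usage sketch of the two arguments discussed above (assuming the current torch.optim.Adagrad constructor signature; the concrete values are illustrative only):

```python
import torch
from torch import nn

model = nn.Linear(10, 2)

# Assumed signature: Adagrad(params, lr=0.01, lr_decay=0, weight_decay=0,
#                            initial_accumulator_value=0, eps=1e-10)
optimizer = torch.optim.Adagrad(
    model.parameters(),
    lr=0.01,
    lr_decay=1e-4,                  # shrinks the effective step size as iterations grow
    initial_accumulator_value=0.1,  # starting value of the per-parameter sum of squared gradients
    eps=1e-10,                      # added to the denominator for numerical stability
)

loss = model(torch.randn(4, 10)).sum()
loss.backward()
optimizer.step()
```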
Force-pushed from f3e7b0e to 4564618.
It's done! Thanks for the comments.
torch/optim/adagrad.py
Outdated
Nit: you don't want to assign to gamma here, right? The new value should not be used by the next step, right?
Update is done.
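For reference, a sketch of the point being made: the decayed step size is recomputed from the base learning rate γ at every step rather than written back into γ (η denotes the lr_decay argument; the exact symbols are assumptions, not a quote from the rendered docs).

```latex
% Decayed learning rate at step t; gamma itself is left unchanged,
% so the decay is always applied to the original base learning rate.
\tilde{\gamma}_t = \frac{\gamma}{1 + (t - 1)\,\eta}
```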
torch/optim/adagrad.py
Outdated
Isn't this g_t^2?
Oh, I guess that works with your definition of G.
But why not just use g_t^2 and remove the diag on the line below?
Updated to the shape you recommended.
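For context, the two notations discussed here coincide for the diagonal accumulator; a sketch using the outer-product matrix G_t from the Adagrad paper (the symbols are assumptions where the thread does not define them):

```latex
% With G_t = \sum_{\tau \le t} g_\tau g_\tau^\top, only the diagonal is ever used,
% and that diagonal is exactly the element-wise running sum of squared gradients:
\operatorname{diag}(G_t) = \sum_{\tau=1}^{t} g_\tau \odot g_\tau
                         = \operatorname{diag}(G_{t-1}) + g_t \odot g_t
```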
torch/optim/adagrad.py
Outdated
Nit: missing epsilon.
Added it.
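Putting the thread's points together, the element-wise update being documented should look roughly like the following (a sketch; s_t is the running sum of squared gradients, \tilde{\gamma}_t the decayed step size, and ε the eps argument):

```latex
\begin{aligned}
s_t          &= s_{t-1} + g_t \odot g_t \\
\theta_{t+1} &= \theta_t - \tilde{\gamma}_t \, \frac{g_t}{\sqrt{s_t} + \epsilon}
\end{aligned}
```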
Force-pushed from 4564618 to 9cc53de.
Force-pushed from 9cc53de to 464a9fa.
@iramazanli has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
@iramazanli merged this pull request in d4b09db.
It has been discussed before that adding descriptions of optimization algorithms to the PyTorch Core documentation could serve as a nice optimization research tutorial. The tracking issue #63236 lists all the necessary algorithms together with links to the originally published papers.
In this PR we add a description of Adagrad to the documentation. For more details, we refer to the paper:
http://jmlr.org/papers/v12/duchi11a.html
cc @vincentqb @iramazanli
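Regarding the sparse-gradient question raised in review, here is a minimal sketch of how the optimizer is typically exercised with sparse gradients (this assumes Adagrad's support for sparse gradients, e.g. from an nn.Embedding created with sparse=True; the toy sizes and names are illustrative):

```python
import torch
from torch import nn

# A toy embedding layer that produces a sparse gradient for its weight.
emb = nn.Embedding(1000, 16, sparse=True)
opt = torch.optim.Adagrad(emb.parameters(), lr=0.1)

idx = torch.tensor([1, 5, 7])
loss = emb(idx).pow(2).sum()
loss.backward()   # emb.weight.grad is a sparse tensor touching only rows 1, 5, 7
opt.step()        # the accumulator and weights are updated only for those rows
opt.zero_grad()
```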