Add Stochastic Gradient Descent to Documentation #63805
Conversation
Dr. CI: As of commit 01da448, there are no CI failures so far.
Force-pushed from be3fc4a to 9914ca7.
Codecov Report

@@            Coverage Diff             @@
##           master   #63805      +/-   ##
==========================================
- Coverage   67.09%   66.99%   -0.11%
==========================================
  Files         692      691       -1
  Lines       90579    90570       -9
==========================================
- Hits        60774    60677      -97
- Misses      29805    29893      +88
Force-pushed from a66c6ad to 4fcf682.
torch/optim/sgd.py (Outdated)

nit: this case could just be an "else" branch of the `if t > 1` below?
Reply: Actually, the `if t > 1` condition happens inside the `\mu \neq 0` condition, whereas the `t == 1` case sits in the more general scope above it, so it might be tricky to replace it with an `else` here.
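For readers following along, here is a minimal sketch of how the two cases discussed above fit together in the momentum-buffer update, using names in the spirit of torch/optim/sgd.py; `momentum_step` is a hypothetical helper for illustration, not the merged code.

```python
import torch

def momentum_step(grad, buf, momentum=0.9, dampening=0.0):
    """Hypothetical helper illustrating the two cases discussed above."""
    if buf is None:
        # First step (t == 1): the buffer is initialized with the gradient itself.
        buf = torch.clone(grad).detach()
    else:
        # Later steps (t > 1): decay the buffer and blend in the new gradient.
        buf.mul_(momentum).add_(grad, alpha=1 - dampening)
    return buf
```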
torch/optim/sgd.py (Outdated)

Is that correct? From the code, we just multiply the current buffer by the momentum (not by 1 - \tau) and add that to the gradient.
Reply: Yes, I think the formulas got mixed up while I was trying to make a shortcut expression and then return to the plain version.
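For context, writing the update out in the "plain version" the reply mentions (with momentum \mu, dampening \tau, and learning rate \gamma) would look roughly like the following; this is a sketch of the intended formulas, not a quotation of the final rendered documentation.

```latex
% Momentum buffer update for t > 1:
b_t \leftarrow \mu\, b_{t-1} + (1 - \tau)\, g_t
% The buffer, scaled by the momentum, is added to the gradient (Nesterov);
% otherwise the buffer replaces the gradient:
g_t \leftarrow g_t + \mu\, b_t \quad \text{(Nesterov)}, \qquad g_t \leftarrow b_t \quad \text{(otherwise)}
% Parameter update with learning rate \gamma:
\theta_t \leftarrow \theta_{t-1} - \gamma\, g_t
```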
Force-pushed from 4fcf682 to b1a6141.
LGTM
@iramazanli has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Force-pushed from b1a6141 to 01da448.
@iramazanli has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
@iramazanli merged this pull request in 149f111.
It has been discussed before that adding descriptions of optimization algorithms to the PyTorch core documentation could amount to a nice optimization research tutorial. The tracking issue #63236 lists all the necessary algorithms together with links to the originally published papers.
In this PR we add a description of Stochastic Gradient Descent to the documentation.
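As a quick illustration of the optimizer the new documentation section describes, a typical call to torch.optim.SGD with momentum and weight decay might look as follows; the toy model, data, and hyperparameter values are made up for the example.

```python
import torch

# Toy model and batch, purely for illustration.
model = torch.nn.Linear(10, 1)
data, target = torch.randn(32, 10), torch.randn(32, 1)

# SGD with momentum and weight decay; the values here are arbitrary.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01,
                            momentum=0.9, weight_decay=1e-4)

# One optimization step.
optimizer.zero_grad()
loss = torch.nn.functional.mse_loss(model(data), target)
loss.backward()
optimizer.step()
```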