To add RMSProp algorithm documentation by iramazanli · Pull Request #63721 · pytorch/pytorch · GitHub

Conversation

@iramazanli (Contributor) commented Aug 21, 2021

It has been discussed before that adding descriptions of optimization algorithms to the PyTorch core documentation may result in a nice optimization research tutorial. In the tracking issue #63236 we listed all the necessary algorithms along with links to the papers in which they were originally published.

In this PR we add a description of RMSProp to the documentation. For more details, we refer to the original lecture slides: https://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf

[Image: rendered RMSProp algorithm block from the documentation]
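For context (a sketch only, not the rendered documentation itself), the core RMSProp update in the notation settled on during review below, with \gamma the learning rate, \alpha the smoothing constant, and \epsilon a small constant for numerical stability:

```latex
% Basic RMSProp update (sketch); g_t is the stochastic gradient at step t
\begin{aligned}
v_t      &= \alpha\, v_{t-1} + (1 - \alpha)\, g_t^{2} \\
\theta_t &= \theta_{t-1} - \gamma\, \frac{g_t}{\sqrt{v_t} + \epsilon}
\end{aligned}
```

The review below also covers how the documented algorithm should reflect the momentum, weight_decay, and centered arguments.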

@facebook-github-bot (Contributor) commented Aug 21, 2021


💊 CI failures summary and remediations

As of commit add63b0 (more details on the Dr. CI page):


None of the CI failures appear to be your fault 💚



1 job timed out:

  • pytorch_linux_xenial_py3_clang7_asan_test1

❄️ 1 failure tentatively classified as flaky, but reruns have not yet been triggered to confirm:

See CircleCI build pytorch_linux_xenial_py3_clang7_asan_test1 (1/1)

Step: "Run tests" (full log | diagnosis details | 🔁 rerun) ❄️

Aug 28 00:22:07 AssertionError: 5.0 not less than or equal to 0
Aug 28 00:22:06   test_get (__main__.MultiprocessingRequestQueueTest) ... ok (0.024s)
Aug 28 00:22:06   test_get_less_than_size (__main__.MultiprocessingRequestQueueTest) ... FAIL (0.520s)
Aug 28 00:22:07   test_get_size (__main__.MultiprocessingRequestQueueTest) ... ok (0.921s)
Aug 28 00:22:07 
Aug 28 00:22:07 ======================================================================
Aug 28 00:22:07 FAIL [0.520s]: test_get_less_than_size (__main__.MultiprocessingRequestQueueTest)
Aug 28 00:22:07 ----------------------------------------------------------------------
Aug 28 00:22:07 Traceback (most recent call last):
Aug 28 00:22:07   File "distributed/elastic/timer/local_timer_test.py", line 182, in test_get_less_than_size
Aug 28 00:22:07     self.assertLessEqual(n / 2, len(requests))
Aug 28 00:22:07 AssertionError: 5.0 not less than or equal to 0
Aug 28 00:22:07 
Aug 28 00:22:07 ----------------------------------------------------------------------
Aug 28 00:22:07 Ran 14 tests in 4.522s
Aug 28 00:22:07 
Aug 28 00:22:07 FAILED (failures=1)
Aug 28 00:22:07 
Aug 28 00:22:07 Generating XML reports...
Aug 28 00:22:07 Generated XML report: test-reports/python-unittest/distributed.elastic.timer.local_timer_test/TEST-LocalTimerServerTest-20210828002202.xml
Aug 28 00:22:07 Generated XML report: test-reports/python-unittest/distributed.elastic.timer.local_timer_test/TEST-LocalTimerTest-20210828002202.xml
Aug 28 00:22:07 Generated XML report: test-reports/python-unittest/distributed.elastic.timer.local_timer_test/TEST-MultiprocessingRequestQueueTest-20210828002202.xml

This comment was automatically generated by Dr. CI.

@iramazanli iramazanli force-pushed the rmsprop_algorithm_doc branch from db4de6f to 8fa9fdb on August 21, 2021 01:17
@iramazanli iramazanli requested a review from albanD on August 21, 2021 01:24
@iramazanli iramazanli force-pushed the rmsprop_algorithm_doc branch from 8fa9fdb to 0701462 on August 21, 2021 01:48
@codecov (bot) commented Aug 21, 2021

Codecov Report

Merging #63721 (8496590) into master (49b782b) will increase coverage by 0.08%.
The diff coverage is n/a.

❗ Current head 8496590 differs from pull request most recent head add63b0. Consider uploading reports for the commit add63b0 to get more accurate results.

@@            Coverage Diff             @@
##           master   #63721      +/-   ##
==========================================
+ Coverage   66.73%   66.81%   +0.08%     
==========================================
  Files         695      695              
  Lines       90833    90845      +12     
==========================================
+ Hits        60618    60700      +82     
+ Misses      30215    30145      -70     

@iramazanli iramazanli force-pushed the rmsprop_algorithm_doc branch 5 times, most recently from 8af380f to 3af2ef4, on August 25, 2021 13:36
@albanD (Collaborator) left a comment

Thanks for the rendered version, that helps a lot!

Collaborator:

This was a typo in the previous doc right?

Contributor Author:

Yes. Actually, \alpha generally stands for the learning rate; however, in the specific case of RMSProp, \alpha stands for the coefficient of the squared-gradient average in the literature (including the PyTorch parameter alpha, the smoothing constant).

So the learning rate needed some other symbol, and I selected \gamma. We can change it to some other character if you like :)
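For concreteness, a minimal usage sketch illustrating the naming in question (the values shown are simply the optimizer's defaults):

```python
import torch

model = torch.nn.Linear(10, 1)

# In torch.optim.RMSprop, `alpha` is the smoothing constant of the squared-gradient
# average (written \alpha in the docs), while the learning rate is the separate `lr`
# argument (written \gamma in the docs, to avoid clashing with \alpha).
optimizer = torch.optim.RMSprop(model.parameters(), lr=0.01, alpha=0.99, eps=1e-8)

loss = model(torch.randn(4, 10)).sum()
loss.backward()
optimizer.step()
```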

Collaborator:

I don't have a specific preference. If there is one character that is not used by any of the optimizers, we should use that one so that we can make all the optimizers consistent.

Contributor Author:

Cool, so I have changed everything to \gamma for the learning rate :) now there should be consistency.

Collaborator:

What about the momentum, weight decay and centered arguments?

Contributor Author (@iramazanli, Aug 25, 2021):

At this point I have presented the algorithms as they are given in their corresponding research papers. For example, Adam looks like this in the paper:

[Image: Adam algorithm as presented in its paper]

So I tried to stay consistent with the papers. Let me know if you think we should go for a different design.

Collaborator:

I think we can link to the paper if people want to see the original algorithm. But the one we provide should properly reflect what is actually implemented. And all the arguments we support.
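For reference, a minimal single-tensor sketch of how the weight_decay, momentum, and centered arguments enter the update (an illustrative approximation of the documented behaviour, not the library's actual kernel; the helper name rmsprop_step is made up for this sketch):

```python
import torch

def rmsprop_step(param, grad, square_avg, grad_avg, momentum_buffer,
                 lr=0.01, alpha=0.99, eps=1e-8,
                 weight_decay=0.0, momentum=0.0, centered=False):
    """One RMSProp step on a single tensor (illustrative sketch; run under torch.no_grad() in practice)."""
    if weight_decay != 0:
        grad = grad + weight_decay * param                        # weight decay folded into the gradient
    square_avg.mul_(alpha).addcmul_(grad, grad, value=1 - alpha)  # running average of squared gradients
    if centered:
        grad_avg.mul_(alpha).add_(grad, alpha=1 - alpha)          # running average of gradients, g^{ave}
        avg = (square_avg - grad_avg * grad_avg).sqrt_().add_(eps)
    else:
        avg = square_avg.sqrt().add_(eps)
    if momentum > 0:
        momentum_buffer.mul_(momentum).addcdiv_(grad, avg)        # heavy-ball buffer over the scaled gradient
        param.add_(momentum_buffer, alpha=-lr)
    else:
        param.addcdiv_(grad, avg, value=-lr)
```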

Collaborator:

This should be f_{t-1} no?

Contributor Author:

Actually, in many papers the notation is "f_t", in particular "f_t(\theta_{t-1})".

For example:

To my understanding, the idea is that we use the parameter \theta_{t-1} because the latest computed parameter is from timestep t-1, while the stochastic gradient itself is computed at the current step t; that is why the notation is f_t rather than f_{t-1}.
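In symbols, the convention being described is that the stochastic objective of step t is evaluated at the previous iterate:

```latex
g_t = \nabla_{\theta} f_t(\theta_{t-1})
```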

Collaborator:

Oh, that's interesting. But as long as it is consistent for all optimizers, I'm happy :)

Contributor Author:

It is consistent across the PyTorch documentation and the papers :)

@iramazanli (Contributor Author):

Thanks for the rendered version, that helps a lot!

Thanks for reviewing the PR

@iramazanli iramazanli force-pushed the rmsprop_algorithm_doc branch 6 times, most recently from 8496590 to 500d5c3, on August 27, 2021 17:12
@albanD (Collaborator) left a comment

The overall structure looks good.

Collaborator:

g^{ave} is missing from here.

Contributor Author:

Oh, thanks for pointing that out :) it's fixed.
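For context, in the centered variant the running gradient average g^{ave} enters the denominator of the update, roughly (a sketch consistent with the notation above):

```latex
\begin{aligned}
g^{ave}_t &= \alpha\, g^{ave}_{t-1} + (1 - \alpha)\, g_t \\
\theta_t  &= \theta_{t-1} - \gamma\, \frac{g_t}{\sqrt{v_t - \big(g^{ave}_t\big)^{2}} + \epsilon}
\end{aligned}
```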

Collaborator:

The momentum buffer is initialized with 0s in the code. Why is this one 1s?

Contributor Author:

Yes, you're right, I've modified it.
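For illustration, a sketch of zero-initialized per-parameter state consistent with that fix (variable names are illustrative, not the library's internals):

```python
import torch

param = torch.randn(3)
square_avg      = torch.zeros_like(param)  # running average of squared gradients, starts at 0
grad_avg        = torch.zeros_like(param)  # only used when centered=True, starts at 0
momentum_buffer = torch.zeros_like(param)  # only used when momentum > 0, starts at 0
```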

@iramazanli iramazanli force-pushed the rmsprop_algorithm_doc branch from 500d5c3 to e8e56bb on August 27, 2021 19:30
@albanD (Collaborator) left a comment

LGTM

@facebook-github-bot (Contributor):

@iramazanli has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.


@iramazanli iramazanli force-pushed the rmsprop_algorithm_doc branch from e8e56bb to add63b0 on August 27, 2021 20:36
@facebook-github-bot (Contributor):

@iramazanli has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot (Contributor):

@iramazanli merged this pull request in aefa2f3.
