Allow ReadyQueue to handle empty tasks by albanD · Pull Request #15791 · pytorch/pytorch · GitHub
Conversation

@albanD
Collaborator

@albanD albanD commented Jan 7, 2019

Allow the comparison function used in ReadyQueue to handle the empty FunctionTasks created by the reentrant autograd.
Fix #11732
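
As a rough illustration of the idea, here is a minimal sketch in Python rather than the engine's C++. It is not the actual patch: `Fn`, `FunctionTask`, `sequence_nr`, `drain`, and the choice to pop empty tasks first are all assumptions made for this sketch.

```python
import heapq
from dataclasses import dataclass
from typing import Optional


@dataclass
class Fn:
    """Hypothetical stand-in for an autograd function node."""
    sequence_nr: int


@dataclass
class FunctionTask:
    """Hypothetical task; fn is None for the empty tasks of reentrant autograd."""
    fn: Optional[Fn]

    def __lt__(self, other: "FunctionTask") -> bool:
        # heapq pops the smallest item, so "less than" means "runs first".
        # Guard against empty tasks before touching fn.sequence_nr.
        if self.fn is None:
            return other.fn is not None  # assumption: empty tasks go first
        if other.fn is None:
            return False
        # Assumption: functions with a higher sequence_nr run first.
        return self.fn.sequence_nr > other.fn.sequence_nr


def drain(tasks):
    """Push all tasks on a heap and return them in pop order."""
    heap = []
    for t in tasks:
        heapq.heappush(heap, t)
    return [heapq.heappop(heap) for _ in range(len(heap))]
```

Without the `None` guards, the comparison in this sketch would raise `AttributeError` as soon as an empty task was compared with a non-empty one.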

@ezyang
Contributor

ezyang commented Jan 7, 2019

Any chance we can get a test for this? :)

@albanD
Collaborator Author

albanD commented Jan 7, 2019

@ezyang I added a test. Unfortunately, the bug depends on two threads finishing their work in a particular order, and there is no way to force them to synchronize that way (many ops need to run on both threads in various orders). Instead, I give one thread much more work than the other, which makes the test very reliable with the current autograd engine.
In the unlikely case where the timing goes the other way, the test will wrongly report success rather than fail, so it won't add noise to the CI.
Does that sound good? Unfortunately, I cannot find a better way to test this.
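
The timing trick described above can be sketched roughly as follows. This is not the test added in the PR: it uses plain Python threads, `time.sleep` stands in for the real autograd work, and `finishing_order` with its parameters is a hypothetical helper.

```python
import threading
import time


def finishing_order(light_work=0.0, heavy_work=0.2):
    """Run two workers with unequal work and return the order they finish in."""
    order = []
    lock = threading.Lock()

    def worker(name, seconds):
        time.sleep(seconds)  # stand-in for the real autograd work
        with lock:
            order.append(name)

    threads = [
        threading.Thread(target=worker, args=("heavy", heavy_work)),
        threading.Thread(target=worker, args=("light", light_work)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return order
```

Nothing synchronizes the two workers explicitly; the finishing order is only made very likely by the imbalance in work, which is the same trade-off the comment describes.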

Contributor

@facebook-github-bot facebook-github-bot left a comment

@soumith is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

soumith pushed a commit that referenced this pull request Jan 17, 2019
Summary:
Allow the comparison function used in ReadyQueue to handle the empty FunctionTasks created by the reentrant autograd.
Fix #11732
Pull Request resolved: #15791

Differential Revision: D13598006

Pulled By: soumith

fbshipit-source-id: 0bfdf28a735fbfe44f0fdbaf8b74a6198e6a1984
soumith pushed a commit that referenced this pull request Jan 18, 2019
Development

Successfully merging this pull request may close these issues.

Segfault in dataparallel + checkpoint

3 participants