-
Notifications
You must be signed in to change notification settings - Fork 25.7k
Add generic join unit tests #61786
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add generic join unit tests #61786
Conversation
[ghstack-poisoned]
💊 CI failures summary and remediationsAs of commit a7e4496 (more details on the Dr. CI page and at hud.pytorch.org/pr/61786):
🕵️ 3 new failures recognized by patternsThe following CI failures do not appear to be due to upstream breakages:
|
| Job | Step | Action |
|---|---|---|
| Install Visual Studio 2019 toolchain | 🔁 rerun | |
| Install dependencies | 🔁 rerun |
ci.pytorch.org: 1 failed
Preview docs built from this PR
This comment was automatically generated by Dr. CI (expand for details).
Follow this link to opt-out of these comments for your Pull Requests.Please report bugs/suggestions to the (internal) Dr. CI Users group.
[ghstack-poisoned]
|
@andwgu has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you need to add this new test file to https://github.com/pytorch/pytorch/blob/master/test/run_test.py to enable it?
This adds unit tests for the generic join context manager. ``` gpurun python test/distributed/algorithms/test_join.py ``` Differential Revision: [D29746646](https://our.internmc.facebook.com/intern/diff/D29746646) [ghstack-poisoned]
|
@andwgu has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! I left some minor comments. Please make sure tests are launched in CI and tests pass before landing.
When imported, please also add this to internal tests.
| inputs = self.construct_uneven_inputs(BASE_NUM_INPUTS, OFFSET) | ||
| with _Join([allreducer], run_post_hooks=True): | ||
| for _ in inputs: | ||
| allreducer() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would I be correct if I assume main hook would still run in this case? And compared to test_single_joinable, the only difference here is that test_single_joinable_post_hooks does not check main hook results? If so, does it make sense if we remove this test, as it seems test_single_joinable behaves exactly the same way and only differs in asserts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have now made test_single_joinable_post_hooks() not run the main hooks.
| @require_n_gpus_for_nccl_backend( | ||
| WORLD_SIZE, BACKEND | ||
| ) | ||
| def test_single_joinable(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does it make sense to dedup test code with test_single_joinable_main_hooks?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, I factored the core logic out into a single _test_join_base() and now use that as the base for all tests.
| @require_n_gpus_for_nccl_backend( | ||
| WORLD_SIZE, BACKEND | ||
| ) | ||
| def test_multiple_joinables(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks like this can also dedup with test_single_joinable_main_hooks by providing a list of hooks and expected results to a common function?
| allreducer = AllReducer(self.device, self.process_group) | ||
| inputs = self.construct_uneven_inputs(BASE_NUM_INPUTS, OFFSET) | ||
| allreduce_total = 0 | ||
| with self.assertRaises(RuntimeError): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shall we check the error message is expected?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
| ] | ||
| inputs = self.construct_uneven_inputs(BASE_NUM_INPUTS, OFFSET) | ||
| allreduce_total = 0 | ||
| with self.assertRaises(RuntimeError): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
This adds unit tests for the generic join context manager. ``` gpurun python test/distributed/algorithms/test_join.py ``` Differential Revision: [D29746646](https://our.internmc.facebook.com/intern/diff/D29746646) [ghstack-poisoned]
|
@andwgu has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator. |
This adds unit tests for the generic join context manager. ``` gpurun python test/distributed/algorithms/test_join.py ``` Differential Revision: [D29746646](https://our.internmc.facebook.com/intern/diff/D29746646) [ghstack-poisoned]
|
@andwgu has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator. |
This adds unit tests for the generic join context manager. ``` gpurun python test/distributed/algorithms/test_join.py ``` Differential Revision: [D29746646](https://our.internmc.facebook.com/intern/diff/D29746646) [ghstack-poisoned]
This adds unit tests for the generic join context manager. ``` gpurun python test/distributed/algorithms/test_join.py ``` Differential Revision: [D29746646](https://our.internmc.facebook.com/intern/diff/D29746646) [ghstack-poisoned]
|
@andwgu has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator. |
Stack from ghstack:
This adds unit tests for the generic join context manager.
Differential Revision: D29746646