[C10D] Document barrier interaction with device_id by wconstab · Pull Request #159389 · pytorch/pytorch · GitHub

Conversation


@wconstab wconstab commented Jul 29, 2025

@pytorch-bot pytorch-bot bot added the oncall: distributed and release notes: distributed (c10d) labels on Jul 29, 2025

pytorch-bot bot commented Jul 29, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/159389

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

✅ You can merge normally! (2 Unrelated Failures)

As of commit 211b052 with merge base 31b3b38:

BROKEN TRUNK - The following job failed but was also present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

wconstab added a commit that referenced this pull request Jul 29, 2025
Addresses #159262

ghstack-source-id: 04dca5a
Pull Request resolved: #159389
.. note:: `ProcessGroupNCCL` now blocks the CPU thread until the completion of the barrier collective.
.. warning:: `ProcessGroupNCCL` implements barrier as an all_gather of a 1-element tensor. This tensor will be
allocated on the device specified by the 'device_ids' arg if specified, or the device set with `torch.cuda.set_device`.
Contributor

device_ids sounds like a plural, i.e. it could be a list/tuple. Do you want to clarify that it will be allocated on the first (or one of the) devices listed in device_ids?

"the device set with torch.cuda.set_device" - this sounds a bit confusing to me, especially for multithreaded apps, where each thread can have its own default device. Maybe it should say something like "or on the current device, which for a given thread can be queried using torch.cuda.current_device or altered using the torch.cuda.set_device API or the torch.device context manager".
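
For illustration, a minimal sketch (not part of the PR; assumes a torchrun-style launch where LOCAL_RANK is set) of the two ways the barrier device can be chosen under the wording above:

```python
import os
import torch
import torch.distributed as dist

# Assumes a torchrun-style launch (RANK/WORLD_SIZE/LOCAL_RANK env vars set).
local_rank = int(os.environ["LOCAL_RANK"])
dist.init_process_group(backend="nccl")

# Option 1: pass the device index explicitly; the 1-element tensor used to
# implement barrier is allocated on this device.
dist.barrier(device_ids=[local_rank])

# Option 2: rely on the thread's current CUDA device, which can be queried
# with torch.cuda.current_device() and changed with torch.cuda.set_device().
torch.cuda.set_device(local_rank)
dist.barrier()

dist.destroy_process_group()
```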

Contributor Author

Yeah, to be honest I have no idea what is going on with that argument, specifically why it is a list in the first place.

@kwen2501 do you know?

Contributor Author

Turns out I was incorrect about my understanding of how barrier selects the device; it has been improved since I last saw it. I updated the doc to reflect what I see in ProcessGroupNCCL::guessDeviceId.
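
For reference, a minimal sketch (not from this PR, assuming a torchrun-style launch and that a device bound at init time is honored by the selection logic) of pinning the barrier device explicitly instead of relying on guessing:

```python
import os
import torch
import torch.distributed as dist

local_rank = int(os.environ["LOCAL_RANK"])

# Binding a device to the process group at init time gives ProcessGroupNCCL
# an explicit device to use for collectives such as barrier.
dist.init_process_group(
    backend="nccl",
    device_id=torch.device("cuda", local_rank),
)

dist.barrier()
dist.destroy_process_group()
```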

Member

I think this is a vestige of when we used to support multi-GPU collectives (one process using multiple GPUs), e.g. #85961. But now our main assumption is 1 process = 1 GPU.

Addresses #159262

cc H-Huang awgu wanchaol fegin fduwjj wz337 d4l3k pragupta

[ghstack-poisoned]
wconstab added a commit that referenced this pull request Jul 31, 2025
Addresses #159262

ghstack-source-id: 424ead1
Pull Request resolved: #159389
Addresses #159262

cc H-Huang awgu wanchaol fegin fduwjj wz337 d4l3k pragupta

[ghstack-poisoned]
wconstab added a commit that referenced this pull request Jul 31, 2025
Addresses #159262

ghstack-source-id: 76ecd94
Pull Request resolved: #159389
Member

@H-Huang H-Huang left a comment

docs look good!

Contributor

@kwen2501 kwen2501 left a comment

The comment looks good to me!


wconstab commented Aug 1, 2025

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk label on Aug 1, 2025
@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

@pytorchmergebot
Collaborator

The merge job was canceled or timed out. This most often happens if two merge requests were issued for the same PR, or if the merge job was waiting for more than 6 hours for tests to finish. In the latter case, please do not hesitate to reissue the merge command.
For more information, see the pytorch-bot wiki.


wconstab commented Aug 1, 2025

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

@github-actions github-actions bot deleted the gh/wconstab/433/head branch September 1, 2025 02:19