[C10D] Support group_dst/group_src in c10d send/recv object_list by wconstab · Pull Request #140847 · pytorch/pytorch · GitHub
@wconstab wconstab commented Nov 15, 2024

@wconstab wconstab requested review from H-Huang and kwen2501 and removed request for kwen2501 November 18, 2024 17:21
…t_list"


Also add mypy annotations

Partially addresses RFC 0042 (pytorch/rfcs#71)
See more details/motivation in #140460


[ghstack-poisoned]
Member

@H-Huang H-Huang left a comment

looks good

serialized and converted to tensors which are moved to the
``device`` before sending. Default is ``None``.
group_dst (int, optional): Destination rank on ``group``.
Member

If you specify `group_dst`, is there a requirement to also specify `group`?

Contributor Author

No. Technically, `group` defaults to the default process group, and that is documented, so I allowed `group_dst` to be used with the default group. In that case `group_dst` is the same as the global `dst`.
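
The rule described in this reply can be sketched in plain Python. This is a minimal illustration, not the c10d implementation: `resolve_global_dst` and its `group_ranks` argument are hypothetical names, and requiring exactly one of `dst` or `group_dst` is an assumption made for the sketch rather than something stated in this thread.

```python
from typing import Optional, Sequence


def resolve_global_dst(
    group_ranks: Sequence[int],
    dst: Optional[int] = None,
    group_dst: Optional[int] = None,
) -> int:
    """Return the global destination rank (hypothetical helper).

    group_ranks lists the global ranks that belong to the group, ordered by
    group rank. For the default group this is simply range(world_size).
    """
    # Assumption for this sketch: exactly one of the two arguments is given.
    if (dst is None) == (group_dst is None):
        raise ValueError("specify exactly one of dst or group_dst")
    if group_dst is not None:
        if not 0 <= group_dst < len(group_ranks):
            raise ValueError(f"group_dst {group_dst} is out of range")
        return group_ranks[group_dst]
    assert dst is not None  # guaranteed by the check above
    if dst not in group_ranks:
        raise ValueError(f"global rank {dst} is not part of the group")
    return dst


default_group = list(range(4))  # default group of a 4-rank job
subgroup = [2, 3]               # hypothetical subgroup of global ranks 2 and 3

# With the default group, the group rank and the global rank coincide.
assert resolve_global_dst(default_group, group_dst=3) == 3
# In a subgroup they differ: group rank 1 is global rank 3.
assert resolve_global_dst(subgroup, group_dst=1) == 3
```

The only point of the sketch is the mapping: when no subgroup is passed, a group-relative rank and a global rank name the same process.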

x = [{}]
c10d.recv_object_list(x, src=self.rank + 1, group=subgroup, device=device)
if group_rank:
    c10d.recv_object_list(x, group_src=1, group=subgroup, device=device)
Member

Instead of `group_src=1`, is there a way to do something like `subgroup.local_rank() + 1`?

Contributor Author

Yeah, it would be equivalent. I left it this way because the test already hardcodes 4 ranks and 2 subgroups, so this is pretty clear IMO.
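
To make the equivalence concrete, here is a small plain-Python check. It assumes the 4 ranks are split into subgroups `[0, 1]` and `[2, 3]` and that the rank at group index 0 is the receiver; the thread only states "4 ranks and 2 subgroups", so the exact layout and the helper `subgroup_for` are illustrative assumptions.

```python
from typing import List

WORLD_SIZE = 4
# Assumed layout: two subgroups of two consecutive global ranks each.
SUBGROUPS: List[List[int]] = [[0, 1], [2, 3]]

# Every global rank appears in exactly one subgroup.
assert sorted(r for ranks in SUBGROUPS for r in ranks) == list(range(WORLD_SIZE))


def subgroup_for(rank: int) -> List[int]:
    """Return the subgroup (list of global ranks) that contains rank."""
    return next(ranks for ranks in SUBGROUPS if rank in ranks)


for receiver in (0, 2):  # group rank 0 of each assumed subgroup receives
    ranks = subgroup_for(receiver)
    global_src = receiver + 1  # the process that src=self.rank + 1 names
    group_src = 1              # the process that group_src=1 names
    assert ranks[group_src] == global_src
    # "Own group rank + 1", as the reviewer suggested, gives the same index.
    assert ranks.index(receiver) + 1 == group_src
```

Under that layout the hardcoded `group_src=1` and the computed "own group rank + 1" pick the same sender, which is why the hardcoded form is safe in this test.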

@wconstab
Contributor Author

@pytorchbot merge

pobin6 pushed a commit to pobin6/pytorch that referenced this pull request Dec 5, 2024
…orch#140847)

Also add mypy annotations

Partially addresses RFC 0042 (pytorch/rfcs#71)
See more details/motivation in pytorch#140460

Pull Request resolved: pytorch#140847
Approved by: https://github.com/H-Huang
ghstack dependencies: pytorch#140843
Labels

ciflow/trunk, Merged, oncall: distributed, release notes: distributed (c10d)
