Detect accelerator type when backend is not specified by kwen2501 · Pull Request #142216 · pytorch/pytorch · GitHub


@kwen2501
Contributor

kwen2501 commented Dec 6, 2024

Stack from ghstack (oldest at bottom):

Today, when a user calls `init_process_group()` without specifying `backend` or `device_id`, we auto-translate the call into `cuda:nccl,cpu:gloo`. The idea was to initialize all default backends to cover whatever the user may do later.

A side effect is increased initialization time and resource usage.

This PR changes this behavior to detect the accelerator type on the machine and initialize only the backend for that accelerator.
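To make the before/after concrete, here is a minimal pure-Python sketch of the two default-selection policies described above. It does not call PyTorch: `DEFAULT_BACKENDS`, `old_default_backend`, `new_default_backend`, and `parse_backend_string` are hypothetical names, the table lists only the two device/backend pairs named in this PR, and falling back to `cpu:gloo` when no accelerator is detected is an assumption, not something stated here.

```python
from typing import Dict, Optional

# Hypothetical device-type -> default backend table (assumption: only the two
# pairs named in the PR text; the real torch.distributed table may differ).
DEFAULT_BACKENDS: Dict[str, str] = {"cuda": "nccl", "cpu": "gloo"}


def old_default_backend() -> str:
    """Old policy: translate to every default backend, e.g. 'cuda:nccl,cpu:gloo'."""
    return ",".join(f"{device}:{backend}" for device, backend in DEFAULT_BACKENDS.items())


def new_default_backend(detected_accelerator: Optional[str]) -> str:
    """New policy (as described): only the backend for the detected accelerator.

    `detected_accelerator` is a placeholder for the result of accelerator
    detection; None models a machine with no accelerator, which this sketch
    assumes falls back to CPU.
    """
    device = detected_accelerator or "cpu"
    if device not in DEFAULT_BACKENDS:
        raise ValueError(f"no default backend known for device type {device!r}")
    return f"{device}:{DEFAULT_BACKENDS[device]}"


def parse_backend_string(spec: str) -> Dict[str, str]:
    """Split a 'device:backend,device:backend' string into {device: backend}."""
    return dict(item.split(":", 1) for item in spec.split(","))


if __name__ == "__main__":
    print(old_default_backend())        # cuda:nccl,cpu:gloo
    print(new_default_backend("cuda"))  # cuda:nccl
    print(new_default_backend(None))    # cpu:gloo
```

Counting the entries returned by `parse_backend_string` shows the saving: two backends under the old policy, one under the new policy on a CUDA machine. Code that relied on the implicit `cpu:gloo` backend for CPU-tensor collectives can presumably still get both by passing the combined `device:backend` string as `backend` explicitly.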

cc @H-Huang @awgu @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @c-p-i-o

@pytorch-bot

pytorch-bot bot commented Dec 6, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/142216

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit bf1aec7 with merge base 61dc5e9:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

kwen2501 added a commit that referenced this pull request Dec 6, 2024
ghstack-source-id: f33839e
Pull Request resolved: #142216
pytorch-bot added the `oncall: distributed` and `release notes: distributed (c10d)` labels Dec 6, 2024
@kwen2501
Contributor Author

kwen2501 commented Dec 6, 2024

@pytorchbot merge

pytorch-bot added the `ciflow/trunk` label Dec 6, 2024
@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here

AmdSampsa pushed a commit to AmdSampsa/pytorch that referenced this pull request Dec 9, 2024
Pull Request resolved: pytorch#142216
Approved by: https://github.com/wconstab, https://github.com/XilunWu
kwen2501 added a commit that referenced this pull request Dec 9, 2024
Update doc to reflect change brought by #142216

cc H-Huang awgu wanchaol fegin fduwjj wz337 wconstab d4l3k c-p-i-o

[ghstack-poisoned]
kwen2501 added a commit that referenced this pull request Dec 9, 2024
pytorchmergebot pushed a commit that referenced this pull request Dec 9, 2024
Update doc to reflect change brought by #142216

Pull Request resolved: #142404
Approved by: https://github.com/XilunWu
@github-actions github-actions bot deleted the gh/kwen2501/111/head branch January 6, 2025 02:08
pytorchmergebot pushed a commit that referenced this pull request Sep 6, 2025
inconsistent with the logic introduced in #162157 and modified in #142216. This update ensures the documentation matches the actual behavior of the code.

Pull Request resolved: #162158
Approved by: https://github.com/wconstab
daisyden pushed a commit to daisyden/pytorch that referenced this pull request Sep 8, 2025
markc-614 pushed a commit to markc-614/pytorch that referenced this pull request Sep 17, 2025
mansiag05 pushed a commit to mansiag05/pytorch that referenced this pull request Sep 22, 2025
cleonard530 pushed a commit to cleonard530/pytorch that referenced this pull request Sep 22, 2025
dsashidh pushed a commit to dsashidh/pytorch that referenced this pull request Sep 26, 2025

Labels

`ciflow/trunk` (Trigger trunk jobs on your pull request), `Merged`, `oncall: distributed` (Add this issue/PR to distributed oncall triage queue), `release notes: distributed (c10d)` (release notes category)


4 participants