[c10d] Fix extra CUDA context created by barrier #152834
Conversation
Fixes #149119.

In ProcessGroup.hpp, we create a dummy tensor for dispatching. This requires a correct device index. This PR uses the `device_id` given by the user when calling `init_process_group`. This PR also uses `torch._C._get_accelerator()` to determine the device type.

ghstack-source-id: 96c32b9
Pull Request resolved: #149144
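For context, a minimal sketch of the user-side pattern this fix relies on, assuming a `torchrun` launch that sets `LOCAL_RANK` and the usual rendezvous environment variables (the script itself is illustrative, not part of this PR):

```python
# Sketch: passing device_id to init_process_group so that collectives such
# as barrier() can derive the correct device index for the dummy dispatch
# tensor, instead of implicitly creating a context on cuda:0.
import os
import torch
import torch.distributed as dist

local_rank = int(os.environ["LOCAL_RANK"])  # set by torchrun (assumption)
device = torch.device("cuda", local_rank)

dist.init_process_group(backend="nccl", device_id=device)
dist.barrier()  # dispatches on cuda:<local_rank>, no extra context on cuda:0
dist.destroy_process_group()
```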
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/152834
Note: Links to docs will display an error until the docs builds have been completed.
❗ 1 Active SEV: there is 1 currently active SEV. If your PR is affected, please view it below.
❌ 1 New Failure, 4 Cancelled Jobs, 1 Unrelated Failure as of commit 99138ee with merge base 924a247.
NEW FAILURE: the following job has failed:
CANCELLED JOBS: the following jobs were cancelled; please retry:
FLAKY: the following job failed but was likely due to flakiness present on trunk:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Please add a test that no extra contexts are created?
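A hedged sketch of what such a test could assert, assuming the private helper `torch._C._cuda_hasPrimaryContext` (used elsewhere in PyTorch's own test suite) is available:

```python
import torch

def assert_only_local_context(local_rank: int) -> None:
    # After init_process_group(device_id=...) followed by barrier(), only
    # the local device should hold a primary CUDA context; any other device
    # with a context indicates the bug this PR fixes.
    for dev in range(torch.cuda.device_count()):
        has_ctx = torch._C._cuda_hasPrimaryContext(dev)
        if dev == local_rank:
            assert has_ctx, f"expected a context on cuda:{dev}"
        else:
            assert not has_ctx, f"unexpected extra context on cuda:{dev}"
```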
)
# Detect the accelerator on the machine. If no accelerator is available, it
# returns CPU.
device = torch._C._get_accelerator()
_get_accelerator poisons the context on the current device; to just get the accelerator on the machine, it's better to use _accelerator_getAccelerator.
Per @albanD, torch.accelerator.current_accelerator() is also non-poisoning.
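A small sketch of the non-poisoning query suggested above (assumes a PyTorch version where `torch.accelerator` is available; it returns an `Optional[torch.device]`):

```python
import torch

# Query the accelerator on the machine without initializing (poisoning)
# a device context on the current device.
acc = torch.accelerator.current_accelerator()
device_type = acc.type if acc is not None else "cpu"
print(device_type)  # e.g. "cuda" on a CUDA machine
```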
# may use default device 0, causing issues like hang or all processes
# creating context on device 0.
opts.device = device
warnings.warn(  # warn only once
Does it actually warn only once by default?
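For reference, a quick standalone check of Python's default behavior: with the default warning filters, a given `warnings.warn` call site emits only once per location (module plus line number), so the "warn only once" comment holds as long as the warning is raised from the same line each time:

```python
import warnings

def emit():
    # Same call site every iteration, so the default "default" filter
    # action shows the message only on the first occurrence.
    warnings.warn("barrier() may create an extra context on cuda:0")

for _ in range(3):
    emit()  # prints the UserWarning exactly once
```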
cc @H-Huang @awgu @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k