[c10d] Remove deprecated multi-gpu-per-thread APIs #114156
As of today, PyTorch Distributed's preferred programming model is one device per thread, as exemplified by the APIs in its documentation. The multi-GPU functions (which support multiple GPUs per CPU thread) have been deprecated for three releases. This PR removes them ahead of the 2.2 release.
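To make the contrast concrete, here is a minimal sketch of the two programming models. It assumes the process is launched with torchrun (or an equivalent launcher that sets the rank and world-size environment variables); the tensor shapes are illustrative.

```python
import torch
import torch.distributed as dist

# Deprecated multi-GPU-per-thread style (removed by this PR): one process
# drove several GPUs by passing a list of tensors, one per local device.
#   tensors = [torch.ones(4, device=f"cuda:{d}")
#              for d in range(torch.cuda.device_count())]
#   dist.all_reduce_multigpu(tensors)

# Preferred one-device-per-thread style: each rank owns exactly one GPU
# and passes a single tensor per collective.
dist.init_process_group("nccl")  # rank/world size come from the launcher env
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())
tensor = torch.ones(4, device="cuda")
dist.all_reduce(tensor)  # sums `tensor` across all ranks, in place
```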
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/114156
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit ef380df with merge base 140c54e.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Public API doc change sounds good to me.
Looks good. I think there is a lot of follow-up work that can be done (BE?).
We can probably comb through a lot of util functions, find which ones assume multiple GPUs, and remove that logic to simplify our code. For example, in ProcessGroupNCCL, getDeviceList() only makes sense for multiple GPUs, and we have a lot of now-unnecessary logic that loops through devices.
```diff
  seq_++;
- // Currently, the API permits two scenarios where inputs.size() and
+ // Currently, the API permits one scenario where inputs.size() and
```
Should we create a follow-up task for removing the vector arguments from all collectives (e.g. std::vector<at::Tensor>& to at::Tensor&)? The only reason the vector was added is multi-GPU collectives, right?
There are also the _coalesced Python APIs (the "one scenario" left, referred to here).
And yeah, we'd need to think of a strategic way to remove them.
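For reference, a minimal sketch of that remaining scenario, again assuming a torchrun-style launcher (shapes illustrative): the _coalesced APIs take a list of tensors that all live on the same device, so inputs.size() can exceed one without implying multiple GPUs per thread.

```python
import torch
import torch.distributed as dist

dist.init_process_group("nccl")  # rank/world size come from the launcher env
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

# All tensors live on the *same* device; the list exists to coalesce several
# small all-reduces into one call, not to span multiple GPUs per thread.
tensors = [torch.ones(n, device="cuda") for n in (8, 16, 32)]
dist.all_reduce_coalesced(tensors)  # each tensor is summed across ranks, in place
```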
@H-Huang correct, there is a lot of Better Engineering to do. This PR just removes the user-facing APIs, as a starting point. Next we could go through the backend implementation in ProcessGroupNCCL.cpp.
```python
nGPUs = torch.cuda.device_count()
visible_devices = range(nGPUs)

if backend == "nccl":
```
Are we going to remove init_multigpu_helper completely?
Likely not. It is used in a lot of places (even non-distributed tests), so I let it stay; other tests may need it for other purposes.
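For context, a hedged sketch of what the helper does, reconstructed from the diff context above; the actual implementation in PyTorch's testing utilities may partition devices differently, and the nccl-specific branch shown in the diff is elided here.

```python
import torch

def init_multigpu_helper(world_size: int, backend: str):
    """Map each rank to the list of GPU indices it may use in a test.

    Sketch only: reconstructed from the surrounding diff, not the real code.
    """
    nGPUs = torch.cuda.device_count()
    visible_devices = range(nGPUs)

    # Split the visible devices evenly across ranks so that concurrent
    # ranks on one machine do not contend for the same GPU.
    nGPUs_per_process = max(nGPUs // world_size, 1)
    rank_to_GPU = {
        i: list(visible_devices[i * nGPUs_per_process : (i + 1) * nGPUs_per_process])
        for i in range(world_size)
    }
    return rank_to_GPU
```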
LGTM
@pytorchbot merge

Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
This has caused multiple issues in the Meta-internal pipelines.
cc @ezyang @gchanan