[c10d] Start deprecating *_multigpu APIs by kwen2501 · Pull Request #85961 · pytorch/pytorch · GitHub

@kwen2501

Deprecation reasons:

  • Most users train with one GPU per process, so these APIs are rarely used
  • They added one more dimension to the API surface
  • They can be expressed by composing existing APIs
  • They are not abstracted; they are specific to GPUs
  • They caused backend APIs and implementations to use nested std::vector<std::vector<Tensor>>, which is hard to read and maintain

@pytorch-bot

pytorch-bot bot commented Sep 30, 2022

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/85961

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 Failures, 1 Pending

As of commit 217d2c5:

The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot added the label release notes: distributed (c10d) on Sep 30, 2022
@facebook-github-bot added the labels cla signed and oncall: distributed on Sep 30, 2022
@kwen2501

The "CUDA out of memory" CI error is unrelated


@XilunWu XilunWu left a comment


lgtm. thx for working on it!

"""
warnings.warn(
"torch.distributed.broadcast_multigpu will be deprecated. If you must "

Thanks. Added
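
For readers unfamiliar with the pattern in the diff fragment above, here is a minimal standalone sketch of warning on every call to a legacy API. It uses only the standard library. The decorator `warn_on_call` and the stand-in function `legacy_broadcast_multigpu` are hypothetical names for illustration, not the code in this PR, and the `UserWarning` category and `stacklevel=2` are assumptions.

```python
import functools
import warnings


def warn_on_call(message):
    """Hypothetical decorator: emit `message` as a warning on every call."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # stacklevel=2 attributes the warning to the caller's line
            # rather than to this wrapper.
            warnings.warn(message, UserWarning, stacklevel=2)
            return func(*args, **kwargs)
        return wrapper
    return decorator


@warn_on_call("legacy_broadcast_multigpu will be deprecated.")
def legacy_broadcast_multigpu(tensor_list):
    """Stand-in for a legacy multi-GPU collective; simply returns its input."""
    return tensor_list


if __name__ == "__main__":
    # Record warnings instead of printing them so they can be inspected.
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        legacy_broadcast_multigpu([1, 2, 3])
    print(caught[0].message)
```

The call still goes through unchanged, so existing callers keep working while they see the notice.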

@pytorch-bot added the label ciflow/trunk on Sep 30, 2022
@kwen2501

kwen2501 commented Oct 1, 2022

@pytorchbot merge -f 'The CUDA out of memory error is unrelated to this change'

@kwen2501 added the label topic: deprecation on Oct 1, 2022
@pytorchmergebot

@pytorchbot successfully started a merge job. Check the current status here.
The merge job was triggered with the force (-f) flag. This means your change will be merged immediately, bypassing any CI checks (ETA: 1-5 minutes). If this is not the intended behavior, feel free to use some of the other merge options in the wiki.
Please reach out to the PyTorch DevX Team with feedback or questions!

mehtanirav pushed a commit that referenced this pull request Oct 4, 2022

Pull Request resolved: #85961
Approved by: https://github.com/XilunWu, https://github.com/H-Huang
@github-actions github-actions bot deleted the c10d_deprecate_multigpu branch April 1, 2024 01:53