[FSDP2] Added `_set_unshard_async_op` by awgu · Pull Request #135523 · pytorch/pytorch · GitHub


@awgu (Collaborator) commented Sep 9, 2024

Stack from ghstack (oldest at bottom):

This PR adds a private API, `_set_unshard_async_op`, that makes the pre-forward and pre-backward all-gathers use the `async_op=True` path so that all-gather allocations happen in the default stream, avoiding inter-stream fragmentation.
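Why does the allocation stream matter? A rough intuition: PyTorch's CUDA caching allocator associates each cached block with the stream it was allocated on, so memory freed after an all-gather on a separate stream cannot directly serve later compute allocations on the default stream. The toy model below is a deliberate simplification (hypothetical names `ToyStreamCachingAllocator` and `reserved_after_one_layer`, no block splitting or size pools, sizes in arbitrary units) showing how per-stream caching can inflate reserved memory. It is not the real allocator.

```python
from collections import defaultdict


class ToyStreamCachingAllocator:
    """Toy per-stream caching allocator (hypothetical, heavily simplified).

    Freed blocks are cached and may only be reused by later allocations
    on the *same* stream; there is no block splitting and no size pools.
    """

    def __init__(self):
        self.cache = defaultdict(list)  # stream name -> cached block sizes
        self.reserved = 0               # total units obtained from the "device"

    def malloc(self, size, stream):
        pool = self.cache[stream]
        fits = [b for b in pool if b >= size]
        if fits:                        # reuse the smallest cached block that fits
            block = min(fits)
            pool.remove(block)
            return (block, stream)
        self.reserved += size           # cache miss: reserve new device memory
        return (size, stream)

    def free(self, block):
        size, stream = block
        self.cache[stream].append(size)  # cached for this stream only


def reserved_after_one_layer(all_gather_stream, ag_size=100, compute_size=100):
    """Hypothetical single layer: all-gather, reshard, then a compute allocation."""
    alloc = ToyStreamCachingAllocator()
    unsharded = alloc.malloc(ag_size, all_gather_stream)  # all-gather output
    alloc.free(unsharded)                                  # reshard frees it
    workspace = alloc.malloc(compute_size, "default")     # later compute allocation
    alloc.free(workspace)
    return alloc.reserved


print(reserved_after_one_layer("all_gather"))  # → 200 (separate stream)
print(reserved_after_one_layer("default"))     # → 100 (same stream as compute)
```

In this model, allocating the all-gather output on the default stream lets the compute allocation reuse the freed block, so reserved memory stays at one block instead of two.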

When this option is enabled, forward requires explicit prefetching (e.g. via the `unshard(async_op=True)` API) to overlap communication with compute. In addition, the fp32 -> bf16 casts and the all-gather copy-in will not overlap with compute.
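Explicit forward prefetching comes down to issuing the next module's all-gather before running the current module's compute. The sketch below models only that ordering. `ToyFSDPModule`, `ToyUnshardHandle`, and `forward_with_prefetch` are hypothetical stand-ins that mimic the shape of FSDP2's `unshard(async_op=True)` (which returns a handle with a `wait()` method). The toy calls `wait()` explicitly and performs no real communication.

```python
class ToyUnshardHandle:
    """Stand-in for an async all-gather handle; wait() marks it complete."""

    def __init__(self, module):
        self.module = module

    def wait(self):
        self.module.log.append(f"wait {self.module.name}")
        self.module.unsharded = True


class ToyFSDPModule:
    """Hypothetical stand-in for one FSDP-wrapped layer (no real communication)."""

    def __init__(self, name, log):
        self.name = name
        self.log = log
        self.unsharded = False

    def unshard(self, async_op=False):
        self.log.append(f"all-gather {self.name}")  # issue the all-gather
        handle = ToyUnshardHandle(self)
        if async_op:
            return handle  # caller must wait() before using the parameters
        handle.wait()
        return None

    def forward(self):
        assert self.unsharded, f"{self.name} used before its all-gather finished"
        self.log.append(f"compute {self.name}")


def forward_with_prefetch(layers):
    """Issue layer i+1's all-gather before computing layer i."""
    handle = layers[0].unshard(async_op=True)
    for i, layer in enumerate(layers):
        next_handle = None
        if i + 1 < len(layers):
            next_handle = layers[i + 1].unshard(async_op=True)  # prefetch
        handle.wait()
        layer.forward()  # could overlap with the prefetched all-gather
        handle = next_handle


log = []
forward_with_prefetch([ToyFSDPModule(f"layer{i}", log) for i in range(3)])
print(log)
```

In the resulting log, each layer's all-gather is issued one step ahead of its compute, which is the ordering that allows communication to overlap with compute. As the PR notes, the dtype casts and copy-in still do not overlap with compute under this option.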

cc @XilunWu @H-Huang @kwen2501 @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @c-p-i-o

Differential Revision: D62401551

@pytorch-bot commented Sep 9, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/135523

✅ No Failures

As of commit 266fc37 with merge base b7eb725:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added oncall: distributed Add this issue/PR to distributed oncall triage queue release notes: distributed (fsdp) release notes category labels Sep 9, 2024
@awgu awgu added release notes: distributed (fsdp2) release notes category and removed release notes: distributed (fsdp) release notes category labels Sep 9, 2024
awgu pushed a commit that referenced this pull request Sep 9, 2024
ghstack-source-id: 0c8bc84
Pull Request resolved: #135523
@awgu (Collaborator, Author) commented Sep 9, 2024

@awgu has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@awgu awgu marked this pull request as ready for review September 9, 2024 22:16
@awgu awgu requested a review from weifengpy September 10, 2024 16:30
@awgu awgu added the ciflow/trunk Trigger trunk jobs on your pull request label Sep 10, 2024
@awgu (Collaborator, Author) commented Sep 10, 2024

@pytorchbot merge

@pytorchmergebot (Collaborator)

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).


Chao1Han pushed a commit to Chao1Han/pytorch that referenced this pull request Sep 20, 2024
Differential Revision: [D62401551](https://our.internmc.facebook.com/intern/diff/D62401551)
Pull Request resolved: pytorch#135523
Approved by: https://github.com/weifengpy
@github-actions github-actions bot deleted the gh/awgu/637/head branch October 12, 2024 02:06

Labels

ciflow/trunk (Trigger trunk jobs on your pull request)
Merged
oncall: distributed (Add this issue/PR to distributed oncall triage queue)
release notes: distributed (fsdp2) (release notes category)

3 participants