[FSDP2] Move to public torch.distributed.fsdp
#141868
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/141868
Note: Links to docs will display an error until the docs builds have been completed.
❌ 1 New Failure, 1 Unrelated Failure
As of commit 1b534f8 with merge base bab15df:
NEW FAILURE - The following job has failed:
FLAKY - The following job failed but was likely due to flakiness present on trunk:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
will need several CI iterations to get this right -- please do not review yet

cc H-Huang kwen2501 wanchaol fegin fduwjj wz337 wconstab d4l3k c-p-i-o LucasLLC MeetVadakkanchery mhorowitz pradeepfn
@pytorchbot successfully started a revert job.
This reverts commit 45583a5. Reverted #141868 on behalf of https://github.com/atalman due to failing internally ([comment](#141868 (comment)))
@awgu your PR has been successfully reverted.
**Overview**

This PR moves `torch/distributed/_composable/fsdp` to `torch/distributed/fsdp/_fully_shard` and makes public APIs available from `torch.distributed.fsdp`, e.g.:

```
from torch.distributed.fsdp import fully_shard
```

This is targeting the 2.6 release. I rewrote some of the documentation with (hopefully) improved phrasing.

**Follow-Ups**

- [x] Add some explanation in the docs about FSDP1 vs. FSDP2
- [ ] Move unit tests from `test/distributed/_composable/fsdp` to `test/distributed/fsdp/fully_shard/`

cc H-Huang kwen2501 wanchaol fegin fduwjj wz337 wconstab d4l3k c-p-i-o voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy yf225 chenyang78 kadeng muchulee8 ColinPeppler amjames desertfire chauhang aakhundov LucasLLC MeetVadakkanchery mhorowitz pradeepfn
@awgu has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

failure is not related

internal tests look good!

@pytorchbot merge -i

Merge started. Your change will be merged while ignoring the following 1 check: periodic / linux-focal-cuda11.8-py3.10-gcc9-debug / test (default, 3, 5, lf.linux.4xlarge.nvidia.gpu, oncall:debug-build)
**Overview**

This PR moves `torch/distributed/_composable/fsdp` to `torch/distributed/fsdp/_fully_shard` and makes public APIs available from `torch.distributed.fsdp`, e.g.:

```
from torch.distributed.fsdp import fully_shard
```

This is targeting the 2.6 release. I rewrote some of the documentation with (hopefully) improved phrasing.

**Follow-Ups**

- [x] Add some explanation in the docs about FSDP1 vs. FSDP2
- [ ] Move unit tests from `test/distributed/_composable/fsdp` to `test/distributed/fsdp/fully_shard/`

Pull Request resolved: #141868
Approved by: https://github.com/kwen2501, https://github.com/wconstab, https://github.com/weifengpy

Co-authored-by: Svetlana Karslioglu <svekars@meta.com>
This reverts commit ad93aa8. Reverted pytorch#141398 on behalf of https://github.com/atalman due to Sorry need to revert pytorch#141868, we can try rebase and reland this after ([comment](pytorch#141398 (comment)))
**Overview**

This PR moves `torch/distributed/_composable/fsdp` to `torch/distributed/fsdp/_fully_shard` and makes public APIs available from `torch.distributed.fsdp`, e.g.:

```
from torch.distributed.fsdp import fully_shard
```

This is targeting the 2.6 release. I rewrote some of the documentation with (hopefully) improved phrasing.

**Changes for Reland**

- Preserved the public objects from `torch/distributed/_composable/fsdp/fully_shard.py` so that the import path still works internally
- Added a unit test that we can do `from torch.distributed._composable.fsdp.fully_shard import FSDPModule`

Differential Revision: [D66890387](https://our.internmc.facebook.com/intern/diff/D66890387)

Pull Request resolved: #141868
Approved by: https://github.com/kwen2501, https://github.com/wconstab, https://github.com/weifengpy, https://github.com/fegin, https://github.com/XilunWu

Co-authored-by: Svetlana Karslioglu <svekars@meta.com>
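
For downstream code that has to run both before and after this move, one option is to try the new public path first and fall back to the old private one. The sketch below is only an illustration: `import_first` and `FSDP2_MODULE_PATHS` are hypothetical names, not part of PyTorch. It assumes `fully_shard` is exposed from `torch.distributed._composable.fsdp` on builds that predate this PR.

```python
import importlib

# Candidate module paths, newest first. The first is the public path this PR
# introduces; the second is the pre-move location (assumed to export the same
# name on older builds).
FSDP2_MODULE_PATHS = (
    "torch.distributed.fsdp",
    "torch.distributed._composable.fsdp",
)


def import_first(attr_name, module_paths):
    """Return (attribute, module_path) from the first module that provides attr_name.

    Hypothetical helper: raises ImportError if no candidate module can be
    imported or none of them defines the attribute.
    """
    errors = []
    for path in module_paths:
        try:
            module = importlib.import_module(path)
            return getattr(module, attr_name), path
        except (ImportError, AttributeError) as exc:
            # Remember why this candidate failed, then try the next one.
            errors.append(f"{path}: {exc}")
    raise ImportError(f"could not import {attr_name!r}; tried: " + "; ".join(errors))
```

With PyTorch installed, `import_first("fully_shard", FSDP2_MODULE_PATHS)` would return the function together with the path it was found under. Code that only targets builds containing this PR can use the plain `from torch.distributed.fsdp import fully_shard` import instead.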
ghstack-source-id: 2b447e0 Pull Request resolved: pytorch/pytorch#141868