[PT][FSDP] fail set_allocate_memory_from_process_group if used together with custom comm hooks
#157487
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/157487
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit 11702fa with merge base af9c92b.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
This pull request was exported from Phabricator. Differential Revision: D77681620
Force-pushed: fail set_allocate_memory_from_process_group if used together with custom comm hooks (pytorch#157487)
8885ee2 to 11702fa (Compare)
Reviewed By: weifengpy
Differential Revision: D77681620
This pull request was exported from Phabricator. Differential Revision: D77681620
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
The merge job was canceled or timed out. This most often happens if two merge requests were issued for the same PR, or if the merge job was waiting for more than 6 hours for tests to finish. In the latter case, please do not hesitate to reissue the merge command.
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Summary:
This is a follow-up to the PR that added comm override support: #155189
The previous PR only loosely checked the allocation mixin classes, which isn't really safe: the actual hook may still override the allocation behavior. This may lead to unnecessary confusion for no good use case. So for now we make the two sets of APIs largely incompatible:

1. Setting custom comms after `set_allocate_memory_from_process_group_for_comm()` is OK.
2. Setting `set_allocate_memory_from_process_group_for_comm()` after custom comms is not OK and now fails.

Basically, `set_allocate_memory_from_process_group_for_comm()` is a drop-in hammer, while `set_custom_all_gather()`/`set_custom_reduce_scatter()` are finer-grained scalpels that require more hand-crafted code. We can revisit this if a use case in between shows up, but for now the two can largely be viewed as independent of each other (even though they share some of the underlying pieces today, that is subject to change and should not be exposed to end users).
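As a rough illustration of the ordering contract above, here is a minimal sketch. It assumes the FSDP2 method names quoted in this summary (`set_allocate_memory_from_process_group_for_comm`, `set_custom_all_gather`, `set_custom_reduce_scatter`); the hook objects, argument lists, and helper names are illustrative assumptions, not the actual API surface.

```python
# Minimal sketch of the ordering contract described in the summary.
# `model` is assumed to already be wrapped with FSDP2's `fully_shard`;
# hook objects and any method arguments are omitted/illustrative.


def configure_comms_allowed(model, all_gather_hook, reduce_scatter_hook):
    # OK: opt into allocating comm buffers from the process group first,
    # then install the finer-grained custom comm hooks on top of it.
    model.set_allocate_memory_from_process_group_for_comm()
    model.set_custom_all_gather(all_gather_hook)
    model.set_custom_reduce_scatter(reduce_scatter_hook)


def configure_comms_rejected(model, all_gather_hook):
    # Not OK after this PR: once a custom comm hook is installed, asking for
    # process-group-backed allocation is rejected, because the hook may
    # override the allocation behavior anyway.
    model.set_custom_all_gather(all_gather_hook)
    model.set_allocate_memory_from_process_group_for_comm()  # expected to fail
```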
Test Plan: added UT
Differential Revision: D77681620
cc @H-Huang @awgu @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k