[RFC][FSDP] Don't move ignored params / buffers to device #108033
Conversation
Since these are ignored by FSDP, don't move them.

Differential Revision: [D48727044](https://our.internmc.facebook.com/intern/diff/D48727044/)

[ghstack-poisoned]
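To make the intended effect concrete, here is a hedged usage sketch (module structure and devices are illustrative; it assumes a CUDA device and an already-initialized process group, e.g. launched with torchrun):

```python
import torch
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

# Illustrative model: the second layer is the one FSDP should ignore.
model = nn.Sequential(nn.Linear(8, 8), nn.Linear(8, 8))
ignored_module = model[1]

# Assumes torch.distributed is already initialized (e.g. via torchrun).
fsdp_model = FSDP(
    model,
    device_id=torch.cuda.current_device(),
    ignored_modules=[ignored_module],
)

# Expected after this PR: ignored parameters/buffers stay on their original
# device (CPU here) instead of being moved to the FSDP compute device.
assert all(p.device.type == "cpu" for p in ignored_module.parameters())
```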
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/108033. Note: links to docs will display an error until the docs builds have completed.

❗ 1 merge-blocking SEV is active; please review it before merging.

✅ No failures as of commit 9a5eae4 with merge base a20fac8.

This comment was automatically generated by Dr. CI and updates every 15 minutes.
```python
import torch.distributed.fsdp._traversal_utils as traversal_utils
import torch.distributed.fsdp.fully_sharded_data_parallel as fsdp_file
import torch.nn as nn
from torch import Tensor
```
nit: If we do this, I feel like we should replace all torch.Tensor in the file with Tensor, or else it gets fragmented.
Maybe just use torch.Tensor on L857?
```python
for submodule in curr_module.children():
    if not isinstance(submodule, fsdp_file.FullyShardedDataParallel):
        queue.append(submodule)
# NOTE: This includes moving ignored modules' parameters. If we
```
We should remove this note.
I am on board with the change.
Please check lint before landing. Thanks!
…NTRANT (#108435)

We should use no_reentrant. There are a lot of users of this API, but it is in a prototype state, so it should be fine to change.

Differential Revision: [D48898148](https://our.internmc.facebook.com/intern/diff/D48898148/)
Pull Request resolved: #108435
Approved by: https://github.com/awgu
ghstack dependencies: #108032, #108033
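The exact prototype wrapper that #108435 changes is not shown here; for reference, the reentrant vs. non-reentrant distinction on the public activation-checkpointing API looks roughly like this (a minimal sketch, not that PR's code):

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

layer = nn.Linear(16, 16)
x = torch.randn(4, 16, requires_grad=True)

# Non-reentrant activation checkpointing; passing the flag explicitly avoids
# depending on whichever default the prototype wrapper picks.
out = checkpoint(layer, x, use_reentrant=False)
out.sum().backward()
```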
```python
    ignored_buffers: Set[torch.Tensor],
    device_from_device_id: Optional[torch.device],
) -> None:
    """
```
The docstring below should be updated.
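For illustration, a hedged sketch of what the updated behavior could look like; the function name and parameters mirror the hunk above, but this is simplified and not the actual FSDP implementation:

```python
from typing import Optional, Set

import torch
import torch.nn as nn


def _move_module_to_device_sketch(
    module: nn.Module,
    ignored_params: Set[nn.Parameter],
    ignored_buffers: Set[torch.Tensor],
    device_from_device_id: Optional[torch.device],
) -> None:
    """Move ``module``'s parameters and buffers to ``device_from_device_id``,
    skipping anything FSDP was told to ignore (simplified sketch)."""
    if device_from_device_id is None:
        return
    for param in module.parameters():
        # Skip ignored parameters so they stay on their original device.
        if param not in ignored_params and param.device != device_from_device_id:
            param.data = param.data.to(device_from_device_id)
    for submodule in module.modules():
        for name, buf in list(submodule.named_buffers(recurse=False)):
            # Skip ignored buffers for the same reason.
            if buf not in ignored_buffers and buf.device != device_from_device_id:
                setattr(submodule, name, buf.to(device_from_device_id))
```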
Stack from ghstack (oldest at bottom):
Since these are ignored by FSDP, don't move them.
Differential Revision: D48727044