[RFC][FSDP] Don't move ignored params / buffers to device #108033
Conversation
Since these are ignored by FSDP, don't move them.

Differential Revision: [D48727044](https://our.internmc.facebook.com/intern/diff/D48727044/)

[ghstack-poisoned]
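To make the intended effect concrete, here is a hedged usage sketch (module structure and devices are illustrative; it assumes a CUDA device and an already-initialized process group, e.g. launched with torchrun):

```python
import torch
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

# Illustrative model: the second layer is the one FSDP should ignore.
model = nn.Sequential(nn.Linear(8, 8), nn.Linear(8, 8))
ignored_module = model[1]

# Assumes torch.distributed is already initialized (e.g. via torchrun).
fsdp_model = FSDP(
    model,
    device_id=torch.cuda.current_device(),
    ignored_modules=[ignored_module],
)

# Expected after this PR: ignored parameters/buffers stay on their original
# device (CPU here) instead of being moved to the FSDP compute device.
assert all(p.device.type == "cpu" for p in ignored_module.parameters())
```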
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/108033. Note: links to docs will display an error until the docs builds have completed.

❗ 1 merge-blocking SEV is active; please review it before merging.

✅ No failures as of commit 9a5eae4 with merge base a20fac8.

This comment was automatically generated by Dr. CI and updates every 15 minutes.
```python
import torch.distributed.fsdp._traversal_utils as traversal_utils
import torch.distributed.fsdp.fully_sharded_data_parallel as fsdp_file
import torch.nn as nn
from torch import Tensor
```
nit: If we do this, I feel like we should replace all torch.Tensor in the file with Tensor, or else it gets fragmented.
Maybe just use torch.Tensor on L857?
```python
for submodule in curr_module.children():
    if not isinstance(submodule, fsdp_file.FullyShardedDataParallel):
        queue.append(submodule)
# NOTE: This includes moving ignored modules' parameters. If we
```
We should remove this note.
I am on board with the change.
Please check lint before landing. Thanks!
…NTRANT (#108435)

We should use no_reentrant. There are a lot of users of this API, but it is in a prototype state, so it should be fine to change.

Differential Revision: [D48898148](https://our.internmc.facebook.com/intern/diff/D48898148/)
Pull Request resolved: #108435
Approved by: https://github.com/awgu
ghstack dependencies: #108032, #108033
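The exact prototype wrapper that #108435 changes is not shown here; for reference, the reentrant vs. non-reentrant distinction on the public activation-checkpointing API looks roughly like this (a minimal sketch, not that PR's code):

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

layer = nn.Linear(16, 16)
x = torch.randn(4, 16, requires_grad=True)

# Non-reentrant activation checkpointing; passing the flag explicitly avoids
# depending on whichever default the prototype wrapper picks.
out = checkpoint(layer, x, use_reentrant=False)
out.sum().backward()
```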
```python
    ignored_buffers: Set[torch.Tensor],
    device_from_device_id: Optional[torch.device],
) -> None:
    """
```
The docstring below should be updated.
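For illustration, a hedged sketch of what the updated behavior could look like; the function name and parameters mirror the hunk above, but this is simplified and not the actual FSDP implementation:

```python
from typing import Optional, Set

import torch
import torch.nn as nn


def _move_module_to_device_sketch(
    module: nn.Module,
    ignored_params: Set[nn.Parameter],
    ignored_buffers: Set[torch.Tensor],
    device_from_device_id: Optional[torch.device],
) -> None:
    """Move ``module``'s parameters and buffers to ``device_from_device_id``,
    skipping anything FSDP was told to ignore (simplified sketch)."""
    if device_from_device_id is None:
        return
    for param in module.parameters():
        # Skip ignored parameters so they stay on their original device.
        if param not in ignored_params and param.device != device_from_device_id:
            param.data = param.data.to(device_from_device_id)
    for submodule in module.modules():
        for name, buf in list(submodule.named_buffers(recurse=False)):
            # Skip ignored buffers for the same reason.
            if buf not in ignored_buffers and buf.device != device_from_device_id:
                setattr(submodule, name, buf.to(device_from_device_id))
```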
Stack from ghstack (oldest at bottom):
Since these are ignored by FSDP, don't move them.
Differential Revision: D48727044