[dynamo] FSDP + AC + torch.compile #103953
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/103953. Note: links to docs will display an error until the docs builds have completed.
✅ No failures as of commit 7c3c297.
(This comment was automatically generated by Dr. CI and updates every 15 minutes.)
    @skip_if_lt_x_gpu(1)
    @unittest.skipIf(not has_triton(), "Inductor+gpu needs triton and recent GPU arch")
    def test_fsdp_activation_checkpointing(self):
        from torch.distributed.algorithms._checkpoint.checkpoint_wrapper import (
Do we want to test the native torch.utils one too?
That's already tested in test/dynamo/test_activation_checkpointing.py. It is also tested heavily on HF models; more data here: #102935
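For context, a minimal sketch (not from this PR) of the native torch.utils.checkpoint path under torch.compile, i.e. the case that test file covers:

```python
import torch
from torch.utils.checkpoint import checkpoint


def gn(x):
    # Recomputed during backward instead of saving activations.
    return torch.sigmoid(torch.matmul(x, x))


@torch.compile
def fn(x):
    return checkpoint(gn, x, use_reentrant=False)


x = torch.randn(4, 4, requires_grad=True)
fn(x).sum().backward()
```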
    with _dynamo_dist_per_rank_init(self.rank, self.world_size):
        model, inputs = get_toy_model_for_activation_checkpointing(f"cuda:{self.rank}")
        is_inner = lambda module: isinstance(module, ToyInnerModel)  # noqa: E731
        wrap_policy = functools.partial(lambda_auto_wrap_policy, lambda_fn=is_inner)
Are there any valid cases where we don't use the same policy for FSDP/AC that we need to test? cc @awgu
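For reference, a hedged sketch of the pattern under discussion: one predicate driving both activation checkpointing and the FSDP auto-wrap policy. The toy modules are stand-ins for the test helper's; `apply_activation_checkpointing` and `lambda_auto_wrap_policy` are the real APIs, and a process group must already be initialized (the test does this via `_dynamo_dist_per_rank_init`).

```python
import functools

import torch.nn as nn
from torch.distributed.algorithms._checkpoint.checkpoint_wrapper import (
    apply_activation_checkpointing,
    checkpoint_wrapper,
)
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.fsdp.wrap import lambda_auto_wrap_policy


class ToyInnerModel(nn.Module):  # stand-in for the test helper's inner module
    def __init__(self):
        super().__init__()
        self.lin = nn.Linear(8, 8)

    def forward(self, x):
        return self.lin(x)


class ToyOuterModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.inner = ToyInnerModel()

    def forward(self, x):
        return self.inner(x)


model = ToyOuterModel().cuda()
is_inner = lambda m: isinstance(m, ToyInnerModel)  # one predicate for both

# Apply AC first, then let FSDP wrap using the same predicate.
apply_activation_checkpointing(
    model, checkpoint_wrapper_fn=checkpoint_wrapper, check_fn=is_inner
)
model = FSDP(
    model,
    auto_wrap_policy=functools.partial(lambda_auto_wrap_policy, lambda_fn=is_inner),
    use_orig_params=True,
)
```

Using a different `check_fn` for AC than for the FSDP policy would be the divergent case the question asks about.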
    gm.__name__ = next_name
    src = NNModuleSource(GetItemSource(self.source, next_name))
    if self.source.guard_source().is_fsdp_module():
What does add_subgraph do generally? Can there be more than one subgraph per higher-order operator?
It is just a util that inserts the subgraph into the output graph module that Dynamo produces. And yes, there is no limit on how many subgraphs we can have: torch.cond, for example, has 2, one for the true branch and one for the false branch.
In the case of activation checkpointing, there is only 1.
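For illustration, a minimal sketch of a two-subgraph higher-order op, assuming the torch.cond frontend available in recent PyTorch; Dynamo installs one subgraph per branch:

```python
import torch


def true_fn(x):
    return x.sin()


def false_fn(x):
    return x.cos()


@torch.compile
def f(pred, x):
    # cond is a higher-order op with two subgraphs, one per branch.
    return torch.cond(pred, true_fn, false_fn, (x,))


x = torch.randn(4)
f(torch.tensor(True), x)   # exercises the true-branch subgraph
f(torch.tensor(False), x)  # exercises the false-branch subgraph
```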
I think it looks good. Would like to get @wanchaol to also review.
LGTM. The CI failure looks correlated: https://hud.pytorch.org/pr/pytorch/pytorch/103953#14451950178
    model = FSDP(
        copy.deepcopy(model),
        auto_wrap_policy=wrap_policy,
        use_orig_params=True
Shall we parametrize the tests and test both use_orig_params == True/False, similar to https://github.com/pytorch/pytorch/blob/main/test/distributed/fsdp/test_fsdp_checkpoint.py#L137?
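A sketch of what that parametrization could look like, assuming the parametrize helpers from torch.testing._internal.common_utils that other FSDP tests use (the class and test body here are hypothetical):

```python
from torch.testing._internal.common_utils import (
    TestCase,
    instantiate_parametrized_tests,
    parametrize,
    run_tests,
)


class TestFSDPActivationCheckpointing(TestCase):
    @parametrize("use_orig_params", [True, False])
    def test_fsdp_activation_checkpointing(self, use_orig_params: bool):
        # The real test would forward this flag to
        # FSDP(..., use_orig_params=use_orig_params)
        # instead of hard-coding True as in the snippet above.
        self.assertIsInstance(use_orig_params, bool)


instantiate_parametrized_tests(TestFSDPActivationCheckpointing)

if __name__ == "__main__":
    run_tests()
```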
        k: variables.ConstantVariable(v) for k, v in self.value.keywords.items()
    }
    partial_kwargs.update(kwargs)
    if requires_higher_order_op(self.value.func):
This seems like a good opportunity to use the walrus operator.
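For readers unfamiliar with the suggestion, a generic illustration of the operator (plain Python, not this PR's code):

```python
import re

s = "42abc"

# Without the walrus operator: compute, then test.
match = re.match(r"\d+", s)
if match:
    print(match.group())

# With the walrus operator (Python 3.8+): bind and test in one expression.
if (match := re.match(r"\d+", s)) is not None:
    print(match.group())
```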
@pytorchbot merge
Merge failed. Reason: This PR needs a `release notes:` label. If your changes are user facing and intended to be a part of release notes, please use a label starting with `release notes:`. If not, please add the `topic: not user facing` label. To add a label, you can comment to pytorchbot, for example `@pytorchbot label "topic: not user facing"`. For more information, see the PyTorch AutoLabel Bot wiki. (Details for Dev Infra team: raised by workflow job.)
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Do you mind testing this against #103711? I know it's an onerous ask, but I want to make sure we also keep in mind the relative future direction of PT2 + FSDP for things like this.
Oh, missed this message. Will certainly do so in the next couple of weeks. Trying to wrap up some work, and will pick this up.
Stack from ghstack (oldest at bottom):
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx @ipiszy @chenyang78