Refactor optional graph module into CompiledFxGraphConstants by jamesjwu · Pull Request #141897 · pytorch/pytorch · GitHub

Conversation

@jamesjwu
Contributor

@jamesjwu jamesjwu commented Dec 2, 2024

Stack from ghstack (oldest at bottom):

FXGraphCache supports freezing, but AOTAutogradCache does not. This is because, when freezing is turned on, instead of using the constants from the graph module that was saved on cache miss, we have to take the constants from the AOTAutograd-generated graph module. This PR does two things:

  • It bypasses AOTAutogradCache when freezing is turned on. We should have always been doing this.

  • It refactors the code to be much clearer about which constants we're using and when we're using them.

Basically, there are two possible sets of constants we can grab from the compiled fx graph.

  1. If freezing is turned off, we save the constants directly in CompiledFxGraph.
  2. If freezing is turned on, we save the names of the constants in CompiledFxGraph, and use the runtime GraphModule's actual constant values: we reconstruct them from the saved names + the new graph module from AOTDispatch.

We implement two different classes for doing just this: one that has access to the post-AOTDispatch gm, which supports freezing, and one that doesn't have it, which does not support freezing. Then we construct the wrappers and unwrap the result as needed.
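The two constant-resolution strategies described above can be sketched roughly as follows. This is a minimal illustration, not the actual PyTorch internals: the class names echo the PR title, but the `constants` and `constant_names` attributes and the `unwrap` method are assumptions for the sketch.

```python
class CompiledFxGraphConstants:
    """Resolve constants saved directly on the compiled fx graph.

    Used when freezing is off: the constant values themselves were
    serialized with the graph, so no runtime GraphModule is needed.
    """

    def unwrap(self, compiled_graph):
        # `constants` is assumed to be a name -> value mapping saved
        # with the compiled graph on cache miss.
        return compiled_graph.constants


class CompiledFxGraphConstantsWithGm(CompiledFxGraphConstants):
    """Resolve constants by name against the post-AOTDispatch GraphModule.

    Used when freezing is on: only the constant *names* were saved, and
    the live values must be read off the freshly produced runtime gm.
    """

    def __init__(self, gm):
        self.gm = gm

    def unwrap(self, compiled_graph):
        # `constant_names` is assumed to be the list of saved names;
        # the actual values come from the runtime graph module.
        return {name: getattr(self.gm, name) for name in compiled_graph.constant_names}
```

The key point the types encode: the frozen path cannot produce values at all without a runtime gm, which is why freezing must bypass a cache that only has the saved graph.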

This makes it clear that the gm passed to AOTAutogradCache is not part of post compile, only the cache key generated from it is.

The whole flow is pretty confusing, but hopefully this gives us better types and static information for understanding what the different codepaths are doing.

Will add a specific AOTAutogradCache test to confirm we bypass freezing.

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang @aakhundov

[ghstack-poisoned]
@pytorch-bot

pytorch-bot bot commented Dec 2, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/141897

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (4 Unrelated Failures)

As of commit a6f565d with merge base 920e436:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

BROKEN TRUNK - The following job failed but was present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

UNSTABLE - The following jobs failed but were likely due to flakiness present on trunk and have been marked as unstable:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@jamesjwu jamesjwu marked this pull request as ready for review December 2, 2024 19:18
@jamesjwu jamesjwu added the topic: not user facing topic category label Dec 2, 2024
gm = make_boxed_func(gm)
return gm, {}

def post_compile(self, gm, inputs, cudagraphs):

This is no longer called after ed's previous refactor

compiled_graph: CompiledFxGraph,
cudagraphs: BoxedBool,
gm: Optional[torch.fx.GraphModule],
constants: Dict[str, torch.Tensor],
this is good



class CompiledFxGraphConstants:
"""Wrapper class that gets constants from a compiled fx graph"""
It would be nice if this parent class explained the subclass inheritance situation and when this one versus the other got used



@dataclasses.dataclass
class MockFXGraphCacheOutput(OutputCode):
So what exactly is this

oh this is a one off mock ig

I think I'd prefer for this to live in _inductor/output_code.py, as the interface for OutputCode is not settled and likely will change some more as we keep working on it.

config.patch(get_cpp_wrapper_config())
if config.cpp_wrapper
else contextlib.nullcontext()
):
all this reformatting very annoying lol

@ezyang ezyang left a comment

None of the comments are blocking, feel free to do them separately

raise BypassAOTAutogradCache(
"Cannot cache a graph with compiled autograd enabled"
)
if torch._inductor.config.freezing:
In the original freezing diff, I checked to see if the gm actually had any frozen params created. Maybe that's a little better? I believe that when the config option is set, freezing is applied unconditionally currently, but maybe there's a future where it's not?

def has_frozen_params(gm: torch.fx.GraphModule) -> bool:

Thing is, we can't really find that out unless we run AOTAutograd. So at this stage, the best we can do is look at the config.
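The bypass discussed in this exchange amounts to a config check before caching. A simplified sketch (the exception name comes from the diff above; the function name and flag parameter are illustrative, and the real check reads `torch._inductor.config.freezing`):

```python
class BypassAOTAutogradCache(Exception):
    """Raised to skip the cache and fall back to a normal compile."""


def check_cacheable(freezing_enabled: bool) -> None:
    # At this stage AOTAutograd has not run yet, so we cannot inspect
    # the gm for actually-frozen params; the config flag is the best
    # available signal, even if it is conservative.
    if freezing_enabled:
        raise BypassAOTAutogradCache(
            "Cannot cache a graph with freezing enabled"
        )
```

Raising rather than silently skipping keeps the bypass reason visible in cache-miss logging.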

)

# TODO: How come cudagraphs could be None here?
# TODO: How come gm is None here?
Can we remove these TODOs now?

jamesjwu added a commit that referenced this pull request Dec 2, 2024
@jamesjwu jamesjwu added the ciflow/trunk Trigger trunk jobs on your pull request label Dec 3, 2024
@eellison eellison removed their request for review December 3, 2024 23:06
jamesjwu added a commit that referenced this pull request Dec 4, 2024
jamesjwu added a commit that referenced this pull request Dec 4, 2024
@jamesjwu
Contributor Author

jamesjwu commented Dec 5, 2024

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status.

pobin6 pushed a commit to pobin6/pytorch that referenced this pull request Dec 5, 2024
…#141897)

(commit message duplicates the PR description above)

Pull Request resolved: pytorch#141897
Approved by: https://github.com/ezyang, https://github.com/masnesral
AmdSampsa pushed a commit to AmdSampsa/pytorch that referenced this pull request Dec 9, 2024
…#141897)

(commit message duplicates the PR description above)

Pull Request resolved: pytorch#141897
Approved by: https://github.com/ezyang, https://github.com/masnesral
Esquains pushed a commit to Esquains/study1 that referenced this pull request Dec 15, 2024
@leslie-fang-intel
Collaborator

leslie-fang-intel commented Dec 19, 2024

Hi @jamesjwu @masnesral, thanks for your PR. We recently hit issue #143144, which seems related to this PR (or to some related changes before it).

If freezing is turned on, we save the names of the constants in CompiledFxGraph, and use the runtime GraphModule's actual constant values: we reconstruct them from the saved names + the new graph module from AOTDispatch.

This does not seem to be a correct assumption. As an example, in the Inductor lowering phase we may re-layout some constants, since a different kernel might be chosen by max-autotune, as in:

W_packed_constant = V.graph.add_tensor_constant(W_packed)

In this case, we add a new constant to the CompiledFxGraph, but it may not be in the GraphModule (we also delete the original constant, which is no longer used, to save memory). Looking forward to your suggestions for how to resolve this issue. cc @frost-intel @jgong5
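The failure mode described in this comment can be reproduced in miniature: if lowering registers a new constant (e.g. a re-packed weight) whose name was saved but which never existed as an attribute on the runtime gm, the name-based lookup has nothing to resolve. The objects and names here are stand-ins, not the real Inductor structures:

```python
from types import SimpleNamespace

# The compiled graph saved the name of a constant created during
# lowering ("w_packed"), but the runtime GraphModule only ever had
# the original constant "w" (and the packed one was never set on it).
compiled_graph = SimpleNamespace(constant_names=["w_packed"])
runtime_gm = SimpleNamespace(w=[1.0, 2.0])


def resolve(gm, graph):
    # Name-based resolution as in the freezing path: look each saved
    # name up on the runtime graph module.
    return {name: getattr(gm, name) for name in graph.constant_names}


try:
    resolve(runtime_gm, compiled_graph)
except AttributeError as exc:
    print("lookup failed:", exc)
```

This is why constants minted inside lowering break the "names saved, values from the runtime gm" scheme: the value's only home is the compiled graph itself.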

@github-actions github-actions bot deleted the gh/jamesjwu/83/head branch January 19, 2025 02:07