preserve module signature with multiple calls by avikchaudhuri · Pull Request #137999 · pytorch/pytorch · GitHub

Conversation

@avikchaudhuri
Contributor

@avikchaudhuri avikchaudhuri commented Oct 15, 2024

Previously we would error when trying to preserve the call signature of a module that was called multiple times. With this PR, this now works without erroring. The fix is to propagate call indices in a few more places.

Note that while this works in the presence of params, buffers, and tensor constants, preserving call signatures for multiple calls to a module when buffers are mutated is not supported yet. This is future work. The main problem is that we do not have enough metadata to copy_ mutated buffers at the end of each call to a module so that the next call can read those buffers at the beginning. Making this work will likely need some explicit tracking of intermediate values of mutated buffers when collecting metadata during functionalization in export.

Note also that we stop short of creating a single graph out of multiple graphs: that is still future work. So the unflattened module will still have different targets n, n@1, n@2, etc. for each call when we ask for the module call signature of n to be preserved. However, it is much easier to swap all of these targets with a replacement that behaves similarly to the original, because all of these calls will respect the original module call signature. (In particular, any constant inputs will be carried by the calls.)
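For illustration, here is a minimal sketch of the scenario this enables (not the PR's actual test; the toy modules `M`, `N`, and the shapes are made up, while `torch.export.export(..., preserve_module_call_signature=...)` and `torch.export.unflatten` are the entry points exercised by the tests in this PR):

```python
import torch

class N(torch.nn.Module):
    def forward(self, x):
        return x + 1

class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.n = N()

    def forward(self, x):
        # The same submodule is called twice. Previously, asking to preserve
        # the call signature of "n" would error in unflatten; now it works.
        return self.n(self.n(x))

inp = (torch.randn(3),)
ep = torch.export.export(M(), inp, preserve_module_call_signature=("n",))
unflat = torch.export.unflatten(ep)

# The unflattened module keeps separate targets "n" and "n@1" for the two calls,
# but each call respects N's original signature, so swapping both targets with a
# replacement module (as in the swap={"n": ..., "n@1": ...} test quoted in the
# review comments below) is straightforward.
torch.testing.assert_close(unflat(*inp), M()(*inp))
```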

Differential Revision: D64406945

@pytorch-bot

pytorch-bot bot commented Oct 15, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/137999

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (2 Unrelated Failures)

As of commit 2755f65 with merge base 7a117f3:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

BROKEN TRUNK - The following job failed but was present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D64406945

@avikchaudhuri avikchaudhuri changed the title preserve module signature multiple calls preserve module signature with multiple calls Oct 15, 2024
@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D64406945

1 similar comment
@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D64406945

avikchaudhuri added a commit to avikchaudhuri/pytorch that referenced this pull request Oct 15, 2024
Summary: Pull Request resolved: pytorch#137999

Test Plan: fixed tests

Differential Revision: D64406945
self.parent_call_module: Optional[torch.fx.Node] = None
if parent is not None:
    if self.fqn in module_call_graph and num_calls == 1:
        raise ValueError(
Contributor Author

@avikchaudhuri avikchaudhuri Oct 15, 2024


main thing that changed

if not is_retracebility_test(self._testMethodName):
    test(
        export(M(), inp, preserve_module_call_signature=("n",)),
        swap={"n": N(), "n@1": N()},
Contributor Author


easy swap

"Cannot unflatten multiple calls to module n while preserving its signature",
):
torch.export.unflatten(ep)
def test(ep, swap):
Contributor


Can you move this test to test_unflatten.py? I want to make sure it works with training IR as well. The test here exercises training IR + run_decompositions, so it is still operating on a functional IR.

Contributor Author


I don't like test_unflatten.py because, in general, more variations are tested here. But good point that I should test this with state as well; I'll hold off landing until I add those tests.

For training IR only, I could add a separate explicit test?

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Oct 16, 2024
@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D64406945

1 similar comment
@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D64406945

avikchaudhuri added a commit to avikchaudhuri/pytorch that referenced this pull request Oct 17, 2024
Summary: Pull Request resolved: pytorch#137999

Test Plan: fixed tests

Reviewed By: tugsbayasgalan

Differential Revision: D64406945
@avikchaudhuri avikchaudhuri force-pushed the export-D64406945 branch 2 times, most recently from cf754c2 to 9adf5c0, on October 17, 2024 at 19:49
@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D64406945

Summary: Pull Request resolved: pytorch#137999

Test Plan: fixed tests

Reviewed By: tugsbayasgalan

Differential Revision: D64406945
@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D64406945

@facebook-github-bot
Contributor

@pytorchbot merge

(Initiating merge automatically since Phabricator Diff has merged)

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here.

@avikchaudhuri
Contributor Author

@pytorchbot merge -f "Landed internally"

@pytorchmergebot
Collaborator

The merge job was canceled or timed out. This most often happens if two merge requests were issued for the same PR, or if the merge job was waiting for more than 6 hours for tests to finish. In the latter case, please do not hesitate to reissue the merge command.
For more information see pytorch-bot wiki.

@pytorchmergebot
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here.

pytorchmergebot pushed a commit that referenced this pull request Oct 25, 2024
As called out in #137999, preserving signatures of multiple calls when buffer mutations are present was not yet implemented. The main problem was that intermediate values of buffers were not tracked, so they could not be propagated statefully between multiple calls (i.e., they would need to be explicitly passed around, defeating the unlifting needed for preserving signatures).

This PR fixes this situation by introducing module attributes that carry the necessary intermediate values of buffer mutations. In general, a buffer mutation can depend recursively on several intermediate values, even other buffers. So rather than tying an intermediate value to a particular buffer, we tie it to the submodules that create and read it. We install an attribute on all modules that create or read a particular intermediate value, sharing the same initial storage (i.e., initialized with the same empty tensor). The module that creates the intermediate value copies it into the corresponding attribute; the modules that read it read the corresponding attribute instead.

Another complication that needed to be addressed was that a `run_decompositions` following an `export_for_training` was not preserving module call graphs, which is needed for unflattening and, in particular, used when remapping inputs. Fortunately some existing metadata already tracks provenance of nodes, which we could use to update a module call graph after functionalization / decomposition.

Differential Revision: D64806175

Pull Request resolved: #138669
Approved by: https://github.com/tugsbayasgalan
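For illustration, a hedged sketch of the buffer-mutation case that this follow-up enables (the toy modules `Counter` and `M` are made up; assuming the behavior described in the commit message above, the second call to the submodule observes the buffer value mutated by the first call after unflattening):

```python
import torch

class Counter(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.register_buffer("count", torch.zeros(1))

    def forward(self, x):
        # Buffer mutation whose intermediate value must flow from the first
        # call of this submodule to the second.
        self.count.add_(1)
        return x + self.count

class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.c = Counter()

    def forward(self, x):
        # Two calls to the same buffer-mutating submodule.
        return self.c(self.c(x))

inp = (torch.randn(1),)
ep = torch.export.export(M(), inp, preserve_module_call_signature=("c",))
unflat = torch.export.unflatten(ep)
# Per the commit message above, unflatten installs shared module attributes that
# carry the intermediate buffer value between the "c" and "c@1" calls.
```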
