preserve signatures with multiple calls + buffer mutations by avikchaudhuri · Pull Request #138669 · pytorch/pytorch · GitHub

Conversation

@avikchaudhuri
Contributor

@avikchaudhuri commented Oct 23, 2024

As called out in #137999, preserving signatures of multiple calls when buffer mutations are present was not yet implemented. The main problem was that intermediate values of buffers were not tracked, so they could not be propagated statefully between multiple calls (i.e., they would have needed to be explicitly passed around, defeating the unlifting needed for preserving signatures).

This PR fixes this situation by introducing module attributes that carry the necessary intermediate values of buffer mutations. In general, a buffer mutation can depend recursively on several intermediate values, even other buffers. So rather than tying an intermediate value to a particular buffer, we tie it to the submodules that create and read it. We install an attribute on all modules that create or read a particular intermediate value, sharing the same initial storage (i.e., initialized with the same empty tensor). The module that creates this intermediate value copies the value into the corresponding attribute; the modules that read it read the corresponding attribute instead.
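For illustration, here is a minimal sketch of the sharing scheme described above (module and attribute names are hypothetical, not the actual implementation):

```python
import torch
import torch.nn as nn

# Hypothetical setup: one submodule creates an intermediate value of a buffer
# mutation, another submodule reads it. Both register an attribute backed by
# the same initial (empty) tensor, i.e. they share storage.
shared_init = torch.empty(3)

producer, consumer = nn.Module(), nn.Module()
for m in (producer, consumer):
    m.register_buffer("_intermediate_buf_update", shared_init)

def producer_step(buf, x):
    updated = buf + x
    # the module that creates the intermediate value copies it into its attribute
    producer._intermediate_buf_update.copy_(updated)
    return updated

def consumer_step(y):
    # modules that read the intermediate value read their attribute instead
    return consumer._intermediate_buf_update + y
```

Because the attributes alias the same storage, the value copied in by the producer is exactly what the readers observe, without the intermediate value being passed around explicitly.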

Another complication that needed to be addressed was that a run_decompositions following an export_for_training was not preserving module call graphs, which is needed for unflattening and, in particular, used when remapping inputs. Fortunately some existing metadata already tracks provenance of nodes, which we could use to update a module call graph after functionalization / decomposition.
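For context, a sketch of the end-to-end flow this fix targets; the toy module and the `preserve_module_call_signature` argument are assumptions for illustration, mirroring `torch.export.export`:

```python
import torch

class Sub(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.register_buffer("buf", torch.zeros(2))

    def forward(self, x):
        self.buf.add_(1)            # buffer mutation
        return x + self.buf

class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.sub = Sub()

    def forward(self, x):
        return self.sub(x) + self.sub(x)   # multiple calls to the same submodule

ep = torch.export.export_for_training(
    M(),
    (torch.randn(2),),
    preserve_module_call_signature=("sub",),  # assumed kwarg for this sketch
)
ep = ep.run_decompositions()           # should now keep the module call graph
unflat = torch.export.unflatten(ep)    # unflattening relies on that call graph
```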

Differential Revision: D64806175

@pytorch-bot

pytorch-bot bot commented Oct 23, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/138669

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit fe87858 with merge base fe458ee:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D64806175


@avikchaudhuri force-pushed the export-D64806175 branch 2 times, most recently from 851d564 to 0ac21b6 on October 23, 2024 at 19:37
@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D64806175


Contributor

Can you also test unflatten on training_ir directly?

Contributor

What does "from_node" mean actually? Shouldn't we also need to rewrite the node.meta after run_decomp to reflect the change in "from_node"?

Contributor Author

"from_node" keeps a history of how a node was generated from tracing other nodes.

Contributor Author

So after the first decomp, it will contain the original node from export; after the second decomp, it will contain the original node followed by the node in the first decomp; etc.
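For illustration, a minimal sketch of inspecting that provenance chain; the exact entry format of `node.meta["from_node"]` has varied across PyTorch versions, so treat the details as an assumption:

```python
# 'ep' is assumed to be an ExportedProgram produced by torch.export.
for node in ep.graph_module.graph.nodes:
    history = node.meta.get("from_node", [])
    if history:
        # each entry records an earlier node this one was generated from;
        # repeated retracing/decomposition appends to the list in order
        print(node.name, "<-", history)
```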

Contributor

Hmm, this won't work for training IR because we don't have this information...

Contributor

@tugsbayasgalan Oct 23, 2024

And fixing this is probably complicated; what do you think about temporarily decomposing the training IR to figure out which buffers are mutated, as a short-term solution?

Contributor Author

Note that this is only needed when buffer updates have been functionalized away. If the code contains direct buffer updates, then none of this is required. So it should be fine.
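To make the distinction concrete, a toy example (names are illustrative): with a direct buffer update in the module code, the mutation is visible as-is in the graph; only after functionalization does it become an out-of-place op whose intermediate result needs to be tracked as described in this PR.

```python
import torch

class Counter(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.register_buffer("count", torch.zeros(()))

    def forward(self, x):
        self.count.add_(1)          # direct, in-place buffer update
        return x * self.count

# After functionalization, the mutation is (roughly) rewritten as
#   new_count = aten.add.Tensor(count, 1)   # out-of-place
#   ... uses of new_count ...
#   copy_(count, new_count)                 # write-back epilogue
# and it is new_count, the intermediate value, that must be propagated
# between multiple calls when signatures are preserved.
```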

Contributor Author

@avikchaudhuri Oct 23, 2024

Hmm, I was wrong. It turns out that the output node, rather than the input node, of a mutation is what is threaded through the rest of the program, so not every buffer mutation will take the buffer's placeholder node as input. E.g., `add_ = buf.add_(1); add__1 = add_.add_(2)`.

I think temporarily decomposing the training IR is overkill. How can I detect mutating ops?
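For reference, one way to spot mutating ATen ops in a graph is to consult the op schema; a minimal sketch, not necessarily what this PR ended up doing:

```python
import torch

def is_mutating(node: torch.fx.Node) -> bool:
    """A call_function node mutates an input if its schema marks the op as
    mutable, i.e. some argument is written to."""
    if node.op != "call_function" or not isinstance(node.target, torch._ops.OpOverload):
        return False
    schema = node.target._schema
    return schema.is_mutable or any(
        a.alias_info is not None and a.alias_info.is_write
        for a in schema.arguments
    )
```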

@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D64806175


@avikchaudhuri force-pushed the export-D64806175 branch 2 times, most recently from f071676 to 94caa3b on October 24, 2024 at 16:25
@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D64806175

preserve signatures with multiple calls + buffer mutations (pytorch#138669)

Summary: Pull Request resolved: pytorch#138669

Test Plan: modified test

Differential Revision: D64806175
@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D64806175

@pytorch-bot bot added the ciflow/trunk label (Trigger trunk jobs on your pull request) on Oct 24, 2024
@facebook-github-bot
Contributor

@pytorchbot merge

(Initiating merge automatically since Phabricator Diff has merged)

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

@avikchaudhuri
Contributor Author

@pytorchbot merge -f "Landed internally"

@pytorchmergebot
Collaborator

The merge job was canceled or timed out. This most often happens if two merge requests were issued for the same PR, or if the merge job was waiting for more than 6 hours for tests to finish. In the latter case, please do not hesitate to reissue the merge command.
For more information see pytorch-bot wiki.

@pytorchmergebot
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as a last resort and instead consider -i/--ignore-current to continue the merge while ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.
