[aotd] Support mutations of the same input in fw and bw #155354

IvanKobzarev · 2025-06-06T18:36:13Z

Stack from ghstack (oldest at bottom):

-> [aotd] Support mutations of the same input in fw and bw #155354

Original issue: #154820

The issue happens when there is a mutation for the same input in forward AND in backward.

AOTD emitted copy_ after tracing joint_function, so a single fx node represented the side effects of both mutations (in forward and in backward).
The partitioner could then place that node in either forward or backward.
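
A toy model of the problem, using plain Python integers in place of tensors. This is an illustrative assumption, not AOTD code, and the function names are hypothetical. The forward doubles the input and the backward adds one. With a single copy_ that carries only the final value, neither placement reproduces the input state that eager mode shows right after forward:

```python
def eager_states(x):
    """Reference behaviour: input value after forward and after backward."""
    x = x * 2            # forward mutates the input (think x.mul_(2))
    after_fw = x
    x = x + 1            # backward mutates the same input (think x.add_(1))
    after_bw = x
    return after_fw, after_bw


def single_copy_states(x, copy_placed_in):
    """One copy_ node carrying only the final value of the joint trace."""
    final = x * 2 + 1    # combined effect of both mutations
    if copy_placed_in == "forward":
        # The backward mutation becomes visible too early.
        return final, final
    # copy_ placed in backward: the forward mutation becomes visible too late.
    return x, final


print(eager_states(3))                    # (6, 7)
print(single_copy_states(3, "forward"))   # (7, 7)
print(single_copy_states(3, "backward"))  # (3, 7)
```

Both placements agree with eager after backward, but not after forward.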

The fix:

1/ Introduce joint_function.handle, which allows setting a "post_forward" callback so that the state of the inputs can be checked after forward.

We do not want to apply a mutation after joint if it was already applied in forward. For that we need a "mutation_counter", and we memorize the counter value of the mutation that was applied in forward.

2/ Expose mutation_counter to Python.

We want to keep the invariant that copy_ exists only at the end of the joint graph.

3/ Memorize mutation_counter and the state of the inputs after forward, using the post_forward handle.
Emit the post_forward mutations after the joint graph is fully traced.

Add a "must_be_in_forward" tag to the post_forward mutations (similar to the existing "must_be_in_backward") to keep them in forward.

4/ Ban recompute of the source of the mutation. Recompute can apply the same op (e.g. add) in both forward and backward.
To prevent this, set MUST_SAVE for the source of the mutation in forward.
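
The bookkeeping in steps 1/ to 4/ can be sketched with a toy model. Everything here is hypothetical and only mirrors the description above: `ToyInput`, `ToyHandle`, `toy_joint` and `trace_toy_joint` are made-up names, integers stand in for tensors, and dicts stand in for fx nodes. Tagging the trailing copy_ as "must_be_in_backward" is also an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ToyInput:
    """Stand-in for a traced input; the counter is bumped on every mutation."""
    value: int
    mutation_counter: int = 0

    def mutate(self, new_value: int) -> None:
        self.value = new_value
        self.mutation_counter += 1


@dataclass
class ToyHandle:
    """Hypothetical handle that carries an optional post_forward callback."""
    post_forward: Optional[Callable[[ToyInput], None]] = None


def toy_joint(x: ToyInput, handle: ToyHandle, mutate_in_backward: bool = True) -> None:
    x.mutate(x.value * 2)                 # mutation in forward
    if handle.post_forward is not None:
        handle.post_forward(x)            # inspect input state right after forward
    if mutate_in_backward:
        x.mutate(x.value + 1)             # mutation in backward


def trace_toy_joint(initial: int, mutate_in_backward: bool = True) -> list:
    x = ToyInput(initial)
    snapshot = {}

    def remember(inp: ToyInput) -> None:
        # Step 3: memorize counter and value as they are after forward.
        snapshot["counter"] = inp.mutation_counter
        snapshot["value"] = inp.value

    toy_joint(x, ToyHandle(post_forward=remember), mutate_in_backward)

    # All copy_ nodes are emitted only after the joint is fully traced.
    copies = []
    if snapshot["counter"] > 0:
        copies.append({
            "op": "copy_",
            "src": snapshot["value"],
            "partitioner_tag": "must_be_in_forward",
            "src_policy": "MUST_SAVE",    # step 4: do not recompute the source
        })
    if x.mutation_counter > snapshot["counter"]:
        # Only emitted if the input changed again after forward.
        copies.append({
            "op": "copy_",
            "src": x.value,
            "partitioner_tag": "must_be_in_backward",
        })
    return copies


for node in trace_toy_joint(3):
    print(node)
```

When the backward does not mutate the input (`trace_toy_joint(3, mutate_in_backward=False)`), the counter does not advance past the forward snapshot and only the forward copy_ is emitted.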

proxy_tensor changes:

By default, proxy tensor updates tensor_tracker, so applied mutations are chained.
But we want this copy_ to be independent and applied directly to primals.
For this, introduce a context manager that disables the tensor_tracker update while adding forward mutations.
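
A minimal sketch of the idea, assuming a toy tracer with a dict-based tracker. `ToyTracer` and `no_tracker_update` are hypothetical names for illustration and are not the proxy_tensor API:

```python
from contextlib import contextmanager


class ToyTracer:
    def __init__(self):
        self.tensor_tracker = {}    # tensor name -> name of its latest proxy node
        self.update_tracker = True
        self.graph = []             # recorded (node, destination, source) triples

    @contextmanager
    def no_tracker_update(self):
        """Temporarily stop copy_ from re-pointing the tracker."""
        previous = self.update_tracker
        self.update_tracker = False
        try:
            yield
        finally:
            self.update_tracker = previous

    def copy_(self, dst, src):
        # The destination resolves to the latest proxy known for this tensor.
        dst_proxy = self.tensor_tracker.get(dst, dst)
        node = f"copy_{len(self.graph)}"
        self.graph.append((node, dst_proxy, src))
        if self.update_tracker:
            self.tensor_tracker[dst] = node
        return node


chained = ToyTracer()
chained.copy_("primals_1", "a")
chained.copy_("primals_1", "b")
print(chained.graph)       # second copy_ targets copy_0, i.e. it is chained

independent = ToyTracer()
with independent.no_tracker_update():
    independent.copy_("primals_1", "a")
    independent.copy_("primals_1", "b")
print(independent.graph)   # both copy_ nodes target primals_1 directly
```

Restoring the previous flag in `finally` keeps the default chaining behaviour for everything traced outside the block.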

cc @ezyang @SherlockNoMad @EikanWang @jgong5 @wenzhe-nrv @voznesenskym @penguinwu @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @jiayisunx @chenyang78 @kadeng @chauhang @amjames

[ghstack-poisoned]

pytorch-bot · 2025-06-06T18:36:17Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/155354

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit the bot commands wiki or our office hours

Note: Links to docs will display an error until the docs builds have been completed.

⏳ No Failures, 1 Pending

As of commit a60f113 with merge base bbf1a6f:
💚 Looks good so far! There are no failures yet. 💚

UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:

⏳ inductor-rocm / rocm-py3.10-inductor / test (inductor, 1, 2, linux.rocm.gpu.2, unstable) (gh) (#155917)

This comment was automatically generated by Dr. CI and updates every 15 minutes.

malfet · 2025-06-24T04:40:31Z

@pytorchbot revert -m "Not sure why CI was green, but it breaks tons of tests, see https://hud.pytorch.org/hud/pytorch/pytorch/930b575389f9233efddf70ea7b7804ed06af80d5/1?per_page=50&mergeEphemeralLF=true" -c nosignal

pytorchmergebot · 2025-06-24T04:42:03Z

@pytorchbot successfully started a revert job. Check the current status here.
Questions? Feedback? Please reach out to the PyTorch DevX Team

This reverts commit 3f920f3. Reverted #155354 on behalf of https://github.com/malfet due to: Not sure why CI was green, but it breaks tons of tests, see https://hud.pytorch.org/hud/pytorch/pytorch/930b575389f9233efddf70ea7b7804ed06af80d5/1?per_page=50&mergeEphemeralLF=true ([comment](#155354 (comment)))

pytorchmergebot · 2025-06-24T04:42:18Z

@IvanKobzarev your PR has been successfully reverted.

IvanKobzarev · 2025-06-25T19:18:13Z

@pytorchbot merge

pytorchmergebot · 2025-06-25T19:20:00Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging

Check the merge workflow status here

pytorchmergebot · 2025-06-25T19:25:37Z

Merge failed

Reason: Command git -C /home/runner/work/pytorch/pytorch cherry-pick -x 8a39bd269e67e74d9d8b3c6ce73c7b41442f0747 returned non-zero exit code 1

Auto-merging benchmarks/dynamo/pr_time_benchmarks/expected_results.csv
CONFLICT (content): Merge conflict in benchmarks/dynamo/pr_time_benchmarks/expected_results.csv
Auto-merging torch/_functorch/_aot_autograd/dispatch_and_compile_graph.py
Auto-merging torch/_functorch/_aot_autograd/traced_function_transforms.py
CONFLICT (content): Merge conflict in torch/_functorch/_aot_autograd/traced_function_transforms.py
Auto-merging torch/_functorch/partitioners.py
error: could not apply 8a39bd269e6... [aotd] Support mutations of the same input in fw and bw
hint: After resolving the conflicts, mark them with
hint: "git add/rm <pathspec>", then run
hint: "git cherry-pick --continue".
hint: You can instead skip this commit with "git cherry-pick --skip".
hint: To abort and get back to the state before "git cherry-pick",
hint: run "git cherry-pick --abort".
hint: Disable this message with "git config set advice.mergeConflict false"

Details for Dev Infra team

Raised by workflow job

IvanKobzarev · 2025-06-26T13:58:29Z

@pytorchbot merge

pytorchmergebot · 2025-06-26T14:00:13Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging

Check the merge workflow status here

This PR adds a new config `backward_pass_autocast`, to set the backward autocast behavior. It does not change the existing behavior. The reason why we need this is that torch.compile acquires a forward and backward graph at the time of the forward pass. This means that implemented naively, if there are any context managers active outside the call to torch.compile, the backward graph will also get the behaviors from those context managers. This PR gives users a way to tweak the autocast behavior of the backward pass. Please see torch._functorch.config for the options to the `backward_pass_autocast` config. Pull Request resolved: #156356 Approved by: https://github.com/bdhirsh ghstack dependencies: #155354

[aotd] Support mutations of the same input in fw and bw

997fec9

[ghstack-poisoned]

IvanKobzarev requested review from Chillee, albanD, bdhirsh, ezyang and soulitzer as code owners June 6, 2025 18:36

IvanKobzarev mentioned this pull request Jun 6, 2025

[aotd] Support mutations in reordering_to_mimic_autograd_engine #155353

Closed

pytorch-bot bot added ciflow/inductor release notes: fx release notes category labels Jun 6, 2025

facebook-github-bot added the fx label Jun 6, 2025

IvanKobzarev requested a review from zou3519 June 6, 2025 18:58

IvanKobzarev added the topic: not user facing topic category label Jun 6, 2025

bdhirsh reviewed Jun 6, 2025

torch/fx/experimental/proxy_tensor.py

IvanKobzarev added a commit that referenced this pull request Jun 6, 2025

[aotd] Support mutations of the same input in fw and bw

6786980

ghstack-source-id: 95a4053 Pull Request resolved: #155354

IvanKobzarev added a commit that referenced this pull request Jun 9, 2025

[aotd] Support mutations of the same input in fw and bw

576c0cc

ghstack-source-id: 9fb4349 Pull Request resolved: #155354

albanD removed their request for review June 9, 2025 15:18

IvanKobzarev added a commit that referenced this pull request Jun 10, 2025

[aotd] Support mutations of the same input in fw and bw

ff6d1c4

ghstack-source-id: 73ebf15 Pull Request resolved: #155354

This was referenced Jun 11, 2025

In-place operations are reordered across the forward-backward in autograd function #154820

Closed

[2/2] proxy_tensor do not clobber for mutating ops #155716

Closed

pytorchmergebot added Reverted ci-no-td Do not run TD on this PR labels Jun 24, 2025

pytorchmergebot reopened this Jun 24, 2025

IvanKobzarev added a commit that referenced this pull request Jun 24, 2025

[aotd] Support mutations of the same input in fw and bw

8a39bd2

ghstack-source-id: 4db1990 Pull Request resolved: #155354

pytorchmergebot added the merging label Jun 25, 2025

pytorchmergebot removed the merging label Jun 25, 2025

IvanKobzarev added a commit that referenced this pull request Jun 25, 2025

[aotd] Support mutations of the same input in fw and bw

ae7c587

ghstack-source-id: 485f601 Pull Request resolved: #155354

IvanKobzarev added a commit that referenced this pull request Jun 26, 2025

[aotd] Support mutations of the same input in fw and bw

03f4411

ghstack-source-id: 9d633c6 Pull Request resolved: #155354

pytorchmergebot added the merging label Jun 26, 2025

pytorchmergebot closed this in 2f94f69 Jun 26, 2025

pytorchmergebot removed the merging label Jun 26, 2025

zou3519 mentioned this pull request Jun 26, 2025

Add AOTDispatcher config to set backward autocast behavior #156356

Closed

6 participants
