Add AOTDispatcher config to set backward autocast behavior by zou3519 · Pull Request #156356 · pytorch/pytorch

Conversation

@zou3519
Contributor

@zou3519 zou3519 commented Jun 18, 2025

Stack from ghstack (oldest at bottom):

This PR adds a new config, `backward_pass_autocast`, to control the autocast
behavior of the backward pass. It does not change the existing behavior.

We need this because torch.compile captures both the forward and the backward
graph at the time of the forward pass. Implemented naively, this means the
backward graph also inherits the behavior of any context managers (such as
autocast) that are active around the call to torch.compile. This PR gives
users a way to tweak the autocast behavior of the backward pass.

Please see `torch._functorch.config` for the available options of the
`backward_pass_autocast` config.
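
A minimal, hypothetical sketch of how this might look in user code; the value `"off"` and the exact usage below are assumptions for illustration, so check `torch._functorch.config` for the actual supported options:

```python
import torch
import torch._functorch.config

# Assumption for illustration: "off" means the captured backward graph does not
# re-enter autocast. The real option names are documented in torch._functorch.config.
torch._functorch.config.backward_pass_autocast = "off"

@torch.compile
def f(x, w):
    return (x @ w).relu().sum()

x = torch.randn(8, 8)
w = torch.randn(8, 8, requires_grad=True)

# The forward runs under autocast. Because torch.compile traces the forward and
# backward graphs together at forward time, the backward graph would naively
# inherit this autocast context; the config above controls that behavior.
with torch.autocast("cpu", dtype=torch.bfloat16):
    out = f(x, w)
out.backward()
```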

zou3519 added a commit that referenced this pull request Jun 18, 2025
ghstack-source-id: 41910f9
Pull Request resolved: #156356
@pytorch-bot

pytorch-bot bot commented Jun 18, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/156356

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

❌ 1 New Failure, 4 Unrelated Failures

As of commit 52064e1 with merge base b500753:

NEW FAILURE - The following job has failed:

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

zou3519 added a commit that referenced this pull request Jun 23, 2025
ghstack-source-id: 94d0561
Pull Request resolved: #156356
@zou3519 zou3519 changed the title from "[wip] Add AOTDispatcher config to set backward autocast behavior" to "Add AOTDispatcher config to set backward autocast behavior" Jun 24, 2025
zou3519 added a commit that referenced this pull request Jun 24, 2025
ghstack-source-id: 94252a0
Pull Request resolved: #156356
@zou3519 zou3519 requested a review from bdhirsh June 26, 2025 14:42
@zou3519 zou3519 marked this pull request as ready for review June 26, 2025 14:42
@zou3519 zou3519 requested review from Chillee and ezyang as code owners June 26, 2025 14:42
@zou3519 zou3519 removed request for Chillee and ezyang June 26, 2025 14:42
Contributor

@bdhirsh bdhirsh left a comment


sgtm, we can always add more to this later (the other big one in my head is that we generate a mega backward graph for the entire fw, so we lose the ability for the user to run several small, fine-grained autograd.grad() calls)
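
For concreteness, a small eager-mode sketch (not part of this PR) of the fine-grained autograd.grad pattern mentioned above, which a single captured mega backward graph cannot currently be split into:

```python
import torch

x = torch.randn(4, requires_grad=True)
y = (x * 2).sin()
z = y.sum()

# In eager mode a user can take gradients in several small, separate steps:
# first the grad of z w.r.t. the intermediate y ...
(gy,) = torch.autograd.grad(z, y, retain_graph=True)
# ... and then a second, independent grad call from y back to x.
(gx,) = torch.autograd.grad(y, x, grad_outputs=gy)
```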

zou3519 added a commit that referenced this pull request Jun 26, 2025
ghstack-source-id: f1c5b1d
Pull Request resolved: #156356
@zou3519 zou3519 added the ciflow/trunk Trigger trunk jobs on your pull request label Jun 26, 2025
@zou3519
Contributor Author

zou3519 commented Jun 27, 2025

@pytorchbot merge -f "unrelated failures"

@pytorchmergebot
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as a last resort and instead consider -i/--ignore-current to continue the merge while ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here

