[Flex] Fix silent correctness w/ backpropping grads #163677
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/163677
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit 2870fe4 with merge base 134dfbe.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
@pytorchbot cherry-pick --onto release/2.9 --c critical
Cherry picking #163677
The cherry-pick PR is at #164366, and it is recommended to link a critical cherry-pick PR with an issue. The following tracker issues are updated:
Details for Dev Infra team: raised by workflow job.
[Flex] Fix silent correctness w/ backpropping grads (#163677) (cherry picked from commit e2ce79e) Co-authored-by: drisspg <drisspguessous@gmail.com>
Stack from ghstack (oldest at bottom):
Fixes #162228
Summary
The majority of our tests compile flex-attention only in isolation. This means that, for fake tensor propagation, the input primals and all captured buffers don't go through any intermediate computation below autograd. As a result they happen, by chance, to match the `requires_grad`-ness of the eager implementation, and the check passes. However, if `score_mod` captures the result of some other intermediate fake tensor computation, that tensor is not guaranteed to have accurate `requires_grad`-ness, which is what was happening here.
TL;DR: this check was belt-and-suspenders that turned out to be actively harmful; we should just let joint-graph tracing construct the correct joint graph.
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben @Chillee @yanboliang @BoyuanFeng
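To illustrate the failure mode described in the summary, here is a minimal sketch (an assumed repro shape, not the regression test from this PR). `flex_attention` and its `score_mod` callback are the public API from `torch.nn.attention.flex_attention`; the function and tensor names (`attn`, `head_scale`, `bias`) are hypothetical. The key ingredient is that `bias` is produced by intermediate computation inside the compiled region, so during fake tensor propagation its `requires_grad`-ness comes from tracing that op rather than from a graph input, unlike the tests that compile flex-attention in isolation.

```python
# Hypothetical repro sketch, assuming a PyTorch build where flex_attention
# supports the chosen device; names below are illustrative, not from the PR.
import torch
from torch.nn.attention.flex_attention import flex_attention

B, H, S, D = 2, 4, 128, 64
device = "cuda" if torch.cuda.is_available() else "cpu"

q = torch.randn(B, H, S, D, device=device, requires_grad=True)
k = torch.randn(B, H, S, D, device=device, requires_grad=True)
v = torch.randn(B, H, S, D, device=device, requires_grad=True)
head_scale = torch.randn(H, device=device, requires_grad=True)

def attn(q, k, v, head_scale):
    # Intermediate computation below autograd: `bias` is not a graph input,
    # so its requires_grad flag is whatever fake tensor prop assigns it here.
    bias = head_scale * 2.0

    def score_mod(score, b, h, q_idx, kv_idx):
        # score_mod captures `bias`, an intermediate rather than a primal/buffer.
        return score + bias[h]

    return flex_attention(q, k, v, score_mod=score_mod)

out = torch.compile(attn)(q, k, v, head_scale)
# With the stale requires_grad check, gradients through the captured
# intermediate could silently come out wrong; the joint graph is now
# responsible for building the correct backward.
out.sum().backward()
print(head_scale.grad is not None)  # expect True: grads flow to head_scale
```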