[Flex] Fix silent correctness w/ backpropping grads by drisspg · Pull Request #163677 · pytorch/pytorch · GitHub

Conversation

@drisspg (Contributor) commented Sep 23, 2025

Stack from ghstack (oldest at bottom):

Fixes #162228

Summary

The majority of our tests compile flex-attention in isolation. This means that during fake tensor propagation, the input primals and all captured buffers do not go through any intermediate computation below autograd. As a result, they happen to match the `requires_grad`-ness of the eager implementation by chance, and this check passes. However, if `score_mod` captures the result of some other intermediate fake tensor computation, that tensor is not guaranteed to have an accurate `requires_grad`, which is what was happening here.

TL;DR: this was a belt-and-suspenders check that turned out to be harmful; we should just let joint-graph tracing construct the correct joint graph.
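Below is a minimal sketch of the failure mode, assuming a setup like the one in the linked issue; the model, shapes, and tensor names are illustrative, not taken from this PR. Because `scaled_bias` is an intermediate of the compiled graph rather than a graph input, fake tensor propagation has to infer its `requires_grad` correctly for the backward pass to produce gradients for `bias`.

```python
import torch
from torch.nn.attention.flex_attention import flex_attention

device = "cuda" if torch.cuda.is_available() else "cpu"

# Query/key/value: (batch, heads, seq_len, head_dim)
q, k, v = (
    torch.randn(1, 1, 128, 16, device=device, requires_grad=True)
    for _ in range(3)
)
bias = torch.randn(128, 128, device=device, requires_grad=True)

def model(q, k, v, bias):
    # Intermediate computation *before* flex_attention: the tensor captured
    # by score_mod is a graph intermediate, not a graph input, so its
    # requires_grad is only known after fake tensor prop runs through it.
    scaled_bias = bias * 2.0

    def score_mod(score, b, h, q_idx, kv_idx):
        return score + scaled_bias[q_idx, kv_idx]

    return flex_attention(q, k, v, score_mod=score_mod)

out = torch.compile(model)(q, k, v, bias)
out.sum().backward()
assert bias.grad is not None  # could be silently missing/wrong before this fix
```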

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben @Chillee @yanboliang @BoyuanFeng

[ghstack-poisoned]
@drisspg drisspg requested a review from zou3519 as a code owner September 23, 2025 20:12

pytorch-bot bot commented Sep 23, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/163677

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 2870fe4 with merge base 134dfbe:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

drisspg added a commit that referenced this pull request Sep 23, 2025
@drisspg (Contributor, Author) commented Sep 23, 2025

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Sep 23, 2025
@pytorchmergebot (Collaborator) commented:

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

@drisspg drisspg added this to the 2.9.0 milestone Sep 24, 2025
dsashidh pushed a commit to dsashidh/pytorch that referenced this pull request Sep 26, 2025
Pull Request resolved: pytorch#163677
Approved by: https://github.com/ydwu4
jainapurva pushed a commit that referenced this pull request Sep 29, 2025
Pull Request resolved: #163677
Approved by: https://github.com/ydwu4
@pytorch pytorch deleted a comment from pytorch-bot bot Oct 1, 2025

Camyll commented Oct 1, 2025

@pytorchbot cherry-pick --onto release/2.9 -c critical

pytorchbot pushed a commit that referenced this pull request Oct 1, 2025
Pull Request resolved: #163677
Approved by: https://github.com/ydwu4

(cherry picked from commit e2ce79e)
@pytorchbot (Collaborator) commented:

Cherry picking #163677

The cherry pick PR is at #164366 and it is recommended to link a critical cherry pick PR with an issue. The following tracker issues are updated:

Details for Dev Infra team: raised by workflow job.

Camyll pushed a commit that referenced this pull request Oct 1, 2025
[Flex] Fix silent correctness w/ backpropping grads (#163677)


Pull Request resolved: #163677
Approved by: https://github.com/ydwu4

(cherry picked from commit e2ce79e)

Co-authored-by: drisspg <drisspguessous@gmail.com>
