[FlexAttention][TF32] Handle uninitialized `torch.backends.cuda.matmul.fp32_precision` by eqy · Pull Request #161102 · pytorch/pytorch · GitHub

Conversation

@eqy (Collaborator) commented Aug 20, 2025

@eqy added labels on Aug 20, 2025: module: cuda (Related to torch.cuda, and CUDA support in general), open source, module: tf32 (Related to tf32 data format), topic: bug fixes (topic category), module: flex attention

pytorch-bot bot commented Aug 20, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/161102

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 4b6b15c with merge base 24e7f3c:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@eqy (Collaborator, Author) commented Aug 21, 2025

@pytorchmergebot merge

pytorch-bot bot added the ciflow/trunk (Trigger trunk jobs on your pull request) label on Aug 21, 2025
@pytorchmergebot (Collaborator) commented:

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

jeffdaily added a commit to ROCm/pytorch that referenced this pull request Aug 26, 2025
Flex attention precision was always forced to tf32 after pytorch#161102. The Python ternary needs to be surrounded with ().
@jeffdaily jeffdaily mentioned this pull request Aug 26, 2025
@jeffdaily (Collaborator) commented:

This change doesn't work. I'm pretty sure it's the root cause of #161022, and it's also why the ROCm CI on MI200 runners started timing out: it forces the default to tf32 for the flex attention tests, and tf32 isn't supported on MI200. test_flex_attention.py used to take less than 10 minutes, but this mistake makes it take 30+ minutes and the shard times out as a result. Issuing a revert. Please rework the logic.

You need to surround the new ternary expression with parentheses. A forward fix is submitted in #161465; a sketch of the pitfall follows.
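
For context, here is a minimal, hypothetical sketch of the precedence pitfall (illustrative names only, not the actual inductor code): Python's conditional expression binds more loosely than `and`, so dropping the parentheses regroups the logic and can bypass the guard entirely.

```python
# Hypothetical sketch of the pitfall -- names are illustrative, not the real kernel code.
is_fp32_input = False        # guard that should gate tf32 entirely
new_api_set = False          # new fp32_precision setting is uninitialized
new_says_tf32 = True         # what the new setting would request if it were set
legacy_allow_tf32 = True     # legacy fallback flag

# Intended grouping: the ternary picks the source of truth, then the guard applies.
correct = is_fp32_input and (new_says_tf32 if new_api_set else legacy_allow_tf32)

# Without parentheses the conditional expression absorbs the whole `and` as its true branch:
buggy = is_fp32_input and new_says_tf32 if new_api_set else legacy_allow_tf32
# parsed as: (is_fp32_input and new_says_tf32) if new_api_set else legacy_allow_tf32

print(correct, buggy)  # False True -- the buggy grouping enables tf32 despite the guard
```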

@eqy (Collaborator, Author) commented Aug 26, 2025

@jeffdaily thanks for the forward fix.
Also, the root cause isn't this fix attempt; it was #158979, as mentioned in #161022 itself.

pytorchmergebot pushed a commit that referenced this pull request Aug 26, 2025
PR #161102 caused tf32 to be the default precision for flex attention.  This PR forward-fixes the broken logic and restores ROCm MI200 CI flex attention test.

Pull Request resolved: #161465
Approved by: https://github.com/jeffdaily, https://github.com/eqy

Co-authored-by: Jeff Daily <jeff.daily@amd.com>
markc-614 pushed a commit to markc-614/pytorch that referenced this pull request Sep 17, 2025
…l.fp32_precision` (pytorch#161102)

For pytorch#161022
The warning says the old API will be deprecated in 2.9+ anyway, leaving it up to the author of pytorch#125888 to decide on initialization behavior then

Pull Request resolved: pytorch#161102
Approved by: https://github.com/ngimel, https://github.com/drisspg, https://github.com/BoyuanFeng
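
For readers landing here from the cherry-pick: below is a hedged sketch of the general pattern the PR title describes, i.e. preferring the new `torch.backends.cuda.matmul.fp32_precision` setting when the user has initialized it and falling back to the legacy `allow_tf32` flag otherwise. This is not the actual patch; the attribute lookup and the falsy "uninitialized" sentinel are assumptions and may differ across PyTorch versions.

```python
import torch

# Hedged sketch, not the actual patch: prefer the new fp32_precision string when it
# has been set, otherwise fall back to the legacy allow_tf32 flag. Treating an
# empty/falsy value as "uninitialized" is an assumption here.
fp32_precision = getattr(torch.backends.cuda.matmul, "fp32_precision", "")
enable_tf32 = (
    fp32_precision == "tf32"
    if fp32_precision
    else torch.backends.cuda.matmul.allow_tf32
)
print("tf32 enabled for matmul:", enable_tf32)
```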
markc-614 pushed a commit to markc-614/pytorch that referenced this pull request Sep 17, 2025
PR pytorch#161102 caused tf32 to be the default precision for flex attention.  This PR forward-fixes the broken logic and restores ROCm MI200 CI flex attention test.

Pull Request resolved: pytorch#161465
Approved by: https://github.com/jeffdaily, https://github.com/eqy

Co-authored-by: Jeff Daily <jeff.daily@amd.com>

Labels: ciflow/inductor, ciflow/trunk (Trigger trunk jobs on your pull request), Merged, module: cuda (Related to torch.cuda, and CUDA support in general), module: flex attention, module: inductor, module: tf32 (Related to tf32 data format), open source, release notes: cuda (release notes category), topic: bug fixes (topic category)
