[FlexAttention][TF32] Handle uninitialized `torch.backends.cuda.matmul.fp32_precision` by eqy · Pull Request #161102 · pytorch/pytorch · GitHub

Conversation

@eqy (Collaborator) commented Aug 20, 2025

@eqy added labels on Aug 20, 2025: module: cuda (Related to torch.cuda, and CUDA support in general), open source, module: tf32 (Related to tf32 data format), topic: bug fixes (topic category), module: flex attention

pytorch-bot bot commented Aug 20, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/161102

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 4b6b15c with merge base 24e7f3c:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@eqy (Collaborator, Author) commented Aug 21, 2025

@pytorchmergebot merge

pytorch-bot bot added the ciflow/trunk (Trigger trunk jobs on your pull request) label on Aug 21, 2025
@pytorchmergebot (Collaborator) commented:

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

jeffdaily added a commit to ROCm/pytorch that referenced this pull request Aug 26, 2025
Flex attention precision was always forced to tf32 after pytorch#161102. The Python ternary needs to be surrounded with ().
@jeffdaily jeffdaily mentioned this pull request Aug 26, 2025
@jeffdaily (Collaborator) commented:

This change doesn't work. I'm pretty sure it's the root cause of #161022, and it's also why the ROCm CI on MI200 runners started timing out: it forces the default to tf32 for the flex attention tests, and tf32 isn't supported on MI200. test_flex_attention.py used to take less than 10 minutes, but this mistake makes it take 30+ minutes and the shard times out as a result. Issuing a revert. Please rework the logic.

You need to surround the new ternary expression with parentheses. A forward fix is submitted in #161465; a sketch of the pitfall follows.
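
For context, here is a minimal, hypothetical sketch of the precedence pitfall (illustrative names only, not the actual inductor code): Python's conditional expression binds more loosely than `and`, so dropping the parentheses regroups the logic and can bypass the guard entirely.

```python
# Hypothetical sketch of the pitfall -- names are illustrative, not the real kernel code.
is_fp32_input = False        # guard that should gate tf32 entirely
new_api_set = False          # new fp32_precision setting is uninitialized
new_says_tf32 = True         # what the new setting would request if it were set
legacy_allow_tf32 = True     # legacy fallback flag

# Intended grouping: the ternary picks the source of truth, then the guard applies.
correct = is_fp32_input and (new_says_tf32 if new_api_set else legacy_allow_tf32)

# Without parentheses the conditional expression absorbs the whole `and` as its true branch:
buggy = is_fp32_input and new_says_tf32 if new_api_set else legacy_allow_tf32
# parsed as: (is_fp32_input and new_says_tf32) if new_api_set else legacy_allow_tf32

print(correct, buggy)  # False True -- the buggy grouping enables tf32 despite the guard
```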

@eqy (Collaborator, Author) commented Aug 26, 2025

@jeffdaily thanks for the forward fix.
Also, the root cause isn't this fix attempt; it was #158979, as mentioned in #161022 itself.

pytorchmergebot pushed a commit that referenced this pull request Aug 26, 2025
PR #161102 caused tf32 to be the default precision for flex attention.  This PR forward-fixes the broken logic and restores ROCm MI200 CI flex attention test.

Pull Request resolved: #161465
Approved by: https://github.com/jeffdaily, https://github.com/eqy

Co-authored-by: Jeff Daily <jeff.daily@amd.com>
markc-614 pushed a commit to markc-614/pytorch that referenced this pull request Sep 17, 2025
…l.fp32_precision` (pytorch#161102)

For pytorch#161022
The warning says the old API will be deprecated in 2.9+ anyway, leaving it up to the author of pytorch#125888 to decide on initialization behavior then

Pull Request resolved: pytorch#161102
Approved by: https://github.com/ngimel, https://github.com/drisspg, https://github.com/BoyuanFeng
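
For readers landing here from the cherry-pick: below is a hedged sketch of the general pattern the PR title describes, i.e. preferring the new `torch.backends.cuda.matmul.fp32_precision` setting when the user has initialized it and falling back to the legacy `allow_tf32` flag otherwise. This is not the actual patch; the attribute lookup and the falsy "uninitialized" sentinel are assumptions and may differ across PyTorch versions.

```python
import torch

# Hedged sketch, not the actual patch: prefer the new fp32_precision string when it
# has been set, otherwise fall back to the legacy allow_tf32 flag. Treating an
# empty/falsy value as "uninitialized" is an assumption here.
fp32_precision = getattr(torch.backends.cuda.matmul, "fp32_precision", "")
enable_tf32 = (
    fp32_precision == "tf32"
    if fp32_precision
    else torch.backends.cuda.matmul.allow_tf32
)
print("tf32 enabled for matmul:", enable_tf32)
```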
markc-614 pushed a commit to markc-614/pytorch that referenced this pull request Sep 17, 2025
PR pytorch#161102 caused tf32 to be the default precision for flex attention.  This PR forward-fixes the broken logic and restores ROCm MI200 CI flex attention test.

Pull Request resolved: pytorch#161465
Approved by: https://github.com/jeffdaily, https://github.com/eqy

Co-authored-by: Jeff Daily <jeff.daily@amd.com>

Labels: ciflow/inductor, ciflow/trunk (Trigger trunk jobs on your pull request), Merged, module: cuda (Related to torch.cuda, and CUDA support in general), module: flex attention, module: inductor, module: tf32 (Related to tf32 data format), open source, release notes: cuda (release notes category), topic: bug fixes (topic category)
