[ROCm] Fix mx fp8 and fp4 code after scaling refactor changes. #163127

jagadish-amd · 2025-09-16T23:58:55Z

PR #151360 added mx fp8 and fp4 support on ROCm.

However, on recent upstream, changes to the scaling function in Blas.cpp, together with changes to test_matmul_cuda.py, triggered failures. This patch:

1. Corrects the `is_blockwise_1x32_scaling` function code.
2. Fixes the m, n, k dimensions for the ROCm mx case.
3. Changes `FP4E2M1FN_LARGEST_POW2` (the largest power of 2 representable in `torch.float4_e2m1fn_x2`, expressed as an exponent: 2^2 = 4) to 2. This resulted in a higher SQNR value for the mx fp4 test.
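For context on item 3, here is a minimal pure-Python sketch (it does not call PyTorch and is not the test code) that enumerates the FP4 E2M1 grid and derives a shared MX scale exponent as the OCP Microscaling spec describes it: `floor(log2(amax)) - emax_elem`. It assumes `FP4E2M1FN_LARGEST_POW2` plays the role of `emax_elem`; the helper names `e2m1_values` and `mx_shared_exponent` are hypothetical.

```python
import math


def e2m1_values():
    """All non-negative FP4 E2M1 values: 2 exponent bits (bias 1), 1 mantissa bit."""
    vals = set()
    for exp_bits in range(4):
        for man_bit in range(2):
            if exp_bits == 0:
                # Subnormal: 0.m * 2^(1 - bias) = m * 0.5
                vals.add(man_bit * 0.5)
            else:
                # Normal: 1.m * 2^(exp_bits - bias)
                vals.add((1 + man_bit / 2) * 2.0 ** (exp_bits - 1))
    return sorted(vals)


E2M1_VALUES = e2m1_values()
E2M1_MAX = max(E2M1_VALUES)

# Exponent of the largest power of two on the grid (4.0 == 2**2).
LARGEST_POW2_EXP = max(
    int(math.log2(v)) for v in E2M1_VALUES if v > 0 and math.log2(v).is_integer()
)


def mx_shared_exponent(block, emax_elem=LARGEST_POW2_EXP):
    """Shared power-of-two scale exponent for one block: floor(log2(amax)) - emax_elem."""
    amax = max(abs(x) for x in block)
    if amax == 0:
        return 0  # hypothetical convention for an all-zero block
    return math.floor(math.log2(amax)) - emax_elem
```

For example, `mx_shared_exponent([0.1, -12.0, 3.0])` returns 1, so the block is divided by 2**1 and its largest magnitude, 12, lands exactly on 6.0, the top of the E2M1 grid.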

Test results on gfx950 with ROCm 7.0:

`PYTORCH_TEST_WITH_ROCM=1 python test/test_matmul_cuda.py -k test_blockwise -v`

Ran 452 tests in 22.698s
OK, passed 111

This is the same result as when PR #151360 was merged.

cc @jeffdaily @sunway513 @jithunnair-amd @pruthvistony @ROCmSupport @dllehr-amd @jataylo @hongxiayang @naromero77amd

PR pytorch#151360 added mx fp8 and fp4 support on ROCm. However, on recent upstream, changes to the scaling function in Blas.cpp along with test_matmul_cuda changes triggered failures. This patch corrects the is_blockwise_1x32_scaling function code and fixes a minor bug in test_matmul_cuda. Testing result on gfx950 with ROCm 7.0: `PYTORCH_TEST_WITH_ROCM=1 python test/test_matmul_cuda.py -k test_blockwise -v` ran 452 tests in 22.698s, OK, passed 111. This is the same as before (when PR 151360 was merged). Signed-off-by: Jagadish Krishnamoorthy <jagadish.krishnamoorthy@amd.com>
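To illustrate what a 1x32 blockwise scaling check has to count, here is a hypothetical Python helper (not the C++ in Blas.cpp). It assumes one scale per 32 contiguous elements along K, and that a packed FP4 operand such as `torch.float4_e2m1fn_x2` stores two values per element, so the logical K is twice the stored column count. The optional `row_multiple`/`col_multiple` padding of the scale grid is also an assumption, since scale layouts differ between backends.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division for non-negative ints."""
    return -(-a // b)


def round_up(a: int, multiple: int) -> int:
    """Round a up to the nearest multiple of `multiple`."""
    return ceil_div(a, multiple) * multiple


BLOCK_SIZE = 32  # elements along K that share one scale in 1x32 blockwise scaling


def expected_scale_numel(rows: int, stored_cols: int, packed_fp4: bool = False,
                         row_multiple: int = 1, col_multiple: int = 1) -> int:
    """Hypothetical helper: scales needed for a (rows x K) operand with 1x32 block scaling."""
    # Two FP4 values are packed per stored element (as in float4_e2m1fn_x2),
    # so the logical K is twice the stored column count.
    k = stored_cols * 2 if packed_fp4 else stored_cols
    # Optional padding of the scale grid (assumption; layouts differ by backend).
    scale_rows = round_up(rows, row_multiple)
    scale_cols = round_up(ceil_div(k, BLOCK_SIZE), col_multiple)
    return scale_rows * scale_cols


def is_blockwise_1x32(rows: int, stored_cols: int, scale_numel: int,
                      packed_fp4: bool = False, **padding: int) -> bool:
    """True if `scale_numel` matches one scale per 32-element block along K."""
    return scale_numel == expected_scale_numel(rows, stored_cols, packed_fp4, **padding)
```

For example, `is_blockwise_1x32(128, 128, 1024, packed_fp4=True)` is True (K = 256, so 8 scales per row), while the same call without `packed_fp4=True` expects only 512 scales and returns False. This shows how easily a dimension mix-up between stored and logical K can make a valid scale tensor fail the check.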

pytorch-bot · 2025-09-16T23:58:59Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/163127


✅ No Failures

As of commit 905b7f0 with merge base 9009c4d:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.


jeffdaily · 2025-09-19T01:44:25Z

@pytorchbot merge

pytorchmergebot · 2025-09-19T01:46:37Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging

Check the merge workflow status here.


pytorchmergebot · 2025-09-19T03:16:27Z

Merge failed

Reason: New commits were pushed while merging. Please rerun the merge command.

Details for Dev Infra team

Raised by workflow job

jeffdaily · 2025-09-19T03:24:04Z

@pytorchbot merge

pytorchmergebot · 2025-09-19T03:25:57Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

pytorchmergebot · 2025-09-19T03:36:42Z

Merge failed

Reason: 1 mandatory check(s) failed. The first few are:

Lint / quick-checks / linux-job

Dig deeper by viewing the failures on hud

Failing merge rule: Core Maintainers

jeffdaily · 2025-09-19T12:22:30Z

@pytorchbot merge

pytorchmergebot · 2025-09-19T12:24:14Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

…ch#163127)

PR pytorch#151360 added mx fp8 and fp4 support on ROCm.

1. However, on recent upstream, scaling function in Blas.cpp along with test_matmul_cuda changes triggered failures. This patch corrects is_blockwise_1x32_scaling function code.
2. Fixes the m, n, k dimensions for ROCm mx case.
3. Modify FP4E2M1FN_LARGEST_POW2 (largest power of 2 representable in `torch.float4_e2m1fn_x2`) to 2. This resulted in higher SQNR value for mx fp4 test.

Testing result on gfx950 w/ ROCm7.0
PYTORCH_TEST_WITH_ROCM=1 python test/test_matmul_cuda.py -k test_blockwise -v
Ran 452 tests in 22.698s
OK passed 111
This is same as before. (when PR 151360 was merged)

Pull Request resolved: pytorch#163127
Approved by: https://github.com/jeffdaily
Co-authored-by: Jeff Daily <jeff.daily@amd.com>

pytorch-bot bot added the module: rocm AMD GPU support for Pytorch label Sep 16, 2025

pytorchbot added the open source label Sep 17, 2025

jeffdaily requested changes Sep 17, 2025


test/test_matmul_cuda.py (review thread resolved)

jeffdaily added the topic: not user facing topic category label Sep 17, 2025

Commit fe958f8: review comments

jeffdaily approved these changes Sep 19, 2025


pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Sep 19, 2025

pytorchmergebot added the merging label Sep 19, 2025

jeffdaily marked this pull request as ready for review September 19, 2025 01:47

jeffdaily requested review from eqy and syed-ahmed as code owners September 19, 2025 01:47

Commit 905b7f0: Change dim and approx_match_sqnr_target value

Signed-off-by: Jagadish Krishnamoorthy <jagadish.krishnamoorthy@amd.com>
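`approx_match_sqnr_target` is a threshold on signal-to-quantization-noise ratio. Below is a minimal pure-Python sketch, assuming the common definition SQNR = 20 * log10(||x|| / ||x - x_hat||) in dB (the helper used in test_matmul_cuda.py may differ), of an MX FP4 round trip on one 32-element block. The `emax_elem` values 1 and 2 are compared purely for illustration; nothing here claims what the previous constant was. Rounding ties go to the smaller magnitude, a simplification of round-half-to-even; the helper names are hypothetical.

```python
import math

# Non-negative FP4 E2M1 magnitudes; the sign is handled separately.
E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]


def quantize_e2m1(v):
    """Round |v| to the nearest grid point; values above 6.0 saturate to 6.0."""
    mag = min(E2M1_GRID, key=lambda g: abs(abs(v) - g))
    return math.copysign(mag, v)


def mx_fp4_roundtrip(block, emax_elem):
    """Quantize one block with a shared power-of-two scale, then dequantize it."""
    amax = max(abs(x) for x in block)
    if amax == 0:
        return list(block)
    scale = 2.0 ** (math.floor(math.log2(amax)) - emax_elem)
    return [quantize_e2m1(x / scale) * scale for x in block]


def sqnr_db(ref, approx):
    """SQNR in dB: 20 * log10(||ref|| / ||ref - approx||)."""
    signal = math.sqrt(sum(r * r for r in ref))
    noise = math.sqrt(sum((r - a) ** 2 for r, a in zip(ref, approx)))
    if noise == 0:
        return math.inf
    return 20 * math.log10(signal / noise)


block = [float(i) for i in range(-16, 16)]  # 32 elements, amax = 16
for emax in (1, 2):
    print(f"emax_elem={emax}: SQNR = {sqnr_db(block, mx_fp4_roundtrip(block, emax)):.2f} dB")
```

For this block, `emax_elem=2` gives the higher SQNR: its dequantized grid (0, 2, 4, 6, 8, 12, 16) contains every point of the `emax_elem=1` grid (0, 4, 8, 12, 16), so no element's error can grow, and some (for example 2.0) become exact. Blocks whose `amax` falls just below the next power of two can instead saturate at 6.0, so the trade-off is data dependent.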

pytorch-bot bot removed the ciflow/trunk Trigger trunk jobs on your pull request label Sep 19, 2025

pytorchmergebot removed the merging label Sep 19, 2025

pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Sep 19, 2025

pytorchmergebot added the merging label Sep 19, 2025

pytorchmergebot removed the merging label Sep 19, 2025

pytorchmergebot added the merging label Sep 19, 2025

pytorchmergebot added the Merged label Sep 19, 2025

pytorchmergebot closed this in 264e7f6 Sep 19, 2025

pytorchmergebot removed the merging label Sep 19, 2025


4 participants
