[Quant][CPU] Avoid NaN in fp8 output of qlinear and qconv #160957

Xia-Weiwen · 2025-08-19T09:31:47Z

Summary
When the output dtype is fp8, oneDNN does not guarantee that intermediate results fall within [-448, 448] before converting them to fp8. As a result, the output may contain NaN, which is a disaster for inference. This PR fixes the issue by clamping the intermediate results with oneDNN's post-op clip.
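
The effect of the clamp can be illustrated with a minimal, pure-Python sketch. This is not the code from this PR, which applies the clip as a oneDNN post-op in C++. The names `FP8_E4M3_MAX`, `toy_cast_to_fp8`, and `clip` are hypothetical, and the cast is a simplified stand-in that assumes any out-of-range value becomes NaN and ignores rounding of in-range values.

```python
import math

# Largest finite magnitude assumed for the fp8 output, per the range [-448, 448]
# quoted in the summary. The constant name is hypothetical.
FP8_E4M3_MAX = 448.0


def toy_cast_to_fp8(x: float) -> float:
    """Hypothetical stand-in for a non-saturating fp8 cast.

    Assumption: any value outside [-448, 448] turns into NaN.
    In-range values are returned unchanged (no fp8 rounding is modeled).
    """
    return x if abs(x) <= FP8_E4M3_MAX else math.nan


def clip(x: float, lo: float = -FP8_E4M3_MAX, hi: float = FP8_E4M3_MAX) -> float:
    """Clamp x into [lo, hi], mirroring what a clip post-op does element-wise."""
    return max(lo, min(hi, x))


# Intermediate results, e.g. accumulator values before the output conversion.
intermediate = [1.5, 500.0, -1000.0, 448.0]

# Without the clamp, out-of-range values become NaN in this toy model.
unclamped = [toy_cast_to_fp8(v) for v in intermediate]

# With the clamp applied first, every value is representable and no NaN appears.
clamped = [toy_cast_to_fp8(clip(v)) for v in intermediate]
```

In this model, out-of-range inputs saturate to ±448 instead of producing NaN, which is the behavior the fix aims for.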

Test plan

pytest -sv test/quantization/core/test_quantized_op.py -k "q and fp8"

cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10 @jerryzh168

pytorch-bot · 2025-08-19T09:31:51Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/160957

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit the bot commands wiki or our office hours

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 1bf5f16 with merge base 8f31aa9:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

Valentine233

LGTM. Thanks!

aten/src/ATen/native/quantized/cpu/qconv.cpp

aten/src/ATen/native/quantized/cpu/qlinear_prepack.cpp

Xia-Weiwen · 2025-08-21T06:01:13Z

@pytorchbot merge

pytorchmergebot · 2025-08-21T06:03:11Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging

Check the merge workflow status here

…0957)

**Summary**
When output dtype is fp8, oneDNN does not ensure intermediate results in the range of [-448, 448] before converting to fp8. So, we may get NaN in the output, which is a disaster for inference. This PR fixes this issue by clamping the intermediate results by oneDNN's post-op clip.

**Test plan**
```
pytest -sv test/quantization/core/test_quantized_op.py -k "q and fp8"
```

Pull Request resolved: pytorch#160957
Approved by: https://github.com/Valentine233, https://github.com/CaoE

pytorch-bot bot added the module: cpu and release notes: quantization labels Aug 19, 2025

pytorchbot added the open source label Aug 19, 2025

Xia-Weiwen requested review from Valentine233, chunyuan-w and yanbing-j August 19, 2025 09:39

Xia-Weiwen added the intel label Aug 19, 2025

[Quant][CPU] Avoid NaN in fp8 output of qlinear and qconv

4c822f5

Xia-Weiwen requested a review from mingfeima August 20, 2025 08:24

Valentine233 approved these changes Aug 20, 2025

Xia-Weiwen requested a review from CaoE August 20, 2025 08:28

Xia-Weiwen marked this pull request as ready for review August 20, 2025 08:29

Xia-Weiwen requested review from digantdesai, jerryzh168, jianyuh, kimishpatel and salilsdesai as code owners August 20, 2025 08:29

CaoE reviewed Aug 20, 2025

aten/src/ATen/native/quantized/cpu/qconv.cpp (outdated, resolved)

CaoE reviewed Aug 20, 2025

aten/src/ATen/native/quantized/cpu/qlinear_prepack.cpp (resolved)

CaoE approved these changes Aug 20, 2025

Use a macro to represent max of fp8_e4m3

1bf5f16

pytorch-bot bot added the ciflow/trunk label Aug 21, 2025

pytorchmergebot added the merging label Aug 21, 2025

pytorchmergebot added the Merged label Aug 21, 2025

pytorchmergebot closed this in a941d7f Aug 21, 2025

pytorchmergebot removed the merging label Aug 21, 2025
