[Quant][CPU] Enable fp8 qconv #157076
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/157076
Note: Links to docs will display an error until the docs builds have been completed.
✅ You can merge normally! (1 unrelated failure) As of commit 0cc5b80 with merge base 84b77ec. FLAKY: the following job failed but was likely due to flakiness present on trunk.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Have we discussed, long term, whether these quantized ops should live in PyTorch or torchao?
Not yet. Do you have a plan to move all these kernels to torchao? And does torchao have a plan to build cpp kernels for X86 CPU by default? Thanks.
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
This is still in discussion, I think; we can revisit moving these kernels to torchao in the future. But I think in the end it makes sense to move them.
Moving makes sense to me, too. Please let us know when you figure out a plan. Thanks.
Summary
Enable fp8 qconv on CPU as part of the plan to enable fp8 static quantization on CPU. This PR only adds FP8 support to the existing INT8 qconv op. It does not add a new op, nor does it affect the frontend or the quantization flow, and the schema of the qconv op is unchanged.
So, the FP8 qconv shares the same op as the INT8 qconv; the only difference is that the src/wei dtype is fp8 instead of int8. The output dtype can be fp8, float32, or bfloat16. The implementation uses the oneDNN library.
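As a rough illustration of the fp8 src/wei dtypes, the snippet below quantizes a conv activation per-tensor to PyTorch's native fp8 dtype. The helper names, the scale computation, and the choice of torch.float8_e4m3fn are assumptions made for this sketch, not details taken from the PR.

```python
import torch

# Hypothetical per-tensor fp8 quantize/dequantize helpers (illustrative only).
def quantize_fp8(x: torch.Tensor, scale: float) -> torch.Tensor:
    # Scale into a range the fp8 format can represent, then cast.
    return (x / scale).to(torch.float8_e4m3fn)

def dequantize_fp8(x_fp8: torch.Tensor, scale: float) -> torch.Tensor:
    return x_fp8.to(torch.float32) * scale

src = torch.randn(1, 3, 8, 8)
# Choose the scale so all values fit within the finite e4m3fn range.
scale = src.abs().max().item() / torch.finfo(torch.float8_e4m3fn).max
src_fp8 = quantize_fp8(src, scale)
print(src_fp8.dtype)  # torch.float8_e4m3fn
```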
Note:
oneDNN does not support quantized fp8 convolution until v3.9, but the version used in PyTorch is v3.7.2. So, the op goes to the reference kernel for now. We have also updated the oneDNN path so that it's compatible with the fp8 dtype. Once oneDNN is upgraded to v3.9 or newer, only minimal changes are needed to enable the oneDNN path. We have also ensured that the behavior of the reference kernel matches the new oneDNN implementation.
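The sketch below shows what such a reference path computes under the semantics described above: dequantize the fp8 activation and weight, run the convolution in float32, and convert the result to the requested output dtype. The function name, signature, and scales are hypothetical and invented for illustration; this is not the actual ATen kernel.

```python
import torch
import torch.nn.functional as F

# Hypothetical reference path for fp8 qconv (names and signature are invented
# for illustration). Dequantize fp8 src/weight, run a float32 convolution,
# then produce the requested output dtype (fp8, float32, or bfloat16).
def ref_fp8_qconv2d(src_fp8, src_scale, wei_fp8, wei_scale, bias,
                    stride=1, padding=0, dilation=1, groups=1,
                    out_scale=1.0, out_dtype=torch.float32):
    src = src_fp8.to(torch.float32) * src_scale
    wei = wei_fp8.to(torch.float32) * wei_scale
    out = F.conv2d(src, wei, bias, stride, padding, dilation, groups)
    if out_dtype == torch.float8_e4m3fn:
        # fp8 output: requantize with the output scale before casting.
        return (out / out_scale).to(torch.float8_e4m3fn)
    # float32 or bfloat16 output: no requantization, just cast.
    return out.to(out_dtype)

# Example: fp8 src/weight, bfloat16 output (scales chosen arbitrarily).
src_fp8 = torch.randn(1, 3, 8, 8).to(torch.float8_e4m3fn)
wei_fp8 = torch.randn(4, 3, 3, 3).to(torch.float8_e4m3fn)
out = ref_fp8_qconv2d(src_fp8, 0.05, wei_fp8, 0.02, bias=None,
                      out_dtype=torch.bfloat16)
print(out.shape, out.dtype)  # torch.Size([1, 4, 6, 6]) torch.bfloat16
```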
Test plan
cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10 @jerryzh168