[Intel GPU] qconv at XPU backend #133080
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/133080
Note: Links to docs will display an error until the docs builds have been completed.
❌ 2 New Failures, 3 Unrelated Failures
As of commit 83849fa with merge base 3614d13:
NEW FAILURES - The following jobs have failed:
FLAKY - The following jobs failed but were likely due to flakiness present on trunk:
UNSTABLE - The following job failed but was likely due to flakiness present on trunk and has been marked as unstable:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@ZhiweiYan-96, please reach out to me to review this series of PRs once they are ready.
Successfully rebased
@pytorchbot rebase -b main
@pytorchbot started a rebase job onto refs/remotes/origin/main. Check the current status here.
Successfully rebased
@pytorchbot merge -i
The failed cases are unrelated to this PR. We have filed a GitHub issue to track them - #141466.
Merge started. Your change will be merged while ignoring the following 5 checks: xpu / linux-jammy-xpu-2025_0-py3.9 / test (default, 1, 4, linux.idc.xpu), xpu / linux-jammy-xpu-2025_0-py3.9 / test (default, 3, 4, linux.idc.xpu), xpu / linux-jammy-xpu-2025_0-py3.9 / test (default, 4, 4, linux.idc.xpu), inductor / cuda12.4-py3.10-gcc9-sm86 / test (inductor_timm, 1, 2, linux.g5.4xlarge.nvidia.gpu), inductor / cuda12.1-py3.10-gcc9-sm86 / test (inductor_timm, 1, 2, linux.g5.4xlarge.nvidia.gpu). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team
# Motivation

This PR adds `XPUInductorQuantizer`, which defines the int8 quantization recipe for the XPU backend.

# Details

`XPUInductorQuantizer` is a class derived from `X86InductorQuantizer`, as both quantizers take advantage of the highly optimized operators in the oneDNN library (qconv, qlinear, qconv/qlinear fusion). We share the same recipe as `X86InductorQuantizer`, so we have the same `annotate_xxxx` methods. Ideally, `XPUInductorQuantizer` would have an empty class body, since all of the implementation can be inherited from the base class.

In this PR, we override the `annotate_xxx` methods for operators that have NOT yet been implemented. Any operator the XPU backend does not implement falls back to its fp32 implementation, because the corresponding node in the graph remains a `dq-op-q` pattern. This provides good out-of-the-box usability for the XPU backend. The implemented operators, on the other hand, use the `annotate_op` methods implemented in the base class and can be lowered successfully.

Pull Request resolved: #139578
Approved by: https://github.com/EikanWang, https://github.com/leslie-fang-intel, https://github.com/CuiYifeng, https://github.com/jerryzh168
ghstack dependencies: #133080
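To make the inheritance/fallback idea above concrete, here is a minimal illustrative sketch (not the class body that actually landed in #139578); the overridden method name is an assumption about the base-class annotate hooks, while the import path for `X86InductorQuantizer` is the public one:

```python
# Illustrative sketch only, not the actual implementation of #139578.
# Assumption: `_annotate_maxpool2d` is one of the base-class annotate hooks;
# declining to annotate an op keeps it on the fp32 path for XPU.
from torch.ao.quantization.quantizer.x86_inductor_quantizer import (
    X86InductorQuantizer,
)


class XPUInductorQuantizer(X86InductorQuantizer):
    """Reuses the X86 int8 recipe; overrides annotation only where XPU
    has no int8 kernel yet, so those ops stay in fp32."""

    def _annotate_maxpool2d(self, node, quantization_config):
        # Skip int8 annotation for this op so it is not lowered to a
        # quantized kernel on XPU.
        return
```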
# Motivation

This PR is a precursor to pytorch#133080. It extracts the common logic of convolution and quantized convolution into `Utils.cpp`. With this change, the two operators can share code such as input format querying and op layout querying.

Pull Request resolved: pytorch#139580
Approved by: https://github.com/EikanWang, https://github.com/guangyey, https://github.com/malfet
ghstack dependencies: pytorch#139721
# Motivation

This PR enables XPU quantized convolution. The operators it registers are `onednn::qconv_prepack`, `onednn::qconv1d_pointwise`, `onednn::qconv2d_pointwise`, and `onednn::qconv3d_pointwise`. We share the same operator schemas as the Intel CPU backend, as both call kernels implemented in the oneDNN library.

# Details

The implemented operators will be further integrated into the pt2e quantization flow. In this PR, we validate the kernel functionality via the UTs in `test/inductor/test_mkldnn_pattern_matcher.py`, where the CPU backend defines a series of UTs for quantized convolution. We also extend the device support for the inductor lowering pass and the inductor IR defined in `torch/_inductor/fx_passes/quantization.py` and `torch/_inductor/mkldnn_ir.py`. The overall picture is that the CPU and GPU backends share the general optimization passes (op fusion) and the quantization inductor IR; after lowering, the final kernel is dispatched to different implementations in the oneDNN library.

In this PR, we share the same int8 quantizer as the CPU, namely `X86InductorQuantizer`. In the next PR, pytorch#139578, we will add an `XPUInductorQuantizer` that customizes the pt2e behavior for the XPU backend. The capability of `XPUInductorQuantizer` will grow gradually along with the development of quantized operators on XPU.

# Validation

* UT testing

```bash
python test/inductor/test_mkldnn_pattern_matcher.py -v \
  -k test_qconv2d_xpu \
  -k test_qconv2d_silu_xpu \
  -k test_qconv2d_relu6_xpu \
  -k test_qconv2d_hardtanh_xpu \
  -k test_qconv2d_hardswish_xpu
```

* Runtime exemplification

```bash
#qconv2d
onednn_verbose,primitive,exec,gpu:0,convolution,jit:ir,forward_training,src_u8::blocked:acdb::f0 wei_s8::blocked:acdb::f0 bia_undef::undef::: dst_f32::blocked:acdb::f0,attr-scratchpad:user attr-scales:src0:0:f32+wei:1:f32 attr-zero-points:src0:0:s32 attr-post-ops:binary_add:f32:2+eltwise_linear:1,alg:convolution_direct,mb1_ic128oc128_ih6oh4kh3sh1dh0ph0_iw6ow4kw3sw1dw0pw0,0.0668945
#qconv2d_silu
onednn_verbose,primitive,exec,gpu:0,convolution,jit:ir,forward_training,src_u8::blocked:acdb::f0 wei_s8::blocked:acdb::f0 bia_undef::undef::: dst_u8::blocked:acdb::f0,attr-scratchpad:user attr-scales:src0:0:f32+wei:1:f32 attr-zero-points:src0:0:s32 attr-post-ops:eltwise_swish:1+binary_add:f32:2+eltwise_linear:0.0124779:22,alg:convolution_direct,mb1_ic3oc128_ih8oh6kh3sh1dh0ph0_iw8ow6kw3sw1dw0pw0,0.0881348
```

Pull Request resolved: pytorch#133080
Approved by: https://github.com/guangyey, https://github.com/EikanWang, https://github.com/atalman
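For orientation, the following is a minimal, hypothetical sketch of the pt2e flow these kernels plug into, using the shared `X86InductorQuantizer` as described above; the toy model, shapes, and the export call are illustrative and may need adjusting to the installed PyTorch version and XPU build:

```python
# Minimal sketch of the pt2e int8 flow on XPU (assumes an XPU-enabled build).
import torch
from torch.ao.quantization.quantize_pt2e import prepare_pt2e, convert_pt2e
import torch.ao.quantization.quantizer.x86_inductor_quantizer as xiq


class ConvReLU(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = torch.nn.Conv2d(3, 128, kernel_size=3)

    def forward(self, x):
        return torch.nn.functional.relu(self.conv(x))


model = ConvReLU().eval().to("xpu")
example_inputs = (torch.randn(1, 3, 8, 8, device="xpu"),)

# Capture the model; the exact capture API depends on the PyTorch version.
exported = torch.export.export_for_training(model, example_inputs).module()

# This PR reuses the CPU recipe; pytorch#139578 later adds XPUInductorQuantizer.
quantizer = xiq.X86InductorQuantizer()
quantizer.set_global(xiq.get_default_x86_inductor_quantization_config())

prepared = prepare_pt2e(exported, quantizer)
prepared(*example_inputs)  # calibration
converted = convert_pt2e(prepared)

# Inductor lowers the dq-conv-q patterns to the onednn::qconv*_pointwise kernels.
with torch.no_grad():
    compiled = torch.compile(converted)
    out = compiled(*example_inputs)
```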
```cpp
    c10::optional<at::Tensor> accum,
    double accum_scale,
    int64_t accum_zero_point,
    c10::optional<c10::ScalarType> output_dtype,
```
Do not use c10::optional. Use std::optional instead.
@ZhiweiYan-96 - Note my review comment: use `std::optional` instead of `c10::optional`.
Motivation

This PR enables XPU quantized convolution. The operators it registers are `onednn::qconv_prepack`, `onednn::qconv1d_pointwise`, `onednn::qconv2d_pointwise`, and `onednn::qconv3d_pointwise`. We share the same operator schemas as the Intel CPU backend, as both call kernels implemented in the oneDNN library.

Details

The implemented operators will be further integrated into the pt2e quantization flow. In this PR, we validate the kernel functionality via the UTs in `test/inductor/test_mkldnn_pattern_matcher.py`, where the CPU backend defines a series of UTs for quantized convolution. We also extend the device support for the inductor lowering pass and the inductor IR defined in `torch/_inductor/fx_passes/quantization.py` and `torch/_inductor/mkldnn_ir.py`. The overall picture is that the CPU and GPU backends share the general optimization passes (op fusion) and the quantization inductor IR; after lowering, the final kernel is dispatched to different implementations in the oneDNN library.

In this PR, we share the same int8 quantizer as the CPU, namely `X86InductorQuantizer`. In the next PR, #139578, we will add an `XPUInductorQuantizer` that customizes the pt2e behavior for the XPU backend. The capability of `XPUInductorQuantizer` will grow gradually along with the development of quantized operators on XPU.

Validation
Stack from ghstack (oldest at bottom):
cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10 @voznesenskym @penguinwu @EikanWang @Guobing-Chen @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang @aakhundov @gujinghui @fengyuan14 @guangyey