[quant][gpu] Adding quantized conv operator in cudnn #70622
Conversation
Summary: This is the initial PR to add eager mode quantized GPU operator support. We'll start with convolution, following the cudnn fp32 Conv code and the example cudnn frontend code: #51390, https://github.com/NVIDIA/cudnn-frontend/blob/main/samples/fusion_sample.cpp#L557

Test Plan: python test/test_quantization.py TestQuantizedConv.test_qconv2d_cudnn

Reviewers: Subscribers: Tasks: Tags: [ghstack-poisoned]
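For context, the following minimal Python sketch shows the affine-quantization arithmetic that a quantized conv performs, emulated with dequantize/requantize around an fp32 conv. The scales and zero points are illustrative values, not taken from this PR, and this is not the cudnn implementation itself:

```python
import torch
import torch.nn.functional as F

# Illustrative quantization parameters; in practice these come from observers.
x = torch.randn(1, 3, 8, 8)
w = torch.randn(4, 3, 3, 3)
qx = torch.quantize_per_tensor(x, scale=0.05, zero_point=0, dtype=torch.qint8)
qw = torch.quantize_per_tensor(w, scale=0.02, zero_point=0, dtype=torch.qint8)

# An int8 conv kernel accumulates int8 * int8 into int32 and then requantizes:
#   y_q = round(acc * (s_x * s_w / s_y)) + zp_y
# (zero-point corrections are omitted here since zp_x = zp_w = 0).
# The same result can be emulated by convolving the dequantized tensors in fp32
# and quantizing the output with the output scale/zero point.
y_fp32 = F.conv2d(qx.dequantize(), qw.dequantize(), stride=1, padding=1)
qy = torch.quantize_per_tensor(y_fp32, scale=0.1, zero_point=0, dtype=torch.qint8)
print(qy.int_repr().shape)  # int8 values of the requantized output
```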
@jerryzh168 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Summary: This is the initial PR to add eager mode quantized GPU operator support. We'll start with convolution, following the cudnn fp32 Conv code and the example cudnn frontend code: #51390, https://github.com/NVIDIA/cudnn-frontend/blob/main/samples/fusion_sample.cpp#L557

Test Plan:
```
> USE_EXPERIMENTAL_CUDNN_V8_API=1 python setup.py install
> python test/test_quantization.py TestQuantizedConv.test_qconv2d_cudnn
```

Reviewers: Subscribers: Tasks: Tags:

Differential Revision: [D33409155](https://our.internmc.facebook.com/intern/diff/D33409155)

[ghstack-poisoned]
Hi @jerryzh168, what is your experience with cuDNN on int8 like? I recently tried to support cuDNN int8 with the NHWC layout, but got the impression that cuDNN does not support int32 output, which is what I need. I also noticed that you are doing ...
Hi @masahi, I just started with cuDNN on int8; we are planning to support a native quantized GPU backend through cuDNN this half. About the layout support: I'm using the cudnn v8 APIs, and it looks like the data layout is not yet exposed in the API (https://github.com/NVIDIA/cudnn-frontend/blob/main/include/cudnn_frontend_Tensor.h#L37), so my guess is that it's using the default layout (might be NCHW). I haven't been able to get this working yet; currently it can't find any engines, and I'm still debugging to find the problem. Please let me know if you have NCHW working.
Hi @masahi, this PR is working now. We do need the NHWC layout for the input, weight, and output.
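For readers less familiar with how NHWC is expressed in PyTorch: it is the channels_last memory format. A minimal sketch on plain fp32 tensors (how the quantized op consumes them is defined by this PR and is not shown here):

```python
import torch

x = torch.randn(1, 3, 224, 224)  # logical NCHW shape
x_nhwc = x.contiguous(memory_format=torch.channels_last)

# The shape stays (N, C, H, W); only the strides change so channels are innermost.
print(x_nhwc.is_contiguous(memory_format=torch.channels_last))  # True
print(x_nhwc.shape, x_nhwc.stride())
```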
Summary: This is the initial PR to add eager mode quantized GPU operator support. We'll start with convolution, following the cudnn fp32 Conv code and the example cudnn frontend code: #51390, https://github.com/NVIDIA/cudnn-frontend/blob/main/samples/fusion_sample.cpp#L557

TODO:
1. Support bias, relu, and more parameter flexibility
2. Use the packed_params API

Test Plan:
```
> USE_EXPERIMENTAL_CUDNN_V8_API=1 python setup.py install
> python test/test_quantization.py TestQuantizedConv.test_qconv2d_cudnn
```

Reviewers: Subscribers: Tasks: Tags:

Differential Revision: [D33409155](https://our.internmc.facebook.com/intern/diff/D33409155)

[ghstack-poisoned]
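For reference on the packed_params TODO above, this is roughly how the existing CPU quantized conv ops are driven through prepacked parameters. The sketch assumes a CPU quantized backend (e.g. fbgemm) is available; the cudnn path in this PR may end up with a different packing interface:

```python
import torch

x = torch.randn(1, 3, 8, 8)
w = torch.randn(4, 3, 3, 3)
bias = torch.zeros(4)

# CPU quantized conv expects quint8 activations and qint8 weights.
qx = torch.quantize_per_tensor(x, 0.05, 64, torch.quint8)
qw = torch.quantize_per_tensor(w, 0.02, 0, torch.qint8)

# Pack weight + bias together with the conv hyperparameters
# (stride, padding, dilation, groups), then run the packed conv.
packed = torch.ops.quantized.conv2d_prepack(qw, bias, [1, 1], [1, 1], [1, 1], 1)
qy = torch.ops.quantized.conv2d(qx, packed, 0.1, 64)
print(qy.shape)
```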
cc @eqy |
@jerryzh168 Would it be possible to reuse the v8 methods that are planned to be introduced with the V8 API convolutions? e.g., in ...
…operator in cudnn [PR currently incomplete]"

Summary: This PR is similar to #70622, but for the linear operator. Unlike PR 70622, this implementation directly uses packed parameters (rather than via a refactor, as was done for the conv operator) and also directly implements bias & relu. Currently, int8 matrix multiplication is not supported in cudnn; the ETA for this support is the first half of April 2022. As a temporary workaround, we cast our int8 tensors to fp32 prior to the matmul.

Test plan:
```
python test/test_quantization.py TestQuantizedLinear.test_qlinear_cudnn
```

Differential Revision: [D34824251](https://our.internmc.facebook.com/intern/diff/D34824251)

[ghstack-poisoned]
Summary: Pull Request resolved: #73959. This PR is similar to #70622, but for the linear operator. Unlike PR 70622, this implementation directly uses packed parameters (rather than via a refactor, as was done for the conv operator) and also directly implements bias & relu. Currently, int8 matrix multiplication is not supported in cudnn; the ETA for this support is the first half of April 2022. As a temporary workaround, we cast our int8 tensors to fp32 prior to the matmul.

Test Plan:
```
python test/test_quantization.py TestQuantizedLinear.test_qlinear_cudnn
```

Imported from OSS

Differential Revision: D34824251

Reviewed By: jerryzh168

Pulled By: dzdang

fbshipit-source-id: 47139796782ade8d030ba2f9968a9abdd3a91d2f (cherry picked from commit eade369)
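A conceptual Python sketch of the fp32-matmul workaround described above, phrased in terms of dequantize/requantize; the actual implementation builds this inside the cudnn graph, so the details of how the raw int8 values are cast may differ:

```python
import torch
import torch.nn.functional as F

def qlinear_via_fp32_matmul(qx, qw, bias, out_scale, out_zero_point):
    # Workaround while cudnn lacks int8 matmul: move to fp32, do the matmul,
    # then requantize to int8 with the output quantization parameters.
    y = F.linear(qx.dequantize(), qw.dequantize(), bias)
    return torch.quantize_per_tensor(y, out_scale, out_zero_point, torch.qint8)

qx = torch.quantize_per_tensor(torch.randn(2, 8), 0.1, 0, torch.qint8)
qw = torch.quantize_per_tensor(torch.randn(4, 8), 0.05, 0, torch.qint8)
out = qlinear_via_fp32_matmul(qx, qw, torch.zeros(4), 0.2, 0)
print(out.int_repr())
```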
Stack from ghstack:
Summary:
This is the initial PR to add eager mode quantized GPU operator support. We'll start
with convolution, following the cudnn fp32 Conv code and the example cudnn frontend code:
#51390
https://github.com/NVIDIA/cudnn-frontend/blob/main/samples/fusion_sample.cpp#L557
TODO:
1. Support bias, relu, and more parameter flexibility
2. Use the packed_params API
Test Plan:
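```
> USE_EXPERIMENTAL_CUDNN_V8_API=1 python setup.py install
> python test/test_quantization.py TestQuantizedConv.test_qconv2d_cudnn
```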
debug command:
Reviewers:
Subscribers:
Tasks:
Tags:
Differential Revision: D33409155