[quant][gpu] Adding quantized conv operator in cudnn by jerryzh168 · Pull Request #70622 · pytorch/pytorch · GitHub

Conversation

@jerryzh168
Contributor

@jerryzh168 jerryzh168 commented Jan 4, 2022

Stack from ghstack:

Summary:
This is the initial PR adding eager mode quantized GPU operator support. We start with convolution,
following the cuDNN fp32 conv code and the example cuDNN frontend code:
#51390
https://github.com/NVIDIA/cudnn-frontend/blob/main/samples/fusion_sample.cpp#L557

TODO:

  1. Support bias and relu, and support more flexible parameters
  2. Use the packed_params API

Test Plan:

> USE_EXPERIMENTAL_CUDNN_V8_API=1 python setup.py install
> python test/test_quantization.py TestQuantizedConv.test_qconv2d_cudnn

Debug command:

CUDNN_LOGINFO_DBG=1 CUDNN_LOGWARN_DBG=1 CUDNN_LOGERR_DBG=1 CUDNN_LOGDEST_DBG=stdout python test/test_quantization.py TestQuantizedConv.test_qconv2d_cudnn > log
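
For context, quantized conv tests typically check the new kernel against a dequantize → fp32 conv → requantize reference. The sketch below is only an illustration of that pattern with made-up scales and shapes; it is not the cuDNN path added by this PR.

```
import torch
import torch.nn.functional as F

# Illustrative reference pattern only (hypothetical scales/shapes),
# not the cuDNN kernel introduced in this PR.
x = torch.randn(1, 3, 8, 8)
w = torch.randn(4, 3, 3, 3)

# Per-tensor affine quantization of input and weight to int8.
qx = torch.quantize_per_tensor(x, scale=0.05, zero_point=0, dtype=torch.qint8)
qw = torch.quantize_per_tensor(w, scale=0.02, zero_point=0, dtype=torch.qint8)

# Reference: dequantize, run an fp32 conv, then requantize the output.
y_fp32 = F.conv2d(qx.dequantize(), qw.dequantize(), stride=1, padding=1)
qy_ref = torch.quantize_per_tensor(y_fp32, scale=0.1, zero_point=0, dtype=torch.qint8)
print(qy_ref.int_repr().shape)  # torch.Size([1, 4, 8, 8])
```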

Reviewers:

Subscribers:

Tasks:

Tags:

Differential Revision: D33409155

@pytorch-probot

pytorch-probot bot commented Jan 4, 2022

CI Flow Status

⚛️ CI Flow

Ruleset - Version: v1
Ruleset - File: https://github.com/pytorch/pytorch/blob/a3efda64fe7ccf9cf163166434931ba64464e543/.github/generated-ciflow-ruleset.json
PR ciflow labels: ciflow/default

Workflow — Labels (bold = enabled) — Status
Triggered Workflows
linux-bionic-py3.7-clang9 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/noarch, ciflow/trunk ✅ triggered
linux-docs ciflow/all, ciflow/cpu, ciflow/default, ciflow/docs, ciflow/linux, ciflow/trunk ✅ triggered
linux-vulkan-bionic-py3.7-clang9 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk, ciflow/vulkan ✅ triggered
linux-xenial-cuda11.3-py3.7-gcc7 ciflow/all, ciflow/cuda, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
linux-xenial-cuda11.3-py3.7-gcc7-bazel-test ciflow/all, ciflow/bazel, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
linux-xenial-py3-clang5-mobile-build ciflow/all, ciflow/default, ciflow/linux, ciflow/mobile, ciflow/trunk ✅ triggered
linux-xenial-py3-clang5-mobile-custom-build-static ciflow/all, ciflow/default, ciflow/linux, ciflow/mobile, ciflow/trunk ✅ triggered
linux-xenial-py3.7-clang7-asan ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/sanitizers, ciflow/trunk ✅ triggered
linux-xenial-py3.7-clang7-onnx ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/onnx, ciflow/trunk ✅ triggered
linux-xenial-py3.7-gcc5.4 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
linux-xenial-py3.7-gcc7 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
linux-xenial-py3.7-gcc7-no-ops ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single ciflow/all, ciflow/android, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit ciflow/all, ciflow/android, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
win-vs2019-cpu-py3 ciflow/all, ciflow/cpu, ciflow/default, ciflow/trunk, ciflow/win ✅ triggered
win-vs2019-cuda11.3-py3 ciflow/all, ciflow/cuda, ciflow/default, ciflow/trunk, ciflow/win ✅ triggered
Skipped Workflows
caffe2-linux-xenial-py3.7-gcc5.4 ciflow/all, ciflow/cpu, ciflow/linux, ciflow/trunk 🚫 skipped
docker-builds ciflow/all, ciflow/trunk 🚫 skipped
ios-12-5-1-arm64 ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk 🚫 skipped
ios-12-5-1-arm64-coreml ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk 🚫 skipped
ios-12-5-1-arm64-custom-ops ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk 🚫 skipped
ios-12-5-1-arm64-full-jit ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk 🚫 skipped
ios-12-5-1-arm64-metal ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk 🚫 skipped
ios-12-5-1-x86-64 ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk 🚫 skipped
ios-12-5-1-x86-64-coreml ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk 🚫 skipped
ios-12-5-1-x86-64-full-jit ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk 🚫 skipped
libtorch-linux-xenial-cuda10.2-py3.7-gcc7 ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/trunk 🚫 skipped
libtorch-linux-xenial-cuda11.3-py3.7-gcc7 ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/trunk 🚫 skipped
linux-binary-conda ciflow/binaries, ciflow/binaries/conda 🚫 skipped
linux-binary-libtorch-cxx11-abi ciflow/binaries, ciflow/binaries/libtorch 🚫 skipped
linux-binary-libtorch-pre-cxx11 ciflow/binaries, ciflow/binaries/libtorch 🚫 skipped
linux-binary-manywheel ciflow/binaries, ciflow/binaries/wheel 🚫 skipped
linux-bionic-cuda10.2-py3.9-gcc7 ciflow/all, ciflow/cuda, ciflow/linux, ciflow/slow, ciflow/trunk 🚫 skipped
linux-bionic-py3.6-clang9 ciflow/xla 🚫 skipped
linux-docs-push ciflow/all, ciflow/cpu, ciflow/linux, ciflow/scheduled 🚫 skipped
linux-xenial-cuda11.3-py3.7-gcc7-no-ops ciflow/all, ciflow/cuda, ciflow/linux, ciflow/trunk 🚫 skipped
macos-10-15-py3-arm64 ciflow/all, ciflow/macos, ciflow/trunk 🚫 skipped
macos-10-15-py3-lite-interpreter-x86-64 ciflow/all, ciflow/macos, ciflow/trunk 🚫 skipped
macos-11-py3-x86-64 ciflow/all, ciflow/macos, ciflow/trunk 🚫 skipped
parallelnative-linux-xenial-py3.7-gcc5.4 ciflow/all, ciflow/cpu, ciflow/linux, ciflow/trunk 🚫 skipped
periodic-libtorch-linux-bionic-cuda11.5-py3.7-gcc7 ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/scheduled 🚫 skipped
periodic-libtorch-linux-xenial-cuda11.1-py3.7-gcc7 ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/scheduled 🚫 skipped
periodic-linux-bionic-cuda11.5-py3.7-gcc7 ciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled 🚫 skipped
periodic-linux-xenial-cuda10.2-py3-gcc7-slow-gradcheck ciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled, ciflow/slow, ciflow/slow-gradcheck 🚫 skipped
periodic-linux-xenial-cuda11.1-py3.7-gcc7-debug ciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled 🚫 skipped
periodic-win-vs2019-cuda11.1-py3 ciflow/all, ciflow/cuda, ciflow/scheduled, ciflow/win 🚫 skipped
periodic-win-vs2019-cuda11.5-py3 ciflow/all, ciflow/cuda, ciflow/scheduled, ciflow/win 🚫 skipped
pytorch-linux-xenial-py3-clang5-android-ndk-r19c-build ciflow/all, ciflow/android, ciflow/cpu, ciflow/linux, ciflow/trunk 🚫 skipped

You can add a comment to the PR and tag @pytorchbot with the following commands:
# ciflow rerun, "ciflow/default" will always be added automatically
@pytorchbot ciflow rerun

# ciflow rerun with additional labels "-l <ciflow/label_name>", which is equivalent to adding these labels manually and triggering the rerun
@pytorchbot ciflow rerun -l ciflow/scheduled -l ciflow/slow

For more information, please take a look at the CI Flow Wiki.

@facebook-github-bot
Contributor

facebook-github-bot commented Jan 4, 2022

🔗 Helpful links

💊 CI failures summary and remediations

As of commit 343292c (more details on the Dr. CI page):


💚 💚 Looks good so far! There are no failures yet. 💚 💚


This comment was automatically generated by Dr. CI.

Please report bugs/suggestions to the (internal) Dr. CI Users group.


jerryzh168 added a commit that referenced this pull request Jan 4, 2022

ghstack-source-id: be41e0a
Pull Request resolved: #70622
@jerryzh168
Contributor Author

@jerryzh168 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

jerryzh168 added a commit that referenced this pull request Jan 11, 2022

ghstack-source-id: c76e3e4
Pull Request resolved: #70622
@masahi

masahi commented Jan 14, 2022

Hi @jerryzh168, what is your experience with cuDNN on int8 like? Recently I tried to support cuDNN int8 with the NHWC layout, but I got the impression that cuDNN does not support int32 output. What I need is int8 data, int8 weight -> int32 output, but I couldn't get it working.

I noticed that you are doing int8 data, int8 weight -> float32 output. I also tried that, but it doesn't work either (at least with NHWC input; I haven't tried NCHW).

@jerryzh168
Contributor Author

> Hi @jerryzh168, what is your experience with cuDNN on int8 like? Recently I tried to support cuDNN int8 with the NHWC layout, but I got the impression that cuDNN does not support int32 output. What I need is int8 data, int8 weight -> int32 output, but I couldn't get it working.
>
> I noticed that you are doing int8 data, int8 weight -> float32 output. I also tried that, but it doesn't work either (at least with NHWC input; I haven't tried NCHW).

Hi @masahi, I just started with cuDNN on int8; we are planning to support a native quantized GPU backend through cuDNN this half.

About the layout support: I'm using the cuDNN v8 APIs, and it looks like the data layout is not yet exposed in the API (https://github.com/NVIDIA/cudnn-frontend/blob/main/include/cudnn_frontend_Tensor.h#L37), so my guess is that it uses the default layout (which might be NCHW).

I haven't been able to get this working yet; currently it can't find any engines, and I'm still debugging to find the problem.

Please let me know if you get NCHW working.
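
For context on the layout question: the cuDNN v8 frontend describes tensors by their dims and strides rather than an explicit layout flag, and PyTorch likewise encodes NCHW vs NHWC purely in a tensor's strides. A small PyTorch illustration, independent of this PR:

```
import torch

# Default contiguous 4-D tensor: NCHW, innermost stride on W.
x_nchw = torch.empty(2, 8, 4, 4)
print(x_nchw.stride())   # (128, 16, 4, 1)

# Same logical shape laid out as NHWC (channels_last): channel stride is 1.
x_nhwc = x_nchw.contiguous(memory_format=torch.channels_last)
print(x_nhwc.stride())   # (128, 1, 32, 8)

# The shape is unchanged; only the strides (and hence the physical layout) differ.
assert x_nchw.shape == x_nhwc.shape
```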

jerryzh168 added a commit that referenced this pull request Jan 21, 2022

ghstack-source-id: 158e83f
Pull Request resolved: #70622
@jerryzh168 jerryzh168 requested a review from dzdang January 22, 2022 03:14
@jerryzh168
Contributor Author

> Hi @jerryzh168, what is your experience with cuDNN on int8 like? Recently I tried to support cuDNN int8 with the NHWC layout, but I got the impression that cuDNN does not support int32 output. What I need is int8 data, int8 weight -> int32 output, but I couldn't get it working.
>
> I noticed that you are doing int8 data, int8 weight -> float32 output. I also tried that, but it doesn't work either (at least with NHWC input; I haven't tried NCHW).

Hi @masahi, this PR is working now. We do need the NHWC layout for the input, weight, and output.
For dtypes, the operator dtype needs to be int32, and the output tensor dtype can be int32 or float, I think.
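
To make the dtype point concrete, here is a tiny, hypothetical illustration (not code from this PR): int8 × int8 products are accumulated in int32 to avoid overflow, and the int32 sum can then either be kept as int32 or rescaled to float using the input and weight scales.

```
import torch

# Hypothetical int8 dot product with int32 accumulation (illustration only).
a = torch.tensor([120, -100, 87], dtype=torch.int8)
w = torch.tensor([90, 110, -128], dtype=torch.int8)

acc_i32 = (a.to(torch.int32) * w.to(torch.int32)).sum(dtype=torch.int32)
print(acc_i32)  # tensor(-11336, dtype=torch.int32)

# With per-tensor scales, a float output is just a rescaling of the int32 sum.
a_scale, w_scale = 0.1, 0.05
out_fp32 = acc_i32.to(torch.float32) * (a_scale * w_scale)
print(out_fp32)  # tensor(-56.6800)
```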

@jerryzh168 jerryzh168 requested a review from zasdfgbnm January 22, 2022 03:15
jerryzh168 added a commit that referenced this pull request Jan 22, 2022

ghstack-source-id: 18e9c44
Pull Request resolved: #70622
@jerryzh168
Contributor Author

@jerryzh168 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@jerryzh168 jerryzh168 requested a review from vkuzo January 24, 2022 18:09
@zasdfgbnm
Collaborator

cc @eqy

@eqy
Collaborator

eqy commented Jan 24, 2022

@jerryzh168 Would it be possible to reuse v8 methods that are planned to be introduced with V8 API Convolutions? e.g., in
#60755
https://github.com/eqy/pytorch/blob/cudnn3/aten/src/ATen/native/cudnn/Conv_v8.cpp

dzdang added a commit that referenced this pull request Mar 29, 2022
…operator in cudnn [PR currently incomplete]"

Summary:
This PR is similar to #70622, but for the linear operator.
Unlike PR #70622, this implementation directly uses packed parameters rather than a refactorization (as was done for the conv operator), and it also directly implements bias & relu.
Currently, int8 matrix multiplication is not supported in cuDNN; the ETA for this support is the first half of April 2022. As a temporary workaround, we cast our int8 tensors to fp32 prior to the matmul.

Test plan:
```
python test/test_quantization.py TestQuantizedLinear.test_qlinear_cudnn
```

Differential Revision: [D34824251](https://our.internmc.facebook.com/intern/diff/D34824251)

[ghstack-poisoned]
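
A rough sketch of the fp32 fallback described above (a hypothetical illustration with made-up names and scales, not the actual cuDNN/packed-params code in the linear PR): cast the int8 operands to fp32, run matmul + bias + relu in fp32, then requantize to int8.

```
import torch

def qlinear_relu_fp32_fallback(x_int8, x_scale, x_zp,
                               w_int8, w_scale,
                               bias_fp32, out_scale, out_zp):
    # Hypothetical fallback while cuDNN lacks int8 matmul:
    # cast int8 -> fp32, compute matmul + bias + relu in fp32, requantize.
    x_fp32 = (x_int8.to(torch.float32) - x_zp) * x_scale
    w_fp32 = w_int8.to(torch.float32) * w_scale  # assumes symmetric weights (zero point 0)

    out_fp32 = torch.relu(x_fp32 @ w_fp32.t() + bias_fp32)

    # Requantize the fp32 result back to int8.
    q = torch.round(out_fp32 / out_scale) + out_zp
    return q.clamp(-128, 127).to(torch.int8)

# Tiny usage example with made-up scales and zero points.
x = torch.randint(-128, 128, (2, 4), dtype=torch.int8)
w = torch.randint(-128, 128, (3, 4), dtype=torch.int8)
y = qlinear_relu_fp32_fallback(x, 0.1, 0, w, 0.05, torch.zeros(3), 0.2, 0)
print(y.shape)  # torch.Size([2, 3])
```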
dzdang added a commit that referenced this pull request Mar 29, 2022

ghstack-source-id: 2e4a852
Pull Request resolved: #73959
dzdang added a commit that referenced this pull request Mar 31, 2022

ghstack-source-id: 586d55e
Pull Request resolved: #73959
dzdang added a commit that referenced this pull request Mar 31, 2022

ghstack-source-id: 0a2c6c1
Pull Request resolved: #73959
facebook-github-bot pushed a commit that referenced this pull request Apr 1, 2022
Summary:
Pull Request resolved: #73959

This PR is similar to #70622, but for the linear operator.
Unlike PR #70622, this implementation directly uses packed parameters rather than a refactorization (as was done for the conv operator), and it also directly implements bias & relu.
Currently, int8 matrix multiplication is not supported in cuDNN; the ETA for this support is the first half of April 2022. As a temporary workaround, we cast our int8 tensors to fp32 prior to the matmul.

Test Plan:
```
python test/test_quantization.py TestQuantizedLinear.test_qlinear_cudnn
```
Imported from OSS

Differential Revision: D34824251

Reviewed By: jerryzh168

Pulled By: dzdang

fbshipit-source-id: 47139796782ade8d030ba2f9968a9abdd3a91d2f
pytorchmergebot pushed a commit that referenced this pull request Apr 1, 2022

(cherry picked from commit eade369)


6 participants