step 0 of cuDNN v8 convolution API integration #51390

zasdfgbnm · 2021-01-30T00:23:10Z

This PR is step 0 of adding PyTorch convolution bindings using the cuDNN frontend. The cuDNN frontend is the recommended way of using the cuDNN v8 API. It is supposed to have a faster release cycle than cuDNN itself: if, for example, someone finds a bug in a specific kernel, they can report it, the kernel can be blocked in the cuDNN frontend, and frameworks can simply update the submodule without waiting for a whole cuDNN release.

The work is not complete, and this PR is only step 0.

What this PR does:

- Add cudnn-frontend as a submodule.
- Modify cmake to build that submodule.
- Add bindings for convolution forward in Conv_v8.cpp, which are disabled by a macro by default.
- Tested manually by enabling the macro and running test_nn.py. All tests pass except those mentioned below.
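The gating described above can be pictured with a small sketch. This is not the code in Conv_v8.cpp; it is a Python illustration under stated assumptions: `USE_V8_API_DEFAULT`, `Direction`, and `select_conv_api` are hypothetical names standing in for the compile-time macro and the dispatch decision. It only encodes what this PR states: the v8 path is off by default, and only convolution forward has v8 bindings, so everything else stays on the v7 API.

```python
from enum import Enum


class Direction(Enum):
    FORWARD = "forward"
    BACKWARD_INPUT = "backward_input"
    BACKWARD_WEIGHT = "backward_weight"


# Hypothetical stand-in for the compile-time macro; off by default, as in this PR.
USE_V8_API_DEFAULT = False


def select_conv_api(direction: Direction, use_v8: bool = USE_V8_API_DEFAULT) -> str:
    """Return which cuDNN API generation would handle this convolution."""
    # Only the forward pass has v8 bindings at this step; the rest stays on v7.
    if use_v8 and direction is Direction.FORWARD:
        return "v8"
    return "v7"
```

With the flag left at its default, `select_conv_api(Direction.FORWARD)` picks the v7 path; passing `use_v8=True` switches only the forward direction to v8.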

What this PR doesn't do:

- Only convolution forward is implemented, not backward. The backward pass will use the v7 API.
- No 64-bit indexing support for some configurations. This is a known cuDNN issue that will be fixed in a later cuDNN version. PyTorch will not implement a workaround for this issue; instead, the v8 API should be disabled on problematic cuDNN versions.
- No testing beyond PyTorch's unit tests.
  - Not tested for correctness on real models.
  - Not benchmarked for performance.
- The benchmark cache is not thread-safe. (This is marked as FIXME in the code and will be fixed in a follow-up PR.)
- cuDNN benchmark is not supported.
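For the thread-safety item, one common fix is to guard the cache with a single lock. The sketch below shows that idea in Python; it is an assumption about what a follow-up could look like, not the planned implementation, and `LockedCache` and `get_or_create` are hypothetical names.

```python
import threading
from typing import Callable, Dict, Hashable


class LockedCache:
    """A dictionary-backed cache whose lookups and inserts hold one lock."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._entries: Dict[Hashable, object] = {}

    def get_or_create(self, key: Hashable, factory: Callable[[], object]) -> object:
        # Holding the lock across the lookup and the insert keeps two threads
        # from both building and storing an entry for the same key.
        with self._lock:
            if key not in self._entries:
                self._entries[key] = factory()
            return self._entries[key]
```

Holding the lock while `factory()` runs is the simplest correct choice, but it serializes every lookup. If building an entry is expensive, a per-key lock would reduce contention at the cost of more bookkeeping.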

There are failing tests, which will be resolved later:

FAILED test/test_nn.py::TestNNDeviceTypeCUDA::test_conv_cudnn_nhwc_cuda_float16 - AssertionError: False is not true : Tensors failed to compare as equal!With rtol=0.001 and atol=1e-05, found 32 element(s) (out of 32) whose difference(s) exceeded the margin of error (in...
FAILED test/test_nn.py::TestNNDeviceTypeCUDA::test_conv_cudnn_nhwc_cuda_float32 - AssertionError: False is not true : Tensors failed to compare as equal!With rtol=1.3e-06 and atol=1e-05, found 32 element(s) (out of 32) whose difference(s) exceeded the margin of error (...
FAILED test/test_nn.py::TestNNDeviceTypeCUDA::test_conv_large_cuda - RuntimeError: CUDNN_BACKEND_OPERATION: cudnnFinalize Failed cudnn_status: 9
FAILED test/test_nn.py::TestNN::test_Conv2d_depthwise_naive_groups_cuda - AssertionError: False is not true : Tensors failed to compare as equal!With rtol=0 and atol=1e-05, found 64 element(s) (out of 64) whose difference(s) exceeded the margin of error (including 0 an...
FAILED test/test_nn.py::TestNN::test_Conv2d_deterministic_cudnn - RuntimeError: not supported yet
FAILED test/test_nn.py::TestNN::test_ConvTranspose2d_groups_cuda_fp32 - RuntimeError: cuDNN error: CUDNN_STATUS_BAD_PARAM
FAILED test/test_nn.py::TestNN::test_ConvTranspose2d_groups_cuda_tf32 - RuntimeError: cuDNN error: CUDNN_STATUS_BAD_PARAM
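To read the "exceeded the margin of error" messages above, it helps to see how `rtol` and `atol` combine. The sketch below assumes the usual closeness criterion, `|actual - expected| <= atol + rtol * |expected|` (the one documented for `torch.isclose`); `count_mismatches` is a hypothetical helper written in plain Python, not the test suite's own comparison code.

```python
from typing import Sequence


def count_mismatches(
    actual: Sequence[float],
    expected: Sequence[float],
    rtol: float,
    atol: float,
) -> int:
    """Count elements whose absolute difference exceeds atol + rtol * |expected|."""
    return sum(
        1
        for a, e in zip(actual, expected)
        if abs(a - e) > atol + rtol * abs(e)
    )


# With the float16 tolerances quoted above (rtol=0.001, atol=1e-05):
# 1.0005 vs 1.0 is within the margin, 2.01 vs 2.0 is not.
print(count_mismatches([1.0005, 2.01], [1.0, 2.0], rtol=1e-3, atol=1e-5))  # prints 1
```

Note that with `rtol=0`, as in the depthwise test, the margin is a flat `atol` regardless of the magnitude of the expected value.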

Although this is not a complete implementation of the cuDNN v8 API bindings, I still want to merge it first. That would let me work in small, incremental steps, which makes both development and review easier.

…r in cudnn"

Summary: This PR is the initial PR to add eager mode quantized GPU operator support, we'll start with convolution, following cudnn fp32 Conv code and the example cudnn frontend code
#51390
https://github.com/NVIDIA/cudnn-frontend/blob/main/samples/fusion_sample.cpp#L557

TODO:
1. Support bias, relu, support more parameter flexibilities
2. Use the packed_prams api

Test Plan:
```
> USE_EXPERIMENTAL_CUDNN_V8_API=1 python setup.py install
> python test/test_quantization.py TestQuantizedConv.test_qconv2d_cudnn
```

debug command:
```
CUDNN_LOGINFO_DBG=1 CUDNN_LOGWARN_DBG=1 CUDNN_LOGERR_DBG=1 CUDNN_LOGDEST_DBG=stdout python test/test_quantization.py TestQuantizedConv.test_qconv2d_cudnn > log
```

Reviewers: Subscribers: Tasks: Tags:

Differential Revision: [D33409155](https://our.internmc.facebook.com/intern/diff/D33409155)

[ghstack-poisoned]

Summary: Pull Request resolved: #70622 This PR is the initial PR to add eager mode quantized GPU operator support, we'll start with convolution, following cudnn fp32 Conv code and the example cudnn frontend code #51390 https://github.com/NVIDIA/cudnn-frontend/blob/main/samples/fusion_sample.cpp#L557 Test Plan: python test/test_quantization.py TestQuantizedConv.test_qconv2d_cudnn Imported from OSS Reviewed By: vkuzo Differential Revision: D33409155 fbshipit-source-id: cb5183d274993fcd2c3ab6de8ae022baa9f89f7f

zasdfgbnm added 30 commits October 13, 2020 14:53

- b3c561c: cuDNN v8 API support
- 214fc19: save
- cb7b100: Merge branch 'master' of github.com:pytorch/pytorch into master
- b38a12c: Merge branch 'master' of github.com:pytorch/pytorch into master
- a4507ee: fix most
- 0963d8c: Merge branch 'master' of github.com:pytorch/pytorch into master
- 8475e5f: fix
- 520b3cf: save
- 18ef69d: support deterministic
- e1a2b7b: save
- 3eaa2d5: save
- c029396: Merge branch 'master' of github.com:pytorch/pytorch
- 46b43e2: save
- 9e4b897: rm fallback
- 664b1e8: align 16
- 1720314: cudnn-frontend
- 8b44a2d: Merge branch 'master' of github.com:pytorch/pytorch
- 6954575: Refactor
- db868bf: save
- c4d7c8d: fix
- 889447b: Merge branch 'master' of github.com:pytorch/pytorch
- cff2f25: save
- 94e2acf: save
- e1f8231: save
- baf403e: gh
- eb7ff58: save
- 6341ea4: save
- 6405b64: save
- 8e0a559: save
- a27cbba: save

eqy added a commit to eqy/pytorch that referenced this pull request Feb 4, 2022

change from pytorch#51390

446e2a5

eqy added a commit to eqy/pytorch that referenced this pull request Mar 1, 2022

change from pytorch#51390

f59f453

6 participants
