[cuDNN][Convolution] Disable cuDNN for 3D convolutions with kernel size != 1 for cuDNN 9.8+ by eqy · Pull Request #163581 · pytorch/pytorch · GitHub

Conversation

@eqy
Collaborator

@eqy eqy commented Sep 22, 2025

To work around #163539.

Still confirming whether 9.10 is affected. The original test states that the convolution is "large," but note that the input size does not appear to require 64-bit indexing.

cc @csarofeen @ptrblck @xwang233 @msaroufim @jerryzh168 @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10 @voznesenskym @penguinwu @EikanWang @Guobing-Chen @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben
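For context on the "64-bit indexing" remark above: a tensor typically needs 64-bit indexing only when its flat element count (or a reachable offset) exceeds INT32_MAX. A quick dependency-free check (the shapes below are hypothetical examples, not the sizes from the issue):

```python
# Rough check of whether a tensor's element count would overflow 32-bit
# indexing, the usual meaning of "requires 64-bit indexing".
# Shapes are hypothetical examples, not the sizes from the issue.
from math import prod

INT32_MAX = 2**31 - 1  # 2,147,483,647

def needs_64bit_indexing(shape):
    """True if a contiguous tensor of this shape has more than INT32_MAX elements."""
    return prod(shape) > INT32_MAX

print(needs_64bit_indexing((1, 16, 64, 512, 512)))     # ~268M elements: False
print(needs_64bit_indexing((8, 32, 128, 1024, 1024)))  # ~34B elements: True
```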

@eqy eqy added module: cudnn Related to torch.backends.cudnn, and CuDNN support module: cuda Related to torch.cuda, and CUDA support in general module: convolution Problems related to convolutions (THNN, THCUNN, CuDNN) open source topic: bug fixes topic category release notes: cudnn labels Sep 22, 2025
@pytorch-bot

pytorch-bot bot commented Sep 22, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/163581

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit b3318d7 with merge base cf28ab2:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added the module: cpu CPU specific problem (e.g., perf, algorithm) label Sep 22, 2025
@Skylion007 Skylion007 added this to the 2.9.0 milestone Sep 22, 2025
@ZejiaZheng

Disabling cuDNN for this specific op is unfortunately not a solution for us. Without cuDNN we quickly run into OOMs on this conv3d.

eqy and others added 2 commits September 23, 2025 13:58
Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>
}
// broken on cuDNN 9.8
if (cudnn_version >= 90800) {
  if (input.scalar_type() == at::kBFloat16 || input.scalar_type() == at::kHalf) {
Contributor

Is it broken just for inputs, or for weights as well? Also, don't we have a reduced_precision_float predicate?

Collaborator Author

I do not think we support convolutions that mix input and weight types; IIRC that fails checks.

Are you referring to reduced-precision reductions? Those are for cuBLAS matmuls only and control the use of split-K.

Or if you're referring to automatic mixed precision (AMP), that casts the weight and input before the computation.

}
}
}
if (!input.is_cuda() || !cudnn_enabled) {
Contributor

Q: Shouldn't this check be at the very beginning of the function?

Collaborator Author

Yes; I'm trying not to create a merge conflict for myself: #163171

@jbschlosser jbschlosser added the triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module label Sep 24, 2025
@eqy eqy changed the title [cuDNN][Convolution] Disable cuDNN for 3D convolutions with kernel size != 1 for cuDNN 9.8 - 9.13 [cuDNN][Convolution] Disable cuDNN for 3D convolutions with kernel size != 1 for cuDNN 9.8+ Sep 24, 2025
@eqy
Collaborator Author

eqy commented Sep 25, 2025

Note that we still need this to disable cuDNN for this case in general, as that is preferred over silent numerical incorrectness.

@eqy
Collaborator Author

eqy commented Sep 26, 2025

@pytorchbot cherry-pick --onto release/2.9 --fixes #163539 -c critical

Collaborator

@ngimel ngimel left a comment

Approving to unblock, but if the sizes in the issue don't require 64-bit indexing, how will that help?

@eqy
Collaborator Author

eqy commented Sep 26, 2025

Approving to unblock, but if the sizes in the issue don't require 64-bit indexing, how will that help?

Good catch! I fumbled an edit here.

Collaborator

@ngimel ngimel left a comment

Ouch, so should this disable cuDNN for all 3D cases?

if (cudnn_version >= 90800) {
  if (input.scalar_type() == at::kBFloat16 || input.scalar_type() == at::kHalf) {
    for (auto val : weight.sizes()) {
      if (val != 1) {
Collaborator

This also checks the channel dimensions of the weight, not just the filter extents? Also, this disables all cuDNN convolutions, not just 3D?

Collaborator Author

That's a good point, in theory it would only need to be disabled for channels-first cases...

if (cudnn_version >= 90800) {
  if (cudnn_conv_suggest_memory_format(input, weight) == at::MemoryFormat::Contiguous &&
      (input.scalar_type() == at::kBFloat16 || input.scalar_type() == at::kHalf)) {
    for (int i = 2; i < weight.dim(); i++) {
Collaborator

does it affect both 2d and 3d, or 3d only?

Collaborator Author

I'll check with the team to confirm, but my understanding is that it's 3D only, as the user who reported it said they could work around it with a 2D conv.
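The reported workaround is consistent with the math: a 3D convolution whose kernel depth is 1 is just an independent 2D convolution applied to each depth slice. A naive dependency-free sketch (single channel, valid padding, stride 1; illustrative only, not PyTorch code):

```python
# Naive reference convolutions (single channel, valid padding, stride 1) to
# illustrate why a conv3d with kernel depth 1 reduces to per-slice conv2d.
def conv2d(x, w):
    kh, kw = len(w), len(w[0])
    return [[sum(x[i + a][j + b] * w[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(len(x[0]) - kw + 1)]
            for i in range(len(x) - kh + 1)]

def conv3d(x, w):
    kd = len(w)
    out = []
    for d in range(len(x) - kd + 1):
        acc = None  # accumulate the 2D response of each depth tap
        for a in range(kd):
            plane = conv2d(x[d + a], w[a])
            acc = plane if acc is None else [
                [p + q for p, q in zip(r1, r2)] for r1, r2 in zip(acc, plane)]
        out.append(acc)
    return out

# With kernel depth 1, the 3D result is exactly the 2D result per slice.
volume = [[[1, 2, 3], [4, 5, 6], [7, 8, 9]],
          [[9, 8, 7], [6, 5, 4], [3, 2, 1]]]
kernel2d = [[1, 0], [0, 1]]
assert conv3d(volume, [kernel2d]) == [conv2d(s, kernel2d) for s in volume]
```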

Collaborator

So you need to also check weight.dim() == 5?
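Putting the review feedback together, the predicate this thread converges on can be sketched standalone. This is not the actual ATen code; `ScalarType` and `should_skip_cudnn` are illustrative names. Skip cuDNN only when the version is 9.8+, the dtype is half/bfloat16, the suggested memory format is contiguous (channels-first), the weight is 5-D (a 3D convolution), and some spatial kernel extent (dims 2..4, skipping the out/in channel dims) is != 1:

```cpp
// Standalone sketch of the fallback predicate; not the actual ATen code.
#include <cstdint>
#include <vector>

enum class ScalarType { Float, Half, BFloat16 };

bool should_skip_cudnn(int64_t cudnn_version,
                       ScalarType dtype,
                       bool channels_first,
                       const std::vector<int64_t>& weight_sizes) {
  if (cudnn_version < 90800) return false;
  if (dtype != ScalarType::Half && dtype != ScalarType::BFloat16) return false;
  if (!channels_first) return false;           // channels-last is unaffected
  if (weight_sizes.size() != 5) return false;  // 2D convs (4-D weights) are fine
  for (std::size_t i = 2; i < weight_sizes.size(); ++i) {
    if (weight_sizes[i] != 1) return true;     // non-trivial spatial kernel
  }
  return false;                                // 1x1x1 kernels are unaffected
}
```

Note that iterating from index 2 answers the earlier review point: the out-channel and in-channel dims of the weight are skipped, so only the spatial kernel extents are tested.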

@eqy eqy added the ciflow/trunk Trigger trunk jobs on your pull request label Sep 26, 2025
@eqy
Collaborator Author

eqy commented Sep 26, 2025

@pytorchmergebot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here.

@eqy
Collaborator Author

eqy commented Sep 26, 2025

@pytorchbot cherry-pick --onto release/2.9 --fixes #163539 -c critical

pytorchbot pushed a commit that referenced this pull request Sep 27, 2025
[cuDNN][Convolution] Disable cuDNN for 3D convolutions with kernel size != 1 for cuDNN 9.8+ (#163581)

To work around #163539

Still confirming whether 9.10 is affected. The original test states that the convolution is "large," but note that the input size does not appear to require 64-bit indexing.

Pull Request resolved: #163581
Approved by: https://github.com/ngimel, https://github.com/malfet

Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>
(cherry picked from commit e2817ac)
@pytorchbot
Collaborator

Cherry picking #163581

The cherry-pick PR is at #164027 and it is linked with issue #163539.

jainapurva pushed a commit that referenced this pull request Sep 29, 2025
[cuDNN][Convolution] Disable cuDNN for 3D convolutions with kernel size != 1 for cuDNN 9.8+ (#163581)
Camyll pushed a commit that referenced this pull request Sep 29, 2025
[cuDNN][Convolution] Disable cuDNN for 3D convolutions with kernel size != 1 for cuDNN 9.8+ (#164027)

(cherry picked from commit e2817ac)
maggiemoss pushed a commit to maggiemoss/pytorch that referenced this pull request Sep 29, 2025
[cuDNN][Convolution] Disable cuDNN for 3D convolutions with kernel size != 1 for cuDNN 9.8+ (pytorch#163581)

Labels

ciflow/inductor ciflow/trunk Trigger trunk jobs on your pull request Merged module: convolution Problems related to convolutions (THNN, THCUNN, CuDNN) module: cpu CPU specific problem (e.g., perf, algorithm) module: cuda Related to torch.cuda, and CUDA support in general module: cudnn Related to torch.backends.cudnn, and CuDNN support module: inductor open source release notes: cudnn topic: bug fixes topic category triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module

8 participants