[cuDNN][Convolution] Disable cuDNN for 3D convolutions with kernel size != 1 for cuDNN 9.8+ by eqy · Pull Request #163581 · pytorch/pytorch · GitHub

Conversation

@eqy
Collaborator

@eqy eqy commented Sep 22, 2025

To work around #163539.

Still confirming whether 9.10 is affected. The original test states that the convolution is "large," but note that the input size does not appear to require 64-bit indexing.

cc @csarofeen @ptrblck @xwang233 @msaroufim @jerryzh168 @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10 @voznesenskym @penguinwu @EikanWang @Guobing-Chen @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben
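For context on the "64-bit indexing" remark above: a tensor typically needs 64-bit indexing only when its flat element count (or a reachable offset) exceeds INT32_MAX. A quick dependency-free check (the shapes below are hypothetical examples, not the sizes from the issue):

```python
# Rough check of whether a tensor's element count would overflow 32-bit
# indexing, the usual meaning of "requires 64-bit indexing".
# Shapes are hypothetical examples, not the sizes from the issue.
from math import prod

INT32_MAX = 2**31 - 1  # 2,147,483,647

def needs_64bit_indexing(shape):
    """True if a contiguous tensor of this shape has more than INT32_MAX elements."""
    return prod(shape) > INT32_MAX

print(needs_64bit_indexing((1, 16, 64, 512, 512)))     # ~268M elements: False
print(needs_64bit_indexing((8, 32, 128, 1024, 1024)))  # ~34B elements: True
```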

@eqy eqy added module: cudnn Related to torch.backends.cudnn, and CuDNN support module: cuda Related to torch.cuda, and CUDA support in general module: convolution Problems related to convolutions (THNN, THCUNN, CuDNN) open source topic: bug fixes topic category release notes: cudnn labels Sep 22, 2025
@pytorch-bot

pytorch-bot bot commented Sep 22, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/163581

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit b3318d7 with merge base cf28ab2:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added the module: cpu CPU specific problem (e.g., perf, algorithm) label Sep 22, 2025
@Skylion007 Skylion007 added this to the 2.9.0 milestone Sep 22, 2025
@ZejiaZheng

Disabling cuDNN for this specific op is unfortunately not a solution for us. Without cuDNN we quickly run into OOMs on this conv3d.

eqy and others added 2 commits September 23, 2025 13:58
Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>
}
// broken on cuDNN 9.8
if (cudnn_version >= 90800) {
  if (input.scalar_type() == at::kBFloat16 || input.scalar_type() == at::kHalf) {
Contributor

Is it broken just for inputs, or for weights as well? Also, don't we have a reduced_precision_float predicate?

Collaborator Author

I do not think we support convolutions that mix input and weight types; IIRC that fails checks.

Are you referring to reduced-precision reductions? Those are for cuBLAS matmuls only and control the use of split-K.

Or if you're referring to automatic mixed precision (AMP), that casts the weight and input before the computation.

}
}
}
if (!input.is_cuda() || !cudnn_enabled) {
Contributor

Q: Shouldn't this check be at the very beginning of the function?

Collaborator Author

Yes; I'm trying not to create a merge conflict for myself: #163171

@jbschlosser jbschlosser added the triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module label Sep 24, 2025
@eqy eqy changed the title [cuDNN][Convolution] Disable cuDNN for 3D convolutions with kernel size != 1 for cuDNN 9.8 - 9.13 [cuDNN][Convolution] Disable cuDNN for 3D convolutions with kernel size != 1 for cuDNN 9.8+ Sep 24, 2025
@eqy
Collaborator Author

eqy commented Sep 25, 2025

Note that we still need this to disable cuDNN for this case in general, as that is preferred over silent numerical incorrectness.

@eqy
Collaborator Author

eqy commented Sep 26, 2025

@pytorchbot cherry-pick --onto release/2.9 --fixes #163539 -c critical

Collaborator

@ngimel ngimel left a comment

Approving to unblock, but if the sizes in the issue don't require 64-bit indexing, how will that help?

@eqy
Collaborator Author

eqy commented Sep 26, 2025

Approving to unblock, but if the sizes in the issue don't require 64-bit indexing, how will that help?

Good catch! I fumbled an edit here.

Collaborator

@ngimel ngimel left a comment

Ouch, so should this disable cuDNN for all 3D cases?

if (cudnn_version >= 90800) {
  if (input.scalar_type() == at::kBFloat16 || input.scalar_type() == at::kHalf) {
    for (auto val : weight.sizes()) {
      if (val != 1) {
Collaborator

This also checks the channel dimensions of the weight, not just the filter extents? Also, this disables all cuDNN convolutions, not just 3D?

Collaborator Author

That's a good point, in theory it would only need to be disabled for channels-first cases...

if (cudnn_version >= 90800) {
  if (cudnn_conv_suggest_memory_format(input, weight) == at::MemoryFormat::Contiguous &&
      (input.scalar_type() == at::kBFloat16 || input.scalar_type() == at::kHalf)) {
    for (int i = 2; i < weight.dim(); i++) {
Collaborator

does it affect both 2d and 3d, or 3d only?

Collaborator Author

I'll check with the team to confirm, but my understanding is that it's 3D only, as the user who reported it said they could work around it with a 2D conv.
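The reported workaround is consistent with the math: a 3D convolution whose kernel depth is 1 is just an independent 2D convolution applied to each depth slice. A naive dependency-free sketch (single channel, valid padding, stride 1; illustrative only, not PyTorch code):

```python
# Naive reference convolutions (single channel, valid padding, stride 1) to
# illustrate why a conv3d with kernel depth 1 reduces to per-slice conv2d.
def conv2d(x, w):
    kh, kw = len(w), len(w[0])
    return [[sum(x[i + a][j + b] * w[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(len(x[0]) - kw + 1)]
            for i in range(len(x) - kh + 1)]

def conv3d(x, w):
    kd = len(w)
    out = []
    for d in range(len(x) - kd + 1):
        acc = None  # accumulate the 2D response of each depth tap
        for a in range(kd):
            plane = conv2d(x[d + a], w[a])
            acc = plane if acc is None else [
                [p + q for p, q in zip(r1, r2)] for r1, r2 in zip(acc, plane)]
        out.append(acc)
    return out

# With kernel depth 1, the 3D result is exactly the 2D result per slice.
volume = [[[1, 2, 3], [4, 5, 6], [7, 8, 9]],
          [[9, 8, 7], [6, 5, 4], [3, 2, 1]]]
kernel2d = [[1, 0], [0, 1]]
assert conv3d(volume, [kernel2d]) == [conv2d(s, kernel2d) for s in volume]
```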

Collaborator

So you need to also check weight.dim() == 5?
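Putting the review feedback together, the predicate this thread converges on can be sketched standalone. This is not the actual ATen code; `ScalarType` and `should_skip_cudnn` are illustrative names. Skip cuDNN only when the version is 9.8+, the dtype is half/bfloat16, the suggested memory format is contiguous (channels-first), the weight is 5-D (a 3D convolution), and some spatial kernel extent (dims 2..4, skipping the out/in channel dims) is != 1:

```cpp
// Standalone sketch of the fallback predicate; not the actual ATen code.
#include <cstdint>
#include <vector>

enum class ScalarType { Float, Half, BFloat16 };

bool should_skip_cudnn(int64_t cudnn_version,
                       ScalarType dtype,
                       bool channels_first,
                       const std::vector<int64_t>& weight_sizes) {
  if (cudnn_version < 90800) return false;
  if (dtype != ScalarType::Half && dtype != ScalarType::BFloat16) return false;
  if (!channels_first) return false;           // channels-last is unaffected
  if (weight_sizes.size() != 5) return false;  // 2D convs (4-D weights) are fine
  for (std::size_t i = 2; i < weight_sizes.size(); ++i) {
    if (weight_sizes[i] != 1) return true;     // non-trivial spatial kernel
  }
  return false;                                // 1x1x1 kernels are unaffected
}
```

Note that iterating from index 2 answers the earlier review point: the out-channel and in-channel dims of the weight are skipped, so only the spatial kernel extents are tested.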

@eqy eqy added the ciflow/trunk Trigger trunk jobs on your pull request label Sep 26, 2025
@eqy
Collaborator Author

eqy commented Sep 26, 2025

@pytorchmergebot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here.

@eqy
Collaborator Author

eqy commented Sep 26, 2025

@pytorchbot cherry-pick --onto release/2.9 --fixes #163539 -c critical

pytorchbot pushed a commit that referenced this pull request Sep 27, 2025
[cuDNN][Convolution] Disable cuDNN for 3D convolutions with kernel size != 1 for cuDNN 9.8+ (#163581)

To work around #163539

Still confirming whether 9.10 is affected. The original test states that the convolution is "large," but note that the input size does not appear to require 64-bit indexing.

Pull Request resolved: #163581
Approved by: https://github.com/ngimel, https://github.com/malfet

Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>
(cherry picked from commit e2817ac)
@pytorchbot
Collaborator

Cherry picking #163581

The cherry-pick PR is at #164027 and it is linked with issue #163539.

jainapurva pushed a commit that referenced this pull request Sep 29, 2025
[cuDNN][Convolution] Disable cuDNN for 3D convolutions with kernel size != 1 for cuDNN 9.8+ (#163581)
Camyll pushed a commit that referenced this pull request Sep 29, 2025
[cuDNN][Convolution] Disable cuDNN for 3D convolutions with kernel size != 1 for cuDNN 9.8+ (#164027)

(cherry picked from commit e2817ac)
maggiemoss pushed a commit to maggiemoss/pytorch that referenced this pull request Sep 29, 2025
[cuDNN][Convolution] Disable cuDNN for 3D convolutions with kernel size != 1 for cuDNN 9.8+ (pytorch#163581)

Labels

ciflow/inductor ciflow/trunk Trigger trunk jobs on your pull request Merged module: convolution Problems related to convolutions (THNN, THCUNN, CuDNN) module: cpu CPU specific problem (e.g., perf, algorithm) module: cuda Related to torch.cuda, and CUDA support in general module: cudnn Related to torch.backends.cudnn, and CuDNN support module: inductor open source release notes: cudnn topic: bug fixes topic category triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module

8 participants