[cuDNN][SDPA] Update cuDNN grad output layout check #141147

eqy · 2024-11-20T18:49:32Z

Thanks to #137978 from @Skylion007, which bumps cuDNN to 9.5.1, the broken assumption that dO strides == O strides is fixed.

Note that there is still the restriction that the innermost stride of the grad output must be 1 (this is almost always satisfied, because the same condition is required of the input tensors). The main exception is test code that calls e.g. `.sum().backward()`, which yields grad output tensors with strides `[0, 0, 0, 0]`.
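To make the stride restriction concrete, here is a minimal sketch that models tensor strides as plain Python tuples rather than calling PyTorch or cuDNN. The helper names (`contiguous_strides`, `expanded_scalar_strides`, `permute`, `innermost_stride_is_one`) and the example shape are assumptions made for illustration; they are not functions from this PR or from PyTorch.

```python
from typing import Sequence, Tuple


def contiguous_strides(shape: Sequence[int]) -> Tuple[int, ...]:
    """Row-major (dense) strides, in elements, for the given shape."""
    strides = []
    running = 1
    for size in reversed(shape):
        strides.append(running)
        running *= size
    return tuple(reversed(strides))


def expanded_scalar_strides(shape: Sequence[int]) -> Tuple[int, ...]:
    """Strides of a single value broadcast to `shape`: every stride is 0."""
    return tuple(0 for _ in shape)


def permute(values: Sequence[int], order: Sequence[int]) -> Tuple[int, ...]:
    """Reorder per-dimension values, as a dimension permutation would."""
    return tuple(values[i] for i in order)


def innermost_stride_is_one(strides: Sequence[int]) -> bool:
    """The remaining restriction: the last-dimension stride must be 1."""
    return len(strides) > 0 and strides[-1] == 1


# Hypothetical sizes: batch=2, heads=3, seq_len=4, head_dim=8.
dense = contiguous_strides((2, 3, 4, 8))

# Memory laid out as (batch, seq_len, heads, head_dim), then viewed as
# (batch, heads, seq_len, head_dim). The strides differ from `dense`,
# but the innermost stride is still 1.
transposed = permute(contiguous_strides((2, 4, 3, 8)), (0, 2, 1, 3))

# Model of the gradient from a full reduction: one value broadcast to
# the output shape, so every stride is 0.
broadcast = expanded_scalar_strides((2, 3, 4, 8))

for name, strides in [("dense", dense), ("transposed", transposed), ("broadcast", broadcast)]:
    print(name, strides, innermost_stride_is_one(strides))
```

Only the broadcast case fails the check. In real test code, one way to avoid it is probably to pass an explicit dense gradient, e.g. `out.backward(torch.ones_like(out))`, instead of reducing with `.sum()` first.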

CC @drisspg

cc @csarofeen @ptrblck @xwang233 @drisspg @mikaylagawarecki

pytorch-bot · 2024-11-20T18:49:36Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/141147

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit the bot commands wiki or our office hours

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (3 Unrelated Failures)

As of commit 183acff with merge base 78491d6:

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

periodic / linux-focal-cuda11.8-py3.10-gcc9-debug / test (default, 2, 5, linux.4xlarge.nvidia.gpu, oncall:debug-build) (gh) (similar failure)
test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_linear_nt_dim_3_cuda
periodic / linux-focal-cuda11.8-py3.10-gcc9-debug / test (default, 3, 5, linux.4xlarge.nvidia.gpu, oncall:debug-build) (gh) (similar failure)
test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_nn_functional_linear_cuda_float32
periodic / linux-focal-cuda11.8-py3.10-gcc9-debug / test (default, 4, 5, linux.4xlarge.nvidia.gpu, oncall:debug-build) (gh) (similar failure)
test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_linear_backward_memory_usage_cuda_float32

This comment was automatically generated by Dr. CI and updates every 15 minutes.

drisspg · 2024-11-21T05:37:06Z

aten/src/ATen/native/cudnn/MHA.cpp

+  const auto innermost_dO_stride = dO.strides()[dO.strides().size() - 1];
+  if (innermost_dO_stride != 1) {
+    TORCH_WARN_ONCE(
+        "cuDNN SDPA backward got grad_output with an innermost stride != 1 "


I wonder if we should even warn once here. A lot of people still use this for testing, so this mostly just spams their logs, and there is nothing more efficient they can do.

Skylion007 · Nov 21, 2024

I mean, we can tell them to update to a version of PyTorch that supports cuDNN 9.5.1.

eqy · Nov 21, 2024

For this check, it's a known limitation due to the way the kernel is architected, and it is not planned to be fixed.

eqy · 2024-11-25T19:04:16Z

@pytorchmergebot rebase

pytorchmergebot · 2024-11-25T19:05:43Z

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

pytorchmergebot · 2024-11-25T19:05:46Z

Successfully rebased cudnn951dO onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout cudnn951dO && git pull --rebase)

eqy · 2024-11-25T20:53:55Z

@pytorchmergebot merge

pytorchmergebot · 2024-11-25T20:55:54Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging

Check the merge workflow status here

pytorchmergebot · 2024-11-25T23:37:58Z

Merge failed

Reason: 1 jobs have failed, first few of them are: periodic / linux-focal-cuda11.8-py3.9-gcc9 / test (multigpu, 1, 1, lf.linux.g5.12xlarge.nvidia.gpu, oncall:distributed)

Details for Dev Infra team

Raised by workflow job

eqy · 2024-11-25T23:53:10Z

@pytorchmergebot rebase

pytorchmergebot · 2024-11-25T23:54:38Z

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

pytorchmergebot · 2024-11-25T23:54:41Z

Successfully rebased cudnn951dO onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout cudnn951dO && git pull --rebase)

eqy · 2024-11-26T19:09:05Z

@pytorchmergebot merge

pytorchmergebot · 2024-11-26T19:10:52Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging

Check the merge workflow status here

@Skylion007

Thanks to pytorch#137978 from @Skylion007, which bumps cuDNN to 9.5.1, the broken assumption that dO strides == O strides is fixed.

Note that there is still the restriction that the innermost stride of the grad output must be 1 (this is almost always satisfied, because the same condition is required of the input tensors). The main exception is test code that calls e.g. `.sum().backward()`, which yields grad output tensors with strides `[0, 0, 0, 0]`.

CC @drisspg

Pull Request resolved: pytorch#141147
Approved by: https://github.com/drisspg

eqy added the labels module: cudnn (Related to torch.backends.cudnn, and CuDNN support), open source, topic: not user facing (topic category), and module: multi-headed-attention on Nov 20, 2024

eqy requested a review from syed-ahmed as a code owner November 20, 2024 18:49

drisspg reviewed Nov 20, 2024

aten/src/ATen/native/cudnn/MHA.cpp

drisspg reviewed Nov 21, 2024

drisspg approved these changes Nov 21, 2024

eqy added the labels ciflow/trunk (Trigger trunk jobs on your pull request) and ciflow/periodic (Trigger jobs ran periodically on master (periodic.yml) on the PR) on Nov 21, 2024

pytorchmergebot force-pushed the cudnn951dO branch from b99e438 to 6af9575 Compare November 25, 2024 19:05

pytorchmergebot added the merging label Nov 25, 2024

pytorchmergebot removed the merging label Nov 25, 2024

eqy added 3 commits November 25, 2024 23:54

update permute check

d827b26

Update MHA.cpp

d9457ac

lint

183acff

pytorchmergebot force-pushed the cudnn951dO branch from 6af9575 to 183acff Compare November 25, 2024 23:54

pytorchmergebot added the merging label Nov 26, 2024

pytorchmergebot added the Merged label Nov 26, 2024

pytorchmergebot closed this in 816ca98 Nov 26, 2024

pytorchmergebot removed the merging label Nov 26, 2024
