[cuDNN][SDPA][submodule] Roll-back cuDNN frontend upgrade, update Meta registration by eqy · Pull Request #163104 · pytorch/pytorch · GitHub

@eqy
Collaborator

eqy commented Sep 16, 2025

For pytorch/torchtitan#1713

Also note that we will need to roll back the cuDNN frontend upgrade in 2.9, as it currently introduces a segmentation fault by assuming that tensors have their strides and sizes populated at graph creation time: https://github.com/NVIDIA/cudnn-frontend/blame/1a7b4b78db44712fb9707d21cd2e3179f1fd88b8/include/cudnn_frontend/node/sdpa_support_surface.h#L447

cc @csarofeen @ptrblck @xwang233
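The failure mode described above can be illustrated with a small, hypothetical Python model; it is not cudnn-frontend's actual code. A support check that indexes into a tensor's `dim`/`stride` lists silently assumes they were populated when the graph was built. In C++, unchecked indexing into an empty `std::vector` is undefined behavior and can segfault; in this sketch the same assumption surfaces as an `IndexError`. `TensorAttrs`, `last_dim_contiguous_unchecked`, and `last_dim_contiguous_checked` are illustrative names only.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TensorAttrs:
    """Hypothetical stand-in for a graph tensor whose shape info may be set later."""
    dim: List[int] = field(default_factory=list)
    stride: List[int] = field(default_factory=list)


def last_dim_contiguous_unchecked(t: TensorAttrs) -> bool:
    # Assumes strides are already populated at graph-creation time.
    # On an empty list this raises IndexError; the C++ analogue reads out of bounds.
    return t.stride[-1] == 1


def last_dim_contiguous_checked(t: TensorAttrs) -> Optional[bool]:
    # Defensive variant: return None ("unknown") until sizes and strides exist,
    # so the check can be deferred instead of reading missing data.
    if not t.dim or len(t.stride) != len(t.dim):
        return None
    return t.stride[-1] == 1


# A tensor created without shape info, as can happen at graph-creation time.
pending = TensorAttrs()
print(last_dim_contiguous_checked(pending))  # None: check deferred
try:
    last_dim_contiguous_unchecked(pending)
except IndexError:
    print("unchecked access failed on unpopulated strides")

# Once dims and strides are set (a contiguous [2, 4, 8, 64] tensor), both agree.
ready = TensorAttrs(dim=[2, 4, 8, 64], stride=[2048, 512, 64, 1])
print(last_dim_contiguous_unchecked(ready), last_dim_contiguous_checked(ready))
```

Returning `None` is just one way to express "not yet known". This PR itself sidesteps the problem by rolling back the frontend upgrade rather than patching any check.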

eqy requested a review from syed-ahmed as a code owner September 16, 2025 21:18
eqy added the module: cudnn, open source, topic: not user facing, and module: sdpa labels Sep 16, 2025
@pytorch-bot

pytorch-bot bot commented Sep 16, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/163104

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 8e7c7ae with merge base 5937861:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@eqy
Collaborator Author

eqy commented Sep 16, 2025

@pytorchbot cherry-pick --onto release/2.9 --fixes "fix cuDNN segfault in SDPA"

@pytorch-bot

pytorch-bot bot commented Sep 16, 2025

❌ 🤖 pytorchbot command failed:

@pytorchbot cherry-pick: error: the following arguments are required: -c/--classification

usage: @pytorchbot cherry-pick --onto ONTO [--fixes FIXES] -c
                               {regression,critical,fixnewfeature,docs,release}

Try @pytorchbot --help for more info.
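The usage line in this error shows why the first command was rejected: `-c/--classification` is required and must be one of five values. The Python `argparse` sketch below rebuilds that interface from the usage text alone. It is a hypothetical reconstruction for illustration, not pytorchbot's actual parser, and `build_cherry_pick_parser` and `try_parse` are illustrative names.

```python
import argparse
import shlex

# Choices copied from the usage string in the bot's reply.
CLASSIFICATIONS = ["regression", "critical", "fixnewfeature", "docs", "release"]


def build_cherry_pick_parser() -> argparse.ArgumentParser:
    # Hypothetical reconstruction: --onto and -c are required, --fixes is optional.
    parser = argparse.ArgumentParser(prog="@pytorchbot cherry-pick")
    parser.add_argument("--onto", required=True)
    parser.add_argument("--fixes")
    parser.add_argument("-c", "--classification", required=True, choices=CLASSIFICATIONS)
    return parser


def try_parse(command: str):
    """Return the parsed arguments, or None if argparse rejects the command."""
    parser = build_cherry_pick_parser()
    try:
        # shlex keeps the quoted --fixes message together as one token.
        return parser.parse_args(shlex.split(command))
    except SystemExit:
        # argparse prints usage and the error to stderr, then exits on bad input.
        return None


first = '--onto release/2.9 --fixes "fix cuDNN segfault in SDPA"'
second = first + " -c critical"
print(try_parse(first))  # None: -c/--classification is missing
print(try_parse(second).classification)  # critical
```

Run on the first command, argparse's own stderr message takes the same "prog: error: the following arguments are required: ..." shape as the bot's reply, which is why adding `-c critical` was enough for the retry.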

@eqy
Collaborator Author

eqy commented Sep 16, 2025

@pytorchbot cherry-pick --onto release/2.9 --fixes "fix cuDNN segfault in SDPA" -c critical

drisspg added this to the 2.9.0 milestone Sep 16, 2025
eqy added the ciflow/h100 and ciflow/trunk labels Sep 16, 2025
@eqy
Collaborator Author

eqy commented Sep 17, 2025

@pytorchmergebot merge

@pytorchmergebot
Collaborator

This PR updates submodules third_party/cudnn_frontend

If those updates are intentional, please add "submodule" keyword to PR title/description.
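The bot message above describes a simple gate: a PR that changes a submodule must say so with the keyword "submodule" in its title or description. A minimal Python sketch of that rule follows, assuming a case-insensitive substring match and a hard-coded submodule path list; both are assumptions, since pytorchmergebot's actual logic is not shown here. `touched_submodules` and `needs_submodule_keyword` are hypothetical helpers.

```python
from typing import Iterable, Set

# Assumption: submodule paths are known up front; only the one named
# in the bot message is listed here.
SUBMODULE_PATHS = {"third_party/cudnn_frontend"}


def touched_submodules(changed_paths: Iterable[str]) -> Set[str]:
    """Return the submodule paths that a PR's changed files fall under."""
    hits = set()
    for path in changed_paths:
        for sub in SUBMODULE_PATHS:
            if path == sub or path.startswith(sub + "/"):
                hits.add(sub)
    return hits


def needs_submodule_keyword(title: str, body: str, changed_paths: Iterable[str]) -> bool:
    """True if the PR updates a submodule but never mentions the keyword."""
    if not touched_submodules(changed_paths):
        return False
    text = f"{title}\n{body}".lower()
    return "submodule" not in text


changed = ["third_party/cudnn_frontend"]
body = "For pytorch/torchtitan#1713"
old_title = "[cuDNN][SDPA] Roll-back cuDNN frontend upgrade, update Meta registration"
new_title = "[cuDNN][SDPA][submodule] Roll-back cuDNN frontend upgrade, update Meta registration"
print(needs_submodule_keyword(old_title, body, changed))  # True: merge would be blocked
print(needs_submodule_keyword(new_title, body, changed))  # False: keyword present
```

This mirrors the sequence on this PR: the first merge command drew the bot's message, and the second went through after the title gained "[submodule]".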

eqy changed the title from "[cuDNN][SDPA] Roll-back cuDNN frontend upgrade, update Meta registration" to "[cuDNN][SDPA][submodule] Roll-back cuDNN frontend upgrade, update Meta registration" Sep 17, 2025
@eqy
Collaborator Author

eqy commented Sep 17, 2025

@pytorchmergebot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here.

eqy added a commit to eqy/pytorch that referenced this pull request Sep 18, 2025
…a registration (pytorch#163104)

Pull Request resolved: pytorch#163104
Approved by: https://github.com/drisspg
Camyll pushed a commit that referenced this pull request Sep 19, 2025
…de, update Met… (#163265)

[cuDNN][SDPA][submodule] Roll-back cuDNN frontend upgrade, update Meta registration (#163104)

mansiag05 pushed a commit to mansiag05/pytorch that referenced this pull request Sep 22, 2025
cleonard530 pushed a commit to cleonard530/pytorch that referenced this pull request Sep 22, 2025
dsashidh pushed a commit to dsashidh/pytorch that referenced this pull request Sep 26, 2025

Labels

ciflow/h100
ciflow/trunk (Trigger trunk jobs on your pull request)
Merged
module: cudnn (Related to torch.backends.cudnn, and CuDNN support)
module: sdpa (All things related to torch.nn.functional.scaled_dot_product_attention)
open source
topic: not user facing (topic category)

3 participants