KEMBAR78
[ROCm] Allow use of rocSOLVER for Cholesky inversion. by naromero77amd · Pull Request #157154 · pytorch/pytorch · GitHub
Skip to content

Conversation

@naromero77amd
Copy link
Collaborator

@naromero77amd naromero77amd commented Jun 27, 2025

Fixes #155046

This change allows Cholesky inversion to use rocSOLVER. This is now also the default on ROCm for Cholesky inversion which aligns with the behavior on NVIDIA (which defaults to cuSOLVER for this linear algebra operation). This fix also gets around a memory access fault encountered in MAGMA for large matrices.

MAGMA can still be forced on ROCm by doing:

torch.backends.cuda.preferred_linalg_library(backend='magma')

Ran all Cholesky UT on ROCm and there were no regressions.

cc @jeffdaily @sunway513 @jithunnair-amd @pruthvistony @ROCmSupport @dllehr-amd @jataylo @hongxiayang

@pytorch-bot
Copy link

pytorch-bot bot commented Jun 27, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/157154

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit 3446548 with merge base d0cfa3e (image):

UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added the module: rocm AMD GPU support for Pytorch label Jun 27, 2025
@naromero77amd naromero77amd requested a review from jeffdaily June 27, 2025 20:05
@naromero77amd naromero77amd added the release notes: linalg_frontend release notes category label Jun 27, 2025
@jeffdaily jeffdaily added ciflow/rocm Trigger "default" config CI on ROCm ciflow/rocm-mi300 Trigger "default" config CI on ROCm MI300 labels Jun 27, 2025
@naromero77amd
Copy link
Collaborator Author

@pytorchbot rebase

@pytorchmergebot
Copy link
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot
Copy link
Collaborator

Successfully rebased fix_rocm_cholesky_inverse onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout fix_rocm_cholesky_inverse && git pull --rebase)

@pytorchmergebot pytorchmergebot force-pushed the fix_rocm_cholesky_inverse branch from 5db9a84 to 3446548 Compare June 27, 2025 23:21
@pytorch-bot pytorch-bot bot removed ciflow/rocm Trigger "default" config CI on ROCm ciflow/rocm-mi300 Trigger "default" config CI on ROCm MI300 labels Jun 27, 2025
@jeffdaily jeffdaily added ciflow/rocm Trigger "default" config CI on ROCm ciflow/rocm-mi300 Trigger "default" config CI on ROCm MI300 labels Jun 28, 2025
@jeffdaily
Copy link
Collaborator

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Jun 28, 2025
@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ciflow/rocm Trigger "default" config CI on ROCm ciflow/rocm-mi300 Trigger "default" config CI on ROCm MI300 ciflow/trunk Trigger trunk jobs on your pull request Merged module: rocm AMD GPU support for Pytorch open source release notes: linalg_frontend release notes category

Projects

None yet

Development

Successfully merging this pull request may close these issues.

ROCm: torch.cholesky_inverse raises Memory access fault for large tensor shapes

4 participants