cublaslt/hipblaslt persistent workspace by jeffdaily · Pull Request #156495 · pytorch/pytorch

Conversation

@jeffdaily
Collaborator

@jeffdaily jeffdaily commented Jun 20, 2025

Similar to cublas/hipblas, LT now allocates one workspace per handle+stream combo.

  • fixes hipblaslt issue where memory use increased during graph capture
  • preserves CUDA env var TORCH_CUBLASLT_UNIFIED_WORKSPACE
  • moves LT workspace and size from CUDABlas.cpp into CublasHandlePool.cpp, new APIs
    • size_t getCUDABlasLtWorkspaceSize()
    • void* getCUDABlasLtWorkspace()

Fixes ROCm#2286.

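To make the new APIs concrete, below is a minimal, self-contained C++ sketch of the per handle+stream caching pattern they expose. It is not the PyTorch implementation: LtHandle and Stream are stand-ins for cublasLtHandle_t and cudaStream_t/hipStream_t, getCurrentLtHandle() and getCurrentStream() are hypothetical helpers, the 1 MiB fallback is a placeholder, and the real code allocates through the CUDA caching allocator rather than new[]. Only the two function names and the CUBLASLT_WORKSPACE_SIZE override (value in KiB) correspond to PyTorch.

```cpp
// Minimal sketch of "one workspace per handle+stream combo" -- NOT the
// actual PyTorch code. LtHandle/Stream stand in for cublasLtHandle_t and
// cudaStream_t/hipStream_t; allocation uses new[] instead of the CUDA
// caching allocator.
#include <cstdio>
#include <cstdlib>
#include <map>
#include <memory>
#include <utility>

using LtHandle = void*;  // stand-in for cublasLtHandle_t
using Stream   = void*;  // stand-in for cudaStream_t / hipStream_t

// Dummy "current context" helpers so the sketch compiles on its own.
LtHandle getCurrentLtHandle() { static int h; return &h; }
Stream   getCurrentStream()   { static int s; return &s; }

// Workspace size, overridable via CUBLASLT_WORKSPACE_SIZE (value in KiB);
// the 1 MiB fallback is a placeholder, not PyTorch's real default logic.
size_t getCUDABlasLtWorkspaceSize() {
  static const size_t size = [] {
    if (const char* env = std::getenv("CUBLASLT_WORKSPACE_SIZE")) {
      return static_cast<size_t>(std::strtoull(env, nullptr, 10)) * 1024;
    }
    return static_cast<size_t>(1024 * 1024);
  }();
  return size;
}

// One lazily allocated workspace per (handle, stream) pair, cached for the
// lifetime of the process so repeated GEMMs (and graph capture) reuse the
// same buffer instead of allocating a new one on every call.
void* getCUDABlasLtWorkspace() {
  using Key = std::pair<LtHandle, Stream>;
  static std::map<Key, std::unique_ptr<char[]>> workspaces;
  auto& ws = workspaces[{getCurrentLtHandle(), getCurrentStream()}];
  if (!ws) {
    ws = std::make_unique<char[]>(getCUDABlasLtWorkspaceSize());
  }
  return ws.get();
}

int main() {
  void* a = getCUDABlasLtWorkspace();
  void* b = getCUDABlasLtWorkspace();
  std::printf("same buffer reused: %s\n", a == b ? "yes" : "no");
}
```

Keying the cache on (handle, stream) is what stops the memory growth during graph capture: replayed matmuls on the same stream reuse one workspace instead of allocating a fresh buffer on every call.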
@pytorch-bot

pytorch-bot bot commented Jun 20, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/156495

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 35928f5 with merge base 3644b41:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@jeffdaily jeffdaily marked this pull request as ready for review June 20, 2025 15:32
@jeffdaily jeffdaily requested review from eqy and syed-ahmed as code owners June 20, 2025 15:32
@jeffdaily jeffdaily added the ciflow/rocm label Jun 20, 2025
@jeffdaily jeffdaily added the release notes: rocm and release notes: cuda labels Jun 20, 2025
@jeffdaily
Collaborator Author

@eqy this PR touches NVIDIA-specific code and the unified cublas/cublaslt workspace option that you recently worked on. I would appreciate your review. This change was needed to fix memory behavior on ROCm, but it seemed like the correct thing to do was to make this change for CUDA as well.
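For context on the unified-workspace option mentioned above: the PR preserves TORCH_CUBLASLT_UNIFIED_WORKSPACE, which lets the cuBLASLt path share the regular cuBLAS workspace instead of keeping a separate one. The sketch below only illustrates what such a gate could look like; getCUDABlasWorkspace() and getDedicatedLtWorkspace() are hypothetical stubs, not confirmed PyTorch APIs, and the real wiring in CublasHandlePool.cpp may differ.

```cpp
// Illustration only: a hypothetical gate on TORCH_CUBLASLT_UNIFIED_WORKSPACE.
#include <cstdlib>
#include <cstring>

void* getCUDABlasWorkspace()    { static char buf[1]; return buf; }  // stand-in for the cuBLAS workspace
void* getDedicatedLtWorkspace() { static char buf[1]; return buf; }  // stand-in for the LT-only workspace

void* chooseLtWorkspace() {
  const char* env = std::getenv("TORCH_CUBLASLT_UNIFIED_WORKSPACE");
  const bool unified = env != nullptr && std::strcmp(env, "1") == 0;
  // Unified: LT matmuls reuse the cuBLAS workspace, so there is a single
  // buffer per handle+stream combo. Otherwise LT keeps its own buffer.
  return unified ? getCUDABlasWorkspace() : getDedicatedLtWorkspace();
}

int main() { return chooseLtWorkspace() != nullptr ? 0 : 1; }
```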

@eqy
Collaborator

eqy commented Jun 21, 2025

Sure, I'm OOTO until next Friday; can I take a look then, or is this blocking something on your end?

@jeffdaily
Collaborator Author

@eqy thanks for taking a look. Not blocking anything.

@mikaylagawarecki mikaylagawarecki added the triaged label Jun 26, 2025
@jeffdaily
Collaborator Author

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk label Jun 27, 2025
@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

@pytorchmergebot
Collaborator

Merge failed

Reason: 1 job failed; the first failure: trunk / cuda12.8-py3.10-gcc9-sm80 / build

Details for Dev Infra team (raised by workflow job)

@jeffdaily
Collaborator Author

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

@jeanschmidt
Contributor

@jeffdaily since this PR merged, a new error is happening in ROCm trunk tests; not sure if this is to blame:

https://github.com/pytorch/pytorch/actions/runs/15948855901/job/44987650811

Should we revert to evaluate?

@jeffdaily
Collaborator Author

@jeanschmidt the test is passing in the latest CI runs. Using the latest tip of main, I could not repro using the command from the CI link:

PYTORCH_TEST_WITH_ROCM=1 python test/test_cuda.py TestMemPool.test_mempool_limited_memory_with_allocator

pruthvistony pushed a commit to ROCm/pytorch that referenced this pull request Jul 15, 2025
Cherry-pick of pytorch#156495.

---------

Co-authored-by: Eddie Yan <eddiey@nvidia.com>
pragupta pushed a commit to ROCm/pytorch that referenced this pull request Jul 21, 2025

Pull Request resolved: pytorch#156495
Approved by: https://github.com/eqy

(cherry picked from commit 996206e)
pragupta pushed a commit to pragupta/pytorch that referenced this pull request Jul 21, 2025
pragupta pushed a commit to ROCm/pytorch that referenced this pull request Jul 22, 2025
jithunnair-amd pushed a commit to ROCm/pytorch that referenced this pull request Jul 22, 2025
pragupta pushed a commit to ROCm/pytorch that referenced this pull request Jul 29, 2025

Labels

ciflow/rocm, ciflow/trunk, Merged, open source, release notes: cuda, release notes: rocm, triaged


Development

Successfully merging this pull request may close these issues.

Memory issues in nn.Linear due to creation of an empty tensor

6 participants