[ROCm/Windows] Fix build failures and support some BLAS calls #161981

jammm · 2025-09-02T15:45:08Z

Support getrsBatched/geqrfBatched/gelsBatched on Windows ROCm (fixes ROCm/TheRock#1367, "[Issue]: The uni_pc sampler is not supported in comfyui (not supported for HIP on Windows)")
Fix windows pytorch build with USE_DISTRIBUTED=ON by default
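
For readers unfamiliar with the routine names: in LAPACK naming, getrs solves A x = b from an existing LU factorization, geqrf computes a QR factorization, and gels solves a least-squares problem; the Batched variants do this for many small matrices in one call. The following is only a minimal CPU reference sketch of what a batched getrs computes. It is not the code touched by this PR and not the hipBLAS or PyTorch API; `LuSystem`, `solve_one`, and `solve_batched` are hypothetical names, and row-major storage is an assumption made for readability.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical container for one factored system (illustration only).
// `lu` holds the packed n x n LU factors in row-major order: the unit-lower
// L below the diagonal and U on and above it. `pivots` uses 1-based row
// indices, following the LAPACK convention.
struct LuSystem {
  std::size_t n;
  std::vector<double> lu;
  std::vector<int> pivots;
};

// Solve A x = b in place for a single system, given its LU factors.
void solve_one(const LuSystem& s, std::vector<double>& b) {
  // Apply the recorded row interchanges to the right-hand side.
  for (std::size_t i = 0; i < s.n; ++i) {
    std::swap(b[i], b[static_cast<std::size_t>(s.pivots[i] - 1)]);
  }
  // Forward substitution with the unit-lower factor L.
  for (std::size_t i = 0; i < s.n; ++i) {
    for (std::size_t j = 0; j < i; ++j) {
      b[i] -= s.lu[i * s.n + j] * b[j];
    }
  }
  // Back substitution with the upper factor U.
  for (std::size_t i = s.n; i-- > 0;) {
    for (std::size_t j = i + 1; j < s.n; ++j) {
      b[i] -= s.lu[i * s.n + j] * b[j];
    }
    b[i] /= s.lu[i * s.n + i];
  }
}

// "Batched" here simply means the same solve applied to every system.
void solve_batched(const std::vector<LuSystem>& batch,
                   std::vector<std::vector<double>>& rhs) {
  for (std::size_t k = 0; k < batch.size(); ++k) {
    solve_one(batch[k], rhs[k]);
  }
}
```

A GPU library performs the same per-matrix work in parallel across the batch, which is why the uni_pc sampler path fails at runtime when these calls are not implemented for a platform.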

cc @H-Huang @awgu @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @pragupta @jeffdaily @sunway513 @jithunnair-amd @pruthvistony @ROCmSupport @dllehr-amd @jataylo @hongxiayang @naromero77amd

pytorch-bot · 2025-09-02T15:45:13Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/161981

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit the bot commands wiki

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 188370b with merge base fca2601:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

jammm · 2025-09-02T15:46:10Z

cc @ScottTodd

ScottTodd · 2025-09-02T18:37:24Z

aten/src/ATen/cuda/CUDABlas.cpp

I'll have more time to review these changes in detail later. In the meantime, can you link to some logs (github workflow history or local) showing what specific build failures these changes fix?

We may also want to check if these changes are compatible with both rocm-sdk (TheRock) and HIP SDK based Windows builds. cc @m-gallus

The first commit fixes a compiler error related to a missing static keyword in the prototype. I'll do a rebuild without this commit and include the error logs when it's done.

The second commit fixes a runtime error caused by unimplemented CUDA_BLAS calls on Windows (ROCm/TheRock#1367).

This could also solve ROCm/TheRock#1314.

The error is:

D:/b/torch/csrc/distributed/c10d/FileStore.cpp(36,5): error: no previous prototype for function 'flock_' [-Werror,-Wmissing-prototypes]
   36 | int flock_(int fd, int op) {
      |     ^
D:/b/torch/csrc/distributed/c10d/FileStore.cpp(36,1): note: declare 'static' if the function is not intended to be used outside of this translation unit
   36 | int flock_(int fd, int op) {
      | ^
      | static
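
For context on the diagnostic above: clang's -Wmissing-prototypes fires when a function with external linkage is defined without a prior declaration, and -Werror turns that into a build failure. A minimal sketch of two common ways to keep a file-local helper out of its reach follows. The helper names and bodies are hypothetical stand-ins and do not reproduce the real FileStore.cpp code.

```cpp
// Hypothetical file-local helpers (illustration only, not FileStore.cpp).

// Option 1: `static` gives the function internal linkage, which is the
// fix the compiler note suggests for a helper used in one translation unit.
static int lock_stub(int fd, int op) {
  return (fd >= 0 && op >= 0) ? 0 : -1;
}

// Option 2: an unnamed namespace also gives internal linkage, so the
// definition should not need a separate prototype either.
namespace {
int unlock_stub(int fd) {
  return fd >= 0 ? 0 : -1;
}
} // namespace
```

If the function really is meant to be called from other translation units, the alternative is to declare it in a header that the defining file includes.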

Thanks for letting me know @ScottTodd , I'll have someone take a look.

Thanks. Our nightly Windows PyTorch builds in TheRock are running again and just ran into that error too: https://github.com/ROCm/TheRock/actions/runs/17433525478/job/49497325330

2025-09-03T12:58:05.3041175Z [3009/7072] Building CXX object caffe2\CMakeFiles\torch_cpu.dir\__\torch\csrc\distributed\c10d\FileStore.cpp.obj
2025-09-03T12:58:05.3041689Z FAILED: caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/c10d/FileStore.cpp.obj
2025-09-03T12:58:05.3052618Z C:\home\runner\_work\_tool\Python\3.12.10\x64\Lib\site-packages\_rocm_sdk_devel\lib\llvm\bin\clang-cl.exe /nologo -TP -DAT_PER_OPERATOR_HEADERS -DCAFFE2_BUILD_MAIN_LIB -DCPUINFO_SUPPORTED_PLATFORM=1 -DEXPORT_AOTI_FUNCTIONS -DFMT_HEADER_ONLY=1 -DMINIZ_DISABLE_ZIP_READER_CRC32_CHECKS -DNOMINMAX -DONNXIFI_ENABLE_EXT=1 -DONNX_ML=1 -DONNX_NAMESPACE=onnx_torch -DROCM_ON_WINDOWS -DROCM_VERSION=70000 -DTORCH_HIP_VERSION=701 -DUSE_EXTERNAL_MZCRC -DUSE_MIMALLOC -DUSE_ROCM -DWIN32_LEAN_AND_MEAN -DXNN_LOG_LEVEL=0 -D_CRT_SECURE_NO_DEPRECATE=1 -D_UCRT_LEGACY_INFINITY -D__HIP_PLATFORM_AMD__ -Dtorch_cpu_EXPORTS -IB:\src\torch\build\aten\src -IB:\src\torch\aten\src -IB:\src\torch\build -IB:\src\torch -IC:\home\runner\_work\_tool\Python\3.12.10\x64\Lib\site-packages\_rocm_sdk_devel\include -IB:\src\torch\nlohmann -IB:\src\torch\moodycamel -IB:\src\torch\third_party\mimalloc\include -IB:\src\torch\torch\csrc\api -IB:\src\torch\torch\csrc\api\include -IB:\src\torch\caffe2\aten\src\TH -IB:\src\torch\build\caffe2\aten\src\TH -IB:\src\torch\build\caffe2\aten\src -IB:\src\torch\build\caffe2\..\aten\src -IB:\src\torch\torch\csrc -IB:\src\torch\torch\headeronly -IB:\src\torch\third_party\miniz-3.0.2 -IB:\src\torch\third_party\kineto\libkineto\include -IB:\src\torch\third_party\cpp-httplib -IB:\src\torch\aten\src\ATen\.. -IB:\src\torch\c10\.. -IB:\src\torch\third_party\pthreadpool\include -IB:\src\torch\third_party\cpuinfo\include -IB:\src\torch\third_party\FP16\include -IB:\src\torch\third_party\fmt\include -IB:\src\torch\build\third_party\ideep\mkl-dnn\include -IB:\src\torch\third_party\ideep\mkl-dnn\src\..\include -IB:\src\torch\third_party\onnx -IB:\src\torch\build\third_party\onnx -IB:\src\torch\third_party\flatbuffers\include -imsvcB:\src\torch\third_party\protobuf\src -imsvcB:\src\torch\third_party\XNNPACK\include -imsvcB:\src\torch\third_party\ittapi\include -imsvcB:\src\torch\cmake\..\third_party\eigen -imsvcB:\src\torch\third_party\ideep\mkl-dnn\include\oneapi\dnnl -imsvcB:\src\torch\third_party\ideep\include -imsvcB:\src\torch\INTERFACE -imsvcB:\src\torch\third_party\nlohmann\include -imsvcB:\src\torch\third_party\concurrentqueue -imsvcB:\src\torch\build\include /DWIN32 /D_WINDOWS /EHsc /Zc:__cplusplus /bigobj /FS /utf-8 -DUSE_PTHREADPOOL -DNDEBUG -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE /wd4624 /wd4068 /wd4067 /wd4267 /wd4661 /wd4717 /wd4244 /wd4804 /wd4273 -DHAVE_AVX512_CPU_DEFINITION -DHAVE_AVX2_CPU_DEFINITION /Zc:preprocessor /Zc:preprocessor /O2 /Ob2 /DNDEBUG /bigobj -DNDEBUG -std:c++17 -MD /permissive- /EHsc /bigobj -O2 -Wmissing-prototypes -Werror=missing-prototypes -DONNX_BUILD_MAIN_LIB -Xclang -fopenmp /showIncludes /Focaffe2\CMakeFiles\torch_cpu.dir\__\torch\csrc\distributed\c10d\FileStore.cpp.obj /Fdcaffe2\CMakeFiles\torch_cpu.dir\ -c -- B:\src\torch\torch\csrc\distributed\c10d\FileStore.cpp
2025-09-03T12:58:05.3062232Z clang-cl: warning: argument unused during compilation: '/Zc:preprocessor' [-Wunused-command-line-argument]
2025-09-03T12:58:05.3062886Z clang-cl: warning: argument unused during compilation: '/Zc:preprocessor' [-Wunused-command-line-argument]
2025-09-03T12:58:05.3063438Z In file included from B:\src\torch\torch\csrc\distributed\c10d\FileStore.cpp:2:
2025-09-03T12:58:05.3063864Z In file included from B:\src\torch\torch/csrc/distributed/c10d/FileStore.hpp:8:
2025-09-03T12:58:05.3064278Z In file included from B:\src\torch\torch/csrc/distributed/c10d/Store.hpp:9:
2025-09-03T12:58:05.3064677Z In file included from B:\src\torch\torch/custom_class.h:3:
2025-09-03T12:58:05.3065105Z In file included from B:\src\torch\aten\src\ATen/core/builtin_function.h:3:
2025-09-03T12:58:05.3065548Z In file included from B:\src\torch\aten\src\ATen/core/function.h:3:
2025-09-03T12:58:05.3066451Z B:\src\torch\aten\src\ATen/core/function_schema.h(211,13): warning: 'dllexport' attribute only applies to functions, variables, classes, and Objective-C interfaces [-Wignored-attributes]
2025-09-03T12:58:05.3067260Z   211 | enum struct TORCH_API SchemaArgType { input, output };
2025-09-03T12:58:05.3067524Z       |             ^
2025-09-03T12:58:05.3067860Z B:\src\torch\torch/headeronly/macros/Export.h(98,19): note: expanded from macro 'TORCH_API'
2025-09-03T12:58:05.3068238Z    98 | #define TORCH_API C10_EXPORT
2025-09-03T12:58:05.3068444Z       |                   ^
2025-09-03T12:58:05.3068779Z B:\src\torch\torch/headeronly/macros/Export.h(52,31): note: expanded from macro 'C10_EXPORT'
2025-09-03T12:58:05.3069160Z    52 | #define C10_EXPORT __declspec(dllexport)
2025-09-03T12:58:05.3069394Z       |                               ^
2025-09-03T12:58:05.3069909Z B:\src\torch\torch\csrc\distributed\c10d\FileStore.cpp(36,5): error: no previous prototype for function 'flock_' [-Werror,-Wmissing-prototypes]
2025-09-03T12:58:05.3070432Z    36 | int flock_(int fd, int op) {
2025-09-03T12:58:05.3070640Z       |     ^
2025-09-03T12:58:05.3071127Z B:\src\torch\torch\csrc\distributed\c10d\FileStore.cpp(36,1): note: declare 'static' if the function is not intended to be used outside of this translation unit
2025-09-03T12:58:05.3071669Z    36 | int flock_(int fd, int op) {
2025-09-03T12:58:05.3071861Z       | ^
2025-09-03T12:58:05.3071998Z       | static

Great, thanks! Who can help with merging this? I've never merged a pytorch PR before :D

Someone already approved some workflows, so the next step as I see it is for one of the reviewers (or @jeffdaily ) to review, approve, and then trigger a merge.

I'm not sure how to add more reviewers. Hopefully Jeff sees this and approves ^^

thanks a lot @jeffdaily !

pytorch-bot · 2025-09-03T17:29:01Z

To add the ciflow label ciflow/rocm please first approve the workflows that are awaiting approval (scroll to the bottom of this page).

This helps ensure we don't trigger CI on this PR until it is actually authorized to do so. Please ping one of the reviewers if you do not have access to approve and run workflows.

jeffdaily · 2025-09-03T17:42:58Z

@pytorchbot merge

pytorchmergebot · 2025-09-03T17:44:50Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging

Check the merge workflow status here

…h#161981)
* Support getrsBatched/geqrfBatched/gelsBatched on Windows ROCm (fixes ROCm/TheRock#1367)
* Fix windows pytorch build with USE_DISTRIBUTED=ON by default
Pull Request resolved: pytorch#161981
Approved by: https://github.com/ScottTodd, https://github.com/jeffdaily
Co-authored-by: Jeff Daily <jeff.daily@amd.com>

jammm added 2 commits September 3, 2025 00:44

Support getrsBatched/geqrfBatched/gelsBatched on Windows ROCm

7c8622e

Fix windows pytorch build with USE_DISTRIBUTED=ON by default

396a12f

jammm requested review from eqy and syed-ahmed as code owners September 2, 2025 15:45

pytorch-bot bot added the module: rocm, oncall: distributed, and release notes: distributed (c10d) labels Sep 2, 2025

pytorchbot added the open source label Sep 2, 2025

jammm changed the title ~~[ROCm/Windows] Fix build failures~~ [ROCm/Windows] Fix build failures and support some CUDABLAS calls Sep 2, 2025

ScottTodd reviewed Sep 2, 2025

jammm requested a review from ScottTodd September 3, 2025 07:29

ScottTodd approved these changes Sep 3, 2025

zou3519 added the triaged label Sep 3, 2025

lint

188370b

jeffdaily approved these changes Sep 3, 2025

jeffdaily changed the title ~~[ROCm/Windows] Fix build failures and support some CUDABLAS calls~~ [ROCm/Windows] Fix build failures and support some BLAS calls Sep 3, 2025

pytorch-bot bot added the ciflow/rocm label Sep 3, 2025

pytorch-bot bot removed the ciflow/rocm label Sep 3, 2025

pytorch-bot bot added the ciflow/trunk label Sep 3, 2025

pytorchmergebot added the merging label Sep 3, 2025

ScottTodd mentioned this pull request Sep 3, 2025

Add patch to fix get{rf/lf/rs}Batched support ROCm/TheRock#1374

Closed

pytorchmergebot added the Merged label Sep 3, 2025

pytorchmergebot closed this in 8e23a12 Sep 3, 2025

pytorchmergebot removed the merging label Sep 3, 2025
