[ROCm/Windows] Fix build failures and support some BLAS calls by jammm · Pull Request #161981 · pytorch/pytorch · GitHub
Conversation

@jammm
Contributor

@jammm jammm commented Sep 2, 2025

@jammm jammm requested review from eqy and syed-ahmed as code owners September 2, 2025 15:45
@pytorch-bot

pytorch-bot bot commented Sep 2, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/161981

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 188370b with merge base fca2601 (image):
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added module: rocm AMD GPU support for Pytorch oncall: distributed Add this issue/PR to distributed oncall triage queue release notes: distributed (c10d) release notes category labels Sep 2, 2025
@jammm
Contributor Author

jammm commented Sep 2, 2025

cc @ScottTodd

@jammm jammm changed the title [ROCm/Windows] Fix build failures [ROCm/Windows] Fix build failures and support some CUDABLAS calls Sep 2, 2025
Contributor

I'll have more time to review these changes in detail later. In the meantime, can you link to some logs (github workflow history or local) showing what specific build failures these changes fix?

We may also want to check if these changes are compatible with both rocm-sdk (TheRock) and HIP SDK based Windows builds. cc @m-gallus

Contributor Author

The first commit fixes a compiler error related to a missing static keyword in the prototype. I'll do a rebuild without this commit and include the error logs when it's done.

The second commit fixes a runtime error caused by unimplemented CUDA_BLAS calls on Windows (ROCm/TheRock#1367).

This could also solve ROCm/TheRock#1314

Contributor Author

The first commit fixes a compiler error related to a missing static keyword in the prototype. I'll do a rebuild without this commit and include the error logs when it's done.

The error is:

D:/b/torch/csrc/distributed/c10d/FileStore.cpp(36,5): error: no previous prototype for function 'flock_' [-Werror,-Wmissing-prototypes]
   36 | int flock_(int fd, int op) {
      |     ^
D:/b/torch/csrc/distributed/c10d/FileStore.cpp(36,1): note: declare 'static' if the function is not intended to be used outside of this translation unit
   36 | int flock_(int fd, int op) {
      | ^
      | static
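
For context, a minimal sketch of how this diagnostic is usually resolved. It is an illustration under assumptions, not the actual FileStore.cpp change: the names `flock_sketch`, `unlock_sketch`, and `lock_then_unlock` are hypothetical, and the function bodies are placeholders. With `-Wmissing-prototypes`, clang reports any function definition with external linkage that has no earlier declaration, so a helper used only in one file can be given internal linkage, while a function meant to be shared needs a prior prototype.

```cpp
// Sketch only: all names here are hypothetical, not the real FileStore.cpp code.

// Option 1: `static` gives the helper internal linkage, which is what the
// clang note suggests. No earlier prototype is required.
static int flock_sketch(int fd, int op) {
  // Placeholder body: a real implementation would lock the file here.
  return (fd >= 0 && op >= 0) ? 0 : -1;
}

// Option 2: an unnamed namespace also gives internal linkage.
namespace {
int unlock_sketch(int fd) {
  // Placeholder body: a real implementation would release the lock here.
  return fd >= 0 ? 0 : -1;
}
} // namespace

// A function that is meant to be visible to other translation units keeps
// external linkage, so it needs a declaration before its definition.
// Normally that declaration lives in a header.
int lock_then_unlock(int fd);

int lock_then_unlock(int fd) {
  if (flock_sketch(fd, 1) != 0) {
    return -1;
  }
  return unlock_sketch(fd);
}
```

Compiling a file like this with `clang++ -Wmissing-prototypes -Werror -c` should not trigger the error quoted above, since every definition either has internal linkage or follows a declaration.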

Contributor

Thanks for letting me know, @ScottTodd. I'll have someone take a look.

Contributor

Thanks. Our nightly Windows PyTorch builds in TheRock are running again and just ran into that error too: https://github.com/ROCm/TheRock/actions/runs/17433525478/job/49497325330

2025-09-03T12:58:05.3041175Z [3009/7072] Building CXX object caffe2\CMakeFiles\torch_cpu.dir\__\torch\csrc\distributed\c10d\FileStore.cpp.obj
2025-09-03T12:58:05.3041689Z FAILED: caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/c10d/FileStore.cpp.obj 
2025-09-03T12:58:05.3052618Z C:\home\runner\_work\_tool\Python\3.12.10\x64\Lib\site-packages\_rocm_sdk_devel\lib\llvm\bin\clang-cl.exe  /nologo -TP -DAT_PER_OPERATOR_HEADERS -DCAFFE2_BUILD_MAIN_LIB -DCPUINFO_SUPPORTED_PLATFORM=1 -DEXPORT_AOTI_FUNCTIONS -DFMT_HEADER_ONLY=1 -DMINIZ_DISABLE_ZIP_READER_CRC32_CHECKS -DNOMINMAX -DONNXIFI_ENABLE_EXT=1 -DONNX_ML=1 -DONNX_NAMESPACE=onnx_torch -DROCM_ON_WINDOWS -DROCM_VERSION=70000 -DTORCH_HIP_VERSION=701 -DUSE_EXTERNAL_MZCRC -DUSE_MIMALLOC -DUSE_ROCM -DWIN32_LEAN_AND_MEAN -DXNN_LOG_LEVEL=0 -D_CRT_SECURE_NO_DEPRECATE=1 -D_UCRT_LEGACY_INFINITY -D__HIP_PLATFORM_AMD__ -Dtorch_cpu_EXPORTS -IB:\src\torch\build\aten\src -IB:\src\torch\aten\src -IB:\src\torch\build -IB:\src\torch -IC:\home\runner\_work\_tool\Python\3.12.10\x64\Lib\site-packages\_rocm_sdk_devel\include -IB:\src\torch\nlohmann -IB:\src\torch\moodycamel -IB:\src\torch\third_party\mimalloc\include -IB:\src\torch\torch\csrc\api -IB:\src\torch\torch\csrc\api\include -IB:\src\torch\caffe2\aten\src\TH -IB:\src\torch\build\caffe2\aten\src\TH -IB:\src\torch\build\caffe2\aten\src -IB:\src\torch\build\caffe2\..\aten\src -IB:\src\torch\torch\csrc -IB:\src\torch\torch\headeronly -IB:\src\torch\third_party\miniz-3.0.2 -IB:\src\torch\third_party\kineto\libkineto\include -IB:\src\torch\third_party\cpp-httplib -IB:\src\torch\aten\src\ATen\.. -IB:\src\torch\c10\.. 
-IB:\src\torch\third_party\pthreadpool\include -IB:\src\torch\third_party\cpuinfo\include -IB:\src\torch\third_party\FP16\include -IB:\src\torch\third_party\fmt\include -IB:\src\torch\build\third_party\ideep\mkl-dnn\include -IB:\src\torch\third_party\ideep\mkl-dnn\src\..\include -IB:\src\torch\third_party\onnx -IB:\src\torch\build\third_party\onnx -IB:\src\torch\third_party\flatbuffers\include -imsvcB:\src\torch\third_party\protobuf\src -imsvcB:\src\torch\third_party\XNNPACK\include -imsvcB:\src\torch\third_party\ittapi\include -imsvcB:\src\torch\cmake\..\third_party\eigen -imsvcB:\src\torch\third_party\ideep\mkl-dnn\include\oneapi\dnnl -imsvcB:\src\torch\third_party\ideep\include -imsvcB:\src\torch\INTERFACE -imsvcB:\src\torch\third_party\nlohmann\include -imsvcB:\src\torch\third_party\concurrentqueue -imsvcB:\src\torch\build\include /DWIN32 /D_WINDOWS /EHsc /Zc:__cplusplus /bigobj /FS /utf-8 -DUSE_PTHREADPOOL -DNDEBUG -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE /wd4624 /wd4068 /wd4067 /wd4267 /wd4661 /wd4717 /wd4244 /wd4804 /wd4273 -DHAVE_AVX512_CPU_DEFINITION -DHAVE_AVX2_CPU_DEFINITION /Zc:preprocessor /Zc:preprocessor /O2 /Ob2 /DNDEBUG /bigobj -DNDEBUG -std:c++17 -MD /permissive- /EHsc /bigobj -O2 -Wmissing-prototypes -Werror=missing-prototypes -DONNX_BUILD_MAIN_LIB -Xclang -fopenmp /showIncludes /Focaffe2\CMakeFiles\torch_cpu.dir\__\torch\csrc\distributed\c10d\FileStore.cpp.obj /Fdcaffe2\CMakeFiles\torch_cpu.dir\ -c -- B:\src\torch\torch\csrc\distributed\c10d\FileStore.cpp
2025-09-03T12:58:05.3062232Z clang-cl: warning: argument unused during compilation: '/Zc:preprocessor' [-Wunused-command-line-argument]
2025-09-03T12:58:05.3062886Z clang-cl: warning: argument unused during compilation: '/Zc:preprocessor' [-Wunused-command-line-argument]
2025-09-03T12:58:05.3063438Z In file included from B:\src\torch\torch\csrc\distributed\c10d\FileStore.cpp:2:
2025-09-03T12:58:05.3063864Z In file included from B:\src\torch\torch/csrc/distributed/c10d/FileStore.hpp:8:
2025-09-03T12:58:05.3064278Z In file included from B:\src\torch\torch/csrc/distributed/c10d/Store.hpp:9:
2025-09-03T12:58:05.3064677Z In file included from B:\src\torch\torch/custom_class.h:3:
2025-09-03T12:58:05.3065105Z In file included from B:\src\torch\aten\src\ATen/core/builtin_function.h:3:
2025-09-03T12:58:05.3065548Z In file included from B:\src\torch\aten\src\ATen/core/function.h:3:
2025-09-03T12:58:05.3066451Z B:\src\torch\aten\src\ATen/core/function_schema.h(211,13): warning: 'dllexport' attribute only applies to functions, variables, classes, and Objective-C interfaces [-Wignored-attributes]
2025-09-03T12:58:05.3067260Z   211 | enum struct TORCH_API SchemaArgType { input, output };
2025-09-03T12:58:05.3067524Z       |             ^
2025-09-03T12:58:05.3067860Z B:\src\torch\torch/headeronly/macros/Export.h(98,19): note: expanded from macro 'TORCH_API'
2025-09-03T12:58:05.3068238Z    98 | #define TORCH_API C10_EXPORT
2025-09-03T12:58:05.3068444Z       |                   ^
2025-09-03T12:58:05.3068779Z B:\src\torch\torch/headeronly/macros/Export.h(52,31): note: expanded from macro 'C10_EXPORT'
2025-09-03T12:58:05.3069160Z    52 | #define C10_EXPORT __declspec(dllexport)
2025-09-03T12:58:05.3069394Z       |                               ^
2025-09-03T12:58:05.3069909Z B:\src\torch\torch\csrc\distributed\c10d\FileStore.cpp(36,5): error: no previous prototype for function 'flock_' [-Werror,-Wmissing-prototypes]
2025-09-03T12:58:05.3070432Z    36 | int flock_(int fd, int op) {
2025-09-03T12:58:05.3070640Z       |     ^
2025-09-03T12:58:05.3071127Z B:\src\torch\torch\csrc\distributed\c10d\FileStore.cpp(36,1): note: declare 'static' if the function is not intended to be used outside of this translation unit
2025-09-03T12:58:05.3071669Z    36 | int flock_(int fd, int op) {
2025-09-03T12:58:05.3071861Z       | ^
2025-09-03T12:58:05.3071998Z       | static 

Contributor Author

Great, thanks! Who can help with merging this? I've never merged a PyTorch PR before :D

Contributor

Someone already approved some workflows, so the next step as I see it is for one of the reviewers (or @jeffdaily) to review, approve, and then trigger a merge.

Contributor Author

I'm not sure how to add more reviewers. Hopefully Jeff sees this and approves ^^

Contributor Author

Thanks a lot, @jeffdaily!

@jammm jammm requested a review from ScottTodd September 3, 2025 07:29
@zou3519 zou3519 added the triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module label Sep 3, 2025
@jeffdaily jeffdaily changed the title [ROCm/Windows] Fix build failures and support some CUDABLAS calls [ROCm/Windows] Fix build failures and support some BLAS calls Sep 3, 2025
@pytorch-bot pytorch-bot bot added the ciflow/rocm Trigger "default" config CI on ROCm label Sep 3, 2025
@pytorch-bot

pytorch-bot bot commented Sep 3, 2025

To add the ciflow label ciflow/rocm please first approve the workflows that are awaiting approval (scroll to the bottom of this page).

This helps ensure we don't trigger CI on this PR until it is actually authorized to do so. Please ping one of the reviewers if you do not have access to approve and run workflows.

@pytorch-bot pytorch-bot bot removed the ciflow/rocm Trigger "default" config CI on ROCm label Sep 3, 2025
@jeffdaily
Collaborator

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Sep 3, 2025
@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

markc-614 pushed a commit to markc-614/pytorch that referenced this pull request Sep 17, 2025
…h#161981)

* Support getrsBatched/geqrfBatched/gelsBatched on Windows ROCm (fixes ROCm/TheRock#1367)
* Fix windows pytorch build with USE_DISTRIBUTED=ON by default

Pull Request resolved: pytorch#161981
Approved by: https://github.com/ScottTodd, https://github.com/jeffdaily

Co-authored-by: Jeff Daily <jeff.daily@amd.com>
mansiag05 pushed a commit to mansiag05/pytorch that referenced this pull request Sep 22, 2025
dsashidh pushed a commit to dsashidh/pytorch that referenced this pull request Sep 26, 2025
Labels

ciflow/trunk Trigger trunk jobs on your pull request Merged module: rocm AMD GPU support for Pytorch oncall: distributed Add this issue/PR to distributed oncall triage queue open source release notes: distributed (c10d) release notes category triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Issue]: The uni_pc sampler is not supported in comfyui(not supported for HIP on Windows)

8 participants