[ROCm/Windows] Fix build failures and support some BLAS calls #161981
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/161981
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures
As of commit 188370b with merge base fca2601.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
cc @ScottTodd
I'll have more time to review these changes in detail later. In the meantime, can you link to some logs (github workflow history or local) showing what specific build failures these changes fix?
We may also want to check if these changes are compatible with both rocm-sdk (TheRock) and HIP SDK based Windows builds. cc @m-gallus
The first commit fixes a compiler error caused by a missing static keyword in the prototype. I'll do a rebuild without this commit and include the error logs when it's done.
The second commit fixes a runtime error caused by unimplemented CUDA_BLAS calls on Windows (ROCm/TheRock#1367).
This could also solve ROCm/TheRock#1314
> The first commit fixes a compiler error related to a missing static keyword in the prototype. I'll do a rebuild without this commit and include the error logs when it's done.
The error is:
D:/b/torch/csrc/distributed/c10d/FileStore.cpp(36,5): error: no previous prototype for function 'flock_' [-Werror,-Wmissing-prototypes]
36 | int flock_(int fd, int op) {
| ^
D:/b/torch/csrc/distributed/c10d/FileStore.cpp(36,1): note: declare 'static' if the function is not intended to be used outside of this translation unit
36 | int flock_(int fd, int op) {
| ^
| static
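For anyone hitting the same diagnostic, here is a minimal sketch of the pattern the compiler is suggesting. The function names and bodies below are hypothetical stand-ins, not the actual `flock_` implementation from FileStore.cpp; the only assumption is that the helper is used in a single translation unit, which is what the `note:` line above implies.

```cpp
// Sketch of the -Wmissing-prototypes diagnostic and its fix.
// lock_stub_ and unlock_stub_ are hypothetical stand-ins for a
// file-local helper such as flock_; their bodies are placeholders.

// Before (rejected under -Werror=missing-prototypes): a function with
// external linkage is defined without any earlier declaration.
//   int lock_stub_(int fd, int op) { ... }

// After: `static` gives the function internal linkage, so no prior
// prototype is expected and the diagnostic no longer fires.
static int lock_stub_(int fd, int op) {
  (void)op;                  // operation is ignored in this placeholder
  return fd >= 0 ? 0 : -1;   // succeed for non-negative descriptors
}

// An unnamed namespace is an alternative way to get internal linkage.
namespace {
int unlock_stub_(int fd) {
  return fd >= 0 ? 0 : -1;
}
}  // namespace
```

Either form keeps the helper private to its translation unit; adding `static` is the smaller change and matches the compiler's own suggestion.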
Thanks for letting me know, @ScottTodd. I'll have someone take a look.
Thanks. Our nightly Windows PyTorch builds in TheRock are running again and just ran into that error too: https://github.com/ROCm/TheRock/actions/runs/17433525478/job/49497325330
2025-09-03T12:58:05.3041175Z [3009/7072] Building CXX object caffe2\CMakeFiles\torch_cpu.dir\__\torch\csrc\distributed\c10d\FileStore.cpp.obj
2025-09-03T12:58:05.3041689Z FAILED: caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/distributed/c10d/FileStore.cpp.obj
2025-09-03T12:58:05.3052618Z C:\home\runner\_work\_tool\Python\3.12.10\x64\Lib\site-packages\_rocm_sdk_devel\lib\llvm\bin\clang-cl.exe /nologo -TP -DAT_PER_OPERATOR_HEADERS -DCAFFE2_BUILD_MAIN_LIB -DCPUINFO_SUPPORTED_PLATFORM=1 -DEXPORT_AOTI_FUNCTIONS -DFMT_HEADER_ONLY=1 -DMINIZ_DISABLE_ZIP_READER_CRC32_CHECKS -DNOMINMAX -DONNXIFI_ENABLE_EXT=1 -DONNX_ML=1 -DONNX_NAMESPACE=onnx_torch -DROCM_ON_WINDOWS -DROCM_VERSION=70000 -DTORCH_HIP_VERSION=701 -DUSE_EXTERNAL_MZCRC -DUSE_MIMALLOC -DUSE_ROCM -DWIN32_LEAN_AND_MEAN -DXNN_LOG_LEVEL=0 -D_CRT_SECURE_NO_DEPRECATE=1 -D_UCRT_LEGACY_INFINITY -D__HIP_PLATFORM_AMD__ -Dtorch_cpu_EXPORTS -IB:\src\torch\build\aten\src -IB:\src\torch\aten\src -IB:\src\torch\build -IB:\src\torch -IC:\home\runner\_work\_tool\Python\3.12.10\x64\Lib\site-packages\_rocm_sdk_devel\include -IB:\src\torch\nlohmann -IB:\src\torch\moodycamel -IB:\src\torch\third_party\mimalloc\include -IB:\src\torch\torch\csrc\api -IB:\src\torch\torch\csrc\api\include -IB:\src\torch\caffe2\aten\src\TH -IB:\src\torch\build\caffe2\aten\src\TH -IB:\src\torch\build\caffe2\aten\src -IB:\src\torch\build\caffe2\..\aten\src -IB:\src\torch\torch\csrc -IB:\src\torch\torch\headeronly -IB:\src\torch\third_party\miniz-3.0.2 -IB:\src\torch\third_party\kineto\libkineto\include -IB:\src\torch\third_party\cpp-httplib -IB:\src\torch\aten\src\ATen\.. -IB:\src\torch\c10\.. 
-IB:\src\torch\third_party\pthreadpool\include -IB:\src\torch\third_party\cpuinfo\include -IB:\src\torch\third_party\FP16\include -IB:\src\torch\third_party\fmt\include -IB:\src\torch\build\third_party\ideep\mkl-dnn\include -IB:\src\torch\third_party\ideep\mkl-dnn\src\..\include -IB:\src\torch\third_party\onnx -IB:\src\torch\build\third_party\onnx -IB:\src\torch\third_party\flatbuffers\include -imsvcB:\src\torch\third_party\protobuf\src -imsvcB:\src\torch\third_party\XNNPACK\include -imsvcB:\src\torch\third_party\ittapi\include -imsvcB:\src\torch\cmake\..\third_party\eigen -imsvcB:\src\torch\third_party\ideep\mkl-dnn\include\oneapi\dnnl -imsvcB:\src\torch\third_party\ideep\include -imsvcB:\src\torch\INTERFACE -imsvcB:\src\torch\third_party\nlohmann\include -imsvcB:\src\torch\third_party\concurrentqueue -imsvcB:\src\torch\build\include /DWIN32 /D_WINDOWS /EHsc /Zc:__cplusplus /bigobj /FS /utf-8 -DUSE_PTHREADPOOL -DNDEBUG -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE /wd4624 /wd4068 /wd4067 /wd4267 /wd4661 /wd4717 /wd4244 /wd4804 /wd4273 -DHAVE_AVX512_CPU_DEFINITION -DHAVE_AVX2_CPU_DEFINITION /Zc:preprocessor /Zc:preprocessor /O2 /Ob2 /DNDEBUG /bigobj -DNDEBUG -std:c++17 -MD /permissive- /EHsc /bigobj -O2 -Wmissing-prototypes -Werror=missing-prototypes -DONNX_BUILD_MAIN_LIB -Xclang -fopenmp /showIncludes /Focaffe2\CMakeFiles\torch_cpu.dir\__\torch\csrc\distributed\c10d\FileStore.cpp.obj /Fdcaffe2\CMakeFiles\torch_cpu.dir\ -c -- B:\src\torch\torch\csrc\distributed\c10d\FileStore.cpp
2025-09-03T12:58:05.3062232Z clang-cl: warning: argument unused during compilation: '/Zc:preprocessor' [-Wunused-command-line-argument]
2025-09-03T12:58:05.3062886Z clang-cl: warning: argument unused during compilation: '/Zc:preprocessor' [-Wunused-command-line-argument]
2025-09-03T12:58:05.3063438Z In file included from B:\src\torch\torch\csrc\distributed\c10d\FileStore.cpp:2:
2025-09-03T12:58:05.3063864Z In file included from B:\src\torch\torch/csrc/distributed/c10d/FileStore.hpp:8:
2025-09-03T12:58:05.3064278Z In file included from B:\src\torch\torch/csrc/distributed/c10d/Store.hpp:9:
2025-09-03T12:58:05.3064677Z In file included from B:\src\torch\torch/custom_class.h:3:
2025-09-03T12:58:05.3065105Z In file included from B:\src\torch\aten\src\ATen/core/builtin_function.h:3:
2025-09-03T12:58:05.3065548Z In file included from B:\src\torch\aten\src\ATen/core/function.h:3:
2025-09-03T12:58:05.3066451Z B:\src\torch\aten\src\ATen/core/function_schema.h(211,13): warning: 'dllexport' attribute only applies to functions, variables, classes, and Objective-C interfaces [-Wignored-attributes]
2025-09-03T12:58:05.3067260Z 211 | enum struct TORCH_API SchemaArgType { input, output };
2025-09-03T12:58:05.3067524Z | ^
2025-09-03T12:58:05.3067860Z B:\src\torch\torch/headeronly/macros/Export.h(98,19): note: expanded from macro 'TORCH_API'
2025-09-03T12:58:05.3068238Z 98 | #define TORCH_API C10_EXPORT
2025-09-03T12:58:05.3068444Z | ^
2025-09-03T12:58:05.3068779Z B:\src\torch\torch/headeronly/macros/Export.h(52,31): note: expanded from macro 'C10_EXPORT'
2025-09-03T12:58:05.3069160Z 52 | #define C10_EXPORT __declspec(dllexport)
2025-09-03T12:58:05.3069394Z | ^
2025-09-03T12:58:05.3069909Z B:\src\torch\torch\csrc\distributed\c10d\FileStore.cpp(36,5): error: no previous prototype for function 'flock_' [-Werror,-Wmissing-prototypes]
2025-09-03T12:58:05.3070432Z 36 | int flock_(int fd, int op) {
2025-09-03T12:58:05.3070640Z | ^
2025-09-03T12:58:05.3071127Z B:\src\torch\torch\csrc\distributed\c10d\FileStore.cpp(36,1): note: declare 'static' if the function is not intended to be used outside of this translation unit
2025-09-03T12:58:05.3071669Z 36 | int flock_(int fd, int op) {
2025-09-03T12:58:05.3071861Z | ^
2025-09-03T12:58:05.3071998Z | static
Great, thanks! Who can help with merging this? I've never merged a PyTorch PR before :D
Someone already approved some workflows, so the next step as I see it is for one of the reviewers (or @jeffdaily) to review, approve, and then trigger a merge.
I'm not sure how to add more reviewers. Hopefully Jeff sees this and approves ^^
Thanks a lot, @jeffdaily!
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
…h#161981)
* Support getrsBatched/geqrfBatched/gelsBatched on Windows ROCm (fixes ROCm/TheRock#1367)
* Fix windows pytorch build with USE_DISTRIBUTED=ON by default
Pull Request resolved: pytorch#161981
Approved by: https://github.com/ScottTodd, https://github.com/jeffdaily
Co-authored-by: Jeff Daily <jeff.daily@amd.com>
cc @H-Huang @awgu @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @pragupta @jeffdaily @sunway513 @jithunnair-amd @pruthvistony @ROCmSupport @dllehr-amd @jataylo @hongxiayang @naromero77amd