KEMBAR78
[ROCm][Windows] Fix finding ROCm/HIP version by tvukovic-amd · Pull Request #156486 · pytorch/pytorch · GitHub
Skip to content

Conversation

@tvukovic-amd
Copy link
Contributor

@tvukovic-amd tvukovic-amd commented Jun 20, 2025

This commit fixes Windows build issue related to trying to use rocm-core (rocm-core doesn't exist on HIP SDK)

cc @jeffdaily @sunway513 @jithunnair-amd @pruthvistony @ROCmSupport @dllehr-amd @jataylo @hongxiayang @naromero77amd

@pytorch-bot
Copy link

pytorch-bot bot commented Jun 20, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/156486

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit 4eb607d with merge base 6c5227b (image):

UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added the module: rocm AMD GPU support for Pytorch label Jun 20, 2025
@tvukovic-amd
Copy link
Contributor Author

@pytorchbot label "topic: not user facing"

@pytorch-bot pytorch-bot bot added the topic: not user facing topic category label Jun 20, 2025
@jerryzh168 jerryzh168 requested a review from jeffdaily June 21, 2025 00:32
@jerryzh168 jerryzh168 added the triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module label Jun 21, 2025
Copy link
Collaborator

@jeffdaily jeffdaily left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hip-sdk isn't the only windows build. we should avoid adding if(UNIX) and if(WIN32) to the rocm logic. We should find something that works on both platforms. Best-effort, reasonable fallbacks, etc.

@ScottTodd
Copy link
Contributor

hip-sdk isn't the only windows build. we should avoid adding if(UNIX) and if(WIN32) to the rocm logic. We should find something that works on both platforms. Best-effort, reasonable fallbacks, etc.

Thanks, TheRock includes rocm-core on Windows: https://github.com/ROCm/TheRock. We also have a rather involved patch for this LoadHIP.cmake file here: https://github.com/ROCm/TheRock/blob/main/external-builds/pytorch/patches/pytorch/v2.7.0/pytorch/hipified/0001-Rework-LoadHIP.cmake-to-be-based-purely-on-CMAKE_PRE.patch which I think was originally authored by @stellaraccident on ROCm/TheRock#101.

Copy link
Contributor

@stellaraccident stellaraccident left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No more hardcoded platform checks for hipsdk paths or features are permitted in pytorch. If you need these, you must parameterize them in some other way -- preferably by setting specific things in your own build invocation.

We're working to get all optional features that were left out of hipsdk available on Windows and we can't have this kind of patch anymore.

@tvukovic-amd
Copy link
Contributor Author

tvukovic-amd commented Jun 25, 2025

Thank you for explaining that we shouldn't use if(UNIX) and if(WIN32) in the ROCm logic in the future.

Here is the code change and explanation for the fix:

The following line caused issues on ROCm on Windows. This line sets ROCM_LIB_NAME to ROCM and changes it to HIP in case rocm-core/rocm_version.h is not found and hip/hip_version.h is found.

This is fine in case that we are passing once through this code. On Windows this part of code is executed twice during configuration, so on the second time when we are reaching this line ROCM_VERSION_HEADER_PATH is not empty and it has correct path to hip_version.h. In that case ROCM_LIB_NAME is not changed to HIP because it doesn't satisfy if statement which leads to error.

I committed fix without if(UNIX) code which solves this issue on Windows.

@tvukovic-amd tvukovic-amd changed the title [ROCm][Windows] Skip using rocm-core on Windows case [ROCm][Windows] Fix finding ROCm/HIP version Jun 25, 2025
@stellaraccident
Copy link
Contributor

stellaraccident commented Jun 25, 2025

Thank you for explaining that we shouldn't use if(UNIX) and if(WIN32) in the ROCm logic in the future.

Here is the code change and explanation for the fix:

The following line caused issues on ROCm on Windows. This line sets ROCM_LIB_NAME to ROCM and changes it to HIP in case rocm-core/rocm_version.h is not found and hip/hip_version.h is found.

This is fine in case that we are passing once through this code. On Windows this part of code is executed twice during configuration, so on the second time when we are reaching this line ROCM_VERSION_HEADER_PATH is not empty and it has correct path to hip_version.h. In that case ROCM_LIB_NAME is not changed to HIP because it doesn't satisfy if statement which leads to error.

I committed fix without if(UNIX) code which solves this issue on Windows.

It may solve the issue on hipsdk, but not on windows. I am willing to acknowledge that there is a bug, but the solution cannot be to fork around it for windows generally, as I believe this is only a problem with hipsdk.

As an aside, we will be removing the other if WIN32 diversions in this file today.

@tvukovic-amd
Copy link
Contributor Author

Thank you for explaining that we shouldn't use if(UNIX) and if(WIN32) in the ROCm logic in the future.
Here is the code change and explanation for the fix:
The following line caused issues on ROCm on Windows. This line sets ROCM_LIB_NAME to ROCM and changes it to HIP in case rocm-core/rocm_version.h is not found and hip/hip_version.h is found.
This is fine in case that we are passing once through this code. On Windows this part of code is executed twice during configuration, so on the second time when we are reaching this line ROCM_VERSION_HEADER_PATH is not empty and it has correct path to hip_version.h. In that case ROCM_LIB_NAME is not changed to HIP because it doesn't satisfy if statement which leads to error.
I committed fix without if(UNIX) code which solves this issue on Windows.

It may solve the issue on hipsdk, but not on windows. I am willing to acknowledge that there is a bug, but the solution cannot be to fork around it for windows generally, as I believe this is only a problem with hipsdk.

As an aside, we will be removing the other if WIN32 diversions in this file today.

I agree with you, so as you can see on the last commit we are not checking the OS. Instead, we are first searching for rocm_version.h and if it doesn't exist we are searching for hip_version.h and set ROCM_LIB_NAME to ROCM or HIP according to rocm_version.h or hip_version.h presence

Copy link
Contributor

@stellaraccident stellaraccident left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. This looks better to me.

@jeffdaily
Copy link
Collaborator

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Jun 30, 2025
@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@pytorchmergebot
Copy link
Collaborator

Merge failed

Reason: 1 jobs have failed, first few of them are: linux-binary-manywheel / manywheel-py3_9-rocm6_4-build / build

Details for Dev Infra team Raised by workflow job

@pytorch-bot pytorch-bot bot removed the ciflow/trunk Trigger trunk jobs on your pull request label Jul 2, 2025
@tvukovic-amd
Copy link
Contributor Author

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Jul 11, 2025
@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@pytorchmergebot
Copy link
Collaborator

Merge failed

Reason: 1 mandatory check(s) failed. The first few are:

Dig deeper by viewing the failures on hud

Details for Dev Infra team Raised by workflow job

Failing merge rule: Core Maintainers

@jeffdaily
Copy link
Collaborator

rocm linux build failed with these changes, must fix.

@pytorch-bot pytorch-bot bot removed the ciflow/trunk Trigger trunk jobs on your pull request label Jul 15, 2025
@jeffdaily
Copy link
Collaborator

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Jul 15, 2025
@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@pytorchmergebot
Copy link
Collaborator

Merge failed

Reason: 1 jobs have failed, first few of them are: trunk / linux-jammy-rocm-py3.10 / test (default, 2, 2, linux.rocm.gpu.2)

Details for Dev Infra team Raised by workflow job

@jeffdaily
Copy link
Collaborator

@pytorchbot merge

@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

tvukovic-amd added a commit to ROCm/pytorch that referenced this pull request Jul 23, 2025
This commit fixes Windows build issue related to trying to use rocm-core (rocm-core doesn't exist on HIP SDK)

Pull Request resolved: pytorch#156486
Approved by: https://github.com/jeffdaily, https://github.com/stellaraccident
tvukovic-amd added a commit to ROCm/pytorch that referenced this pull request Jul 23, 2025
This commit fixes Windows build issue related to trying to use rocm-core (rocm-core doesn't exist on HIP SDK)

Pull Request resolved: pytorch#156486
Approved by: https://github.com/jeffdaily, https://github.com/stellaraccident
tvukovic-amd added a commit to ROCm/pytorch that referenced this pull request Jul 25, 2025
This commit fixes Windows build issue related to trying to use rocm-core (rocm-core doesn't exist on HIP SDK)

Pull Request resolved: pytorch#156486
Approved by: https://github.com/jeffdaily, https://github.com/stellaraccident
tvukovic-amd added a commit to ROCm/pytorch that referenced this pull request Aug 25, 2025
This commit fixes Windows build issue related to trying to use rocm-core (rocm-core doesn't exist on HIP SDK)

Pull Request resolved: pytorch#156486
Approved by: https://github.com/jeffdaily, https://github.com/stellaraccident
jeffdaily pushed a commit to ROCm/pytorch that referenced this pull request Aug 25, 2025
This commit fixes Windows build issue related to trying to
rocm-core/rocm_version.h or hip/hip_version.h.
This is cherry-picked commit from upstream main
([PR](pytorch#156486))
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ciflow/trunk Trigger trunk jobs on your pull request Merged module: rocm AMD GPU support for Pytorch open source topic: not user facing topic category triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants