Download pre-compiled AOTriton from GitHub unless AOTRITON_INSTALL_FROM_SOURCE=1 is set by jithunnair-amd · Pull Request #136603 · pytorch/pytorch · GitHub

Conversation

@jithunnair-amd
Collaborator

@jithunnair-amd jithunnair-amd commented Sep 25, 2024

PyTorch community members have reported issues with building PyTorch from source for ROCm in an environment that doesn't have aotriton pre-installed, because aotriton is only installed in the CI/manywheel docker images. Building aotriton from source can take ~45 minutes.

This PR fixes the issue by downloading the pre-compiled aotriton tarball in such scenarios, unless the user explicitly asks to build aotriton from source by setting the AOTRITON_INSTALL_FROM_SOURCE=1 environment variable.

Two more issues this PR addresses: #136603 (comment) and #136603 (comment)
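
To illustrate the selection described above, here is a minimal Python sketch of the three scenarios (pre-installed, build from source, download). It is not the PR's CMake code. The function name `choose_aotriton_source` is hypothetical, and the precedence order (a pre-installed copy found via `AOTRITON_INSTALLED_PREFIX` wins over `AOTRITON_INSTALL_FROM_SOURCE=1`) is an assumption rather than something this description states.

```python
import os


def choose_aotriton_source(env=None):
    """Return 'preinstalled', 'source', or 'download' for the given environment.

    Hypothetical helper for illustration only; the real logic lives in CMake.
    """
    env = os.environ if env is None else env

    # Assumption: an already installed aotriton takes precedence over everything.
    if env.get("AOTRITON_INSTALLED_PREFIX"):
        return "preinstalled"

    # Explicit opt-in to the slow (~45 minute) build from source.
    if env.get("AOTRITON_INSTALL_FROM_SOURCE") == "1":
        return "source"

    # Default introduced by this PR: fetch the pre-compiled tarball.
    return "download"


if __name__ == "__main__":
    print(choose_aotriton_source())
```

With neither variable set, the sketch falls through to the download path, which is the behavior this PR introduces for environments without a pre-installed aotriton.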

@pytorch-bot

pytorch-bot bot commented Sep 25, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/136603

Note: Links to docs will display an error until the docs builds have been completed.

❌ 5 New Failures, 6 Cancelled Jobs, 12 Unrelated Failures

As of commit e0c0948 with merge base e4cdc31 (image):

NEW FAILURES - The following jobs have failed:

CANCELLED JOBS - The following jobs were cancelled. Please retry:

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@jithunnair-amd added the release notes: rocm (mandatorylabel), ciflow/binaries (Trigger all binary build and upload jobs on the PR), and ciflow/trunk (Trigger trunk jobs on your pull request) labels Sep 25, 2024
@jithunnair-amd marked this pull request as ready for review September 25, 2024 20:11
@jithunnair-amd
Collaborator Author

jithunnair-amd commented Sep 26, 2024

The regular ROCm CI jobs and the manylinux build jobs succeeded; those exercise the preinstalled_aotriton scenario. I tested the build-from-source scenario and the download scenario (neither pre-installed nor built from source) locally, and both installed aotriton version 0.7b as expected.

install(DIRECTORY
        $ENV{AOTRITON_INSTALLED_PREFIX}/lib
        $ENV{AOTRITON_INSTALLED_PREFIX}/include
        DESTINATION ${__AOTRITON_INSTALL_DIR})
Collaborator Author

This snippet installs libaotriton_v2.so into the torch/lib directory so that it is packaged with the PyTorch wheel when PyTorch is built from source with a plain python setup.py bdist_wheel command, which makes the wheel portable.
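
One way to confirm the packaging claim above is to inspect the built wheel, since a wheel is a zip archive. The sketch below is a hypothetical check, not part of the PR. The helper name `wheel_bundles_aotriton` is invented for illustration, and the wheel path in the usage note is a placeholder.

```python
import zipfile


def wheel_bundles_aotriton(wheel_path, lib_name="libaotriton_v2.so"):
    """Return True if the wheel archive contains torch/lib/<lib_name>.

    Hypothetical helper: it only looks at archive member names and does not
    check that the library loads or matches any particular aotriton version.
    """
    target = f"torch/lib/{lib_name}"
    with zipfile.ZipFile(wheel_path) as wheel:
        return target in wheel.namelist()
```

For example, after running python setup.py bdist_wheel, calling `wheel_bundles_aotriton("dist/<your-wheel>.whl")` with the actual file name substituted should return True if the library was bundled.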

  INSTALL_DIR ${__AOTRITON_INSTALL_DIR}
  CMAKE_ARGS -DCMAKE_INSTALL_PREFIX:PATH=${__AOTRITON_INSTALL_DIR}
- -DAOTRITON_COMPRESS_KERNEL=OFF
+ -DAOTRITON_COMPRESS_KERNEL=ON
Collaborator Author

Kernel compression produces a smaller aotriton shared library, which avoids linker errors when building for a large number of gfx targets.

Contributor

@atalman atalman left a comment

lgtm

@jithunnair-amd
Collaborator Author

@pytorchbot merge -f "Unrelated CI failures"

@pytorchmergebot
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

@jithunnair-amd
Collaborator Author

@pytorchbot cherry-pick --onto release/2.5 -c critical

pytorchbot pushed a commit that referenced this pull request Sep 26, 2024
…OM_SOURCE=1 is set (#136603)

PyTorch community members have reported issues with building PyTorch from source for ROCm in an environment that doesn't have aotriton pre-installed, because aotriton is only installed in the [CI](https://github.com/pytorch/pytorch/blob/a8ed873ba2aa13437a336863ae9d73c235798acc/.ci/docker/ubuntu-rocm/Dockerfile#L110)/[manywheel](https://github.com/pytorch/pytorch/blob/a8ed873ba2aa13437a336863ae9d73c235798acc/.ci/docker/manywheel/Dockerfile#L197) docker images. Building aotriton from source can take ~45 minutes.

This PR fixes the issue by downloading the aotriton tarball in such scenarios, *unless the user explicitly wants to build aotriton from source using the AOTRITON_INSTALL_FROM_SOURCE=1 env var*

Pull Request resolved: #136603
Approved by: https://github.com/atalman

Co-authored-by: Xinya Zhang <Xinya.Zhang@amd.com>
(cherry picked from commit 851b973)
@pytorchbot
Collaborator

Cherry picking #136603

The cherry pick PR is at #136786 and it is recommended to link a critical cherry pick PR with an issue. The following tracker issues are updated:

@malfet
Contributor

malfet commented Sep 26, 2024

> @pytorchbot cherry-pick --onto release/2.5 -c critical

@jithunnair-amd can you please elaborate on why this is critical to cherry-pick into the release branch?

@jithunnair-amd
Collaborator Author

> @pytorchbot cherry-pick --onto release/2.5 -c critical
>
> @jithunnair-amd can you please elaborate on why this is critical to cherry-pick into the release branch?

It fixes the issues that PyTorch community users face when building PyTorch from source on ROCm. I assumed that would come under "critical".

@kit1980
Contributor

kit1980 commented Sep 27, 2024

Is building from source relevant for release really?

@jithunnair-amd
Collaborator Author

jithunnair-amd commented Sep 27, 2024

> Is building from source relevant for release really?

I'll go with the release management team's assessment on that. I assumed that we would want release branches to build from source successfully as well, but if that is only considered important for the main branch, we can close the other PR.

@jithunnair-amd jithunnair-amd deleted the jnair/aotriton-cmake-update-only branch September 28, 2024 16:51
Labels

ciflow/binaries (Trigger all binary build and upload jobs on the PR), ciflow/trunk (Trigger trunk jobs on your pull request), Merged, open source, release notes: rocm (mandatorylabel)

7 participants