[cherry-pick] Modify cuda aarch64 install for cudnn and nccl. Cleanup aarch64 cuda 12.6 docker #149540 by atalman · Pull Request #149624 · pytorch/pytorch · GitHub
Contributor

@atalman atalman commented Mar 20, 2025

Cherry-Pick of #149540 to release branch

…12.6 docker (pytorch#149540)

1. Use NCCL_VERSION=v2.26.2-1. Fixes the NCCL-related CUDA aarch64 failure seen here: https://github.com/pytorch/pytorch/actions/runs/13955856471/job/39066681549?pr=149443. After landing: pytorch#149351
TODO: Followup required to unify NCCL definitions across the x86 and aarch64 builds

3. Cleanup: remove older CUDA versions for aarch64 builds. CUDA 12.6 was removed by pytorch#148895
Pull Request resolved: pytorch#149540
Approved by: https://github.com/seemethere, https://github.com/malfet, https://github.com/nWEIdia
@atalman atalman requested review from a team and jeffdaily as code owners March 20, 2025 14:47

pytorch-bot bot commented Mar 20, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/149624

Note: Links to docs will display an error until the docs builds have been completed.

❌ 2 New Failures

As of commit e826eef with merge base 924a247:

NEW FAILURES - The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added the topic: not user facing topic category label Mar 20, 2025
rm -rf tmp_cusparselt
}
NCCL_VERSION=v2.26.2-1
CUDNN_VERSION=9.8.0.87
Collaborator

What about the common CUDNN install?

Contributor Author

@atalman atalman Mar 26, 2025

This applies only to CUDA 12.8 aarch64 builds. We should really remove this script and rely on install_cuda.sh instead. Will look into this.

Contributor

@Skylion007 this is a cherry-pick, but I agree, we should have a common install script on trunk
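
For illustration only, here is a minimal sketch of how a common install script could keep the version pins in one place for both architectures. The two version values come from the diff in this PR; the helper names `arch_label` and `describe_install`, and the `sbsa` label for aarch64, are assumptions made for this example and are not taken from install_cuda.sh or any script in the repository. The sketch only prints what it would install and downloads nothing.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: a single place to pin library versions for all architectures.
set -eu

# Version pins copied from the diff in this PR.
NCCL_VERSION="v2.26.2-1"
CUDNN_VERSION="9.8.0.87"

# Hypothetical helper: map a machine name (as printed by `uname -m`) to the
# label a download might use. The "sbsa" label is an assumption for this example.
arch_label() {
  case "$1" in
    x86_64) echo "x86_64" ;;
    aarch64 | arm64) echo "sbsa" ;;
    *)
      echo "unsupported architecture: $1" >&2
      return 1
      ;;
  esac
}

# Hypothetical helper: report what would be installed for a given architecture.
describe_install() {
  local arch
  arch="$(arch_label "$1")"
  echo "nccl=${NCCL_VERSION} cudnn=${CUDNN_VERSION} arch=${arch}"
}

describe_install aarch64
describe_install x86_64
```

Run as written, this prints `nccl=v2.26.2-1 cudnn=9.8.0.87 arch=sbsa` followed by `nccl=v2.26.2-1 cudnn=9.8.0.87 arch=x86_64`. Because both architectures read the same two variables, a version bump would touch one line instead of one line per script.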

@malfet malfet merged commit d80afc0 into pytorch:release/2.7 Mar 26, 2025
156 of 158 checks passed