[CD] Add CUDA 13.0 x86 nightly builds #160956
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/160956
Note: Links to docs will display an error until the docs builds have been completed.
❌ 2 New Failures, 3 Pending as of commit b2ffcd6 with merge base 19c70c2. NEW FAILURES: The following jobs have failed:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@tinglvv the current error is related to NVSHMEM and sm_75:
Please note the Docker builds are using the correct version:
Does the NVSHMEM error occur for sm_75 only? If so, that would point to NVSHMEM 3.3.20 dropping support for this arch.
I don't see sm_75 mentioned here: https://docs.nvidia.com/nvshmem/release-notes-install-guide/release-notes/release-3320.html
Adding dependencies to unblock pytorch/pytorch#160956
https://download.pytorch.org/whl/nightly/cu130 is updated after pytorch/test-infra#7038. Rerunning the test.
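The nightly index URLs follow a cuXYZ naming pattern (13.0 maps to cu130, as in the URL above). A minimal sketch of that mapping; the helper name is an assumption for illustration, not part of any PyTorch tooling:

```python
# Hypothetical helper: derive the PyTorch nightly wheel index URL for a
# given CUDA version, following the cuXYZ naming used by the cu130 index
# referenced above (e.g. "13.0" -> ".../cu130").
def nightly_index_url(cuda_version: str) -> str:
    major, minor = cuda_version.split(".")[:2]
    return f"https://download.pytorch.org/whl/nightly/cu{major}{minor}"

print(nightly_index_url("13.0"))  # https://download.pytorch.org/whl/nightly/cu130
```

Wheels from that index can then be installed with e.g. `pip install --pre torch --index-url https://download.pytorch.org/whl/nightly/cu130`.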
Related to pytorch/pytorch#160956, follow-up for #7038. cc @atalman
Temporarily disabled sm_75 for NVSHMEM in the CUDA 13 build (3069af6); it will need to be re-enabled when the hotfix in 3.3.21 is released.
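Disabling one arch for a single component usually amounts to filtering the arch list before it is handed to that component's build. A sketch under assumptions: the helper is hypothetical, and the semicolon-separated list mirrors TORCH_CUDA_ARCH_LIST-style strings rather than the actual build-script code:

```python
# Illustrative only: drop sm_75 (compute capability 7.5) from a CUDA arch
# list before configuring NVSHMEM for the CUDA 13 build. The helper and
# its default are assumptions, not the change made in commit 3069af6.
def drop_arch(arch_list: str, unsupported: str = "7.5") -> str:
    archs = [a.strip() for a in arch_list.split(";") if a.strip()]
    return ";".join(a for a in archs if a != unsupported)

print(drop_arch("7.5;8.0;9.0;10.0"))  # 8.0;9.0;10.0
```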
@pytorchmergebot merge -f "signal looks good"
Merge started. Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Is this the only code change related to "disable nvshmem"?
Hi @kwen2501 , thanks for bringing this up. I updated to NVSHMEM 3.3.24 in a separate PR - #161321. |
The other PR looks good! For ARM, both issues are fixed (the glibc issue and the download link), so we can re-enable it. I'll open a PR to put it back.
Related to #159779. Adding CUDA 13.0 libtorch builds, follow-up after #160956. Removing CUDA 12.9 builds, see #159980.
Pull Request resolved: #161916
Approved by: https://github.com/jeanschmidt, https://github.com/Skylion007
Co-authored-by: Ting Lu <tingl@nvidia.com>
pytorch#159779
CUDA 13.0.0
NVSHMEM 3.3.20
CUDNN 9.12.0.46
Adding x86 Linux builds for CUDA 13. Adding libtorch Docker. Package naming changed for CUDA 13 (removed the -cu13 postfix for some packages).
Preparation checklist:
1. Update the index https://download.pytorch.org/whl/nightly/cu130 with PyPI packages
2. Update packaging names based on https://pypi.org/project/cuda-toolkit/ metadata
Pull Request resolved: pytorch#160956
Approved by: https://github.com/atalman
Co-authored-by: atalman <atalman@fb.com>
Undo changes introduced in #160956 as the driver has been updated to 580 for both fleets.
Fixes #163342
Pull Request resolved: #163349
Approved by: https://github.com/seemethere
Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>
#159779
CUDA 13.0.0
NVSHMEM 3.3.20
CUDNN 9.12.0.46
Adding x86 Linux builds for CUDA 13.
Adding sbsa docker.
Adding libtorch docker.
Package naming changed for CUDA 13 (removed postfix -cu13 for some packages).
Preparation checklist:
1. Update the index https://download.pytorch.org/whl/nightly/cu130 with PyPI packages
2. Update packaging names based on https://pypi.org/project/cuda-toolkit/ metadata
cc @ptrblck @nWEIdia @atalman @malfet @H-Huang @awgu @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @pragupta