Add CUDA 12.8 manywheel x86 Builds to Binaries Matrix #145792

tinglvv · 2025-01-27T23:21:58Z

#145570

Adding CUDA 12.8.0 x86 builds first.

TODO: resolve the libtorch build failure and add the libtorch build in #146084.

cc @atalman @malfet @ptrblck @nWEIdia

pytorch-bot · 2025-01-27T23:22:03Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/145792

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit the bot commands wiki or our office hours

Note: Links to docs will display an error until the docs builds have been completed.

⏳ No Failures, 121 Pending

As of commit d9d6492 with merge base 2af8767:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

.ci/manywheel/build_cuda.sh

tinglvv · 2025-01-28T18:54:43Z

Will keep the +PTX for nightlies.
Will remove it for the 2.7.0 stable release.
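For context on what the `+PTX` suffix changes: each `TORCH_CUDA_ARCH_LIST` entry becomes an nvcc `-gencode` target that embeds precompiled SASS for that architecture, and `+PTX` additionally embeds PTX so that GPUs newer than the listed architectures can JIT-compile the kernels. Below is a minimal sketch of that mapping. It assumes the `;`-separated format shown in `build_cuda.sh`. `arch_list_to_gencode` is a hypothetical helper, not PyTorch's actual CMake logic.

```python
from __future__ import annotations


def arch_list_to_gencode(arch_list: str) -> list[str]:
    """Translate a TORCH_CUDA_ARCH_LIST-style string into nvcc -gencode flags.

    Hypothetical helper for illustration: assumes ';'-separated entries such
    as '8.6' or '8.6+PTX', as in build_cuda.sh.
    """
    flags: list[str] = []
    for entry in filter(None, (e.strip() for e in arch_list.split(";"))):
        with_ptx = entry.endswith("+PTX")
        num = entry.removesuffix("+PTX").replace(".", "")  # '8.6' -> '86'
        # SASS: precompiled machine code for exactly this architecture.
        flags.append(f"-gencode=arch=compute_{num},code=sm_{num}")
        if with_ptx:
            # PTX: virtual ISA the driver can JIT-compile on newer GPUs.
            flags.append(f"-gencode=arch=compute_{num},code=compute_{num}")
    return flags


if __name__ == "__main__":
    for flag in arch_list_to_gencode("8.0;8.6+PTX"):
        print(flag)
```

Each SASS target is roughly one more compiled copy of every kernel. For that reason, the length of this list, plus any PTX, largely drives the size of the CUDA binaries.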

Skylion007 · 2025-01-28T19:34:58Z

.github/scripts/generate_binary_build_matrix.py

+        "nvidia-cusolver-cu12==11.7.2.55; platform_system == 'Linux' and platform_machine == 'x86_64' | "
+        "nvidia-cusparse-cu12==12.5.7.53; platform_system == 'Linux' and platform_machine == 'x86_64' | "
+        "nvidia-cusparselt-cu12==0.6.3; platform_system == 'Linux' and platform_machine == 'x86_64' | "
+        "nvidia-nccl-cu12==2.21.5; platform_system == 'Linux' and platform_machine == 'x86_64' | "


We should open a separate PR to update NCCL.
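The pins in the diff above are PEP 508 requirement strings (`name==version; marker`) that are concatenated and separated by ` | `, including a trailing separator. The sketch below splits such a string back into its parts. This makes it easy to see which pin a follow-up PR would bump, for example the `nvidia-nccl-cu12==2.21.5` entry. It assumes only the separator and exact `==` pins shown in the diff. `Pin` and `parse_requirements` are hypothetical names, and the sketch does not use the `packaging` library.

```python
from __future__ import annotations

from typing import NamedTuple


class Pin(NamedTuple):
    name: str
    version: str
    marker: str


def parse_requirements(joined: str) -> list[Pin]:
    """Split a '|'-joined list of 'name==version; marker' requirement strings.

    Hypothetical helper: assumes every entry is an exact '==' pin, optionally
    followed by a PEP 508 environment marker after ';'. Empty entries (for
    example from a trailing ' | ') are skipped.
    """
    pins: list[Pin] = []
    for entry in joined.split("|"):
        entry = entry.strip()
        if not entry:
            continue
        spec, _, marker = entry.partition(";")
        name, _, version = spec.partition("==")
        pins.append(Pin(name.strip(), version.strip(), marker.strip()))
    return pins


if __name__ == "__main__":
    reqs = (
        "nvidia-cusparselt-cu12==0.6.3; platform_system == 'Linux' and platform_machine == 'x86_64' | "
        "nvidia-nccl-cu12==2.21.5; platform_system == 'Linux' and platform_machine == 'x86_64' | "
    )
    for pin in parse_requirements(reqs):
        print(pin.name, pin.version)
```

Run on the two entries above, this prints `nvidia-cusparselt-cu12 0.6.3` and `nvidia-nccl-cu12 2.21.5`.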

Skylion007 · 2025-01-28T20:50:07Z

This needs someone with S3 access to upload the relevant binaries to the nightly bucket? @albanD @malfet ?

malfet · 2025-01-28T21:01:09Z

> This needs someone with S3 access to upload the relevant binaries to the nightly bucket? @albanD @malfet ?

@atalman you have taken care of this one already, haven't you?

.github/scripts/generate_binary_build_matrix.py

atalman · 2025-01-29T22:28:40Z

Hi @malfet, the binaries are uploaded to the cu128 bucket.

atalman

@tinglvv please fix lint

tinglvv · 2025-01-30T07:56:08Z

Libtorch Build failure - https://github.com/pytorch/pytorch/actions/runs/13042203635/job/36386381759
CUDAContext.cpp:(.text+0x157): additional relocation overflows omitted from the output
/usr/bin/ld: failed to convert GOTPCREL relocation; relink with --no-relax
collect2: error: ld returned 1 exit status

It seems the binary may be too large; we need to refine TORCH_CUDA_ARCH_LIST based on #39968.

Skipping the libtorch build for now.
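Refining `TORCH_CUDA_ARCH_LIST` shrinks the device code because, roughly, each entry compiles another copy of every kernel. As one illustration, the later commit "Fix lint and remove sm50 and sm60" drops the two oldest entries. The sketch below shows that filtering. `drop_archs` is a hypothetical helper that assumes the `;`-separated format from `build_cuda.sh`. It is not code from this PR.

```python
from __future__ import annotations


def drop_archs(arch_list: str, to_drop: set[str]) -> str:
    """Return arch_list with the given entries removed, preserving order.

    Hypothetical helper: compares the version part only, so '6.0+PTX'
    is also dropped when '6.0' is in to_drop.
    """
    kept = [
        entry
        for entry in arch_list.split(";")
        if entry and entry.removesuffix("+PTX") not in to_drop
    ]
    return ";".join(kept)


if __name__ == "__main__":
    current = "5.0;6.0;7.0;7.5;8.0;8.6"  # value shown in build_cuda.sh
    print(drop_archs(current, {"5.0", "6.0"}))  # 7.0;7.5;8.0;8.6
```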

tinglvv · 2025-01-30T16:17:29Z

Lint keeps failing with this error, and I'm not sure why:
File "/home/ec2-user/actions-runner/_work/pytorch/pytorch/test-infra/.github/scripts/run_with_env_secrets.py", line 102, in <module>
  main()
File "/home/ec2-user/actions-runner/_work/pytorch/pytorch/test-infra/.github/scripts/run_with_env_secrets.py", line 98, in main
  run_cmd_or_die(f"docker exec -t {container_name} /exec")
File "/home/ec2-user/actions-runner/_work/pytorch/pytorch/test-infra/.github/scripts/run_with_env_secrets.py", line 39, in run_cmd_or_die
  raise RuntimeError(f"Command {cmd} failed with exit code {exit_code}")
RuntimeError: Command docker exec -t d0819bbeba3a6439ccfdbf23e2ed66a4c6521cd53f005452b928484515f6acf2 /exec failed with exit code 5

Skylion007 · 2025-01-30T17:53:47Z

To unblock, try passing this flag: #39968 (comment). That might get it to link while we figure out the best way to reduce the code size.

Skylion007 · 2025-01-30T18:28:46Z

.ci/manywheel/build_cuda.sh


 cuda_version_nodot=$(echo $CUDA_VERSION | tr -d '.')

 TORCH_CUDA_ARCH_LIST="5.0;6.0;7.0;7.5;8.0;8.6"


We should add --host-linker-script=use-lcs to TORCH_NVCC_FLAGS at the top of this file. That should fix this issue without changing TORCH_CUDA_ARCH_LIST.
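A minimal sketch of adding the suggested flag without duplicating it if the line is applied twice. It assumes that `TORCH_NVCC_FLAGS` is a space-separated flag string, as implied by the review comment above. In `build_cuda.sh` this would be a one-line shell `export`; Python is used here only for illustration. `with_flag` is a hypothetical helper.

```python
import os
import shlex

# The flag suggested in the review comment above.
LCS_FLAG = "--host-linker-script=use-lcs"


def with_flag(flags: str, flag: str = LCS_FLAG) -> str:
    """Append `flag` to a space-separated nvcc flag string unless already present."""
    parts = shlex.split(flags)
    if flag not in parts:
        parts.append(flag)
    return shlex.join(parts)


if __name__ == "__main__":
    # Assumption: the current flags come from the TORCH_NVCC_FLAGS environment variable.
    print(with_flag(os.environ.get("TORCH_NVCC_FLAGS", "")))
```

Making the helper idempotent keeps repeated runs of a build script from accumulating copies of the same flag.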

Testing with https://github.com/pytorch/pytorch/actions/runs/13061839749/job/36446414920?pr=146084

tinglvv · 2025-01-30T21:21:33Z

New libtorch build failure after getting past the ld relink error: https://github.com/pytorch/pytorch/actions/runs/13056929705/job/36430236293

[7278/7628] Linking CXX executable bin/example_allreduce
FAILED: bin/example_allreduce 
: && /opt/rh/devtoolset-9/root/usr/bin/c++ -Wno-deprecated-declarations -D_GLIBCXX_USE_CXX11_ABI=0 -fabi-version=11 -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DLIBKINETO_NOXPUPTI=ON -DUSE_FBGEMM -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-unknown-pragmas -Wno-unused-parameter -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wsuggest-override -Wno-psabi -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow -DHAVE_AVX512_CPU_DEFINITION -DHAVE_AVX2_CPU_DEFINITION -O3 -DNDEBUG -DNDEBUG -rdynamic     -Wl,--no-as-needed test_cpp_c10d/CMakeFiles/example_allreduce.dir/example/allreduce.cpp.o -o bin/example_allreduce -L/lib/intel64   -L/lib/intel64_win   -L/lib/win-x64 -Wl,-rpath,/lib/intel64:/lib/intel64_win:/lib/win-x64:/pytorch/build/lib  lib/libtorch_cpu.so  lib/libtorch_cuda.so  lib/libc10_cuda.so  /usr/local/cuda/lib64/libcudart_static.a  -ldl  /lib64/librt.so  lib/libprotobuf.a  -pthread  lib/libc10.so  /opt/intel/lib/libmkl_intel_lp64.a  /opt/intel/lib/libmkl_gnu_thread.a  /opt/intel/lib/libmkl_core.a  -fopenmp  -lpthread  /lib64/libm.so  /lib64/libdl.so  -Wl,--no-as-needed,"/pytorch/build/lib/libtorch_cpu.so" -Wl,--as-needed && /opt/_internal/cpython-3.9.0/lib/python3.9/site-packages/cmake/data/bin/cmake -E __run_co_compile --lwyu="ldd;-u;-r" --source=bin/example_allreduce && :
/opt/rh/devtoolset-9/root/usr/libexec/gcc/x86_64-redhat-linux/9/ld: /usr/local/cuda/lib64/libcublasLt.so.12: undefined reference to `__cxa_thread_atexit_impl@GLIBC_2.18'
/opt/rh/devtoolset-9/root/usr/libexec/gcc/x86_64-redhat-linux/9/ld: /usr/local/cuda/lib64/libcublasLt.so.12: undefined reference to `log2f@GLIBC_2.27'
collect2: error: ld returned 1 exit status

Let me just merge the manywheel changes for now.
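Both undefined references in this log are versioned glibc symbols that `libcublasLt.so.12` requests: `__cxa_thread_atexit_impl@GLIBC_2.18` and `log2f@GLIBC_2.27`. The link therefore needs a glibc that provides those versions, which means 2.27 or newer. One plausible reading is that the devtoolset-9 builder image ships an older glibc, but that is an assumption and this thread does not confirm it. The sketch below extracts the highest required version from a log like this. `required_glibc` is a hypothetical helper.

```python
from __future__ import annotations

import re

# Matches linker messages like: undefined reference to `log2f@GLIBC_2.27'
GLIBC_SYMBOL = re.compile(r"`([^`@]+)@GLIBC_(\d+)\.(\d+)'")


def required_glibc(log: str) -> tuple[int, int] | None:
    """Return the highest GLIBC_x.y symbol version named in `log`, or None."""
    versions = [(int(major), int(minor)) for _, major, minor in GLIBC_SYMBOL.findall(log)]
    return max(versions, default=None)


if __name__ == "__main__":
    log = (
        "ld: /usr/local/cuda/lib64/libcublasLt.so.12: undefined reference to `__cxa_thread_atexit_impl@GLIBC_2.18'\n"
        "ld: /usr/local/cuda/lib64/libcublasLt.so.12: undefined reference to `log2f@GLIBC_2.27'\n"
    )
    print(required_glibc(log))  # (2, 27)
```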

tinglvv · 2025-01-31T08:49:33Z

@pytorchbot merge

pytorchmergebot · 2025-01-31T08:51:17Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging

Check the merge workflow status here

pytorchmergebot · 2025-01-31T08:51:36Z

Merge failed

Reason: 1 mandatory check(s) failed. The first few are:

Lint / Test tools / linux-job

Dig deeper by viewing the failures on hud

Details for Dev Infra team

Raised by workflow job

Failing merge rule: Core Maintainers

atalman · 2025-01-31T14:35:00Z

@pytorchmergebot rebase -b main

pytorchmergebot · 2025-01-31T14:36:30Z

@pytorchbot started a rebase job onto refs/remotes/origin/main. Check the current status here

pytorchmergebot · 2025-01-31T14:36:33Z

Successfully rebased cu128-binaries onto refs/remotes/origin/main, please pull locally before adding more changes (for example, via git checkout cu128-binaries && git pull --rebase)

atalman · 2025-01-31T16:10:27Z

@pytorchmergebot merge -f "lint is passing everything else was tested"

pytorchmergebot · 2025-01-31T16:11:50Z

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging

Check the merge workflow status here

tinglvv · 2025-01-31T17:44:01Z

Thanks, Andrey, for merging the PR. Nightly x86 builds should be available starting tonight.

Skylion007 · 2025-07-05T14:21:55Z

> New Build Failure in libtorch after the ld relink error https://github.com/pytorch/pytorch/actions/runs/13056929705/job/36430236293

Apparently, we need to pass -Xlinker --script -Xlinker ./lcs to the relink step after generating the script with nvcc --host-linker-script gen-lcs.
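When GCC drives the link, `-Xlinker ARG` forwards `ARG` unchanged to `ld`, so `-Xlinker --script -Xlinker ./lcs` reaches the linker as `--script ./lcs`. The sketch below appends those arguments to an existing link command. It assumes the script has already been generated at `./lcs` as described above and appends it last. The argument vector is a shortened, hypothetical version of the failing `example_allreduce` link line earlier in the thread, and `add_linker_script` is a hypothetical helper.

```python
from __future__ import annotations


def add_linker_script(link_argv: list[str], script: str = "./lcs") -> list[str]:
    """Return a copy of a GCC link command with `script` forwarded to ld via --script.

    Each '-Xlinker' makes the GCC driver pass the following argument straight to ld.
    The input list is not modified.
    """
    return [*link_argv, "-Xlinker", "--script", "-Xlinker", script]


if __name__ == "__main__":
    # Hypothetical, abbreviated link command for bin/example_allreduce.
    argv = [
        "c++",
        "test_cpp_c10d/CMakeFiles/example_allreduce.dir/example/allreduce.cpp.o",
        "-o",
        "bin/example_allreduce",
        "lib/libtorch_cuda.so",
    ]
    print(" ".join(add_linker_script(argv)))
```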

pytorch-bot bot added the topic: not user facing topic category label Jan 27, 2025

tinglvv added the ciflow/binaries Trigger all binary build and upload jobs on the PR label Jan 27, 2025

tinglvv mentioned this pull request Jan 27, 2025

Enable CUDA 12.8.0, Disable CUDA 12.4 #145570

Open

26 tasks

nWEIdia approved these changes Jan 27, 2025

pytorchbot added the open source label Jan 27, 2025

nWEIdia reviewed Jan 28, 2025

.ci/manywheel/build_cuda.sh (outdated, resolved)

nWEIdia approved these changes Jan 28, 2025

tinglvv marked this pull request as ready for review January 28, 2025 18:54

tinglvv requested a review from a team as a code owner January 28, 2025 18:54

Skylion007 reviewed Jan 28, 2025

tinglvv mentioned this pull request Jan 28, 2025

Update to NCCL 2.25.1 for 12.8 #145776

Closed

malfet approved these changes Jan 28, 2025

.github/scripts/generate_binary_build_matrix.py (outdated, resolved)

atalman approved these changes Jan 29, 2025

tinglvv changed the title ~~Add CUDA 12.8 Linux Builds to Binaries Matrix~~ Add CUDA 12.8 manywheel x86 Builds to Binaries Matrix Jan 30, 2025

Skylion007 reviewed Jan 30, 2025

pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Jan 31, 2025

pytorchmergebot added the merging label Jan 31, 2025

tinglvv added 11 commits January 31, 2025 14:36

Build CUDA 12.8 nightlies

5c35a97

Add CUDA 12.8 in build_cuda.sh

8fb1fae

retrigger test and small edit

73fa386

refactor with CUDA_ARCHES

74beae6

fix lint 2

fcc3f92

skip libtorch builds for now

303b156

fix lint 3

6389636

fix lint

d471b65

restore the modify build part

1fe683d

Fix lint and remove sm50 and sm60

2c33db9

Generate manywheel only

d9d6492

pytorchmergebot force-pushed the cu128-binaries branch from 0c296a1 to d9d6492 January 31, 2025 14:36

pytorchmergebot added the merging label Jan 31, 2025

pytorchmergebot closed this in 9232355 Jan 31, 2025

pytorchmergebot added Merged and removed merging labels Jan 31, 2025

tinglvv mentioned this pull request Jan 31, 2025

Libtorch CUDA 12.8 Test with --host-linker-script=use-lcs #146084

Closed

tinglvv mentioned this pull request Feb 7, 2025

Add libtorch nightly build for CUDA 12.8 #146265

Closed

tinglvv mentioned this pull request May 7, 2025

Upgrade to CUDA 12.8.1 for nightly binaries #152923

Closed

tinglvv mentioned this pull request Jun 17, 2025

Add CUDA 12.9 libtorch nightly #155895

Closed

atalman mentioned this pull request Jul 3, 2025

[release 2.8-2.9] Delete support for Maxwell, Pascal, and Volta architectures for CUDA 12.8 and 12.9 builds #157517

Open


7 participants

tinglvv commented Jan 27, 2025 •

edited

Loading

pytorch-bot bot commented Jan 27, 2025 •

edited

Loading

Skylion007 Jan 30, 2025 •

edited

Loading