Nccl update to 2.25.1 for cuda 12.4-12.8 by atalman · Pull Request #146073 · pytorch/pytorch · GitHub

Conversation

@atalman
Copy link
Contributor

@atalman atalman commented Jan 30, 2025

Should resolve: #144768
We use one common NCCL version for CUDA builds 12.4-12.8: NCCL_VERSION=v2.25.1-1
For CUDA 11.8 we use the legacy NCCL_VERSION=v2.21.1-1
We use a pinned version of NCCL rather than a submodule.
Move the NCCL location from third_party/nccl/nccl to third_party/nccl
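
The pinned-checkout approach described above can be sketched roughly as follows. This is an illustrative assumption, not PyTorch's actual build code (the real logic lives in tools/build_pytorch_libs.py); the helper name and pin-file layout are made up for the example.

```python
# Hypothetical sketch: fetch NCCL at a pinned version instead of tracking a
# git submodule. The pin file holds a tag such as "v2.25.1-1".
import tempfile
from pathlib import Path

def nccl_checkout_command(pin_file: str, dest: str = "third_party/nccl"):
    """Build the git command that would fetch NCCL at the pinned version."""
    pin = Path(pin_file).read_text().strip()  # e.g. "v2.25.1-1"
    return ["git", "clone", "--depth", "1", "--branch", pin,
            "https://github.com/NVIDIA/nccl.git", dest]

if __name__ == "__main__":
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
        f.write("v2.25.1-1\n")
    print(" ".join(nccl_checkout_command(f.name)))
```

Compared with a submodule, a pin file lets each CUDA variant select its own NCCL tag without changing the superproject's submodule state.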

cc @H-Huang @awgu @kwen2501 @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @c-p-i-o @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @chenyang78 @kadeng @chauhang @amjames @StrongerXi

@pytorch-bot pytorch-bot bot added the topic: not user facing label (topic category) Jan 30, 2025
@pytorch-bot
Copy link

pytorch-bot bot commented Jan 30, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/146073

Note: Links to docs will display an error until the docs builds have been completed.

❌ 4 New Failures, 15 Pending, 5 Unrelated Failures

As of commit aa9dd24 with merge base 858bc0c:

NEW FAILURES - The following jobs have failed:

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@atalman atalman added the ciflow/binaries_wheel label (Trigger binary build and upload jobs for wheel on the PR) Jan 30, 2025
@atalman atalman changed the title [DRAFT] Nccl update to 2.25.1 Nccl update to 2.25.1 for all cuda versions Jan 31, 2025
@atalman atalman requested a review from malfet January 31, 2025 00:46
@atalman atalman marked this pull request as ready for review January 31, 2025 00:46
@atalman atalman requested review from a team and jeffdaily as code owners January 31, 2025 00:46
@atalman atalman changed the title Nccl update to 2.25.1 for all cuda versions Nccl update to 2.25.1 for cuda 11.8-12.6 Jan 31, 2025
Copy link
Contributor

@malfet malfet left a comment


Thank you, that should work for PyPI, but might be a bit challenging for Poetry, wouldn't it? Why not just build nccl for 118 from source and publish the wheel to download.pytorch.org?

@tinglvv
Copy link
Collaborator

tinglvv commented Jan 31, 2025

Hi @atalman, could you help add 12.8 to this PR as well? I'm planning to close https://github.com/pytorch/pytorch/pull/145776/files since it needed the NCCL submodule version update, which will be fixed by this PR.

@atalman atalman marked this pull request as draft January 31, 2025 19:52
@Skylion007
Copy link
Collaborator

Skylion007 commented Feb 1, 2025

The reason we didn't do this before is that NCCL 2.25.1 is not tested at Nvidia on 11.8. It should be fine though, just something to flag (wouldn't be the first time we ran NCCL on technically unsupported minor CUDA versions)

Ah nvm, I see Nvidia actually stopped publishing for CU11 and yanked the unsupported version (hence us building from scratch)

@malfet malfet added the no-runner-experiments label (Bypass Meta/LF runner determinator) Feb 11, 2025
@malfet
Copy link
Contributor

malfet commented Feb 11, 2025

Ok, so the link problems you are seeing are because newer libnccl has two collectives.o object files, and it looks like one of them is lost during the slimming; checking if undoing that will help

pytorchmergebot added a commit that referenced this pull request Feb 18, 2025
This reverts commit eecee58.

Reverted #146073 on behalf of https://github.com/atalman due to breaks Locally building benchmarks ([comment](#146073 (comment)))
@atalman
Copy link
Contributor Author

atalman commented Feb 19, 2025

@pytorchmergebot merge -f "Reverted because of false alarm"

@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here

@LunNova
Copy link

LunNova commented Feb 20, 2025

This wastes some time checking out NCCL even for ROCm builds, which use the system RCCL instead.

Raymo111 pushed a commit that referenced this pull request Feb 20, 2025
Should resolve: #144768
We use one common NCCL version for CUDA builds 12.4-12.8: ``NCCL_VERSION=v2.25.1-1``
For CUDA 11.8 we use the legacy ``NCCL_VERSION=v2.21.1-1``
We use a pinned version of NCCL rather than a submodule.
Move the NCCL location from ``third_party/nccl/nccl`` to ``third_party/nccl``

Pull Request resolved: #146073
Approved by: https://github.com/Skylion007, https://github.com/malfet, https://github.com/kwen2501, https://github.com/fduwjj
Raymo111 pushed a commit that referenced this pull request Feb 20, 2025
This reverts commit 06f4a5c.

Reverted #146073 on behalf of https://github.com/atalman due to breaks macos builds: ModuleNotFoundError: No module named 'torch._C._distributed_c10d'; 'torch._C' is not a package ([comment](#146073 (comment)))
Raymo111 pushed a commit that referenced this pull request Feb 20, 2025
Raymo111 pushed a commit that referenced this pull request Feb 20, 2025
Raymo111 pushed a commit that referenced this pull request Feb 20, 2025
@youreternity1997
Copy link

youreternity1997 commented Feb 21, 2025

I installed the PyTorch CUDA 12.8 beta, which comes from "pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu128"

It successfully installed a torch shown below:
torch 2.7.0.dev20250218+cu128
torchaudio 2.6.0.dev20250219+cu128
torchvision 0.22.0.dev20250219+cu128

It successfully installed a NCCL (2.25.1+cuda12.8) shown below:
dpkg-query --showformat='${Package} ${Version}\n' --show libnccl2 libnccl-dev
libnccl-dev 2.25.1-1+cuda12.8
libnccl2 2.25.1-1+cuda12.8

However, when I run my code, it shows two errors.
First error: why does NCCL INFO report the NCCL version as 2.25.1+cuda12.2, as in the screenshot below?
[screenshot: NCCL INFO log showing version 2.25.1+cuda12.2]
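
One likely source of the confusion above: the PyTorch wheel bundles its own NCCL, so the version NCCL INFO prints can differ from the system libnccl2 package reported by dpkg. As a side note on reading NCCL version numbers, here is a small sketch of NCCL's integer version encoding; the exact threshold for the encoding change is an assumption for illustration.

```python
# NCCL encodes its version as a single integer. Since roughly NCCL 2.9 the
# encoding is MAJOR * 10000 + MINOR * 100 + PATCH; older releases used
# MAJOR * 1000 + MINOR * 100 + PATCH. The cutoff below is illustrative.
def decode_nccl_version(code: int) -> tuple:
    """Decode an NCCL integer version code into (major, minor, patch)."""
    if code >= 20900:  # newer encoding
        return code // 10000, (code // 100) % 100, code % 100
    return code // 1000, (code // 100) % 10, code % 100

print(decode_nccl_version(22501))  # → (2, 25, 1)
```

A quick way to check the bundled version from Python is torch.cuda.nccl.version(), which reflects the NCCL that PyTorch was built against, not the system package.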

Ryo-not-rio pushed a commit to Ryo-not-rio/pytorch that referenced this pull request Feb 24, 2025
Ryo-not-rio pushed a commit to Ryo-not-rio/pytorch that referenced this pull request Feb 24, 2025
pytorch-bot bot pushed a commit that referenced this pull request Feb 24, 2025
pytorch-bot bot pushed a commit that referenced this pull request Feb 24, 2025
pytorch-bot bot pushed a commit that referenced this pull request Feb 24, 2025
@functionstackx
Copy link
Contributor

Note that there is a regression in 2.25 for RoCEv2 fabrics that aren't Spectrum-X

pytorchmergebot pushed a commit that referenced this pull request Jun 30, 2025
Generate the source tarball with PEP 517-conformant build tools instead of the custom routine currently in place.

Closes #150461.

The current procedure for generating the source tarball consists of creating a source tree by manually copying and pruning source files.

This PR replaces that with a call to the standard [build tool](https://build.pypa.io/en/stable/), which works with the build backend to produce an sdist. For this to work correctly, the build backend also needs to be configured. In the case of PyTorch, the backend is currently (the legacy version of) the setuptools backend, whose source-dist part is mostly configured via the `MANIFEST.in` file.

The resulting source distribution can be used to install directly from source with `pip install ./torch-{version}.tar.gz` or to build wheels directly from source with `pip wheel ./torch-{version}.tar.gz`; both should be considered experimental for now.

## Issues

### sdist name
According to PEP 517, the name of the source distribution file must coincide with the project name, or [more precisely](https://peps.python.org/pep-0517/#source-distributions), the source distribution of a project that generates `{NAME}-{...}.whl` wheels are required to be named `{NAME}-{...}.tar.gz`. Currently, the source tarball is called `pytorch-{...}.tar.gz`, but the generated wheels and python package are called `torch-{...}`.
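
The naming constraint above can be illustrated with a small sketch; it is simplified and assumes no hyphen in the project name itself:

```python
# PEP 517 naming constraint: a project whose wheels are named
# {NAME}-{...}.whl must ship its sdist as {NAME}-{...}.tar.gz.
# Simplified: assumes the project name contains no hyphen.
def expected_sdist_name(wheel_filename: str) -> str:
    """Derive the sdist filename matching a wheel's project name and version."""
    name, version = wheel_filename.split("-")[:2]
    return f"{name}-{version}.tar.gz"

print(expected_sdist_name("torch-2.7.0-cp312-cp312-linux_x86_64.whl"))
# → torch-2.7.0.tar.gz  (not pytorch-2.7.0.tar.gz)
```

This is exactly the mismatch described above: the wheels are `torch-{...}`, so a `pytorch-{...}.tar.gz` tarball does not satisfy the specification.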

### Symbolic Links
The source tree at the moment contains a small number of symbolic links. This [has been seen as problematic](pypa/pip#5919) largely because of lack of support on Windows, but also because of [a problem in setuptools](pypa/setuptools#4937). Particularly unfortunate is a circular symlink in the third party `ittapi` module, which cannot be resolved by replacing it with a copy.

PEP 721 (now integrated in the [Source Distribution Format Specification](https://packaging.python.org/en/latest/specifications/source-distribution-format/#source-distribution-archive-features)) allows for symbolic links, but only if they don't point outside the destination directory and if they don't contain `../` in their target.

The list of symbolic links currently is as follows:

<details>

|source|target|problem|solution|
|-|-|-|-|
| `.dockerignore` | `.gitignore` | ✅ ok (individual file) ||
| `docs/requirements.txt` | `../.ci/docker/requirements-docs.txt` |❗`..` in target|swap source and target[^1]|
| `functorch/docs/source/notebooks` | `../../notebooks/` |❗`..` in target|swap source and target[^1]|
| `.github/ci_commit_pins/triton.txt` | `../../.ci/docker/ci_commit_pins/triton.txt` | ✅ ok (omitted from sdist)||
| `third_party/flatbuffers/docs/source/CONTRIBUTING.md` | `../../CONTRIBUTING.md` |❗`..` in target|omit from sdist[^2]|
| `third_party/flatbuffers/java/src/test/java/DictionaryLookup` | `../../../../tests/DictionaryLookup` |❗`..` in target|omit from sdist[^3]|
| `third_party/flatbuffers/java/src/test/java/MyGame` | `../../../../tests/MyGame` |❗`..` in target|omit from sdist[^3]|
| `third_party/flatbuffers/java/src/test/java/NamespaceA` | `../../../../tests/namespace_test/NamespaceA` |❗`..` in target|omit from sdist[^3]|
| `third_party/flatbuffers/java/src/test/java/NamespaceC` | `../../../../tests/namespace_test/NamespaceC` |❗`..` in target|omit from sdist[^3]|
| `third_party/flatbuffers/java/src/test/java/optional_scalars` | `../../../../tests/optional_scalars` |❗`..` in target|omit from sdist[^3]|
| `third_party/flatbuffers/java/src/test/java/union_vector` | `../../../../tests/union_vector` |❗`..` in target|omit from sdist[^3]|
| `third_party/flatbuffers/kotlin/benchmark/src/jvmMain/java` | `../../../../java/src/main/java` |❗`..` in target|omit from sdist[^3]|
| `third_party/ittapi/rust/ittapi-sys/c-library` | `../../` |❗`..` in target|omit from sdist[^4]|
| `third_party/ittapi/rust/ittapi-sys/LICENSES` | `../../LICENSES` |❗`..` in target|omit from sdist[^4]|
| `third_party/opentelemetry-cpp/buildscripts/pre-merge-commit` | `./pre-commit` |✅ ok (individual file)||
| `third_party/opentelemetry-cpp/third_party/prometheus-cpp/cmake/project-import-cmake/sample_client.cc` | `../../push/tests/integration/sample_client.cc` |❗`..` in target|omit from sdist[^5]|
| `third_party/opentelemetry-cpp/third_party/prometheus-cpp/cmake/project-import-cmake/sample_server.cc` | `../../pull/tests/integration/sample_server.cc` |❗`..` in target|omit from sdist[^5]|
| `third_party/opentelemetry-cpp/third_party/prometheus-cpp/cmake/project-import-pkgconfig/sample_client.cc` | `../../push/tests/integration/sample_client.cc` |❗`..` in target|omit from sdist[^5]|
| `third_party/opentelemetry-cpp/third_party/prometheus-cpp/cmake/project-import-pkgconfig/sample_server.cc` | `../../pull/tests/integration/sample_server.cc` |❗`..` in target|omit from sdist[^5]|
| `third_party/XNNPACK/tools/xngen` | `xngen.py` | ✅ ok (individual file)||

</details>

The introduction of symbolic links inside the `.ci/docker` folder creates a new problem, however, because Docker's `COPY` command does not allow symlinks in this way. We work around that by using `tar ch` to dereference the symlinks before handing them over to `docker build`.
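
The `tar ch` workaround can be demonstrated in isolation; the paths below are scratch paths for illustration, not PyTorch's actual layout:

```shell
# Demonstrates dereferencing symlinks with `tar ch` before handing a build
# context to `docker build`, since Docker's COPY cannot follow such links.
set -e
work=$(mktemp -d)
mkdir -p "$work/ctx/src" "$work/out"
echo "pin" > "$work/ctx/src/real.txt"
ln -s src/real.txt "$work/ctx/link.txt"        # symlink inside the context
tar -chf "$work/context.tar" -C "$work/ctx" .  # -h/--dereference copies targets
tar -xf "$work/context.tar" -C "$work/out"
# link.txt was stored as a regular file, so COPY would now accept it:
test ! -L "$work/out/link.txt" && cat "$work/out/link.txt"   # prints: pin
```

The extracted tree can then be passed to `docker build` in place of the original, symlink-containing directory.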

[^1]: These resources can be naturally considered to be part of the docs, so moving the actual files into the place of the current symlinks and replacing them with (unproblematic) symlinks can be said to improve semantics as well.

[^2]: The flatbuffers docs already actually use the original file, not the symlink and in the most recent releases, starting from flatbuffers-25.1.21 the symlink is replaced by the actual file thanks to a documentation overhaul.

[^3]: These resources are flatbuffers tests for java and kotlin and can be omitted from our sdist.

[^4]: We don't need to ship the rust bindings for ittapi.

[^5]: These are demonstration examples for how to link to prometheus-cpp using cmake and can be omitted.

### NCCL
NCCL used to be included as a submodule. However, with #146073 (first released in v2.7.0-rc1), the submodule was removed and replaced with a build-time checkout procedure in `tools/build_pytorch_libs.py`, which checks out the required version of NCCL from the upstream repository based on a commit pin recorded in `.ci/docker/ci_commit_pins/nccl-cu{11,12}.txt`.
This means that a crucial third-party dependency is missing from the source distribution, and since the `.ci` folder is omitted from the sdist, the build-time download cannot be used either.
However, it *is* possible to use a system-provided NCCL via the `USE_SYSTEM_NCCL` environment variable, which is now also the default for the official PyTorch wheels.
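
The decision described above can be sketched as follows; the helper name, accepted values, and return strings are illustrative assumptions, not PyTorch's actual build logic:

```python
# Hypothetical sketch: choose a system-provided NCCL when USE_SYSTEM_NCCL is
# set, otherwise fall back to the commit-pinned checkout for the build's
# CUDA major version. Names and values here are illustrative only.
def nccl_source(env: dict, cuda_major: int) -> str:
    """Decide where NCCL comes from for this build."""
    if env.get("USE_SYSTEM_NCCL", "0") not in ("0", "", "OFF"):
        return "system"
    # Otherwise consult the commit pin for this CUDA major version:
    return f".ci/docker/ci_commit_pins/nccl-cu{cuda_major}.txt"

print(nccl_source({"USE_SYSTEM_NCCL": "1"}, 12))  # → system
print(nccl_source({}, 11))  # → .ci/docker/ci_commit_pins/nccl-cu11.txt
```

When building from the sdist, where the pin files are absent, only the `USE_SYSTEM_NCCL` path is viable.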

Pull Request resolved: #152098
Approved by: https://github.com/atalman

Labels

ci-no-td (Do not run TD on this PR)
ciflow/binaries_wheel (Trigger binary build and upload jobs for wheel on the PR)
ciflow/inductor
ciflow/trunk (Trigger trunk jobs on your pull request)
keep-going (Don't stop on first failure, keep running tests until the end)
Merged
module: dynamo
no-runner-experiments (Bypass Meta/LF runner determinator)
oncall: distributed (Add this issue/PR to distributed oncall triage queue)
Reverted
topic: not user facing (topic category)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Is it possible to remove the NCCL submodule and use only nccl binaries from PyPI instead?

10 participants