Nccl update to 2.25.1 for cuda 12.4-12.8 by atalman · Pull Request #146073 · pytorch/pytorch · GitHub

Conversation

@atalman
Copy link
Contributor

@atalman atalman commented Jan 30, 2025

Should resolve: #144768
We use one common NCCL version for CUDA builds 12.4-12.8: NCCL_VERSION=v2.25.1-1
For CUDA 11.8 we use the legacy NCCL_VERSION=v2.21.1-1
We use a pinned version of NCCL rather than a submodule.
Move the NCCL location from third_party/nccl/nccl to third_party/nccl
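
The pinned-checkout approach described above can be sketched roughly as follows. This is an illustrative assumption, not PyTorch's actual build code (the real logic lives in tools/build_pytorch_libs.py); the helper name and pin-file layout are made up for the example.

```python
# Hypothetical sketch: fetch NCCL at a pinned version instead of tracking a
# git submodule. The pin file holds a tag such as "v2.25.1-1".
import tempfile
from pathlib import Path

def nccl_checkout_command(pin_file: str, dest: str = "third_party/nccl"):
    """Build the git command that would fetch NCCL at the pinned version."""
    pin = Path(pin_file).read_text().strip()  # e.g. "v2.25.1-1"
    return ["git", "clone", "--depth", "1", "--branch", pin,
            "https://github.com/NVIDIA/nccl.git", dest]

if __name__ == "__main__":
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
        f.write("v2.25.1-1\n")
    print(" ".join(nccl_checkout_command(f.name)))
```

Compared with a submodule, a pin file lets each CUDA variant select its own NCCL tag without changing the superproject's submodule state.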

cc @H-Huang @awgu @kwen2501 @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @c-p-i-o @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @chenyang78 @kadeng @chauhang @amjames @StrongerXi

@pytorch-bot pytorch-bot bot added the topic: not user facing label (topic category) Jan 30, 2025
@pytorch-bot
Copy link

pytorch-bot bot commented Jan 30, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/146073

Note: Links to docs will display an error until the docs builds have been completed.

❌ 4 New Failures, 15 Pending, 5 Unrelated Failures

As of commit aa9dd24 with merge base 858bc0c:

NEW FAILURES - The following jobs have failed:

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@atalman atalman added the ciflow/binaries_wheel label (Trigger binary build and upload jobs for wheel on the PR) Jan 30, 2025
@atalman atalman changed the title [DRAFT] Nccl update to 2.25.1 Nccl update to 2.25.1 for all cuda versions Jan 31, 2025
@atalman atalman requested a review from malfet January 31, 2025 00:46
@atalman atalman marked this pull request as ready for review January 31, 2025 00:46
@atalman atalman requested review from a team and jeffdaily as code owners January 31, 2025 00:46
@atalman atalman changed the title Nccl update to 2.25.1 for all cuda versions Nccl update to 2.25.1 for cuda 11.8-12.6 Jan 31, 2025
Copy link
Contributor

@malfet malfet left a comment


Thank you, that should work for PyPI, but might be a bit challenging for Poetry, wouldn't it? Why not just build nccl for 118 from source and publish the wheel to download.pytorch.org?

@tinglvv
Copy link
Collaborator

tinglvv commented Jan 31, 2025

Hi @atalman, could you help add 12.8 to this PR as well? I'm planning to close https://github.com/pytorch/pytorch/pull/145776/files since it needed the NCCL submodule version update, which will be fixed by this PR.

@atalman atalman marked this pull request as draft January 31, 2025 19:52
@Skylion007
Copy link
Collaborator

Skylion007 commented Feb 1, 2025

The reason we didn't do this before is that NCCL 2.25.1 is not tested at Nvidia on 11.8. It should be fine though, just something to flag (wouldn't be the first time we ran NCCL on technically unsupported minor CUDA versions)

Ah nvm, I see Nvidia actually stopped publishing for CU11 and yanked the unsupported version (hence us building from scratch)

@malfet malfet added the no-runner-experiments label (Bypass Meta/LF runner determinator) Feb 11, 2025
@malfet
Copy link
Contributor

malfet commented Feb 11, 2025

Ok, so the link problems you are seeing are because newer libnccl has two collectives.o object files, and it looks like one of them is lost during the slimming; checking if undoing that will help

pytorchmergebot added a commit that referenced this pull request Feb 18, 2025
This reverts commit eecee58.

Reverted #146073 on behalf of https://github.com/atalman due to breaks Locally building benchmarks ([comment](#146073 (comment)))
@atalman
Copy link
Contributor Author

atalman commented Feb 19, 2025

@pytorchmergebot merge -f "Reverted because of false alarm"

@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here

@LunNova
Copy link

LunNova commented Feb 20, 2025

This wastes some time checking out NCCL even for ROCm builds, which use the system RCCL instead.

Raymo111 pushed a commit that referenced this pull request Feb 20, 2025
Should resolve: #144768
We use one common NCCL version for CUDA builds 12.4-12.8: ``NCCL_VERSION=v2.25.1-1``
For CUDA 11.8 we use the legacy ``NCCL_VERSION=v2.21.1-1``
We use a pinned version of NCCL rather than a submodule.
Move the NCCL location from ``third_party/nccl/nccl`` to ``third_party/nccl``

Pull Request resolved: #146073
Approved by: https://github.com/Skylion007, https://github.com/malfet, https://github.com/kwen2501, https://github.com/fduwjj
Raymo111 pushed a commit that referenced this pull request Feb 20, 2025
This reverts commit 06f4a5c.

Reverted #146073 on behalf of https://github.com/atalman due to breaks macos builds: ModuleNotFoundError: No module named 'torch._C._distributed_c10d'; 'torch._C' is not a package ([comment](#146073 (comment)))
Raymo111 pushed a commit that referenced this pull request Feb 20, 2025
Raymo111 pushed a commit that referenced this pull request Feb 20, 2025
Raymo111 pushed a commit that referenced this pull request Feb 20, 2025
@youreternity1997
Copy link

youreternity1997 commented Feb 21, 2025

I installed the PyTorch CUDA 12.8 beta, which comes from "pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu128"

It successfully installed a torch shown below:
torch 2.7.0.dev20250218+cu128
torchaudio 2.6.0.dev20250219+cu128
torchvision 0.22.0.dev20250219+cu128

It successfully installed a NCCL (2.25.1+cuda12.8) shown below:
dpkg-query --showformat='${Package} ${Version}\n' --show libnccl2 libnccl-dev
libnccl-dev 2.25.1-1+cuda12.8
libnccl2 2.25.1-1+cuda12.8

However, when I run my code, it shows two errors.
First error: why does NCCL INFO report the NCCL version as 2.25.1+cuda12.2, as in the screenshot below?
[screenshot: NCCL INFO log showing version 2.25.1+cuda12.2]
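
One likely source of the confusion above: the PyTorch wheel bundles its own NCCL, so the version NCCL INFO prints can differ from the system libnccl2 package reported by dpkg. As a side note on reading NCCL version numbers, here is a small sketch of NCCL's integer version encoding; the exact threshold for the encoding change is an assumption for illustration.

```python
# NCCL encodes its version as a single integer. Since roughly NCCL 2.9 the
# encoding is MAJOR * 10000 + MINOR * 100 + PATCH; older releases used
# MAJOR * 1000 + MINOR * 100 + PATCH. The cutoff below is illustrative.
def decode_nccl_version(code: int) -> tuple:
    """Decode an NCCL integer version code into (major, minor, patch)."""
    if code >= 20900:  # newer encoding
        return code // 10000, (code // 100) % 100, code % 100
    return code // 1000, (code // 100) % 10, code % 100

print(decode_nccl_version(22501))  # → (2, 25, 1)
```

A quick way to check the bundled version from Python is torch.cuda.nccl.version(), which reflects the NCCL that PyTorch was built against, not the system package.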

Ryo-not-rio pushed a commit to Ryo-not-rio/pytorch that referenced this pull request Feb 24, 2025
Ryo-not-rio pushed a commit to Ryo-not-rio/pytorch that referenced this pull request Feb 24, 2025
pytorch-bot bot pushed a commit that referenced this pull request Feb 24, 2025
pytorch-bot bot pushed a commit that referenced this pull request Feb 24, 2025
pytorch-bot bot pushed a commit that referenced this pull request Feb 24, 2025
@functionstackx
Copy link
Contributor

Note that there is a regression in 2.25 for RoCEv2 fabrics that aren't Spectrum-X

pytorchmergebot pushed a commit that referenced this pull request Jun 30, 2025
Generate the source tarball with PEP 517-conformant build tools instead of the custom routine currently in place.

Closes #150461.

The current procedure for generating the source tarball consists of creating a source tree by manually copying and pruning source files.

This PR replaces that with a call to the standard [build tool](https://build.pypa.io/en/stable/), which works with the build backend to produce an sdist. For this to work correctly, the build backend also needs to be configured. In the case of PyTorch, the backend is currently (the legacy version of) the setuptools backend, whose source-dist part is mostly configured via the `MANIFEST.in` file.

The resulting source distribution can be used to install directly from source with `pip install ./torch-{version}.tar.gz` or to build wheels directly from source with `pip wheel ./torch-{version}.tar.gz`; both should be considered experimental for now.

## Issues

### sdist name
According to PEP 517, the name of the source distribution file must coincide with the project name, or [more precisely](https://peps.python.org/pep-0517/#source-distributions), the source distribution of a project that generates `{NAME}-{...}.whl` wheels are required to be named `{NAME}-{...}.tar.gz`. Currently, the source tarball is called `pytorch-{...}.tar.gz`, but the generated wheels and python package are called `torch-{...}`.
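
The naming constraint above can be illustrated with a small sketch; it is simplified and assumes no hyphen in the project name itself:

```python
# PEP 517 naming constraint: a project whose wheels are named
# {NAME}-{...}.whl must ship its sdist as {NAME}-{...}.tar.gz.
# Simplified: assumes the project name contains no hyphen.
def expected_sdist_name(wheel_filename: str) -> str:
    """Derive the sdist filename matching a wheel's project name and version."""
    name, version = wheel_filename.split("-")[:2]
    return f"{name}-{version}.tar.gz"

print(expected_sdist_name("torch-2.7.0-cp312-cp312-linux_x86_64.whl"))
# → torch-2.7.0.tar.gz  (not pytorch-2.7.0.tar.gz)
```

This is exactly the mismatch described above: the wheels are `torch-{...}`, so a `pytorch-{...}.tar.gz` tarball does not satisfy the specification.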

### Symbolic Links
The source tree at the moment contains a small number of symbolic links. This [has been seen as problematic](pypa/pip#5919) largely because of lack of support on Windows, but also because of [a problem in setuptools](pypa/setuptools#4937). Particularly unfortunate is a circular symlink in the third party `ittapi` module, which cannot be resolved by replacing it with a copy.

PEP 721 (now integrated in the [Source Distribution Format Specification](https://packaging.python.org/en/latest/specifications/source-distribution-format/#source-distribution-archive-features)) allows for symbolic links, but only if they don't point outside the destination directory and if they don't contain `../` in their target.

The list of symbolic links currently is as follows:

<details>

|source|target|problem|solution|
|-|-|-|-|
| `.dockerignore` | `.gitignore` | ✅ ok (individual file) ||
| `docs/requirements.txt` | `../.ci/docker/requirements-docs.txt` |❗`..` in target|swap source and target[^1]|
| `functorch/docs/source/notebooks` | `../../notebooks/` |❗`..` in target|swap source and target[^1]|
| `.github/ci_commit_pins/triton.txt` | `../../.ci/docker/ci_commit_pins/triton.txt` | ✅ ok (omitted from sdist)||
| `third_party/flatbuffers/docs/source/CONTRIBUTING.md` | `../../CONTRIBUTING.md` |❗`..` in target|omit from sdist[^2]|
| `third_party/flatbuffers/java/src/test/java/DictionaryLookup` | `../../../../tests/DictionaryLookup` |❗`..` in target|omit from sdist[^3]|
| `third_party/flatbuffers/java/src/test/java/MyGame` | `../../../../tests/MyGame` |❗`..` in target|omit from sdist[^3]|
| `third_party/flatbuffers/java/src/test/java/NamespaceA` | `../../../../tests/namespace_test/NamespaceA` |❗`..` in target|omit from sdist[^3]|
| `third_party/flatbuffers/java/src/test/java/NamespaceC` | `../../../../tests/namespace_test/NamespaceC` |❗`..` in target|omit from sdist[^3]|
| `third_party/flatbuffers/java/src/test/java/optional_scalars` | `../../../../tests/optional_scalars` |❗`..` in target|omit from sdist[^3]|
| `third_party/flatbuffers/java/src/test/java/union_vector` | `../../../../tests/union_vector` |❗`..` in target|omit from sdist[^3]|
| `third_party/flatbuffers/kotlin/benchmark/src/jvmMain/java` | `../../../../java/src/main/java` |❗`..` in target|omit from sdist[^3]|
| `third_party/ittapi/rust/ittapi-sys/c-library` | `../../` |❗`..` in target|omit from sdist[^4]|
| `third_party/ittapi/rust/ittapi-sys/LICENSES` | `../../LICENSES` |❗`..` in target|omit from sdist[^4]|
| `third_party/opentelemetry-cpp/buildscripts/pre-merge-commit` | `./pre-commit` |✅ ok (individual file)||
| `third_party/opentelemetry-cpp/third_party/prometheus-cpp/cmake/project-import-cmake/sample_client.cc` | `../../push/tests/integration/sample_client.cc` |❗`..` in target|omit from sdist[^5]|
| `third_party/opentelemetry-cpp/third_party/prometheus-cpp/cmake/project-import-cmake/sample_server.cc` | `../../pull/tests/integration/sample_server.cc` |❗`..` in target|omit from sdist[^5]|
| `third_party/opentelemetry-cpp/third_party/prometheus-cpp/cmake/project-import-pkgconfig/sample_client.cc` | `../../push/tests/integration/sample_client.cc` |❗`..` in target|omit from sdist[^5]|
| `third_party/opentelemetry-cpp/third_party/prometheus-cpp/cmake/project-import-pkgconfig/sample_server.cc` | `../../pull/tests/integration/sample_server.cc` |❗`..` in target|omit from sdist[^5]|
| `third_party/XNNPACK/tools/xngen` | `xngen.py` | ✅ ok (individual file)||

</details>

The introduction of symbolic links inside the `.ci/docker` folder creates a new problem, however, because Docker's `COPY` command does not allow symlinks in this way. We work around that by using `tar ch` to dereference the symlinks before handing them over to `docker build`.
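
The `tar ch` workaround can be demonstrated in isolation; the paths below are scratch paths for illustration, not PyTorch's actual layout:

```shell
# Demonstrates dereferencing symlinks with `tar ch` before handing a build
# context to `docker build`, since Docker's COPY cannot follow such links.
set -e
work=$(mktemp -d)
mkdir -p "$work/ctx/src" "$work/out"
echo "pin" > "$work/ctx/src/real.txt"
ln -s src/real.txt "$work/ctx/link.txt"        # symlink inside the context
tar -chf "$work/context.tar" -C "$work/ctx" .  # -h/--dereference copies targets
tar -xf "$work/context.tar" -C "$work/out"
# link.txt was stored as a regular file, so COPY would now accept it:
test ! -L "$work/out/link.txt" && cat "$work/out/link.txt"   # prints: pin
```

The extracted tree can then be passed to `docker build` in place of the original, symlink-containing directory.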

[^1]: These resources can be naturally considered to be part of the docs, so moving the actual files into the place of the current symlinks and replacing them with (unproblematic) symlinks can be said to improve semantics as well.

[^2]: The flatbuffers docs already actually use the original file, not the symlink and in the most recent releases, starting from flatbuffers-25.1.21 the symlink is replaced by the actual file thanks to a documentation overhaul.

[^3]: These resources are flatbuffers tests for java and kotlin and can be omitted from our sdist.

[^4]: We don't need to ship the rust bindings for ittapi.

[^5]: These are demonstration examples for how to link to prometheus-cpp using cmake and can be omitted.

### NCCL
NCCL used to be included as a submodule. However, with #146073 (first released in v2.7.0-rc1), the submodule was removed and replaced with a build-time checkout procedure in `tools/build_pytorch_libs.py`, which checks out the required version of NCCL from the upstream repository based on a commit pin recorded in `.ci/docker/ci_commit_pins/nccl-cu{11,12}.txt`.
This means that a crucial third-party dependency is missing from the source distribution, and since the `.ci` folder is omitted from the sdist, the build-time download cannot be used either.
However, it *is* possible to use a system-provided NCCL via the `USE_SYSTEM_NCCL` environment variable, which is now also the default for the official PyTorch wheels.
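
The decision described above can be sketched as follows; the helper name, accepted values, and return strings are illustrative assumptions, not PyTorch's actual build logic:

```python
# Hypothetical sketch: choose a system-provided NCCL when USE_SYSTEM_NCCL is
# set, otherwise fall back to the commit-pinned checkout for the build's
# CUDA major version. Names and values here are illustrative only.
def nccl_source(env: dict, cuda_major: int) -> str:
    """Decide where NCCL comes from for this build."""
    if env.get("USE_SYSTEM_NCCL", "0") not in ("0", "", "OFF"):
        return "system"
    # Otherwise consult the commit pin for this CUDA major version:
    return f".ci/docker/ci_commit_pins/nccl-cu{cuda_major}.txt"

print(nccl_source({"USE_SYSTEM_NCCL": "1"}, 12))  # → system
print(nccl_source({}, 11))  # → .ci/docker/ci_commit_pins/nccl-cu11.txt
```

When building from the sdist, where the pin files are absent, only the `USE_SYSTEM_NCCL` path is viable.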

Pull Request resolved: #152098
Approved by: https://github.com/atalman

Labels

ci-no-td (Do not run TD on this PR)
ciflow/binaries_wheel (Trigger binary build and upload jobs for wheel on the PR)
ciflow/inductor
ciflow/trunk (Trigger trunk jobs on your pull request)
keep-going (Don't stop on first failure, keep running tests until the end)
Merged
module: dynamo
no-runner-experiments (Bypass Meta/LF runner determinator)
oncall: distributed (Add this issue/PR to distributed oncall triage queue)
Reverted
topic: not user facing (topic category)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Is it possible to remove the NCCL submodule and use only nccl binaries from PyPI instead?

10 participants