Move to small wheel approach for CUDA SBSA wheel by tinglvv · Pull Request #160720 · pytorch/pytorch · GitHub

@tinglvv
Collaborator

@tinglvv tinglvv commented Aug 15, 2025

#160673

Use dependencies hosted on download.pytorch.org, as the x86 build does, instead of bundling the libraries into the wheel.

cc @seemethere @malfet @atalman @ptrblck @msaroufim @eqy @jerryzh168 @snadampal @milpuz01 @aditew01 @nikhil-arm @fadara01 @nWEIdia

@pytorch-bot

pytorch-bot bot commented Aug 15, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/160720

Note: Links to docs will display an error until the docs builds have been completed.

⏳ 9 Pending, 16 Unrelated Failures

As of commit 3f3d1b2 with merge base 8906266 (image):

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@tinglvv tinglvv added the ciflow/binaries Trigger all binary build and upload jobs on the PR label Aug 15, 2025
"/usr/local/lib/libnvpl_lapack_lp64_gomp.so.0",
"/usr/local/lib/libnvpl_blas_lp64_gomp.so.0",
"/usr/local/lib/libnvpl_lapack_core.so.0",
"/usr/local/lib/libnvpl_blas_core.so.0",
Collaborator

nvpl-blas and nvpl-lapack are also available on PyPI, so we can use those as well.

Collaborator Author

Thanks, Aidyn! Let me try pulling those two as well.

Collaborator Author

This might have to wait for a later stage; I haven't tested nvpl-blas and nvpl-lapack from PyPI before.

@Aidyn-A Aidyn-A added module: binaries Anything related to official binaries that we release to users module: cuda Related to torch.cuda, and CUDA support in general enhancement Not as big of a feature, but technically not a bug. Should be easy to fix module: arm Related to ARM architectures builds of PyTorch. Includes Apple M1 release notes: cuda release notes category labels Aug 15, 2025
@tinglvv tinglvv changed the title Move to small wheel approach for CUDA SBSA wheel [WIP] Move to small wheel approach for CUDA SBSA wheel Aug 15, 2025
@tinglvv
Collaborator Author

tinglvv commented Aug 15, 2025

@pytorchbot rebase

@pytorchmergebot
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot
Collaborator

Successfully rebased sbsa-small-wheel onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout sbsa-small-wheel && git pull --rebase)

@ptrblck ptrblck moved this to In Progress in PyTorch + CUDA Aug 22, 2025
@tinglvv tinglvv marked this pull request as ready for review September 4, 2025 17:52
@tinglvv tinglvv requested a review from a team as a code owner September 4, 2025 17:52
@tinglvv
Collaborator Author

tinglvv commented Sep 4, 2025

The SBSA build was successful. However, a plain pip install with no extra options resolves NCCL from PyPI by default, which fails with the error below and carries a security risk. The wheel should instead be installed with pip install torch-2.9.0.dev20250904+cu130-cp312-cp312-manylinux_2_28_aarch64.whl --index-url https://download.pytorch.org/whl/nightly/cu130

root@03cd90fca0f1:/tmp/artifacts# pip install torch-2.9.0.dev20250904+cu130-cp312-cp312-manylinux_2_28_aarch64.whl 
...
INFO: pip is looking at multiple versions of torch to determine which version is compatible with other requirements. This could take a while.
ERROR: Could not find a version that satisfies the requirement nvidia-nccl-cu13==2.27.7; platform_system == "Linux" and platform_machine == "aarch64" (from torch) (from versions: 0.0.0a0)

[notice] A new release of pip is available: 25.0.1 -> 25.2
[notice] To update, run: pip install --upgrade pip
ERROR: No matching distribution found for nvidia-nccl-cu13==2.27.7; platform_system == "Linux" and platform_machine == "aarch64"
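
As a small illustration of the recommended invocation, the sketch below assembles the pip command line so that dependencies resolve against the nightly cu130 index named in the comment above. The helper name `pip_install_command` is hypothetical and only builds the argument list; it does not run pip.

```python
import shlex

# Index URL quoted from the comment above.
NIGHTLY_CU130_INDEX = "https://download.pytorch.org/whl/nightly/cu130"


def pip_install_command(wheel_path, index_url=NIGHTLY_CU130_INDEX):
    """Return the argv for installing a local wheel while resolving its
    dependencies from `index_url` instead of pip's default index.
    (Hypothetical helper; nothing is executed.)"""
    return ["pip", "install", wheel_path, "--index-url", index_url]


if __name__ == "__main__":
    wheel = "torch-2.9.0.dev20250904+cu130-cp312-cp312-manylinux_2_28_aarch64.whl"
    # shlex.join quotes arguments so the line can be pasted into a shell.
    print(shlex.join(pip_install_command(wheel)))
```

Returning a list rather than a string keeps the command safe to hand to `subprocess.run` without shell quoting concerns.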

@tinglvv
Collaborator Author

tinglvv commented Sep 8, 2025

After a successful installation, importing torch fails with this error:

Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/torch/__init__.py", line 323, in _load_global_deps
    ctypes.CDLL(global_deps_lib_path, mode=ctypes.RTLD_GLOBAL)
  File "/usr/local/lib/python3.12/ctypes/__init__.py", line 379, in __init__
    self._handle = _dlopen(self._name, mode)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^
OSError: libcudart.so.13: cannot open shared object file: No such file or directory

1st check: the RPATH is not set.

readelf -d /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch.so 

Dynamic section at offset 0x3fc78 contains 31 entries:
  Tag        Type                         Name/Value
 0x0000000000000001 (NEEDED)             Shared library: [libtorch_cpu.so]
 0x0000000000000001 (NEEDED)             Shared library: [libtorch_cuda.so]
 0x0000000000000001 (NEEDED)             Shared library: [libgcc_s.so.1]
 0x0000000000000001 (NEEDED)             Shared library: [libpthread.so.0]
 0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]
 0x0000000000000001 (NEEDED)             Shared library: [ld-linux-aarch64.so.1]
 0x000000000000000e (SONAME)             Library soname: [libtorch.so]
 0x000000000000001d (RUNPATH)            Library runpath: [$ORIGIN:/usr/local/cuda/lib64]
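
For checking many libraries at once, the readelf output can be parsed rather than read by eye. This is a minimal sketch, assuming the `readelf -d` line format shown above; the function name `search_path_entries` is hypothetical.

```python
import re

# Matches dynamic-section lines such as:
#  0x000000000000001d (RUNPATH)   Library runpath: [$ORIGIN:/usr/local/cuda/lib64]
_ENTRY = re.compile(r"\((RPATH|RUNPATH)\)\s+Library (?:rpath|runpath): \[(.*)\]")


def search_path_entries(readelf_output):
    """Map 'RPATH' / 'RUNPATH' to the directory list found in `readelf -d` text."""
    found = {}
    for line in readelf_output.splitlines():
        match = _ENTRY.search(line)
        if match:
            tag, value = match.groups()
            found[tag] = value.split(":") if value else []
    return found


sample = """\
 0x0000000000000001 (NEEDED)             Shared library: [libtorch_cuda.so]
 0x000000000000000e (SONAME)             Library soname: [libtorch.so]
 0x000000000000001d (RUNPATH)            Library runpath: [$ORIGIN:/usr/local/cuda/lib64]
"""
print(search_path_entries(sample))
```

On the excerpt above this reports a RUNPATH of `$ORIGIN` and `/usr/local/cuda/lib64` and no RPATH key, which matches the observation in this check.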

2nd check: LD_DEBUG shows that libtorch_global_deps.so also needs the RPATH fix, not just libtorch.so and libtorch_python.so:

root@b4c39583b159:/usr/local/lib/python3.12/site-packages/torch/lib# LD_DEBUG=libs python3 -c "import torch" 2>&1 | grep -A5 -B5 cudart
       409:     calling init: /usr/local/lib/python3.12/lib-dynload/_datetime.cpython-312-aarch64-linux-gnu.so
       409:
       409:
       409:     calling init: /usr/local/lib/python3.12/lib-dynload/array.cpython-312-aarch64-linux-gnu.so
       409:
       409:     find library=libcudart.so.13 [0]; searching
       409:      search path=/usr/local/lib/python3.12/site-packages/torch/lib:/usr/local/cuda/lib64               (RUNPATH from file /usr/local/lib/python3.12/site-packages/torch/lib/libtorch_global_deps.so)
       409:       trying file=/usr/local/lib/python3.12/site-packages/torch/lib/libcudart.so.13
       409:       trying file=/usr/local/cuda/lib64/libcudart.so.13
       409:      search cache=/etc/ld.so.cache
       409:      search path=/lib/aarch64-linux-gnu:/usr/lib/aarch64-linux-gnu:/lib:/usr/lib               (system search path)
       409:       trying file=/lib/aarch64-linux-gnu/libcudart.so.13
       409:       trying file=/usr/lib/aarch64-linux-gnu/libcudart.so.13
       409:       trying file=/lib/libcudart.so.13
       409:       trying file=/usr/lib/libcudart.so.13
       409:
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/torch/__init__.py", line 323, in _load_global_deps
    ctypes.CDLL(global_deps_lib_path, mode=ctypes.RTLD_GLOBAL)
  File "/usr/local/lib/python3.12/ctypes/__init__.py", line 379, in __init__
    self._handle = _dlopen(self._name, mode)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^
OSError: libcudart.so.13: cannot open shared object file: No such file or directory
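
The same information can be pulled out of a long LD_DEBUG log programmatically. The sketch below assumes the `find library=` / `trying file=` line format shown in the output above; `tried_paths` is a hypothetical helper name.

```python
import re

_FIND = re.compile(r"find library=(\S+)")
_TRY = re.compile(r"trying file=(\S+)")


def tried_paths(ld_debug_output, library):
    """Return the files the dynamic loader tried while searching for `library`."""
    tried = []
    active = False
    for line in ld_debug_output.splitlines():
        find = _FIND.search(line)
        if find:
            # A new search starts; track it only if it is for our library.
            active = find.group(1) == library
            continue
        attempt = _TRY.search(line)
        if active and attempt:
            tried.append(attempt.group(1))
    return tried


sample = """\
       409:     find library=libcudart.so.13 [0]; searching
       409:       trying file=/usr/local/lib/python3.12/site-packages/torch/lib/libcudart.so.13
       409:       trying file=/usr/local/cuda/lib64/libcudart.so.13
"""
for path in tried_paths(sample, "libcudart.so.13"):
    print(path)
```

If none of the printed paths points into the pip-installed dependency directories, the search path of the loading library is the likely culprit.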

@tinglvv
Collaborator Author

tinglvv commented Sep 8, 2025

After fixing the RPATH, a new error appears:

>>> import torch
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.12/site-packages/torch/__init__.py", line 415, in <module>
    from torch._C import *  # noqa: F403
    ^^^^^^^^^^^^^^^^^^^^^^
ImportError: libtorch.so: cannot open shared object file: No such file or directory

LD_DEBUG shows that libtorch_python.so is what loads libtorch.so, so the RPATH of libtorch_python.so needs to be fixed as well.

root@b4c39583b159:/usr/local/lib/python3.12/site-packages/torch/lib# LD_DEBUG=libs python3 -c "import torch" 2>&1 | grep -A3 -B3 libtorch.so
       424:      search path=/usr/local/lib/python3.12/site-packages/torch/lib             (RPATH from file /usr/local/lib/python3.12/site-packages/torch/lib/libtorch_global_deps.so)
       424:       trying file=/usr/local/lib/python3.12/site-packages/torch/lib/libtorch_python.so
       424:
       424:     find library=libtorch.so [0]; searching

@tinglvv
Collaborator Author

tinglvv commented Sep 8, 2025

The RPATH needs to be fixed for all libs in torch/lib that load PyPI deps:

root@57ad84516286:/usr/local/lib/python3.12/site-packages/torch/lib# ls
libarm_compute.so        libnvpl_blas_core.so.0         libtorch.so
libarm_compute_graph.so  libnvpl_blas_lp64_gomp.so.0    libtorch_cpu.so
libc10.so                libnvpl_lapack_core.so.0       libtorch_cuda.so
libc10_cuda.so           libnvpl_lapack_lp64_gomp.so.0  libtorch_cuda_linalg.so
libcaffe2_nvrtc.so       libshm                         libtorch_global_deps.so
libgfortran.so.5         libshm.so                      libtorch_nvshmem.so
libgomp.so.1             libshm_windows                 libtorch_python.so
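
One way to apply such a fix across the directory is sketched below. Everything here is an assumption for illustration: the thread does not say which tool performs the patching (patchelf's `--set-rpath` and `--force-rpath` options are used here), the dependency directory names are placeholders, and the sketch does not decide which libraries actually load PyPI deps; pass only those names. It only builds the commands and runs nothing.

```python
import posixpath


def build_rpath(dep_dirs):
    """Join $ORIGIN with dependency directories given relative to site-packages."""
    # torch/lib sits two levels below site-packages, hence the "../..".
    return ":".join(["$ORIGIN"] + [f"$ORIGIN/../../{d}" for d in dep_dirs])


def shared_objects(names):
    """Keep names that look like shared objects: libfoo.so or libfoo.so.N."""
    return sorted(n for n in names if n.endswith(".so") or ".so." in n)


def patchelf_commands(lib_dir, names, rpath):
    """Build one patchelf argv per shared object; nothing is executed here."""
    return [
        ["patchelf", "--force-rpath", "--set-rpath", rpath, posixpath.join(lib_dir, n)]
        for n in shared_objects(names)
    ]


if __name__ == "__main__":
    # Hypothetical dependency directories, for illustration only.
    rpath = build_rpath(["nvidia/example_a/lib", "nvidia/example_b/lib"])
    listing = ["libtorch.so", "libtorch_global_deps.so", "libshm", "libgomp.so.1"]
    for argv in patchelf_commands("torch/lib", listing, rpath):
        print(" ".join(argv))
```

Directory entries such as `libshm` and `libshm_windows` in the listing above are skipped by the name filter.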

@tinglvv
Collaborator Author

tinglvv commented Sep 8, 2025

The move from zip to wheel pack will be handled in Andrey's PR #159481.

@atalman atalman changed the title [WIP] Move to small wheel approach for CUDA SBSA wheel Move to small wheel approach for CUDA SBSA wheel Sep 8, 2025
@tinglvv
Collaborator Author

tinglvv commented Sep 9, 2025

Verified that the 13.0 wheel passes.

Follow-ups:

  1. Generalize the logic for cu13 wheels (to be compatible with future 13.1, etc.).
  2. The metadata needs to change for all CUDA dependencies:
    Requires-Dist: nvidia-cuda-nvrtc==13.0.48; platform_system == "Linux" and platform_machine == "aarch64" ->
    Requires-Dist: nvidia-cuda-nvrtc==13.0.48; platform_system == "Linux"
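
The metadata change in the second follow-up can be sketched as a text rewrite over the wheel's METADATA contents. This is only an illustration under stated assumptions: `relax_machine_marker` is a hypothetical helper, and it touches every Requires-Dist line that ends with the aarch64 clause rather than checking that the requirement is a CUDA dependency.

```python
AARCH64_CLAUSE = ' and platform_machine == "aarch64"'


def relax_machine_marker(metadata_text):
    """Drop the trailing aarch64 clause from Requires-Dist lines.

    Lines that are not Requires-Dist, or that do not end with the clause,
    are returned unchanged.
    """
    out = []
    for line in metadata_text.splitlines():
        if line.startswith("Requires-Dist:") and line.endswith(AARCH64_CLAUSE):
            line = line[: -len(AARCH64_CLAUSE)]
        out.append(line)
    return "\n".join(out)


before = (
    'Requires-Dist: nvidia-cuda-nvrtc==13.0.48; '
    'platform_system == "Linux" and platform_machine == "aarch64"'
)
print(relax_machine_marker(before))
```

A real implementation would probably parse the marker instead of matching a fixed suffix, so that clause order and spacing do not matter.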

@atalman
Contributor

atalman commented Sep 9, 2025

@pytorchmergebot merge -f "lint and build looks good"

@pytorchmergebot
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@huydhn
Contributor

huydhn commented Sep 11, 2025

I'm also seeing an import error when loading libarm_compute.so while building a domain library, e.g. https://github.com/pytorch/vision/actions/runs/17612462583/job/50037380520#step:15:61

import torch
  File "/__w/_temp/conda_environment_17612462583/lib/python3.10/site-packages/torch/__init__.py", line 415, in <module>
    from torch._C import *  # noqa: F403
ImportError: libarm_compute.so: cannot open shared object file: No such file or directory

It's probably coming from this PR. Any thoughts? @Aidyn-A @tinglvv

@Aidyn-A
Collaborator

Aidyn-A commented Sep 11, 2025

Yes, we are aware of that. @atalman has fixed it in #162566

markc-614 pushed a commit to markc-614/pytorch that referenced this pull request Sep 17, 2025
pytorch#160673

Use download.pytorch.org's dependencies like x86 build instead of bundling libs into the wheel
Pull Request resolved: pytorch#160720
Approved by: https://github.com/atalman
mansiag05 pushed a commit to mansiag05/pytorch that referenced this pull request Sep 22, 2025
cleonard530 pushed a commit to cleonard530/pytorch that referenced this pull request Sep 22, 2025
@atalman atalman removed this from PyTorch + CUDA Sep 26, 2025
dsashidh pushed a commit to dsashidh/pytorch that referenced this pull request Sep 26, 2025

Labels

ciflow/binaries Trigger all binary build and upload jobs on the PR enhancement Not as big of a feature, but technically not a bug. Should be easy to fix Merged module: arm Related to ARM architectures builds of PyTorch. Includes Apple M1 module: binaries Anything related to official binaries that we release to users module: cuda Related to torch.cuda, and CUDA support in general open source release notes: cuda release notes category

6 participants