torch_shm_manager: undefined reference to gloo · Issue #146239 · pytorch/pytorch · GitHub

torch_shm_manager: undefined reference to gloo #146239

@adamjstewart

🐛 Describe the bug

I'm seeing PyTorch 2.6.0 build issues, but only when compiling with CUDA support and using the system gloo. The specific error message is:

  [7209/7228] Linking CXX executable bin/torch_shm_manager
  FAILED: bin/torch_shm_manager
  : && /builds/spack/spack/lib/spack/env/gcc/g++ -D_GLIBCXX_USE_CXX11_ABI=1 -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DLIBKINETO_NOXPUPTI=ON -DUSE_FBGEMM -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=range-loop-construct -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wsuggest-override -Wno-psabi -Wno-error=old-style-cast -Wno-missing-braces -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-error=dangling-reference -Wno-error=redundant-move -Wno-stringop-overflow -DHAVE_AVX512_CPU_DEFINITION -DHAVE_AVX2_CPU_DEFINITION -O3 -DNDEBUG -DNDEBUG -rdynamic     -Wl,--dependency-file=caffe2/torch/lib/libshm/CMakeFiles/torch_shm_manager.dir/link.d -Wl,--no-as-needed caffe2/torch/lib/libshm/CMakeFiles/torch_shm_manager.dir/manager.cpp.o -o bin/torch_shm_manager  
-Wl,-rpath,/tmp/root/spack-stage/spack-stage-py-torch-2.6.0-kkm3ehmxkqnsoh6tob3zmjtb54eprcnw/spack-src/build/lib:/home/software/spack/__spack_path_placeholder__/__spack_path_placeholder__/__spack_path_placeholder__/__spack_path_placeholder__/__spack_path_placeholder__/__spack_path_placeholder__/__spack_path_placeholder__/__spack_path_placeholder__/__spack_path_placeh/linux-ubuntu24.04-x86_64_v3/gcc-13.2.0/cpuinfo-2024-09-26-6dpxpvi2tu4ifv4mkc7zyke7xb4exics/lib:/home/software/spack/__spack_path_placeholder__/__spack_path_placeholder__/__spack_path_placeholder__/__spack_path_placeholder__/__spack_path_placeholder__/__spack_path_placeholder__/__spack_path_placeholder__/__spack_path_placeholder__/__spack_path_placeh/linux-ubuntu24.04-x86_64_v3/gcc-13.2.0/protobuf-3.13.0-hifms4p2pv4x7vh6gauu34hiatc42cpz/lib:/home/software/spack/__spack_path_placeholder__/__spack_path_placeholder__/__spack_path_placeholder__/__spack_path_placeholder__/__spack_path_placeholder__/__spack_path_placeholder__/__spack_path_placeholder__/__spack_path_placeholder__/__spack_path_placeh/linux-ubuntu24.04-x86_64_v3/gcc-13.2.0/pthreadpool-2023-08-29-wr5i7t4emwmywdqm4yzzmoonjjwrwk7g/lib:/home/software/spack/__spack_path_placeholder__/__spack_path_placeholder__/__spack_path_placeholder__/__spack_path_placeholder__/__spack_path_placeholder__/__spack_path_placeholder__/__spack_path_placeholder__/__spack_path_placeholder__/__spack_path_placeh/linux-ubuntu24.04-x86_64_v3/gcc-13.2.0/gloo-2023-12-03-sfe3fwtqo4p73apjqsmjoi2o7wj4ak75/lib:  lib/libshm.so  -lrt  lib/libc10.so  
-Wl,-rpath-link,/tmp/root/spack-stage/spack-stage-py-torch-2.6.0-kkm3ehmxkqnsoh6tob3zmjtb54eprcnw/spack-src/build/lib:/home/software/spack/__spack_path_placeholder__/__spack_path_placeholder__/__spack_path_placeholder__/__spack_path_placeholder__/__spack_path_placeholder__/__spack_path_placeholder__/__spack_path_placeholder__/__spack_path_placeholder__/__spack_path_placeh/linux-ubuntu24.04-x86_64_v3/gcc-13.2.0/cpuinfo-2024-09-26-6dpxpvi2tu4ifv4mkc7zyke7xb4exics/lib:/home/software/spack/__spack_path_placeholder__/__spack_path_placeholder__/__spack_path_placeholder__/__spack_path_placeholder__/__spack_path_placeholder__/__spack_path_placeholder__/__spack_path_placeholder__/__spack_path_placeholder__/__spack_path_placeh/linux-ubuntu24.04-x86_64_v3/gcc-13.2.0/protobuf-3.13.0-hifms4p2pv4x7vh6gauu34hiatc42cpz/lib:/home/software/spack/__spack_path_placeholder__/__spack_path_placeholder__/__spack_path_placeholder__/__spack_path_placeholder__/__spack_path_placeholder__/__spack_path_placeholder__/__spack_path_placeholder__/__spack_path_placeholder__/__spack_path_placeh/linux-ubuntu24.04-x86_64_v3/gcc-13.2.0/pthreadpool-2023-08-29-wr5i7t4emwmywdqm4yzzmoonjjwrwk7g/lib:/home/software/spack/__spack_path_placeholder__/__spack_path_placeholder__/__spack_path_placeholder__/__spack_path_placeholder__/__spack_path_placeholder__/__spack_path_placeholder__/__spack_path_placeholder__/__spack_path_placeholder__/__spack_path_placeh/linux-ubuntu24.04-x86_64_v3/gcc-13.2.0/gloo-2023-12-03-sfe3fwtqo4p73apjqsmjoi2o7wj4ak75/lib && /home/software/spack/__spack_path_placeholder__/__spack_path_placeholder__/__spack_path_placeholder__/__spack_path_placeholder__/__spack_path_placeholder__/__spack_path_placeholder__/__spack_path_placeholder__/__spack_path_placeholder__/__spack_path_placeh/linux-ubuntu24.04-x86_64_v3/gcc-13.2.0/cmake-3.31.4-hcuqrjfn4aeqmqd6xg7uaeqkihcg6ia4/bin/cmake -E __run_co_compile --lwyu="ldd;-u;-r" --source=bin/torch_shm_manager && :
  /usr/bin/ld: /tmp/root/spack-stage/spack-stage-py-torch-2.6.0-kkm3ehmxkqnsoh6tob3zmjtb54eprcnw/spack-src/build/lib/libtorch_cpu.so: undefined reference to `gloo::allgather(gloo::AllgatherOptions&)'
  /usr/bin/ld: /tmp/root/spack-stage/spack-stage-py-torch-2.6.0-kkm3ehmxkqnsoh6tob3zmjtb54eprcnw/spack-src/build/lib/libtorch_cpu.so: undefined reference to `gloo::rendezvous::Store::~Store()'
  /usr/bin/ld: /tmp/root/spack-stage/spack-stage-py-torch-2.6.0-kkm3ehmxkqnsoh6tob3zmjtb54eprcnw/spack-src/build/lib/libtorch_cpu.so: undefined reference to `gloo::alltoallv(gloo::AlltoallvOptions&)'
  /usr/bin/ld: /tmp/root/spack-stage/spack-stage-py-torch-2.6.0-kkm3ehmxkqnsoh6tob3zmjtb54eprcnw/spack-src/build/lib/libtorch_cpu.so: undefined reference to `gloo::rendezvous::Context::connectFullMesh(gloo::rendezvous::Store&, std::shared_ptr<gloo::transport::Device>&)'
  /usr/bin/ld: /tmp/root/spack-stage/spack-stage-py-torch-2.6.0-kkm3ehmxkqnsoh6tob3zmjtb54eprcnw/spack-src/build/lib/libtorch_cpu.so: undefined reference to `typeinfo for gloo::rendezvous::Store'
  /usr/bin/ld: /tmp/root/spack-stage/spack-stage-py-torch-2.6.0-kkm3ehmxkqnsoh6tob3zmjtb54eprcnw/spack-src/build/lib/libtorch_cpu.so: undefined reference to `gloo::AllgathervOptions::setOutput(void*, std::vector<unsigned long, std::allocator<unsigned long> >, unsigned long)'
  /usr/bin/ld: /tmp/root/spack-stage/spack-stage-py-torch-2.6.0-kkm3ehmxkqnsoh6tob3zmjtb54eprcnw/spack-src/build/lib/libtorch_cpu.so: undefined reference to `vtable for gloo::rendezvous::PrefixStore'
  /usr/bin/ld: /tmp/root/spack-stage/spack-stage-py-torch-2.6.0-kkm3ehmxkqnsoh6tob3zmjtb54eprcnw/spack-src/build/lib/libtorch_cpu.so: undefined reference to `gloo::barrier(gloo::BarrierOptions&)'
  /usr/bin/ld: /tmp/root/spack-stage/spack-stage-py-torch-2.6.0-kkm3ehmxkqnsoh6tob3zmjtb54eprcnw/spack-src/build/lib/libtorch_cpu.so: undefined reference to `gloo::Context::getTimeout() const'
  /usr/bin/ld: /tmp/root/spack-stage/spack-stage-py-torch-2.6.0-kkm3ehmxkqnsoh6tob3zmjtb54eprcnw/spack-src/build/lib/libtorch_cpu.so: undefined reference to `gloo::allgatherv(gloo::AllgathervOptions&)'
  /usr/bin/ld: /tmp/root/spack-stage/spack-stage-py-torch-2.6.0-kkm3ehmxkqnsoh6tob3zmjtb54eprcnw/spack-src/build/lib/libtorch_cpu.so: undefined reference to `gloo::scatter(gloo::ScatterOptions&)'
  /usr/bin/ld: /tmp/root/spack-stage/spack-stage-py-torch-2.6.0-kkm3ehmxkqnsoh6tob3zmjtb54eprcnw/spack-src/build/lib/libtorch_cpu.so: undefined reference to `gloo::rendezvous::Context::Context(int, int, int)'
  /usr/bin/ld: /tmp/root/spack-stage/spack-stage-py-torch-2.6.0-kkm3ehmxkqnsoh6tob3zmjtb54eprcnw/spack-src/build/lib/libtorch_cpu.so: undefined reference to `gloo::AlltoallvOptions::setInput(void*, std::vector<long, std::allocator<long> >, unsigned long)'
  /usr/bin/ld: /tmp/root/spack-stage/spack-stage-py-torch-2.6.0-kkm3ehmxkqnsoh6tob3zmjtb54eprcnw/spack-src/build/lib/libtorch_cpu.so: undefined reference to `gloo::transport::tcp::CreateDevice(gloo::transport::tcp::attr const&)'
  /usr/bin/ld: /tmp/root/spack-stage/spack-stage-py-torch-2.6.0-kkm3ehmxkqnsoh6tob3zmjtb54eprcnw/spack-src/build/lib/libtorch_cpu.so: undefined reference to `gloo::BarrierOptions::BarrierOptions(std::shared_ptr<gloo::Context> const&)'
  /usr/bin/ld: /tmp/root/spack-stage/spack-stage-py-torch-2.6.0-kkm3ehmxkqnsoh6tob3zmjtb54eprcnw/spack-src/build/lib/libtorch_cpu.so: undefined reference to `gloo::Context::setTimeout(std::chrono::duration<long, std::ratio<1l, 1000l> >)'
  /usr/bin/ld: /tmp/root/spack-stage/spack-stage-py-torch-2.6.0-kkm3ehmxkqnsoh6tob3zmjtb54eprcnw/spack-src/build/lib/libtorch_cpu.so: undefined reference to `gloo::AlltoallvOptions::setOutput(void*, std::vector<long, std::allocator<long> >, unsigned long)'
  /usr/bin/ld: /tmp/root/spack-stage/spack-stage-py-torch-2.6.0-kkm3ehmxkqnsoh6tob3zmjtb54eprcnw/spack-src/build/lib/libtorch_cpu.so: undefined reference to `gloo::AllgathervOptions::setInput(void*, unsigned long, unsigned long)'
  /usr/bin/ld: /tmp/root/spack-stage/spack-stage-py-torch-2.6.0-kkm3ehmxkqnsoh6tob3zmjtb54eprcnw/spack-src/build/lib/libtorch_cpu.so: undefined reference to `gloo::Context::createUnboundBuffer(void*, unsigned long)'
  /usr/bin/ld: /tmp/root/spack-stage/spack-stage-py-torch-2.6.0-kkm3ehmxkqnsoh6tob3zmjtb54eprcnw/spack-src/build/lib/libtorch_cpu.so: undefined reference to `gloo::allreduce(gloo::AllreduceOptions const&)'
  /usr/bin/ld: /tmp/root/spack-stage/spack-stage-py-torch-2.6.0-kkm3ehmxkqnsoh6tob3zmjtb54eprcnw/spack-src/build/lib/libtorch_cpu.so: undefined reference to `gloo::broadcast(gloo::BroadcastOptions&)'
  /usr/bin/ld: /tmp/root/spack-stage/spack-stage-py-torch-2.6.0-kkm3ehmxkqnsoh6tob3zmjtb54eprcnw/spack-src/build/lib/libtorch_cpu.so: undefined reference to `gloo::alltoall(gloo::AlltoallOptions&)'
  /usr/bin/ld: /tmp/root/spack-stage/spack-stage-py-torch-2.6.0-kkm3ehmxkqnsoh6tob3zmjtb54eprcnw/spack-src/build/lib/libtorch_cpu.so: undefined reference to `gloo::reduce(gloo::ReduceOptions&)'
  /usr/bin/ld: /tmp/root/spack-stage/spack-stage-py-torch-2.6.0-kkm3ehmxkqnsoh6tob3zmjtb54eprcnw/spack-src/build/lib/libtorch_cpu.so: undefined reference to `gloo::gather(gloo::GatherOptions&)'
  /usr/bin/ld: /tmp/root/spack-stage/spack-stage-py-torch-2.6.0-kkm3ehmxkqnsoh6tob3zmjtb54eprcnw/spack-src/build/lib/libtorch_cpu.so: undefined reference to `gloo::rendezvous::PrefixStore::PrefixStore(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, gloo::rendezvous::Store&)'
  collect2: error: ld returned 1 exit status
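
For triage, the linker output above can be condensed into counts per referencing library and per namespace. The following is only a sketch, not part of the PyTorch or Spack tooling: the helper names `undefined_references` and `summarize` and the `SAMPLE` constant are hypothetical, and the regular expression assumes the GNU ld message format shown in the log. `SAMPLE` holds two lines copied verbatim from the output above.

```python
import re
from collections import Counter

# Assumed GNU ld message format, as seen in the log above:
#   /usr/bin/ld: <object>: undefined reference to `<symbol>'
UNDEFINED_REF = re.compile(
    r"ld: (?P<obj>\S+): undefined reference to `(?P<sym>.+)'\s*$"
)

# Two lines copied verbatim from the build log (hypothetical constant name).
SAMPLE = """
  /usr/bin/ld: /tmp/root/spack-stage/spack-stage-py-torch-2.6.0-kkm3ehmxkqnsoh6tob3zmjtb54eprcnw/spack-src/build/lib/libtorch_cpu.so: undefined reference to `gloo::allgather(gloo::AllgatherOptions&)'
  /usr/bin/ld: /tmp/root/spack-stage/spack-stage-py-torch-2.6.0-kkm3ehmxkqnsoh6tob3zmjtb54eprcnw/spack-src/build/lib/libtorch_cpu.so: undefined reference to `typeinfo for gloo::rendezvous::Store'
"""


def undefined_references(log_text):
    """Return (object basename, symbol) pairs for each undefined-reference line."""
    pairs = []
    for line in log_text.splitlines():
        match = UNDEFINED_REF.search(line)
        if match:
            basename = match.group("obj").rsplit("/", 1)[-1]
            pairs.append((basename, match.group("sym")))
    return pairs


def summarize(pairs):
    """Count undefined symbols per referencing object and per top-level namespace."""
    by_object = Counter(obj for obj, _ in pairs)
    by_namespace = Counter()
    for _, symbol in pairs:
        # Drop "vtable for " / "typeinfo for " so the namespace comes first.
        name = re.sub(r"^(vtable|typeinfo) for ", "", symbol)
        by_namespace[name.split("::", 1)[0]] += 1
    return by_object, by_namespace


if __name__ == "__main__":
    by_object, by_namespace = summarize(undefined_references(SAMPLE))
    print(dict(by_object))
    print(dict(by_namespace))
```

Run against the full log, this should make it easy to confirm that every unresolved symbol is in the `gloo` namespace and is referenced from `libtorch_cpu.so` rather than from `torch_shm_manager` itself.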

The error occurs on both x86_64 and aarch64. It affects only PyTorch 2.6.0, not 2.5.1, and only builds with CUDA support; CPU-only builds are unaffected.

Attached are the full build log and the environment variables needed to reproduce the failure:

Versions

I can't run the environment collection script because PyTorch won't install, but here's a best attempt:

PyTorch version: 2.6.0
Is debug build: False
CUDA used to build PyTorch: 12.6.3
ROCM used to build PyTorch: N/A

OS: Ubuntu 24.04
GCC version: 13.2.0
Clang version: N/A
CMake version: 3.31.4
Libc version: ?

Python version: 3.12.8
Python platform: Linux
Is CUDA available: True
CUDA runtime version: 12.6.3?
CUDA_MODULE_LOADING set to: ?
GPU models and configuration: cuda_arch 8.0
Nvidia driver version: ?
cuDNN version: 8.9.7.29
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

cc @malfet @seemethere @ptrblck @msaroufim @eqy @H-Huang @awgu @kwen2501 @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @c-p-i-o

Labels: module: build, module: cuda, triaged