Description
🐛 Describe the bug
PyTorch 2.6.0 fails to build for me, but only when compiling with CUDA support and using the system gloo. The specific error message is:
[7209/7228] Linking CXX executable bin/torch_shm_manager
FAILED: bin/torch_shm_manager
: && /builds/spack/spack/lib/spack/env/gcc/g++ -D_GLIBCXX_USE_CXX11_ABI=1 -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DLIBKINETO_NOXPUPTI=ON -DUSE_FBGEMM -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=range-loop-construct -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wsuggest-override -Wno-psabi -Wno-error=old-style-cast -Wno-missing-braces -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-error=dangling-reference -Wno-error=redundant-move -Wno-stringop-overflow -DHAVE_AVX512_CPU_DEFINITION -DHAVE_AVX2_CPU_DEFINITION -O3 -DNDEBUG -DNDEBUG -rdynamic -Wl,--dependency-file=caffe2/torch/lib/libshm/CMakeFiles/torch_shm_manager.dir/link.d -Wl,--no-as-needed caffe2/torch/lib/libshm/CMakeFiles/torch_shm_manager.dir/manager.cpp.o -o bin/torch_shm_manager 
-Wl,-rpath,/tmp/root/spack-stage/spack-stage-py-torch-2.6.0-kkm3ehmxkqnsoh6tob3zmjtb54eprcnw/spack-src/build/lib:/home/software/spack/__spack_path_placeholder__/__spack_path_placeholder__/__spack_path_placeholder__/__spack_path_placeholder__/__spack_path_placeholder__/__spack_path_placeholder__/__spack_path_placeholder__/__spack_path_placeholder__/__spack_path_placeh/linux-ubuntu24.04-x86_64_v3/gcc-13.2.0/cpuinfo-2024-09-26-6dpxpvi2tu4ifv4mkc7zyke7xb4exics/lib:/home/software/spack/__spack_path_placeholder__/__spack_path_placeholder__/__spack_path_placeholder__/__spack_path_placeholder__/__spack_path_placeholder__/__spack_path_placeholder__/__spack_path_placeholder__/__spack_path_placeholder__/__spack_path_placeh/linux-ubuntu24.04-x86_64_v3/gcc-13.2.0/protobuf-3.13.0-hifms4p2pv4x7vh6gauu34hiatc42cpz/lib:/home/software/spack/__spack_path_placeholder__/__spack_path_placeholder__/__spack_path_placeholder__/__spack_path_placeholder__/__spack_path_placeholder__/__spack_path_placeholder__/__spack_path_placeholder__/__spack_path_placeholder__/__spack_path_placeh/linux-ubuntu24.04-x86_64_v3/gcc-13.2.0/pthreadpool-2023-08-29-wr5i7t4emwmywdqm4yzzmoonjjwrwk7g/lib:/home/software/spack/__spack_path_placeholder__/__spack_path_placeholder__/__spack_path_placeholder__/__spack_path_placeholder__/__spack_path_placeholder__/__spack_path_placeholder__/__spack_path_placeholder__/__spack_path_placeholder__/__spack_path_placeh/linux-ubuntu24.04-x86_64_v3/gcc-13.2.0/gloo-2023-12-03-sfe3fwtqo4p73apjqsmjoi2o7wj4ak75/lib: lib/libshm.so -lrt lib/libc10.so 
-Wl,-rpath-link,/tmp/root/spack-stage/spack-stage-py-torch-2.6.0-kkm3ehmxkqnsoh6tob3zmjtb54eprcnw/spack-src/build/lib:/home/software/spack/__spack_path_placeholder__/__spack_path_placeholder__/__spack_path_placeholder__/__spack_path_placeholder__/__spack_path_placeholder__/__spack_path_placeholder__/__spack_path_placeholder__/__spack_path_placeholder__/__spack_path_placeh/linux-ubuntu24.04-x86_64_v3/gcc-13.2.0/cpuinfo-2024-09-26-6dpxpvi2tu4ifv4mkc7zyke7xb4exics/lib:/home/software/spack/__spack_path_placeholder__/__spack_path_placeholder__/__spack_path_placeholder__/__spack_path_placeholder__/__spack_path_placeholder__/__spack_path_placeholder__/__spack_path_placeholder__/__spack_path_placeholder__/__spack_path_placeh/linux-ubuntu24.04-x86_64_v3/gcc-13.2.0/protobuf-3.13.0-hifms4p2pv4x7vh6gauu34hiatc42cpz/lib:/home/software/spack/__spack_path_placeholder__/__spack_path_placeholder__/__spack_path_placeholder__/__spack_path_placeholder__/__spack_path_placeholder__/__spack_path_placeholder__/__spack_path_placeholder__/__spack_path_placeholder__/__spack_path_placeh/linux-ubuntu24.04-x86_64_v3/gcc-13.2.0/pthreadpool-2023-08-29-wr5i7t4emwmywdqm4yzzmoonjjwrwk7g/lib:/home/software/spack/__spack_path_placeholder__/__spack_path_placeholder__/__spack_path_placeholder__/__spack_path_placeholder__/__spack_path_placeholder__/__spack_path_placeholder__/__spack_path_placeholder__/__spack_path_placeholder__/__spack_path_placeh/linux-ubuntu24.04-x86_64_v3/gcc-13.2.0/gloo-2023-12-03-sfe3fwtqo4p73apjqsmjoi2o7wj4ak75/lib && /home/software/spack/__spack_path_placeholder__/__spack_path_placeholder__/__spack_path_placeholder__/__spack_path_placeholder__/__spack_path_placeholder__/__spack_path_placeholder__/__spack_path_placeholder__/__spack_path_placeholder__/__spack_path_placeh/linux-ubuntu24.04-x86_64_v3/gcc-13.2.0/cmake-3.31.4-hcuqrjfn4aeqmqd6xg7uaeqkihcg6ia4/bin/cmake -E __run_co_compile --lwyu="ldd;-u;-r" --source=bin/torch_shm_manager && :
/usr/bin/ld: /tmp/root/spack-stage/spack-stage-py-torch-2.6.0-kkm3ehmxkqnsoh6tob3zmjtb54eprcnw/spack-src/build/lib/libtorch_cpu.so: undefined reference to `gloo::allgather(gloo::AllgatherOptions&)'
/usr/bin/ld: /tmp/root/spack-stage/spack-stage-py-torch-2.6.0-kkm3ehmxkqnsoh6tob3zmjtb54eprcnw/spack-src/build/lib/libtorch_cpu.so: undefined reference to `gloo::rendezvous::Store::~Store()'
/usr/bin/ld: /tmp/root/spack-stage/spack-stage-py-torch-2.6.0-kkm3ehmxkqnsoh6tob3zmjtb54eprcnw/spack-src/build/lib/libtorch_cpu.so: undefined reference to `gloo::alltoallv(gloo::AlltoallvOptions&)'
/usr/bin/ld: /tmp/root/spack-stage/spack-stage-py-torch-2.6.0-kkm3ehmxkqnsoh6tob3zmjtb54eprcnw/spack-src/build/lib/libtorch_cpu.so: undefined reference to `gloo::rendezvous::Context::connectFullMesh(gloo::rendezvous::Store&, std::shared_ptr<gloo::transport::Device>&)'
/usr/bin/ld: /tmp/root/spack-stage/spack-stage-py-torch-2.6.0-kkm3ehmxkqnsoh6tob3zmjtb54eprcnw/spack-src/build/lib/libtorch_cpu.so: undefined reference to `typeinfo for gloo::rendezvous::Store'
/usr/bin/ld: /tmp/root/spack-stage/spack-stage-py-torch-2.6.0-kkm3ehmxkqnsoh6tob3zmjtb54eprcnw/spack-src/build/lib/libtorch_cpu.so: undefined reference to `gloo::AllgathervOptions::setOutput(void*, std::vector<unsigned long, std::allocator<unsigned long> >, unsigned long)'
/usr/bin/ld: /tmp/root/spack-stage/spack-stage-py-torch-2.6.0-kkm3ehmxkqnsoh6tob3zmjtb54eprcnw/spack-src/build/lib/libtorch_cpu.so: undefined reference to `vtable for gloo::rendezvous::PrefixStore'
/usr/bin/ld: /tmp/root/spack-stage/spack-stage-py-torch-2.6.0-kkm3ehmxkqnsoh6tob3zmjtb54eprcnw/spack-src/build/lib/libtorch_cpu.so: undefined reference to `gloo::barrier(gloo::BarrierOptions&)'
/usr/bin/ld: /tmp/root/spack-stage/spack-stage-py-torch-2.6.0-kkm3ehmxkqnsoh6tob3zmjtb54eprcnw/spack-src/build/lib/libtorch_cpu.so: undefined reference to `gloo::Context::getTimeout() const'
/usr/bin/ld: /tmp/root/spack-stage/spack-stage-py-torch-2.6.0-kkm3ehmxkqnsoh6tob3zmjtb54eprcnw/spack-src/build/lib/libtorch_cpu.so: undefined reference to `gloo::allgatherv(gloo::AllgathervOptions&)'
/usr/bin/ld: /tmp/root/spack-stage/spack-stage-py-torch-2.6.0-kkm3ehmxkqnsoh6tob3zmjtb54eprcnw/spack-src/build/lib/libtorch_cpu.so: undefined reference to `gloo::scatter(gloo::ScatterOptions&)'
/usr/bin/ld: /tmp/root/spack-stage/spack-stage-py-torch-2.6.0-kkm3ehmxkqnsoh6tob3zmjtb54eprcnw/spack-src/build/lib/libtorch_cpu.so: undefined reference to `gloo::rendezvous::Context::Context(int, int, int)'
/usr/bin/ld: /tmp/root/spack-stage/spack-stage-py-torch-2.6.0-kkm3ehmxkqnsoh6tob3zmjtb54eprcnw/spack-src/build/lib/libtorch_cpu.so: undefined reference to `gloo::AlltoallvOptions::setInput(void*, std::vector<long, std::allocator<long> >, unsigned long)'
/usr/bin/ld: /tmp/root/spack-stage/spack-stage-py-torch-2.6.0-kkm3ehmxkqnsoh6tob3zmjtb54eprcnw/spack-src/build/lib/libtorch_cpu.so: undefined reference to `gloo::transport::tcp::CreateDevice(gloo::transport::tcp::attr const&)'
/usr/bin/ld: /tmp/root/spack-stage/spack-stage-py-torch-2.6.0-kkm3ehmxkqnsoh6tob3zmjtb54eprcnw/spack-src/build/lib/libtorch_cpu.so: undefined reference to `gloo::BarrierOptions::BarrierOptions(std::shared_ptr<gloo::Context> const&)'
/usr/bin/ld: /tmp/root/spack-stage/spack-stage-py-torch-2.6.0-kkm3ehmxkqnsoh6tob3zmjtb54eprcnw/spack-src/build/lib/libtorch_cpu.so: undefined reference to `gloo::Context::setTimeout(std::chrono::duration<long, std::ratio<1l, 1000l> >)'
/usr/bin/ld: /tmp/root/spack-stage/spack-stage-py-torch-2.6.0-kkm3ehmxkqnsoh6tob3zmjtb54eprcnw/spack-src/build/lib/libtorch_cpu.so: undefined reference to `gloo::AlltoallvOptions::setOutput(void*, std::vector<long, std::allocator<long> >, unsigned long)'
/usr/bin/ld: /tmp/root/spack-stage/spack-stage-py-torch-2.6.0-kkm3ehmxkqnsoh6tob3zmjtb54eprcnw/spack-src/build/lib/libtorch_cpu.so: undefined reference to `gloo::AllgathervOptions::setInput(void*, unsigned long, unsigned long)'
/usr/bin/ld: /tmp/root/spack-stage/spack-stage-py-torch-2.6.0-kkm3ehmxkqnsoh6tob3zmjtb54eprcnw/spack-src/build/lib/libtorch_cpu.so: undefined reference to `gloo::Context::createUnboundBuffer(void*, unsigned long)'
/usr/bin/ld: /tmp/root/spack-stage/spack-stage-py-torch-2.6.0-kkm3ehmxkqnsoh6tob3zmjtb54eprcnw/spack-src/build/lib/libtorch_cpu.so: undefined reference to `gloo::allreduce(gloo::AllreduceOptions const&)'
/usr/bin/ld: /tmp/root/spack-stage/spack-stage-py-torch-2.6.0-kkm3ehmxkqnsoh6tob3zmjtb54eprcnw/spack-src/build/lib/libtorch_cpu.so: undefined reference to `gloo::broadcast(gloo::BroadcastOptions&)'
/usr/bin/ld: /tmp/root/spack-stage/spack-stage-py-torch-2.6.0-kkm3ehmxkqnsoh6tob3zmjtb54eprcnw/spack-src/build/lib/libtorch_cpu.so: undefined reference to `gloo::alltoall(gloo::AlltoallOptions&)'
/usr/bin/ld: /tmp/root/spack-stage/spack-stage-py-torch-2.6.0-kkm3ehmxkqnsoh6tob3zmjtb54eprcnw/spack-src/build/lib/libtorch_cpu.so: undefined reference to `gloo::reduce(gloo::ReduceOptions&)'
/usr/bin/ld: /tmp/root/spack-stage/spack-stage-py-torch-2.6.0-kkm3ehmxkqnsoh6tob3zmjtb54eprcnw/spack-src/build/lib/libtorch_cpu.so: undefined reference to `gloo::gather(gloo::GatherOptions&)'
/usr/bin/ld: /tmp/root/spack-stage/spack-stage-py-torch-2.6.0-kkm3ehmxkqnsoh6tob3zmjtb54eprcnw/spack-src/build/lib/libtorch_cpu.so: undefined reference to `gloo::rendezvous::PrefixStore::PrefixStore(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, gloo::rendezvous::Store&)'
collect2: error: ld returned 1 exit status
The error occurs on both x86_64 and aarch64. It occurs only with PyTorch 2.6.0, not 2.5.1, and only when compiling with CUDA support, not for CPU-only builds.
Attached are the full build log and the environment variables needed for reproducibility:
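For anyone triaging this, a minimal sketch of how the linker output above could be turned into a symbol list and compared against what the system gloo actually exports. The helper names `undefined_symbols` and `missing_from` are hypothetical, the sample paths are shortened for illustration, and the comparison assumes that `nm -D -C --defined-only` demangles names the same way `ld` does, which may not hold for every toolchain.

```python
import re

# Matches GNU ld diagnostics of the form:
#   /usr/bin/ld: <object>: undefined reference to `<symbol>'
UNDEF_RE = re.compile(
    r"^\S*ld: (?P<obj>.+?): undefined reference to `(?P<sym>.+)'\s*$"
)


def undefined_symbols(ld_output):
    """Map each object named by ld to the demangled symbols it could not resolve."""
    result = {}
    for line in ld_output.splitlines():
        match = UNDEF_RE.match(line.strip())
        if match:
            result.setdefault(match.group("obj"), []).append(match.group("sym"))
    return result


def missing_from(nm_output, symbols):
    """Return the symbols that do not appear as defined names in nm output.

    Expects text produced by something like
    `nm -D -C --defined-only /path/to/libgloo.so` (path is a placeholder),
    where each line is "<address> <type letter> <demangled name>".
    """
    defined = set()
    for line in nm_output.splitlines():
        parts = line.split(None, 2)
        if len(parts) == 3:
            defined.add(parts[2].strip())
    return [sym for sym in symbols if sym not in defined]


# Shortened, illustrative input (not the full log from this report).
sample_ld = (
    "/usr/bin/ld: build/lib/libtorch_cpu.so: undefined reference to "
    "`gloo::barrier(gloo::BarrierOptions&)'\n"
    "/usr/bin/ld: build/lib/libtorch_cpu.so: undefined reference to "
    "`typeinfo for gloo::rendezvous::Store'\n"
    "collect2: error: ld returned 1 exit status\n"
)
# Made-up nm output used only to exercise the helper.
sample_nm = "0000000000012340 T gloo::barrier(gloo::BarrierOptions&)\n"

by_object = undefined_symbols(sample_ld)
wanted = by_object["build/lib/libtorch_cpu.so"]
still_missing = missing_from(sample_nm, wanted)
```

If every symbol from the log turns out to be defined in the system libgloo, that would suggest the library is simply not on the link line for this target rather than being an incompatible version; if some are absent, a version mismatch seems more likely. Either reading is a guess until checked against the real library.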
Versions
Can't run the script because PyTorch won't install, but here's a best attempt:
PyTorch version: 2.6.0
Is debug build: False
CUDA used to build PyTorch: 12.6.3
ROCM used to build PyTorch: N/A
OS: Ubuntu 24.04
GCC version: 13.2.0
Clang version: N/A
CMake version: 3.31.4
Libc version: ?
Python version: 3.12.8
Python platform: Linux
Is CUDA available: True
CUDA runtime version: 12.6.3?
CUDA_MODULE_LOADING set to: ?
GPU models and configuration: cuda_arch 8.0
Nvidia driver version: ?
cuDNN version: 8.9.7.29
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
cc @malfet @seemethere @ptrblck @msaroufim @eqy @H-Huang @awgu @kwen2501 @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @c-p-i-o