Add TORCH_API annotations to c10d #59695

lw · 2021-06-09T08:23:03Z

Stack from ghstack:

Move c10d to libtorch(_cuda) #59563 Move c10d to libtorch(_cuda)
Add guards for USE_C10D_FOO in relevant c10d files #59697 Add guards for USE_C10D_FOO in relevant c10d files
Fix Windows guards in c10d #59696 Fix Windows guards in c10d
Add TORCH_API annotations to c10d #59695 Add TORCH_API annotations to c10d
Remove USE_CUDA from c10d reducer/logger #59562 Remove USE_CUDA from c10d reducer/logger
Remove USE_CUDA from ProcessGroupGloo #59561 Remove USE_CUDA from ProcessGroupGloo

c10d apparently doesn't have hidden visibility by default, but libtorch has, so in order to prepare c10d to be merged into libtorch we need to add these annotations. I've done this "as needed", seeing what would fail and fixing it. Windows is much more picky about it than Linux, hence there's more annotations in the Windows-only files (e.g., Gloo, as compared to NCCL).

Differential Revision: D28987578

c10d apparently doesn't have hidden visibility by default, but libtorch has, so in order to prepare c10d to be merged into libtorch we need to add these annotations. I've done this "as needed", seeing what would fail and fixing it. Windows is much more picky about it than Linux, hence there's more annotations in the Windows-only files (e.g., Gloo, as compared to NCCL). Differential Revision: [D28987578](https://our.internmc.facebook.com/intern/diff/D28987578/) [ghstack-poisoned]

facebook-github-bot · 2021-06-09T08:23:09Z

💊 CI failures summary and remediations

As of commit 8383207 (more details on the Dr. CI page):

2/2 failures introduced in this PR

🕵️ 1 new failure recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

pytorch_windows_vs2019_py36_cuda11.1_build (1/1)

Step: "Build" (full log | diagnosis details | 🔁 rerun)

FAILED: bin/TCPStoreTest.exe

c10d.lib(ProcessGroupWrapper.cpp.obj) : error LNK2019: unresolved external symbol "struct at::indexing::EllipsisIndexType const at::indexing::Ellipsis" (?Ellipsis@indexing@at@@3UEllipsisIndexType@12@B) referenced in function "public: __cdecl at::indexing::TensorIndex::TensorIndex(char const *)" (??0TensorIndex@indexing@at@@QEAA@PEBD@Z)
c10d.lib(Utils.cpp.obj) : error LNK2001: unresolved external symbol "struct at::indexing::EllipsisIndexType const at::indexing::Ellipsis" (?Ellipsis@indexing@at@@3UEllipsisIndexType@12@B)
c10d.lib(ProcessGroup.cpp.obj) : error LNK2001: unresolved external symbol "struct at::indexing::EllipsisIndexType const at::indexing::Ellipsis" (?Ellipsis@indexing@at@@3UEllipsisIndexType@12@B)
c10d.lib(ProcessGroupGloo.cpp.obj) : error LNK2001: unresolved external symbol "struct at::indexing::EllipsisIndexType const at::indexing::Ellipsis" (?Ellipsis@indexing@at@@3UEllipsisIndexType@12@B)
bin\FileStoreTest.exe : fatal error LNK1120: 2 unresolved externals
[5560/5664] C:\Users\circleci\project\build\win_tmp\bin\sccache-cl.exe   /TP -DBUILDING_TESTS -DFMT_HEADER_ONLY=1 -DIDEEP_USE_MKL -DMAGMA_V2 -DMINIZ_DISABLE_ZIP_READER_CRC32_CHECKS -DONNXIFI_ENABLE_EXT=1 -DONNX_ML=1 -DONNX_NAMESPACE=onnx_torch -DTHP_BUILD_MAIN_LIB -DTH_BLAS_MKL -DUSE_C10D -DUSE_C10D_GLOO -DUSE_CUDA -DUSE_CUDNN -DUSE_DISTRIBUTED -DUSE_EXTERNAL_MZCRC -DUSE_NUMPY -DWIN32_LEAN_AND_MEAN -D_CRT_SECURE_NO_DEPRECATE=1 -D_OPENMP_NOFORCE_MANIFEST -Dtorch_python_EXPORTS -Iaten\src -I..\aten\src -I. -I..\ -I..\cmake\..\third_party\benchmark\include -I..\cmake\..\third_party\cudnn_frontend\include -Icaffe2\contrib\aten -I..\third_party\onnx -Ithird_party\onnx -I..\third_party\foxi -Ithird_party\foxi -I..\torch\.. -I..\torch\..\aten\src -I..\torch\..\aten\src\TH -Icaffe2\aten\src -Ithird_party -I..\torch\..\third_party\valgrind-headers -I..\torch\..\third_party\gloo -I..\torch\..\third_party\onnx -I..\torch\csrc -I..\torch\csrc\api\include -I..\torch\lib -I..\torch\lib\libshm_windows -I..\torch\csrc\api -I..\c10\.. -Ithird_party\ideep\mkl-dnn\include -I..\third_party\ideep\mkl-dnn\src\..\include -I..\c10\cuda\..\.. -I..\third_party\fmt\include -I..\torch\lib\c10d\.. -Ithird_party\gloo -I..\cmake\..\third_party\gloo -I..\cmake\..\third_party\googletest\googlemock\include -I..\cmake\..\third_party\googletest\googletest\include -I..\third_party\protobuf\src -Iwin_tmp\mkl\include -I..\third_party\XNNPACK\include -I..\third_party -I..\cmake\..\third_party\eigen -IC:\Jenkins\Miniconda3\include -IC:\Jenkins\Miniconda3\lib\site-packages\numpy\core\include -I..\cmake\..\third_party\pybind11\include -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1\include" -Iwin_tmp\magma\include -I..\third_party\ideep\mkl-dnn\include -I..\third_party\ideep\include -I"C:\Program Files\NVIDIA Corporation\NvToolsExt\include" /DWIN32 /D_WINDOWS /GR /EHsc /w /bigobj -DUSE_PTHREADPOOL -openmp:experimental -IC:/Users/circleci/project/build/win_tmp/mkl/include -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOCUPTI -DUSE_FBGEMM -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DHAVE_AVX_CPU_DEFINITION -DHAVE_AVX2_CPU_DEFINITION /MD /O2 /Ob2 /DNDEBUG /w /bigobj -DNDEBUG -DCAFFE2_USE_GLOO -DCUDA_HAS_FP16=1 -DUSE_GCC_GET_CPUID -DUSE_AVX -DUSE_AVX2 -DTH_HAVE_THREAD /EHsc /DNOMINMAX /wd4267 /wd4251 /wd4522 /wd4838 /wd4305 /wd4244 /wd4190 /wd4101 /wd4996 /wd4275 /bigobj -std:c++14 /showIncludes /Focaffe2\torch\CMakeFiles\torch_python.dir\csrc\jit\passes\onnx\peephole.cpp.obj /Fdcaffe2\torch\CMakeFiles\torch_python.dir\ /FS -c ..\torch\csrc\jit\passes\onnx\peephole.cpp
Microsoft (R) C/C++ Optimizing Compiler Version 19.28.29337 for x64
Copyright (C) Microsoft Corporation.  All rights reserved.

[5561/5664] cmd.exe /C "cd . && C:\Jenkins\Miniconda3\Library\bin\cmake.exe -E vs_link_exe --intdir=caffe2\lib_c10d\test\CMakeFiles\TCPStoreTest.dir --rc=C:\PROGRA~2\WI3CF2~1\10\bin\100190~1.0\x64\rc.exe --mt=C:\PROGRA~2\WI3CF2~1\10\bin\100190~1.0\x64\mt.exe --manifests  -- C:\PROGRA~2\MICROS~2\2019\BUILDT~1\VC\Tools\MSVC\1428~1.293\bin\Hostx64\x64\link.exe  caffe2\lib_c10d\test\CMakeFiles\TCPStoreTest.dir\TCPStoreTest.cpp.obj  /out:bin\TCPStoreTest.exe /implib:lib\TCPStoreTest.lib /pdb:bin\TCPStoreTest.pdb /version:0.0 /machine:x64 /ignore:4049 /ignore:4217 /INCREMENTAL:NO /subsystem:console  lib\c10d.lib  lib\gtest_main.lib  lib\torch.lib  lib\torch_cuda.lib  lib\torch_cuda_cu.lib  lib\torch_cuda_cpp.lib  lib\torch_cpu.lib  lib\libprotobuf.lib  win_tmp\mkl\lib\mkl_intel_lp64.lib  win_tmp\mkl\lib\mkl_intel_thread.lib  win_tmp\mkl\lib\mkl_core.lib  win_tmp\mkl\lib\libiomp5md.lib  lib\dnnl.lib  -INCLUDE:?warp_size@cuda@at@@YAHXZ  lib\c10_cuda.lib  lib\c10.lib  "C:\Program Files\NVIDIA Corporation\NvToolsExt\lib\x64\nvToolsExt64_1.lib"  "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1\lib\x64\cufft.lib"  "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1\lib\x64\curand.lib"  "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1\lib\x64\cublas.lib"  "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1\lib\x64\cudnn.lib"  -INCLUDE:?searchsorted_cuda@native@at@@YA?AVTensor@2@AEBV32@0_N1@Z  lib\gloo_cuda.lib  lib\gloo.lib  C:\Jenkins\Miniconda3\Library\lib\uv.lib  "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1\lib\x64\cudart.lib"  "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1\lib\x64\cudart_static.lib"  lib\gtest.lib  kernel32.lib user32.lib gdi32.lib winspool.lib shell32.lib ole32.lib oleaut32.lib uuid.lib comdlg32.lib advapi32.lib && cd ."
FAILED: bin/TCPStoreTest.exe 
cmd.exe /C "cd . && C:\Jenkins\Miniconda3\Library\bin\cmake.exe -E vs_link_exe --intdir=caffe2\lib_c10d\test\CMakeFiles\TCPStoreTest.dir --rc=C:\PROGRA~2\WI3CF2~1\10\bin\100190~1.0\x64\rc.exe --mt=C:\PROGRA~2\WI3CF2~1\10\bin\100190~1.0\x64\mt.exe --manifests  -- C:\PROGRA~2\MICROS~2\2019\BUILDT~1\VC\Tools\MSVC\1428~1.293\bin\Hostx64\x64\link.exe  caffe2\lib_c10d\test\CMakeFiles\TCPStoreTest.dir\TCPStoreTest.cpp.obj  /out:bin\TCPStoreTest.exe /implib:lib\TCPStoreTest.lib /pdb:bin\TCPStoreTest.pdb /version:0.0 /machine:x64 /ignore:4049 /ignore:4217 /INCREMENTAL:NO /subsystem:console  lib\c10d.lib  lib\gtest_main.lib  lib\torch.lib  lib\torch_cuda.lib  lib\torch_cuda_cu.lib  lib\torch_cuda_cpp.lib  lib\torch_cpu.lib  lib\libprotobuf.lib  win_tmp\mkl\lib\mkl_intel_lp64.lib  win_tmp\mkl\lib\mkl_intel_thread.lib  win_tmp\mkl\lib\mkl_core.lib  win_tmp\mkl\lib\libiomp5md.lib  lib\dnnl.lib  -INCLUDE:?warp_size@cuda@at@@YAHXZ  lib\c10_cuda.lib  lib\c10.lib  "C:\Program Files\NVIDIA Corporation\NvToolsExt\lib\x64\nvToolsExt64_1.lib"  "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1\lib\x64\cufft.lib"  "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1\lib\x64\curand.lib"  "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1\lib\x64\cublas.lib"  "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1\lib\x64\cudnn.lib"  -INCLUDE:?searchsorted_cuda@native@at@@YA?AVTensor@2@AEBV32@0_N1@Z  lib\gloo_cuda.lib  lib\gloo.lib  C:\Jenkins\Miniconda3\Library\lib\uv.lib  "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1\lib\x64\cudart.lib"  "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1\lib\x64\cudart_static.lib"  lib\gtest.lib  kernel32.lib user32.lib gdi32.lib winspool.lib shell32.lib ole32.lib oleaut32.lib uuid.lib comdlg32.lib advapi32.lib && cd ."
LINK: command "C:\PROGRA~2\MICROS~2\2019\BUILDT~1\VC\Tools\MSVC\1428~1.293\bin\Hostx64\x64\link.exe caffe2\lib_c10d\test\CMakeFiles\TCPStoreTest.dir\TCPStoreTest.cpp.obj /out:bin\TCPStoreTest.exe /implib:lib\TCPStoreTest.lib /pdb:bin\TCPStoreTest.pdb /version:0.0 /machine:x64 /ignore:4049 /ignore:4217 /INCREMENTAL:NO /subsystem:console lib\c10d.lib lib\gtest_main.lib lib\torch.lib lib\torch_cuda.lib lib\torch_cuda_cu.lib lib\torch_cuda_cpp.lib lib\torch_cpu.lib lib\libprotobuf.lib win_tmp\mkl\lib\mkl_intel_lp64.lib win_tmp\mkl\lib\mkl_intel_thread.lib win_tmp\mkl\lib\mkl_core.lib win_tmp\mkl\lib\libiomp5md.lib lib\dnnl.lib -INCLUDE:?warp_size@cuda@at@@YAHXZ lib\c10_cuda.lib lib\c10.lib C:\Program Files\NVIDIA Corporation\NvToolsExt\lib\x64\nvToolsExt64_1.lib C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1\lib\x64\cufft.lib C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1\lib\x64\curand.lib C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1\lib\x64\cublas.lib C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1\lib\x64\cudnn.lib -INCLUDE:?searchsorted_cuda@native@at@@YA?AVTensor@2@AEBV32@0_N1@Z lib\gloo_cuda.lib lib\gloo.lib C:\Jenkins\Miniconda3\Library\lib\uv.lib C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1\lib\x64\cudart.lib C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1\lib\x64\cudart_static.lib lib\gtest.lib kernel32.lib user32.lib gdi32.lib winspool.lib shell32.lib ole32.lib oleaut32.lib uuid.lib comdlg32.lib advapi32.lib /MANIFEST /MANIFESTFILE:bin\TCPStoreTest.exe.manifest" failed (exit code 1120) with the following output:
Microsoft (R) Incremental Linker Version 14.28.29337.0
Copyright (C) Microsoft Corporation.  All rights reserved.

torch_cpu.lib(torch_cpu.dll) : error LNK2005: "public: __cdecl at::Tensor::Tensor(class at::Tensor &&)" (??0Tensor@at@@QEAA@$$QEAV01@@Z) already defined in c10d.lib(ProcessGroupWrapper.cpp.obj)
torch_cpu.lib(torch_cpu.dll) : error LNK2005: "public: __cdecl c10::IValue::IValue(struct c10::IValue const &)" (??0IValue@c10@@QEAA@AEBU01@@Z) already defined in c10d.lib(ProcessGroupWrapper.cpp.obj)
torch_cpu.lib(torch_cpu.dll) : error LNK2005: "public: __cdecl c10::IValue::IValue(struct c10::IValue &&)" (??0IValue@c10@@QEAA@$$QEAU01@@Z) already defined in c10d.lib(ProcessGroupWrapper.cpp.obj)
torch_cpu.lib(torch_cpu.dll) : error LNK2005: "public: __cdecl c10::IValue::~IValue(void)" (??1IValue@c10@@QEAA@XZ) already defined in c10d.lib(ProcessGroupWrapper.cpp.obj)
torch_cpu.lib(torch_cpu.dll) : error LNK2005: "public: struct c10::IValue & __cdecl c10::IValue::operator=(struct c10::IValue const &)& " (??4IValue@c10@@QEGAAAEAU01@AEBU01@@Z) already defined in c10d.lib(ProcessGroupWrapper.cpp.obj)

1 failure not recognized by patterns:

Job	Step	Action
^{Windows CI (pytorch-win-vs2019-cpu-py3) / render_test_results}	^{Download PyTorch Test Reports}	🔁 rerun

This comment was automatically generated by Dr. CI (expand for details).

Follow this link to opt-out of these comments for your Pull Requests.

Please report bugs/suggestions to the (internal) Dr. CI Users group.

Click here to manually regenerate this comment.

c10d apparently doesn't have hidden visibility by default, but libtorch has, so in order to prepare c10d to be merged into libtorch we need to add these annotations. I've done this "as needed", seeing what would fail and fixing it. Windows is much more picky about it than Linux, hence there's more annotations in the Windows-only files (e.g., Gloo, as compared to NCCL). Differential Revision: [D28987578](https://our.internmc.facebook.com/intern/diff/D28987578/) [ghstack-poisoned]

cbalioglu

LGTM

c10d apparently doesn't have hidden visibility by default, but libtorch has, so in order to prepare c10d to be merged into libtorch we need to add these annotations. I've done this "as needed", seeing what would fail and fixing it. Windows is much more picky about it than Linux, hence there's more annotations in the Windows-only files (e.g., Gloo, as compared to NCCL). Differential Revision: [D28987578](https://our.internmc.facebook.com/intern/diff/D28987578/) [ghstack-poisoned]

lw · 2021-06-10T12:07:24Z

It actually turns out that it's complicated to use TORCH_API while building a library that is not libtorch: by default that macro would translate to dllimport, which is not what we need because in fact here we need to export those symbols. But if we change that macro to become dllexport then it would look like c10d is (re-)exporting all the symbols of libtorch and thus create conflicts.

The proper solution would be to introduce a temporary C10D_API macro of sorts. However, instead, I'm just going to merge this PR into #59563. This will solve the issue.

lw requested review from H-Huang, cbalioglu, mingzhe09088, mrshenli, pritamdamania87, rohan-varma, wayi1 and zhaojuanmao as code owners June 9, 2021 08:23

facebook-github-bot added oncall: distributed Add this issue/PR to distributed oncall triage queue cla signed labels Jun 9, 2021

cbalioglu approved these changes Jun 9, 2021

View reviewed changes

lw added 2 commits June 10, 2021 02:01

lw closed this Jun 10, 2021

facebook-github-bot deleted the gh/lw/212/head branch July 10, 2021 14:16

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Add TORCH_API annotations to c10d #59695

Add TORCH_API annotations to c10d #59695

Uh oh!

lw commented Jun 9, 2021 •

edited

Loading

Uh oh!

facebook-github-bot commented Jun 9, 2021 •

edited

Loading

Uh oh!

cbalioglu left a comment

Uh oh!

lw commented Jun 10, 2021

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Add TORCH_API annotations to c10d #59695

Add TORCH_API annotations to c10d #59695

Uh oh!

Conversation

lw commented Jun 9, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

facebook-github-bot commented Jun 9, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

💊 CI failures summary and remediations

🕵️ 1 new failure recognized by patterns

pytorch_windows_vs2019_py36_cuda11.1_build (1/1)

1 failure not recognized by patterns:

Uh oh!

cbalioglu left a comment

Choose a reason for hiding this comment

Uh oh!

lw commented Jun 10, 2021

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

lw commented Jun 9, 2021 •

edited

Loading

facebook-github-bot commented Jun 9, 2021 •

edited

Loading