[Forward Fix][PGNCCL] Add define guard for NCCL_SPLIT_NOCOLOR #138488
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/138488

Note: Links to docs will display an error until the docs builds have been completed.

✅ No failures as of commit 7ed82a8 with merge base 195d0a6.

This comment was automatically generated by Dr. CI and updates every 15 minutes.
Maybe next time you want to import into fbcode and wait for internal CI as well?
We do need this release note tag. Since the change is not user-facing, it won't get mentioned in the release notes.
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
- Added default value for `nccl_nonblocking_timeout` (30 mins; previously -1).
- Reuse C10D_CHECK_TIMEOUT in other CHECK macros.

Pull Request resolved: #138374
Approved by: https://github.com/eqy
ghstack dependencies: #137855, #138488
Previously we only waited for the comm to become ready after its initialization. That is not enough: other NCCL APIs can also leave the comm in the InProgress state, e.g. P2P calls, commSplit, commFinalize, etc. Therefore, we now ensure the comm is ready every time we need to access ncclComm. The place to add such a gatekeeper is `getNcclComm`.

Pull Request resolved: #138384
Approved by: https://github.com/shuqiangzhang, https://github.com/fduwjj
ghstack dependencies: #137855, #138488, #138374
### Why use non-blocking mode in eager init?

For overlapping comm init with model init, etc.

### Why can we set non-blocking as default?

If the setting is dangling -- i.e. not passed in by the user nor set via env -- `ProcessGroupNCCL` can apply its preferred logic. The torch-level API semantics do not change whether the NCCL comm is blocking or non-blocking (this is handled within `ProcessGroupNCCL`).

### Why not make non-blocking the default for lazy mode as well?

PR #137544 tried it. Two reasons why that's not preferred today:
1. It is hard -- too big a blast radius.
2. There is no gain from doing lazy init in non-blocking mode, because the very next CPU call is a collective, and we would block there waiting for the comm to be ready -- the same effect as blocking init, with no overlap "opening" compared to eager mode.

Pull Request resolved: #138527
Approved by: https://github.com/wconstab
ghstack dependencies: #137855, #138488, #138374, #138384
Forward fix for build issue introduced by #137855:

```
In file included from fbcode/caffe2/torch/csrc/distributed/c10d/NCCLUtils.cpp:2:
fbcode/caffe2/torch/csrc/distributed/c10d/ProcessGroupNCCL.hpp:508:21: error: use of undeclared identifier 'NCCL_SPLIT_NOCOLOR'
  508 |   int split_color{NCCL_SPLIT_NOCOLOR - 1};
      |                   ^
```

Pull Request resolved: #138488
Approved by: https://github.com/fduwjj
ghstack dependencies: #137855
Stack from ghstack (oldest at bottom):

- #138374 `nccl_nonblocking_timeout`
- #138488 Forward fix for build issue introduced by #137855
cc @H-Huang @awgu @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @c-p-i-o