[NCCL 2.19.3] Performance reduction when using NVLSTree algorithm when running nccl-tests all-reduce · Issue #117748 · pytorch/pytorch · GitHub

@atalman


🐛 Describe the bug

Please refer to this PR: NVIDIA/nccl#1112

We have noticed a significant all-reduce performance regression with the NVLSTree algorithm when running nccl-tests on a cluster of 8 AWS P5 nodes. For a 4GB message, bus bandwidth is about 104 GB/s without the fix and about 279 GB/s with it.

nccl-tests 4GB all-reduce output (without the fix):

                                                                  out-of-place                       in-place
           size         count      type   redop    root     time   algbw   busbw #wrong     time   algbw   busbw #wrong
            (B)    (elements)                               (us)  (GB/s)  (GB/s)            (us)  (GB/s)  (GB/s)
     4294967296    1073741824     float     sum      -1    81146   52.93  104.20    N/A    80945   53.06  104.46    N/A

After the fix is applied:

                                                                  out-of-place                       in-place
           size         count      type   redop    root     time   algbw   busbw #wrong     time   algbw   busbw #wrong
            (B)    (elements)                               (us)  (GB/s)  (GB/s)            (us)  (GB/s)  (GB/s)
     4294967296    1073741824     float     sum      -1    30222  142.11  279.78    N/A    30309  141.70  278.98    N/A
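
The bandwidth columns can be cross-checked from size and time alone. This is a minimal sketch, assuming 64 ranks (8 P5 nodes with 8 GPUs each; the rank count is not stated above) and the all-reduce bus-bandwidth factor 2(n-1)/n that nccl-tests documents. The helper names are hypothetical, not part of nccl-tests.

```python
# Assumption: 8 P5 nodes x 8 GPUs per node = 64 ranks (not stated in the issue).
NRANKS = 8 * 8


def algbw_gbps(size_bytes: int, time_us: float) -> float:
    """Algorithm bandwidth in GB/s (1 GB = 1e9 bytes): bytes moved / elapsed time."""
    return size_bytes / (time_us * 1e-6) / 1e9


def allreduce_busbw_gbps(algbw: float, nranks: int) -> float:
    """All-reduce bus bandwidth: algbw scaled by 2 * (n - 1) / n."""
    return algbw * 2 * (nranks - 1) / nranks


SIZE = 4294967296  # message size in bytes, taken from the tables

# Out-of-place times (us) before and after the fix, taken from the tables.
for label, time_us in [("before fix", 81146), ("after fix", 30222)]:
    alg = algbw_gbps(SIZE, time_us)
    bus = allreduce_busbw_gbps(alg, NRANKS)
    print(f"{label}: algbw={alg:.2f} GB/s busbw={bus:.2f} GB/s")
```

With 64 ranks the results match the reported algbw/busbw to within the rounding of the printed times, which suggests the regression is purely a longer run time (about 2.7x) rather than a change in how bandwidth is reported.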

Possible solutions:
Since pytorch/builder#1670 we build NCCL from source, so the NVIDIA/nccl#1112 fix would need to be cherry-picked.
The NCCL branch used here would then need to be switched to one that includes the cherry-pick: https://github.com/pytorch/builder/blob/main/common/install_cuda.sh#L48

@ptrblck Can you recommend minimal repro steps for this issue so we can confirm both the issue and the resolution?

cc @ptrblck @malfet

Versions

2.2.0
nightly

Metadata

Assignees: No one assigned
Labels: module: nccl, triaged
Type: No type
Projects: Status: Cold Storage
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
