Move CUDA-related stuff of TP agent to separate file #59377

lw · 2021-06-03T14:47:28Z

Stack from ghstack:

- #59788 Prepare for TensorPipe separating its CUDA-specific headers
- #59377 Move CUDA-related stuff of TP agent to separate file
- #59376 Make CUDA serde support for TP agent pluggable

This PR demonstrates that the CUDA parts of the TensorPipe agent now just "plug on top" of the CPU-only parts. Ideally, the CPU-only parts could therefore go in libtorch and the CUDA-only parts in libtorch_cuda. Unfortunately we can't do that just yet, because the TensorPipe agent depends on c10d (for its Store and its ProcessGroup), which lives in libtorch_python.

Differential Revision: D28796429

NOTE FOR REVIEWERS: This PR has internal Facebook specific changes or comments, please review them on Phabricator!
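As a rough illustration of what "plug on top" can mean, here is a minimal sketch of a registry that a CPU-only core owns and that device-specific code fills in from the outside. Every name in it (`DeviceKind`, `Serializer`, `serializerRegistry`, `SerializerRegistrar`, `serialize`) is hypothetical and chosen for this example only; it is not the TensorPipe agent's actual API, and the string "serializers" merely stand in for real tensor handling.

```cpp
#include <functional>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical names throughout; this is not the TensorPipe agent's real API.
enum class DeviceKind { CPU, CUDA };

// Stand-in for "how a payload on some device kind is prepared for sending".
using Serializer = std::function<std::string(const std::string& payload)>;

// Core (CPU-only) side: owns the registry and never mentions CUDA specifics.
std::map<DeviceKind, Serializer>& serializerRegistry() {
  // Function-local static, so it is initialized before first use.
  static std::map<DeviceKind, Serializer> registry;
  return registry;
}

// A static object of this type registers support at load time, so a
// separate file or library can add a device kind just by being linked in.
struct SerializerRegistrar {
  SerializerRegistrar(DeviceKind kind, Serializer fn) {
    serializerRegistry()[kind] = std::move(fn);
  }
};

// The core dispatches through the registry and fails loudly when a
// device kind has no plugin registered.
std::string serialize(DeviceKind kind, const std::string& payload) {
  auto it = serializerRegistry().find(kind);
  if (it == serializerRegistry().end()) {
    throw std::runtime_error("no serializer registered for this device kind");
  }
  return it->second(payload);
}

// CPU support lives next to the core.
static SerializerRegistrar cpuRegistrar(
    DeviceKind::CPU, [](const std::string& p) { return "cpu:" + p; });

// In a real split, this registration would sit in its own CUDA-only file.
static SerializerRegistrar cudaRegistrar(
    DeviceKind::CUDA, [](const std::string& p) { return "cuda:" + p; });
```

With this shape, dropping the CUDA registration from the build leaves the core compiling and running unchanged; only calls that ask for the CUDA kind fail at runtime.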


facebook-github-bot · 2021-06-03T14:47:37Z

💊 CI failures summary and remediations

As of commit 53dea66 (more details on the Dr. CI page):

💚 💚 Looks good so far! There are no failures yet. 💚 💚



cbalioglu

LGTM

Nit: you might also "fix" the loop at line 138 in tensorpipe_agent.cpp while you are here.

#include <c10/util/irange.h>

for (const auto idx : c10::irange(indexBitset.size())) {
  ...
}
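The suggested loop shape can be tried without the PyTorch headers. The sketch below uses `IndexRange`, a hypothetical minimal stand-in for `c10::irange` that only covers the half-open range `[0, end)`, and it assumes `indexBitset` behaves like a `std::vector<bool>`; the thread does not show its real type, and `setBitPositions` is an invented helper for illustration.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical minimal stand-in for c10::irange, so the sketch builds
// without PyTorch headers. It only supports the half-open range [0, end).
class IndexRange {
 public:
  class Iterator {
   public:
    explicit Iterator(std::size_t value) : value_(value) {}
    std::size_t operator*() const { return value_; }
    Iterator& operator++() {
      ++value_;
      return *this;
    }
    bool operator!=(const Iterator& other) const {
      return value_ != other.value_;
    }

   private:
    std::size_t value_;
  };

  explicit IndexRange(std::size_t end) : end_(end) {}
  Iterator begin() const { return Iterator(0); }
  Iterator end() const { return Iterator(end_); }

 private:
  std::size_t end_;
};

// Collects the positions of set bits, iterating by index in the same
// range-based style as the review suggestion.
std::vector<std::size_t> setBitPositions(const std::vector<bool>& indexBitset) {
  std::vector<std::size_t> positions;
  for (const auto idx : IndexRange(indexBitset.size())) {
    if (indexBitset[idx]) {
      positions.push_back(idx);
    }
  }
  return positions;
}
```

The range-based form keeps the index available while removing the hand-written bound and increment, which is where off-by-one and signed/unsigned comparison mistakes tend to creep in.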


codecov · 2021-06-11T21:16:01Z

Codecov Report

Merging #59377 (978df1f) into gh/lw/207/base (98fa047) will increase coverage by 0.10%.
The diff coverage is n/a.

❗ Current head 978df1f differs from pull request most recent head 53dea66. Consider uploading reports for the commit 53dea66 to get more accurate results

@@                Coverage Diff                 @@
##           gh/lw/207/base   #59377      +/-   ##
==================================================
+ Coverage           76.12%   76.22%   +0.10%     
==================================================
  Files                2041     2041              
  Lines              203821   203821              
==================================================
+ Hits               155160   155364     +204     
+ Misses              48661    48457     -204

mrshenli

+1 to @cbalioglu's comment. Please consider NOLINTNEXTLINE if those are intentional.
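For readers unfamiliar with the suppression mentioned here, a minimal sketch of a clang-tidy `NOLINTNEXTLINE` comment follows. The check name `modernize-loop-convert` is an assumption picked for illustration, since the thread does not say which check fired, and `sumByIndex` is a hypothetical function.

```cpp
#include <cstddef>
#include <vector>

// Sums the elements with a deliberately index-based loop.
int sumByIndex(const std::vector<int>& values) {
  int total = 0;
  // The check name below is a placeholder assumption; substitute whichever
  // check actually fires. The comment must sit directly above the loop.
  // NOLINTNEXTLINE(modernize-loop-convert)
  for (std::size_t i = 0; i < values.size(); ++i) {
    total += values[i];
  }
  return total;
}
```

Naming the check in parentheses keeps the suppression narrow, so other diagnostics on the same line are still reported.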


lw · 2021-06-14T13:58:11Z

Addressed all lint concerns

facebook-github-bot · 2021-06-15T10:28:57Z

This pull request has been merged in 5e5ca06.

lw requested review from H-Huang, mingzhe09088, mrshenli, pritamdamania87, rohan-varma, wayi1 and zhaojuanmao as code owners June 3, 2021 14:47

facebook-github-bot added the oncall: distributed and cla signed labels Jun 3, 2021

lw requested a review from cbalioglu as a code owner June 4, 2021 09:22

cbalioglu approved these changes Jun 4, 2021


lw mentioned this pull request Jun 10, 2021

Prepare for TensorPipe separating its CUDA-specific headers #59788

Closed

lw added 3 commits June 10, 2021 06:45

mrshenli reviewed Jun 13, 2021


facebook-github-bot closed this in 5e5ca06 Jun 15, 2021

facebook-github-bot added the Merged label Jun 15, 2021

facebook-github-bot deleted the gh/lw/207/head branch June 18, 2021 14:15
