KEMBAR78
[c10d] Split custom class bindings out of python binding code by suo · Pull Request #58992 · pytorch/pytorch · GitHub
Skip to content

Conversation

@suo
Copy link
Member

@suo suo commented May 26, 2021

Stack from ghstack:

Currently, we define Torchbind custom classes in the same place that we define Python bindings.

This is nice from a code location perspective, but has two downsides:

  1. These custom classes are not available in a C++-only build.
  2. These break when included in torch::deploy.

Some explanation on the second issue: torch::deploy creates many Python
interpreters, and creates a full copy of all the bindings for each one. This
will run the static initialization code once for each copy of the bindings,
leading to multiple registration of the custom classes (and therefore an
error).

This PR splits out the relevant custom class binding code into its own source
file to be included in libc10d, which can be compiled and statically
initialized a single time and linked against from the c10d python bindings.

Differential Revision: D28690832

NOTE FOR REVIEWERS: This PR has internal Facebook specific changes or comments, please review them on Phabricator!

Differential Revision: D28690832

Currently, we define Torchbind custom classes in the same place that we define Python bindings.

This is nice from a code location perspective, but has two downsides:
1. These custom classes are not available in a C++-only build.
2. These break when included in torch::deploy.

Some explanation on the second issue: torch::deploy creates many Python
interpreters, and creates a full copy of all the bindings for each one. This
will run the static initialization code once for each copy of the bindings,
leading to multiple registration of the custom classes (and therefore an
error).

This PR splits out the relevant custom class binding code into its own source
file to be included in libc10d, which can be compiled and statically
initialized a single time and linked against from the c10d python bindings.

Differential Revision: [D28690832](https://our.internmc.facebook.com/intern/diff/D28690832/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D28690832/)!

[ghstack-poisoned]
@facebook-github-bot
Copy link
Contributor

facebook-github-bot commented May 26, 2021

💊 CI failures summary and remediations

As of commit b6c18e2 (more details on the Dr. CI page):


  • 1/1 failures introduced in this PR

1 failure not recognized by patterns:

Job Step Action
CircleCI pytorch_linux_bionic_py3_6_clang9_noarch_test Run tests 🔁 rerun

This comment was automatically generated by Dr. CI (expand for details).Follow this link to opt-out of these comments for your Pull Requests.

Please report bugs/suggestions to the (internal) Dr. CI Users group.

Click here to manually regenerate this comment.

@facebook-github-bot facebook-github-bot added cla signed oncall: distributed Add this issue/PR to distributed oncall triage queue labels May 26, 2021
suo added a commit that referenced this pull request May 26, 2021
Currently, we define Torchbind custom classes in the same place that we define Python bindings.

This is nice from a code location perspective, but has two downsides:
1. These custom classes are not available in a C++-only build.
2. These break when included in torch::deploy.

Some explanation on the second issue: torch::deploy creates many Python
interpreters, and creates a full copy of all the bindings for each one. This
will run the static initialization code once for each copy of the bindings,
leading to multiple registration of the custom classes (and therefore an
error).

This PR splits out the relevant custom class binding code into its own source
file to be included in libc10d, which can be compiled and statically
initialized a single time and linked against from the c10d python bindings.

Differential Revision: [D28690832](https://our.internmc.facebook.com/intern/diff/D28690832/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D28690832/)!

ghstack-source-id: 129959421
Pull Request resolved: #58992
…ode"

Currently, we define Torchbind custom classes in the same place that we define Python bindings.

This is nice from a code location perspective, but has two downsides:
1. These custom classes are not available in a C++-only build.
2. These break when included in torch::deploy.

Some explanation on the second issue: torch::deploy creates many Python
interpreters, and creates a full copy of all the bindings for each one. This
will run the static initialization code once for each copy of the bindings,
leading to multiple registration of the custom classes (and therefore an
error).

This PR splits out the relevant custom class binding code into its own source
file to be included in libc10d, which can be compiled and statically
initialized a single time and linked against from the c10d python bindings.

Differential Revision: [D28690832](https://our.internmc.facebook.com/intern/diff/D28690832/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D28690832/)!

Differential Revision: [D28690832](https://our.internmc.facebook.com/intern/diff/D28690832)

[ghstack-poisoned]
@suo
Copy link
Member Author

suo commented May 27, 2021

@suo has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

…ode"

Currently, we define Torchbind custom classes in the same place that we define Python bindings.

This is nice from a code location perspective, but has two downsides:
1. These custom classes are not available in a C++-only build.
2. These break when included in torch::deploy.

Some explanation on the second issue: torch::deploy creates many Python
interpreters, and creates a full copy of all the bindings for each one. This
will run the static initialization code once for each copy of the bindings,
leading to multiple registration of the custom classes (and therefore an
error).

This PR splits out the relevant custom class binding code into its own source
file to be included in libc10d, which can be compiled and statically
initialized a single time and linked against from the c10d python bindings.

Differential Revision: [D28690832](https://our.internmc.facebook.com/intern/diff/D28690832/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D28690832/)!

Differential Revision: [D28690832](https://our.internmc.facebook.com/intern/diff/D28690832)

[ghstack-poisoned]
suo added a commit that referenced this pull request May 27, 2021
Currently, we define Torchbind custom classes in the same place that we define Python bindings.

This is nice from a code location perspective, but has two downsides:
1. These custom classes are not available in a C++-only build.
2. These break when included in torch::deploy.

Some explanation on the second issue: torch::deploy creates many Python
interpreters, and creates a full copy of all the bindings for each one. This
will run the static initialization code once for each copy of the bindings,
leading to multiple registration of the custom classes (and therefore an
error).

This PR splits out the relevant custom class binding code into its own source
file to be included in libc10d, which can be compiled and statically
initialized a single time and linked against from the c10d python bindings.

Differential Revision: [D28690832](https://our.internmc.facebook.com/intern/diff/D28690832/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D28690832/)!

ghstack-source-id: dba676a
Pull Request resolved: #58992
@suo
Copy link
Member Author

suo commented May 27, 2021

@suo has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

…ode"

Currently, we define Torchbind custom classes in the same place that we define Python bindings.

This is nice from a code location perspective, but has two downsides:
1. These custom classes are not available in a C++-only build.
2. These break when included in torch::deploy.

Some explanation on the second issue: torch::deploy creates many Python
interpreters, and creates a full copy of all the bindings for each one. This
will run the static initialization code once for each copy of the bindings,
leading to multiple registration of the custom classes (and therefore an
error).

This PR splits out the relevant custom class binding code into its own source
file to be included in libc10d, which can be compiled and statically
initialized a single time and linked against from the c10d python bindings.

Differential Revision: [D28690832](https://our.internmc.facebook.com/intern/diff/D28690832/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D28690832/)!

Differential Revision: [D28690832](https://our.internmc.facebook.com/intern/diff/D28690832)

[ghstack-poisoned]
@suo
Copy link
Member Author

suo commented May 28, 2021

@suo has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

…ode"

Currently, we define Torchbind custom classes in the same place that we define Python bindings.

This is nice from a code location perspective, but has two downsides:
1. These custom classes are not available in a C++-only build.
2. These break when included in torch::deploy.

Some explanation on the second issue: torch::deploy creates many Python
interpreters, and creates a full copy of all the bindings for each one. This
will run the static initialization code once for each copy of the bindings,
leading to multiple registration of the custom classes (and therefore an
error).

This PR splits out the relevant custom class binding code into its own source
file to be included in libc10d, which can be compiled and statically
initialized a single time and linked against from the c10d python bindings.

Differential Revision: [D28690832](https://our.internmc.facebook.com/intern/diff/D28690832/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D28690832/)!

Differential Revision: [D28690832](https://our.internmc.facebook.com/intern/diff/D28690832)

[ghstack-poisoned]
suo added a commit that referenced this pull request May 28, 2021
Pull Request resolved: #58992


Currently, we define Torchbind custom classes in the same place that we define Python bindings.

This is nice from a code location perspective, but has two downsides:
1. These custom classes are not available in a C++-only build.
2. These break when included in torch::deploy.

Some explanation on the second issue: torch::deploy creates many Python
interpreters, and creates a full copy of all the bindings for each one. This
will run the static initialization code once for each copy of the bindings,
leading to multiple registration of the custom classes (and therefore an
error).

This PR splits out the relevant custom class binding code into its own source
file to be included in libc10d, which can be compiled and statically
initialized a single time and linked against from the c10d python bindings.

Differential Revision: [D28690832](https://our.internmc.facebook.com/intern/diff/D28690832/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D28690832/)!
ghstack-source-id: 130152256
…ode"

Currently, we define Torchbind custom classes in the same place that we define Python bindings.

This is nice from a code location perspective, but has two downsides:
1. These custom classes are not available in a C++-only build.
2. These break when included in torch::deploy.

Some explanation on the second issue: torch::deploy creates many Python
interpreters, and creates a full copy of all the bindings for each one. This
will run the static initialization code once for each copy of the bindings,
leading to multiple registration of the custom classes (and therefore an
error).

This PR splits out the relevant custom class binding code into its own source
file to be included in libc10d, which can be compiled and statically
initialized a single time and linked against from the c10d python bindings.

Differential Revision: [D28690832](https://our.internmc.facebook.com/intern/diff/D28690832/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D28690832/)!


Differential Revision: [D28690832](https://our.internmc.facebook.com/intern/diff/D28690832)

[ghstack-poisoned]
…ode"

Currently, we define Torchbind custom classes in the same place that we define Python bindings.

This is nice from a code location perspective, but has two downsides:
1. These custom classes are not available in a C++-only build.
2. These break when included in torch::deploy.

Some explanation on the second issue: torch::deploy creates many Python
interpreters, and creates a full copy of all the bindings for each one. This
will run the static initialization code once for each copy of the bindings,
leading to multiple registration of the custom classes (and therefore an
error).

This PR splits out the relevant custom class binding code into its own source
file to be included in libc10d, which can be compiled and statically
initialized a single time and linked against from the c10d python bindings.

Differential Revision: [D28690832](https://our.internmc.facebook.com/intern/diff/D28690832/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D28690832/)!


Differential Revision: [D28690832](https://our.internmc.facebook.com/intern/diff/D28690832)

[ghstack-poisoned]
suo added a commit that referenced this pull request May 28, 2021
Pull Request resolved: #58992


Currently, we define Torchbind custom classes in the same place that we define Python bindings.

This is nice from a code location perspective, but has two downsides:
1. These custom classes are not available in a C++-only build.
2. These break when included in torch::deploy.

Some explanation on the second issue: torch::deploy creates many Python
interpreters, and creates a full copy of all the bindings for each one. This
will run the static initialization code once for each copy of the bindings,
leading to multiple registration of the custom classes (and therefore an
error).

This PR splits out the relevant custom class binding code into its own source
file to be included in libc10d, which can be compiled and statically
initialized a single time and linked against from the c10d python bindings.

Differential Revision: [D28690832](https://our.internmc.facebook.com/intern/diff/D28690832/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D28690832/)!
ghstack-source-id: 32fa680
@suo
Copy link
Member Author

suo commented May 28, 2021

@suo has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

1 similar comment
@suo
Copy link
Member Author

suo commented May 28, 2021

@suo has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

…ode"

Currently, we define Torchbind custom classes in the same place that we define Python bindings.

This is nice from a code location perspective, but has two downsides:
1. These custom classes are not available in a C++-only build.
2. These break when included in torch::deploy.

Some explanation on the second issue: torch::deploy creates many Python
interpreters, and creates a full copy of all the bindings for each one. This
will run the static initialization code once for each copy of the bindings,
leading to multiple registration of the custom classes (and therefore an
error).

This PR splits out the relevant custom class binding code into its own source
file to be included in libc10d, which can be compiled and statically
initialized a single time and linked against from the c10d python bindings.

Differential Revision: [D28690832](https://our.internmc.facebook.com/intern/diff/D28690832/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D28690832/)!


Differential Revision: [D28690832](https://our.internmc.facebook.com/intern/diff/D28690832)

[ghstack-poisoned]
suo added a commit that referenced this pull request May 28, 2021
Pull Request resolved: #58992





Currently, we define Torchbind custom classes in the same place that we define Python bindings.

This is nice from a code location perspective, but has two downsides:
1. These custom classes are not available in a C++-only build.
2. These break when included in torch::deploy.

Some explanation on the second issue: torch::deploy creates many Python
interpreters, and creates a full copy of all the bindings for each one. This
will run the static initialization code once for each copy of the bindings,
leading to multiple registration of the custom classes (and therefore an
error).

This PR splits out the relevant custom class binding code into its own source
file to be included in libc10d, which can be compiled and statically
initialized a single time and linked against from the c10d python bindings.

Differential Revision: [D28690832](https://our.internmc.facebook.com/intern/diff/D28690832/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D28690832/)!
ghstack-source-id: 130168942
pg_names_[process_group] = name;
}

void initCustomClassBindings() {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was struggling to get the linker not throw out these global static initializers on all platforms, so I gave up and made the initialization explicit, which seems like better practice anyway. Currently it's being initialized when we initialize the rest of the Python bindings (e.g. from libtorch_python), but since the actual binding code is in libc10d we can initialize it from other places (like libtorch_deploy).


if(USE_C10D_NCCL)
target_compile_definitions(c10d INTERFACE USE_C10D_NCCL)
target_compile_definitions(c10d PUBLIC USE_C10D_NCCL)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For posterity, these have been changed from INTERFACE to PUBLIC because now we also use these compile definitions in the implementation of c10d itself (e.g. checking USE_C10D_NCCL in frontend.cpp to conditionally initialize nccl process group bindings).

@facebook-github-bot
Copy link
Contributor

This pull request has been merged in b977a3b.

@facebook-github-bot facebook-github-bot deleted the gh/suo/437/head branch June 1, 2021 14:17
deniskokarev pushed a commit to deniskokarev/pytorch that referenced this pull request Jun 9, 2021
…h#58992)

Summary:
Pull Request resolved: pytorch#58992

Currently, we define Torchbind custom classes in the same place that we define Python bindings.

This is nice from a code location perspective, but has two downsides:
1. These custom classes are not available in a C++-only build.
2. These break when included in torch::deploy.

Some explanation on the second issue: torch::deploy creates many Python
interpreters, and creates a full copy of all the bindings for each one. This
will run the static initialization code once for each copy of the bindings,
leading to multiple registration of the custom classes (and therefore an
error).

This PR splits out the relevant custom class binding code into its own source
file to be included in libc10d, which can be compiled and statically
initialized a single time and linked against from the c10d python bindings.
ghstack-source-id: 130168942

Test Plan: CI

Reviewed By: wconstab

Differential Revision: D28690832

fbshipit-source-id: 3c5e3fff28abb8bcdb4a952794c07de1ee2ae5a8
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cla signed Merged oncall: distributed Add this issue/PR to distributed oncall triage queue

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants