[c10d] Split custom class bindings out of python binding code #58992
Currently, we define Torchbind custom classes in the same place that we define Python bindings. This is nice from a code location perspective, but has two downsides:

1. These custom classes are not available in a C++-only build.
2. These break when included in torch::deploy.

Some explanation on the second issue: torch::deploy creates many Python interpreters and creates a full copy of all the bindings for each one. This runs the static initialization code once for each copy of the bindings, leading to multiple registrations of the custom classes (and therefore an error).

This PR splits the relevant custom class binding code out into its own source file, included in libc10d, which can be compiled and statically initialized a single time and linked against from the c10d Python bindings.

Differential Revision: [D28690832](https://our.internmc.facebook.com/intern/diff/D28690832/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook-specific changes or comments; please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D28690832/)!
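As a rough illustration of the pattern, the sketch below shows custom class registration living in an explicitly called function rather than a file-scope static initializer. It uses the public `torch::class_` API, but the class, namespace, and method names are made up for this example; the real code registers the c10d types.

```cpp
// Illustrative sketch only; the class and names are hypothetical, not from the PR.
#include <torch/custom_class.h>

struct Counter : torch::CustomClassHolder {
  int64_t value = 0;
  void add(int64_t n) { value += n; }
  int64_t get() { return value; }
};

// Registration lives in an ordinary function instead of a file-scope static
// initializer, so a C++-only build can call it, and a host like torch::deploy
// can make sure it runs exactly once rather than once per copy of the bindings.
void initCustomClassBindings() {
  static const auto registration =
      torch::class_<Counter>("example_ns", "Counter")
          .def(torch::init<>())
          .def("add", &Counter::add)
          .def("get", &Counter::get);
}
```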
@suo has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Review comment on the newly added `void initCustomClassBindings()` function in the diff:
I was struggling to get the linker not to throw out these global static initializers on all platforms, so I gave up and made the initialization explicit, which seems like better practice anyway. Currently it's being initialized when we initialize the rest of the Python bindings (e.g. from libtorch_python), but since the actual binding code is in libc10d, we can initialize it from other places (like libtorch_deploy) as well.
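A minimal sketch of that call-site arrangement, assuming a pybind11-style entry point in the Python binding library; only `initCustomClassBindings()` comes from the diff, the entry-point name below is hypothetical:

```cpp
// Sketch under assumptions: the entry-point name and module plumbing are made up.
#include <pybind11/pybind11.h>

namespace c10d {
void initCustomClassBindings();  // defined once in libc10d (see the diff above)
}

void initC10dPythonBindings(pybind11::module& m) {  // hypothetical entry point
  // Register the Torchbind classes explicitly instead of relying on a global
  // static initializer that the linker may discard. Because the definition is
  // in libc10d, other hosts (e.g. libtorch_deploy) can call it from their own
  // initialization paths as well.
  c10d::initCustomClassBindings();

  // ... the rest of the pybind11 bindings for c10d would be defined on m here ...
}
```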
Review comment on the CMake change where, inside `if(USE_C10D_NCCL)`, `target_compile_definitions(c10d INTERFACE USE_C10D_NCCL)` becomes `target_compile_definitions(c10d PUBLIC USE_C10D_NCCL)`:
For posterity, these have been changed from INTERFACE to PUBLIC because now we also use these compile definitions in the implementation of c10d itself (e.g. checking USE_C10D_NCCL in frontend.cpp to conditionally initialize nccl process group bindings).
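As a hedged sketch of what that looks like on the c10d side (only the `USE_C10D_NCCL` macro comes from the diff; the helper name is hypothetical):

```cpp
// Sketch only: initNcclProcessGroupBindings() is a hypothetical helper.
void initCustomClassBindings() {
  // ... registrations that exist in every build ...
#ifdef USE_C10D_NCCL
  // This branch is compiled only when c10d itself is built with NCCL support,
  // which is why the compile definition must be PUBLIC (visible while building
  // c10d's own sources) rather than INTERFACE (visible only to dependents).
  initNcclProcessGroupBindings();  // hypothetical
#endif
}
```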
This pull request has been merged in b977a3b.
Merged commit metadata: Pull Request resolved: pytorch#58992. Test Plan: CI. Reviewed By: wconstab. Differential Revision: D28690832. fbshipit-source-id: 3c5e3fff28abb8bcdb4a952794c07de1ee2ae5a8. ghstack-source-id: 130168942.