KEMBAR78
Update pybind to 2.6.0 by malfet · Pull Request #46415 · pytorch/pytorch · GitHub
Skip to content

Conversation

@malfet
Copy link
Contributor

@malfet malfet commented Oct 15, 2020

Preserve PYBIND11 configuration options in torch._C._PYBIND11_COMPILER_TYPE and use them when building extensions

Also, use f-strings in torch.utils.cpp_extension

"Fixes" #46367

@malfet malfet marked this pull request as draft October 15, 2020 20:24
@facebook-github-bot
Copy link
Contributor

facebook-github-bot commented Oct 15, 2020

💊 CI failures summary and remediations

As of commit 6f23698cbb (more details on the Dr. CI page):


💚 💚 Looks good so far! There are no failures yet. 💚 💚


This comment was automatically generated by Dr. CI (expand for details).Follow this link to opt-out of these comments for your Pull Requests.

Please report bugs/suggestions on the GitHub issue tracker or post in the (internal) Dr. CI Users group.

See how this bot performed.

This comment has been revised 2 times.

@dr-ci
Copy link

dr-ci bot commented Oct 15, 2020

💊 CI failures summary and remediations

As of commit 8e3ea24 (more details on the Dr. CI page):


  • 2/2 failures possibly* introduced in this PR
    • 2/2 non-CircleCI failure(s)

Extra GitHub checks: 1 failed


codecov.io: 1 failed


This comment was automatically generated by Dr. CI (expand for details).Follow this link to opt-out of these comments for your Pull Requests.

Please report bugs/suggestions on the GitHub issue tracker or post in the (internal) Dr. CI Users group.

See how this bot performed.

This comment has been revised 79 times.

@henryiii
Copy link
Contributor

henryiii commented Oct 16, 2020

So there are only two errors (same in two jobs, other jobs passed),

Traceback (most recent call last):
  File "test_cpp_extensions_jit.py", line 504, in test_cpp_frontend_module_python_inter_op
    verbose=True,
  File "/opt/conda/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 993, in load
    keep_intermediates=keep_intermediates)
  File "/opt/conda/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 1209, in _jit_compile
    return _import_module_from_library(name, build_directory, is_python_module)
  File "/opt/conda/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 1560, in _import_module_from_library
    return imp.load_module(module_name, file, path, description)
  File "/opt/conda/lib/python3.6/imp.py", line 243, in load_module
    return load_dynamic(name, filename, file)
  File "/opt/conda/lib/python3.6/imp.py", line 343, in load_dynamic
    return _load(spec)
  File "<frozen importlib._bootstrap>", line 684, in _load
  File "<frozen importlib._bootstrap>", line 658, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 571, in module_from_spec
  File "<frozen importlib._bootstrap_external>", line 922, in create_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
ImportError: generic_type: type "Net" referenced unknown base type "torch::nn::Module"

and

Traceback (most recent call last):
  File "test_cpp_extensions_jit.py", line 752, in test_warning
    warn_mod.foo(t, 1)
RuntimeError: Error with CPUDoubleType

Looks pybind11 related. The second one seems to be losing the "TypeError" to a more generic "RuntimeError", and the first one might be a result of some of the code correctness work? Reaching out to other devs and looking into it. The odd thing is that this only shows up on Clang?

@malfet malfet force-pushed the malfet/test-pybind-2.6.0rc2 branch 2 times, most recently from b278ebb to d2c7eea Compare October 17, 2020 01:52
@malfet malfet changed the title Test pybind-2.6.0rc2 Test pybind-2.6.0rc3 Oct 17, 2020
@henryiii
Copy link
Contributor

henryiii commented Oct 17, 2020

Interesting, the change causes macOS to fail similarly. That means the flags you are adding are being added when PyTorch is compiled, but not the JIT extension, I expect. Ideally, I think the "correct" solution would be to store the values of those two defines as added by pybind11 (which will be something like "_clang" or "_gcc" for PYBIND11_COMPILER_TYPE, and "_libcpp" or "_libstdcpp" for PYBIND11_STDLIB), and then set them to matching values via defines when compiling the JIT extension. (Assuming they really are ABI compatible, but as far as I know, I think they are, pybind11 is being extra careful here).

A test solution would be to also define them when defining the JITing. Though it's odd, it seems like all tests should fail if my JIT theory were correct; all mainline compilers get something defined here, so it should always mismatch, I'd think.

@henryiii
Copy link
Contributor

henryiii commented Oct 17, 2020

Ah, I missed one. Can you also add PYBIND11_BUILD_ABI to an empty string too? That one was also added after the version currently there, just in a different commit. I tested it locally and that was not enough, so I think it still is my JIT idea.

@malfet malfet force-pushed the malfet/test-pybind-2.6.0rc2 branch 2 times, most recently from 0eccd32 to 2942621 Compare October 28, 2020 04:14
@malfet malfet changed the title Test pybind-2.6.0rc3 Test pybind-2.6.0 Oct 28, 2020
@malfet
Copy link
Contributor Author

malfet commented Oct 28, 2020

I have the fix that addresses all the failures locally, will see if CI agrees with me.
But I'm still a bit uncertain, why those must be defined to empty string in the first place?

@henryiii
Copy link
Contributor

henryiii commented Oct 28, 2020

@malfet you need to add PYBIND11_BUILD_ABI too. If you do, I think this will "pass" and prove that the only issue is the addition of these three items to the ABI string. I still think the correct fix would be to capture the values of PYBIND11_COMPILER_TYPE, PYBIND11_STDLIB, and PYBIND11_BUILD_ABI when PyTorch is compiled, and then query them and define them manually when running the JIT compile.

The old pybind11 used to build an ABI string like this:

f"__pybind11_internals_v{PYBIND11_INTERNALS_VERSION}{PYBIND11_INTERNALS_KIND}{PYBIND11_BUILD_TYPE}__"

Since 2.4, it now builds one like this:

f"__pybind11_internals_v{PYBIND11_INTERNALS_VERSION}{PYBIND11_INTERNALS_KIND}{PYBIND11_COMPILER_TYPE}{PYBIND11_STDLIB}{PYBIND11_BUILD_ABI}{PYBIND11_BUILD_TYPE}__"

This was because mixing extensions built with different compilers sometimes segfaulted, so this was the "sledgehammer" solution. Ideally, compilers that are ABI compatible (at least for the parts that pybind11 uses in it's ABI) should have the same ABI string so they can interact, but for now, this was done to avoid hard-to-debug issues until someone can carefully investigate what is compatible. If you know it works (such as is it was working before), then you can control the values of these new components.

I think in the failing test, PYBIND11_COMPILER_TYPE is "_clang", PYBIND11_STDLIB is "_libcpp", and PYBIND_BUILD_ABI is "". When compiling the JIT, then PYBIND11_COMPILER_TYPE is "_gcc", PYBIND11_STDLIB is "_libstdcxx", and PYBIND_BUILD_ABI is f"_cxxabi{__GXX_ABI_VERSION}". If it can pass if we force all these to be the same, and the easiest way to do that is to set them to the empty string, then that will prove this is the issue.

The correct fix, I believe, will be to store these three values into the pytorch extension, say as "torch._detail.pybind11_compiler_type", etc, then to define the values to match when running the JIT compile step.

@malfet malfet changed the title Test pybind-2.6.0 Update pybind to 2.6.0 Oct 28, 2020
@malfet malfet marked this pull request as ready for review October 28, 2020 20:26
@ezyang
Copy link
Contributor

ezyang commented Oct 28, 2020

Would have been nice to have the diff without the f-string refactor

@ezyang
Copy link
Contributor

ezyang commented Oct 28, 2020

#46415 (comment) is a really good discussion and I'd expect some reference to it in the code

Copy link
Contributor

@ezyang ezyang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a horrible hack but I understand the mechanism by which it works and agree that it is appropriate to restore old behavior.

I would like it to be hella more documented.

@malfet malfet force-pushed the malfet/test-pybind-2.6.0rc2 branch 2 times, most recently from a6b635e to abd2827 Compare October 28, 2020 21:29
@malfet malfet force-pushed the malfet/test-pybind-2.6.0rc2 branch from abd2827 to 8fb0deb Compare October 28, 2020 22:19
@malfet
Copy link
Contributor Author

malfet commented Oct 28, 2020

Would have been nice to have the diff without the f-string refactor

Moved the refactors to #47025

Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@malfet has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@malfet malfet force-pushed the malfet/test-pybind-2.6.0rc2 branch from 559f97c to cb17c2f Compare October 29, 2020 06:31
@malfet malfet force-pushed the malfet/test-pybind-2.6.0rc2 branch from cb17c2f to f8b7375 Compare October 29, 2020 06:38
@malfet
Copy link
Contributor Author

malfet commented Oct 29, 2020

Rebased PR on top of code refactors

Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@malfet has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@malfet
Copy link
Contributor Author

malfet commented Oct 29, 2020

Passing arguments with quotes via command line is non uniform on different platforms. On the other hand, I don't think there is much compiler variability on Windows

Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@malfet has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@henryiii
Copy link
Contributor

Passing arguments with quotes via command line is non uniform on different platforms

Yes, I've done it before here: https://github.com/pybind/python_example/blob/fc12654bf8915c3237ff4265e7e6889d228b5da7/src/main.cpp#L39 & https://github.com/pybind/python_example/blob/fc12654bf8915c3237ff4265e7e6889d228b5da7/setup.py#L24 but I agree, I don't think it's needed on Windows.

@malfet malfet added this to the 1.7.1 milestone Oct 29, 2020
@malfet
Copy link
Contributor Author

malfet commented Oct 29, 2020

@henryiii Do you think it would make developers live easier if they are to be stringified on pybind11 side? (Can work on a PR if you want)

@malfet malfet deleted the malfet/test-pybind-2.6.0rc2 branch October 29, 2020 19:12
@facebook-github-bot
Copy link
Contributor

@malfet merged this pull request in 2b6a720.


for pname in ["COMPILER_TYPE", "STDLIB", "BUILD_ABI"]:
pval = getattr(torch._C, f"_PYBIND11_{pname}")
if pval is not None and not IS_WINDOWS:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you please tell me why we add IS_WINDOWS check here?

# See note [Pybind11 ABI constants]
for name in ["COMPILER_TYPE", "STDLIB", "BUILD_ABI"]:
val = getattr(torch._C, f"_PYBIND11_{name}")
if val is not None and not IS_WINDOWS:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and here?

@peterjc123
Copy link
Collaborator

Passing arguments with quotes via command line is non uniform on different platforms. On the other hand, I don't think there is much compiler variability on Windows

Well, then why don't we just don't quote all the args later on different platforms?

facebook-github-bot pushed a commit to pytorch/glow that referenced this pull request Nov 12, 2020
Summary:
Recent update of pybind version from PyTorch (pytorch/pytorch#46415) created compatibility issue, resulting in mysterious errors like
```
__________________ TestRemoveException.test_remove_exceptions __________________
self = <tests.functionality.remove_exceptions_test.TestRemoveException testMethod=test_remove_exceptions>

    def test_remove_exceptions(self):
        """Test Glow's removeExceptions JIT pass"""

        foo_jit = torch.jit.script(foo)
        graph = foo_jit.graph
        assert graph_contains_str(graph, "prim::RaiseException")
>       torch_glow.removeExceptions_(graph)
E       TypeError: removeExceptions_(): incompatible function arguments. The following argument types are supported:
E           1. (arg0: torch::jit::Graph) -> None
```
This PR aligns the pybind11 versions in glow and pytorch, hence fixing the issue.

Documentation:

[Optional Fixes #issue]

Pull Request resolved: #5067

Test Plan: CI

Reviewed By: hl475

Differential Revision: D24912990

Pulled By: yinghai

fbshipit-source-id: ed22a8c70b51638fa11a5a172b93339563152869
seemethere pushed a commit to seemethere/pytorch that referenced this pull request Nov 20, 2020
Summary:
Preserve PYBIND11 (pytorch@63ce3fb) configuration options in `torch._C._PYBIND11 (https://github.com/pytorch/pytorch/commit/63ce3fbde85672e112411ad731839e6db3c15c08)_COMPILER_TYPE` and use them when building extensions

Also, use f-strings in `torch.utils.cpp_extension`

"Fixes" pytorch#46367

Pull Request resolved: pytorch#46415

Reviewed By: VitalyFedyunin

Differential Revision: D24605949

Pulled By: malfet

fbshipit-source-id: 87340f2ed5308266a46ef8f0317316227dab9d4d
(cherry picked from commit 2b6a720)
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants