[reland] Move RPC agents to libtorch by lw · Pull Request #60170 · pytorch/pytorch · GitHub
@lw lw commented Jun 17, 2021

Summary: Reland of #59939.

Test Plan: CI

Differential Revision: D29193234

facebook-github-bot commented Jun 17, 2021

💊 CI failures summary and remediations

As of commit cfe3f80 (more details on the Dr. CI page and at hud.pytorch.org/pr/60170):


  • 4/4 failures introduced in this PR

🕵️ 3 new failures recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See GitHub Actions build Lint / clang-tidy (1/3)

Step: "Run clang-tidy"

2021-06-17T18:35:23.8668401Z /__w/pytorch/pytor...d preprocessing directive [clang-diagnostic-error]
2021-06-17T18:35:23.8658662Z     ^
2021-06-17T18:35:23.8659837Z /__w/pytorch/pytorch/cmake/Dependencies.cmake:76:3: error: invalid preprocessing directive [clang-diagnostic-error]
2021-06-17T18:35:23.8660967Z # ---[ Threads
2021-06-17T18:35:23.8661338Z   ^
2021-06-17T18:35:23.8662522Z /__w/pytorch/pytorch/cmake/Dependencies.cmake:87:5: error: invalid preprocessing directive [clang-diagnostic-error]
2021-06-17T18:35:23.8663724Z   # Unset our restrictive C++ flags here and reset them later.
2021-06-17T18:35:23.8664269Z     ^
2021-06-17T18:35:23.8665464Z /__w/pytorch/pytorch/cmake/Dependencies.cmake:88:5: error: invalid preprocessing directive [clang-diagnostic-error]
2021-06-17T18:35:23.8666649Z   # Remove this once we use proper target_compile_options.
2021-06-17T18:35:23.8667207Z     ^
2021-06-17T18:35:23.8668401Z /__w/pytorch/pytorch/cmake/Dependencies.cmake:104:3: error: invalid preprocessing directive [clang-diagnostic-error]
2021-06-17T18:35:23.8669540Z # ---[ protobuf
2021-06-17T18:35:23.8669929Z   ^
2021-06-17T18:35:23.8675320Z ##[error]Process completed with exit code 1.
2021-06-17T18:35:23.8783168Z Post job cleanup.
2021-06-17T18:35:23.8790529Z ##[command]/usr/bin/docker exec  4ccc2945bd5ca58e8e5b200831f8f4550582a7a51c05c592e6818ad614a0623a sh -c "cat /etc/*release | grep ^ID"
2021-06-17T18:35:24.3063184Z [command]/usr/bin/git version
2021-06-17T18:35:24.3118927Z git version 2.32.0
2021-06-17T18:35:24.3162070Z [command]/usr/bin/git config --local --name-only --get-regexp core\.sshCommand
2021-06-17T18:35:24.3207722Z [command]/usr/bin/git submodule foreach --recursive git config --local --name-only --get-regexp 'core\.sshCommand' && git config --local --unset-all 'core.sshCommand' || :
2021-06-17T18:35:24.3501789Z Entering 'android/libs/fbjni'

See CircleCI build pytorch_linux_xenial_py3_clang5_asan_test2 (2/3)

Step: "Run tests"

Jun 17 19:16:42 SUMMARY: UndefinedBehaviorSanit.../jenkins/workspace/aten/src/ATen/Utils.cpp:20:3 in
Jun 17 19:16:42     #7 0x55f7ffd8351b in PyEval_EvalCode /tmp/build/80754af9/python_1614113050744/work/Python/ceval.c:731
Jun 17 19:16:42     #8 0x55f7ffe035e3 in run_mod /tmp/build/80754af9/python_1614113050744/work/Python/pythonrun.c:1025
Jun 17 19:16:42     #9 0x55f7ffe0367c in PyRun_StringFlags /tmp/build/80754af9/python_1614113050744/work/Python/pythonrun.c:949
Jun 17 19:16:42     #10 0x55f7ffe036de in PyRun_SimpleStringFlags /tmp/build/80754af9/python_1614113050744/work/Python/pythonrun.c:445
Jun 17 19:16:42     #11 0x55f7ffe074e2 in run_command /tmp/build/80754af9/python_1614113050744/work/Modules/main.c:301
Jun 17 19:16:42     #12 0x55f7ffe074e2 in Py_Main /tmp/build/80754af9/python_1614113050744/work/Modules/main.c:749
Jun 17 19:16:42     #13 0x55f7ffcd1b0d in main /tmp/build/80754af9/python_1614113050744/work/Programs/python.c:69
Jun 17 19:16:42     #14 0x7f123443383f in __libc_start_main /build/glibc-S7Ft5T/glibc-2.23/csu/../csu/libc-start.c:291
Jun 17 19:16:42     #15 0x55f7ffdb0d6f in _start /home/rdonnelly/mc/conda-bld/compilers_linux-64_1534865402226/work/.build/src/glibc-2.12.2/csu/../sysdeps/x86_64/elf/start.S:103
Jun 17 19:16:42 
Jun 17 19:16:42 SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /var/lib/jenkins/workspace/aten/src/ATen/Utils.cpp:20:3 in 
Jun 17 19:16:42 + retcode=1
Jun 17 19:16:42 + set -e
Jun 17 19:16:42 + return 1
Jun 17 19:16:42 + [[ pytorch-linux-xenial-py3-clang5-asan-test2 == *-NO_AVX-* ]]
Jun 17 19:16:42 + [[ pytorch-linux-xenial-py3-clang5-asan-test2 == *-NO_AVX2-* ]]
Jun 17 19:16:42 + '[' -n https://github.com/pytorch/pytorch/pull/60170 ']'
Jun 17 19:16:42 + [[ pytorch-linux-xenial-py3-clang5-asan-test2 != *coverage* ]]
Jun 17 19:16:42 ++ mktemp
Jun 17 19:16:42 + DETERMINE_FROM=/tmp/tmp.7VYPXwJuWa
Jun 17 19:16:42 + file_diff_from_base /tmp/tmp.7VYPXwJuWa

See CircleCI build pytorch_linux_bionic_py3_8_gcc9_coverage_test2 (3/3)

Step: "Run tests"

Jun 17 20:02:15 AssertionError: False is not tr...lowed difference with rtol=0 and atol=0 is only 0!
Jun 17 20:02:15 ----------------------------------------------------------------------
Jun 17 20:02:15 Traceback (most recent call last):
Jun 17 20:02:15   File "/opt/conda/lib/python3.8/site-packages/torch/testing/_internal/common_distributed.py", line 399, in wrapper
Jun 17 20:02:15     self._join_processes(fn)
Jun 17 20:02:15   File "/opt/conda/lib/python3.8/site-packages/torch/testing/_internal/common_distributed.py", line 610, in _join_processes
Jun 17 20:02:15     self._check_return_codes(elapsed_time)
Jun 17 20:02:15   File "/opt/conda/lib/python3.8/site-packages/torch/testing/_internal/common_distributed.py", line 659, in _check_return_codes
Jun 17 20:02:15     self.assertEqual(
Jun 17 20:02:15   File "/opt/conda/lib/python3.8/site-packages/torch/testing/_internal/common_utils.py", line 1426, in assertEqual
Jun 17 20:02:15     super().assertTrue(result, msg=self._get_assert_msg(msg, debug_msg=debug_msg))
Jun 17 20:02:15 AssertionError: False is not true : Scalars failed to compare as equal! Comparing -6 and 0 gives a difference of 6, but the allowed difference with rtol=0 and atol=0 is only 0!
Jun 17 20:02:15 Expect process 3 exit code to match Process 0 exit code of 0, but got -6
Jun 17 20:02:15 
Jun 17 20:02:16 ----------------------------------------------------------------------
Jun 17 20:02:16 Ran 367 tests in 1350.828s
Jun 17 20:02:16 
Jun 17 20:02:16 FAILED (failures=1, skipped=5)
Jun 17 20:02:16 
Jun 17 20:02:16 Generating XML reports...
Jun 17 20:02:16 Generated XML report: test-reports/python-unittest/distributed.rpc.test_tensorpipe_agent/TEST-TensorPipeDdpComparisonTestWithSpawn-20210617193944.xml
Jun 17 20:02:16 Generated XML report: test-reports/python-unittest/distributed.rpc.test_tensorpipe_agent/TEST-TensorPipeDdpUnderDistAutogradTestWithSpawn-20210617193944.xml
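
A note for anyone reading the `-6` in the assertion above: Python's `multiprocessing` reports a child terminated by signal N as exit code -N. Assuming the test harness surfaces exit codes that way, -6 would mean process 3 was killed by signal 6, which is SIGABRT on Linux. The sketch below shows the decoding; `describe_exit_code` is a hypothetical helper for illustration, not part of PyTorch's test utilities.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Hypothetical helper: render a child-process exit code in readable form.

    Follows the multiprocessing convention that a negative exit code -N
    means the child was terminated by signal N.
    """
    if code < 0:
        try:
            name = signal.Signals(-code).name
        except ValueError:
            # The number does not map to a signal known on this platform.
            name = f"unknown signal {-code}"
        return f"killed by {name}"
    return f"exited with status {code}"


if __name__ == "__main__":
    # Exit codes taken from the log: process 0 returned 0, process 3 returned -6.
    for exit_code in (0, -6):
        print(exit_code, "->", describe_exit_code(exit_code))
```

An abort like this usually comes from native code (for example a failed assertion or an uncaught C++ exception) rather than from a Python-level test failure, so the native stack trace or core dump is the more useful place to look.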

1 failure not recognized by patterns:

Job: CircleCI pytorch_linux_bionic_py3_6_clang9_noarch_test
Step: Run tests

This comment was automatically generated by Dr. CI.

@facebook-github-bot

This pull request was exported from Phabricator. Differential Revision: D29193234

@lw lw added the ci/all label Jun 17, 2021
@lw lw force-pushed the export-D29193234 branch from 957b868 to e369fee Compare June 17, 2021 09:19
@lw lw force-pushed the export-D29193234 branch from e369fee to 0bb5e10 Compare June 17, 2021 09:54
@lw lw force-pushed the export-D29193234 branch from 0bb5e10 to 86bf707 Compare June 17, 2021 15:07
@mrshenli mrshenli left a comment

Chatted with @lw offline: the previous one was reverted due to unrelated flakes.

lw added 4 commits June 17, 2021 11:28

Differential Revision: D29193014

fbshipit-source-id: 8e04d2aeb0e44c2666616e755d846f11a68f95d2

Summary: Reland of pytorch#59939.

Test Plan: CI

Differential Revision: D29193235

fbshipit-source-id: 3723b6dbd3aabcb61736d3dee2f9d2caa886ae87

Summary: Reland of pytorch#59939.

Test Plan: CI

Differential Revision: D29193233

fbshipit-source-id: fdee5578280cdd253d46dacd5f0e351ca4211ceb

Summary:
Pull Request resolved: pytorch#60170

Reland of pytorch#59939.

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D29193234

fbshipit-source-id: 689b3ede8945a7f191c8c2b8a7d4ac51add63c44
@lw lw force-pushed the export-D29193234 branch from 86bf707 to cfe3f80 Compare June 17, 2021 18:30
@facebook-github-bot

This pull request has been merged in 08ce5ee.

Labels

cla signed, fb-exported, Merged, oncall: distributed
