DOC Describes behavior for None in module.register_* by thomasjpfan · Pull Request #60125 · pytorch/pytorch · GitHub

Conversation

@thomasjpfan
Contributor

Fixes #45834
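The linked issue concerns buffers and parameters registered as `None`, which this PR documents. A minimal sketch of that behavior, assuming a recent PyTorch (the module name `Net` and attribute names are illustrative, not from the PR):

```python
import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        # Passing None reserves the name without allocating a tensor.
        self.register_buffer("running_stat", None)
        self.register_parameter("weight", None)

net = Net()
# None-valued buffers and parameters are omitted from state_dict() ...
print(sorted(net.state_dict().keys()))  # []

# ... but the attribute exists and can be assigned a tensor later,
# after which it shows up in state_dict() as usual.
net.running_stat = torch.zeros(3)
print(sorted(net.state_dict().keys()))  # ['running_stat']
```

This is why an uninitialised (`None`) buffer does not appear in `state_dict()` until it is given an actual tensor.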

@thomasjpfan added labels on Jun 16, 2021: module: docs (Related to our documentation, both in docs/ and docblocks), module: nn (Related to torch.nn), triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)
@facebook-github-bot
Contributor

facebook-github-bot commented Jun 16, 2021

💊 CI failures summary and remediations

As of commit 2e53166 (more details on the Dr. CI page and at hud.pytorch.org/pr/60125):



🕵️ 5 new failures recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See CircleCI build pytorch_linux_xenial_cuda11_1_cudnn8_py3_gcc7_test1 (1/5)

Step: "Run tests" (full log | diagnosis details | 🔁 rerun)

Jun 16 21:35:43 ERROR [2.823s]: test_multiple_o...rad_is_view (__main__.DistributedDataParallelTest)
Jun 16 21:35:43     loss2.backward()
Jun 16 21:35:43   File "/opt/conda/lib/python3.6/site-packages/torch/_tensor.py", line 256, in backward
Jun 16 21:35:43     torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
Jun 16 21:35:43   File "/opt/conda/lib/python3.6/site-packages/torch/autograd/__init__.py", line 149, in backward
Jun 16 21:35:43     allow_unreachable=True, accumulate_grad=True)  # allow_unreachable flag
Jun 16 21:35:43 RuntimeError: Trying to backward through the graph a second time (or directly access saved variables after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved variables after calling backward.
Jun 16 21:35:43 
Jun 16 21:35:43 
Jun 16 21:35:43 
Jun 16 21:35:43 ======================================================================
Jun 16 21:35:43 ERROR [2.823s]: test_multiple_outputs_multiple_backward_grad_is_view (__main__.DistributedDataParallelTest)
Jun 16 21:35:43 ----------------------------------------------------------------------
Jun 16 21:35:43 Traceback (most recent call last):
Jun 16 21:35:43   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_distributed.py", line 398, in wrapper
Jun 16 21:35:43     self._join_processes(fn)
Jun 16 21:35:43   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_distributed.py", line 590, in _join_processes
Jun 16 21:35:43     self._check_return_codes(elapsed_time)
Jun 16 21:35:43   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_distributed.py", line 633, in _check_return_codes
Jun 16 21:35:43     raise RuntimeError(error)
Jun 16 21:35:43 RuntimeError: Process 0 exited with error code 10 and exception:
Jun 16 21:35:43 Traceback (most recent call last):
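The RuntimeError in the log above is the standard autograd failure when `.backward()` is called twice on the same graph; it is unrelated to this PR's documentation change. A minimal repro of the behavior the error message describes, assuming any recent PyTorch:

```python
import torch

x = torch.ones(2, requires_grad=True)

# With retain_graph=True, the saved intermediate values survive the
# first backward pass, so a second pass is allowed.
y = (x * x).sum()
y.backward(retain_graph=True)
y.backward()
print(x.grad)  # gradients accumulate: tensor([4., 4.])

# Without retain_graph, the second backward raises the RuntimeError
# seen in the CI log.
z = (x * x).sum()
z.backward()
try:
    z.backward()
except RuntimeError as e:
    print("RuntimeError:", e)
```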

See CircleCI build pytorch_linux_xenial_py3_clang5_asan_test2 (2/5)

Step: "Run tests" (full log | diagnosis details | 🔁 rerun)

Jun 16 21:43:53 AssertionError: False is not tr...lowed difference with rtol=0 and atol=0 is only 0!
Jun 16 21:43:53   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_distributed.py", line 516, in run_test
Jun 16 21:43:53     getattr(self, test_name)()
Jun 16 21:43:53   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_distributed.py", line 400, in wrapper
Jun 16 21:43:53     fn()
Jun 16 21:43:53   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/distributed/distributed_test.py", line 4591, in test_ddp_logging_data_cpu
Jun 16 21:43:53     self.assertEqual(ddp_logging_data.get("find_unused_parameters"), 0)
Jun 16 21:43:53   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_utils.py", line 1419, in assertEqual
Jun 16 21:43:53     super().assertTrue(result, msg=self._get_assert_msg(msg, debug_msg=debug_msg))
Jun 16 21:43:53   File "/opt/conda/lib/python3.6/unittest/case.py", line 682, in assertTrue
Jun 16 21:43:53     raise self.failureException(msg)
Jun 16 21:43:53 AssertionError: False is not true : Scalars failed to compare as equal! Comparing 1 and 0 gives a difference of 1, but the allowed difference with rtol=0 and atol=0 is only 0!
Jun 16 21:43:53 
Jun 16 21:43:53 
Jun 16 21:43:53 
Jun 16 21:43:53 ----------------------------------------------------------------------
Jun 16 21:43:53 Ran 220 tests in 81.532s
Jun 16 21:43:53 
Jun 16 21:43:53 FAILED (errors=1, skipped=110)
Jun 16 21:43:53 
Jun 16 21:43:53 Generating XML reports...
Jun 16 21:43:53 Generated XML report: test-reports/dist-gloo/distributed.test_distributed_fork/TEST-TestDistBackendWithFork-20210616214231.xml

See CircleCI build pytorch_linux_bionic_cuda10_2_cudnn7_py3_9_gcc7_test2 (3/5)

Step: "Run tests" (full log | diagnosis details | 🔁 rerun)

Jun 16 20:45:00 AssertionError: True is not false : DDP unused parameters error not raised.
Jun 16 20:45:00   File "/opt/conda/lib/python3.9/site-packages/torch/testing/_internal/common_distributed.py", line 400, in wrapper
Jun 16 20:45:00     fn()
Jun 16 20:45:00   File "/opt/conda/lib/python3.9/site-packages/torch/testing/_internal/common_distributed.py", line 181, in wrapper
Jun 16 20:45:00     ret = func(*args, **kwargs)
Jun 16 20:45:00   File "/opt/conda/lib/python3.9/site-packages/torch/testing/_internal/common_distributed.py", line 93, in wrapper
Jun 16 20:45:00     return func(*args, **kwargs)
Jun 16 20:45:00   File "/opt/conda/lib/python3.9/site-packages/torch/testing/_internal/distributed/distributed_test.py", line 5857, in test_ddp_unused_params_rebuild_buckets_exception
Jun 16 20:45:00     self.assertFalse(
Jun 16 20:45:00   File "/opt/conda/lib/python3.9/unittest/case.py", line 676, in assertFalse
Jun 16 20:45:00     raise self.failureException(msg)
Jun 16 20:45:00 AssertionError: True is not false : DDP unused parameters error not raised.
Jun 16 20:45:00 
Jun 16 20:45:00 
Jun 16 20:45:00 
Jun 16 20:45:00 ----------------------------------------------------------------------
Jun 16 20:45:00 Ran 220 tests in 231.534s
Jun 16 20:45:00 
Jun 16 20:45:00 FAILED (errors=5, skipped=124)
Jun 16 20:45:00 
Jun 16 20:45:00 Generating XML reports...
Jun 16 20:45:00 Generated XML report: test-reports/dist-nccl/.var.lib.jenkins.workspace.test.distributed.test_distributed_fork/TEST-TestDistBackendWithFork-20210616204109.xml

See CircleCI build pytorch_linux_xenial_cuda11_1_cudnn8_py3_gcc7_test2 (4/5)

Step: "Run tests" (full log | diagnosis details | 🔁 rerun)

Jun 16 20:56:39 AssertionError: True is not false : DDP unused parameters error not raised.
Jun 16 20:56:39   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_distributed.py", line 400, in wrapper
Jun 16 20:56:39     fn()
Jun 16 20:56:39   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_distributed.py", line 181, in wrapper
Jun 16 20:56:39     ret = func(*args, **kwargs)
Jun 16 20:56:39   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_distributed.py", line 93, in wrapper
Jun 16 20:56:39     return func(*args, **kwargs)
Jun 16 20:56:39   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/distributed/distributed_test.py", line 5858, in test_ddp_unused_params_rebuild_buckets_exception
Jun 16 20:56:39     True, "DDP unused parameters error not raised."
Jun 16 20:56:39   File "/opt/conda/lib/python3.6/unittest/case.py", line 676, in assertFalse
Jun 16 20:56:39     raise self.failureException(msg)
Jun 16 20:56:39 AssertionError: True is not false : DDP unused parameters error not raised.
Jun 16 20:56:39 
Jun 16 20:56:39 
Jun 16 20:56:39 
Jun 16 20:56:39 ----------------------------------------------------------------------
Jun 16 20:56:39 Ran 220 tests in 377.894s
Jun 16 20:56:39 
Jun 16 20:56:39 FAILED (errors=5, skipped=123)
Jun 16 20:56:39 
Jun 16 20:56:39 Generating XML reports...
Jun 16 20:56:39 Generated XML report: test-reports/dist-nccl/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20210616205021.xml

See CircleCI build pytorch_linux_bionic_cuda10_2_cudnn7_py3_9_gcc7_test1 (5/5)

Step: "Run tests" (full log | diagnosis details | 🔁 rerun)

Jun 16 20:35:31 ERROR [2.422s]: test_multiple_o...rad_is_view (__main__.DistributedDataParallelTest)
Jun 16 20:35:31     loss2.backward()
Jun 16 20:35:31   File "/opt/conda/lib/python3.9/site-packages/torch/_tensor.py", line 256, in backward
Jun 16 20:35:31     torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
Jun 16 20:35:31   File "/opt/conda/lib/python3.9/site-packages/torch/autograd/__init__.py", line 147, in backward
Jun 16 20:35:31     Variable._execution_engine.run_backward(
Jun 16 20:35:31 RuntimeError: Trying to backward through the graph a second time (or directly access saved variables after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved variables after calling backward.
Jun 16 20:35:31 
Jun 16 20:35:31 
Jun 16 20:35:31 
Jun 16 20:35:31 ======================================================================
Jun 16 20:35:31 ERROR [2.422s]: test_multiple_outputs_multiple_backward_grad_is_view (__main__.DistributedDataParallelTest)
Jun 16 20:35:31 ----------------------------------------------------------------------
Jun 16 20:35:31 Traceback (most recent call last):
Jun 16 20:35:31   File "/opt/conda/lib/python3.9/site-packages/torch/testing/_internal/common_distributed.py", line 398, in wrapper
Jun 16 20:35:31     self._join_processes(fn)
Jun 16 20:35:31   File "/opt/conda/lib/python3.9/site-packages/torch/testing/_internal/common_distributed.py", line 590, in _join_processes
Jun 16 20:35:31     self._check_return_codes(elapsed_time)
Jun 16 20:35:31   File "/opt/conda/lib/python3.9/site-packages/torch/testing/_internal/common_distributed.py", line 633, in _check_return_codes
Jun 16 20:35:31     raise RuntimeError(error)
Jun 16 20:35:31 RuntimeError: Process 1 exited with error code 10 and exception:
Jun 16 20:35:31 Traceback (most recent call last):

1 failure not recognized by patterns:

Job: GitHub Actions Linux CI (pytorch-linux-xenial-py3.6-gcc5.4) / calculate-docker-image · Step: Checkout PyTorch

❄️ 2 failures tentatively classified as flaky

but reruns have not yet been triggered to confirm:

See CircleCI build pytorch_linux_bionic_py3_8_gcc9_coverage_test2 (1/2)

Step: "Run tests" (full log | diagnosis details | 🔁 rerun) ❄️

Jun 16 21:10:50 RuntimeError: Process 0 terminated or timed out after 100.06009531021118 seconds
Jun 16 21:10:50 ======================================================================
Jun 16 21:10:50 ERROR [100.158s]: test_backward_ddp_outside (__main__.TensorPipeDdpUnderDistAutogradTestWithSpawn)
Jun 16 21:10:50 ----------------------------------------------------------------------
Jun 16 21:10:50 Traceback (most recent call last):
Jun 16 21:10:50   File "/opt/conda/lib/python3.8/site-packages/torch/testing/_internal/common_distributed.py", line 398, in wrapper
Jun 16 21:10:50     self._join_processes(fn)
Jun 16 21:10:50   File "/opt/conda/lib/python3.8/site-packages/torch/testing/_internal/common_distributed.py", line 590, in _join_processes
Jun 16 21:10:50     self._check_return_codes(elapsed_time)
Jun 16 21:10:50   File "/opt/conda/lib/python3.8/site-packages/torch/testing/_internal/common_distributed.py", line 638, in _check_return_codes
Jun 16 21:10:50     raise RuntimeError('Process {} terminated or timed out after {} seconds'.format(i, elapsed_time))
Jun 16 21:10:50 RuntimeError: Process 0 terminated or timed out after 100.06009531021118 seconds
Jun 16 21:10:50 
Jun 16 21:10:50 ======================================================================
Jun 16 21:10:50 ERROR [5.196s]: test_backward_ddp_outside_uneven_inputs (__main__.TensorPipeDdpUnderDistAutogradTestWithSpawn)
Jun 16 21:10:50 ----------------------------------------------------------------------
Jun 16 21:10:50 Traceback (most recent call last):
Jun 16 21:10:50   File "/opt/conda/lib/python3.8/site-packages/torch/testing/_internal/common_distributed.py", line 398, in wrapper
Jun 16 21:10:50     self._join_processes(fn)
Jun 16 21:10:50   File "/opt/conda/lib/python3.8/site-packages/torch/testing/_internal/common_distributed.py", line 590, in _join_processes
Jun 16 21:10:50     self._check_return_codes(elapsed_time)
Jun 16 21:10:50   File "/opt/conda/lib/python3.8/site-packages/torch/testing/_internal/common_distributed.py", line 633, in _check_return_codes

See CircleCI build pytorch_linux_bionic_py3_8_gcc9_coverage_test1 (2/2)

Step: "Run tests" (full log | diagnosis details | 🔁 rerun) ❄️

Jun 16 21:14:20 RuntimeError: Process 0 terminated or timed out after 100.03983163833618 seconds
Jun 16 21:14:20 ======================================================================
Jun 16 21:14:20 ERROR [100.121s]: test_backward_ddp_outside (__main__.ProcessGroupDdpUnderDistAutogradTestWithSpawn)
Jun 16 21:14:20 ----------------------------------------------------------------------
Jun 16 21:14:20 Traceback (most recent call last):
Jun 16 21:14:20   File "/opt/conda/lib/python3.8/site-packages/torch/testing/_internal/common_distributed.py", line 398, in wrapper
Jun 16 21:14:20     self._join_processes(fn)
Jun 16 21:14:20   File "/opt/conda/lib/python3.8/site-packages/torch/testing/_internal/common_distributed.py", line 590, in _join_processes
Jun 16 21:14:20     self._check_return_codes(elapsed_time)
Jun 16 21:14:20   File "/opt/conda/lib/python3.8/site-packages/torch/testing/_internal/common_distributed.py", line 638, in _check_return_codes
Jun 16 21:14:20     raise RuntimeError('Process {} terminated or timed out after {} seconds'.format(i, elapsed_time))
Jun 16 21:14:20 RuntimeError: Process 0 terminated or timed out after 100.03983163833618 seconds
Jun 16 21:14:20 
Jun 16 21:14:20 ======================================================================
Jun 16 21:14:20 ERROR [4.095s]: test_backward_ddp_outside_uneven_inputs (__main__.ProcessGroupDdpUnderDistAutogradTestWithSpawn)
Jun 16 21:14:20 ----------------------------------------------------------------------
Jun 16 21:14:20 Traceback (most recent call last):
Jun 16 21:14:20   File "/opt/conda/lib/python3.8/site-packages/torch/testing/_internal/common_distributed.py", line 398, in wrapper
Jun 16 21:14:20     self._join_processes(fn)
Jun 16 21:14:20   File "/opt/conda/lib/python3.8/site-packages/torch/testing/_internal/common_distributed.py", line 590, in _join_processes
Jun 16 21:14:20     self._check_return_codes(elapsed_time)
Jun 16 21:14:20   File "/opt/conda/lib/python3.8/site-packages/torch/testing/_internal/common_distributed.py", line 633, in _check_return_codes

🚧 4 fixed upstream failures:

These were probably caused by upstream breakages that were already fixed.

Please rebase on the viable/strict branch:

If your commit is older than viable/strict, run these commands:

git fetch https://github.com/pytorch/pytorch viable/strict
git rebase FETCH_HEAD

ci.pytorch.org: 1 failed


This comment was automatically generated by Dr. CI.

Contributor

@jbschlosser left a comment

LGTM!

@facebook-github-bot
Contributor

@jbschlosser has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Contributor

@jbschlosser merged this pull request in 7e032f1.


Labels

cla signed · Merged · module: docs (Related to our documentation, both in docs/ and docblocks) · module: nn (Related to torch.nn) · open source · triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)


Development

Successfully merging this pull request may close these issues.

Uninitialised buffers don't show up in state_dict()

4 participants