Avoid complex-to-real cast warning in CopyBackward #60021
Conversation
Dropping the imaginary component is expected and gives the correct gradient formula, so silencing the warning is appropriate. [ghstack-poisoned]
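For context, the warning in question comes from casting a complex grad back to a real source dtype inside CopyBackwards. The following is a minimal sketch of the idea, simplified to a dtype-only cast and using a hypothetical helper name; it is not the PR's actual diff.

#include <ATen/ATen.h>

// Illustrative helper (hypothetical, not from the PR): cast an incoming grad
// to the source tensor's scalar type. When the forward copied a real tensor
// into a complex one, grad is complex and the correct gradient for the real
// source is just its real part, so dropping the imaginary component is
// intentional.
at::Tensor grad_to_src_dtype(const at::Tensor& grad, at::ScalarType src_dtype) {
  if (grad.is_complex() && !c10::isComplexType(src_dtype)) {
    // Take the real component explicitly rather than relying on the implicit
    // complex-to-real cast inside `to`, which is what emits the warning.
    return at::real(grad).to(src_dtype);
  }
  return grad.to(src_dtype);
}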
💊 CI failures summary and remediations (as of commit 3b16d43; more details on the Dr. CI page and at hud.pytorch.org/pr/60021):
🕵️ 2 new failures recognized by patterns. The following CI failures do not appear to be due to upstream breakages.
The removal of the warning looks ok.
I would have a couple of questions about the function below, though.
const bool copy = (grad->is_cuda() && grad->device() != src_device);
grad_inputs[1] = grad->to(
    src_options,
    /*non_blocking=*/false,
Why isn't the non_blocking flag from the forward propagated here?
I think that creates lifetime issues. grad needs to be known to be alive at least until the next device sync, which I don't think we can guarantee.
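For readers following along, the hazard being described is that a non_blocking copy only enqueues the transfer on the current stream, so the tensors involved must stay alive until that in-flight work completes. A rough, self-contained illustration with an assumed CUDA device and an illustrative function name (not code from this PR):

#include <ATen/ATen.h>
#include <ATen/cuda/CUDAContext.h>

void illustrate_non_blocking_lifetime() {
  // Pinned host memory is what makes a host-to-device copy truly asynchronous.
  at::Tensor pinned_host = at::empty(
      {1 << 20}, at::TensorOptions().dtype(at::kFloat).pinned_memory(true));
  pinned_host.fill_(1.0f);

  // Enqueue the copy; this call can return before the transfer has finished.
  at::Tensor on_device = pinned_host.to(
      at::TensorOptions().device(at::kCUDA), /*non_blocking=*/true);

  // Until the stream is synchronized, `pinned_host` must remain alive; freeing
  // it earlier would let the pending copy read deallocated memory. This is the
  // lifetime guarantee the backward node cannot make for `grad`, hence
  // non_blocking=false there.
  at::cuda::getCurrentCUDAStream().synchronize();
}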
cc @mcarilli: we might want to double-check that one later as part of the syncing in the backward.
LGTM
@mruberry ping
Thanks @peterbell10!
@mruberry has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
} else {
  grad_inputs[1] = grad.to(src_options);
}
const bool copy = (grad->is_cuda() && grad->device() != src_device);
Do we need to compute copy at all? If grad's device is different from the src device, then it shouldn't matter what copy argument we pass to grad->to: when the devices differ, Tensor::to will always copy regardless of the copy parameter. Am I missing something?
That's what the next PR in the stack changes :)
Aha! Thanks for the clarification, I should have read the stack :)
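For reference on the exchange above: Tensor::to's copy flag only changes behavior when no conversion is otherwise required; if the device or dtype differs, a new tensor is materialized regardless. A small sketch of that behavior, with a hypothetical function name and an assumed CUDA device:

#include <ATen/ATen.h>

void illustrate_to_copy_flag() {
  at::Tensor t = at::randn({4});

  // Same dtype and device: with copy=false, `to` can return the input itself.
  at::Tensor same = t.to(t.options(), /*non_blocking=*/false, /*copy=*/false);
  TORCH_INTERNAL_ASSERT(same.is_same(t));

  // copy=true forces a fresh tensor even when nothing needs converting.
  at::Tensor forced = t.to(t.options(), /*non_blocking=*/false, /*copy=*/true);
  TORCH_INTERNAL_ASSERT(!forced.is_same(t));

  // Different device: a copy happens no matter what copy= says, which is why
  // the explicit `copy` flag only matters in the matching-device case.
  at::Tensor moved = t.to(at::TensorOptions().device(at::kCUDA),
                          /*non_blocking=*/false, /*copy=*/false);
  TORCH_INTERNAL_ASSERT(!moved.is_same(t));
}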
Stack from ghstack:
Dropping the imaginary component is expected and gives the correct gradient
formula, so silencing the warning is appropriate.
Differential Revision: D29589371