Avoid complex-to-real cast warning in CopyBackward #60021
Conversation
Dropping the imaginary component is expected and gives the correct gradient formula, so silencing the warning is appropriate. [ghstack-poisoned]
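For context, the warning in question comes from casting a complex grad back to a real source dtype inside CopyBackwards. The following is a minimal sketch of the idea, simplified to a dtype-only cast and using a hypothetical helper name; it is not the PR's actual diff.

#include <ATen/ATen.h>

// Illustrative helper (hypothetical, not from the PR): cast an incoming grad
// to the source tensor's scalar type. When the forward copied a real tensor
// into a complex one, grad is complex and the correct gradient for the real
// source is just its real part, so dropping the imaginary component is
// intentional.
at::Tensor grad_to_src_dtype(const at::Tensor& grad, at::ScalarType src_dtype) {
  if (grad.is_complex() && !c10::isComplexType(src_dtype)) {
    // Take the real component explicitly rather than relying on the implicit
    // complex-to-real cast inside `to`, which is what emits the warning.
    return at::real(grad).to(src_dtype);
  }
  return grad.to(src_dtype);
}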
💊 CI failures summary and remediations (as of commit 3b16d43; more details on the Dr. CI page and at hud.pytorch.org/pr/60021):
🕵️ 2 new failures recognized by patterns. The following CI failures do not appear to be due to upstream breakages.
The removal of the warning looks ok.
I would have a couple of questions about the function below, though.
const bool copy = (grad->is_cuda() && grad->device() != src_device);
grad_inputs[1] = grad->to(
    src_options,
    /*non_blocking=*/false,
Why isn't the non_blocking flag from the forward propagated here?
I think that creates lifetime issues. grad needs to be known to be alive at least until the next device sync, which I don't think we can guarantee.
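For readers following along, the hazard being described is that a non_blocking copy only enqueues the transfer on the current stream, so the tensors involved must stay alive until that in-flight work completes. A rough, self-contained illustration with an assumed CUDA device and an illustrative function name (not code from this PR):

#include <ATen/ATen.h>
#include <ATen/cuda/CUDAContext.h>

void illustrate_non_blocking_lifetime() {
  // Pinned host memory is what makes a host-to-device copy truly asynchronous.
  at::Tensor pinned_host = at::empty(
      {1 << 20}, at::TensorOptions().dtype(at::kFloat).pinned_memory(true));
  pinned_host.fill_(1.0f);

  // Enqueue the copy; this call can return before the transfer has finished.
  at::Tensor on_device = pinned_host.to(
      at::TensorOptions().device(at::kCUDA), /*non_blocking=*/true);

  // Until the stream is synchronized, `pinned_host` must remain alive; freeing
  // it earlier would let the pending copy read deallocated memory. This is the
  // lifetime guarantee the backward node cannot make for `grad`, hence
  // non_blocking=false there.
  at::cuda::getCurrentCUDAStream().synchronize();
}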
cc @mcarilli: we might want to double-check that one later as part of the syncing in the backward.
LGTM
@mruberry ping
Thanks @peterbell10!
@mruberry has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
} else {
  grad_inputs[1] = grad.to(src_options);
}
const bool copy = (grad->is_cuda() && grad->device() != src_device);
Do we need to compute copy at all? If grad's device is different from the src device, then it shouldn't matter what copy argument we pass to grad->to: when the devices differ, Tensor::to will always copy regardless of the copy parameter. Am I missing something?
That's what the next PR in the stack changes :)
Aha! Thanks for the clarification, I should have read the stack :)
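For reference on the exchange above: Tensor::to's copy flag only changes behavior when no conversion is otherwise required; if the device or dtype differs, a new tensor is materialized regardless. A small sketch of that behavior, with a hypothetical function name and an assumed CUDA device:

#include <ATen/ATen.h>

void illustrate_to_copy_flag() {
  at::Tensor t = at::randn({4});

  // Same dtype and device: with copy=false, `to` can return the input itself.
  at::Tensor same = t.to(t.options(), /*non_blocking=*/false, /*copy=*/false);
  TORCH_INTERNAL_ASSERT(same.is_same(t));

  // copy=true forces a fresh tensor even when nothing needs converting.
  at::Tensor forced = t.to(t.options(), /*non_blocking=*/false, /*copy=*/true);
  TORCH_INTERNAL_ASSERT(!forced.is_same(t));

  // Different device: a copy happens no matter what copy= says, which is why
  // the explicit `copy` flag only matters in the matching-device case.
  at::Tensor moved = t.to(at::TensorOptions().device(at::kCUDA),
                          /*non_blocking=*/false, /*copy=*/false);
  TORCH_INTERNAL_ASSERT(!moved.is_same(t));
}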
Stack from ghstack:
Dropping the imaginary component is expected and gives the correct gradient
formula, so silencing the warning is appropriate.
Differential Revision: D29589371