fix double backward for half softmax/logsoftmax #17330
Conversation
tools/autograd/derivatives.yaml (Outdated)

    - name: _log_softmax_backward_data(Tensor grad_output, Tensor output, int64_t dim, Tensor self)
  -     grad_output: grad - (grad * output.exp()).sum(dim, true)
        self: log_softmax_double_backward(grad, grad_output, dim, output).type_as(self)
  +     grad_output: grad.type_as(output) - (grad.type_as(output) * output.exp()).sum(dim, true)
Could you use .to(output.dtype()) instead? I know it's not a problem here, but type_as ignores device index and I think we should generally avoid using it.
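A quick sketch (not from the PR; the tensors and names here are made up for illustration) of the distinction being pointed out, in Python:

```python
import torch

grad = torch.randn(3, dtype=torch.float32)     # e.g. an incoming float gradient
output = torch.randn(3, dtype=torch.float16)   # e.g. a half-precision softmax output

via_type_as = grad.type_as(output)   # matches output's dtype and device *type*
via_to = grad.to(output.dtype)       # changes only the dtype, never the device

assert via_type_as.dtype == via_to.dtype == torch.float16
# As noted above, type_as does not follow the other tensor's device index;
# an explicit grad.to(output.device, output.dtype) would.
```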
Also replace type_as in the clamp derivative.
Thanks!
I also wonder if we can avoid calculating grad.type_as(output) twice, but I guess that requires putting this into a function, and can be left as a future optimization.
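A rough sketch of that possible follow-up, written in Python purely for illustration (the real expression lives in derivatives.yaml as a C++ formula, and the helper name below is hypothetical):

```python
import torch

def _log_softmax_backward_grad_output(grad, output, dim):
    # Convert once and reuse, instead of performing the conversion twice
    # inside the derivative formula.
    g = grad.to(output.dtype)
    return g - (g * output.exp()).sum(dim, keepdim=True)
```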
Yeah, I've noticed it, but given that it's really a corner case (in most cases type_as will be a no-op), I decided to leave it as is for now.
SGTM
This LGTM, but let's wait until CI comes up again.
@pytorchbot rebase this please

I don't think the Windows build failure is related; it's coming from a thrust iterator.
@soumith is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Fix for #17261. @ssnl, do you have tests for it in your other PR? If not, I'll add them to this one. The example from #17261 no longer errors out (and the same holds for log_softmax).
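For reference, a hypothetical repro along those lines (the exact example from #17261 is not quoted here; this sketch assumes a CUDA device and exercises the half-input/float-output softmax path, where grad and output dtypes differ in the double backward):

```python
import torch
import torch.nn.functional as F

x = torch.randn(4, 8, device="cuda", dtype=torch.half, requires_grad=True)
# dtype=torch.float32 requests a float output from a half input.
y = F.log_softmax(x, dim=1, dtype=torch.float32)

grad, = torch.autograd.grad(y.sum(), x, create_graph=True)
grad.sum().backward()       # double backward; no dtype mismatch after this fix
print(x.grad.dtype)         # torch.float16
```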