fix double backward for half softmax/logsoftmax by ngimel · Pull Request #17330 · pytorch/pytorch · GitHub

Conversation

@ngimel (Collaborator) commented Feb 21, 2019

Fix for #17261. @ssnl, do you have tests for it in your other PR? If not, I'll add them to this one. The example from #17261 no longer errors out (and the same goes for log_softmax).
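For reference, a minimal sketch of the kind of repro #17261 describes (not the exact snippet from the issue; assumes a CUDA device, since half ops run on GPU):

```python
import torch
import torch.nn.functional as F

# Double backward through softmax on a half tensor; before this fix,
# the second differentiation raised a dtype mismatch.
x = torch.randn(4, 8, device="cuda", dtype=torch.half, requires_grad=True)
y = F.softmax(x, dim=1)
g, = torch.autograd.grad(y.sum(), x, create_graph=True)
g.sum().backward()  # previously errored; the same pattern applies to F.log_softmax
```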

@ssnl (Collaborator) commented Feb 21, 2019

Thanks! Maybe removing this line is sufficient for a test.

# FIXME: add torch.half after https://github.com/pytorch/pytorch/issues/17261 is fixed
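Presumably that FIXME sits where torch.half is excluded from the test's dtype list, so dropping it re-enables half coverage. A hypothetical sketch of what such a half test checks, comparing against a float32 reference (numeric gradgradcheck needs double precision and can't run on half directly):

```python
import torch
import torch.nn.functional as F

x_half = torch.randn(4, 8, device="cuda", dtype=torch.half, requires_grad=True)
x_float = x_half.detach().float().requires_grad_()

grads = []
for x in (x_half, x_float):
    # double backward through log_softmax, accumulating into x.grad
    g, = torch.autograd.grad(F.log_softmax(x, dim=1).sum(), x, create_graph=True)
    g.sum().backward()
    grads.append(x.grad.float())

torch.testing.assert_allclose(grads[0], grads[1], rtol=1e-2, atol=1e-2)
```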

tools/autograd/derivatives.yaml (reconstructed diff view):

      - name: _log_softmax_backward_data(Tensor grad_output, Tensor output, int64_t dim, Tensor self)
    -     grad_output: grad - (grad * output.exp()).sum(dim, true)
    +     grad_output: grad.type_as(output) - (grad.type_as(output) * output.exp()).sum(dim, true)
          self: log_softmax_double_backward(grad, grad_output, dim, output).type_as(self)
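The failure mode, sketched with assumed dtypes (not the exact autograd internals): in double backward, `grad` can arrive as float32 while `output` was saved from the half forward, and 2019-era PyTorch did not implicitly promote mixed-dtype arithmetic, so the old formula errored out:

```python
import torch
import torch.nn.functional as F

grad = torch.randn(4, 8, device="cuda")  # float32 incoming gradient
output = F.log_softmax(torch.randn(4, 8, device="cuda", dtype=torch.half), dim=1)

# Old: grad - (grad * output.exp()).sum(1, keepdim=True)
#   -> mixed float32/float16 arithmetic, a type error at the time
# New: cast grad to output's dtype first
res = grad.type_as(output) - (grad.type_as(output) * output.exp()).sum(1, keepdim=True)
```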
@ssnl (Collaborator) commented on the diff, Feb 21, 2019

Could you use .to(output.dtype()) instead? I know it's not a problem here, but type_as ignores device index and I think we should generally avoid using it.
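A sketch of the distinction (illustrative tensors; assumes a machine with at least two GPUs): `type_as` converts to the other tensor's dtype and device *type*, moving a CPU tensor to the current CUDA device rather than to the other tensor's device index, while `.to(dtype)` only requests a dtype and never moves the tensor:

```python
import torch

a = torch.randn(2)                                     # cpu, float32
b = torch.randn(2, dtype=torch.half, device="cuda:1")

a.type_as(b)   # half on the *current* CUDA device (e.g. cuda:0),
               # not necessarily cuda:1 -- b's device index is ignored
a.to(b.dtype)  # half, stays on cpu: only the dtype is converted
```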

@ngimel (Collaborator, Author) replied

Also replaced type_as in the clamp derivative.
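For illustration, the same substitution on a clamp-style backward; the mask expression here is hypothetical, not the exact derivatives.yaml entry:

```python
import torch

self_t = torch.randn(5)
grad = torch.randn(5)
min_val, max_val = -0.5, 0.5

mask = (self_t >= min_val) & (self_t <= max_val)  # comparison yields a non-float mask
out_old = grad * mask.type_as(grad)               # before: type_as
out_new = grad * mask.to(grad.dtype)              # after: explicit .to(dtype)
```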

@ssnl (Collaborator)

Thanks!

@ssnl (Collaborator)

I also wonder if we can avoid calculating grad.type_as(output) twice, but I guess that requires putting this into a function, and can be left as a future optimization.
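The hoisted form would presumably look like this (a sketch, not code from the PR):

```python
import torch

grad = torch.randn(4, 8, device="cuda")                      # float32
output = torch.randn(4, 8, device="cuda", dtype=torch.half)  # stands in for the saved log_softmax output

g = grad.type_as(output)  # cast once, reuse twice
res = g - (g * output.exp()).sum(1, keepdim=True)
```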

@ngimel (Collaborator, Author)

Yeah, I've noticed it, but given that it's really a corner case (in most cases type_as will be a no-op), I decided to leave it as is for now.

@ssnl (Collaborator)

SGTM

@ssnl (Collaborator) left a review

This LGTM, but let's wait until CI comes up again.

@ssnl (Collaborator) commented Feb 21, 2019

@pytorchbot rebase this please

@ngimel (Collaborator, Author) commented Feb 21, 2019

I don't think the Windows build failure is related; it's coming from a thrust iterator:

15:10:39          C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v9.2/include\thrust/iterator/iterator_facade.h(517): error C2065: '__T0': undeclared identifier [C:\Jenkins\workspace\caffe2-builds\py2-cuda9.0-cudnn7-windows-build\build\caffe2\caffe2_gpu.vcxproj]

@facebook-github-bot (Contributor) left a comment

@soumith is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@ngimel deleted the softmax_double_backward branch April 4, 2019 00:55
