Add helpful information to the gradient/inplace operation exception by f0k · Pull Request #18523 · pytorch/pytorch · GitHub

Conversation


@f0k f0k commented Mar 27, 2019

To debug a "one of the variables needed for gradient computation has been modified by an inplace operation" error, I wanted to know which variable had been modified, so I extended the error message with the information that is easily available at that point.

Before:

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation

After:

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [80, 1]], which is output 0 of UnsqueezeBackward0, is at version 1, not expected version 0. Hint: enable anomaly detection to find the forward pass operation which modified it.

The hint to enable anomaly detection is only shown when it is not enabled. It's meant to save people some googling. I'd even go further and reference torch.autograd.set_detect_anomaly(True), but maybe we're not running Python?
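To make the error and the hint concrete, here is a minimal sketch (my own illustration, not code from the PR) that triggers the version-counter check and shows how to enable anomaly detection as the hint suggests:

```python
import torch

# Minimal reproduction: an in-place op modifies a tensor that the
# backward pass still needs, bumping its version counter.
x = torch.ones(3, requires_grad=True)
y = x.exp()        # exp's backward reuses its output y
y.add_(1)          # in-place modification: y's version goes 0 -> 1
try:
    y.sum().backward()
except RuntimeError as e:
    print(e)       # "...modified by an inplace operation..."

# Following the hint, anomaly detection records a traceback of the
# forward-pass operation whose gradient computation fails:
with torch.autograd.set_detect_anomaly(True):
    z = torch.ones(3, requires_grad=True).exp()
    z.add_(1)
    # z.sum().backward()  # would additionally print that traceback
```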

Disclaimer: I haven't looked at other parts of the code to check whether using std::stringstream is acceptable practice; let me know if it isn't. Similarly, I haven't checked the indentation conventions.


soumith commented Mar 27, 2019

looks like a great change, thank you! I'm tagging @apaszke for review

@soumith soumith requested review from albanD and apaszke March 27, 2019 13:53
Collaborator

nit: I don't think you want the "not" here.

Contributor Author

It was on purpose, but I can see how it's ambiguous, thanks. I meant to say "it's at version x, not at the expected version, which is y". I'll change it to "is at version x, expected y instead".

Collaborator

Is it going to point to the forward pass op that modified it? Or the one that needs it?

Contributor Author

Good spot. From what I read in #15803 (comment), I assumed that it would find the inplace operation, but after debugging my problem, I found that this was a red herring. I will change the hint accordingly!

Collaborator

It would be quite interesting to actually be able to point out both for debugging purposes!
But it would be a bit tricky to do :/

Contributor Author

Well, where do the version counters live? Would it be possible to have a mode that warns if any gradient-required tensor is involved when increasing a version counter? Or a mode that simply warns on any in-place operation?

Collaborator

There are usually many (many) inplace ops, so that mode might be too verbose to be useful.
But maybe it's something we want to add to the anomaly detection mode: track which Python line made the version counter bump, so that we can use that information here for better error printing.
But that would be a much more involved change.
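The version counters discussed above can actually be observed from Python through the private `_version` attribute on tensors (an internal detail, shown here purely for illustration of what the backward-pass check compares):

```python
import torch

# Every tensor carries a version counter that autograd compares against
# the version it saved during the forward pass.
t = torch.zeros(4)
print(t._version)   # 0: fresh tensor
t.add_(1)           # in-place op bumps the counter
print(t._version)   # 1
u = t + 1           # out-of-place op creates a fresh tensor instead
print(u._version)   # 0
```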

@f0k f0k force-pushed the informative-inplace-error branch from b31ab22 to 2dad0b1 Compare March 27, 2019 15:56

f0k commented Mar 27, 2019

Following the review, I made the following updates:

  • "is at version 1, not expected version 0" is changed to "is at version 1, expected version 0 instead"
  • If anomaly detection is disabled, the hint now reads: "Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True)." This means people will not even have to google/bing/baidu/duckduckgo how to enable anomaly detection.
  • If anomaly detection is enabled, there is a new hint that reads: "Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!" This should make clear that anomaly detection does not necessarily point at the line to be changed.

Let me know if you think I overdid it with the hints or the best wishes, but I guess many newcomers will hit this error sooner or later, and this would minimize the time spent on search engines.


soumith commented Mar 29, 2019

@albanD @apaszke is this good to go?

@facebook-github-bot left a comment

@soumith is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot

@soumith merged this pull request in b77e3c2.

@f0k f0k deleted the informative-inplace-error branch April 3, 2019 08:34