KEMBAR78
Enables previously "slow" `gradgrad` checks on CUDA by krshrimali · Pull Request #57802 · pytorch/pytorch · GitHub
Skip to content

Conversation

@krshrimali
Copy link
Contributor

@krshrimali krshrimali commented May 7, 2021

Fixes #57508

Earlier, a few CUDA gradgrad checks (see the list of ops below) were disabled because of them being too slow. There have been improvements (see #57508 for reference) and this PR aimed on:

  1. Time taken by gradgrad checks on CUDA for the ops listed below.
  2. Enabling the tests again if the times sound reasonable

Ops considered: addbmm, baddbmm, bmm, cholesky, symeig, inverse, linalg.cholesky, linalg.cholesky_ex, linalg.eigh, linalg.qr, lu, qr, solve, triangular_solve, linalg.pinv, svd, linalg.svd, pinverse, linalg.householder_product, linalg.solve.

For numbers (on time taken) on a separate CI run: #57802 (comment).

cc: @mruberry @albanD @pmeier

@facebook-github-bot
Copy link
Contributor

facebook-github-bot commented May 7, 2021

💊 CI failures summary and remediations

As of commit 22dd008 (more details on the Dr. CI page):


  • 1/1 failures possibly* introduced in this PR
    • 1/1 non-scanned failure(s)

ci.pytorch.org: 1 failed


This comment was automatically generated by Dr. CI (expand for details).Follow this link to opt-out of these comments for your Pull Requests.

Please report bugs/suggestions to the (internal) Dr. CI Users group.

Click here to manually regenerate this comment.

@krshrimali krshrimali changed the title [DO NOT MERGE] Recording time taken for slowTest enabled tests (CUDA grad/gradgrad checks) [DO NOT MERGE] [WIP] Recording time taken for slowTest enabled tests (CUDA grad/gradgrad checks) May 7, 2021
@codecov
Copy link

codecov bot commented May 7, 2021

Codecov Report

Merging #57802 (d58e146) into master (a427820) will increase coverage by 0.37%.
The diff coverage is n/a.

❗ Current head d58e146 differs from pull request most recent head 22dd008. Consider uploading reports for the commit 22dd008 to get more accurate results

@@            Coverage Diff             @@
##           master   #57802      +/-   ##
==========================================
+ Coverage   76.44%   76.81%   +0.37%     
==========================================
  Files        2022     1980      -42     
  Lines      202303   196861    -5442     
==========================================
- Hits       154652   151228    -3424     
+ Misses      47651    45633    -2018     

@krshrimali
Copy link
Contributor Author

krshrimali commented May 8, 2021

Closing this, just realized to run all CI jobs, I need to create branch on upstream with ci-all/ prefix. (The issue will now be addressed here: #57895)

@krshrimali krshrimali closed this May 8, 2021
@mruberry
Copy link
Collaborator

mruberry commented May 8, 2021

@krshrimali Why do you want to run all the CI jobs?

@krshrimali krshrimali reopened this May 9, 2021
@krshrimali
Copy link
Contributor Author

krshrimali commented May 10, 2021

Update: Results from profiling only CUDA gradgrad checks which were earlier disabled. The list of such ops:

addbmm, baddbmm, bmm, cholesky, symeig, inverse, linalg.cholesky, linalg.cholesky_ex, linalg.eigh, linalg.qr, lu, qr, solve, triangular_solve, linalg.pinv, svd, linalg.svd, pinverse, householder_product, linalg.solve

Below is the table with profiling results: (time is in second)

Function Name Time Taken (complex128) Time Taken (float64)
addbmm 1.416 0.379
baddbmm 1.354 0.369
bmm 0.167 0.132
cholesky 0.449 0.227
symeig 0.790 0.391
inverse 0.335 0.185
linalg.cholesky 0.511 0.250
linalg.cholesky_ex 0.576 0.280
linalg.eigh 0.522 0.271
linalg.qr 2.086 0.767
lu 1.546 0.665
qr 2.077 0.786
solve 5.226 2.127
triangular_solve 1.056 0.462
linalg.pinv 1.167 0.336
svd 3.619 1.004
linalg.svd 3.667 1.018
pinverse 1.297 0.336
householder_product 1.841 0.794
linalg.solve 6.571 2.838

cc: @mruberry

@krshrimali krshrimali changed the title [DO NOT MERGE] [WIP] Recording time taken for slowTest enabled tests (CUDA grad/gradgrad checks) [DO NOT MERGE] Recording time taken for slowTest enabled tests (CUDA grad/gradgrad checks) May 10, 2021
@krshrimali krshrimali changed the title [DO NOT MERGE] Recording time taken for slowTest enabled tests (CUDA grad/gradgrad checks) [DO NOT MERGE] Recording time taken for slowTest enabled tests (CUDA gradgrad checks) May 10, 2021
@krshrimali krshrimali requested a review from mruberry May 10, 2021 08:35
@mruberry
Copy link
Collaborator

Thanks for the updated numbers @krshrimali; I think these times are acceptable, cc @albanD and @soulitzer in case they find them egregious

This is going to conflict with a PR in @albanD's forward-mode AD stack. @albanD, do you want to remove the skip additions for slow grad/gradgrad in that PR, too? If so, I think this PR can remove the current slow skips.

@mruberry mruberry added the triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module label May 12, 2021
@mruberry mruberry requested a review from albanD May 12, 2021 03:48
@krshrimali
Copy link
Contributor Author

Hi, @mruberry, @albanD - gentle ping. Should we disable the gradgrad checks for all the ops tested in this PR? Or has this been taken in a separate PR already? Thanks!

@mruberry
Copy link
Collaborator

Hi, @mruberry, @albanD - gentle ping. Should we disable the gradgrad checks for all the ops tested in this PR? Or has this been taken in a separate PR already? Thanks!

Thanks for the ping, @krshrimali! Let me add this to our review on Friday so we can let @albanD review, too. We should have an answer for you then.

@krshrimali
Copy link
Contributor Author

Hi, @mruberry, @albanD - gentle ping. Should we disable the gradgrad checks for all the ops tested in this PR? Or has this been taken in a separate PR already? Thanks!

Thanks for the ping, @krshrimali! Let me add this to our review on Friday so we can let @albanD review, too. We should have an answer for you then.

Gentle ping @mruberry! Has this been taken up separately?

@albanD
Copy link
Collaborator

albanD commented May 26, 2021

Hey!
We did discuss it last week but did not update here sorry.
There were two things:

  • All these tests are fast enough that we can re-enable them indeed.
  • It would be nice to get a rule for when a test in slow in this context, for example anything >5 or 10s and then time all the tests and apply that rule.
  • We could automatically enforce that rule by failing if a user adds a test that is larger than the given threshold without marking it as slow.

@mruberry
Copy link
Collaborator

mruberry commented May 30, 2021

Just wanted to ping on this. The action here (following @albanD's write-up) is to:

  • merge this PR with master
  • remove the [DO NOT MERGE] text
  • let's land this!

Sound good, @krshrimali? Thanks for helping us with this analysis. Separately, I owe @albanD a more general issue for how we handle slow tests.

@krshrimali
Copy link
Contributor Author

krshrimali commented May 30, 2021

Just wanted to ping on this. The action here (following @albanD's write-up) is to:

  • merge this PR with master
  • remove the [DO NOT MERGE] text
  • let's land this!

Sound good, @krshrimali? Thanks for helping us with this analysis. Separately, I owe @albanD a more general issue for how we handle slow tests.

Thanks, @albanD for the write-up. @mruberry - Thank you for the ping, just confirming, we want to remove all the tests from slowTest for now, right? In the scope of this PR, I'm guessing that we don't want to add any rule to test (if it's slow, as mentioned by Alban) when a test is added, right? We can probably take this up in another PR.

Please let me know if this sounds good, and I'll make the changes!

@mruberry
Copy link
Collaborator

Just wanted to ping on this. The action here (following @albanD's write-up) is to:

  • merge this PR with master
  • remove the [DO NOT MERGE] text
  • let's land this!

Sound good, @krshrimali? Thanks for helping us with this analysis. Separately, I owe @albanD a more general issue for how we handle slow tests.

Thanks, @albanD for the write-up. @mruberry - Thank you for the ping, just confirming, we want to remove all the tests from slowTest for now, right?

All the tests that this PR timed, yes.

In the scope of this PR, I'm guessing that we don't want to add any rule to test (if it's slow, as mentioned by Alban) when a test is added, right? We can probably take this up in another PR.

Correct; that's a separate issue that requires additional design.

Please let me know if this sounds good, and I'll make the changes!

Sounds great! Let me know if you have any questions

@krshrimali
Copy link
Contributor Author

Just wanted to ping on this. The action here (following @albanD's write-up) is to:

  • merge this PR with master
  • remove the [DO NOT MERGE] text
  • let's land this!

Sound good, @krshrimali? Thanks for helping us with this analysis. Separately, I owe @albanD a more general issue for how we handle slow tests.

Thanks, @albanD for the write-up. @mruberry - Thank you for the ping, just confirming, we want to remove all the tests from slowTest for now, right?

All the tests that this PR timed, yes.

In the scope of this PR, I'm guessing that we don't want to add any rule to test (if it's slow, as mentioned by Alban) when a test is added, right? We can probably take this up in another PR.

Correct; that's a separate issue that requires additional design.

Please let me know if this sounds good, and I'll make the changes!

Sounds great! Let me know if you have any questions

Thanks, @mruberry! On it. :)

@krshrimali krshrimali changed the title [DO NOT MERGE] Recording time taken for slowTest enabled tests (CUDA gradgrad checks) Enabling gradgrad checks on CUDA (earlier disabled for being too slow) May 30, 2021
@mruberry mruberry changed the title Enabling gradgrad checks on CUDA (earlier disabled for being too slow) Enables previously "slow" gradgrad checks on CUDA May 31, 2021
Copy link
Collaborator

@mruberry mruberry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice investigation, @krshrimali, and cool that these skips are now gone.

cc @soulitzer

@facebook-github-bot
Copy link
Contributor

@mruberry has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@mruberry merged this pull request in 6d45d7a.

deniskokarev pushed a commit to deniskokarev/pytorch that referenced this pull request Jun 9, 2021
Summary:
Fixes pytorch#57508

Earlier, a few CUDA `gradgrad` checks (see the list of ops below) were disabled because of them being too slow. There have been improvements (see pytorch#57508 for reference) and this PR aimed on:

1. Time taken by `gradgrad` checks on CUDA for the ops listed below.
2. Enabling the tests again if the times sound reasonable

Ops considered: `addbmm, baddbmm, bmm, cholesky, symeig, inverse, linalg.cholesky, linalg.cholesky_ex, linalg.eigh, linalg.qr, lu, qr, solve, triangular_solve, linalg.pinv, svd, linalg.svd, pinverse, linalg.householder_product, linalg.solve`.

For numbers (on time taken) on a separate CI run: pytorch#57802 (comment).

cc: mruberry albanD pmeier

Pull Request resolved: pytorch#57802

Reviewed By: ngimel

Differential Revision: D28784106

Pulled By: mruberry

fbshipit-source-id: 9b15238319f143c59f83d500e831d66d98542ff8
facebook-github-bot pushed a commit that referenced this pull request Jul 15, 2021
Summary:
**Description:** `SpectralFuncInfo` defines decorator mentioning: "gradgrad is quite slow". This PR re-analyses that statement since things have changed with gradient tests.

**Test times:** #60435 (comment)

**Follow-up** of #57802

cc: mruberry

Pull Request resolved: #60435

Reviewed By: gchanan

Differential Revision: D29707444

Pulled By: mruberry

fbshipit-source-id: 444b4863bac8556c7e8fcc8ff58d81a91bd96a21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cla signed Merged open source triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Unskip CUDA grad/gradgrad checks or no longer mark as slow test

5 participants