Enables previously "slow" gradgrad checks on CUDA
#57802
Conversation
💊 CI failures summary and remediations: as of commit 22dd008 (more details on the Dr. CI page), ci.pytorch.org: 1 failed. (This comment was automatically generated by Dr. CI.) |
Codecov Report

@@            Coverage Diff             @@
##           master   #57802       +/-   ##
==========================================
+ Coverage   76.44%   76.81%    +0.37%
==========================================
  Files        2022     1980       -42
  Lines      202303   196861     -5442
==========================================
- Hits       154652   151228     -3424
+ Misses      47651    45633     -2018 |
Closing this; I just realized that to run all the CI jobs, I need to create a branch on upstream with |
@krshrimali Why do you want to run all the CI jobs? |
Update: results from profiling CUDA only. Below is the table with the profiling results (times are in seconds):
cc: @mruberry |
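The comment above references a timing table. As a rough sketch of how such per-op numbers might be collected (this is not the actual harness used for the table; the op, shapes, and device fallback here are illustrative assumptions), one can wrap `torch.autograd.gradgradcheck` in a `time.perf_counter` timer:

```python
# Hedged sketch: timing a second-order gradient check for one op.
# Not the harness used in this PR; shapes/op choice are illustrative.
import time

import torch
from torch.autograd import gradgradcheck


def time_gradgrad(fn, inputs):
    """Run gradgradcheck on fn and report (passed, elapsed seconds)."""
    start = time.perf_counter()
    ok = gradgradcheck(fn, inputs)  # raises or returns True
    return ok, time.perf_counter() - start


# gradcheck-style tests require double precision; fall back to CPU
# when CUDA is unavailable, though the PR's concern is the CUDA runs.
device = "cuda" if torch.cuda.is_available() else "cpu"
a = torch.randn(2, 3, 3, dtype=torch.double, device=device, requires_grad=True)
b = torch.randn(2, 3, 3, dtype=torch.double, device=device, requires_grad=True)

ok, elapsed = time_gradgrad(torch.bmm, (a, b))
print(f"bmm: passed={ok}, {elapsed:.3f}s")
```

Summing such per-op timings is one way to judge whether re-enabling the previously skipped CUDA `gradgrad` tests is affordable.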
Thanks for the updated numbers, @krshrimali; I think these times are acceptable. cc @albanD and @soulitzer in case they find them egregious. This is going to conflict with a PR in @albanD's forward-mode AD stack. @albanD, do you want to remove the skip additions for slow grad/gradgrad in that PR, too? If so, I think this PR can remove the current slow skips. |
Thanks for the ping, @krshrimali! Let me add this to our review on Friday so we can let @albanD review, too. We should have an answer for you then. |
Gentle ping @mruberry! Has this been taken up separately? |
Hey!
Just wanted to ping on this. The action here (following @albanD's write-up) is to:
Sound good, @krshrimali? Thanks for helping us with this analysis. Separately, I owe @albanD a more general issue for how we handle slow tests. |
Thanks, @albanD, for the write-up. @mruberry, thank you for the ping. Just confirming: we want to remove all the tests from
Please let me know if this sounds good, and I'll make the changes! |
All the tests that this PR timed, yes.
Correct; that's a separate issue that requires additional design.
Sounds great! Let me know if you have any questions. |
Thanks, @mruberry! On it. :)
gradgrad checks on CUDA (earlier disabled for being too slow)
Nice investigation, @krshrimali, and cool that these skips are now gone.
cc @soulitzer
@mruberry has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Summary: Fixes pytorch#57508. Earlier, a few CUDA `gradgrad` checks (see the list of ops below) were disabled for being too slow. There have been improvements (see pytorch#57508 for reference), and this PR aimed at:
1. Measuring the time taken by `gradgrad` checks on CUDA for the ops listed below.
2. Enabling the tests again if the times sound reasonable.

Ops considered: `addbmm, baddbmm, bmm, cholesky, symeig, inverse, linalg.cholesky, linalg.cholesky_ex, linalg.eigh, linalg.qr, lu, qr, solve, triangular_solve, linalg.pinv, svd, linalg.svd, pinverse, linalg.householder_product, linalg.solve`.

For numbers (on time taken) on a separate CI run: pytorch#57802 (comment).

cc: mruberry albanD pmeier

Pull Request resolved: pytorch#57802
Reviewed By: ngimel
Differential Revision: D28784106
Pulled By: mruberry
fbshipit-source-id: 9b15238319f143c59f83d500e831d66d98542ff8
Summary:
**Description:** `SpectralFuncInfo` defines a decorator mentioning: "gradgrad is quite slow". This PR re-analyses that statement, since things have changed with the gradient tests.
**Test times:** #60435 (comment)
**Follow-up** of #57802

cc: mruberry

Pull Request resolved: #60435
Reviewed By: gchanan
Differential Revision: D29707444
Pulled By: mruberry
fbshipit-source-id: 444b4863bac8556c7e8fcc8ff58d81a91bd96a21
Fixes #57508
Earlier, a few CUDA `gradgrad` checks (see the list of ops below) were disabled because of them being too slow. There have been improvements (see #57508 for reference), and this PR aimed at:
1. Measuring the time taken by `gradgrad` checks on CUDA for the ops listed below.
2. Enabling the tests again if the times sound reasonable.

Ops considered: `addbmm, baddbmm, bmm, cholesky, symeig, inverse, linalg.cholesky, linalg.cholesky_ex, linalg.eigh, linalg.qr, lu, qr, solve, triangular_solve, linalg.pinv, svd, linalg.svd, pinverse, linalg.householder_product, linalg.solve`.

For numbers (on time taken) on a separate CI run: #57802 (comment).
cc: @mruberry @albanD @pmeier
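For readers unfamiliar with what these re-enabled tests compute: a "gradgrad" check verifies that an analytical second derivative agrees with a numerical (finite-difference) estimate. The sketch below illustrates the idea in plain Python on a scalar function; PyTorch's actual `torch.autograd.gradgradcheck` does the analogous comparison for tensor ops via double backward, and all names here are illustrative.

```python
# Conceptual sketch of a second-derivative ("gradgrad") check:
# compare an analytical d²f/dx² against a central-difference estimate.
def f(x):
    return x ** 3


def analytic_d2(x):
    # d²/dx² of x³ is 6x
    return 6.0 * x


def numeric_d2(g, x, h=1e-4):
    # central difference approximation of the second derivative
    return (g(x + h) - 2.0 * g(x) + g(x - h)) / (h * h)


x = 1.7
assert abs(numeric_d2(f, x) - analytic_d2(x)) < 1e-5
```

The CUDA tests re-enabled by this PR run this kind of comparison through the autograd engine for each listed op, which is why they were historically expensive.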