logsumexp for multiple dimensions #16475
I think the higher time for |
@umanwizard has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Summary:
Move `logsumexp` and `max_values` to `TensorIterator` and use it to make `logsumexp` work for multiple dimensions.

Timings on a tensor of shape `(10,1000000,10)`, for each combination of (cpu, single-threaded cpu, gpu) and dimension:

**before**

208 ms ± 2.72 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
279 ms ± 5.07 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
199 ms ± 2.64 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
1.11 s ± 33.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
1.25 s ± 25.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
1.11 s ± 6.83 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
15.4 ms ± 1.02 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
132 ms ± 30.1 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
39.6 ms ± 19.1 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

**after**

199 ms ± 8.23 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
307 ms ± 8.73 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
207 ms ± 7.62 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
1.16 s ± 8.92 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
1.26 s ± 47.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
1.13 s ± 13.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
15.4 ms ± 868 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)
132 ms ± 27.6 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
39.6 ms ± 21.8 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

Pull Request resolved: pytorch/pytorch#16475

Differential Revision: D13855746

Pulled By: umanwizard

fbshipit-source-id: aaacc0b967c3f89073487e1952ae6f76b7bd7ad3
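For readers unfamiliar with what reducing `logsumexp` over multiple dimensions means, here is a minimal pure-Python sketch. It is not the `TensorIterator` implementation from this PR. It only illustrates the usual max-subtraction trick and shows that reducing over two dimensions at once agrees with chaining single-dimension reductions. The helper names `logsumexp`, `logsumexp_dims_0_2`, and `logsumexp_chained` and the nested-list "tensor" are hypothetical and chosen for illustration.

```python
import math
from itertools import product


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a flat iterable of floats."""
    values = list(values)
    m = max(values)
    if math.isinf(m):
        # Either every entry is -inf (result -inf) or some entry is +inf (result +inf).
        return m
    # Subtracting the max keeps every exponent <= 0, so exp() cannot overflow.
    return m + math.log(sum(math.exp(v - m) for v in values))


def logsumexp_dims_0_2(x):
    """Reduce a nested-list tensor of shape (A, B, C) over dims 0 and 2 in one pass."""
    a, b, c = len(x), len(x[0]), len(x[0][0])
    return [
        logsumexp(x[i][j][k] for i, k in product(range(a), range(c)))
        for j in range(b)
    ]


def logsumexp_chained(x):
    """Same reduction done as two single-dimension steps: dim 2 first, then dim 0."""
    a, b = len(x), len(x[0])
    inner = [[logsumexp(x[i][j]) for j in range(b)] for i in range(a)]
    return [logsumexp(inner[i][j] for i in range(a)) for j in range(b)]


# Shape (2, 3, 2); the entries of 1000.0 would overflow a naive exp().
x = [
    [[0.0, 1.0], [2.0, 3.0], [1000.0, 1000.0]],
    [[4.0, 5.0], [6.0, 7.0], [1000.0, 1000.0]],
]
joint = logsumexp_dims_0_2(x)
chained = logsumexp_chained(x)
```

In PyTorch itself the equivalent call would presumably be `torch.logsumexp(t, dim=(0, 2))`, with `dim` given as a tuple of ints.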