-
Notifications
You must be signed in to change notification settings - Fork 25.7k
[MPS] Move max_pool2d to Metal for stride != 1
#157876
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
🔗 Helpful Links🧪 See artifacts and rendered test results at hud.pytorch.org/pr/157876
Note: Links to docs will display an error until the docs builds have been completed. ✅ You can merge normally! (1 Unrelated Failure)As of commit d7f8382 with merge base f89c28c ( UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:
This comment was automatically generated by Dr. CI and updates every 15 minutes. |
|
Leaving this in draft mode for now because I'm still investigating performance improvements |
|
I wrote a performance measurement script here: https://github.com/kurtamohler/pytorch-perf-test-scripts/blob/55ef32a127c746d13d7310375068a6b300bda92d/max_pool_mps/perf.py Before this PR I get this: After this PR I get this: In most of those cases, performance improves a little bit or stays basically the same. But in five of them, performance gets worse. Looks like the worse cases involve either dilation or large kernel sizes. So I'll see what I can do to improve those. I also haven't looked into the backward call performance, so I'll need to do that. |
|
Performance update: With the new changes from #157875, this PR now improves all but one of the cases that my script checks. I've updated the PR description with details. |
|
For now, I opted to just enable the Metal impl for uint8 when MacOS < 14.0, which is the currently unsupported case. In a follow up PR I'll try to enable the Metal impl in all cases if I can figure out how to improve performance. I found a handful of cases where the Metal impl is only around 30% as fast as the graph impl. |
|
I just realized that support for macOS 13 has ended, so nevermind making a special case for macOS < 14. @malfet I guess I'm not sure, in which cases were you suggesting I enable the new impl? |
|
@kurtamohler i think MPS implementation of all pooling ops expect tensors shape to be divisible by pool shape, but that's not the case for CPU nor CUDA implementation of the op. Oops, my apologies, I've meant adaptive pool, not max_pool, see #96056 , which does not crash but returns silently incorrect errors... |
|
@malfet, I added more coverage to my performance script and I found out that when I updated the PR description with latest performance measurements. If it's too long, I could cut it down to a smaller set. In almost every case that I tested, this PR now either gives the same or better performance. The worst case is 157, where the new impl is only ~80% as fast as the old. It uses a 1000x1000 input with a kernel size of 250x250. From what I understand, it's uncommon to use such a large kernel. We could consider falling back to the old impl for large kernel sizes though |
stride != 1
|
@pytorchbot merge |
Merge startedYour change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team |
This PR updates `max_pool2d` to use a Metal kernel instead of the old MPS graph impl. However, when the `stride` argument is 1 in all dimensions, the old implementation gives significantly better performance, so we fall back to it in that case. Below is a performance comparison of `max_pool2d` before and after this PR, obtained from this script: https://github.com/kurtamohler/pytorch-perf-test-scripts/blob/2f02f2bf7ad8e1b80d8eb728612b179d48fe92d7/max_pool_mps/perf.py <details><summary>Click to expand</summary> case | before PR | after PR | speedup | | case info -- | -- | -- | -- | -- | -- 0 | 0.014264 | 0.004473 | 3.188911245 | | (3, 2, 2), {'kernel_size': 2, 'return_indices': True} 1 | 0.010752 | 0.00421 | 2.55391924 | | (3, 2, 2), {'kernel_size': 2, 'return_indices': False} 2 | 0.020777 | 0.006123 | 3.393271272 | | (3, 10, 10), {'kernel_size': 5, 'return_indices': True} 3 | 0.011065 | 0.005759 | 1.921340511 | | (3, 10, 10), {'kernel_size': 5, 'return_indices': False} 4 | 0.01452 | 0.007829 | 1.854642994 | | (3, 100, 100), {'kernel_size': 5, 'return_indices': True} 5 | 0.009258 | 0.007075 | 1.308551237 | | (3, 100, 100), {'kernel_size': 5, 'return_indices': False} 6 | 0.188137 | 0.168688 | 1.115295694 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 1, 'stride': None, 'padding': 0, 'return_indices': True} 7 | 0.161362 | 0.154746 | 1.042753932 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 1, 'stride': None, 'padding': 0, 'return_indices': False} 8 | 0.182883 | 0.16945 | 1.079274122 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 1, 'stride': None, 'padding': 1, 'return_indices': True} 9 | 0.156875 | 0.163346 | 0.9603847049 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 1, 'stride': None, 'padding': 1, 'return_indices': False} 10 | 0.193433 | 0.167396 | 1.155541351 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 1, 'stride': None, 'padding': 2, 'return_indices': True} 11 | 0.158967 | 0.151246 | 1.051049284 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 1, 'stride': None, 'padding': 2, 'return_indices': False} 12 | 0.931071 | 0.932883 | 0.9980576342 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 1, 'stride': 1, 'padding': 0, 'return_indices': True} 13 | 0.324496 | 0.3252 | 0.9978351784 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 1, 'stride': 1, 'padding': 0, 'return_indices': False} 14 | 0.944071 | 0.936246 | 1.008357846 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 1, 'stride': 1, 'padding': 1, 'return_indices': True} 15 | 0.322171 | 0.314854 | 1.023239343 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 1, 'stride': 1, 'padding': 1, 'return_indices': False} 16 | 0.894158 | 0.886408 | 1.008743152 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 1, 'stride': 1, 'padding': 2, 'return_indices': True} 17 | 0.309338 | 0.304146 | 1.017070749 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 1, 'stride': 1, 'padding': 2, 'return_indices': False} 18 | 0.606 | 0.260546 | 2.325884873 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 1, 'stride': 2, 'padding': 0, 'return_indices': True} 19 | 0.30445 | 0.231054 | 1.317657344 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 1, 'stride': 2, 'padding': 0, 'return_indices': False} 20 | 0.474708 | 0.261925 | 1.812381407 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 1, 'stride': 2, 'padding': 1, 'return_indices': True} 21 | 0.23175 | 0.231883 | 0.9994264349 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 1, 'stride': 2, 'padding': 1, 'return_indices': False} 22 | 0.434475 | 0.266246 | 1.631855502 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 1, 'stride': 2, 'padding': 2, 'return_indices': True} 23 | 0.236942 | 0.231792 | 1.022218196 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 1, 'stride': 2, 'padding': 2, 'return_indices': False} 24 | 0.202396 | 0.174888 | 1.157289237 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 1, 'stride': 4, 'padding': 0, 'return_indices': True} 25 | 0.160679 | 0.158246 | 1.015374796 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 1, 'stride': 4, 'padding': 0, 'return_indices': False} 26 | 0.200354 | 0.184133 | 1.088093932 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 1, 'stride': 4, 'padding': 1, 'return_indices': True} 27 | 0.160779 | 0.160679 | 1.000622359 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 1, 'stride': 4, 'padding': 1, 'return_indices': False} 28 | 0.199175 | 0.178625 | 1.115045486 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 1, 'stride': 4, 'padding': 2, 'return_indices': True} 29 | 0.159458 | 0.160883 | 0.9911426316 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 1, 'stride': 4, 'padding': 2, 'return_indices': False} 30 | 0.199021 | 0.165329 | 1.203787599 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 2, 'stride': None, 'padding': 0, 'return_indices': True} 31 | 0.156337 | 0.158213 | 0.9881425673 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 2, 'stride': None, 'padding': 0, 'return_indices': False} 32 | 0.180146 | 0.174483 | 1.032455884 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 2, 'stride': None, 'padding': 1, 'return_indices': True} 33 | 0.156988 | 0.158167 | 0.9925458534 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 2, 'stride': None, 'padding': 1, 'return_indices': False} 34 | 0.182133 | 0.176521 | 1.031792251 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 2, 'stride': None, 'padding': 2, 'return_indices': True} 35 | 0.169042 | 0.156483 | 1.080257919 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 2, 'stride': None, 'padding': 2, 'return_indices': False} 36 | 1.767821 | 1.766254 | 1.000887188 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 2, 'stride': 1, 'padding': 0, 'return_indices': True} 37 | 1.059346 | 1.058775 | 1.000539302 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 2, 'stride': 1, 'padding': 0, 'return_indices': False} 38 | 1.85755 | 1.859429 | 0.9989894747 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 2, 'stride': 1, 'padding': 1, 'return_indices': True} 39 | 1.100417 | 1.097683 | 1.002490701 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 2, 'stride': 1, 'padding': 1, 'return_indices': False} 40 | 1.843167 | 1.847558 | 0.9976233493 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 2, 'stride': 1, 'padding': 2, 'return_indices': True} 41 | 1.090142 | 1.093163 | 0.9972364597 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 2, 'stride': 1, 'padding': 2, 'return_indices': False} 42 | 0.480867 | 0.251733 | 1.910226311 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 2, 'stride': 2, 'padding': 0, 'return_indices': True} 43 | 0.319246 | 0.236479 | 1.349997251 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 2, 'stride': 2, 'padding': 0, 'return_indices': False} 44 | 0.49315 | 0.256408 | 1.923301925 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 2, 'stride': 2, 'padding': 1, 'return_indices': True} 45 | 0.316746 | 0.227854 | 1.390127011 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 2, 'stride': 2, 'padding': 1, 'return_indices': False} 46 | 0.4912 | 0.257762 | 1.905633879 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 2, 'stride': 2, 'padding': 2, 'return_indices': True} 47 | 0.324771 | 0.229371 | 1.41592006 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 2, 'stride': 2, 'padding': 2, 'return_indices': False} 48 | 0.152904 | 0.095079 | 1.608178462 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 2, 'stride': 4, 'padding': 0, 'return_indices': True} 49 | 0.102963 | 0.089217 | 1.154073775 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 2, 'stride': 4, 'padding': 0, 'return_indices': False} 50 | 0.155158 | 0.095429 | 1.625899884 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 2, 'stride': 4, 'padding': 1, 'return_indices': True} 51 | 0.104338 | 0.089979 | 1.15958168 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 2, 'stride': 4, 'padding': 1, 'return_indices': False} 52 | 0.153121 | 0.096429 | 1.587914424 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 2, 'stride': 4, 'padding': 2, 'return_indices': True} 53 | 0.103642 | 0.090254 | 1.148336916 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 2, 'stride': 4, 'padding': 2, 'return_indices': False} 54 | 0.191071 | 0.165125 | 1.157129447 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 4, 'stride': None, 'padding': 0, 'return_indices': True} 55 | 0.153971 | 0.149021 | 1.033216795 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 4, 'stride': None, 'padding': 0, 'return_indices': False} 56 | 0.193192 | 0.166892 | 1.157586942 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 4, 'stride': None, 'padding': 1, 'return_indices': True} 57 | 0.156617 | 0.15215 | 1.029359185 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 4, 'stride': None, 'padding': 1, 'return_indices': False} 58 | 0.178033 | 0.167308 | 1.06410333 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 4, 'stride': None, 'padding': 2, 'return_indices': True} 59 | 0.157425 | 0.164404 | 0.9575496947 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 4, 'stride': None, 'padding': 2, 'return_indices': False} 60 | 1.757638 | 1.750896 | 1.0038506 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 4, 'stride': 1, 'padding': 0, 'return_indices': True} 61 | 1.048471 | 1.047967 | 1.000480931 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 4, 'stride': 1, 'padding': 0, 'return_indices': False} 62 | 1.790708 | 1.789767 | 1.000525767 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 4, 'stride': 1, 'padding': 1, 'return_indices': True} 63 | 1.054575 | 1.054796 | 0.9997904808 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 4, 'stride': 1, 'padding': 1, 'return_indices': False} 64 | 1.785837 | 1.784192 | 1.000921986 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 4, 'stride': 1, 'padding': 2, 'return_indices': True} 65 | 1.054713 | 1.054492 | 1.00020958 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 4, 'stride': 1, 'padding': 2, 'return_indices': False} 66 | 0.478267 | 0.261017 | 1.832321266 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 4, 'stride': 2, 'padding': 0, 'return_indices': True} 67 | 0.32005 | 0.226654 | 1.412064204 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 4, 'stride': 2, 'padding': 0, 'return_indices': False} 68 | 0.484008 | 0.254721 | 1.900149575 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 4, 'stride': 2, 'padding': 1, 'return_indices': True} 69 | 0.321 | 0.218842 | 1.466811672 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 4, 'stride': 2, 'padding': 1, 'return_indices': False} 70 | 0.482087 | 0.248771 | 1.937874591 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 4, 'stride': 2, 'padding': 2, 'return_indices': True} 71 | 0.316558 | 0.230533 | 1.373156988 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 4, 'stride': 2, 'padding': 2, 'return_indices': False} 72 | 0.137842 | 0.085088 | 1.619993419 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 4, 'stride': 4, 'padding': 0, 'return_indices': True} 73 | 0.100671 | 0.0769 | 1.309115735 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 4, 'stride': 4, 'padding': 0, 'return_indices': False} 74 | 0.148321 | 0.086967 | 1.705485989 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 4, 'stride': 4, 'padding': 1, 'return_indices': True} 75 | 0.101392 | 0.075454 | 1.343759112 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 4, 'stride': 4, 'padding': 1, 'return_indices': False} 76 | 0.150208 | 0.083742 | 1.793699697 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 4, 'stride': 4, 'padding': 2, 'return_indices': True} 77 | 0.099587 | 0.075825 | 1.313379492 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 4, 'stride': 4, 'padding': 2, 'return_indices': False} 78 | 0.622546 | 0.602729 | 1.03287879 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 1, 'stride': None, 'padding': 0, 'return_indices': True} 79 | 0.531696 | 0.5067 | 1.049330965 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 1, 'stride': None, 'padding': 0, 'return_indices': False} 80 | 0.626646 | 0.617038 | 1.015571164 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 1, 'stride': None, 'padding': 1, 'return_indices': True} 81 | 0.530354 | 0.525367 | 1.009492412 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 1, 'stride': None, 'padding': 1, 'return_indices': False} 82 | 0.633933 | 0.577775 | 1.097197006 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 1, 'stride': None, 'padding': 2, 'return_indices': True} 83 | 0.533067 | 0.526954 | 1.011600633 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 1, 'stride': None, 'padding': 2, 'return_indices': False} 84 | 3.372867 | 3.386412 | 0.9960001914 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 1, 'stride': 1, 'padding': 0, 'return_indices': True} 85 | 1.155975 | 1.156604 | 0.9994561665 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 1, 'stride': 1, 'padding': 0, 'return_indices': False} 86 | 3.401921 | 3.39755 | 1.001286515 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 1, 'stride': 1, 'padding': 1, 'return_indices': True} 87 | 1.202829 | 1.192538 | 1.008629494 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 1, 'stride': 1, 'padding': 1, 'return_indices': False} 88 | 3.23675 | 3.220238 | 1.005127571 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 1, 'stride': 1, 'padding': 2, 'return_indices': True} 89 | 1.077067 | 1.085613 | 0.9921279498 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 1, 'stride': 1, 'padding': 2, 'return_indices': False} 90 | 1.572925 | 0.925625 | 1.699311276 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 1, 'stride': 2, 'padding': 0, 'return_indices': True} 91 | 0.791204 | 0.793454 | 0.9971642969 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 1, 'stride': 2, 'padding': 0, 'return_indices': False} 92 | 1.572742 | 0.922729 | 1.704446268 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 1, 'stride': 2, 'padding': 1, 'return_indices': True} 93 | 0.784292 | 0.788871 | 0.9941955022 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 1, 'stride': 2, 'padding': 1, 'return_indices': False} 94 | 1.526546 | 0.925708 | 1.649057802 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 1, 'stride': 2, 'padding': 2, 'return_indices': True} 95 | 0.769321 | 0.787675 | 0.9766985114 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 1, 'stride': 2, 'padding': 2, 'return_indices': False} 96 | 0.736033 | 0.612808 | 1.201082558 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 1, 'stride': 4, 'padding': 0, 'return_indices': True} 97 | 0.574625 | 0.530925 | 1.082309177 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 1, 'stride': 4, 'padding': 0, 'return_indices': False} 98 | 0.722021 | 0.614488 | 1.174996094 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 1, 'stride': 4, 'padding': 1, 'return_indices': True} 99 | 0.563171 | 0.533721 | 1.055178642 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 1, 'stride': 4, 'padding': 1, 'return_indices': False} 100 | 0.735725 | 0.613992 | 1.198264798 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 1, 'stride': 4, 'padding': 2, 'return_indices': True} 101 | 0.583487 | 0.532513 | 1.095723485 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 1, 'stride': 4, 'padding': 2, 'return_indices': False} 102 | 0.656383 | 0.575313 | 1.140914598 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 2, 'stride': None, 'padding': 0, 'return_indices': True} 103 | 0.559796 | 0.509079 | 1.099625009 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 2, 'stride': None, 'padding': 0, 'return_indices': False} 104 | 0.662046 | 0.572362 | 1.156691045 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 2, 'stride': None, 'padding': 1, 'return_indices': True} 105 | 0.552633 | 0.508671 | 1.086425214 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 2, 'stride': None, 'padding': 1, 'return_indices': False} 106 | 0.634108 | 0.574629 | 1.103508525 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 2, 'stride': None, 'padding': 2, 'return_indices': True} 107 | 0.534013 | 0.510996 | 1.045043405 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 2, 'stride': None, 'padding': 2, 'return_indices': False} 108 | 7.056642 | 7.066717 | 0.9985743026 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 2, 'stride': 1, 'padding': 0, 'return_indices': True} 109 | 4.144275 | 4.142658 | 1.000390329 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 2, 'stride': 1, 'padding': 0, 'return_indices': False} 110 | 7.172683 | 7.189867 | 0.9976099697 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 2, 'stride': 1, 'padding': 1, 'return_indices': True} 111 | 4.162538 | 4.158875 | 1.000880767 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 2, 'stride': 1, 'padding': 1, 'return_indices': False} 112 | 7.194233 | 7.181837 | 1.001726021 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 2, 'stride': 1, 'padding': 2, 'return_indices': True} 113 | 4.294083 | 4.196062 | 1.023360236 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 2, 'stride': 1, 'padding': 2, 'return_indices': False} 114 | 1.875692 | 0.891071 | 2.104986022 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 2, 'stride': 2, 'padding': 0, 'return_indices': True} 115 | 1.097479 | 0.781175 | 1.404907991 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 2, 'stride': 2, 'padding': 0, 'return_indices': False} 116 | 1.8883 | 0.89015 | 2.121327866 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 2, 'stride': 2, 'padding': 1, 'return_indices': True} 117 | 1.101329 | 0.778542 | 1.414604479 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 2, 'stride': 2, 'padding': 1, 'return_indices': False} 118 | 1.872833 | 0.893654 | 2.095702587 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 2, 'stride': 2, 'padding': 2, 'return_indices': True} 119 | 1.096712 | 0.784579 | 1.397835017 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 2, 'stride': 2, 'padding': 2, 'return_indices': False} 120 | 0.513029 | 0.374417 | 1.370207549 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 2, 'stride': 4, 'padding': 0, 'return_indices': True} 121 | 0.349546 | 0.305763 | 1.143192603 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 2, 'stride': 4, 'padding': 0, 'return_indices': False} 122 | 0.518929 | 0.377487 | 1.374693698 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 2, 'stride': 4, 'padding': 1, 'return_indices': True} 123 | 0.364662 | 0.3145 | 1.159497615 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 2, 'stride': 4, 'padding': 1, 'return_indices': False} 124 | 0.521275 | 0.375242 | 1.389170189 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 2, 'stride': 4, 'padding': 2, 'return_indices': True} 125 | 0.367488 | 0.308354 | 1.191773092 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 2, 'stride': 4, 'padding': 2, 'return_indices': False} 126 | 0.652342 | 0.569308 | 1.145850752 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 4, 'stride': None, 'padding': 0, 'return_indices': True} 127 | 0.555696 | 0.506892 | 1.096280865 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 4, 'stride': None, 'padding': 0, 'return_indices': False} 128 | 0.654333 | 0.570367 | 1.147213987 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 4, 'stride': None, 'padding': 1, 'return_indices': True} 129 | 0.548925 | 0.505825 | 1.085207335 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 4, 'stride': None, 'padding': 1, 'return_indices': False} 130 | 0.655908 | 0.571904 | 1.146884792 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 4, 'stride': None, 'padding': 2, 'return_indices': True} 131 | 0.560808 | 0.508238 | 1.103435792 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 4, 'stride': None, 'padding': 2, 'return_indices': False} 132 | 6.949462 | 6.949112 | 1.000050366 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 4, 'stride': 1, 'padding': 0, 'return_indices': True} 133 | 4.072913 | 4.065013 | 1.001943413 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 4, 'stride': 1, 'padding': 0, 'return_indices': False} 134 | 7.200896 | 7.197792 | 1.000431243 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 4, 'stride': 1, 'padding': 1, 'return_indices': True} 135 | 4.291367 | 4.218538 | 1.017264038 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 4, 'stride': 1, 'padding': 1, 'return_indices': False} 136 | 7.1823 | 7.306933 | 0.9829431856 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 4, 'stride': 1, 'padding': 2, 'return_indices': True} 137 | 4.151175 | 4.149592 | 1.000381483 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 4, 'stride': 1, 'padding': 2, 'return_indices': False} 138 | 1.781279 | 0.884288 | 2.014365229 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 4, 'stride': 2, 'padding': 0, 'return_indices': True} 139 | 1.050804 | 0.774362 | 1.356993241 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 4, 'stride': 2, 'padding': 0, 'return_indices': False} 140 | 1.860758 | 0.884637 | 2.103414169 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 4, 'stride': 2, 'padding': 1, 'return_indices': True} 141 | 1.099908 | 0.775887 | 1.417613647 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 4, 'stride': 2, 'padding': 1, 'return_indices': False} 142 | 1.857387 | 0.885738 | 2.096993693 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 4, 'stride': 2, 'padding': 2, 'return_indices': True} 143 | 1.105279 | 0.77365 | 1.428655077 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 4, 'stride': 2, 'padding': 2, 'return_indices': False} 144 | 0.489408 | 0.269583 | 1.815426047 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 4, 'stride': 4, 'padding': 0, 'return_indices': True} 145 | 0.322525 | 0.236979 | 1.360985573 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 4, 'stride': 4, 'padding': 0, 'return_indices': False} 146 | 0.515475 | 0.265813 | 1.93923924 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 4, 'stride': 4, 'padding': 1, 'return_indices': True} 147 | 0.315525 | 0.228146 | 1.382995976 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 4, 'stride': 4, 'padding': 1, 'return_indices': False} 148 | 0.503438 | 0.277204 | 1.816128194 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 4, 'stride': 4, 'padding': 2, 'return_indices': True} 149 | 0.335421 | 0.228275 | 1.469372467 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 4, 'stride': 4, 'padding': 2, 'return_indices': False} 150 | 5.72495 | 4.909554 | 1.166083518 | | (10, 10, 1000, 1000), {'kernel_size': 4, 'padding': 1, 'stride': None, 'return_indices': True} 151 | 4.45215 | 4.251333 | 1.047236243 | | (10, 10, 1000, 1000), {'kernel_size': 4, 'padding': 1, 'stride': None, 'return_indices': False} 152 | 29.953021 | 29.879879 | 1.002447868 | | (10, 10, 1000, 1000), {'kernel_size': 4, 'padding': 1, 'stride': 1, 'return_indices': True} 153 | 9.854683 | 9.839517 | 1.001541336 | | (10, 10, 1000, 1000), {'kernel_size': 4, 'padding': 1, 'stride': 1, 'return_indices': False} 154 | 6.178033 | 5.697375 | 1.084364817 | | (10, 10, 1000, 1000), {'kernel_size': 100, 'padding': 50, 'return_indices': True} 155 | 6.280317 | 5.712525 | 1.099394226 | | (10, 10, 1000, 1000), {'kernel_size': 100, 'padding': 50, 'return_indices': False} 156 | 10.256062 | 11.336527 | 0.9046917103 | | (10, 10, 1000, 1000), {'kernel_size': 250, 'padding': 50, 'return_indices': True} 157 | 9.469546 | 11.33705 | 0.8352742556 | | (10, 10, 1000, 1000), {'kernel_size': 250, 'padding': 50, 'return_indices': False} 158 | 0.119087 | 0.0797 | 1.494190715 | | (10, 10, 100, 100), {'kernel_size': 2, 'return_indices': True} 159 | 0.098713 | 0.047173 | 2.092574142 | | (10, 10, 100, 100), {'kernel_size': 2, 'return_indices': False} 160 | 0.960812 | 0.675762 | 1.421820108 | | (10, 10, 300, 300), {'kernel_size': 2, 'return_indices': True} 161 | 0.536546 | 0.485958 | 1.104099531 | | (10, 10, 300, 300), {'kernel_size': 2, 'return_indices': False} 162 | 2.555225 | 1.791567 | 1.426251432 | | (10, 10, 500, 500), {'kernel_size': 2, 'return_indices': True} 163 | 1.419087 | 1.305137 | 1.087308842 | | (10, 10, 500, 500), {'kernel_size': 2, 'return_indices': False} 164 | 5.182008 | 3.48085 | 1.488719135 | | (10, 10, 700, 700), {'kernel_size': 2, 'return_indices': True} 165 | 2.831779 | 2.498537 | 1.133374851 | | (10, 10, 700, 700), {'kernel_size': 2, 'return_indices': False} 166 | 8.546038 | 5.7783 | 1.478988284 | | (10, 10, 900, 900), {'kernel_size': 2, 'return_indices': True} 167 | 4.731004 | 4.161975 | 1.136720908 | | (10, 10, 900, 900), {'kernel_size': 2, 'return_indices': False} 168 | 0.084754 | 0.07435 | 1.139932751 | | (10, 10, 100, 100), {'kernel_size': 2, 'return_indices': True} 169 | 0.057933 | 0.043096 | 1.344277891 | | (10, 10, 100, 100), {'kernel_size': 2, 'return_indices': False} 170 | 2.568592 | 1.802117 | 1.425319222 | | (10, 10, 500, 500), {'kernel_size': 2, 'return_indices': True} 171 | 1.433054 | 1.307342 | 1.096158465 | | (10, 10, 500, 500), {'kernel_size': 2, 'return_indices': False} 172 | 10.3213 | 7.111604 | 1.451332217 | | (10, 10, 1000, 1000), {'kernel_size': 2, 'return_indices': True} 173 | 5.680525 | 5.168129 | 1.099145358 | | (10, 10, 1000, 1000), {'kernel_size': 2, 'return_indices': False} 174 | 1.02255 | 1.01375 | 1.008680641 | | (10, 1000, 1000), {'kernel_size': 2, 'padding': 1, 'stride': 1, 'return_indices': False} 175 | 3.074233 | 3.094383 | 0.993488201 | | (10, 1000, 1000), {'kernel_size': 2, 'padding': 1, 'stride': 1, 'return_indices': True} 176 | 1.016812 | 1.030575 | 0.9866453194 | | (10, 1000, 1000), {'kernel_size': 4, 'padding': 1, 'stride': 1, 'return_indices': False} 177 | 3.053658 | 3.089504 | 0.9883974903 | | (10, 1000, 1000), {'kernel_size': 4, 'padding': 1, 'stride': 1, 'return_indices': True} 178 | 1.025863 | 1.032088 | 0.9939685376 | | (10, 1000, 1000), {'kernel_size': 8, 'padding': 1, 'stride': 1, 'return_indices': False} 179 | 3.798942 | 3.799213 | 0.9999286694 | | (10, 1000, 1000), {'kernel_size': 8, 'padding': 1, 'stride': 1, 'return_indices': True} 180 | 4.492979 | 4.493421 | 0.999901634 | | (10, 1000, 1000), {'kernel_size': 16, 'padding': 1, 'stride': 1, 'return_indices': False} 181 | 51.543363 | 51.266204 | 1.005406271 | | (10, 1000, 1000), {'kernel_size': 16, 'padding': 1, 'stride': 1, 'return_indices': True} 182 | 1.018008 | 1.001587 | 1.016394981 | | (10, 1000, 1000), {'kernel_size': 4, 'padding': 0, 'stride': (1, 1), 'return_indices': False} 183 | 3.035404 | 3.003113 | 1.010752509 | | (10, 1000, 1000), {'kernel_size': 4, 'padding': 0, 'stride': (1, 1), 'return_indices': True} 184 | 0.610421 | 0.56 | 1.0900375 | | (10, 1000, 1000), {'kernel_size': 4, 'padding': 0, 'stride': (1, 4), 'return_indices': False} 185 | 1.138983 | 0.757296 | 1.504012962 | | (10, 1000, 1000), {'kernel_size': 4, 'padding': 0, 'stride': (1, 4), 'return_indices': True} 186 | 0.641558 | 0.557808 | 1.150141267 | | (10, 1000, 1000), {'kernel_size': 4, 'padding': 0, 'stride': (4, 1), 'return_indices': False} 187 | 1.181475 | 0.754725 | 1.565437742 | | (10, 1000, 1000), {'kernel_size': 4, 'padding': 0, 'stride': (4, 1), 'return_indices': True} 188 | 1.03045 | 1.026904 | 1.003453098 | | (10, 1000, 1000), {'kernel_size': 4, 'padding': 1, 'stride': (1, 1), 'return_indices': False} 189 | 3.041421 | 3.0263 | 1.00499653 | | (10, 1000, 1000), {'kernel_size': 4, 'padding': 1, 'stride': (1, 1), 'return_indices': True} 190 | 0.609929 | 0.572304 | 1.065743032 | | (10, 1000, 1000), {'kernel_size': 4, 'padding': 1, 'stride': (1, 4), 'return_indices': False} 191 | 1.146875 | 0.756446 | 1.516135983 | | (10, 1000, 1000), {'kernel_size': 4, 'padding': 1, 'stride': (1, 4), 'return_indices': True} 192 | 0.645187 | 0.561708 | 1.148616363 | | (10, 1000, 1000), {'kernel_size': 4, 'padding': 1, 'stride': (4, 1), 'return_indices': False} 193 | 1.181721 | 0.758054 | 1.558887625 | | (10, 1000, 1000), {'kernel_size': 4, 'padding': 1, 'stride': (4, 1), 'return_indices': True} 194 | 0.927654 | 0.925946 | 1.0018446 | | (10, 1000, 1000), {'kernel_size': 1, 'return_indices': False} 195 | 2.749983 | 2.740354 | 1.00351378 | | (10, 1000, 1000), {'kernel_size': 1, 'return_indices': True} </details> Pull Request resolved: pytorch#157876 Approved by: https://github.com/malfet
This PR updates `max_pool2d` to use a Metal kernel instead of the old MPS graph impl. However, when the `stride` argument is 1 in all dimensions, the old implementation gives significantly better performance, so we fall back to it in that case. Below is a performance comparison of `max_pool2d` before and after this PR, obtained from this script: https://github.com/kurtamohler/pytorch-perf-test-scripts/blob/2f02f2bf7ad8e1b80d8eb728612b179d48fe92d7/max_pool_mps/perf.py <details><summary>Click to expand</summary> case | before PR | after PR | speedup | | case info -- | -- | -- | -- | -- | -- 0 | 0.014264 | 0.004473 | 3.188911245 | | (3, 2, 2), {'kernel_size': 2, 'return_indices': True} 1 | 0.010752 | 0.00421 | 2.55391924 | | (3, 2, 2), {'kernel_size': 2, 'return_indices': False} 2 | 0.020777 | 0.006123 | 3.393271272 | | (3, 10, 10), {'kernel_size': 5, 'return_indices': True} 3 | 0.011065 | 0.005759 | 1.921340511 | | (3, 10, 10), {'kernel_size': 5, 'return_indices': False} 4 | 0.01452 | 0.007829 | 1.854642994 | | (3, 100, 100), {'kernel_size': 5, 'return_indices': True} 5 | 0.009258 | 0.007075 | 1.308551237 | | (3, 100, 100), {'kernel_size': 5, 'return_indices': False} 6 | 0.188137 | 0.168688 | 1.115295694 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 1, 'stride': None, 'padding': 0, 'return_indices': True} 7 | 0.161362 | 0.154746 | 1.042753932 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 1, 'stride': None, 'padding': 0, 'return_indices': False} 8 | 0.182883 | 0.16945 | 1.079274122 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 1, 'stride': None, 'padding': 1, 'return_indices': True} 9 | 0.156875 | 0.163346 | 0.9603847049 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 1, 'stride': None, 'padding': 1, 'return_indices': False} 10 | 0.193433 | 0.167396 | 1.155541351 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 1, 'stride': None, 'padding': 2, 'return_indices': True} 11 | 0.158967 | 0.151246 | 1.051049284 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 1, 'stride': None, 'padding': 2, 'return_indices': False} 12 | 0.931071 | 0.932883 | 0.9980576342 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 1, 'stride': 1, 'padding': 0, 'return_indices': True} 13 | 0.324496 | 0.3252 | 0.9978351784 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 1, 'stride': 1, 'padding': 0, 'return_indices': False} 14 | 0.944071 | 0.936246 | 1.008357846 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 1, 'stride': 1, 'padding': 1, 'return_indices': True} 15 | 0.322171 | 0.314854 | 1.023239343 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 1, 'stride': 1, 'padding': 1, 'return_indices': False} 16 | 0.894158 | 0.886408 | 1.008743152 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 1, 'stride': 1, 'padding': 2, 'return_indices': True} 17 | 0.309338 | 0.304146 | 1.017070749 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 1, 'stride': 1, 'padding': 2, 'return_indices': False} 18 | 0.606 | 0.260546 | 2.325884873 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 1, 'stride': 2, 'padding': 0, 'return_indices': True} 19 | 0.30445 | 0.231054 | 1.317657344 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 1, 'stride': 2, 'padding': 0, 'return_indices': False} 20 | 0.474708 | 0.261925 | 1.812381407 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 1, 'stride': 2, 'padding': 1, 'return_indices': True} 21 | 0.23175 | 0.231883 | 0.9994264349 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 1, 'stride': 2, 'padding': 1, 'return_indices': False} 22 | 0.434475 | 0.266246 | 1.631855502 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 1, 'stride': 2, 'padding': 2, 'return_indices': True} 23 | 0.236942 | 0.231792 | 1.022218196 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 1, 'stride': 2, 'padding': 2, 'return_indices': False} 24 | 0.202396 | 0.174888 | 1.157289237 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 1, 'stride': 4, 'padding': 0, 'return_indices': True} 25 | 0.160679 | 0.158246 | 1.015374796 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 1, 'stride': 4, 'padding': 0, 'return_indices': False} 26 | 0.200354 | 0.184133 | 1.088093932 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 1, 'stride': 4, 'padding': 1, 'return_indices': True} 27 | 0.160779 | 0.160679 | 1.000622359 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 1, 'stride': 4, 'padding': 1, 'return_indices': False} 28 | 0.199175 | 0.178625 | 1.115045486 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 1, 'stride': 4, 'padding': 2, 'return_indices': True} 29 | 0.159458 | 0.160883 | 0.9911426316 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 1, 'stride': 4, 'padding': 2, 'return_indices': False} 30 | 0.199021 | 0.165329 | 1.203787599 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 2, 'stride': None, 'padding': 0, 'return_indices': True} 31 | 0.156337 | 0.158213 | 0.9881425673 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 2, 'stride': None, 'padding': 0, 'return_indices': False} 32 | 0.180146 | 0.174483 | 1.032455884 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 2, 'stride': None, 'padding': 1, 'return_indices': True} 33 | 0.156988 | 0.158167 | 0.9925458534 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 2, 'stride': None, 'padding': 1, 'return_indices': False} 34 | 0.182133 | 0.176521 | 1.031792251 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 2, 'stride': None, 'padding': 2, 'return_indices': True} 35 | 0.169042 | 0.156483 | 1.080257919 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 2, 'stride': None, 'padding': 2, 'return_indices': False} 36 | 1.767821 | 1.766254 | 1.000887188 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 2, 'stride': 1, 'padding': 0, 'return_indices': True} 37 | 1.059346 | 1.058775 | 1.000539302 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 2, 'stride': 1, 'padding': 0, 'return_indices': False} 38 | 1.85755 | 1.859429 | 0.9989894747 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 2, 'stride': 1, 'padding': 1, 'return_indices': True} 39 | 1.100417 | 1.097683 | 1.002490701 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 2, 'stride': 1, 'padding': 1, 'return_indices': False} 40 | 1.843167 | 1.847558 | 0.9976233493 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 2, 'stride': 1, 'padding': 2, 'return_indices': True} 41 | 1.090142 | 1.093163 | 0.9972364597 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 2, 'stride': 1, 'padding': 2, 'return_indices': False} 42 | 0.480867 | 0.251733 | 1.910226311 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 2, 'stride': 2, 'padding': 0, 'return_indices': True} 43 | 0.319246 | 0.236479 | 1.349997251 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 2, 'stride': 2, 'padding': 0, 'return_indices': False} 44 | 0.49315 | 0.256408 | 1.923301925 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 2, 'stride': 2, 'padding': 1, 'return_indices': True} 45 | 0.316746 | 0.227854 | 1.390127011 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 2, 'stride': 2, 'padding': 1, 'return_indices': False} 46 | 0.4912 | 0.257762 | 1.905633879 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 2, 'stride': 2, 'padding': 2, 'return_indices': True} 47 | 0.324771 | 0.229371 | 1.41592006 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 2, 'stride': 2, 'padding': 2, 'return_indices': False} 48 | 0.152904 | 0.095079 | 1.608178462 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 2, 'stride': 4, 'padding': 0, 'return_indices': True} 49 | 0.102963 | 0.089217 | 1.154073775 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 2, 'stride': 4, 'padding': 0, 'return_indices': False} 50 | 0.155158 | 0.095429 | 1.625899884 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 2, 'stride': 4, 'padding': 1, 'return_indices': True} 51 | 0.104338 | 0.089979 | 1.15958168 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 2, 'stride': 4, 'padding': 1, 'return_indices': False} 52 | 0.153121 | 0.096429 | 1.587914424 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 2, 'stride': 4, 'padding': 2, 'return_indices': True} 53 | 0.103642 | 0.090254 | 1.148336916 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 2, 'stride': 4, 'padding': 2, 'return_indices': False} 54 | 0.191071 | 0.165125 | 1.157129447 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 4, 'stride': None, 'padding': 0, 'return_indices': True} 55 | 0.153971 | 0.149021 | 1.033216795 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 4, 'stride': None, 'padding': 0, 'return_indices': False} 56 | 0.193192 | 0.166892 | 1.157586942 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 4, 'stride': None, 'padding': 1, 'return_indices': True} 57 | 0.156617 | 0.15215 | 1.029359185 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 4, 'stride': None, 'padding': 1, 'return_indices': False} 58 | 0.178033 | 0.167308 | 1.06410333 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 4, 'stride': None, 'padding': 2, 'return_indices': True} 59 | 0.157425 | 0.164404 | 0.9575496947 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 4, 'stride': None, 'padding': 2, 'return_indices': False} 60 | 1.757638 | 1.750896 | 1.0038506 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 4, 'stride': 1, 'padding': 0, 'return_indices': True} 61 | 1.048471 | 1.047967 | 1.000480931 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 4, 'stride': 1, 'padding': 0, 'return_indices': False} 62 | 1.790708 | 1.789767 | 1.000525767 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 4, 'stride': 1, 'padding': 1, 'return_indices': True} 63 | 1.054575 | 1.054796 | 0.9997904808 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 4, 'stride': 1, 'padding': 1, 'return_indices': False} 64 | 1.785837 | 1.784192 | 1.000921986 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 4, 'stride': 1, 'padding': 2, 'return_indices': True} 65 | 1.054713 | 1.054492 | 1.00020958 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 4, 'stride': 1, 'padding': 2, 'return_indices': False} 66 | 0.478267 | 0.261017 | 1.832321266 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 4, 'stride': 2, 'padding': 0, 'return_indices': True} 67 | 0.32005 | 0.226654 | 1.412064204 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 4, 'stride': 2, 'padding': 0, 'return_indices': False} 68 | 0.484008 | 0.254721 | 1.900149575 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 4, 'stride': 2, 'padding': 1, 'return_indices': True} 69 | 0.321 | 0.218842 | 1.466811672 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 4, 'stride': 2, 'padding': 1, 'return_indices': False} 70 | 0.482087 | 0.248771 | 1.937874591 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 4, 'stride': 2, 'padding': 2, 'return_indices': True} 71 | 0.316558 | 0.230533 | 1.373156988 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 4, 'stride': 2, 'padding': 2, 'return_indices': False} 72 | 0.137842 | 0.085088 | 1.619993419 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 4, 'stride': 4, 'padding': 0, 'return_indices': True} 73 | 0.100671 | 0.0769 | 1.309115735 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 4, 'stride': 4, 'padding': 0, 'return_indices': False} 74 | 0.148321 | 0.086967 | 1.705485989 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 4, 'stride': 4, 'padding': 1, 'return_indices': True} 75 | 0.101392 | 0.075454 | 1.343759112 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 4, 'stride': 4, 'padding': 1, 'return_indices': False} 76 | 0.150208 | 0.083742 | 1.793699697 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 4, 'stride': 4, 'padding': 2, 'return_indices': True} 77 | 0.099587 | 0.075825 | 1.313379492 | | (3, 1000, 1000), {'kernel_size': 5, 'dilation': 4, 'stride': 4, 'padding': 2, 'return_indices': False} 78 | 0.622546 | 0.602729 | 1.03287879 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 1, 'stride': None, 'padding': 0, 'return_indices': True} 79 | 0.531696 | 0.5067 | 1.049330965 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 1, 'stride': None, 'padding': 0, 'return_indices': False} 80 | 0.626646 | 0.617038 | 1.015571164 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 1, 'stride': None, 'padding': 1, 'return_indices': True} 81 | 0.530354 | 0.525367 | 1.009492412 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 1, 'stride': None, 'padding': 1, 'return_indices': False} 82 | 0.633933 | 0.577775 | 1.097197006 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 1, 'stride': None, 'padding': 2, 'return_indices': True} 83 | 0.533067 | 0.526954 | 1.011600633 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 1, 'stride': None, 'padding': 2, 'return_indices': False} 84 | 3.372867 | 3.386412 | 0.9960001914 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 1, 'stride': 1, 'padding': 0, 'return_indices': True} 85 | 1.155975 | 1.156604 | 0.9994561665 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 1, 'stride': 1, 'padding': 0, 'return_indices': False} 86 | 3.401921 | 3.39755 | 1.001286515 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 1, 'stride': 1, 'padding': 1, 'return_indices': True} 87 | 1.202829 | 1.192538 | 1.008629494 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 1, 'stride': 1, 'padding': 1, 'return_indices': False} 88 | 3.23675 | 3.220238 | 1.005127571 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 1, 'stride': 1, 'padding': 2, 'return_indices': True} 89 | 1.077067 | 1.085613 | 0.9921279498 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 1, 'stride': 1, 'padding': 2, 'return_indices': False} 90 | 1.572925 | 0.925625 | 1.699311276 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 1, 'stride': 2, 'padding': 0, 'return_indices': True} 91 | 0.791204 | 0.793454 | 0.9971642969 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 1, 'stride': 2, 'padding': 0, 'return_indices': False} 92 | 1.572742 | 0.922729 | 1.704446268 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 1, 'stride': 2, 'padding': 1, 'return_indices': True} 93 | 0.784292 | 0.788871 | 0.9941955022 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 1, 'stride': 2, 'padding': 1, 'return_indices': False} 94 | 1.526546 | 0.925708 | 1.649057802 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 1, 'stride': 2, 'padding': 2, 'return_indices': True} 95 | 0.769321 | 0.787675 | 0.9766985114 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 1, 'stride': 2, 'padding': 2, 'return_indices': False} 96 | 0.736033 | 0.612808 | 1.201082558 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 1, 'stride': 4, 'padding': 0, 'return_indices': True} 97 | 0.574625 | 0.530925 | 1.082309177 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 1, 'stride': 4, 'padding': 0, 'return_indices': False} 98 | 0.722021 | 0.614488 | 1.174996094 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 1, 'stride': 4, 'padding': 1, 'return_indices': True} 99 | 0.563171 | 0.533721 | 1.055178642 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 1, 'stride': 4, 'padding': 1, 'return_indices': False} 100 | 0.735725 | 0.613992 | 1.198264798 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 1, 'stride': 4, 'padding': 2, 'return_indices': True} 101 | 0.583487 | 0.532513 | 1.095723485 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 1, 'stride': 4, 'padding': 2, 'return_indices': False} 102 | 0.656383 | 0.575313 | 1.140914598 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 2, 'stride': None, 'padding': 0, 'return_indices': True} 103 | 0.559796 | 0.509079 | 1.099625009 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 2, 'stride': None, 'padding': 0, 'return_indices': False} 104 | 0.662046 | 0.572362 | 1.156691045 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 2, 'stride': None, 'padding': 1, 'return_indices': True} 105 | 0.552633 | 0.508671 | 1.086425214 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 2, 'stride': None, 'padding': 1, 'return_indices': False} 106 | 0.634108 | 0.574629 | 1.103508525 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 2, 'stride': None, 'padding': 2, 'return_indices': True} 107 | 0.534013 | 0.510996 | 1.045043405 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 2, 'stride': None, 'padding': 2, 'return_indices': False} 108 | 7.056642 | 7.066717 | 0.9985743026 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 2, 'stride': 1, 'padding': 0, 'return_indices': True} 109 | 4.144275 | 4.142658 | 1.000390329 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 2, 'stride': 1, 'padding': 0, 'return_indices': False} 110 | 7.172683 | 7.189867 | 0.9976099697 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 2, 'stride': 1, 'padding': 1, 'return_indices': True} 111 | 4.162538 | 4.158875 | 1.000880767 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 2, 'stride': 1, 'padding': 1, 'return_indices': False} 112 | 7.194233 | 7.181837 | 1.001726021 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 2, 'stride': 1, 'padding': 2, 'return_indices': True} 113 | 4.294083 | 4.196062 | 1.023360236 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 2, 'stride': 1, 'padding': 2, 'return_indices': False} 114 | 1.875692 | 0.891071 | 2.104986022 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 2, 'stride': 2, 'padding': 0, 'return_indices': True} 115 | 1.097479 | 0.781175 | 1.404907991 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 2, 'stride': 2, 'padding': 0, 'return_indices': False} 116 | 1.8883 | 0.89015 | 2.121327866 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 2, 'stride': 2, 'padding': 1, 'return_indices': True} 117 | 1.101329 | 0.778542 | 1.414604479 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 2, 'stride': 2, 'padding': 1, 'return_indices': False} 118 | 1.872833 | 0.893654 | 2.095702587 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 2, 'stride': 2, 'padding': 2, 'return_indices': True} 119 | 1.096712 | 0.784579 | 1.397835017 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 2, 'stride': 2, 'padding': 2, 'return_indices': False} 120 | 0.513029 | 0.374417 | 1.370207549 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 2, 'stride': 4, 'padding': 0, 'return_indices': True} 121 | 0.349546 | 0.305763 | 1.143192603 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 2, 'stride': 4, 'padding': 0, 'return_indices': False} 122 | 0.518929 | 0.377487 | 1.374693698 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 2, 'stride': 4, 'padding': 1, 'return_indices': True} 123 | 0.364662 | 0.3145 | 1.159497615 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 2, 'stride': 4, 'padding': 1, 'return_indices': False} 124 | 0.521275 | 0.375242 | 1.389170189 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 2, 'stride': 4, 'padding': 2, 'return_indices': True} 125 | 0.367488 | 0.308354 | 1.191773092 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 2, 'stride': 4, 'padding': 2, 'return_indices': False} 126 | 0.652342 | 0.569308 | 1.145850752 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 4, 'stride': None, 'padding': 0, 'return_indices': True} 127 | 0.555696 | 0.506892 | 1.096280865 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 4, 'stride': None, 'padding': 0, 'return_indices': False} 128 | 0.654333 | 0.570367 | 1.147213987 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 4, 'stride': None, 'padding': 1, 'return_indices': True} 129 | 0.548925 | 0.505825 | 1.085207335 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 4, 'stride': None, 'padding': 1, 'return_indices': False} 130 | 0.655908 | 0.571904 | 1.146884792 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 4, 'stride': None, 'padding': 2, 'return_indices': True} 131 | 0.560808 | 0.508238 | 1.103435792 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 4, 'stride': None, 'padding': 2, 'return_indices': False} 132 | 6.949462 | 6.949112 | 1.000050366 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 4, 'stride': 1, 'padding': 0, 'return_indices': True} 133 | 4.072913 | 4.065013 | 1.001943413 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 4, 'stride': 1, 'padding': 0, 'return_indices': False} 134 | 7.200896 | 7.197792 | 1.000431243 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 4, 'stride': 1, 'padding': 1, 'return_indices': True} 135 | 4.291367 | 4.218538 | 1.017264038 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 4, 'stride': 1, 'padding': 1, 'return_indices': False} 136 | 7.1823 | 7.306933 | 0.9829431856 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 4, 'stride': 1, 'padding': 2, 'return_indices': True} 137 | 4.151175 | 4.149592 | 1.000381483 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 4, 'stride': 1, 'padding': 2, 'return_indices': False} 138 | 1.781279 | 0.884288 | 2.014365229 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 4, 'stride': 2, 'padding': 0, 'return_indices': True} 139 | 1.050804 | 0.774362 | 1.356993241 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 4, 'stride': 2, 'padding': 0, 'return_indices': False} 140 | 1.860758 | 0.884637 | 2.103414169 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 4, 'stride': 2, 'padding': 1, 'return_indices': True} 141 | 1.099908 | 0.775887 | 1.417613647 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 4, 'stride': 2, 'padding': 1, 'return_indices': False} 142 | 1.857387 | 0.885738 | 2.096993693 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 4, 'stride': 2, 'padding': 2, 'return_indices': True} 143 | 1.105279 | 0.77365 | 1.428655077 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 4, 'stride': 2, 'padding': 2, 'return_indices': False} 144 | 0.489408 | 0.269583 | 1.815426047 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 4, 'stride': 4, 'padding': 0, 'return_indices': True} 145 | 0.322525 | 0.236979 | 1.360985573 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 4, 'stride': 4, 'padding': 0, 'return_indices': False} 146 | 0.515475 | 0.265813 | 1.93923924 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 4, 'stride': 4, 'padding': 1, 'return_indices': True} 147 | 0.315525 | 0.228146 | 1.382995976 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 4, 'stride': 4, 'padding': 1, 'return_indices': False} 148 | 0.503438 | 0.277204 | 1.816128194 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 4, 'stride': 4, 'padding': 2, 'return_indices': True} 149 | 0.335421 | 0.228275 | 1.469372467 | | (3, 2000, 2000), {'kernel_size': 5, 'dilation': 4, 'stride': 4, 'padding': 2, 'return_indices': False} 150 | 5.72495 | 4.909554 | 1.166083518 | | (10, 10, 1000, 1000), {'kernel_size': 4, 'padding': 1, 'stride': None, 'return_indices': True} 151 | 4.45215 | 4.251333 | 1.047236243 | | (10, 10, 1000, 1000), {'kernel_size': 4, 'padding': 1, 'stride': None, 'return_indices': False} 152 | 29.953021 | 29.879879 | 1.002447868 | | (10, 10, 1000, 1000), {'kernel_size': 4, 'padding': 1, 'stride': 1, 'return_indices': True} 153 | 9.854683 | 9.839517 | 1.001541336 | | (10, 10, 1000, 1000), {'kernel_size': 4, 'padding': 1, 'stride': 1, 'return_indices': False} 154 | 6.178033 | 5.697375 | 1.084364817 | | (10, 10, 1000, 1000), {'kernel_size': 100, 'padding': 50, 'return_indices': True} 155 | 6.280317 | 5.712525 | 1.099394226 | | (10, 10, 1000, 1000), {'kernel_size': 100, 'padding': 50, 'return_indices': False} 156 | 10.256062 | 11.336527 | 0.9046917103 | | (10, 10, 1000, 1000), {'kernel_size': 250, 'padding': 50, 'return_indices': True} 157 | 9.469546 | 11.33705 | 0.8352742556 | | (10, 10, 1000, 1000), {'kernel_size': 250, 'padding': 50, 'return_indices': False} 158 | 0.119087 | 0.0797 | 1.494190715 | | (10, 10, 100, 100), {'kernel_size': 2, 'return_indices': True} 159 | 0.098713 | 0.047173 | 2.092574142 | | (10, 10, 100, 100), {'kernel_size': 2, 'return_indices': False} 160 | 0.960812 | 0.675762 | 1.421820108 | | (10, 10, 300, 300), {'kernel_size': 2, 'return_indices': True} 161 | 0.536546 | 0.485958 | 1.104099531 | | (10, 10, 300, 300), {'kernel_size': 2, 'return_indices': False} 162 | 2.555225 | 1.791567 | 1.426251432 | | (10, 10, 500, 500), {'kernel_size': 2, 'return_indices': True} 163 | 1.419087 | 1.305137 | 1.087308842 | | (10, 10, 500, 500), {'kernel_size': 2, 'return_indices': False} 164 | 5.182008 | 3.48085 | 1.488719135 | | (10, 10, 700, 700), {'kernel_size': 2, 'return_indices': True} 165 | 2.831779 | 2.498537 | 1.133374851 | | (10, 10, 700, 700), {'kernel_size': 2, 'return_indices': False} 166 | 8.546038 | 5.7783 | 1.478988284 | | (10, 10, 900, 900), {'kernel_size': 2, 'return_indices': True} 167 | 4.731004 | 4.161975 | 1.136720908 | | (10, 10, 900, 900), {'kernel_size': 2, 'return_indices': False} 168 | 0.084754 | 0.07435 | 1.139932751 | | (10, 10, 100, 100), {'kernel_size': 2, 'return_indices': True} 169 | 0.057933 | 0.043096 | 1.344277891 | | (10, 10, 100, 100), {'kernel_size': 2, 'return_indices': False} 170 | 2.568592 | 1.802117 | 1.425319222 | | (10, 10, 500, 500), {'kernel_size': 2, 'return_indices': True} 171 | 1.433054 | 1.307342 | 1.096158465 | | (10, 10, 500, 500), {'kernel_size': 2, 'return_indices': False} 172 | 10.3213 | 7.111604 | 1.451332217 | | (10, 10, 1000, 1000), {'kernel_size': 2, 'return_indices': True} 173 | 5.680525 | 5.168129 | 1.099145358 | | (10, 10, 1000, 1000), {'kernel_size': 2, 'return_indices': False} 174 | 1.02255 | 1.01375 | 1.008680641 | | (10, 1000, 1000), {'kernel_size': 2, 'padding': 1, 'stride': 1, 'return_indices': False} 175 | 3.074233 | 3.094383 | 0.993488201 | | (10, 1000, 1000), {'kernel_size': 2, 'padding': 1, 'stride': 1, 'return_indices': True} 176 | 1.016812 | 1.030575 | 0.9866453194 | | (10, 1000, 1000), {'kernel_size': 4, 'padding': 1, 'stride': 1, 'return_indices': False} 177 | 3.053658 | 3.089504 | 0.9883974903 | | (10, 1000, 1000), {'kernel_size': 4, 'padding': 1, 'stride': 1, 'return_indices': True} 178 | 1.025863 | 1.032088 | 0.9939685376 | | (10, 1000, 1000), {'kernel_size': 8, 'padding': 1, 'stride': 1, 'return_indices': False} 179 | 3.798942 | 3.799213 | 0.9999286694 | | (10, 1000, 1000), {'kernel_size': 8, 'padding': 1, 'stride': 1, 'return_indices': True} 180 | 4.492979 | 4.493421 | 0.999901634 | | (10, 1000, 1000), {'kernel_size': 16, 'padding': 1, 'stride': 1, 'return_indices': False} 181 | 51.543363 | 51.266204 | 1.005406271 | | (10, 1000, 1000), {'kernel_size': 16, 'padding': 1, 'stride': 1, 'return_indices': True} 182 | 1.018008 | 1.001587 | 1.016394981 | | (10, 1000, 1000), {'kernel_size': 4, 'padding': 0, 'stride': (1, 1), 'return_indices': False} 183 | 3.035404 | 3.003113 | 1.010752509 | | (10, 1000, 1000), {'kernel_size': 4, 'padding': 0, 'stride': (1, 1), 'return_indices': True} 184 | 0.610421 | 0.56 | 1.0900375 | | (10, 1000, 1000), {'kernel_size': 4, 'padding': 0, 'stride': (1, 4), 'return_indices': False} 185 | 1.138983 | 0.757296 | 1.504012962 | | (10, 1000, 1000), {'kernel_size': 4, 'padding': 0, 'stride': (1, 4), 'return_indices': True} 186 | 0.641558 | 0.557808 | 1.150141267 | | (10, 1000, 1000), {'kernel_size': 4, 'padding': 0, 'stride': (4, 1), 'return_indices': False} 187 | 1.181475 | 0.754725 | 1.565437742 | | (10, 1000, 1000), {'kernel_size': 4, 'padding': 0, 'stride': (4, 1), 'return_indices': True} 188 | 1.03045 | 1.026904 | 1.003453098 | | (10, 1000, 1000), {'kernel_size': 4, 'padding': 1, 'stride': (1, 1), 'return_indices': False} 189 | 3.041421 | 3.0263 | 1.00499653 | | (10, 1000, 1000), {'kernel_size': 4, 'padding': 1, 'stride': (1, 1), 'return_indices': True} 190 | 0.609929 | 0.572304 | 1.065743032 | | (10, 1000, 1000), {'kernel_size': 4, 'padding': 1, 'stride': (1, 4), 'return_indices': False} 191 | 1.146875 | 0.756446 | 1.516135983 | | (10, 1000, 1000), {'kernel_size': 4, 'padding': 1, 'stride': (1, 4), 'return_indices': True} 192 | 0.645187 | 0.561708 | 1.148616363 | | (10, 1000, 1000), {'kernel_size': 4, 'padding': 1, 'stride': (4, 1), 'return_indices': False} 193 | 1.181721 | 0.758054 | 1.558887625 | | (10, 1000, 1000), {'kernel_size': 4, 'padding': 1, 'stride': (4, 1), 'return_indices': True} 194 | 0.927654 | 0.925946 | 1.0018446 | | (10, 1000, 1000), {'kernel_size': 1, 'return_indices': False} 195 | 2.749983 | 2.740354 | 1.00351378 | | (10, 1000, 1000), {'kernel_size': 1, 'return_indices': True} </details> Pull Request resolved: pytorch#157876 Approved by: https://github.com/malfet
Stack from ghstack (oldest at bottom):
stride != 1#157876This PR updates
max_pool2dto use a Metal kernel instead of the old MPS graph impl. However, when thestrideargument is 1 in all dimensions, the old implementation gives significantly better performance, so we fall back to it in that case. Below is a performance comparison ofmax_pool2dbefore and after this PR, obtained from this script: https://github.com/kurtamohler/pytorch-perf-test-scripts/blob/2f02f2bf7ad8e1b80d8eb728612b179d48fe92d7/max_pool_mps/perf.pyClick to expand