Vectorize find-like algorithms for Clang for more types #5767
Co-authored-by: S. B. Tam <cpplearner@outlook.com>
Thanks! 😻 I really like the removal of all of the 64-bit/32-bit special cases for pointers, and the I had to think for a moment why
Ah, I initially added the template parameter to pass obviously cromulent values, but in the benchmark results it looked long and ugly and made table cells span multiple lines. So I abandoned that, preferring neat benchmark results over clarity, but I didn't undo the template parameter change.

I'm mirroring this to the MSVC-internal repo. Please notify me if any further changes are pushed.

🎉 😻 🐈
Resolves #5479.
⚙️ Optimization
- `_Is_same_and_builtin_trivially_equality_comparable` in another place. Make that part common, as it repeats.
- `_Could_compare_equal_to_value_type` to return `true` for the same types. This also handles `std::byte` concisely.
- `_Find_arg_cast`, which only widens or truncates integers and uses `_Bit_cast` for the rest. This also makes pointers non-special.

That's it!
🏁 Benchmark
The benchmark only covers `find_and_count`, which ought to be enough: it is sufficient to show that the dispatch works.

To see that the other algorithms (`search_n`, `replace`, `remove`, `remove_copy`) also benefit from the change, use their existing benchmarks: first `set CXXFLAGS=/arch:AVX2 /D_USE_STD_VECTOR_ALGORITHMS=0` to imitate the unoptimized behavior for integer types, then `set CXXFLAGS=/arch:AVX2 /D_USE_STD_VECTOR_ALGORITHMS=1` to see the optimized results. The newly vectorized types should perform the same as the corresponding integer types.

⏱️ Benchmark results
It was run with `set CXXFLAGS=/arch:AVX2` so that Clang's auto-vectorization can also use AVX2, which keeps the comparison fair: the vectorized algorithms detect AVX2 at runtime.

| Benchmark |
|---|
| `bm<point, not_highly_aligned_allocator, Op::FindSized>/8021/3056` |
| `bm<point, not_highly_aligned_allocator, Op::FindSized>/63/62` |
| `bm<point, not_highly_aligned_allocator, Op::FindSized>/31/30` |
| `bm<point, not_highly_aligned_allocator, Op::FindSized>/15/14` |
| `bm<point, not_highly_aligned_allocator, Op::FindSized>/7/6` |
| `bm<point, not_highly_aligned_allocator, Op::Count>/8021/3056` |
| `bm<point, not_highly_aligned_allocator, Op::Count>/63/62` |
| `bm<point, not_highly_aligned_allocator, Op::Count>/31/30` |
| `bm<point, not_highly_aligned_allocator, Op::Count>/15/14` |
| `bm<point, not_highly_aligned_allocator, Op::Count>/7/6` |