Vectorize find-like algorithms for Clang for more types #5767

AlexGuteniev · 2025-10-05T19:28:44Z

Resolves #5479.

⚙️ Optimization

- Use _Is_same_and_builtin_trivially_equality_comparable in one more place, and factor that check out into a common part, since it was repeated.
- Adjust _Could_compare_equal_to_value_type to return true when the types are the same. This also handles std::byte concisely.
- Adjust casts in the vectorized dispatch to use _Find_arg_cast, which only widens or truncates integers and uses _Bit_cast for everything else. This also makes pointers non-special.

That's it!

🏁 Benchmark

The benchmark shows that:

- The changes indeed dispatch to the vectorized implementation, and
- Clang auto-vectorization either does not apply or is less efficient than manual vectorization.

Benchmarking only find_and_count ought to be enough: it is sufficient to show that the dispatch works.

To see that the other algorithms (search_n, replace, remove, remove_copy) also benefit from the change, run their existing benchmarks twice: first with CXXFLAGS=/arch:AVX2 /D_USE_STD_VECTOR_ALGORITHMS=0 to imitate the non-optimized path on integer types, then with CXXFLAGS=/arch:AVX2 /D_USE_STD_VECTOR_ALGORITHMS=1 to see the optimized results. The newly vectorized types should perform the same as the corresponding integer types.

⏱️ Benchmark results

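The benchmark was run with set CXXFLAGS=/arch:AVX2 so that Clang auto-vectorization competes fairly: the vectorized algorithms detect AVX2 at runtime, whereas auto-vectorized code can use AVX2 only when it is enabled at compile time.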

| Benchmark | Before | After | Speedup |
|---|---|---|---|
| `bm<point, not_highly_aligned_allocator, Op::FindSized>/8021/3056` | 749 ns | 157 ns | 4.77 |
| `bm<point, not_highly_aligned_allocator, Op::FindSized>/63/62` | 23.1 ns | 3.66 ns | 6.31 |
| `bm<point, not_highly_aligned_allocator, Op::FindSized>/31/30` | 7.58 ns | 2.24 ns | 3.38 |
| `bm<point, not_highly_aligned_allocator, Op::FindSized>/15/14` | 3.76 ns | 1.72 ns | 2.19 |
| `bm<point, not_highly_aligned_allocator, Op::FindSized>/7/6` | 1.81 ns | 1.76 ns | 1.03 |
| `bm<point, not_highly_aligned_allocator, Op::Count>/8021/3056` | 1174 ns | 326 ns | 3.60 |
| `bm<point, not_highly_aligned_allocator, Op::Count>/63/62` | 10.2 ns | 4.03 ns | 2.53 |
| `bm<point, not_highly_aligned_allocator, Op::Count>/31/30` | 5.58 ns | 3.13 ns | 1.78 |
| `bm<point, not_highly_aligned_allocator, Op::Count>/15/14` | 3.14 ns | 2.89 ns | 1.09 |
| `bm<point, not_highly_aligned_allocator, Op::Count>/7/6` | 2.20 ns | 2.89 ns | 0.76 |

StephanTLavavej · 2025-10-21T12:27:38Z

Thanks! 😻 I really like the removal of all of the 64-bit/32-bit special cases for pointers, and the is_same generalization that supersedes checking for byte. I pushed a conflict-free merge with main and trivial comment cleanups.

I had to think for a moment why T{'0'} and T{'1'} were cromulent for the aggregate, but decided that it didn't need code changes or DMIs.

AlexGuteniev · 2025-10-21T12:45:15Z

> I had to think for a moment why T{'0'} and T{'1'} were cromulent for the aggregate, but decided that it didn't need code changes or DMIs.

Ah, I initially added the template parameter to pass obviously-cromulent values, but that looked ugly and long in the benchmark results and made table cells multiline. So I abandoned that, preferring neat benchmark results over clarity, but didn't undo the template parameter change.

StephanTLavavej · 2025-10-22T08:41:35Z

I'm mirroring this to the MSVC-internal repo - please notify me if any further changes are pushed.

StephanTLavavej · 2025-10-22T14:31:56Z

🎉 😻 🐈

benchmark

39897eb

AlexGuteniev requested a review from a team as a code owner October 5, 2025 19:28

github-project-automation bot added this to STL Code Reviews Oct 5, 2025

github-project-automation bot moved this to Initial Review in STL Code Reviews Oct 5, 2025

AlexGuteniev force-pushed the trivial-find branch from 8ae3cc4 to 3e6f674 Compare October 5, 2025 19:51

optimization

5e6c38f

AlexGuteniev force-pushed the trivial-find branch from 3e6f674 to 5e6c38f Compare October 6, 2025 04:59

cpplearner reviewed Oct 6, 2025

stl/inc/xutility (outdated)

Add missing _STD

13a008f

Co-authored-by: S. B. Tam <cpplearner@outlook.com>

StephanTLavavej added the performance Must go faster label Oct 6, 2025

StephanTLavavej self-assigned this Oct 6, 2025

StephanTLavavej added 2 commits October 21, 2025 03:39

Merge branch 'main' into trivial-find

ada3da8

Comment cleanups.

b94949b

StephanTLavavej approved these changes Oct 21, 2025

StephanTLavavej removed their assignment Oct 21, 2025

StephanTLavavej moved this from Initial Review to Ready To Merge in STL Code Reviews Oct 21, 2025

StephanTLavavej moved this from Ready To Merge to Merging in STL Code Reviews Oct 22, 2025

StephanTLavavej merged commit ea857c7 into microsoft:main Oct 22, 2025
39 checks passed

github-project-automation bot moved this from Merging to Done in STL Code Reviews Oct 22, 2025

AlexGuteniev deleted the trivial-find branch October 22, 2025 14:41
