KEMBAR78
Optimize `std::transform` for `vector<bool>` by AlexGuteniev · Pull Request #5769 · microsoft/STL · GitHub
Skip to content

Conversation

@AlexGuteniev
Copy link
Contributor

@AlexGuteniev AlexGuteniev commented Oct 7, 2025

Towards #625, specifically #625 (comment) items 1 and 2.

🦖 Optimization

When a standard functor, either transparent or integer-specialized, is passed to transform, along with all vector<bool> iterators, map that functor to a bitwise one to operate on the underlying type.

The mapping is done via template specialization, and not via if constexpr to make the dispatch working fine without <functional> included and functors defined.

Only do this for zero offset. Supporting all possible offset combination is much complexity for a little gain. Remember copy.

Extract pointers from iterators to help the compiler auto-vectorize. Yes, it does not auto-vectorize when using the whole iterators. Auto-vecotrization needs simplest ways of implementing loops.

Don't call transform again, to avoid unnecessary recursion, the operation is simple.

Don't process tails explicitly, yield to the existing loop for now.
Actually lets go for it, it is not that hard. Process tails with applying bit mask.

Don't do ranges yet. Other vector<bool> optimizations don't do them either. It is getting complicated, so instead of doing ranges separately, need to look into #1754 at last.

🏁 Benchmark

Feed the randomizer with some seed to make the inputs different 🐦

Since (auto-)vectorization is (expected to be) engaged, use alignment controlling allocator.

⏱️ Benchmark results

Benchmark Before After Speedup
transform_two_inputs_aligned<logical_and<>>/64 108 ns 2.55 ns 42.4
transform_two_inputs_aligned<logical_and<>>/4096 13869 ns 9.44 ns 1470
transform_two_inputs_aligned<logical_and<>>/65536 416424 ns 115 ns 3620
transform_two_inputs_aligned<logical_or<>>/64 123 ns 2.59 ns 47.40
transform_two_inputs_aligned<logical_or<>>/4096 14377 ns 9.07 ns 1590
transform_two_inputs_aligned<logical_or<>>/65536 409012 ns 112 ns 3650
transform_one_input_aligned<logical_not<>>/64 83.7 ns 2.14 ns 39.10
transform_one_input_aligned<logical_not<>>/4096 6891 ns 7.28 ns 947
transform_one_input_aligned<logical_not<>>/65536 264957 ns 82.7 ns 3200

@StephanTLavavej StephanTLavavej added the performance Must go faster label Oct 8, 2025
@StephanTLavavej StephanTLavavej self-assigned this Oct 8, 2025
}

template <class _VbIt, class _OutIt, class _Mapped_fn>
_CONSTEXPR20 _OutIt _Transform_vbool_aligned(
Copy link
Contributor Author

@AlexGuteniev AlexGuteniev Oct 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved this out to <vector> from <algorithms> because other algorithms are moved out.
However, I'm not sure if it is useful.

For accessing vector<bool> representation it is not strictly necessary. Most of things are template-dependent member functions and datas. The only exception is _Vbase, which can be still deduced from iterators.

For throughput it does not look useful either. <vector> is more frequent than <algorithm> so it appears more useful to off-load <vector> instead.

For reference, 0f24d45 is the commit where this movement was made.

@StephanTLavavej
Copy link
Member

Thanks! I pushed a conflict-free merge with main and fixes for two regressions, plus test coverage to catch them.

@StephanTLavavej StephanTLavavej removed their assignment Oct 21, 2025
@StephanTLavavej StephanTLavavej moved this from Initial Review to Ready To Merge in STL Code Reviews Oct 21, 2025
@StephanTLavavej StephanTLavavej moved this from Ready To Merge to Merging in STL Code Reviews Oct 22, 2025
@StephanTLavavej
Copy link
Member

I'm mirroring this to the MSVC-internal repo - please notify me if any further changes are pushed.

@StephanTLavavej StephanTLavavej merged commit 249cb01 into microsoft:main Oct 22, 2025
39 checks passed
@github-project-automation github-project-automation bot moved this from Merging to Done in STL Code Reviews Oct 22, 2025
@StephanTLavavej
Copy link
Member

🦖 🦕 🐔

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance Must go faster

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

2 participants