KEMBAR78
Optimize core loop of `_Count_vbool` by localspook · Pull Request #5640 · microsoft/STL · GitHub
Skip to content

Conversation

@localspook
Copy link
Contributor

To handle the difference between counting ones and counting zeros, _Count_vbool's core loop conditionally complements every block it processes: that's O(n) work. We can do better: we can unconditionally count ones, then, after the loop, if we actually wanted zeros, subtract the number of ones from the number of bits we processed: O(1) work.

@localspook localspook requested a review from a team as a code owner July 9, 2025 17:18
@github-project-automation github-project-automation bot moved this to Initial Review in STL Code Reviews Jul 9, 2025
@localspook
Copy link
Contributor Author

All CI jobs are failing with "The agent did not connect within the alloted time of 45 minute(s)." 🤔

@StephanTLavavej
Copy link
Member

If that happens, ping me on #stalled-checks on the STL Discord and I can rerun the checks. Unfortunately, while our CI is highly reliable when it runs, we're at the mercy of Azure deciding to give us any VMs, and it's been rough lately.

@StephanTLavavej StephanTLavavej self-assigned this Jul 9, 2025
@StephanTLavavej StephanTLavavej added the performance Must go faster label Jul 9, 2025
@AlexGuteniev
Copy link
Contributor

Can you post benchmark results before / after?
(I guess there is such benchmark already)

@AlexGuteniev
Copy link
Contributor

I mean intuitively it looks like an improvement, but without benchmarking I wouldn't be sure that the compiler doesn't do something clever on its own.

This PR optimization is not what compiler is likely to do, but it might be able to achieve the same by duplication the loop for *_VbFirst and ~*_VbFirst and making the condition out of loop.

I don't think it is very likely to happen, and that's why we implemented the _Select_popcount_impl thing, but would be good to prove with measurement.

@StephanTLavavej
Copy link
Member

I took a look at the code and it looked good to me, but I would also like to see benchmark results before slightly complicating the code in this way.

@StephanTLavavej StephanTLavavej removed their assignment Aug 13, 2025
@StephanTLavavej StephanTLavavej moved this from Initial Review to Work In Progress in STL Code Reviews Aug 13, 2025
@StephanTLavavej
Copy link
Member

@localspook Do you have time to add a benchmark here?

Alternatively I would accept an independent PR from another contributor adding a benchmark.

Apologies for how long this has taken, I have been severely overloaded, otherwise I would have added a benchmark myself.

@StephanTLavavej StephanTLavavej moved this from Work In Progress to Ready To Merge in STL Code Reviews Aug 22, 2025
@StephanTLavavej StephanTLavavej moved this from Ready To Merge to Merging in STL Code Reviews Aug 25, 2025
@StephanTLavavej
Copy link
Member

I'm mirroring this to the MSVC-internal repo - please notify me if any further changes are pushed.

@StephanTLavavej StephanTLavavej merged commit abcac64 into microsoft:main Aug 25, 2025
39 checks passed
@github-project-automation github-project-automation bot moved this from Merging to Done in STL Code Reviews Aug 25, 2025
@StephanTLavavej
Copy link
Member

Thanks for noticing and improving this performance! 🚀 😸 🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance Must go faster

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

3 participants