-
Notifications
You must be signed in to change notification settings - Fork 1.6k
Optimize core loop of _Count_vbool
#5640
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize core loop of _Count_vbool
#5640
Conversation
|
All CI jobs are failing with "The agent did not connect within the alloted time of 45 minute(s)." 🤔 |
|
If that happens, ping me on |
|
Can you post benchmark results before / after? |
|
I mean intuitively it looks like an improvement, but without benchmarking I wouldn't be sure that the compiler doesn't do something clever on its own. This PR optimization is not what compiler is likely to do, but it might be able to achieve the same by duplication the loop for I don't think it is very likely to happen, and that's why we implemented the |
|
I took a look at the code and it looked good to me, but I would also like to see benchmark results before slightly complicating the code in this way. |
|
@localspook Do you have time to add a benchmark here? Alternatively I would accept an independent PR from another contributor adding a benchmark. Apologies for how long this has taken, I have been severely overloaded, otherwise I would have added a benchmark myself. |
|
I'm mirroring this to the MSVC-internal repo - please notify me if any further changes are pushed. |
|
Thanks for noticing and improving this performance! 🚀 😸 🎉 |
To handle the difference between counting ones and counting zeros,
_Count_vbool's core loop conditionally complements every block it processes: that's O(n) work. We can do better: we can unconditionally count ones, then, after the loop, if we actually wanted zeros, subtract the number of ones from the number of bits we processed: O(1) work.