-
Notifications
You must be signed in to change notification settings - Fork 5.2k
Rework ProbabilisticMap character checks in SearchValues #101001
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rework ProbabilisticMap character checks in SearchValues #101001
Conversation
Tagging subscribers to this area: @dotnet/area-system-buffers |
Are all these well covered in the perf repo? |
/azp run runtime-libraries-coreclr outerloop |
Azure Pipelines successfully started running 1 pipeline(s). |
Yes, mainly by the Results
|
src/libraries/System.Private.CoreLib/src/System/SearchValues/ProbabilisticMapState.cs
Show resolved
Hide resolved
a496f8e
to
d9f3052
Compare
9a65ad9
to
f196644
Compare
Improvements on arm64: dotnet/perf-autofiling-issues#34817 |
* Rework ProbabilisticMap character checks in SearchValues * Reduce footprint of ProbMap SearchValues * Update misleading comment
Contributes to #100315 (comment)
The probabilistic map uses an
O(i * m)
fallback for-Except
methods (scan all the values for each character in the input).We use the same
O(m)
helper (scan the values) to confirm potential matches in the vectorized helper.This PR adds support for the probmap to use an
O(1)
contains check for-Except
methods and match confirmations.The check is effectively a perfect hash check
entries[value % entries.Length] == value
, but implemented using a variant ofFastMod
. These checks can be inlined into the vectorized methods while not consuming too much memory.We use this only when the probmap is computed as part of
SearchValues
as finding an optimal modulus is relatively expensive (seeProbabilisticMapState.FindModulus
- there are probably smarter ways to go about it). When the probabilistic map is created for single-useIndexOfAny
operations, we still use the sameO(m)
checks as before.This PR also replaces the
Latin1CharSearchValues
implementation withBitmapCharSearchValues
, which can use a bitmap of arbitrary size (not limited to[0, 255]
). We use this when we guess that it'll be faster than theProbabilisticMap
(e.g. there are a lot of values in the set, or the set is dense and the probmap isn't vectorized).I haven't changed the condition when we use
ProbabilisticWithAsciiCharSearchValues
as there are plausible cases where the ASCII fast path may still be useful even if the probabilistic path could be a bitmap instead.Improvements for early matches (cheaper confirmation step)
Improvements for IndexOfAnyExcept (O(1) instead of O(m) character checks)
Speedup factor is going to depend on how many values are in the set and how early in the
values
each input character matched with the previous implementation.With the new implementation, the throughput is less dependent on the haystack.
The bitmap vs probmap hash O(1) checks
This happens to bring all
SearchValues<char>
implementations to anO(i)
worst-case (fromO(i * m)
).