<regex>: Correct character translation in icase and collate mode
#5553
…ed case for non-collating ranges
I realized later that the tests didn't cover two cases related to character ranges in …
For the non-idempotent translation functions used in the test, this means we need code points with values >= 0x200 for the lower-case variants, so I had to use wide strings for the two additional tests.
@muellerj2
That's relatively easy to say: most of the fixed bugs are or were specific to some subclasses of regexes (rarely used syntax options, specific escapes, …). The bug fixed by this PR, for example, would only be observed if the traits class provided (a) a non-trivial (or, worse, a non-idempotent) …
Thanks! 😻 I pushed cosmetic changes.
I'm mirroring this to the MSVC-internal repo; please notify me if any further changes are pushed.
Thanks for fixing this bug, obscure though it may be! 🐞 🕵️ 😻
In `icase` and `collate` mode, characters are supposed to be passed through `translate_nocase()` and `translate()` of the traits class, respectively, before they are compared. This PR makes sure we always apply these translations exactly once before any character or string comparison.

The test deliberately defines non-idempotent translation functions. They are very weird, but they make it easy to catch any repeated or unbalanced applications of these functions before character or string comparisons.
This PR also replaces `_Cmp_cs` with `equal_to`. While this means for now that a static call is replaced by a non-static one, there is already machinery in place to recognize when `equal_to` can be vectorized, and vectorization will help us significantly improve performance in one of the following PRs.