<regex>: regex_traits::transform_primary should yield primary sort keys appropriate for the imbued locale
#5444
Thanks! 😻 I pushed some fixes, the most significant being
I'm mirroring this to the MSVC-internal repo - please notify me if any further changes are pushed.
As foretold by the ancient prophecy, I had to push a commit to fix
I resolved a trivial adjacent-add conflict with #5437 in
Thanks for implementing this new LWG issue and fixing this ancient bug! 🐈⬛ 💚 🎉
Fixes #5435. Fixes #5291.
The actual work is done in two new functions `__std_regex_transform_primary_char/wchar_t`, which are basically 1:1 copies of `_Strxfrm()` and `_Wcsxfrm()` but pass different flags to `__crtLCMapStringA/W`. I also took the liberty of correcting the SAL annotations.

`__crtLCMapStringA/W` are declared in `awint.hpp`, which includes `yvals.h`. I'm uncertain if this is the best approach, but I undefined `_ENFORCE_ONLY_CORE_HEADERS` so that `awint.hpp` can be included.

`transform_primary` has to check the types of the collate facets using RTTI, so I made the function always return an empty string when dynamic RTTI is disabled/`_CPPRTTI` is undefined. The implementation itself is heavily based on `collate::do_transform` (including the change in #5431). It also needs access to the internals of `collate`, so I made `_Regex_traits` a friend of it.

There is a behavior change for the C locale: As I explained in more detail in #5435, the traits requirement in [re.req]/20 is actually misleading, since it is wrong for precisely one locale: the C locale (or the POSIX locale, see the collation order definition here: https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/V1_chap07.html#tag_07_03_02_06). Since the equivalence classes are derived from POSIX and the definition of `regex_traits::transform_primary` also alludes to "primary sort keys", which indirectly reference terminology from the POSIX standard (https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/V1_chap07.html#tag_07_03_02), I think we should do as POSIX says: "A" should not match `[[=a=]]`.

This has consequences:

- In #5392 (`<regex>`: Properly parse and match collating symbols and equivalences), I assumed [re.req]/20, so I didn't add any character translation using `translate` and `translate_nocase` when parsing equivalences. Now we have to add such logic in `_Parser::_Do_ex_class2` to handle potentially case-sensitive sort keys when case-insensitive regexes are used (else "A" would even fail to match `[[=A=]]`).
- Since matching and parsing of equivalences no longer go through `collate::transform`, related tests no longer have to be skipped under IDL mismatch.
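
To make the equivalence-class behavior discussed above easier to probe, here is a minimal sketch using only the public `<regex>` API. The helper names `matches_equivalence` and `same_primary_key` are hypothetical and not part of this PR. Whether "A" matches `[[=a=]]` in the C locale differs between standard library implementations, so the sketch deliberately leaves that result for the caller to observe.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper (not from the PR): does `text` match the
// single-character equivalence class [[=c=]] under the given flags?
bool matches_equivalence(const std::string& text, char c,
                         std::regex::flag_type flags = std::regex::ECMAScript) {
    const std::string pattern = std::string("[[=") + c + "=]]";
    const std::regex re(pattern, flags);
    return std::regex_match(text, re);
}

// Hypothetical helper (not from the PR): compare the primary sort keys of
// two characters as reported by regex_traits for the current global locale.
bool same_primary_key(char lhs, char rhs) {
    std::regex_traits<char> traits; // default-constructed: uses the global locale
    return traits.transform_primary(&lhs, &lhs + 1)
        == traits.transform_primary(&rhs, &rhs + 1);
}
```

Calling `matches_equivalence("A", 'a')` and `same_primary_key('A', 'a')` before and after a change like this one shows whether the implementation treats case as a primary difference in the imbued locale. Passing `std::regex::icase` exercises the case-insensitive path mentioned in the first consequence.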
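
The RTTI-based facet check described above can be illustrated with a rough sketch in portable C++. This is not the code from the PR: `uses_known_collate`, `key_or_empty`, and `custom_collate` are hypothetical names, and `collate::transform` is used only as a stand-in for the real primary sort key computation. It assumes the idea is to return an empty string whenever the facet's dynamic type is not one the library recognizes, or whenever RTTI is unavailable.

```cpp
#include <cassert>
#include <locale>
#include <string>
#include <typeinfo>

// Hypothetical user-defined facet whose sort key format is unknown to the library.
struct custom_collate : std::collate<char> {};

// Hypothetical check: is the locale's collate facet exactly one of the
// standard facet types? Without RTTI the answer is conservatively "no".
bool uses_known_collate(const std::locale& loc) {
#if defined(__cpp_rtti) || defined(_CPPRTTI) || defined(__GXX_RTTI)
    const std::collate<char>& facet = std::use_facet<std::collate<char>>(loc);
    const std::type_info& dynamic_type = typeid(facet);
    return dynamic_type == typeid(std::collate<char>)
        || dynamic_type == typeid(std::collate_byname<char>);
#else
    (void) loc;
    return false;
#endif
}

// Hypothetical wrapper: return a sort key only for recognized facets.
// collate::transform is a stand-in here, NOT a true primary sort key.
std::string key_or_empty(const std::locale& loc, const std::string& s) {
    if (!uses_known_collate(loc)) {
        return {}; // unknown facet type or RTTI disabled
    }
    const std::collate<char>& facet = std::use_facet<std::collate<char>>(loc);
    return facet.transform(s.data(), s.data() + s.size());
}
```

A locale built as `std::locale(std::locale::classic(), new custom_collate)` is rejected by this check and yields an empty key, while one carrying a plain `std::collate<char>` is accepted.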