Unicode 16 by JMazurkiewicz · Pull Request #5571 · microsoft/STL · GitHub

@JMazurkiewicz
Contributor

  • Implement generation of the Indic_Conjunct_Break table in the `unicode_properties_data_gen.py` script. We need this table to implement the new GB9c segmentation rule.
  • Update `__msvc_format_ucd_tables.hpp`.
  • Implement the GB9c segmentation rule (added in Unicode 15.1) in `_Grapheme_break_property_iterator`.
  • Update tests.
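For readers unfamiliar with GB9c: in UAX #29 (Unicode 15.1 and later) it forbids a grapheme cluster break in sequences of the form `Consonant [Extend Linker]* Linker [Extend Linker]* × Consonant`, where each term is an Indic_Conjunct_Break (InCB) value. The following is a minimal sketch of that one rule as a forward-scanning state machine. `InCB` and `gb9c_no_break` are hypothetical names, and the sketch checks only GB9c (not GB9, GB11, etc.), so it is an illustration rather than the STL's `_Grapheme_break_property_iterator` logic.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical enumeration of Indic_Conjunct_Break property values.
enum class InCB { None, Consonant, Extend, Linker };

// For each index i, returns true if GB9c forbids a break before props[i].
// Index 0 is always false because nothing precedes it.
std::vector<bool> gb9c_no_break(const std::vector<InCB>& props) {
    // Start:        no InCB=Consonant is currently "open".
    // SawConsonant: Consonant [Extend]* seen, but no Linker yet.
    // SawLinker:    Consonant [Extend Linker]* with at least one Linker.
    enum class State { Start, SawConsonant, SawLinker };
    State state = State::Start;
    std::vector<bool> result(props.size(), false);

    for (std::size_t i = 0; i < props.size(); ++i) {
        switch (props[i]) {
        case InCB::Consonant:
            // Consonant [Extend Linker]* Linker [Extend Linker]* x Consonant
            result[i] = (state == State::SawLinker);
            state     = State::SawConsonant; // this consonant may start a new chain
            break;
        case InCB::Linker:
            if (state == State::SawConsonant) {
                state = State::SawLinker;
            }
            break;
        case InCB::Extend:
            break; // Extend keeps the current state
        case InCB::None:
            state = State::Start; // anything else resets the match
            break;
        }
    }
    return result;
}
```

For example, `Consonant Linker Consonant` yields `{false, false, true}`: the second consonant joins the cluster, whereas `Consonant Extend Consonant` (no Linker) does not trigger GB9c at all.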

Previous PR: #3556

@JMazurkiewicz JMazurkiewicz requested a review from a team as a code owner June 5, 2025 21:43
@github-project-automation github-project-automation bot moved this to Initial Review in STL Code Reviews Jun 5, 2025
@StephanTLavavej StephanTLavavej added enhancement Something can be improved format C++20/23 format labels Jun 5, 2025
@StephanTLavavej StephanTLavavej self-assigned this Jun 5, 2025
@frederick-vs-ja
Contributor

frederick-vs-ja commented Jun 21, 2025

FWIW, this is also related to CWG-2843, which was recently accepted. No change requested: Unicode 16 is definitely OK, as the resolution requires Unicode 15.1 as the minimum version.

…eak_property_iterator2`

This is gaining a new data member `_GB9c_regex _GB9c_rx;`.
…tring_prefix_iterator_utf2`

This contains `_Grapheme_break_property_iterator2<_CharT> _WrappedIter;`.

This is the final part. The alias `_Measure_string_prefix_iterator` can expand to this type,
but aliases don't need to be renamed. The only use of that alias is within the function
`_Measure_string_prefix()`, so renaming these two classes is sufficient to preserve ABI.
@StephanTLavavej
Member

Thanks! 😻 And apologies for taking over 4 months to review this.

I pushed a conflict-free (albeit large) merge with main, a couple of nitpick commits, and a couple of class renames to preserve ABI.

I verified that the product and test code are exactly generated by the scripts (after clang-formatting).

@StephanTLavavej StephanTLavavej removed their assignment Oct 16, 2025
@StephanTLavavej

This comment was marked as resolved.

@github-project-automation github-project-automation bot moved this from Initial Review to Done in STL Code Reviews Oct 16, 2025
@github-project-automation github-project-automation bot moved this from Done to Initial Review in STL Code Reviews Oct 16, 2025
@StephanTLavavej StephanTLavavej moved this from Initial Review to Ready To Merge in STL Code Reviews Oct 16, 2025
@StephanTLavavej
Member

I'm mirroring this to the MSVC-internal repo - please notify me if any further changes are pushed.

@StephanTLavavej StephanTLavavej moved this from Ready To Merge to Merging in STL Code Reviews Oct 16, 2025
@AraHaan

AraHaan commented Oct 17, 2025

Can we get a PR for Unicode 17 soon as well please?

@cpplearner
Contributor

cpplearner commented Oct 17, 2025

> Can we get a PR for Unicode 17 soon as well please?

You can do it yourself. Just run the scripts under `tools/unicode_properties_parse` (the download script first, then the other two), and replace the affected contents (all of `stl/inc/__msvc_format_ucd_tables.hpp` and part of `tests/std/tests/P0645R10_text_formatting_grapheme_clusterization/test.cpp`) with the corresponding tool output.

@StephanTLavavej StephanTLavavej merged commit 7f05724 into microsoft:main Oct 17, 2025
39 checks passed
@github-project-automation github-project-automation bot moved this from Merging to Done in STL Code Reviews Oct 17, 2025
@StephanTLavavej
Member

Thanks for this highly nontrivial PR! 🐱 🐈 🐈‍⬛

@JMazurkiewicz JMazurkiewicz deleted the unicode16 branch October 17, 2025 22:07
Projects

Archived in project

5 participants