`<regex>`: Process positive lookahead assertions non-recursively #5714

muellerj2 · 2025-09-10T22:15:58Z

Towards #997 and #1528. While the PR title describes the main observable effect, the actual main change is the implementation of manual unwinding of stack frames in _Match_pat. But at least one recursive call had to be replaced by the manual stack management to validate the main change, and positive assertions turned out to be the easiest option.

Previously, _Match_pat mainly consisted of an NFA interpreter loop. After this PR, _Match_pat consists of two main loops: An NFA interpreter loop and a stack unwinding loop (joined together by a loop surrounding both). As before this PR, the interpreter loop in _Match_pat processes the nodes in the NFA. When this loop is done because some final node of the NFA has been reached or matching along a trajectory failed, a second loop is entered that manually unwinds the explicit state stack. If that second loop leads to some interpretable position in the NFA again (i.e., if _Nx becomes not null), the unwinding loop is exited and the interpreter loop is engaged again. If not and the stack has been fully unwound, _Match_pat is exited.
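To make the two-loop structure concrete, here is a minimal toy sketch. It is not the code in this PR: every name in it (`Node`, `Frame`, `UnwindOp`, `match_prefix`) is hypothetical, the toy supports only literal characters and positive lookahead, and it does no backtracking. It only illustrates how an interpreter loop and an unwinding loop can hand control back and forth inside one outer loop.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy model; none of these names are the STL's internal ones.
enum class NodeKind { Char, Assert, EndAssert, End };

struct Node {
    NodeKind kind;
    char ch = '\0';              // used by Char
    const Node* next = nullptr;  // default successor
    const Node* child = nullptr; // used by Assert: start of the lookahead body
};

enum class UnwindOp { AfterAssert }; // a single opcode is enough for this toy

struct Frame {
    UnwindOp op;      // stored when the frame is pushed
    const Node* node; // the Assert node that pushed the frame
    std::size_t pos;  // input position to restore after the lookahead
};

// Returns true if the pattern starting at `start` matches a prefix of `input`.
bool match_prefix(const Node* start, const std::string& input) {
    std::vector<Frame> stack;
    const Node* nx = start;
    std::size_t pos = 0;
    bool failed = false;

    for (;;) {
        // Interpreter loop: process nodes until a final node or a failure.
        while (nx != nullptr) {
            const Node* next = nx->next; // overridden where the successor differs
            switch (nx->kind) {
            case NodeKind::Char:
                if (pos < input.size() && input[pos] == nx->ch) {
                    ++pos;
                } else {
                    failed = true;
                    next = nullptr;
                }
                break;
            case NodeKind::Assert:
                stack.push_back({UnwindOp::AfterAssert, nx, pos});
                next = nx->child; // continue inside the assertion body
                break;
            case NodeKind::EndAssert:
            case NodeKind::End:
                next = nullptr; // final node: hand over to the unwinding loop
                break;
            }
            nx = next;
        }

        // Unwinding loop: pop frames until one yields a node to resume at.
        while (nx == nullptr && !stack.empty()) {
            const Frame frame = stack.back();
            stack.pop_back();
            switch (frame.op) {
            case UnwindOp::AfterAssert:
                if (!failed) {
                    pos = frame.pos;       // a lookahead consumes no input
                    nx = frame.node->next; // resume after the assertion
                }
                break;
            }
        }

        if (nx == nullptr) {
            return !failed; // stack fully unwound and nothing left to interpret
        }
    }
}

// Builds the toy equivalent of a(?=bc)b and matches it against `input`.
bool matches_a_lookahead_bc_then_b(const std::string& input) {
    const Node end{NodeKind::End};
    const Node b2{NodeKind::Char, 'b', &end};
    const Node body_end{NodeKind::EndAssert};
    const Node c{NodeKind::Char, 'c', &body_end};
    const Node b{NodeKind::Char, 'b', &c};
    const Node lookahead{NodeKind::Assert, '\0', &b2, &b};
    const Node a{NodeKind::Char, 'a', &lookahead};
    return match_prefix(&a, input);
}
```

In this toy, `matches_a_lookahead_bc_then_b("abc")` succeeds because the lookahead body matches `bc`, the unwinding loop restores the position after `a`, and the interpreter loop then matches the trailing `b`. With `"abd"` the lookahead body fails and the frame is popped without resuming.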

Because the NFA node following the node _Nx in the interpreter loop might have to be some node other than _Nx->_Next from now on, a new local variable _Next now stores the next node to process. Each node type's case in the switch assigns it the correct next node whenever that node is not _Nx->_Next. In this PR specifically, this is used to make static_cast<_Node_assert*>(_Nx)->_Child follow a node of type _N_assert in the interpreter loop. (Additionally, final nodes in the interpreter loop that do not result in failure assign nullptr to _Next rather than _Nx now.)

The stack unwinding uses its own set of operation codes, which are interpreted by the unwinding loop. The operation codes are stored in the stack frames at the time the new frame is pushed to the stack. (After this PR, there is only one code, but there will soon be more.)

Because the matcher is currently semi-recursive, the stack counts as unwound in _Match_pat if it has been unwound up to its size at the time the _Match_pat call started. Further unwinding will happen in a surrounding _Match_pat call. (We can simplify this when the matcher is finally fully non-recursive.)
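The "unwound up to its size at the time the call started" rule can be sketched with a toy in which recursive calls share one explicit stack. This is only an illustration under assumptions: `ToyMatcher`, `ToyFrame`, and `match` are hypothetical names, and the recursive call merely stands in for whatever constructs still recurse.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy; the names do not come from the STL sources.
struct ToyFrame {
    int owner_depth; // which call pushed this frame
};

class ToyMatcher {
public:
    // Number of frames each call unwound, in the order the calls finished.
    std::vector<std::size_t> unwound_per_call;

    void match(int depth) {
        const std::size_t base = frames_.size(); // stack size when this call started

        frames_.push_back(ToyFrame{depth});
        frames_.push_back(ToyFrame{depth});

        if (depth > 0) {
            match(depth - 1); // stands in for a construct that still recurses
        }

        // Unwind only the frames this call pushed. Older frames belong to a
        // surrounding call and are left for it to unwind.
        std::size_t unwound = 0;
        while (frames_.size() > base) {
            assert(frames_.back().owner_depth == depth);
            frames_.pop_back();
            ++unwound;
        }
        unwound_per_call.push_back(unwound);
    }

    std::size_t stack_size() const { return frames_.size(); }

private:
    std::vector<ToyFrame> frames_; // shared by all nested calls
};
```

Calling `match(2)` on a fresh `ToyMatcher` nests three calls. Each one pops exactly the two frames it pushed and stops at its own `base`, so the stack is empty only after the outermost call has finished.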

The stack frames on the heap are now represented by objects of type _Rx_state_frame_t. Currently, this is a very inefficient structure and it will become even worse in the next few PRs before it starts getting better.

For now, I opted to exactly preserve the situations in which a regex_error with error_stack or error_complexity gets thrown. But I moved the related code into dedicated member functions to avoid unnecessary code duplication. We can think about changing this after the matcher has been made fully non-recursive.
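As a rough illustration of factoring the limit checks into member functions, consider the sketch below. The class `ToyLimits`, its member names, and its limits are assumptions made up for this example. Only `std::regex_error`, `std::regex_constants::error_stack`, and `std::regex_constants::error_complexity` are real library names.

```cpp
#include <cassert>
#include <regex>

// Hypothetical helper class; the limits and names are illustrative only.
class ToyLimits {
public:
    ToyLimits(int max_depth, long max_steps)
        : max_depth_(max_depth), max_steps_(max_steps) {}

    // Called wherever a new level of nesting is entered.
    void increase_stack_usage() {
        if (++depth_ > max_depth_) {
            throw std::regex_error(std::regex_constants::error_stack);
        }
    }

    void decrease_stack_usage() { --depth_; }

    // Called wherever a unit of matching work is counted.
    void increase_complexity() {
        if (++steps_ > max_steps_) {
            throw std::regex_error(std::regex_constants::error_complexity);
        }
    }

private:
    int max_depth_;
    long max_steps_;
    int depth_ = 0;
    long steps_ = 0;
};

// Returns true if entering `levels` nested levels throws error_stack.
bool nesting_throws_error_stack(ToyLimits& limits, int levels) {
    try {
        for (int i = 0; i < levels; ++i) {
            limits.increase_stack_usage();
        }
    } catch (const std::regex_error& e) {
        return e.code() == std::regex_constants::error_stack;
    }
    return false;
}

// Returns true if counting `steps` steps throws error_complexity.
bool steps_throw_error_complexity(ToyLimits& limits, long steps) {
    try {
        for (long i = 0; i < steps; ++i) {
            limits.increase_complexity();
        }
    } catch (const std::regex_error& e) {
        return e.code() == std::regex_constants::error_complexity;
    }
    return false;
}
```

With the checks in one place, every call site that pushes a frame or processes a node can share the same throwing logic, which is the kind of deduplication described above.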

StephanTLavavej · 2025-09-17T21:08:57Z

Thanks as always for the detailed explanation! 😻

StephanTLavavej · 2025-09-19T18:43:11Z

I'm mirroring this to the MSVC-internal repo - please notify me if any further changes are pushed.

StephanTLavavej · 2025-09-22T17:21:06Z

Thanks for taking the first step towards this long-thought-impossible goal! 😻 🏃 🎉

muellerj2 requested a review from a team as a code owner September 10, 2025 22:15

github-project-automation bot moved this to Initial Review in STL Code Reviews Sep 10, 2025

github-project-automation bot added this to STL Code Reviews Sep 10, 2025

<regex>: Process positive lookahead assertions non-recursively

a066a6b

muellerj2 force-pushed the regex-process-positive-lookahead-assertions-nonrecursively branch from 524a683 to a066a6b Compare September 10, 2025 22:22

StephanTLavavej added the bug (Something isn't working) and regex (meow is a substring of homeowner) labels Sep 11, 2025

StephanTLavavej self-assigned this Sep 11, 2025

Merge branch 'main' into regex-process-positive-lookahead-assertions-…

f66f5bb

…nonrecursively

StephanTLavavej approved these changes Sep 17, 2025

StephanTLavavej removed their assignment Sep 17, 2025

StephanTLavavej moved this from Initial Review to Ready To Merge in STL Code Reviews Sep 17, 2025

StephanTLavavej moved this from Ready To Merge to Merging in STL Code Reviews Sep 19, 2025

StephanTLavavej merged commit 3d2b494 into microsoft:main Sep 22, 2025
39 checks passed

github-project-automation bot moved this from Merging to Done in STL Code Reviews Sep 22, 2025

StephanTLavavej added the enhancement (Something can be improved) label and removed the bug (Something isn't working) label Sep 22, 2025

StephanTLavavej mentioned this pull request Sep 22, 2025

libcxx: Flaky timing assumption in std/thread/thread.semaphore/timed.pass.cpp #5733

Closed

muellerj2 mentioned this pull request Sep 26, 2025

<regex>: Process disjunctions non-recursively #5745

Merged
