<regex>: Process positive lookahead assertions non-recursively
#5714
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Towards #997 and #1528. While the PR title describes the main observable effect, the actual main change is the implementation of manual unwinding of stack frames in
_Match_pat. But at least one recursive call had to be replaced by the manual stack management to validate the main change, and positive assertions turned out be the easiest option.Previously,
_Match_patmainly consisted of an NFA interpreter loop. After this PR,_Match_patconsists of two main loops: An NFA interpreter loop and a stack unwinding loop (joined together by a loop surrounding both). As before this PR, the interpreter loop in_Match_patprocesses the nodes in the NFA. When this loop is done because some final node of the NFA has been reached or matching along a trajectory failed, a second loop is entered that manually unwinds the explicit state stack. If that second loop leads to some interpretable position in the NFA again (i.e., if_Nxbecomes not null), the unwinding loop is exited and the interpreter loop is engaged again. If not and the stack has been fully unwound,_Match_patis exited.Because the NFA node following the node
_Nxin the interpreter loop might have to be some node other than_Nx->_Nextfrom now on, a new local variable_Nextnow stores the next node to process, which gets assigned to the correct next node for a each node type in theswitchif it's not_Nx->_Next. In this PR specifically, this is used to makestatic_cast<_Node_assert*>(_Nx)->_Childfollow a node of type_N_assertin the interpreter loop. (Additionally, final nodes in the interpreter loop that do not result in failure assignnullptrto_Nextrather than_Nxnow.)The stack unwinding uses its own set of operation codes, which are interpreted by the unwinding loop. The operation codes are stored in the stack frames at the time the new frame is pushed to the stack. (After this PR, there is only one code, but there will soon be more.)
Because the matcher is currently semi-recursive, the stack counts as unwound in
_Match_patif it has been unwound up to its size at the time the_Match_patcall started. Further unwinding will happen in a surrounding_Matcher_patcall. (We can simplify this when the matcher is finally fully non-recursive.)The stack frames on the heap are now represented by objects of type
_Rx_state_frame_t. Currently, this is a very inefficient structure and it will become even worse in the next few PRs before it starts getting better.For now, I opted to exactly preserve the situations when
regex_errors witherror_stackorerror_complexityget thrown. But I moved the related code to their own member functions to avoid unnecessary code duplication. We can think about changing this after the matcher has been made fully non-recursive.