<regex>: Simplify matching of _N_if NFA nodes in leftmost-longest mode
#5405
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Since we settled on some reasonable semantics for leftmost-longest matching in #5218, I think we should remove the code for some other (abandoned) attempt to implement the leftmost-longest rule in
_Matcher::_Do_if: An attempt to setTgt_stateto the leftmost-longest match found under this_N_ifnode.This is neither necessary (in leftmost-longest mode, we take the final result from
_Res, not_Tgt_state) nor sufficient (it assigns one of the longest matches to_Tgt_state, but if there are several of the same length it doesn't necessarily pick the correct one among these matches).On the other hand, this saves 96 bytes of stack space per call to
_Do_ifin debug mode, slightly alleviating the stack overflow issues (#997, #1528).Potentially leaving
_Tgt_statein a garbage state is fine here, because none of the functions further up on the callstack rely on its value:_Match_patin the_N_ifcase: Will just return immediately to its caller without changing_Tgt_state._Do_rep0: Can't be a caller of_Match_patbecause a repetition containing an_N_ifnode anywhere is not simple._Do_assert/_Do_neg_assert: Can't be the callers of_Match_patbecause there are no lookahead assertions in the POSIX grammars that demand application of the leftmost-longest rule. (But even if there were lookahead assertions -- notwithstanding their currently unknown semantics -- it would be fine in the sense that the matcher wouldn't crash:_Do_assertwould reset the position pointer to a savepoint, while_Do_neg_assertwould fail, resulting in_Match_patreturning immediately to its caller so that the remaining analysis here applies. Even so, there is the issue that_Do_assertwould not reset the capture groups in_Tgt_state-- but we don't know what to set them to either as long as we don't know the semantics of such assertions. Nevertheless, all "valid" capture groups would still point to legal ranges in the input, so even the matcher with this PR's change wouldn't crash. This means we wouldn't have to worry about a newer parser emiting assertion nodes because running them with the old matcher would at worst produce wrong results. To get correct semantics, an updated parser and matcher are required, but this would also be the case if we didn't do this PR's change.)_Do_if: Will either reset_Tgt_stateto some savepoint before calling_Match_pator immediately return to its caller and leave_Tgt_stateas-is._Do_rep: Will either reset_Tgt_stateto some savepoint before doing the next_Match_patcall or return to its caller while leaving_Tgt_stateas-is._Match_patin the_N_repor_N_end_repcases: Will just return immediately to its caller without changing_Tgt_state._Match: Will evaluate_Resin leftmost-longest mode, not_Tgt_state. (_Matchonly evaluted_Resbefore<regex>: Fix depth-first and leftmost-longest matching rules #5218 as well, so this change also doesn't pose a problem if old and new functions are mixed.)So in all cases, either
Tgt_stateisn't evaluated anymore or it is reset to some savepoint before it is used again.