<regex>: Avoid generating unnecessary single-branch _N_if nodes
#5539
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
The parser likes to generate single-branch
_N_ifnodes in the NFA for no good reason. These nodes have a noticeable runtime cost in the matcher. This PR avoids generating these nodes.This change has side effects:
error_stackexception before will match successfully after this PR._Matcher2::_Do_rep()calls can now take the place previously occupied by recursive calls of_Matcher2::_Do_if()._Matcher2::_Do_rep()takes more bytes from the stack (I didn't check recently, but I remember it was about 100 bytes per call?), so there will be a very small set of input strings that will run into an actual stack overflow now instead of anerror_stackexception.Without further changes, there would be more side effects because some loops would get classified as simple now. However, I preempted this reclassification by adjusting
_Parser2::_Calculate_loop_simplicity(). There are two reasons for this.error_stackexception before, and it's more important to make more regexes resistant against stack overflow. For example, if we were to allow the reclassification, this would fix the stack overflow for the regex in <regex>: Grouping within repetition causes regex stack error #997.I added no new tests because (a) there is no obvious additional set of regexes to test and (b) this change affects the generated NFAs for all regexes without exception (all of them contained at least one single-branch
_N_ifnode before), so the whole existing test suite serves as test coverage for this change.Benchmark