KEMBAR78
`<regex>`: Process disjunctions non-recursively by muellerj2 · Pull Request #5745 · microsoft/STL · GitHub
Skip to content

Conversation

@muellerj2
Copy link
Contributor

@muellerj2 muellerj2 commented Sep 26, 2025

Towards #997 and #1528.

We have to handle disjunctions (represented by _N_if nodes) differently depending on longest mode: When _Longest is true, we have to evaluate all alternatives during unwinding no matter if they matched successfully or not. Otherwise, we always may (and usually must) stop evaluating alternatives after one succeeded. The former is implemented by the case _Disjunction_eval_alt_always, the latter by _Disjunction_eval_alt_on_failure. We can implement the latter case by falling through to the former.

During unwinding, _Frame already holds the correct values to evaluate further alternatives in the disjunctions except for the NFA node it points to. So we can just update the node pointer and "push" _Frame on the stack again by incrementing _Frames_count.

In longest mode, _Match_pat() now returns false when matching the last evaluated alternative failed, even if some alternative evaluated earlier matched successfully. _Matcher3::_Match() can deal with this behavior change by checking the member _Matched as well, which is set to true in longest mode when the first successful match was found. (Note that in longest mode, the return value of _Match_pat() mostly doesn't matter when possible trajectories are evaluated by the matcher. The only exceptions are the _Match_pat(_Node->_Next) calls found in _Do_rep0(), which handles simple loops. But the return values of these calls are unaffected by this change in the evaluation of _N_if nodes because simple loops are guaranteed to be branchless.)

@muellerj2 muellerj2 requested a review from a team as a code owner September 26, 2025 19:47
@github-project-automation github-project-automation bot moved this to Initial Review in STL Code Reviews Sep 26, 2025
@AlexGuteniev
Copy link
Contributor

What is the performance impact?

@muellerj2
Copy link
Contributor Author

There has been no noticeable performance impact due to changes since #5714, as far as I can tell: The regex_search benchmarks and test runs seem to take basically the same amount of time.

But there will be a major change soon: Eliminating _Do_rep0() should generally improve performance for greedy simple loops (think a*), in some cases drastically.

@StephanTLavavej StephanTLavavej added enhancement Something can be improved regex meow is a substring of homeowner labels Sep 27, 2025
@StephanTLavavej StephanTLavavej self-assigned this Sep 27, 2025
@StephanTLavavej StephanTLavavej removed their assignment Sep 27, 2025
@StephanTLavavej StephanTLavavej moved this from Initial Review to Ready To Merge in STL Code Reviews Sep 27, 2025
@StephanTLavavej StephanTLavavej moved this from Ready To Merge to Merging in STL Code Reviews Oct 3, 2025
@StephanTLavavej
Copy link
Member

I'm mirroring this to the MSVC-internal repo - please notify me if any further changes are pushed.

@StephanTLavavej StephanTLavavej merged commit ec67f7b into microsoft:main Oct 4, 2025
39 checks passed
@github-project-automation github-project-automation bot moved this from Merging to Done in STL Code Reviews Oct 4, 2025
@StephanTLavavej
Copy link
Member

😻 💚 💯

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement Something can be improved regex meow is a substring of homeowner

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

3 participants