-
Notifications
You must be signed in to change notification settings - Fork 1.6k
<regex>: Process minimum number of reps in simple loops non-recursively
#5762
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
<regex>: Process minimum number of reps in simple loops non-recursively
#5762
Conversation
| _Failed = !_Do_rep0(_Nr, (_Nr->_Flags & _Fl_greedy) != 0); | ||
| _Next = nullptr; | ||
| } | ||
| } else { // internal _Match_pat(_Node->_Next) call in _Do_rep0() | ||
| _Next = nullptr; | ||
| } | ||
| } else { | ||
| _Failed = !_Do_rep(_Nr, (_Nr->_Flags & _Fl_greedy) != 0, _Sav._Loop_idx); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Possible followup, no change requested for this PR: There are now two mentions of (_Nr->_Flags & _Fl_greedy) != 0. I'm assuming this doesn't change, so we should consider extracting it as bool _Greedy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, probably a good idea. I just didn't spend too much thought on making the recursive calls look pretty because I intend to remove them soon. If the logic for simple or generic loops mostly remains separate, though, this expression will still appear in two places.
|
Thank you as always for the extremely detailed description which will be helpful in understanding this refactoring later! 😻 |
|
I'm mirroring this to the MSVC-internal repo - please notify me if any further changes are pushed. |
💚 🐈 🐱 |
Towards #997 and #1528.
Simple loops are a subset of all loops with the following properties:
_Do_ifnode, no character class matching collating elements of various sizes).Properties 1-3 mean that we don't have to worry about restoring loop state: Whenever we enter the loop, we can just overwrite it because the data relates to a trajectory the processing of which is complete. Moreover, the repeated pattern is branchless, so it can't happen that we try some first branch in the repeated pattern, have to backtrack and then have to restore the loop state (especially any counters) before trying another branch in the repeated pattern.
As long as we haven't reached the minimum number of reps, we must match more repetitions, so we do not try to match the tail of the regex. Thus, there is also no branch after matching a repetition until the minimum is reached. This means that we don't have to add any stack frames with non-trivial unwinding logic that try the other branch or restore the loop state during backtracking.
For now, this results in the following implementation to match the minimum number of reps in a simple loop:
_N_repjust sets up the loop state for the first repetition if at least one repetition is necessary. No unwinding logic is added to restore the loop state because the state relates to an outdated trajectory (or is just the initial state). If this isn't a simple loop or a simple one which doesn't have to match at least one rep, we essentially delegate to the old code._N_end_rep, we first check whether this is a simple loop and whether we are still processing the minimum number of reps._Do_rep()._Match_pat(_Node->_Next)call in_Do_rep0(), so we have to keep doing what_N_end_repdid previously in this case: Set_Nexttonullptrso that we exit_Match_pat()._Nextto the start of the repeated pattern._Do_rep0()code.We don't have to run any stack unwinding logic during backtracking to process the minimum number of reps of a simple loop, but we still have to add a stack frame to save the match state at the time when the simple loop was entered. For this reason, we add a new opcode for doing nothing during stack unwinding. Since we can leave this stack frame to the standard unwinding logic in
_Match_pat()now, we can remove all explicit_Pop_frame()calls in_Do_rep0().Repetitions of simple loops didn't really increase the stack usage counter (other than some temporary increase by 1) prior to this PR, so to reduce complexity I have decided to not replicate the updates to the stack usage counter in the new logic. Even so, this PR replicates the updates to the complexity counter. (This PR also adds an update to the complexity counter to the logic of
_Disjunction_eval_alt_alwayswhich was missing from #5745.)