[wasm] Introduce jiterpreter control flow pass #83247
Merged
This PR introduces a control flow graph (CFG) pass to the jiterpreter that runs after the initial code generation pass. Things that are currently generated inline, like branch target blocks and branches, are now recorded in a list of segments. In a second pass, all the segments are stitched together with the necessary WebAssembly flow control logic inserted in between. This turns forward branches into direct jumps and turns backward branches into a direct jump paired with a table dispatch. It is theoretically possible to avoid the table dispatch for backward branches, but I'm not smart enough to figure out how to do it in a general way :-)
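To illustrate the two-pass idea, here is a minimal sketch of stitching recorded segments into nested WebAssembly blocks so that forward branches become plain `br` instructions. It is not the code from this PR: the `Segment` type, the `stitch` function, and the textual output format are all hypothetical, and backward branches (which need the loop and dispatch table) are deliberately left out.

```typescript
// Hypothetical segment types; names are illustrative, not the jiterpreter's own.
type Segment =
    | { kind: "code"; text: string }        // straight-line code from the first pass
    | { kind: "target"; ip: number }        // a branch target begins here
    | { kind: "branch"; targetIp: number }; // an unconditional branch to a target

// Second pass: stitch segments together, emitting pseudo-wasm text.
function stitch(segments: Segment[]): string[] {
    // Collect targets in order of appearance. The i-th target is reached by
    // ending the i-th innermost block.
    const targets: number[] = [];
    for (const s of segments) {
        if (s.kind === "target")
            targets.push(s.ip);
    }

    const out: string[] = [];
    // Open one block per target up front. The innermost block ends at the
    // first target, the outermost at the last.
    for (let i = 0; i < targets.length; i++)
        out.push("block");

    let closed = 0; // how many target blocks have already been ended
    for (const s of segments) {
        switch (s.kind) {
            case "code":
                out.push(s.text);
                break;
            case "target":
                out.push("end");
                closed++;
                break;
            case "branch": {
                const index = targets.indexOf(s.targetIp);
                if (index < 0)
                    throw new Error(`unknown branch target ${s.targetIp}`);
                if (index < closed)
                    throw new Error("backward branch: needs a loop and dispatch table, omitted in this sketch");
                // Relative label depth counts only the blocks that are still open.
                out.push(`br ${index - closed}`);
                break;
            }
        }
    }
    return out;
}

// Usage: a branch that skips over target 10 and lands on target 20.
const stitched = stitch([
    { kind: "code", text: "A" },
    { kind: "branch", targetIp: 20 },
    { kind: "code", text: "B" },
    { kind: "target", ip: 10 },
    { kind: "code", text: "C" },
    { kind: "target", ip: 20 },
    { kind: "code", text: "D" },
]);
// stitched is ["block", "block", "A", "br 1", "B", "end", "C", "end", "D"]
```

Because every target's block is already open when a forward branch is emitted, the branch needs no runtime check at the target; it only needs the relative depth of the enclosing block.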
This should provide large speedups for traces that contain many branch targets, since right now we pay the cost of an eip check for each branch target. The CFG is able to omit all of those checks. For traces containing backward branches, the dispatch table means we still have some overhead there, but it's not as bad.
This will probably regress startup time slightly (as visible in the Page Show timing, though I think that is probably noise) due to the second pass and the overhead in tracking segments, but it's possible to optimize that.
Initial browser-bench measurements, compared vs main: