KEMBAR78
Branching design for Tier 2 (uops) interpreter · Issue #106529 · python/cpython · GitHub
Skip to content

Branching design for Tier 2 (uops) interpreter #106529

@gvanrossum

Description

@gvanrossum

This issue is part of the larger epic of gh-104584. In PR gh-106393 I tried to implement branching, but it was premature. Here's a better design, following @markshannon's guidance.

We have the following jump instructions (not counting the instrumented versions):

Unconditional jumps:

  • JUMP_BACKWARD
  • JUMP_BACKWARD_NO_INTERRUPT
  • JUMP_FORWARD

Branches, a.k.a. conditional jumps:

  • POP_JUMP_IF_FALSE, POP_JUMP_IF_TRUE, POP_JUMP_IF_NONE, POP_JUMP_IF_NOT_NONE

  • FOR_ITER's specializations:

    • FOR_ITER_LIST
    • FOR_ITER_TUPLE
    • FOR_ITER_RANGE
  • FOR_ITER_GEN

  • SEND

  • Add counters to to POP_JUMP_IF_{FALSE,TRUE} to determine likeliness

The translation strategies could be as follows:

Unconditional jumps

JUMP_BACKWARD

  • If this jumps to exactly the top of the current trace, emit a Tier 2 JUMP_TO_TOP uop, and stop projecting (i.e., exit the trace generation loop). The JUMP_TO_TOP uop implementation should include a CHECK_EVAL_BREAKER call.
  • If this jumps anywhere else, emit a SAVE_IP uop with the destination of the jump, followed by an EXIT_TRACE uop, and stop projecting.

JUMP_BACKWARD_NO_INTERRUPT

  • Since this is typically only used in special circumstances, just emit a SAVE_IP instruction with the destination and an EXIT_TRACE uop, and stop projecting.
  • Alternatively, we could make CHECK_EVAL_BREAKER a separate UOP that is inserted for JUMP_BACKWARD but not for JUMP_BACKWARD_NO_INTERRUPT, and otherwise treat the two backward jumps the same.

JUMP_FORWARD

  • Emit a SAVE_IP uop with the destination of the jump, and continue projecting from there (i.e. set instr to the destination of the jump).

Conditional jumps (branches)

POP_JUMP_IF_FALSE and friends

Consider the following Python code:

if cond:
    block
rest

This translates roughly to the following Tier 1 bytecode (using B1, B2, ... to label Tier 1 instructions, and <cond>, <block> etc. to represent code blocks that evaluate or execute the corresponding Python fragments):

B1: <cond>
B2: POP_JUMP_IF_FALSE B4
B3: <block>
B4: <rest>
B5:

I propose the following translation into Tier 2 uops, assuming the branch is "unlikely":

    SAVE_IP B1
    <cond>
    SAVE_IP B2
    JUMP_IF_FALSE stub
    POP_TOP
    SAVE_IP B3
    <block>
    SAVE_IP B4
    EXIT_TRACE

stub:
    POP_TOP
    SAVE_IP B4
    EXIT_TRACE

Where JUMP_IF_FALSE inspects the top of stack but doesn't pop it, and has an argument that executes a jump in the Tier 2 uop instruction sequence.

If the branch is "likely", we do this instead:

    SAVE_IP B1
    <cond>
    SAVE_IP B2
    JUMP_IF_TRUE stub
    POP_TOP
    SAVE_IP B4
    <rest>
    SAVE_IP B5
    EXIT_TRACE

stub:
    POP_TOP
    SAVE_IP B3
    EXIT_TRACE

Note how in this case, <rest> is projected as part of the trace, while <block> is not, since the likely case is that we jump over <block> to <rest>.

For the other simple conditional jumps (POP_JUMP_IF_TRUE, POP_JUMP_IF_NONE, POP_JUMP_IF_NOT_NONE) we do the same: if the jump is unlikely, emit a JUMP_IF_XXX uop and a stub; if the jump is likely, emit the inverse JUMP_IF_NOT_XXX uop and a different stub, and continue projecting at the destination of the original jump bytecode.

I propose to have hand-written cases both in the superblock generator and in the Tier 2 interpreter for these, since the translations are too irregular to fit easily in the macro expansion data structure. The stub generation will require additional logic and data structures in translate_bytecode_to_trace() to keep track of the stubs required so far, the available space for those, and the back-patching required to set the operands for the JUMP_IF_XXX uops.

FOR_ITER and (especially) its specializations

The common case for these is not to jump. The bytecode definitions are too complex to duplicate in hand-written Tier 2 uops. My proposal is to change these in bytecodes.c so that, instead of using the JUMPBY(n) macro, they use JUMPBY_POP_DISPATCH(n), which in Tier 1 translates into just JUMPBY(n), but in Tier 2 translates into roughly

frame->prev_instr += (x);
PY_DECREF(stack_pointer[-1]);
stack_pointer -= 1;
goto exit;

thereby exiting the trace when the corresponding for-loop terminates.

I am assuming here that most loops have several iterations. I don't think it's worth special-casing the occasional for-loop that almost always immediately terminates.

SEND

Possibly we could treat this the same as FOR_ITER. But right now I propose to just punt here, and when we encounter it, stop projecting, as we do with any other unsupported bytecode instruction.

Linked PRs

Metadata

Metadata

Assignees

No one assigned

    Labels

    interpreter-core(Objects, Python, Grammar, and Parser dirs)performancePerformance or resource usage

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions