[mypyc] Use native integers for some sequence indexing operations by JukkaL · Pull Request #19426 · python/mypy · GitHub

JukkaL
Collaborator

@JukkaL JukkaL commented Jul 11, 2025

For example, when iterating over a list, now we use a native integer
for the index (which is not exposed to the user). Previously we used
tagged integers, but in these use cases they provide no real benefit.

This simplifies the IR and should slightly improve performance, as fewer
tagged int to native int conversions are needed.
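As a rough illustration of the two representations, here is a small Python model of a short tagged integer. It is only a sketch: the helper names `tag_short_int` and `untag_short_int` are hypothetical and not part of mypyc, and the encoding (value shifted left by one bit) is inferred from the `<< 1`, `>> 1` and `+ 2` operations in the old IR listed in this description.

```python
# Illustrative model only; these helpers are hypothetical, not mypyc APIs.
# A short tagged integer stores the value shifted left by one bit,
# which leaves the low bit free to act as a tag.

def tag_short_int(n: int) -> int:
    """Encode a native integer as a short tagged integer."""
    return n << 1


def untag_short_int(t: int) -> int:
    """Decode a short tagged integer back to a native integer."""
    return t >> 1


# Incrementing a tagged index by one means adding 2 to its encoded form.
index = tag_short_int(0)
index += 2
assert untag_short_int(index) == 1

# Comparing a tagged index against a native length needs a conversion first.
length = 3
assert index < tag_short_int(length)
```

With a native integer index, neither the shift before the comparison nor the shift before the element access is needed.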

Multiple ops have to be migrated in one go, as they interact with
each other. Changing only a subset of them would actually
generate more verbose IR, as a bunch of extra coercions would be
needed.

List of impacted statements:

  • For loop over sequence
  • Assignment like x, y = a for tuple/list rvalue
  • Dict iteration
  • List comprehension

For example, consider this function:

def foo(a: list[int]) -> None:
    for x in a:
        pass

The old generated IR looked like this:

def foo(a):
    a :: list
    r0 :: short_int
    r1 :: ptr
    r2 :: native_int
    r3 :: short_int
    r4 :: bit
    r5 :: native_int
    r6, r7 :: ptr
    r8 :: native_int
    r9 :: ptr
    r10 :: object
    r11 :: int
    r12 :: short_int
    r13 :: None
L0:
    r0 = 0
L1:
    r1 = get_element_ptr a ob_size :: PyVarObject
    r2 = load_mem r1 :: native_int*
    r3 = r2 << 1
    r4 = r0 < r3 :: signed
    if r4 goto L2 else goto L5 :: bool
L2:
    r5 = r0 >> 1
    r6 = get_element_ptr a ob_item :: PyListObject
    r7 = load_mem r6 :: ptr*
    r8 = r5 * 8
    r9 = r7 + r8
    r10 = load_mem r9 :: builtins.object*
    inc_ref r10
    r11 = unbox(int, r10)
    dec_ref r10
    if is_error(r11) goto L6 (error at foo:2) else goto L3
L3:
    dec_ref r11 :: int
L4:
    r12 = r0 + 2
    r0 = r12
    goto L1
L5:
    return 1
L6:
    r13 = <error> :: None
    return r13

Now the generated IR is simpler:

def foo(a):
    a :: list
    r0 :: native_int
    r1 :: ptr
    r2 :: native_int
    r3 :: bit
    r4, r5 :: ptr
    r6 :: native_int
    r7 :: ptr
    r8 :: object
    r9 :: int
    r10 :: native_int
    r11 :: None
L0:
    r0 = 0
L1:
    r1 = get_element_ptr a ob_size :: PyVarObject
    r2 = load_mem r1 :: native_int*
    r3 = r0 < r2 :: signed
    if r3 goto L2 else goto L5 :: bool
L2:
    r4 = get_element_ptr a ob_item :: PyListObject
    r5 = load_mem r4 :: ptr*
    r6 = r0 * 8
    r7 = r5 + r6
    r8 = load_mem r7 :: builtins.object*
    inc_ref r8
    r9 = unbox(int, r8)
    dec_ref r8
    if is_error(r9) goto L6 (error at foo:2) else goto L3
L3:
    dec_ref r9 :: int
L4:
    r10 = r0 + 1
    r0 = r10
    goto L1
L5:
    return 1
L6:
    r11 = <error> :: None
    return r11
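To make the difference between the two IR listings concrete, the following Python sketch models both loops and counts the tagged/native conversions each one performs. It is a simplified model under stated assumptions, not what mypyc actually emits: the function names `iterate_old` and `iterate_new` are hypothetical, reference counting and unboxing are omitted, and `len(a)` stands in for the `ob_size` load.

```python
def iterate_old(a: list[int]) -> tuple[list[int], int]:
    """Model of the old IR: the hidden index is a short tagged integer."""
    seen: list[int] = []
    conversions = 0
    i = 0  # r0 = 0 (tagged zero)
    while True:
        tagged_size = len(a) << 1  # r3 = r2 << 1
        conversions += 1
        if not i < tagged_size:  # r4 = r0 < r3
            break
        native_i = i >> 1  # r5 = r0 >> 1
        conversions += 1
        seen.append(a[native_i])
        i += 2  # r12 = r0 + 2
    return seen, conversions


def iterate_new(a: list[int]) -> tuple[list[int], int]:
    """Model of the new IR: the hidden index is a native integer."""
    seen: list[int] = []
    i = 0  # r0 = 0 (native zero)
    while i < len(a):  # r3 = r0 < r2
        seen.append(a[i])
        i += 1  # r10 = r0 + 1
    return seen, 0


old_items, old_conversions = iterate_old([10, 20, 30])
new_items, new_conversions = iterate_new([10, 20, 30])
assert old_items == new_items == [10, 20, 30]
assert (old_conversions, new_conversions) == (7, 0)
```

In this model the old loop performs one shift per bounds check plus one per element access, while the new loop performs none.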

JukkaL and others added 4 commits July 11, 2025 14:27
@JukkaL JukkaL merged commit db67888 into master Jul 11, 2025
13 checks passed
@JukkaL JukkaL deleted the mypyc-for-loop-index branch July 11, 2025 14:45