KEMBAR78
[mypyc] Simplify generated code for native attribute get by JukkaL · Pull Request #11978 · python/mypy · GitHub
Skip to content

Conversation

JukkaL
Copy link
Collaborator

@JukkaL JukkaL commented Jan 12, 2022

Originally c.b generated code like this:

    cpy_r_r0 = ((t1___CObject *)cpy_r_c)->_b;
    if (unlikely(((t1___CObject *)cpy_r_c)->_b == NULL)) {
        PyErr_SetString(PyExc_AttributeError, "attribute 'b' of 'C' undefined");
    } else {
        CPy_INCREF(((t1___CObject *)cpy_r_c)->_b);
    }
    if (unlikely(cpy_r_r0 == NULL)) {
        CPy_AddTraceback("t/t1.py", "foobar", 6, CPyStatic_globals);
        goto CPyL2;
    }

Now we generate code like this:

    cpy_r_r0 = ((t1___CObject *)cpy_r_c)->_b;
    if (unlikely(cpy_r_r0 == NULL)) {
        CPy_AttributeError("t/t1.py", "foobar", "C", "b", 6, CPyStatic_globals);
        goto CPyL2;
    }
    CPy_INCREF(cpy_r_r0);

The implementation merges consecutive GetAttr and Branch ops.

The main benefit is that this makes the generated code smaller and easier to read,
making it easier to spot possible improvements in the code.

Benchmark results seem to be roughly the same as before, since the original
code could be optimized well by C compilers. I tried richards, deltablue and hexiom
benchmarks using clang and gcc, and the performance delta is from -2% to 2%,
and the differences look pretty random.

body.emit_label(block)
visitor.next_block = next_block
for op in block.ops:
ops = block.ops
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could probably use a comment of what we are keeping track of here now that the loop is a lot more complex

self.rare = False
self.next_block: Optional[BasicBlock] = None
# If not None, the following op is a branch (to allow op merging)
self.next_branch: Optional[Branch] = None
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that if we're going to go down the route of having the emitter visitor do peephole opts, we should implement it for the general case instead this special case.

One approach that follows this in relying on state in the visitor would be:

  • Stash the current block's ops in a block_ops attribute or something
  • And the next loop index into block_idx
    Then the top level emit loop just uses visitor.block_idx as its loop counter and doesn't need any knowledge of when optimizations trigger, and the peephole code in the emitter just looks at self.block_ops[self.block_idx] directly and then increments it if the optimization fires.

Another similar approach would be to do that but to have the visitor return Optional[int] to indicate skipping instructions.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I followed your idea and I think that it looks cleaner now.

@msullivan
Copy link
Collaborator

The new output looks so much nicer.

@JukkaL
Copy link
Collaborator Author

JukkaL commented Mar 26, 2022

The Windows failures don't seem specific to this PR, since everything seems to be failing right now (#12460), so I will go ahead and merge this.

@JukkaL JukkaL merged commit c755928 into master Mar 26, 2022
@JukkaL JukkaL deleted the mypyc-getattr branch March 26, 2022 16:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants