-
-
Notifications
You must be signed in to change notification settings - Fork 3k
[mypyc] Faster creation of a tuple from list/tuple #10217
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[mypyc] Faster creation of a tuple from list/tuple #10217
Conversation
I went through |
That's right! It's also possible that you can reuse |
There is another implementation on a new personal branch: 97littleleaf11@4f0b92a |
mypyc/irbuild/specialize.py
Outdated
|
||
def set_tuple_item() -> None: | ||
e = builder.accept(gen.left_expr) | ||
index_val = builder.accept(index) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that this works as expected. The naming is pretty confusing, but I think that this is actually a sequence item (x
in for x in my_list
), not the index. Actually for_loop_helper
doesn't seem to give you access to the sequence index. That probably means that for_loop_helper
isn't a good match for this use case, unfortunately.
Directly using ForEnumerate
or maybe ForRange
could be better options. If using ForEnumerate
, you'll have to create a new lvalue for index1
(a register). If using ForRange
, you'll need to use something similar to what transform_assignment_stmt
does to assign the sequence item to the lvalue(s). The key part is this:
target = builder.get_assignment_target(lvalue)
builder.assign(target, rvalue_reg, line)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have checked the generated IR, it seems that it dose work. I will add a few IR test later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed above, can you check that this works for types such as List[str]
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, I didn't consider that. > <
I would add a new helper function similar to for_loop_helper
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that should work. One option would be to have a similar helper that gives you access both to the index (starting from 0) and the item value. You don't need to support everything that for_loop_helper
supports since this helper would be used in more specialized cases.
r1 = PyList_New(0) | ||
r2 = box(tuple[bool, bool, bool, bool], source) | ||
r3 = PyObject_GetIter(r2) | ||
L1: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bugs here.
I found that tuple source
would be recognized as tuple
instead of builtin_tuple
, thus failed in is_tuple_rprimitive (rtype)
in specialize.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The above type annotation should help, but it would be better if the specialization also can be made to work with fixed-length tuples.
You could add a check for RTuple
(fixed-length tuple) and also handle that, instead of just supporting variable-length tuples and lists. You may need to coerce it to a variable-length tuple explicitly first, using coerce
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should I support RTuple
in this PR?
IR test failed on 32bit |
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some comments (not a full review).
Also, can you evaluate the performance impact using a microbenchmark? For example, test something like this (call f
many times in a loop):
def f(a: List[int]) -> Tuple[int, ...]):
return tuple(x + 1 for x in a)
You can also check if this speeds up the nqueens
benchmark in https://github.com/mypyc/mypyc-benchmarks. It's okay if there's no difference here, but it would be great if there is.
The above repository has a benchmark runner that can make it easier to measure performance. Just add a microbenchmark and run it using the runbench.py
tool (no need to create a PR, you can experiment locally).
r1 = PyList_New(0) | ||
r2 = box(tuple[bool, bool, bool, bool], source) | ||
r3 = PyObject_GetIter(r2) | ||
L1: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The above type annotation should help, but it would be better if the specialization also can be made to work with fixed-length tuples.
You could add a check for RTuple
(fixed-length tuple) and also handle that, instead of just supporting variable-length tuples and lists. You may need to coerce it to a variable-length tuple explicitly first, using coerce
.
I kind of feel this issue is less good-first-issue. But that's good, you get to know more aspects about the codebase and workflow :D |
@benchmark
def tuple_build_test() -> None:
a = []
for i in range(1000):
a.append(i)
for i in range(1000):
b = tuple(x + 1 for x in a) this branch: interpreted: 0.059748s (avg of 16 iterations; stdev 1.2%) compiled is 3.325x faster master: interpreted: 0.058382s (avg of 17 iterations; stdev 0.96%) compiled is 2.258x faster |
you can use the resolve conversation button to make the PR page more concise |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the updates! The performance improvement is great, and we could get even more performance by inlining some more operations (no need to do it now, this is already god). This is almost ready.
Left a few more comments, mostly minor.
return r11 | ||
|
||
|
||
[case testTupleBuiltFromList] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also add a test where we execute the code, in mypyc/test-data/run-tuples.test
. You can add more sub-cases to the existing testTupleOps
test. Note that it uses the default driver, so you can just add test_<whatever>
function(s). Test iterating over a list, a tuple and an unsupported iterable (e.g. set).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @JukkaL ! I just add a test function including iterating over a list
, a tuple
and a RTuple
(unsupported).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With all minor comments addressed, I think it's LGTM now! Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the final updates!
Description
mypyc: mypyc/mypyc#769