[mypyc] Faster creation of a tuple from list/tuple #10217

97littleleaf11 · 2021-03-16T06:03:17Z

Description

mypyc/irbuild/specialize.py

97littleleaf11 · 2021-03-16T06:13:17Z

I went through translate_list_comprehension and comprehension_helper. I should write similar code to fill in the blank tuple after I get tuple_op, is it correct?

JukkaL · 2021-03-16T11:33:12Z

I went through translate_list_comprehension and comprehension_helper. I should write similar code to fill in the blank tuple after I get tuple_op, is it correct?

That's right! It's also possible that you can reuse ForSequence (or ForEnumerate / ForRange) to simplify the generation of the for loop (not sure though whether it's worth it).

mypyc/irbuild/ll_builder.py

mypyc/irbuild/specialize.py

mypyc/primitives/tuple_ops.py

mypyc/irbuild/specialize.py

97littleleaf11 · 2021-03-17T07:54:03Z

There is another implementation on a new personal branch: 97littleleaf11@4f0b92a

mypyc/irbuild/ll_builder.py

mypyc/primitives/tuple_ops.py

mypyc/irbuild/specialize.py

JukkaL · 2021-03-17T11:48:34Z

mypyc/irbuild/specialize.py

+
+            def set_tuple_item() -> None:
+                e = builder.accept(gen.left_expr)
+                index_val = builder.accept(index)


I don't think that this works as expected. The naming is pretty confusing, but I think that this is actually a sequence item (x in for x in my_list), not the index. Actually for_loop_helper doesn't seem to give you access to the sequence index. That probably means that for_loop_helper isn't a good match for this use case, unfortunately.

Directly using ForEnumerate or maybe ForRange could be better options. If using ForEnumerate, you'll have to create a new lvalue for index1 (a register). If using ForRange, you'll need to use something similar to what transform_assignment_stmt does to assign the sequence item to the lvalue(s). The key part is this:

target = builder.get_assignment_target(lvalue) builder.assign(target, rvalue_reg, line)

I have checked the generated IR, it seems that it dose work. I will add a few IR test later.

As discussed above, can you check that this works for types such as List[str]?

Well, I didn't consider that. > <
I would add a new helper function similar to for_loop_helper.

Yeah, that should work. One option would be to have a similar helper that gives you access both to the index (starting from 0) and the item value. You don't need to support everything that for_loop_helper supports since this helper would be used in more specialized cases.

mypyc/primitives/generic_ops.py

mypyc/irbuild/for_helpers.py

97littleleaf11 · 2021-03-18T01:27:35Z

mypyc/test-data/irbuild-tuple.test

+    r1 = PyList_New(0)
+    r2 = box(tuple[bool, bool, bool, bool], source)
+    r3 = PyObject_GetIter(r2)
+L1:


Bugs here.
I found that tuple source would be recognized as tuple instead of builtin_tuple, thus failed in is_tuple_rprimitive (rtype) in specialize.py

The above type annotation should help, but it would be better if the specialization also can be made to work with fixed-length tuples.

You could add a check for RTuple (fixed-length tuple) and also handle that, instead of just supporting variable-length tuples and lists. You may need to coerce it to a variable-length tuple explicitly first, using coerce.

Should I support RTuple in this PR?

97littleleaf11 · 2021-03-18T04:04:40Z

IR test failed on 32bit

TH3CHARLie · 2021-03-18T10:47:28Z

c_pyssize_t has platform-specific size so we use native_int in IR output and it will be replaced with either int32 or int64 in test textual matching. So a quick fix is to replace every int64 in your code with native_int. Later you can try to improve mypyc/mypyc#776 to solve this some-what ugly workflow

JukkaL

Left some comments (not a full review).

Also, can you evaluate the performance impact using a microbenchmark? For example, test something like this (call f many times in a loop):

def f(a: List[int]) -> Tuple[int, ...]):
    return tuple(x + 1 for x in a)

You can also check if this speeds up the nqueens benchmark in https://github.com/mypyc/mypyc-benchmarks. It's okay if there's no difference here, but it would be great if there is.

The above repository has a benchmark runner that can make it easier to measure performance. Just add a microbenchmark and run it using the runbench.py tool (no need to create a PR, you can experiment locally).

mypyc/test-data/irbuild-tuple.test

JukkaL · 2021-03-18T11:07:40Z

mypyc/test-data/irbuild-tuple.test

+    r1 = PyList_New(0)
+    r2 = box(tuple[bool, bool, bool, bool], source)
+    r3 = PyObject_GetIter(r2)
+L1:


The above type annotation should help, but it would be better if the specialization also can be made to work with fixed-length tuples.

You could add a check for RTuple (fixed-length tuple) and also handle that, instead of just supporting variable-length tuples and lists. You may need to coerce it to a variable-length tuple explicitly first, using coerce.

mypyc/irbuild/for_helpers.py

TH3CHARLie · 2021-03-18T11:18:26Z

I kind of feel this issue is less good-first-issue. But that's good, you get to know more aspects about the codebase and workflow :D

97littleleaf11 · 2021-03-18T13:10:02Z

@benchmark
def tuple_build_test() -> None:
    a = []
    for i in range(1000):
        a.append(i)
    for i in range(1000):
        b = tuple(x + 1 for x in a)

this branch:

interpreted: 0.059748s (avg of 16 iterations; stdev 1.2%)
compiled: 0.017971s (avg of 16 iterations; stdev 3.6%)

compiled is 3.325x faster

master:

interpreted: 0.058382s (avg of 17 iterations; stdev 0.96%)
compiled: 0.025852s (avg of 17 iterations; stdev 2.5%)

compiled is 2.258x faster

TH3CHARLie · 2021-03-18T13:28:30Z

you can use the resolve conversation button to make the PR page more concise

JukkaL

Thanks for the updates! The performance improvement is great, and we could get even more performance by inlining some more operations (no need to do it now, this is already god). This is almost ready.

Left a few more comments, mostly minor.

mypyc/irbuild/for_helpers.py

mypyc/irbuild/ll_builder.py

mypyc/irbuild/specialize.py

mypyc/lib-rt/tuple_ops.c

JukkaL · 2021-03-21T11:02:58Z

mypyc/test-data/irbuild-tuple.test

    return r11
+
+
+[case testTupleBuiltFromList]


Also add a test where we execute the code, in mypyc/test-data/run-tuples.test. You can add more sub-cases to the existing testTupleOps test. Note that it uses the default driver, so you can just add test_<whatever> function(s). Test iterating over a list, a tuple and an unsupported iterable (e.g. set).

Thanks @JukkaL ! I just add a test function including iterating over a list, a tuple and a RTuple(unsupported).

TH3CHARLie

With all minor comments addressed, I think it's LGTM now! Thanks!

JukkaL

Thanks for the final updates!

Add new_tuple in tuple_ops

fc24ee9