KEMBAR78
[mypyc] `set(f(x) for x in it)` no longer uses list comprehension (#771) by mehdigmira · Pull Request #9707 · python/mypy · GitHub
Skip to content

Conversation

@mehdigmira
Copy link

Description

Solves mypyc/mypyc#771

Test Plan

Added statements to the run-sets.test file

@mehdigmira mehdigmira force-pushed the faster-comprehension-for-sets branch from 2bf5498 to 0c33ec8 Compare November 8, 2020 19:13
@TH3CHARLie
Copy link
Collaborator

Looks good, would you mind posting the generated code before/after the patch so we can make sure of its usefulness. (You can find the compiled C code in _native.c)

@TH3CHARLie TH3CHARLie self-requested a review November 9, 2020 13:31
@mehdigmira
Copy link
Author

For the following code

def f(n: int) -> None:
  x=set(i for i in range(n))

Master branch produces

char CPyDef_f(CPyTagged cpy_r_n) {
    PyObject *cpy_r_r0;
    CPyTagged cpy_r_r1;
    CPyTagged cpy_r_i;
    char cpy_r_r2;
    int64_t cpy_r_r3;
    char cpy_r_r4;
    int64_t cpy_r_r5;
    char cpy_r_r6;
    char cpy_r_r7;
    char cpy_r_r8;
    char cpy_r_r9;
    PyObject *cpy_r_r10;
    int32_t cpy_r_r11;
    char cpy_r_r12;
    CPyTagged cpy_r_r13;
    PyObject *cpy_r_r14;
    PyObject *cpy_r_x;
    char cpy_r_r15;
CPyL0: ;
    cpy_r_r0 = PyList_New(0);
    if (unlikely(cpy_r_r0 == NULL)) {
        CPy_AddTraceback("test.py", "f", 2, CPyStatic_globals);
        goto CPyL10;
    } else
        goto CPyL1;
CPyL1: ;
    cpy_r_r1 = 0;
    CPyTagged_IncRef(cpy_r_r1);
    cpy_r_i = cpy_r_r1;
    goto CPyL2;
CPyL2: ;
    cpy_r_r3 = cpy_r_r1 & 1;
    cpy_r_r4 = cpy_r_r3 == 0;
    cpy_r_r5 = cpy_r_n & 1;
    cpy_r_r6 = cpy_r_r5 == 0;
    cpy_r_r7 = cpy_r_r4 & cpy_r_r6;
    if (cpy_r_r7) {
        goto CPyL3;
    } else
        goto CPyL4;
CPyL3: ;
    cpy_r_r8 = (Py_ssize_t)cpy_r_r1 < (Py_ssize_t)cpy_r_n;
    cpy_r_r2 = cpy_r_r8;
    goto CPyL5;
CPyL4: ;
    cpy_r_r9 = CPyTagged_IsLt_(cpy_r_r1, cpy_r_n);
    cpy_r_r2 = cpy_r_r9;
    goto CPyL5;
CPyL5: ;
    if (cpy_r_r2) {
        goto CPyL6;
    } else
        goto CPyL11;
CPyL6: ;
    cpy_r_r10 = CPyTagged_StealAsObject(cpy_r_i);
    cpy_r_r11 = PyList_Append(cpy_r_r0, cpy_r_r10);
    CPy_DecRef(cpy_r_r10);
    cpy_r_r12 = cpy_r_r11 >= 0;
    if (unlikely(!cpy_r_r12)) {
        CPy_AddTraceback("test.py", "f", 2, CPyStatic_globals);
        goto CPyL12;
    } else
        goto CPyL7;
CPyL7: ;
    cpy_r_r13 = CPyTagged_Add(cpy_r_r1, 2);
    CPyTagged_DecRef(cpy_r_r1);
    CPyTagged_IncRef(cpy_r_r13);
    cpy_r_r1 = cpy_r_r13;
    cpy_r_i = cpy_r_r13;
    goto CPyL2;
CPyL8: ;
    cpy_r_r14 = PySet_New(cpy_r_r0);
    CPy_DecRef(cpy_r_r0);
    if (unlikely(cpy_r_r14 == NULL)) {
        CPy_AddTraceback("test.py", "f", 2, CPyStatic_globals);
        goto CPyL10;
    } else
        goto CPyL9;
CPyL9: ;
    cpy_r_x = cpy_r_r14;
    CPy_DecRef(cpy_r_x);
    return 1;
CPyL10: ;
    cpy_r_r15 = 2;
    return cpy_r_r15;
CPyL11: ;
    CPyTagged_DecRef(cpy_r_r1);
    CPyTagged_DecRef(cpy_r_i);
    goto CPyL8;
CPyL12: ;
    CPy_DecRef(cpy_r_r0);
    CPyTagged_DecRef(cpy_r_r1);
    goto CPyL10;
}

This branch produces

char CPyDef_f(CPyTagged cpy_r_n) {
    PyObject *cpy_r_r0;
    CPyTagged cpy_r_r1;
    CPyTagged cpy_r_i;
    char cpy_r_r2;
    int64_t cpy_r_r3;
    char cpy_r_r4;
    int64_t cpy_r_r5;
    char cpy_r_r6;
    char cpy_r_r7;
    char cpy_r_r8;
    char cpy_r_r9;
    PyObject *cpy_r_r10;
    int32_t cpy_r_r11;
    char cpy_r_r12;
    CPyTagged cpy_r_r13;
    PyObject *cpy_r_r14;
    PyObject *cpy_r_x;
    char cpy_r_r15;
CPyL0: ;
    cpy_r_r0 = PySet_New(NULL);
    if (unlikely(cpy_r_r0 == NULL)) {
        CPy_AddTraceback("test.py", "f", 2, CPyStatic_globals);
        goto CPyL10;
    } else
        goto CPyL1;
CPyL1: ;
    cpy_r_r1 = 0;
    CPyTagged_IncRef(cpy_r_r1);
    cpy_r_i = cpy_r_r1;
    goto CPyL2;
CPyL2: ;
    cpy_r_r3 = cpy_r_r1 & 1;
    cpy_r_r4 = cpy_r_r3 == 0;
    cpy_r_r5 = cpy_r_n & 1;
    cpy_r_r6 = cpy_r_r5 == 0;
    cpy_r_r7 = cpy_r_r4 & cpy_r_r6;
    if (cpy_r_r7) {
        goto CPyL3;
    } else
        goto CPyL4;
CPyL3: ;
    cpy_r_r8 = (Py_ssize_t)cpy_r_r1 < (Py_ssize_t)cpy_r_n;
    cpy_r_r2 = cpy_r_r8;
    goto CPyL5;
CPyL4: ;
    cpy_r_r9 = CPyTagged_IsLt_(cpy_r_r1, cpy_r_n);
    cpy_r_r2 = cpy_r_r9;
    goto CPyL5;
CPyL5: ;
    if (cpy_r_r2) {
        goto CPyL6;
    } else
        goto CPyL11;
CPyL6: ;
    cpy_r_r10 = CPyTagged_StealAsObject(cpy_r_i);
    cpy_r_r11 = PySet_Add(cpy_r_r0, cpy_r_r10);
    CPy_DecRef(cpy_r_r10);
    cpy_r_r12 = cpy_r_r11 >= 0;
    if (unlikely(!cpy_r_r12)) {
        CPy_AddTraceback("test.py", "f", 2, CPyStatic_globals);
        goto CPyL12;
    } else
        goto CPyL7;
CPyL7: ;
    cpy_r_r13 = CPyTagged_Add(cpy_r_r1, 2);
    CPyTagged_DecRef(cpy_r_r1);
    CPyTagged_IncRef(cpy_r_r13);
    cpy_r_r1 = cpy_r_r13;
    cpy_r_i = cpy_r_r13;
    goto CPyL2;
CPyL8: ;
    cpy_r_r14 = PySet_New(cpy_r_r0);
    CPy_DecRef(cpy_r_r0);
    if (unlikely(cpy_r_r14 == NULL)) {
        CPy_AddTraceback("test.py", "f", 2, CPyStatic_globals);
        goto CPyL10;
    } else
        goto CPyL9;
CPyL9: ;
    cpy_r_x = cpy_r_r14;
    CPy_DecRef(cpy_r_x);
    return 1;
CPyL10: ;
    cpy_r_r15 = 2;
    return cpy_r_r15;
CPyL11: ;
    CPyTagged_DecRef(cpy_r_r1);
    CPyTagged_DecRef(cpy_r_i);
    goto CPyL8;
CPyL12: ;
    CPy_DecRef(cpy_r_r0);
    CPyTagged_DecRef(cpy_r_r1);
    goto CPyL10;
}

Copy link
Collaborator

@TH3CHARLie TH3CHARLie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the updates! Looks good! The generated code shows we successfully avoid the needless list construction.

I'm only concerned with CPyL8, do we really need an extra set creation here?

cc @JukkaL

@JukkaL
Copy link
Collaborator

JukkaL commented Nov 11, 2020

The second set creation may defeat the purpose of this optimization, since we replace the construction of a list with a construction of a set, and creating a set is slower than creating a list. I think that we need to get rid of the second set creation, unless perhaps some benchmark can show that it doesn't matter.

@gvanrossum gvanrossum changed the title set(f(x) for x in it) no longer uses list comprehension (#771) [mypyc] set(f(x) for x in it) no longer uses list comprehension (#771) Nov 21, 2020
JukkaL pushed a commit that referenced this pull request Mar 31, 2021
Closes mypyc/mypyc#771 and #9707.

Changes:

* Move code from tranform_set_comprehension to translate_set_comprehension
* Fix the unnecessary set creation (#9707 (comment))
@TH3CHARLie
Copy link
Collaborator

This PR has been covered by #10261

@TH3CHARLie TH3CHARLie closed this May 7, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants