[mypyc] Call generator helper method directly in await expression by JukkaL · Pull Request #19376 · python/mypy · GitHub

@JukkaL JukkaL commented Jul 4, 2025

Previously, calls like `await foo()` were compiled to code that included something like this (in Python-like pseudocode):

```
a = foo()
...
b = get_coro(a)
...
c = next(b)
```

In the above code, `get_coro(a)` just returns `a` if `foo` is a native async function, so we now optimize this call away. Also, `next(b)` calls `b.__next__()`, which in turn calls the generated generator helper method `__mypyc_generator_helper__`. Now we call the helper method directly, which saves some unnecessary calls.
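The two shortcuts described above can be pictured with a plain-Python sketch. Everything here is a hypothetical stand-in: `IncGen` plays the role of a generated generator class, its `helper` method stands in for `__mypyc_generator_helper__` (whose real signature differs and which is emitted as C), and `get_coro` only models the identity case for native async functions.

```python
class IncGen:
    """Hypothetical stand-in for a generated generator class."""

    def __init__(self, x: int) -> None:
        self.x = x
        self.calls: list[str] = []  # records which methods were entered

    def helper(self) -> int:
        # Stand-in for __mypyc_generator_helper__: runs the function body
        # and reports the return value through StopIteration.
        self.calls.append("helper")
        raise StopIteration(self.x + 1)

    def __next__(self) -> int:
        # The generic protocol entry point only forwards to the helper.
        self.calls.append("__next__")
        return self.helper()


def get_coro(obj: IncGen) -> IncGen:
    # For a native generator object this is the identity (assumption of
    # this sketch), so the call can be skipped entirely.
    return obj


def await_old(gen: IncGen) -> int:
    # Old shape: get_coro(), then next(), which goes through __next__.
    it = get_coro(gen)
    try:
        next(it)
    except StopIteration as exc:
        return exc.value
    raise RuntimeError("this sketch only models a non-blocking await")


def await_new(gen: IncGen) -> int:
    # New shape: call the helper directly, skipping get_coro and __next__.
    try:
        gen.helper()
    except StopIteration as exc:
        return exc.value
    raise RuntimeError("this sketch only models a non-blocking await")
```

Both functions produce the same value; the second simply enters fewer methods on the way.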

More importantly, since we now call `__mypyc_generator_helper__` directly, a follow-up PR can easily change the way it is called. This makes it possible to avoid raising a `StopIteration` exception in many await expressions. The goal of this PR is to prepare for that optimization; by itself, it doesn't improve performance significantly.
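For context, this is how a coroutine's return value normally reaches its caller in Python: it travels as the `value` attribute of a `StopIteration` exception. The small driver below (the name `drive` is made up for this illustration) shows the protocol for a coroutine that never suspends.

```python
async def inc(x: int) -> int:
    return x + 1


def drive(coro):
    """Run a coroutine that never suspends and return its result."""
    try:
        coro.send(None)  # start the coroutine
    except StopIteration as exc:
        # The return value arrives wrapped in the exception.
        return exc.value
    coro.close()
    raise RuntimeError("coroutine suspended; this sketch only handles the non-blocking case")


print(drive(inc(41)))
```

Creating and catching that exception on every non-blocking `await` is the cost the follow-up optimization aims to avoid.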

In order to call the helper method directly, I had to generate the declaration of this method and the generated generator class before the main irbuild pass, since otherwise a call site could be processed before we have processed the called generator.
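The ordering constraint can be sketched with a toy two-pass builder. This is only an illustration under simplified assumptions: `FuncDef`, `declare_generators`, and `build_calls` are hypothetical names, not mypyc APIs, and the real passes operate on mypy's AST and mypyc IR.

```python
from dataclasses import dataclass, field


@dataclass
class FuncDef:
    """Toy function definition (hypothetical, not mypy's FuncDef)."""
    name: str
    is_async: bool
    awaits: list[str] = field(default_factory=list)  # names of awaited callees


def declare_generators(funcs: list[FuncDef]) -> dict[str, str]:
    # Pass 1: give every async function a generator class name up front.
    return {f.name: f"{f.name}_gen" for f in funcs if f.is_async}


def build_calls(funcs: list[FuncDef], declared: dict[str, str]) -> dict[tuple[str, str], str]:
    # Pass 2: each await site can look up its callee's generator class,
    # no matter in which order the functions are processed.
    plan = {}
    for f in funcs:
        for callee in f.awaits:
            kind = "direct helper call" if callee in declared else "generic await"
            plan[(f.name, callee)] = kind
    return plan


# "bench" awaits "inc" before "inc" has been processed by the second pass.
funcs = [FuncDef("bench", True, ["inc", "external"]), FuncDef("inc", True)]
plan = build_calls(funcs, declare_generators(funcs))
```

Without the first pass, the call site in `bench` would be handled before anything is known about `inc`, and it would have to fall back to the generic path.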

I also improved test coverage of related functionality. We don't have an IR test for async calls, since the IR is very verbose. I manually inspected the generated IR to verify that the new code path works both when calling a top-level function and when calling a method. I'll later add a mypyc benchmark to ensure that we will notice if the performance of async calls is degraded.

```
# Give a more precise type for generators, so that we can optimize
# code that uses them. They return a generator object, which has a
# specific class. Without this, the type would have to be 'object'.
ret: RType = RInstance(self.fdef_to_generator[fdef])
```

This is the first part of the optimization.

```
    and val.type.class_ir.has_method(helper_method)
):
    # This is a generated generator class, and we can use a fast path.
    iter_val: Value = val
```

The changes in this function implement the bulk of the optimization.

@ilevkivskyi ilevkivskyi left a comment


LG, just one suggestion.

```
    and val.type.class_ir.has_method(helper_method)
):
    # This is a generated generator class, and we can use a fast path.
    iter_val: Value = val
```

Maybe add a bit more detailed comment (essentially summarizing the PR description) so we will not forget the motivation.

@JukkaL JukkaL merged commit ada0d2a into master Jul 7, 2025
13 checks passed
@JukkaL JukkaL deleted the mypyc-await-optimize-1 branch July 7, 2025 14:43
JukkaL added a commit that referenced this pull request Jul 8, 2025
When calling a native async function using `await`, e.g. `await foo()`,
avoid raising `StopIteration` to pass the return value, since this is
expensive. Instead, pass an extra `PyObject **` argument to the
generator helper method and use that to return the return value. This is
mostly helpful when there are many calls using await that don't block
(e.g. there is a fast path that is usually taken that doesn't block).
When awaiting from non-compiled code, the slow path is still taken.

This builds on top of #19376.

This PR makes this microbenchmark about 3x faster, which is about the
ideal scenario for this optimization:
```
import asyncio
from time import time

async def inc(x: int) -> int:
    return x + 1


async def bench(n: int) -> int:
    x = 0
    for i in range(n):
        x = await inc(x)
    return x

asyncio.run(bench(1000))

t0 = time()
asyncio.run(bench(1000 * 1000 * 200))
print(time() - t0)
```
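The out-parameter idea from the commit message above can be sketched in plain Python. This is an illustration only: the real change passes a `PyObject **` to the C-level helper, while here a one-element list stands in for that pointer, and `IncGen`, `helper`, `await_native`, and `await_generic` are hypothetical names.

```python
from typing import Optional


class IncGen:
    """Hypothetical stand-in for the generator object of `inc`."""

    def __init__(self, x: int) -> None:
        self.x = x

    def helper(self, out: Optional[list] = None) -> None:
        result = self.x + 1
        if out is not None:
            # Fast path: store the result in the caller's slot, no exception.
            out.append(result)
            return None
        # Slow path (e.g. awaited from non-compiled code): use StopIteration.
        raise StopIteration(result)


def await_native(gen: IncGen) -> int:
    # Compiled caller: provides a slot for the return value.
    out: list = []
    gen.helper(out)
    return out[0]


def await_generic(gen: IncGen) -> int:
    # Generic caller: still receives the value through StopIteration.
    try:
        gen.helper()
    except StopIteration as exc:
        return exc.value
    raise RuntimeError("this sketch only models a non-blocking await")
```

Both paths return the same value; only the first avoids constructing and unwinding an exception.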
@Chainfire

```
from typing import Type

import asyncio
import traceback


class Parent:
    async def test(self) -> int:
        raise NotImplementedError()

    async def run(self) -> int:
        return await self.test()


class ChildTypeError1(Parent):
    async def test(self) -> int:
        return 1


class ChildTypeError2(Parent):
    async def run(self) -> int:
        return await self.test()


def test(cls: Type[Parent]) -> None:
    print(str(cls.__name__))
    try:
        print("- ", asyncio.run(cls().run()))
    except NotImplementedError:
        pass
    except Exception:
        traceback.print_exc()
        exit(1)


test(ChildTypeError1)
test(ChildTypeError2)
exit(0)
```

Results in:

ada0d2a3b
```
running build_ext
building 'test' extension
creating build/temp.linux-x86_64-cpython-312/build
x86_64-linux-gnu-gcc -fno-strict-overflow -Wsign-compare -DNDEBUG -g -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -fPIC -I/home/jorrit/python/omnes/mypy/mypyc/lib-rt -I/home/jorrit/python/omnes/omnes-router-services/.venv/include -I/usr/include/python3.12 -c build/__native.c -o build/temp.linux-x86_64-cpython-312/build/__native.o -O3 -g1 -Werror -Wno-unused-function -Wno-unused-label -Wno-unreachable-code -Wno-unused-variable -Wno-unused-command-line-argument -Wno-unknown-warning-option -Wno-unused-but-set-variable -Wno-ignored-optimization-argument -Wno-cpp
creating build/lib.linux-x86_64-cpython-312
x86_64-linux-gnu-gcc -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -g -fwrapv -O2 build/temp.linux-x86_64-cpython-312/build/__native.o -L/usr/lib/x86_64-linux-gnu -o build/lib.linux-x86_64-cpython-312/test.cpython-312-x86_64-linux-gnu.so
copying build/lib.linux-x86_64-cpython-312/test.cpython-312-x86_64-linux-gnu.so ->
ChildTypeError1
Traceback (most recent call last):
  File "test.py", line 28, in test
    print("- ", asyncio.run(cls().run()))
  File "/usr/lib/python3.12/asyncio/runners.py", line 195, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/asyncio/base_events.py", line 691, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "test.py", line 12, in run
    return await self.test()
TypeError: test.test_Parent_gen object expected; got test.test_ChildTypeError1_gen
```

Ping #19764

@CoolCat467

The error message and high-level reproduction look awfully similar to #19558; I expect they both stem from the same issue.

@Chainfire

Chainfire commented Sep 16, 2025

> Error message and highlevel reproduction looks awfully similar to #19558, I expect they might both be stemming from the same issue.

They do not. Your issue exists prior to this merge - the code from your opening comment errors out if I go further back in history than this commit.

@BobTheBuidler

BobTheBuidler commented Sep 16, 2025

At a quick glance, it looks like if Parent is not @final and Parent.test is not @final, it will need to use object_rprimitive.

edit: oh crap, sorry guy named @final I didn't mean to tag you
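One way to picture the failure in the traceback reported earlier in the thread, and the suggestion in this comment, is the plain-Python sketch below. It is not the actual fix and does not use mypyc internals: `ParentGen`, `ChildGen`, and both `await_*` functions are hypothetical, and the explicit `type(...)` check only mimics the runtime check implied by a precise return type.

```python
class ParentGen:
    """Stand-in for the generator class generated for Parent.test."""


class ChildGen:
    """Stand-in for the generator class generated for the override."""


class Parent:
    def test(self) -> object:
        return ParentGen()


class Child(Parent):
    def test(self) -> object:
        # The override produces an unrelated generator class.
        return ChildGen()


def await_with_precise_type(obj: Parent) -> ParentGen:
    # The call site assumes the statically known method's generator class,
    # but the call is dispatched virtually, so an override can break it.
    gen = obj.test()
    if type(gen) is not ParentGen:
        raise TypeError(
            f"{ParentGen.__name__} object expected; got {type(gen).__name__}"
        )
    return gen


def await_with_object_type(obj: Parent) -> object:
    # Treating the result as a plain object accepts any override.
    return obj.test()
```

Under this reading, the precise type is only safe when the method cannot be overridden, which is what the `@final` condition in the comment expresses.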

@JukkaL

JukkaL commented Sep 17, 2025

I created an issue about the regression: mypyc/mypyc#1141

Let's move further discussion there.
