-
-
Notifications
You must be signed in to change notification settings - Fork 33.2k
gh-98003: Inline call frames for CALL_FUNCTION_EX #98004
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
gh-98003: Inline call frames for CALL_FUNCTION_EX #98004
Conversation
Looks like not much difference:
Perhaps we should try inlining all |
@markshannon gentle ping for a review please. |
The approach I would prefer is to:
There is a lot unnecessary dismantling of tuples and dicts in the current approach. This is not a problem with this PR, but I think it needs to be sorted out before "inlining" the call in Ideally
Where |
I'm confused. The first step is to create a frame, but then in your example code you show it creating another frame again. Why does it do that? |
Because we don't really create a frame. We push a frame and then initialize it. Pushing a frame is simple. It is initializing that is complex and we want to factor out. |
Just curious, has this stalled out for now? At the very least, the changes in Totally get it if you're busy with school or something right now though! :) Just making sure that you aren't blocked on reviews or design discussions. If so, maybe we can move this forward (I'd really like to see it in)! |
Yeah my exams end next week. I plan to start contributing actively again after then! |
@brandtbucher and @markshannon I think this is ready for your reviews (when you have time of course). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This approach creates a lot of intermediate objects.
I'm not sure if that is a result of the transformations in this PR, or if the original code also did created those intermediate objects.
If the objects were already being created, then maybe we should streamline it in another PR.
@Fidget-Spinner what do you think?
Python/bytecodes.c
Outdated
int code_flags = ((PyCodeObject *)PyFunction_GET_CODE(func))->co_flags; | ||
PyObject *locals = code_flags & CO_OPTIMIZED ? NULL : Py_NewRef(PyFunction_GET_GLOBALS(func)); | ||
|
||
_PyInterpreterFrame *new_frame = _PyEvalFramePushAndInit_Ex(tstate, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would make sense to move the unpacking of the tuple into _PyEvalFramePushAndInit_Ex
.
_PyInterpreterFrame *new_frame = _PyEvalFramePushAndInit_Ex(tstate, func, locals, callargs, kwargs).
CALL_FUNCTION_EX
is quite a large instruction. Pushing more code into _PyEvalFramePushAndInit_Ex
would keep it smaller.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unpacking which tuple?
If you mean move the code above in, that would break convention with the rest of the code base. The other inline calls do these
int code_flags = ((PyCodeObject *)PyFunction_GET_CODE(func))->co_flags;
PyObject *locals = code_flags & CO_OPTIMIZED ? NULL : Py_NewRef(PyFunction_GET_GLOBALS(func));
before creating the new frame.
Py_INCREF(PyTuple_GET_ITEM(callargs, i)); | ||
} | ||
} | ||
_PyInterpreterFrame *new_frame = _PyEvalFramePushAndInit( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than transforming callargs
and kwargs
into a shape suitable for _PyEvalFramePushAndInit
, it might make sense to push the frame, then unpack the tuple and dict into the the frame without the intermediate objects.
It complicates the errror handling a bit, and the frame will need to be cleaned up if there is an error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can work on this in another PR.
Thanks for this PR. Avoiding C stack consumption for these calls is definitely something we want. |
Yes these are the semantics of the original CALL_FUNCTION_EX. Most of these were copy pasted. I was surprised at how allocation heavy CALL_FUNCTION_EX is and have been thinking of ways to improve it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not an expert on the call sequence, but the code generally looks good to me! I'd be excited to see this in 3.12.
I'm not sure how this interacts with PEP 669, but I'm sure @markshannon can answer that pretty easily.
Also, I'd kinda like to see buildbots (for correctness) and benchmarks (for curiousity) on this before merging. I can run the benchmarks for you whenever you think it's ready.
Python/ceval.c
Outdated
PyObject *const *newargs = has_dict | ||
? _PyStack_UnpackDict(tstate, _PyTuple_ITEMS(callargs), | ||
nargs, kwargs, &kwnames) | ||
: &PyTuple_GET_ITEM(callargs, 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this safe if len(callargs) == 0
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be. When len(callargs) == 0
the args shouldn't be used by the vectorcall protocol.
Co-Authored-By: Brandt Bucher <brandt@python.org>
Not sure why all the |
You can run the benchmarks but I expect no change sadly. Extended function calls are used rarely in pyperformance. |
🤖 New build scheduled with the buildbot fleet by @Fidget-Spinner for commit dfd6b01 🤖 If you want to schedule another build, you need to add the 🔨 test-with-buildbots label again. |
The regen script needs to be run with
Understood. I just kicked off a run anyways. :) |
Co-authored-by: Carl Meyer <carl@oddbird.net>
Yep, no real change on pyperformance (as expected). |
Removing sprint because this issue missed the ones in 2022 and 2023. |
Yeah this issue was added in the 2022 CPython core dev sprint which us some ways back now. |
Thanks everyone! |
* main: (26 commits) pythongh-104028: Reduce object creation while calling callback function from gc (pythongh-104030) pythongh-104036: Fix direct invocation of test_typing (python#104037) pythongh-102213: Optimize the performance of `__getattr__` (pythonGH-103761) pythongh-103895: Improve how invalid `Exception.__notes__` are displayed (python#103897) Adjust expression from `==` to `!=` in alignment with the meaning of the paragraph. (pythonGH-104021) pythongh-88496: Fix IDLE test hang on macOS (python#104025) Improve int test coverage (python#104024) pythongh-88773: Added teleport method to Turtle library (python#103974) pythongh-104015: Fix direct invocation of `test_dataclasses` (python#104017) pythongh-104012: Ensure test_calendar.CalendarTestCase.test_deprecation_warning consistently passes (python#104014) pythongh-103977: compile re expressions in platform.py only if required (python#103981) pythongh-98003: Inline call frames for CALL_FUNCTION_EX (pythonGH-98004) Replace Netlify with Read the Docs build previews (python#103843) Update name in acknowledgements and add mailmap (python#103696) pythongh-82054: allow test runner to split test_asyncio to execute in parallel by sharding. (python#103927) Remove non-existing tools from Sundry skiplist (python#103991) pythongh-103793: Defer formatting task name (python#103767) pythongh-87092: change assembler to use instruction sequence instead of CFG (python#103933) pythongh-103636: issue warning for deprecated calendar constants (python#103833) Various small fixes to dis docs (python#103923) ...
Uh oh!
There was an error while loading. Please reload this page.