-
-
Notifications
You must be signed in to change notification settings - Fork 33.2k
bpo-44525: Copy freevars in bytecode to allow calls to inner functions to be specialized #29595
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bpo-44525: Copy freevars in bytecode to allow calls to inner functions to be specialized #29595
Conversation
🤖 New build scheduled with the buildbot fleet by @markshannon for commit 4fbaa99 🤖 If you want to schedule another build, you need to add the ":hammer: test-with-buildbots" label again. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly LGTM.
I'm pretty sure there are a few slight (and very unlikely) races here. I also have a couple questions about the purpose of f_globals
and f_builtins
on the frame.
PyObject *f_globals; /* Borrowed reference */ | ||
PyObject *f_builtins; /* Borrowed reference */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, we avoid extra refcount operations for these, since f_func
is guaranteed to hold a ref, right? Cool!
The only problem I see is that the function's func_globals
(or func_builtins
) could get swapped out in another thread while this frame is executing. Then we'd lose that borrowed reference and potentially segfault. It's an unlikely case, but still a possibility.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having these on struct _interpreter_frame
is helpful because we avoid extra pointer indirection, right?
However, that only matters in cases where these are used relatively frequently (so the cost of the indirection adds up), i.e. in the eval loop (via GLOBALS()
). If we do not expect frame->f_globals
to change during the eval loop then could we move both to local variables before starting the loop (and drop them from struct _interpreter_frame
altogether)?
Note that then we would have a different possible race where the function's func_globals
could get swapped out while the frame is running, leading to weirdness.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, why don't we similarly add f_closure
to the frame? Is it because we want to avoid increasing the size of struct _interpreter_frame
(and we don't actually use the closure that much, so the cost of the indirection is bearable)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, why don't we similarly add
f_closure
to the frame? Is it because we want to avoid increasing the size ofstruct _interpreter_frame
(and we don't actually use the closure that much, so the cost of the indirection is bearable)?
We might want to avoid creating a closure object at all. We could put the cells at the end of the function object, rather than in a separate tuple. but that's future work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like you skipped over the two other comments here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The frame has a strong reference to the function, which has a strong reference to both the globals and builtins.
So borrowed references to the globals and builtins are OK.
If we do not expect frame->f_globals to change during the eval loop then could we move both to local variables before starting the loop (and drop them from struct _interpreter_frame altogether)?
And where would you store those references across calls?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we keep f_func
on the frame then that's where we'd get them.
When you're done making the requested changes, leave the comment: |
🤖 New build scheduled with the buildbot fleet by @markshannon for commit 51f2e49 🤖 If you want to schedule another build, you need to add the ":hammer: test-with-buildbots" label again. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
…nctions to be specialized (pythonGH-29595)
…nctions to be specialized (pythonGH-29595) * Make internal APIs that take PyFrameConstructor take a PyFunctionObject instead. * Add reference to function to frame, borrow references to builtins and globals. * Add COPY_FREE_VARS instruction to allow specialization of calls to inner functions.
…er (#87) This PR skips frame annotation completeness tests for Python >= 3.11. Python 3.11 added a reference to the function from the frame (see python/cpython#29595), but didn't expose that reference at Python level.
This should speedup calls to functions, by allowing us to specialize calls to functions with free variables without penalizing calls to other functions.
However, there is a bit of yak-shaving involved:
eval
and friends, rather than just passing aFrameDescriptor
.Although the speedup is probably not significant this is worthwhile as it makes more operations explicit in the bytecode which what we want when optimizing.
https://bugs.python.org/issue44525