-
-
Notifications
You must be signed in to change notification settings - Fork 33.2k
gh-140149: use PyBytesWriter in action_helpers.c's _build_concatenated_bytes; 3x faster bytes concat in the parser
#140150
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
gh-140149: use PyBytesWriter in action_helpers.c's _build_concatenated_bytes; 3x faster bytes concat in the parser
#140150
Conversation
PyBytesWriter in action_helpers.c's _build_concatenated_bytes; 3x speed up for bytes concat in the parserPyBytesWriter in action_helpers.c's _build_concatenated_bytes; 3x faster bytes concat in the parser
Parser/action_helpers.c
Outdated
| PyBytes_Concat(&res, elem->v.Constant.value); | ||
| Py_ssize_t part = PyBytes_GET_SIZE(elem->v.Constant.value); | ||
| if (part > 0) { | ||
| memcpy(out, PyBytes_AS_STRING(elem->v.Constant.value), part); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not using PyBytesWriter_WriteBytes here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PyBytesWriter_WriteBytes() grows the buffer if needed. It's not needed since the code already computes the total size in advance.
Parser/action_helpers.c
Outdated
| PyBytes_Concat(&res, elem->v.Constant.value); | ||
| Py_ssize_t part = PyBytes_GET_SIZE(elem->v.Constant.value); | ||
| if (part > 0) { | ||
| memcpy(out, PyBytes_AS_STRING(elem->v.Constant.value), part); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PyBytesWriter_WriteBytes() grows the buffer if needed. It's not needed since the code already computes the total size in advance.
This optimization is good to have, but I don't think that users will notice since the parser is only run once at Python startup. I don't think that it's worth it to document this optimization. |
I concur, but on the other hand it doesn't hurt |
Co-authored-by: Victor Stinner <vstinner@python.org>
Co-authored-by: Victor Stinner <vstinner@python.org>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Misc/NEWS.d/next/Core_and_Builtins/2025-10-15-17-12-32.gh-issue-140149.cy1m3d.rst
Outdated
Show resolved
Hide resolved
…e-140149.cy1m3d.rst
| PyObject* kind = asdl_seq_GET(strings, 0)->v.Constant.kind; | ||
| PyObject *kind = asdl_seq_GET(strings, 0)->v.Constant.kind; | ||
|
|
||
| Py_ssize_t total = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bit of a meta question: How much performance change is it to not pre-calculate the length? The precalculation + memcpy makes this code quite a bit more complex. If it's only a couple percentage difference (so most the 3x is kept) the simpler code for some of these would be nice
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you use PyBytesWriter_Create(0), you have to resize the buffer multiple times, it's less efficient. I don't know how much. But here, the output size is easy to compute so I think that it's worth it to precompute the output size.
|
Merged, thanks for this nice optimization! |
The issue gh-140149 provides more details.
This effectively makes
bytesconcatenation about 3x faster in the parser, syntax like:Benchmark
The script:
The results (with
--rigorous, on 9955759):The environment:
sudo ./python -m pyperf system tuneensured.