gh-140149: use `PyBytesWriter` in `action_helpers.c`'s `_build_concatenated_bytes`; 3x faster `bytes` concat in the parser #140150

maurycy · 2025-10-15T08:35:32Z

The issue gh-140149 provides more details.

This makes concatenation of adjacent bytes literals about 3x faster in the parser, for syntax like:

x = (b'meow')
y = (b'meow' b'cow')
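
For context, a small sketch of what "bytes concat in the parser" means in practice: adjacent bytes literals are merged at parse time, so the AST holds one constant rather than a runtime concatenation. This uses only the standard `ast` module; the variable names are illustrative.

```python
import ast

# Adjacent bytes literals are merged by the parser into a single constant,
# so no concatenation happens when the code later runs.
tree = ast.parse("y = (b'meow' b'cow')")
node = tree.body[0].value

print(type(node).__name__)  # Constant
print(node.value)           # b'meowcow'
```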

Benchmark

The script:

from __future__ import annotations

import ast
import pyperf


def make_src(n, chunk_len, per_line: int = 64):
    assert n > 0 and chunk_len >= 0 and per_line > 0
    chunk = "b'" + ("x" * chunk_len) + "'"
    parts = [chunk] * n
    lines = ["x = ("]
    while parts:
        group = " ".join(parts[:per_line])
        parts = parts[per_line:]
        lines.append(f"    {group}")
    lines.append(")")
    return "\n".join(lines)


def bench_compile(loops, n, chunk_len, per_line: int = 64):
    src = make_src(n, chunk_len, per_line)
    t0 = pyperf.perf_counter()
    for _ in range(loops):
        compile(src, "<bench>", "exec")
    return pyperf.perf_counter() - t0


def bench_ast_parse(loops, n, chunk_len, per_line: int = 64):
    src = make_src(n, chunk_len, per_line)
    t0 = pyperf.perf_counter()
    for _ in range(loops):
        ast.parse(src, filename="<bench>", mode="exec")
    return pyperf.perf_counter() - t0


def main():
    runner = pyperf.Runner()

    for n in (1, 2, 4, 8, 16, 32, 64, 128, 1024):
        runner.bench_time_func(
            f"compile_bytes_concat_n{n}_chunk1",
            bench_compile,
            n,
            1,
        )
        runner.bench_time_func(
            f"parse_bytes_concat_n{n}_chunk1",
            bench_ast_parse,
            n,
            1,
        )

    for n, chunk in ((256, 4), (4, 128), (4, 256)):
        runner.bench_time_func(
            f"compile_bytes_concat_n{n}_chunk{chunk}",
            bench_compile,
            n,
            chunk,
        )
        runner.bench_time_func(
            f"parse_bytes_concat_n{n}_chunk{chunk}",
            bench_ast_parse,
            n,
            chunk,
        )


if __name__ == "__main__":
    main()
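
To make the benchmark input concrete, here is a minimal sketch that repeats the `make_src` helper from the script and prints what it generates for a tiny case. The argument values (3 chunks of length 1, 2 per line) are picked only for illustration and are not among the benchmarked configurations.

```python
def make_src(n, chunk_len, per_line=64):
    # Same logic as the helper in the benchmark script.
    assert n > 0 and chunk_len >= 0 and per_line > 0
    chunk = "b'" + ("x" * chunk_len) + "'"
    parts = [chunk] * n
    lines = ["x = ("]
    while parts:
        group = " ".join(parts[:per_line])
        parts = parts[per_line:]
        lines.append(f"    {group}")
    lines.append(")")
    return "\n".join(lines)


src = make_src(3, 1, per_line=2)
print(src)
# x = (
#     b'x' b'x'
#     b'x'
# )

# The generated module assigns the concatenated literal to x.
namespace = {}
exec(compile(src, "<demo>", "exec"), namespace)
print(namespace["x"])  # b'xxx'
```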

The results (with --rigorous, on 9955759):

Benchmark	main	peg-pybytes-bytes-concat-single-alloc
compile_bytes_concat_n1_chunk1	9.89 us	3.68 us: 2.68x faster
parse_bytes_concat_n1_chunk1	6.97 us	2.55 us: 2.73x faster
compile_bytes_concat_n2_chunk1	10.4 us	3.72 us: 2.79x faster
parse_bytes_concat_n2_chunk1	7.33 us	2.63 us: 2.78x faster
compile_bytes_concat_n4_chunk1	10.8 us	3.90 us: 2.78x faster
parse_bytes_concat_n4_chunk1	7.88 us	2.84 us: 2.77x faster
compile_bytes_concat_n8_chunk1	11.5 us	4.14 us: 2.78x faster
parse_bytes_concat_n8_chunk1	8.51 us	3.04 us: 2.80x faster
compile_bytes_concat_n16_chunk1	13.0 us	4.71 us: 2.76x faster
parse_bytes_concat_n16_chunk1	9.89 us	3.57 us: 2.77x faster
compile_bytes_concat_n32_chunk1	15.4 us	5.59 us: 2.75x faster
parse_bytes_concat_n32_chunk1	12.6 us	4.43 us: 2.85x faster
compile_bytes_concat_n64_chunk1	20.6 us	7.15 us: 2.88x faster
parse_bytes_concat_n64_chunk1	17.6 us	5.99 us: 2.94x faster
compile_bytes_concat_n128_chunk1	30.6 us	10.0 us: 3.05x faster
parse_bytes_concat_n128_chunk1	27.6 us	8.84 us: 3.12x faster
compile_bytes_concat_n1024_chunk1	165 us	48.9 us: 3.38x faster
parse_bytes_concat_n1024_chunk1	162 us	47.8 us: 3.40x faster
compile_bytes_concat_n256_chunk4	60.2 us	18.4 us: 3.27x faster
parse_bytes_concat_n256_chunk4	57.2 us	16.9 us: 3.38x faster
compile_bytes_concat_n4_chunk128	12.7 us	5.11 us: 2.47x faster
parse_bytes_concat_n4_chunk128	9.64 us	3.83 us: 2.51x faster
compile_bytes_concat_n4_chunk256	14.1 us	5.96 us: 2.36x faster
parse_bytes_concat_n4_chunk256	10.9 us	4.57 us: 2.38x faster
Geometric mean	(ref)	2.84x faster
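
As a sanity check on the last row, the sketch below recomputes the geometric mean from the rounded per-benchmark speedups in the table. It assumes the summary row is the geometric mean of those ratios; because the inputs here are already rounded to two decimals, the result can differ from the reported 2.84x in the last digit.

```python
from statistics import geometric_mean

# "x faster" column from the table above, in row order.
speedups = [
    2.68, 2.73, 2.79, 2.78, 2.78, 2.77, 2.78, 2.80,
    2.76, 2.77, 2.75, 2.85, 2.88, 2.94, 3.05, 3.12,
    3.38, 3.40, 3.27, 3.38, 2.47, 2.51, 2.36, 2.38,
]

gm = geometric_mean(speedups)
print(f"{gm:.3f}x faster over {len(speedups)} benchmarks")
```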

The environment:

% ./python -c "import sysconfig; print(sysconfig.get_config_var('CONFIG_ARGS'))"
'--enable-optimizations' '--with-lto'

`sudo ./python -m pyperf system tune` was run before benchmarking.

maurycy · 2025-10-15T08:36:32Z

cc @vstinner @cmaloney

pablogsal · 2025-10-15T12:16:11Z

Parser/action_helpers.c

-        PyBytes_Concat(&res, elem->v.Constant.value);
+        Py_ssize_t part = PyBytes_GET_SIZE(elem->v.Constant.value);
+        if (part > 0) {
+            memcpy(out, PyBytes_AS_STRING(elem->v.Constant.value), part);


Why not use PyBytesWriter_WriteBytes here?

vstinner · 2025-10-15T12:20:37Z

PyBytesWriter_WriteBytes() grows the buffer if needed. It's not needed since the code already computes the total size in advance.
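
A Python-level analogy of the two-pass approach discussed here, offered as a sketch rather than a translation of the C change: compute the total size first, allocate once, then copy each part at a running offset. `concat_preallocated` is a hypothetical helper name, and the slice assignment stands in for `memcpy`.

```python
def concat_preallocated(parts):
    """Concatenate bytes objects with a single up-front allocation.

    Hypothetical helper: it mirrors the idea of the change (size known in
    advance, so no growth checks are needed), not the actual C code.
    """
    # Pass 1: the output size is the sum of the part sizes.
    total = sum(len(p) for p in parts)
    out = bytearray(total)

    # Pass 2: copy each non-empty part at the current offset.
    offset = 0
    for p in parts:
        n = len(p)
        if n > 0:
            out[offset:offset + n] = p
            offset += n

    # bytes() makes one final copy here; the C code has no such extra step.
    return bytes(out)


print(concat_preallocated([b"meow", b"", b"cow"]))  # b'meowcow'
```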

vstinner · 2025-10-15T12:26:16Z

vstinner added the skip news label

This optimization is good to have, but I don't think that users will notice since the parser is only run once at Python startup. I don't think that it's worth it to document this optimization.

pablogsal · 2025-10-15T12:26:46Z

vstinner added the skip news label

This optimization is good to have, but I don't think that users will notice since the parser is only run once at Python startup. I don't think that it's worth it to document this optimization.

I concur, but on the other hand it doesn't hurt

vstinner

LGTM

cmaloney · 2025-10-15T17:46:04Z

Parser/action_helpers.c

-    PyObject* kind = asdl_seq_GET(strings, 0)->v.Constant.kind;
+    PyObject *kind = asdl_seq_GET(strings, 0)->v.Constant.kind;
+
+    Py_ssize_t total = 0;


Bit of a meta question: how much performance is lost by not precalculating the length? The precalculation + memcpy makes this code quite a bit more complex. If it's only a couple of percentage points of difference (so most of the 3x is kept), the simpler code would be nice.

If you use PyBytesWriter_Create(0), you have to resize the buffer multiple times, which is less efficient. I don't know by how much. But here the output size is easy to compute, so I think it's worth precomputing it.
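
To put a rough number on the trade-off, here is a worst-case cost model, not a measurement. It assumes every append allocates a fresh buffer and copies everything written so far. A real writer may overallocate or be extended in place, so actual costs are likely lower. Both function names are hypothetical.

```python
def bytes_copied_repeated_growth(sizes):
    # Worst case: each append reallocates and copies old content plus the new part.
    copied = 0
    current = 0
    for size in sizes:
        current += size
        copied += current
    return copied


def bytes_copied_single_alloc(sizes):
    # With the total known up front, each byte is copied exactly once.
    return sum(sizes)


sizes = [1] * 1024  # same shape as the n1024_chunk1 benchmark case
print(bytes_copied_repeated_growth(sizes))  # 524800
print(bytes_copied_single_alloc(sizes))     # 1024
```

Under this model the copy volume grows quadratically with the number of parts when the buffer is regrown each time, and linearly when the size is precomputed.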

vstinner · 2025-10-16T17:25:09Z

Merged, thanks for this nice optimization!

use PyBytesWriter in _build_concatenated_bytes

692a4f0

maurycy requested review from lysnikolaou and pablogsal as code owners October 15, 2025 08:35

bedevere-app bot mentioned this pull request Oct 15, 2025

Use PyBytesWriter API in PEG parser's _build_concatenated_bytes, avoid quadratic memory allocations #140149

Closed

bedevere-app bot added the awaiting review label Oct 15, 2025

maurycy changed the title ~~gh-140149: use PyBytesWriter in action_helpers.c's _build_concatenated_bytes; 3x speed up for bytes concat in the parser~~ gh-140149: use PyBytesWriter in action_helpers.c's _build_concatenated_bytes; 3x faster bytes concat in the parser Oct 15, 2025

pablogsal reviewed Oct 15, 2025

vstinner reviewed Oct 15, 2025

vstinner added the skip news label Oct 15, 2025

maurycy and others added 4 commits October 15, 2025 17:03

no var read twice

904fe91

NEWS

bde6eaf

Update Parser/action_helpers.c

e5b07d6

Co-authored-by: Victor Stinner <vstinner@python.org>

Update Parser/action_helpers.c

9955759

Co-authored-by: Victor Stinner <vstinner@python.org>

vstinner approved these changes Oct 15, 2025

bedevere-app bot added awaiting merge and removed awaiting review labels Oct 15, 2025

Update Misc/NEWS.d/next/Core_and_Builtins/2025-10-15-17-12-32.gh-issu…

3d375ce

…e-140149.cy1m3d.rst

cmaloney reviewed Oct 15, 2025

vstinner merged commit 459d493 into python:main Oct 16, 2025
45 checks passed

bedevere-app bot removed the awaiting merge label Oct 16, 2025

maurycy deleted the peg-pybytes-bytes-concat-single-alloc branch October 16, 2025 17:48
