gh-119396: Optimize PyUnicode_FromFormat() UTF-8 decoder by vstinner · Pull Request #119398 · python/cpython · GitHub

vstinner (Member) commented May 22, 2024

Add unicode_decode_utf8_writer() to write characters directly into a _PyUnicodeWriter, which avoids creating a temporary string. Optimize PyUnicode_FromFormat() by using the new unicode_decode_utf8_writer().
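
As a rough illustration of the idea (a toy model in Python, not CPython's C code), the sketch below contrasts decoding each UTF-8 argument into a temporary buffer that is then copied with decoding straight into the output buffer. The names `decode_utf8_into` and `format_words` are hypothetical, and a plain list of code points stands in for the writer.

```python
def decode_utf8_into(data: bytes, out: list) -> None:
    """Append the code points of well-formed UTF-8 `data` to `out`.

    Toy decoder: it rejects truncated or malformed sequences, but it does
    not check for overlong encodings or surrogates.
    """
    i, n = 0, len(data)
    while i < n:
        lead = data[i]
        if lead < 0x80:                 # 0xxxxxxx: ASCII
            cp, extra = lead, 0
        elif lead >> 5 == 0b110:        # 110xxxxx: 2-byte sequence
            cp, extra = lead & 0x1F, 1
        elif lead >> 4 == 0b1110:       # 1110xxxx: 3-byte sequence
            cp, extra = lead & 0x0F, 2
        elif lead >> 3 == 0b11110:      # 11110xxx: 4-byte sequence
            cp, extra = lead & 0x07, 3
        else:
            raise ValueError(f"invalid lead byte at offset {i}")
        if i + extra >= n:
            raise ValueError(f"truncated sequence at offset {i}")
        for k in range(1, extra + 1):
            cont = data[i + k]
            if cont >> 6 != 0b10:       # continuation bytes are 10xxxxxx
                raise ValueError(f"invalid continuation byte at offset {i + k}")
            cp = (cp << 6) | (cont & 0x3F)
        out.append(cp)
        i += extra + 1


def format_words(parts, direct: bool) -> str:
    """Join UTF-8 encoded parts with spaces and append a final dot."""
    writer = []                          # stand-in for the output writer
    for index, part in enumerate(parts):
        if index:
            writer.append(ord(" "))
        if direct:
            # Decode straight into the writer: no intermediate object.
            decode_utf8_into(part, writer)
        else:
            # Decode into a temporary buffer, then copy it into the writer.
            tmp = []
            decode_utf8_into(part, tmp)
            writer.extend(tmp)
    writer.append(ord("."))
    return "".join(map(chr, writer))


PARTS = [b"format", b"multiple", b"utf8", b"short", b"strings"]
print(format_words(PARTS, direct=True))
```

Both paths produce the same text; the direct path simply skips one allocation and one copy per argument, which is the kind of saving the description refers to.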

Rename unicode_fromformat_write_cstr() to
unicode_fromformat_write_utf8().

Microbenchmark on the code:

return PyUnicode_FromFormat(
    "%s %s %s %s %s.",
    "format", "multiple", "utf8", "short", "strings");

Result: 620 ns +- 8 ns -> 382 ns +- 2 ns: 1.62x faster.

vstinner (Member Author) commented May 22, 2024

Benchmark:

diff --git a/Modules/_testcapimodule.c b/Modules/_testcapimodule.c
index f99ebf0dde..0752b2b1d2 100644
--- a/Modules/_testcapimodule.c
+++ b/Modules/_testcapimodule.c
@@ -3312,6 +3312,14 @@ function_set_warning(PyObject *Py_UNUSED(module), PyObject *Py_UNUSED(args))
     Py_RETURN_NONE;
 }
 
+static PyObject *
+bench(PyObject *Py_UNUSED(module), PyObject *Py_UNUSED(args))
+{
+    return PyUnicode_FromFormat(
+        "%s %s %s %s %s.",
+        "format", "multiple", "utf8", "short", "strings");
+}
+
 static PyMethodDef TestMethods[] = {
     {"set_errno",               set_errno,                       METH_VARARGS},
     {"test_config",             test_config,                     METH_NOARGS},
@@ -3454,6 +3462,7 @@ static PyMethodDef TestMethods[] = {
     {"check_pyimport_addmodule", check_pyimport_addmodule, METH_VARARGS},
     {"test_weakref_capi", test_weakref_capi, METH_NOARGS},
     {"function_set_warning", function_set_warning, METH_NOARGS},
+    {"bench", bench, METH_NOARGS},
     {NULL, NULL} /* sentinel */
 };

Command:

./python -m venv env
env/bin/python -m pip install pyperf
env/bin/python -m pyperf timeit -s 'import _testcapi; func=_testcapi.bench' 'func()' -v -o ref.json

Result, Python built with gcc -O3:

620 ns +- 8 ns -> 382 ns +- 2 ns: 1.62x faster
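
The reported factor follows directly from the two mean timings. A minimal check, using only the numbers quoted above (the variable names are illustrative):

```python
ref_ns = 620.0   # mean time per call before the change
new_ns = 382.0   # mean time per call after the change

speedup = ref_ns / new_ns                      # ratio of old time to new time
saved_pct = (ref_ns - new_ns) / ref_ns * 100   # share of time no longer spent

print(f"{speedup:.2f}x faster, {saved_pct:.0f}% less time per call")
```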

vstinner (Member Author) commented

Oh, there was a performance regression on b"abc".decode(): I fixed it.

Benchmark:

import pyperf
import _testcapi
runner = pyperf.Runner()

utf8 = b'abc'
runner.bench_func('abc', utf8.decode)

utf8 = 'abcé'.encode()
runner.bench_func('abc + UTF-8', utf8.decode)

utf8 = 'éabc'.encode()
runner.bench_func('UTF-8 + abc', utf8.decode)

utf8 = b'x' * (1024 * 1024)
runner.bench_func('ASCII 1 MiB', utf8.decode)

utf8 = ('x' * (1024 * 1024) + 'é').encode()
runner.bench_func('ASCII 1 MiB + UTF-8', utf8.decode)

utf8 = ('é' + 'x' * (1024 * 1024)).encode()
runner.bench_func('UTF-8 + ASCII 1 MiB', utf8.decode)

utf8 = ('€' + 'x' * (1024 * 1024)).encode()
runner.bench_func('UTF-8 euro + ASCII 1 MiB', utf8.decode)
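
The inputs in this script differ in which characters they contain and where the non-ASCII character sits. The sketch below, which is not part of the benchmark, shows the character count, the UTF-8 byte count, and the widest code point class for the short samples; CPython stores a str with 1, 2 or 4 bytes per character depending on the widest code point (PEP 393), so these classes plausibly exercise different decoder paths. The helper name `describe` is hypothetical.

```python
def describe(text: str):
    """Return (characters, UTF-8 bytes, widest code point class)."""
    encoded = text.encode("utf-8")
    widest = max(map(ord, text))
    if widest < 0x80:
        kind = "ASCII"
    elif widest < 0x100:
        kind = "Latin-1 range"
    elif widest < 0x10000:
        kind = "BMP"
    else:
        kind = "astral"
    return len(text), len(encoded), kind


for sample in ("abc", "abcé", "éabc", "€"):
    print(repr(sample), describe(sample))
```

Here 'é' (U+00E9) takes two bytes in UTF-8 and '€' (U+20AC) takes three, while every ASCII character takes one.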

Results (Python built with gcc -O3, with CPU isolation):

+---------------------+---------+-----------------------+
| Benchmark           | ref     | change                |
+=====================+=========+=======================+
| abc                 | 73.7 ns | 74.7 ns: 1.01x slower |
+---------------------+---------+-----------------------+
| abc + UTF-8         | 167 ns  | 172 ns: 1.03x slower  |
+---------------------+---------+-----------------------+
| ASCII 1 MiB         | 118 us  | 118 us: 1.00x faster  |
+---------------------+---------+-----------------------+
| ASCII 1 MiB + UTF-8 | 1.08 ms | 1.07 ms: 1.00x faster |
+---------------------+---------+-----------------------+
| UTF-8 + ASCII 1 MiB | 572 us  | 570 us: 1.00x faster  |
+---------------------+---------+-----------------------+
| Geometric mean      | (ref)   | 1.00x slower          |
+---------------------+---------+-----------------------+

Benchmark hidden because not significant (2): UTF-8 + abc, UTF-8 euro + ASCII 1 MiB

=> There is no significant impact on bytes.decode() performance (no slowdown).
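
The conclusion can be sanity-checked from the table itself. The sketch below recomputes the change/ref ratio for the five displayed rows and their geometric mean. It uses the rounded values printed in the table, whereas pyperf works from unrounded timings and may also account for the hidden benchmarks, so the result is only expected to land close to the reported figure.

```python
from statistics import geometric_mean

# (ref, change) per row; both values of a row use the same unit,
# so each ratio is unitless.
rows = {
    "abc": (73.7, 74.7),                   # ns
    "abc + UTF-8": (167.0, 172.0),         # ns
    "ASCII 1 MiB": (118.0, 118.0),         # us
    "ASCII 1 MiB + UTF-8": (1.08, 1.07),   # ms
    "UTF-8 + ASCII 1 MiB": (572.0, 570.0), # us
}

# A ratio above 1.0 means the change is slower than the reference.
ratios = {name: change / ref for name, (ref, change) in rows.items()}
gm = geometric_mean(list(ratios.values()))

for name, ratio in ratios.items():
    print(f"{name:22s} {ratio:.3f}")
print(f"{'geometric mean':22s} {gm:.3f}")
```

Every ratio stays within a few percent of 1.0, and the geometric mean is within 1% of it.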

vstinner (Member Author) commented

cc @serhiy-storchaka

serhiy-storchaka (Member) left a comment

LGTM.

vstinner added 3 commits May 22, 2024 21:17
vstinner enabled auto-merge (squash) May 22, 2024 19:20

vstinner (Member Author) commented

I enabled automerge. Thanks for the review @serhiy-storchaka.

vstinner disabled auto-merge May 22, 2024 20:45
vstinner enabled auto-merge (squash) May 22, 2024 20:45
vstinner changed the title from "gh-119182: Optimize PyUnicode_FromFormat() UTF-8 decoder" to "gh-119398: Optimize PyUnicode_FromFormat() UTF-8 decoder" May 22, 2024
vstinner changed the title from "gh-119398: Optimize PyUnicode_FromFormat() UTF-8 decoder" to "gh-119396: Optimize PyUnicode_FromFormat() UTF-8 decoder" May 22, 2024
vstinner merged commit 9b422fc into python:main May 22, 2024
vstinner deleted the utf8_writer branch May 22, 2024 21:05
estyxx pushed a commit to estyxx/cpython that referenced this pull request Jul 17, 2024
…n#119398)
