Status: Closed
Labels: OS-windows, build, performance, type-feature
Description
This started off as a build-time analysis (#130090 (comment)), but since I now have the infrastructure in place, I tried `-flto=thin` as well. Compared with the reference build, `-flto=thin`:
- is faster to build: 520.6 vs. 651.2 seconds
- is neutral on the pyperformance benchmarks
- would bring us in sync with Linux, where `CONFIGURE_CFLAGS_NODIST` and `CONFIGURE_LDFLAGS_NOLTO` both use `-flto=thin` when I configure for clang in WSL Ubuntu-24.04. See also the discussion of why not to use full `-flto` in "Revert to default full LTO on Clang" #130048.
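On a configure-based (Linux) build, the flags mentioned above can be read back through `sysconfig.get_config_var()`. The snippet below is a minimal sketch for checking which LTO flavour a build selected; `lto_mode` is a hypothetical helper, not a CPython API, and it assumes clang semantics, where a plain `-flto` means full LTO.

```python
import sysconfig
from typing import Optional


def lto_mode(flags: Optional[str]) -> str:
    """Classify the clang LTO flavour selected by a compiler/linker flag string.

    Hypothetical helper for illustration; not part of CPython.
    """
    if not flags:
        return "none"
    tokens = flags.split()
    if "-flto=thin" in tokens:
        return "thin"
    # With clang, plain -flto defaults to full LTO; -flto=full is explicit.
    if "-flto" in tokens or "-flto=full" in tokens:
        return "full"
    # Any other -flto variant (e.g. GCC's -flto=auto) is not classified here.
    if any(tok.startswith("-flto") for tok in tokens):
        return "other"
    return "none"


if __name__ == "__main__":
    # get_config_var() returns None for variables the build does not define,
    # e.g. on Windows builds, which are not produced by ./configure.
    for name in ("CONFIGURE_CFLAGS_NODIST", "CONFIGURE_LDFLAGS_NOLTO"):
        print(f"{name}: {lto_mode(sysconfig.get_config_var(name))}")
```

Matching on whole tokens rather than substrings keeps `-fno-lto` from being mistaken for an LTO flag.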
| Benchmark | clang.pgo.20.1.0-rc2 | clang.pgo.thin.20.1.0-rc2 |
|---|---|---|
| Geometric mean | (ref) | 1.00x faster |
Detailed pyperformance results
| Benchmark | clang.pgo.20.1.0-rc2 | clang.pgo.thin.20.1.0-rc2 |
|---|---|---|
| float | 95.0 ms | 89.7 ms: 1.06x faster |
| json_loads | 29.8 us | 28.6 us: 1.04x faster |
| mdp | 2.86 sec | 2.77 sec: 1.03x faster |
| html5lib | 68.3 ms | 66.2 ms: 1.03x faster |
| async_tree_none_tg | 330 ms | 320 ms: 1.03x faster |
| pyflate | 518 ms | 505 ms: 1.03x faster |
| sqlite_synth | 3.21 us | 3.13 us: 1.03x faster |
| pidigits | 228 ms | 223 ms: 1.02x faster |
| bench_mp_pool | 168 ms | 165 ms: 1.02x faster |
| async_tree_eager_io | 742 ms | 727 ms: 1.02x faster |
| generators | 34.5 ms | 33.8 ms: 1.02x faster |
| comprehensions | 18.3 us | 17.9 us: 1.02x faster |
| async_tree_cpu_io_mixed | 641 ms | 629 ms: 1.02x faster |
| scimark_sparse_mat_mult | 4.51 ms | 4.43 ms: 1.02x faster |
| async_tree_memoization | 425 ms | 417 ms: 1.02x faster |
| sympy_expand | 538 ms | 529 ms: 1.02x faster |
| unpack_sequence | 57.0 ns | 56.0 ns: 1.02x faster |
| regex_dna | 209 ms | 205 ms: 1.02x faster |
| async_generators | 465 ms | 458 ms: 1.02x faster |
| scimark_sor | 140 ms | 137 ms: 1.02x faster |
| sympy_str | 319 ms | 314 ms: 1.02x faster |
| async_tree_io_tg | 751 ms | 740 ms: 1.01x faster |
| regex_effbot | 3.14 ms | 3.10 ms: 1.01x faster |
| async_tree_eager_tg | 272 ms | 268 ms: 1.01x faster |
| pickle_dict | 27.3 us | 27.0 us: 1.01x faster |
| async_tree_eager_memoization_tg | 363 ms | 359 ms: 1.01x faster |
| sympy_integrate | 22.5 ms | 22.2 ms: 1.01x faster |
| sympy_sum | 181 ms | 179 ms: 1.01x faster |
| 2to3 | 390 ms | 386 ms: 1.01x faster |
| hexiom | 6.68 ms | 6.61 ms: 1.01x faster |
| docutils | 3.03 sec | 3.00 sec: 1.01x faster |
| sqlglot_normalize | 121 ms | 120 ms: 1.01x faster |
| async_tree_memoization_tg | 392 ms | 389 ms: 1.01x faster |
| async_tree_cpu_io_mixed_tg | 614 ms | 609 ms: 1.01x faster |
| tomli_loads | 2.20 sec | 2.18 sec: 1.01x faster |
| spectral_norm | 102 ms | 101 ms: 1.01x faster |
| python_startup_no_site | 34.4 ms | 34.2 ms: 1.01x faster |
| genshi_text | 24.6 ms | 24.5 ms: 1.01x faster |
| dulwich_log | 119 ms | 118 ms: 1.00x faster |
| go | 128 ms | 128 ms: 1.00x faster |
| deltablue | 3.62 ms | 3.63 ms: 1.00x slower |
| unpickle_pure_python | 247 us | 248 us: 1.00x slower |
| xml_etree_generate | 107 ms | 107 ms: 1.01x slower |
| django_template | 39.2 ms | 39.4 ms: 1.01x slower |
| coroutines | 24.8 ms | 25.0 ms: 1.01x slower |
| mako | 13.3 ms | 13.5 ms: 1.01x slower |
| unpickle | 15.9 us | 16.1 us: 1.01x slower |
| nbody | 119 ms | 121 ms: 1.01x slower |
| fannkuch | 465 ms | 472 ms: 1.01x slower |
| crypto_pyaes | 81.3 ms | 82.6 ms: 1.02x slower |
| json_dumps | 11.5 ms | 11.7 ms: 1.02x slower |
| deepcopy | 285 us | 291 us: 1.02x slower |
| pprint_safe_repr | 858 ms | 876 ms: 1.02x slower |
| xml_etree_iterparse | 136 ms | 139 ms: 1.02x slower |
| gc_traversal | 5.03 ms | 5.14 ms: 1.02x slower |
| meteor_contest | 115 ms | 117 ms: 1.02x slower |
| deepcopy_memo | 33.8 us | 34.7 us: 1.03x slower |
| richards_super | 51.1 ms | 52.6 ms: 1.03x slower |
| scimark_fft | 327 ms | 337 ms: 1.03x slower |
| richards | 44.9 ms | 46.3 ms: 1.03x slower |
| pickle_list | 4.83 us | 4.99 us: 1.03x slower |
| deepcopy_reduce | 2.93 us | 3.03 us: 1.03x slower |
| pprint_pformat | 1.74 sec | 1.80 sec: 1.03x slower |
| logging_simple | 10.9 us | 11.4 us: 1.05x slower |
| logging_format | 12.1 us | 12.6 us: 1.05x slower |
| xml_etree_parse | 197 ms | 208 ms: 1.05x slower |
| Geometric mean | (ref) | 1.00x faster |
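To read the "Nx faster/slower" column and the geometric-mean row, here is a minimal sketch assuming each entry is the ratio of reference time to new time and the summary is the geometric mean of those per-benchmark ratios. This is an assumption about how pyperf summarizes the comparison, not its actual code, and `speedup`, `describe` and `geometric_mean_speedup` are hypothetical names. Recomputing from the rounded times shown in the table can shift the last digit (for xml_etree_parse, 208/197 rounds to 1.06 while the table shows 1.05).

```python
import statistics


def speedup(ref: float, new: float) -> float:
    """Speedup of the new build; > 1.0 means it is faster (times: lower is better)."""
    return ref / new


def describe(ref: float, new: float) -> str:
    """Render one comparison in the table's style, e.g. '1.06x faster'."""
    s = speedup(ref, new)
    if s >= 1.0:
        return f"{s:.2f}x faster"
    return f"{1.0 / s:.2f}x slower"


def geometric_mean_speedup(pairs) -> float:
    """Geometric mean of the per-benchmark speedups for (ref, new) time pairs."""
    return statistics.geometric_mean([speedup(ref, new) for ref, new in pairs])


if __name__ == "__main__":
    # A few rows copied from the table above (same unit within each row).
    rows = {"float": (95.0, 89.7), "deltablue": (3.62, 3.63), "go": (128, 128)}
    for name, (ref, new) in rows.items():
        print(f"{name}: {describe(ref, new)}")
    print(f"geometric mean: {geometric_mean_speedup(rows.values()):.2f}x")
```

A geometric mean suits ratios because a 2x speedup and a 2x slowdown cancel out to 1.00x, whereas an arithmetic mean of 2.0 and 0.5 would report 1.25x.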
| Step (seconds) | pgo_clang_20.1.0-rc2 | pgo_clang_thin_20.1.0-rc2 |
|---|---|---|
| pginstr | 297.2 | 219.3 |
| pgo | 70.0 | 69.0 |
| kill | 1.2 | 0.5 |
| pgupd | 282.8 | 231.7 |
| total time | 651.2 | 520.6 |
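The headline build-time numbers follow from the step table by simple arithmetic. A minimal sketch (the names `total` and `summarize` are illustrative only): the per-step values are rounded, so the thin column sums to 520.5 s rather than the reported 520.6 s, and the reported totals correspond to about 130.6 s saved, roughly 20% less build time, or a 1.25x faster build.

```python
# Per-step times in seconds, copied from the table above
# (reference = pgo_clang_20.1.0-rc2, thin = pgo_clang_thin_20.1.0-rc2).
STEPS = ("pginstr", "pgo", "kill", "pgupd")
REF = {"pginstr": 297.2, "pgo": 70.0, "kill": 1.2, "pgupd": 282.8}
THIN = {"pginstr": 219.3, "pgo": 69.0, "kill": 0.5, "pgupd": 231.7}


def total(times: dict) -> float:
    """Sum the per-step times of one build configuration."""
    return sum(times[step] for step in STEPS)


def summarize(ref_total: float, new_total: float) -> tuple:
    """Return (seconds saved, percent saved, speedup factor)."""
    saved = ref_total - new_total
    return saved, 100.0 * saved / ref_total, ref_total / new_total


if __name__ == "__main__":
    print(f"reference steps: {total(REF):.1f} s, thin steps: {total(THIN):.1f} s")
    saved, pct, factor = summarize(651.2, 520.6)
    print(f"saved {saved:.1f} s ({pct:.1f} %), {factor:.2f}x faster")
```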
Details: pginstrument, per-project build times
| Project (seconds) | pgo_clang_20.1.0-rc2 | pgo_clang_thin_20.1.0-rc2 |
|---|---|---|
| _freeze_module | 38.5 | 40.0 |
| python314 | 141.5 | 81.3 |
| pyexpat | 52.7 | 3.9 |
| _elementtree | 51.8 | 5.3 |
| sqlite3 | 46.0 | 42.4 |
| liblzma | 18.2 | 16.5 |
| _decimal | 12.4 | 7.7 |
| _testcapi | 8.3 | 7.1 |
| _bz2 | 7.0 | 4.9 |
| _ctypes | 6.9 | 7.5 |
| _testlimitedcapi | 4.9 | 4.3 |
| _wmi | 4.5 | 3.0 |
| _overlapped | 4.5 | 3.2 |
| _asyncio | 4.0 | 5.2 |
| _lzma | 3.8 | 1.8 |
| _ssl | 3.7 | 5.5 |
| _ctypes_test | 3.7 | 3.4 |
| _multiprocessing | 3.5 | 2.7 |
| _sqlite3 | 3.4 | 2.8 |
| venvwlauncher | 3.3 | 2.7 |
| _zoneinfo | 3.1 | 3.4 |
| unicodedata | 2.7 | 3.0 |
| pyshellext | 2.7 | 2.6 |
| pyw | 2.7 | 2.7 |
| py | 2.6 | 2.5 |
| _socket | 2.4 | 3.7 |
| _testinternalcapi | 2.4 | 2.2 |
| _tkinter | 2.2 | 4.1 |
| _testclinic | 2.0 | 1.9 |
| _hashlib | 1.8 | 3.1 |
| select | 1.8 | 2.2 |
| venvlauncher | 1.8 | 1.7 |
| winsound | 1.7 | 3.3 |
| _uuid | 1.6 | 3.2 |
| _queue | 1.6 | 2.3 |
| _testembed | 1.5 | 1.5 |
| _testbuffer | 1.4 | 1.3 |
| pythonw | 1.1 | 1.1 |
| _testconsole | 1.1 | 1.1 |
| _testmultiphase | 1.0 | 1.0 |
| _testsinglephase | 1.0 | 1.0 |
| python | 1.0 | 0.9 |
| _testclinic_limited | 0.9 | 0.9 |
| _testimportmultiple | 0.9 | 0.9 |
| python3 | 0.5 | 0.5 |
| total | 465.8 | 303.3 |
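To see which projects account for most of the savings, the per-project tables can be parsed and ranked. This is a minimal sketch with hypothetical helpers `parse_rows` and `biggest_savings`; it assumes the three-column layout used here (project, reference seconds, thin seconds) and should work unchanged on the pgupdate table below.

```python
def parse_rows(markdown: str) -> dict:
    """Map project name -> (reference seconds, thin seconds) from a markdown table."""
    rows = {}
    for line in markdown.strip().splitlines():
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        if len(cells) != 3:
            continue
        name, ref, thin = cells
        try:
            rows[name] = (float(ref), float(thin))
        except ValueError:
            continue  # header or |---| separator row
    return rows


def biggest_savings(rows: dict, n: int = 3) -> list:
    """Top-n projects by seconds saved with ThinLTO, ignoring the 'total' row."""
    projects = [(name, ref - thin) for name, (ref, thin) in rows.items() if name != "total"]
    projects.sort(key=lambda item: item[1], reverse=True)
    return [(name, round(delta, 1)) for name, delta in projects[:n]]


# A few rows copied from the pginstrument table above.
SAMPLE = """
| Project (seconds) | pgo_clang_20.1.0-rc2 | pgo_clang_thin_20.1.0-rc2 |
|---|---|---|
| python314 | 141.5 | 81.3 |
| pyexpat | 52.7 | 3.9 |
| _elementtree | 51.8 | 5.3 |
| _ssl | 3.7 | 5.5 |
| total | 465.8 | 303.3 |
"""

if __name__ == "__main__":
    print(biggest_savings(parse_rows(SAMPLE)))
```

On these rows, python314, pyexpat and _elementtree come out on top, while _ssl builds slightly slower with ThinLTO.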
Details: pgupdate, per-project build times
| Project (seconds) | pgo_clang_20.1.0-rc2 | pgo_clang_thin_20.1.0-rc2 |
|---|---|---|
| _freeze_module | 38.0 | 39.5 |
| python314 | 141.9 | 95.4 |
| sqlite3 | 44.4 | 42.9 |
| liblzma | 17.3 | 16.5 |
| _decimal | 11.2 | 8.7 |
| _testcapi | 8.6 | 7.3 |
| _ctypes | 8.0 | 7.2 |
| _bz2 | 7.8 | 5.5 |
| _ssl | 5.2 | 5.6 |
| _testlimitedcapi | 5.0 | 4.2 |
| pyexpat | 4.6 | 3.6 |
| _asyncio | 4.5 | 4.6 |
| _socket | 4.3 | 3.5 |
| _tkinter | 4.0 | 4.2 |
| _ctypes_test | 3.7 | 3.4 |
| _overlapped | 3.5 | 3.7 |
| _elementtree | 3.5 | 4.5 |
| _wmi | 3.5 | 3.1 |
| _zoneinfo | 3.2 | 3.2 |
| _lzma | 3.2 | 1.9 |
| unicodedata | 3.2 | 3.0 |
| _sqlite3 | 3.1 | 2.7 |
| _hashlib | 3.1 | 3.3 |
| venvwlauncher | 3.1 | 3.0 |
| _multiprocessing | 2.8 | 2.6 |
| pyshellext | 2.7 | 2.6 |
| pyw | 2.6 | 2.6 |
| _uuid | 2.6 | 2.8 |
| py | 2.6 | 2.7 |
| _testinternalcapi | 2.4 | 2.2 |
| _testclinic | 2.0 | 1.9 |
| _queue | 1.9 | 2.2 |
| winsound | 1.8 | 3.0 |
| venvlauncher | 1.7 | 1.5 |
| select | 1.6 | 2.0 |
| _testembed | 1.5 | 1.4 |
| _testbuffer | 1.4 | 1.3 |
| _testconsole | 1.1 | 1.0 |
| pythonw | 1.1 | 1.1 |
| _testmultiphase | 1.0 | 1.1 |
| _testsinglephase | 1.0 | 1.0 |
| python | 1.0 | 0.9 |
| _testclinic_limited | 0.9 | 0.9 |
| _testimportmultiple | 0.9 | 0.9 |
| python3 | 0.5 | 0.5 |
| total | 372.9 | 316.8 |