gh-92777: Add LOAD_METHOD_LAZY_DICT by Fidget-Spinner · Pull Request #92778 · python/cpython · GitHub

Conversation

@Fidget-Spinner
Member

@Fidget-Spinner Fidget-Spinner commented May 13, 2022

Fixes #92777. Specialize LOAD_METHOD for lazy dictionaries; these account for 40% of the current LOAD_METHOD specialization misses.

I'm sad that I missed the 3.11 beta freeze for this specialization. It's straightforward and is likely to account for the majority of LOAD_METHOD executions in real-world code, since lazy __dict__ is now commonplace.
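Not from the PR itself, but a rough pure-Python sketch of why this fast path is sound: when an instance's __dict__ has never been materialized, an attribute used as a method can only live on the type, so the per-instance dict probe can be skipped (the `lookup_method` helper and `Point` class are illustrative, not CPython code):

```python
def lookup_method(obj, name):
    # Mirrors the fast path LOAD_METHOD_LAZY_DICT relies on: with no
    # materialized instance __dict__, only the type's MRO needs probing.
    for klass in type(obj).__mro__:
        if name in klass.__dict__:
            attr = klass.__dict__[name]
            # Bind descriptors (plain functions included) to the instance.
            return attr.__get__(obj, type(obj)) if hasattr(attr, "__get__") else attr
    raise AttributeError(name)

class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

    def norm(self):
        return (self.x ** 2 + self.y ** 2) ** 0.5
```

The real instruction additionally guards on the instance's values pointer being NULL and deoptimizes once a dict is materialized; this sketch only shows the type-only lookup.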

@Fidget-Spinner
Member Author

Hah, looks like I was wrong; it wasn't that straightforward after all :).

@AlexWaygood AlexWaygood added type-feature A feature request or enhancement performance Performance or resource usage labels May 13, 2022
Member

@markshannon markshannon left a comment


A few minor issues, but generally looks good.
What are the stats for the LOAD_METHOD_LAZY_DICT instruction?

@Fidget-Spinner
Member Author

Fidget-Spinner commented May 13, 2022

How do you collect stats for pyperformance and create that nice table on faster-cpython? I'm frankly clueless (I only know how to use the version that dumps stats to the terminal or a file). Sorry.

On test-suite code, I get a 0.3% improvement on hits and 0.6% more misses. But I want to point out that typing loves messing around with __dict__, so test_typing may not be representative. Pyperformance will likely see a noticeable bump in hits with fewer misses.

./python -m test test_typing test_re test_dis test_zlib


Before:
opcode[160].specializable : 1
    opcode[160].specialization.success : 1395
    opcode[160].specialization.failure : 1146
    opcode[160].specialization.hit : 1026699
    opcode[160].specialization.deferred : 78612
    opcode[160].specialization.miss : 8664
    opcode[160].specialization.deopt : 156
    opcode[160].execution_count : 18435
    opcode[160].specialization.failure_kinds[0] : 63
    opcode[160].specialization.failure_kinds[1] : 39
    opcode[160].specialization.failure_kinds[2] : 505
    opcode[160].specialization.failure_kinds[4] : 406
    opcode[160].specialization.failure_kinds[9] : 4
    opcode[160].specialization.failure_kinds[10] : 3
    opcode[160].specialization.failure_kinds[17] : 33
    opcode[160].specialization.failure_kinds[18] : 23
    opcode[160].specialization.failure_kinds[19] : 1
    opcode[160].specialization.failure_kinds[22] : 69


After:
opcode[160].specializable : 1
    opcode[160].specialization.success : 1399
    opcode[160].specialization.failure : 1107
    opcode[160].specialization.hit : 1029041
    opcode[160].specialization.deferred : 76217
    opcode[160].specialization.miss : 8717
    opcode[160].specialization.deopt : 157
    opcode[160].execution_count : 18488
    opcode[160].specialization.failure_kinds[0] : 63
    opcode[160].specialization.failure_kinds[2] : 505
    opcode[160].specialization.failure_kinds[4] : 406
    opcode[160].specialization.failure_kinds[9] : 4
    opcode[160].specialization.failure_kinds[10] : 3
    opcode[160].specialization.failure_kinds[17] : 33
    opcode[160].specialization.failure_kinds[18] : 23
    opcode[160].specialization.failure_kinds[19] : 1
    opcode[160].specialization.failure_kinds[22] : 69

@Fidget-Spinner
Member Author

Fidget-Spinner commented May 13, 2022

Wow, looks like my expectations were proven wrong by the stats again: after removing test_typing, I get a 0.18% increase in hits at the expense of 0.45% more misses. So test_typing was actually bolstering the numbers!

The part of the stdlib I've found that frequently uses this instruction is the _io module's objects. But I don't know how to get stats on those, as their tests run in subprocesses.

I'm not feeling too confident about this optimization now. It seems like something that would boost our pyperformance numbers but maybe not in the real world?

./python -m test test_re test_dis test_zlib


Before:
opcode[160].specializable : 1
    opcode[160].specialization.success : 2113
    opcode[160].specialization.failure : 4126
    opcode[160].specialization.hit : 5506338
    opcode[160].specialization.deferred : 306030
    opcode[160].specialization.miss : 44980
    opcode[160].specialization.deopt : 745
    opcode[160].execution_count : 59474
    opcode[160].specialization.failure_kinds[0] : 365
    opcode[160].specialization.failure_kinds[1] : 172
    opcode[160].specialization.failure_kinds[2] : 770
    opcode[160].specialization.failure_kinds[4] : 2417
    opcode[160].specialization.failure_kinds[9] : 12
    opcode[160].specialization.failure_kinds[10] : 4
    opcode[160].specialization.failure_kinds[17] : 150
    opcode[160].specialization.failure_kinds[18] : 58
    opcode[160].specialization.failure_kinds[19] : 2
    opcode[160].specialization.failure_kinds[22] : 139
    opcode[160].specialization.failure_kinds[23] : 37



After:
opcode[160].specializable : 1
    opcode[160].specialization.success : 2127
    opcode[160].specialization.failure : 3954
    opcode[160].specialization.hit : 5516541
    opcode[160].specialization.deferred : 295625
    opcode[160].specialization.miss : 45182
    opcode[160].specialization.deopt : 748
    opcode[160].execution_count : 59676
    opcode[160].specialization.failure_kinds[0] : 365
    opcode[160].specialization.failure_kinds[2] : 770
    opcode[160].specialization.failure_kinds[4] : 2417
    opcode[160].specialization.failure_kinds[9] : 12
    opcode[160].specialization.failure_kinds[10] : 4
    opcode[160].specialization.failure_kinds[17] : 150
    opcode[160].specialization.failure_kinds[18] : 58
    opcode[160].specialization.failure_kinds[19] : 2
    opcode[160].specialization.failure_kinds[22] : 139
    opcode[160].specialization.failure_kinds[23] : 37
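For reference, the 0.18% / 0.45% figures quoted above can be reproduced from the raw hit and miss counters in these dumps (a quick sanity-check script, not part of the PR):

```python
# Hit and miss counters copied from the before/after dumps above.
before = {"hit": 5_506_338, "miss": 44_980}
after = {"hit": 5_516_541, "miss": 45_182}

def pct_change(a, b):
    """Relative change from a to b, in percent."""
    return (b - a) / a * 100

hit_delta = pct_change(before["hit"], after["hit"])     # ~0.185% more hits
miss_delta = pct_change(before["miss"], after["miss"])  # ~0.449% more misses
print(f"hits: +{hit_delta:.2f}%, misses: +{miss_delta:.2f}%")
```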

@markshannon
Member

Generating the table is somewhat manual and hacky. I mean to automate it, but for now here's the procedure:

  1. Create a new branch and cherry-pick this commit: faster-cpython@a9c92c0
  2. Run pyperformance compile on that branch. I use this config.ini file: https://gist.github.com/markshannon/26f4e8db2b715c991eee1508f430f6b2 You will need to modify it for your machine and repo.
  3. While it is in the installing phase, create /tmp/py_stats and clear it out: rm -r /tmp/py_stats/*
  4. About the time the installing phase finishes and the benchmarks start, run one final rm -r /tmp/py_stats/*

The table is then created by running ./python Tools/scripts/summarize_stats.py
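The steps above, condensed into a sketch; the commit hash and config file come from the comment, and the steps that depend on the local checkout are left as comments:

```shell
# 1. New branch with the stats-gathering commit from faster-cpython:
#      git cherry-pick a9c92c0
# 2. Kick off the run, with config.ini adapted from the linked gist:
#      pyperformance compile config.ini
# 3. While pyperformance is in its installing phase, make sure the
#    stats directory exists and is empty:
mkdir -p /tmp/py_stats
rm -rf /tmp/py_stats/*
# 4. Repeat the cleanup right as the benchmarks start, so that only
#    benchmark runs contribute stats; afterwards, summarize:
#      ./python Tools/scripts/summarize_stats.py
```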

@Fidget-Spinner
Member Author

Fidget-Spinner commented May 24, 2022

I have the stats here https://gist.github.com/Fidget-Spinner/4dbc2d002c30e36587939c4bdfd9840c.

LOAD_METHOD specialization hits are now 83.5%, versus 78.7% on the faster-cpython repo. So that's roughly a 5 percentage point increase.

@markshannon
Member

Stats look good. Code looks good.
I'm going to run the benchmarks before merging.

@markshannon
Member

No real difference in performance, but that's in line with what we would expect.

@markshannon markshannon merged commit 5e6e5b9 into python:main May 25, 2022
@Fidget-Spinner Fidget-Spinner deleted the load_method_lazy_dict branch May 29, 2022 08:18


Development

Successfully merging this pull request may close these issues.

More LOAD_METHOD specializations
