gh-100792: Make `email.message.Message.__contains__` twice as fast by sobolevn · Pull Request #100793 · python/cpython · GitHub


@sobolevn
Member

@sobolevn sobolevn commented Jan 6, 2023

See my micro-benchmarks in the original issue.

@sobolevn sobolevn requested a review from a team as a code owner January 6, 2023 11:06
@sobolevn sobolevn added the performance Performance or resource usage label Jan 6, 2023
@sobolevn sobolevn requested a review from hauntsaninja January 7, 2023 06:36
@hauntsaninja
Contributor

hauntsaninja commented Jan 7, 2023

If we're doing micro-optimizations, I think `any` is sometimes faster than the Python loop (and arguably more readable). Would you mind benchmarking that as well?

@sobolevn
Member Author

sobolevn commented Jan 7, 2023

No, in this case it is slower:

    def __contains__(self, name):
        name_lower = name.lower()
        return any(name_lower == k.lower() for k, v in self._headers)

Results:

» pyperf timeit --setup 'import email; m = email.message_from_file(open("Lib/test/test_email/data/msg_01.txt"))' '"from" in m'
.....................
Mean +- std dev: 1.81 us +- 0.04 us
» pyperf timeit --setup 'import email; m = email.message_from_file(open("Lib/test/test_email/data/msg_01.txt"))' '"missing" in m'
.....................
Mean +- std dev: 1.76 us +- 0.13 us
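
For readers who want to try this themselves, here is a minimal sketch of the variants under discussion. It makes several assumptions: the function names are hypothetical, the message is a small made-up one rather than `msg_01.txt`, the list-based variant only approximates what the pre-PR code may have looked like, and `_headers` is a private attribute of `Message`. It checks only that the variants agree; timings will differ between interpreter versions and machines.

```python
import email
import timeit

# A small made-up message (assumption: not the msg_01.txt file used above).
RAW = "From: a@example.com\nTo: b@example.com\nSubject: hi\n\nbody\n"
msg = email.message_from_string(RAW)


def contains_list(m, name):
    # Builds the full list of lowered header names before testing membership.
    return name.lower() in [k.lower() for k, v in m._headers]


def contains_loop(m, name):
    # Explicit loop: lowers the needle once and returns on the first match.
    name_lower = name.lower()
    for k, v in m._headers:
        if name_lower == k.lower():
            return True
    return False


def contains_any(m, name):
    # Same early exit, but driven through a generator expression.
    name_lower = name.lower()
    return any(name_lower == k.lower() for k, v in m._headers)


variants = (contains_list, contains_loop, contains_any)

# All variants should agree for present, differently cased, and absent names.
for header in ("from", "SUBJECT", "missing"):
    assert len({f(msg, header) for f in variants}) == 1

if __name__ == "__main__":
    for f in variants:
        elapsed = timeit.timeit(lambda: f(msg, "missing"), number=50_000)
        print(f"{f.__name__}: {elapsed:.3f}s for 50,000 calls")
```

`timeit` is used here only to keep the sketch dependency-free; `pyperf`, as used above, gives more stable numbers.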

@eendebakpt
Contributor

The code with `any` looks cleaner to me. The reduced performance seems related to #100762.

@sobolevn
Member Author

sobolevn commented Jan 7, 2023

@eendebakpt we don't want to make existing code slower just to use some style related thing, do we? :)

I agree that `any` is great, but in this module the `for k, v` loop is used in many places.

So, I think we can call this pattern native to this module :)

Contributor

@hauntsaninja hauntsaninja left a comment


Thanks for checking! (And I think I was just wrong about `any` sometimes being faster than the loop; I don't see why it would be.)

Anyway, this looks fine to me! cc @JelleZijlstra

One quick note (you're probably aware of this, but in case other potential contributors are reading): CPython is often quite hesitant to accept micro-optimisations and I think this PR comes close to that line. For example, if counterfactually the existing code used `any` and you changed it to a loop for the same speedup, I think that change would be rejected (without stronger evidence that this is something worth optimising).

(Why is CPython conservative here? First, reviewing changes itself costs maintainer time. Code churn risks bugs, obscures history, and invites more churn. Often micro-optimisations are not robust in the face of differing Python implementations or changes in the interpreter; we should avoid local minima. Such changes often affect readability, but readability is subjective, and this can lead to debate that further eats at maintainer time or leaves contributors feeling unwelcome)

Member

@AlexWaygood AlexWaygood left a comment


FWIW I'd also prefer the cleaner, more idiomatic code using `any()` -- the precise performance characteristics of `any` vs a `for` loop feel like they're subject to change in the future, and I disagree that style decisions made a decade and a half ago should determine the style of new additions to the code base.

But, this is precisely the kind of bikeshedding that @hauntsaninja was talking about. So, I don't want to block the PR based on my style preferences -- it is indeed a nice optimisation :)

@JelleZijlstra
Member

No strong opinion here but @hauntsaninja feel free to merge based on your best judgment.

Also, in the future no need to ping me on all PRs any more, though of course you can if you want another opinion.

…EOJth.rst

Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>
@AlexWaygood
Member

See also @pochmann's comments on the issue: #100792 (comment)

@hauntsaninja
Contributor

Thanks for caring and for making Python faster!

@hauntsaninja hauntsaninja merged commit 6746135 into python:main Jan 7, 2023



6 participants