[mypyc] Optimize str.encode with specializations for common used encodings by svalentin · Pull Request #18232 · python/mypy · GitHub
svalentin
Collaborator

Tested with:

import time
start = time.time()
for i in range(20000000):
    "test".encode('utf-8')
print(time.time() - start)

With the PR applied and the module compiled with mypyc, python3 -c "import test" prints (seconds, three runs):
0.5383486747741699
0.5224344730377197
0.555696964263916

Without the PR applied (same setup):
0.7315819263458252
0.7105758190155029
0.7471706867218018

Similar times were observed for "ascii".
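To put the two sets of timings side by side, here is a small sketch that averages the three runs quoted above and derives the relative change. The variable names are illustrative only, and three runs is a small sample, so the result is best read as a rough indication (about a quarter less wall-clock time) rather than a precise figure.

```python
from statistics import mean

# Timings in seconds, copied from the runs reported above.
with_pr = [0.5383486747741699, 0.5224344730377197, 0.555696964263916]
without_pr = [0.7315819263458252, 0.7105758190155029, 0.7471706867218018]

mean_with = mean(with_pr)
mean_without = mean(without_pr)

# Fraction of wall-clock time saved, and the equivalent speedup factor.
reduction = 1 - mean_with / mean_without
speedup = mean_without / mean_with

print(f"mean with PR:    {mean_with:.4f} s")
print(f"mean without PR: {mean_without:.4f} s")
print(f"time reduction:  {reduction:.1%}")
print(f"speedup factor:  {speedup:.2f}x")
```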

@svalentin svalentin requested a review from JukkaL December 2, 2024 18:19
Collaborator

@JukkaL JukkaL left a comment

I think some cases aren't covered by the logic. Suggested some test cases that should help.

s.encode('utf-8', errors='strict')
s.encode('utf-8', 'backslashreplace')
s.encode(encoding='ascii')
s.encode('ascii', 'backslashreplace')
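For reference, a sketch of what these four call forms return in plain CPython, which is the behavior a specialized fast path would need to preserve. The sample string and the names `s`, `results`, and `strict_ascii_failed` are assumptions chosen for illustration; they are not taken from the PR's test suite.

```python
# Sample input with one non-ASCII character (é), so the error handlers matter.
s = "caf\u00e9"

results = {
    # Positional encoding plus keyword errors.
    "utf8_strict_kw": s.encode('utf-8', errors='strict'),
    # UTF-8 can encode every valid code point, so the handler is never used here.
    "utf8_backslashreplace": s.encode('utf-8', 'backslashreplace'),
    # ASCII cannot encode é, so it is replaced by an escape sequence.
    "ascii_backslashreplace": s.encode('ascii', 'backslashreplace'),
    # Keyword-only encoding on ASCII-safe input.
    "ascii_kw": "test".encode(encoding='ascii'),
}

# With the default 'strict' handler, ASCII encoding of é must raise.
try:
    s.encode(encoding='ascii')
    strict_ascii_failed = False
except UnicodeEncodeError:
    strict_ascii_failed = True

for name, value in results.items():
    print(name, value)
print("strict ascii raised:", strict_ascii_failed)
```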
Collaborator

Please also add test cases where the specialization shouldn't be applied. Examples: s.encode(x), s.encode('a', x), s.encode('utf8', errors=x) and s.encode(errors=x), where x is not a literal.

Also add test cases with two keyword args, in both orders: s.encode(encoding=..., errors=...) and s.encode(errors=..., encoding=...).
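A rough sketch of how those cases might look as plain Python functions. The function names are hypothetical and this is not the PR's actual test code; it only illustrates call shapes where an argument is a runtime value (so a literal-based specialization presumably cannot apply) and the two keyword orderings, which should produce identical results.

```python
def encode_with_dynamic_encoding(s: str, x: str) -> bytes:
    # Encoding is a runtime value, not a string literal.
    return s.encode(x)


def encode_with_dynamic_errors(s: str, x: str) -> bytes:
    # Encoding is a literal, but the error handler is a runtime value.
    return s.encode('utf8', errors=x)


def encode_default_with_dynamic_errors(s: str, x: str) -> bytes:
    # No encoding given (defaults to UTF-8); errors is a runtime value.
    return s.encode(errors=x)


def encode_keywords_both_orders(s: str) -> tuple[bytes, bytes]:
    # Both arguments passed by keyword, in either order.
    a = s.encode(encoding='utf-8', errors='strict')
    b = s.encode(errors='strict', encoding='utf-8')
    return a, b


print(encode_with_dynamic_encoding("test", "utf-8"))
print(encode_with_dynamic_errors("caf\u00e9", "strict"))
print(encode_default_with_dynamic_errors("test", "strict"))
print(encode_keywords_both_orders("caf\u00e9"))
```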

Collaborator Author

I've updated the logic to handle the argument combinations more carefully and added more tests. Please take another look!

Collaborator

@JukkaL JukkaL left a comment

Looks good, thanks!

@JukkaL JukkaL merged commit e731185 into python:master Dec 3, 2024
13 checks passed
@svalentin svalentin deleted the mypyc-str-encode branch December 17, 2024 16:27
JukkaL added a commit that referenced this pull request Aug 19, 2025
This is similar to #18232, which specialized `encode`.

A micro-benchmark that calls `decode` repeatedly was up to 45% faster.