KEMBAR78
gh-117349: Micro-optimize a few `os.path` functions by nineteendo · Pull Request #117350 · python/cpython · GitHub
Skip to content

Conversation

@nineteendo
Copy link
Contributor

@nineteendo nineteendo commented Mar 28, 2024

Benchmarks

ntpath.py

script
# TODO: test isjunction() on Windows
echo "isreserved()" && python -m timeit -s "import before.ntpath" "before.ntpath.isreserved('con')" && python -m timeit -s "import after.ntpath" "after.ntpath.isreserved('con')"
echo "isreserved('.')" && python -m timeit -s "import before.ntpath" "before.ntpath.isreserved('.')" && python -m timeit -s "import after.ntpath" "after.ntpath.isreserved('.')"
echo "expanduser()" && python -m timeit -s "import before.ntpath" "before.ntpath.expanduser('~')" && python -m timeit -s "import after.ntpath" "after.ntpath.expanduser('~')"
echo "realpath()"; python -m timeit -s "import test" "test.realpath1('.')"; python -m timeit -s "import test" "test.realpath2('.')"
echo "realpath('nul')"; python -m timeit -s "import test" "test.realpath1('nul')"; python -m timeit -s "import test" "test.realpath2('nul')"
isreserved()
200000 loops, best of 5: 1.66 usec per loop # before
200000 loops, best of 5: 1.65 usec per loop # after
# -> no difference
isreserved('.')
200000 loops, best of 5: 1.31 usec per loop # before
500000 loops, best of 5: 966 nsec per loop # after
# -> 1.36x faster (for . & ..)
expanduser()
200000 loops, best of 5: 1.72 usec per loop # before
200000 loops, best of 5: 1.72 usec per loop # after
# -> no difference
realpath()
2000 loops, best of 5: 132 usec per loop # before
2000 loops, best of 5: 131 usec per loop # after
# -> no difference
realpath('nul')
200000 loops, best of 5: 1.26 usec per loop # before
500000 loops, best of 5: 907 nsec per loop # after
# -> 1.39x faster (for nul)

posixpath.py

script
# test.sh
echo "ismount()" && python -m timeit -s "import before.posixpath" "before.posixpath.ismount('/Volumes/2GB_001')" && python -m timeit -s "import after.posixpath" "after.posixpath.ismount('/Volumes/2GB_001')"
echo "expanduser()" && python -m timeit -s "import before.posixpath" "before.posixpath.expanduser('~')" && python -m timeit -s "import after.posixpath" "after.posixpath.expanduser('~')"
echo "expanduser(b'~root')" && python -m timeit -s "import before.posixpath" "before.posixpath.expanduser(b'~root')" && python -m timeit -s "import after.posixpath" "after.posixpath.expanduser(b'~root')"
echo "_normpath_fallback()" && python -m timeit -s "import before.posixpath" "before.posixpath._normpath_fallback('foo//bar')" && python -m timeit -s "import after.posixpath" "after.posixpath._normpath_fallback('foo//bar')"
echo "abspath()" && python -m timeit -s "import before.posixpath" "before.posixpath.abspath('foo')" && python -m timeit -s "import after.posixpath" "after.posixpath.abspath('foo')"
echo "abspath('/foo')" && python -m timeit -s "import before.posixpath" "before.posixpath.abspath('/foo')" && python -m timeit -s "import after.posixpath" "after.posixpath.abspath('/foo')"
echo "realpath()" && python -m timeit -s "import before.posixpath" "before.posixpath.realpath('foo/../../..')" && python -m timeit -s "import after.posixpath" "after.posixpath.realpath('foo/../../..')"
ismount()
10000 loops, best of 5: 20.3 usec per loop # before
10000 loops, best of 5: 19.3 usec per loop # after
# -> 1.05x faster
expanduser()
200000 loops, best of 5: 1.43 usec per loop # before
200000 loops, best of 5: 1.42 usec per loop # after
# -> no difference
expanduser(b'~root')
200000 loops, best of 5: 1.82 usec per loop # before
200000 loops, best of 5: 1.75 usec per loop # after
# -> 1.04x faster (for byte users)
_normpath_fallback()
200000 loops, best of 5: 1.07 usec per loop # before
500000 loops, best of 5: 953 nsec per loop # after
# -> 1.13x faster
abspath()
20000 loops, best of 5: 16.7 usec per loop # before
20000 loops, best of 5: 16.5 usec per loop # after
# -> no difference
abspath('/foo')
500000 loops, best of 5: 509 nsec per loop # before
500000 loops, best of 5: 423 nsec per loop # after
# -> 1.20x faster (for absolute paths)
realpath()
10000 loops, best of 5: 24.4 usec per loop # before
10000 loops, best of 5: 23.9 usec per loop # after
# -> 1.02x faster

@nineteendo nineteendo mentioned this pull request Mar 29, 2024
16 tasks
@nineteendo
Copy link
Contributor Author

I believe that's everything. If you would like these changes to be split up in multiple pull requests let me know.

@nineteendo nineteendo marked this pull request as ready for review March 29, 2024 15:09
@sobolevn sobolevn requested a review from barneygale March 29, 2024 15:37
Copy link
Member

@AlexWaygood AlexWaygood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As @sobolevn says, it's very difficult to see here which changes are actually related to performance improvements, and which are simply cosmetic changes that have no impact on how the code works. Our general policy is not to accept cosmetic changes, but even if we decided that we wanted to make an exception here, any such stylistic/formatting improvements would have to go into their own PR, so that the git history clearly showed which changes were performance-related and which were cosmetic.

Please revert all changes that do not actually have any impact on performance.

@bedevere-app
Copy link

bedevere-app bot commented Mar 29, 2024

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

@nineteendo
Copy link
Contributor Author

Please revert all changes that do not actually have any impact on performance.

Does that include unnesting?

@nineteendo
Copy link
Contributor Author

Is there anything you still expect from me? Or can this finally be merged? I've listened to basically all the feedback.
If you want to wait for a benchmark from me, that's understandable of course.

@nineteendo
Copy link
Contributor Author

The speed improvements seem way less impressive now. ;(
And I that's even without running my benchmark.

@nineteendo
Copy link
Contributor Author

Are you happy now?

@AlexWaygood
Copy link
Member

I re-ran your benchmarks locally. With the latest version of your PR, I'm getting a reasonable slowdown on posixpath.ismount() from your PR branch.

On main:

(main) % ./python.exe -m timeit -s "import posixpath" "posixpath.ismount('/Volumes/2GB_001')"                                 ~/dev/cpython
500000 loops, best of 5: 602 nsec per loop
(main) % ./python.exe -m timeit -s "import posixpath" "posixpath.ismount('/Volumes/2GB_001')"                                 ~/dev/cpython
500000 loops, best of 5: 603 nsec per loop
(main) % ./python.exe -m timeit -s "import posixpath" "posixpath.ismount('/Volumes/2GB_001')"                                 ~/dev/cpython
500000 loops, best of 5: 607 nsec per loop
(main) % ./python.exe -m timeit -s "import posixpath" "posixpath.ismount('/Volumes/2GB_001')"                                 ~/dev/cpython
500000 loops, best of 5: 601 nsec per loop

With your PR branch:

(speedup-os.path) % ./python.exe -m timeit -s "import posixpath" "posixpath.ismount('/Volumes/2GB_001')"                      ~/dev/cpython
500000 loops, best of 5: 637 nsec per loop
(speedup-os.path) % ./python.exe -m timeit -s "import posixpath" "posixpath.ismount('/Volumes/2GB_001')"                      ~/dev/cpython
500000 loops, best of 5: 650 nsec per loop
(speedup-os.path) % ./python.exe -m timeit -s "import posixpath" "posixpath.ismount('/Volumes/2GB_001')"                      ~/dev/cpython
500000 loops, best of 5: 646 nsec per loop
(speedup-os.path) % ./python.exe -m timeit -s "import posixpath" "posixpath.ismount('/Volumes/2GB_001')"                      ~/dev/cpython
500000 loops, best of 5: 643 nsec per loop

@nineteendo
Copy link
Contributor Author

With the latest version of your PR, I'm getting a reasonable slowdown on posixpath.ismount() from your PR branch.

You don't have a usb stick "2GB_001" plugged in, so the function will return False immediately after s1 = os.lstat(path).
I believe something has been sped up on the main branch, so I just synced my branch.

@AlexWaygood
Copy link
Member

You don't have a usb stick "2GB_001" plugged in, so the function will return False immediately after s1 = os.lstat(path).

heh, that makes sense

Copy link
Member

@AlexWaygood AlexWaygood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@nineteendo
Copy link
Contributor Author

I've updated the benchmarks. Sadly, the only noticeable speedup for regular users is calling posixpath.abspath() with an absolute path.

@AlexWaygood
Copy link
Member

I've updated the benchmarks. Sadly, the only noticeable speedup for regular users is calling posixpath.abspath() with an absolute path.

Don't be too disheartened. As a result of your efforts here, a conversation has been started about possible ways of optimising str.startswith and other string methods. If one or more of those ideas comes to fruition, that will be very impactful for Python users.

Optimising the stdlib is hard! I tried out many things that ultimately didn't work when I was working on #74690. And there were several things that did work, but which I never created PRs for, as they would have made the code too ugly or too fragile.

My advice for the future, however, would definitely be to work on small, focused PRs that have isolated, easily measurable changes. PRs like that are much easier for us to review, and you should find the contributing experience less frustrating as a result.

@nineteendo
Copy link
Contributor Author

Can this be merged, or are we still waiting on something?

@AlexWaygood
Copy link
Member

Can this be merged, or are we still waiting on something?

It can be merged, I just wanted to wait a little bit longer to give the other reviewers time to chime in on the final version of the PR, if they wanted to. I'll merge tomorrow or the day after if there are no further objections from anybody.

nineteendo and others added 2 commits April 2, 2024 21:46
Co-authored-by: Pieter Eendebak <pieter.eendebak@gmail.com>
@AlexWaygood AlexWaygood changed the title gh-117349: Speedup os.path gh-117349: Micro-optimize a few os.path functions Apr 2, 2024
@AlexWaygood AlexWaygood merged commit cae4cdd into python:main Apr 2, 2024
@AlexWaygood
Copy link
Member

Thanks @nineteendo

@nineteendo nineteendo deleted the speedup-os.path branch April 2, 2024 20:34
Copy link
Member

@serhiy-storchaka serhiy-storchaka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@erlend-aasland
Copy link
Contributor

I believe it should indeed be possible to speed up str.startswith(prefix), but not str.startswith((prefix1, prefix2)), because there's not guarantee that the prefixes have the same length. I'll revert the optimisations for the first case.

Both cases were significantly improved by #117466:

# pre optimisation
$ ./python.exe -m timeit -s "s = 'abcdef'" "s.startswith(('abc', 'de'))"
5000000 loops, best of 5: 89 nsec per loop

# post optimisation
$ ./python.exe -m timeit -s "s = 'abcdef'" "s.startswith(('abc', 'de'))"
10000000 loops, best of 5: 26.7 nsec per loop

I think we should be careful about micro-optimising Python code, like this PR. Instead, it would be better to keep the code as idiomatic as possible, and instead optimise the Python interpreter for the idiomatic cases.

@AlexWaygood
Copy link
Member

I think we should be careful about micro-optimising Python code, like this PR. Instead, it would be better to keep the code as idiomatic as possible, and instead optimise the Python interpreter for the idiomatic cases.

I agree that we should neither accept micro-optimisations that make code significantly less idiomatic, nor changes that are purely cosmetic and have no impact on performance. I accepted the PR nonetheless because in the final iteration of the PR, all changes seemed to me to be small, localised changes that, as well as providing small speedups, mostly made the code more idiomatic, and in all cases (in my opinion) did not cause a significant deterioration in style.

@nineteendo
Copy link
Contributor Author

nineteendo commented Apr 3, 2024

Both cases were significantly improved by #117466

I didn't expect the second case to be sped up as it was a sequence, so I asked for clarification: faster-cpython/ideas#671 (comment), but didn't get a response.

This speedup must have been added afterwards, which I missed.
I obviously reverted the regular str.startswith() & str.endswith().

diegorusso pushed a commit to diegorusso/cpython that referenced this pull request Apr 17, 2024
)

Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>
Co-authored-by: Barney Gale <barney.gale@gmail.com>
Co-authored-by: Pieter Eendebak <pieter.eendebak@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants