GH-115060: Speed up `pathlib.Path.glob()` by omitting initial `stat()` by barneygale · Pull Request #117831 · python/cpython · GitHub

barneygale commented Apr 13, 2024

Since commit 6258844, paths that might not exist can be fed into pathlib's globbing implementation, which calls `os.scandir()` / `os.lstat()` only when strictly necessary. This allows us to drop the initial `self.is_dir()` call, saving one `stat()`.
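To make the idea concrete, here is a minimal, hypothetical sketch (not pathlib's actual code) of selecting paths without an up-front `is_dir()` check. The wildcard case just attempts `os.scandir()` and treats any `OSError` (missing path, not a directory, no permission) as "no matches"; the literal case, for a segment like `'Lib'`, needs only a single `os.lstat()`. The helper names `select_wildcard` and `select_literal` are illustrative assumptions.

```python
import fnmatch
import os


def select_wildcard(directory, pattern):
    """Yield paths of entries in `directory` whose names match `pattern`.

    No up-front is_dir()/stat() call: if `directory` is missing, is not a
    directory, or cannot be read, os.scandir() raises OSError and we
    simply yield nothing.
    """
    try:
        with os.scandir(directory) as it:
            names = [entry.name for entry in it]
    except OSError:
        return
    for name in names:
        if fnmatch.fnmatchcase(name, pattern):
            yield os.path.join(directory, name)


def select_literal(directory, name):
    """Yield `directory/name` if it exists, using a single os.lstat()."""
    path = os.path.join(directory, name)
    try:
        os.lstat(path)
    except OSError:
        # Missing entry, or `directory` is not actually a directory.
        return
    yield path
```

The point of the design is that the "does the starting directory exist?" question is answered as a side effect of the syscall that is needed anyway, instead of by a separate `stat()` beforehand.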

```
$ ./python -m timeit -s "from pathlib import Path; p = Path.cwd()" "list(p.glob('Lib'))"
20000 loops, best of 5: 13.6 usec per loop  # before
20000 loops, best of 5: 10.4 usec per loop  # after
# --> 1.31x faster

$ ./python -m timeit -s "from pathlib import Path; p = Path.cwd()" "list(p.glob('*.py'))"
5000 loops, best of 5: 88.4 usec per loop  # before
5000 loops, best of 5: 83.8 usec per loop  # after
# --> 1.05x faster

$ ./python -m timeit -s "from pathlib import Path; p = Path.cwd()" "list(p.glob('*'))"
2000 loops, best of 5: 145 usec per loop  # before
2000 loops, best of 5: 139 usec per loop  # after
# --> 1.04x faster
```
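The same kind of measurement can be taken from inside Python with the standard `timeit` module. This is a hedged sketch: unlike `python -m timeit`, which picks the loop count automatically, the hypothetical helper `best_usec_per_loop` uses a fixed `number`, so absolute figures won't line up exactly with the output above.

```python
import timeit


def best_usec_per_loop(stmt, number=1000, repeat=5):
    """Mimic `python -m timeit`: best of `repeat` runs, reported per loop in usec."""
    timings = timeit.repeat(
        stmt,
        setup="from pathlib import Path; p = Path.cwd()",
        number=number,  # fixed loop count (python -m timeit autoranges instead)
        repeat=repeat,
    )
    # Like the CLI, report the fastest run divided by the loops per run.
    return min(timings) / number * 1e6
```

For example, `best_usec_per_loop("list(p.glob('*.py'))")` returns the per-loop time for the second benchmark; running it on interpreters built before and after the change gives a comparable ratio.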

📚 Documentation preview 📚: https://cpython-previews--117831.org.readthedocs.build/

hauntsaninja left a comment

Since this was previously explicitly documented, should this have a `versionchanged` note in the docs? Oh hmm, I guess this was only documented recently, by you in #114036, so it's probably fine... :-)

I also wonder if we can improve the tests; e.g. it looks like the `if not self.is_dir():` branch was not covered.
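A minimal sketch of such a test, assuming the `unittest` style and using hypothetical class and method names (this is not the test that was added to CPython): it checks that globbing beneath a missing path, or beneath a regular file, yields nothing. It asserts only on results, not on which code path ran, so it should pass both with and without the initial `is_dir()` check.

```python
import tempfile
import unittest
from pathlib import Path


class GlobNonDirectoryTest(unittest.TestCase):
    def setUp(self):
        tmp = tempfile.TemporaryDirectory()
        self.addCleanup(tmp.cleanup)
        self.base = Path(tmp.name)
        (self.base / "a.txt").write_text("x")

    def test_wildcard_under_missing_dir_is_empty(self):
        missing = self.base / "does-not-exist"
        self.assertEqual(list(missing.glob("*")), [])
        self.assertEqual(list(missing.glob("*.txt")), [])

    def test_literal_under_missing_dir_is_empty(self):
        missing = self.base / "does-not-exist"
        self.assertEqual(list(missing.glob("a.txt")), [])

    def test_wildcard_under_regular_file_is_empty(self):
        # Starting from a file rather than a directory: no matches, no error.
        self.assertEqual(list((self.base / "a.txt").glob("*")), [])
```

It can be run with `python -m unittest` like any other test module.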

barneygale commented

Thanks! I think it's probably not important enough for a `.. versionchanged::` note, particularly as we don't document which `OSError`s are raised by, or suppressed in, `is_dir()`.

barneygale and others added 3 commits April 13, 2024 20:06

barneygale commented Apr 13, 2024

On reflection, I think this works best as a `.. versionchanged::` directive. Thank you for the pointer :)

barneygale merged commit a74f117 into python:main on Apr 13, 2024
diegorusso pushed a commit to diegorusso/cpython that referenced this pull request Apr 17, 2024
…stat()` (python#117831)

Co-authored-by: Shantanu <12621235+hauntsaninja@users.noreply.github.com>

Labels: `performance` (Performance or resource usage), `topic-pathlib`
