GH-126363: Speed up pattern parsing in `pathlib.Path.glob()` by barneygale · Pull Request #126364 · python/cpython · GitHub

@barneygale barneygale commented Nov 3, 2024

The implementation of `Path.glob()` does rather a hacky thing: it calls `self.with_segments()` to convert the given pattern to a `Path` object, and then peeks at the private `_raw_path` attribute to see if pathlib removed a trailing slash from the pattern.

In this patch, we make `glob()` use a new `_parse_pattern()` classmethod that splits the pattern into parts while preserving information about any trailing slash. This skips the cost of creating a `Path` object, and avoids some path anchor normalization, which makes `Path.glob()` slightly faster. But mostly it's about making the code less naughty.
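
To illustrate the idea, here is a minimal sketch of a parser that splits a pattern into parts and records a trailing slash as a final empty part. The name `parse_pattern` and its `sep` parameter are hypothetical and this is not the code from the patch; in particular, it leaves out any handling of drives, roots, and alternate separators.

```python
from pathlib import PurePosixPath

# Converting a pattern to a path object drops the trailing slash, which is
# why the old approach had to inspect a private attribute to recover it.
assert str(PurePosixPath("docs/*/")) == "docs/*"


def parse_pattern(pattern, sep="/"):
    """Hypothetical stand-in for the new classmethod (not the CPython code).

    Splits a relative glob pattern into parts, skipping empty and "."
    segments, and appends "" when the pattern ends with a separator.
    """
    parts = [part for part in pattern.split(sep) if part and part != "."]
    if not parts:
        raise ValueError(f"Unacceptable pattern: {pattern!r}")
    if pattern.endswith(sep):
        # Keep the trailing-slash information as a final empty part.
        parts.append("")
    return parts


print(parse_pattern("docs/*/"))     # ['docs', '*', '']
print(parse_pattern("docs/*.rst"))  # ['docs', '*.rst']
```

Encoding the trailing slash as an empty final part is only one possible representation; it keeps the result a plain list of strings, with no path object involved.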

This makes a no-match glob ~50% faster (8.3 usec down to 5.3 usec per loop):

$ ./python -m timeit -s "import pathlib; p = pathlib.Path()" "list(p.glob('nope'))" 
50000 loops, best of 5: 8.3 usec per loop  # before
50000 loops, best of 5: 5.3 usec per loop  # after
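
The same measurement can be approximated from within Python using the standard `timeit` module. This is a rough sketch, assuming the current directory has no entry named `nope`; the loop count of 1000 is an arbitrary choice, and absolute numbers will vary by machine and build.

```python
import pathlib
import timeit

p = pathlib.Path()
number = 1000  # loops per repetition (arbitrary for this sketch)

# Time a glob that matches nothing, five repetitions, and keep the best.
timings = timeit.repeat(lambda: list(p.glob("nope")), repeat=5, number=number)
best_usec = min(timings) / number * 1e6
print(f"best of 5: {best_usec:.1f} usec per loop")
```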

Co-authored-by: Tomas R. <tomas.roun8@gmail.com>
@barneygale barneygale merged commit 9b7294c into python:main Nov 4, 2024
36 checks passed
picnixz pushed a commit to picnixz/cpython that referenced this pull request Dec 8, 2024
ebonnal pushed a commit to ebonnal/cpython that referenced this pull request Jan 12, 2025