Closed
Description
In #104512 we made pathlib.Path.glob() use a "walk-and-filter" strategy for expanding ** wildcards in patterns: when we encounter a ** segment, we immediately consume subsequent segments and use them to build a regex that is used to filter results. This saves a bunch of scandir() calls.
However! We actually build a regex for the entire pattern given to glob(), rather than just the segments following ** wildcards. And so when evaluating a pattern like dir*/**/file*, the dir* part is needlessly matched twice against each path. @zooba noted this in a review comment at the time.
We should be able to improve performance by building an re.Pattern only for segments following ** wildcards, and not the entire glob() pattern.
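The proposal can be illustrated with a rough sketch. This is not CPython's implementation, only a simplified model of the idea: the names `tail_regex` and `glob_sketch` are hypothetical, the pattern is assumed to contain exactly one `**` segment followed by at least one further segment, and only `*` is handled specially in the segments after `**`. Segments before `**` are expanded one directory level at a time, and the compiled regex covers only the segments after it, so `dir*` is matched once per candidate directory rather than again for every path found beneath it.

```python
import fnmatch
import os
import re


def tail_regex(segments):
    """Compile a regex for the segments that follow a ``**`` wildcard.

    Hypothetical helper. fnmatch.translate() would let ``*`` match ``/``,
    so each segment is translated by hand; only ``*`` is special here.
    """
    parts = []
    for seg in segments:
        parts.append(
            "".join("[^/]*" if ch == "*" else re.escape(ch) for ch in seg)
        )
    # ``**`` matches zero or more directories, hence the optional prefix.
    return re.compile(r"(?:.*/)?" + "/".join(parts))


def glob_sketch(root, pattern):
    """Yield paths under *root* matching *pattern*.

    Assumes *pattern* has exactly one ``**`` segment with at least one
    segment after it.
    """
    segments = pattern.split("/")
    idx = segments.index("**")
    head, tail = segments[:idx], segments[idx + 1:]

    # Expand the segments before ``**`` level by level with scandir().
    bases = [root]
    for seg in head:
        next_bases = []
        for base in bases:
            with os.scandir(base) as entries:
                for entry in entries:
                    if entry.is_dir() and fnmatch.fnmatchcase(entry.name, seg):
                        next_bases.append(entry.path)
        bases = next_bases

    # Walk each base once and filter with a regex built from the tail only.
    matcher = tail_regex(tail)
    for base in bases:
        for dirpath, dirnames, filenames in os.walk(base):
            for name in dirnames + filenames:
                full = os.path.join(dirpath, name)
                rel = os.path.relpath(full, base).replace(os.sep, "/")
                if matcher.fullmatch(rel):
                    yield full
```

With a pattern like `dir*/**/file*`, the regex in this sketch is compiled from `file*` alone and matched against paths relative to each `dir*` base, so the prefix never takes part in the per-path filtering.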
Linked PRs
- GH-115060: Speed up pathlib.Path.glob() by removing redundant regex matching #115061
- GH-115060: Speed up pathlib.Path.glob() by skipping directory scanning #116152
- GH-115060: Speed up pathlib.Path.glob() by not scanning literal parts #117732
- GH-115060: Speed up pathlib.Path.glob() by omitting initial stat() #117831