-
-
Notifications
You must be signed in to change notification settings - Fork 33.2k
GH-119169: Skip reversing sibling directories in os.[f]walk(topdown=False)
#119473
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…wn=False)` In `os.walk(topdown=False)`, don't bother reversing `walk_dirs`. This means that sibling directories are visited in a different order, but 1) that order is arbitrary and comes from `os.scandir()`, and 2) unlike in top-down mode, users can't influence which directories are visited or in what order. This change caused `test_walk_bottom_up` to fail. I think this test made assertions that were too specific and relied on `os.scandir()` returning things in a specific order, and the test code is pretty hard to understand once you get into the details. I've replaced it with a version of the same test from `test_pathlib_abc.py`.
os.walk(topdown=False)
os.[f]walk(topdown=False)
Hey @serhiy-storchaka, just for context: this change doesn't provide much of a speedup in itself, but it enables more significant speedups seen in GH-119186. I've separated it out here because it's an observable change that breaks a test. But I suspect that doesn't matter as the |
If |
It presently visits sibling directories in the same order they appear in What I'm proposing here is that we don't need to visit sibling directories in the same order they appear in
Perhaps a worked example would be helpful? I can put one together. |
Here's an example, first with the current >>> os.mkdir('mydir')
>>> os.mkdir('mydir/subdir1')
>>> os.mkdir('mydir/subdir2')
>>> for entry in os.walk('mydir', topdown=False):
... print(entry)
...
('mydir/subdir2', [], [])
('mydir/subdir1', [], [])
('mydir', ['subdir2', 'subdir1'], []) And with this PR: >>> for entry in os.walk('mydir', topdown=False):
... print(entry)
...
('mydir/subdir1', [], [])
('mydir/subdir2', [], [])
('mydir', ['subdir2', 'subdir1'], []) Note how the entry for If we think the new behaviour is acceptable (e.g. because we never guaranteed the order of sibling visits) then it unlocks an optimization: we can add paths to the stack while scanning directories rather than in a separate loop afterwards. |
I think that it would be confusing if The fact that the current code needs to reverse the lists of directories is just an implementation detail. It is not exposed to the user. If the code used BTW, how much it makes difference? What it in comparison of using |
Hm, the |
In
os.walk(topdown=False)
, don't bother reversingwalk_dirs
. This means that sibling directories are visited in a different order, but 1) that order is arbitrary and comes fromos.scandir()
, and 2) unlike in top-down mode, users can't influence which directories are visited or in what order.This change caused
test_walk_bottom_up
to fail. I think this test made assertions that were too specific and relied onos.scandir()
returning things in a specific order. The test code is pretty hard to understand once you get into the details. I've replaced it with a version of the same test fromtest_pathlib_abc.py
. The new test checks that every directory is yielded before all its ancestors, and after all its descendants, but does not check the order of its yielded siblings.os.walk()
#119169