KEMBAR78
gh-88569: add `ntpath.isreserved()` by barneygale · Pull Request #95486 · python/cpython · GitHub
Skip to content

Conversation

@barneygale
Copy link
Contributor

@barneygale barneygale commented Jul 31, 2022

Add ntpath.isreserved(), which identifies reserved pathnames such as "NUL", "AUX" and "CON".

Deprecate pathlib.PurePath.is_reserved().

@eryksun
Copy link
Contributor

eryksun commented Jul 31, 2022

In ntpath.isreserved(), let's remove the exception for UNC paths. GetFullPathNameW() doesn't reserve DOS device names in UNC paths, but the SMB provider disallows creating files with DOS device names. It's the most common and most important UNC provider. For example, on the "//localhost/C$" SMB share:

>>> for fn in ('nul', 'con', 'aux', 'prn', 'com1', 'lpt1'):
...     try:
...         open(f'//localhost/C$/Temp/{fn}', 'w')
...     except PermissionError:
...         print(f'{fn} disallowed')
...
nul disallowed
con disallowed
aux disallowed
prn disallowed
com1 disallowed
lpt1 disallowed

Also, a name that ends with a space or dot should be reserved. The system's path normalization strips trailing spaces and dots from the final path component (e.g. "spam. . ." -> "spam"). Path normalization applies to all cases except opening "\\?\" extended paths.

Also, the characters :<>"?*| are reserved in file names, as are as well the ASCII control characters (ordinals 0-31). (The characters : and | aren't reserved by all file systems, but generally they are.)

In file systems that support file streams, : delimits the stream name and type (e.g. "filename:streamname:$DATA"). Stream names can include the characters <>"?*| as well as control characters other than null (ordinals 1-31). I think it's simplest to just reserve a name that includes :, so names that specify a file stream are marked as reserved for general use. Code that needs to use file streams should be aware of this and separately validate the file name and stream name.

@ChrisDenton
Copy link

Not sure if it's relevant to this function but Windows 11 has greatly simplified how reserved names work. Now C:\path\COM1 is not reserved and neither is aux.c. But aux on its own still is (and aux.. .. because trailing spaces and dots are always stripped).

@eryksun
Copy link
Contributor

eryksun commented Jul 31, 2022

Not sure if it's relevant to this function but Windows 11 has greatly simplified how reserved names work. Now C:\path\COM1 is not reserved and neither is aux.c. But aux on its own still is (and aux.. .. because trailing spaces and dots are always stripped).

In Windows 11, path normalization no longer special cases a DOS device name if it has an extension (e.g. "con.txt"). Also a DOS device name isn't special cased if it's the leaf component of a path -- except for the "NUL" device. For example:

>>> nt._getfullpathname('nul.txt')
'C:\\Temp\\nul.txt'
>>> nt._getfullpathname('c:/temp/nul') # NUL is still reserved
'\\\\.\\nul'

However, since DOS device names are still special cased as unqualified names and still reserved by the SMB server, they should still be avoided as file names. For example, creating a file named "con" in the current working directory would have to use the path "./con".

@barneygale
Copy link
Contributor Author

@eryksun what should ntpath.isreserved('.') return? Thanks

@eryksun
Copy link
Contributor

eryksun commented Jul 31, 2022

@eryksun what should ntpath.isreserved('.') return? Thanks

Isn't that a cross-platform question? The names "." and ".." are reserved in POSIX and Windows. For ".." it always has to be exact. Otherwise trailing dots and spaces are stripped in Windows. For example:

>>> nt._getfullpathname('z:/spam/..')
'z:\\'
>>> nt._getfullpathname('z:/spam/...')
'z:\\spam\\'
>>> nt._getfullpathname('z:/spam/.. .')
'z:\\spam\\'

@barneygale
Copy link
Contributor Author

That would make pathlib.Path().is_reserved() always true. I'm not convinced tbh. I can't see any existing CPython bug or feature request that covers the behaviour change you're requesting.

@eryksun
Copy link
Contributor

eryksun commented Jul 31, 2022

That would make pathlib.Path().is_reserved() always true. I'm not convinced tbh. I can't see any existing CPython bug or feature request that covers the behaviour change you're requesting.

I'm referring to the base name, i.e. os.fsdecode(basename(path)), in terms of answering the question, can one create a file or directory with this exact name? One cannot create/overwrite a file named "." or "..". The names are reserved as implicit hard links or aliases to the current directory and parent directory. It's no different from something like "nul" in Windows. For example, open('c:/Temp/nul', 'w') succeeds because it is a valid path. But the name "nul" implicitly resolves to "\\.\nul" (even in Windows 11).

@brettcannon brettcannon requested a review from eryksun August 5, 2022 22:40
Copy link
Member

@brettcannon brettcannon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea, but I think there's an extraneous function call to take out. I would also like @eryksun to make sure we are copying over the right implementation details, or make any tweaks as necessary now if possible.

@bedevere-bot
Copy link

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

@barneygale
Copy link
Contributor Author

Thinking out loud:

There's probably a distinction between reserved names and invalid names, analogous to the distinction between reserved Python keywords (for, break, etc) and invalid Python identifiers (my-var, 3dpoints)

pathlib.PurePath.is_reserved() consciously concerns itself only with the former case (at least for the moment!)

@eryksun
Copy link
Contributor

eryksun commented Aug 6, 2022

There's probably a distinction between reserved names and invalid names,

Does this mean that you don't want to reserve names that contain the wildcard characters *, ?, " (i.e. DOS_DOT), < (i.e. DOS_STAR), and > (i.e. DOS_QM)? Wildcard characters are illegal in the name of a file or directory because they're used in system calls such as NtQueryDirectoryFile() (its FileName parameter can contain wildcards to filter a directory listing) and ultimately by filesystem device drivers via runtime library functions such as FsRtlIsNameInExpression().

What about names that are illegal to create because they're reserved in other contexts? That's the case when creating a file with a DOS device name on an SMB share. SMB disallows creating a file named "nul", for example, because it causes problems with accessing the file directly on the server (e.g. as "C:\share\nul").

What about base names that contain colons? For example, in an NTFS filesystem, "spam:00" creates a file named "spam" that contains a data stream named "00". That can surprise even experienced developers, particularly if they come from a POSIX background. In a FAT filesystem, "spam:00" is an invalid name. In a VboxSharedFolderFS (VirtualBox) filesystem with a Linux host, "spam:00" is allowed as the literal name.

What about a name that changes when accessed because trailing dots and spaces are stripped? For example, "spam ." -> "spam".

What about "." and ".." components in "\\?\" extended paths? In Windows, "." and ".." components are handled in the user-mode API. However, normalization is skipped for a "\\?\" extended path, so the filesystem is passed a path that contains "." and ".." components. The results can be dysfunctional and surprising. For example, FAT filesystems (volume "E:" in this example) allow creating regular files named "." and "..":

>>> open(r'\\?\E:\Temp\.', 'w').close()
>>> open(r'\\?\E:\Temp\..', 'w').close()
>>> print(*[(e[-2], e[-1], e[0]) for e in win32file.FindFilesW('*')], sep='\n')
('.', '', 16)
('..', '', 16)
('.', 'F93F~1', 32)
('..', 'C59D~1', 32)  

The first two are the required "." and ".." entries, which are directories (16) and have no short name. The last two are regular files with the archive attribute set (32) and legacy short names.

Otherwise, FAT filesystems don't support opening paths with literal "." and ".." entries. The open fails with ERROR_PATH_NOT_FOUND (i.e. C ENOENT). For example:

>>> open(r'\\?\E:\Temp\..\spam.txt', 'w').close()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
FileNotFoundError: [Errno 2] No such file or directory: '\\\\?\\E:\\Temp\\..\\spam.txt' 

NTFS fails all three of the above cases with ERROR_INVALID_NAME (i.e. C EINVAL).

@barneygale
Copy link
Contributor Author

@eryksun whats your view on the existing pathlib.PurePath.is_reserved() method? It doesn't perform any of the validity checks you mention. Is it broken without them? If so, perhaps we should consider deprecating the method, rather than expanding it and chasing a long tail of pathname validity bugs?

@eryksun
Copy link
Contributor

eryksun commented Aug 7, 2022

whats your view on the existing pathlib.PurePath.is_reserved() method? It doesn't perform any of the validity checks you mention. Is it broken without them? If so, perhaps we should consider deprecating the method, rather than expanding it and chasing a long tail of pathname validity bugs?

The is_reserved() method is useful in theory, but it isn't used much. A sanitizepath() function would be more useful. Regarding the design, it was a mistake to exclude UNC paths. Just because one can create "//?/C:/Temp/nul", that doesn't mean it's not a troublesome file name that should never be used. Also, file names that end with trailing spaces and dots have to be reserved.

As to reserved characters and "." and ".." components in extended paths, I'm just putting it out there for discussion. I'm less concerned about the five wildcard characters. They're always disallowed in base file names (but not stream names). A filesystem that didn't exclude them would be broken. Creating a file named "spam?.txt" isn't going to magically succeed in a surprising way. But a filesystem might allow :, |, and ASCII control characters in base file names, even though they should be illegal. VboxSharedFolderFS does. If a name like that doesn't get sanitized, it can cause problems later on, e.g. when trying to copy the file to another filesystem. The potentially surprising behavior with names that specify a data stream is the worst offender (e.g. "spam:.txt" has a stream named ".txt"). If I could only choose one character to flag as reserved in base file names, it would be colon.

Co-authored-by: Eryk Sun <eryksun@gmail.com>
@eryksun
Copy link
Contributor

eryksun commented Aug 9, 2022

The following check would need to be removed from PureWindowsPathTest.test_is_reserved() in "Lib/test/test_pathlib.py":

        # UNC paths are never reserved.
        self.assertIs(False, P('//my/share/nul/con/aux').is_reserved()) 

Though this entire pathlib test is redundant now. As you consolidate the implementations, you might want to remove redundant tests, and only keep tests that are specific to how pathlib uses the os.path flavour interfaces.

@eryksun
Copy link
Contributor

eryksun commented Jan 9, 2024

Considering the rules about which paths are reserved have changed in recent Windows releases (e.g. '.\\aux' is no longer reserved - I believe only nul is still reserved anywhere in a path, and the rest need to be the entire path), I'm even more cautious about the whole idea of a reserved path.

Even in Windows 11, the SMB redirector still reserves names of legacy DOS devices in the filename part of an open/create (e.g. NUL, CON, PRN, AUX, LPT1-LPT9, COM1-COM9). At least SMB has never reserved DOS device names with a file extension (e.g. "con.txt"). And at least it just denies access instead of trying to access a remote device. It would be nice if the next version of the SMB protocol allowed legacy DOS device names in the filename part of an open/create.

@zooba
Copy link
Member

zooba commented Jan 15, 2024

I don't have any other concerns with this. However, I think we should refer to os.path.isreserved in the NEWS entry (and clarify that it's only on Windows), and refer to ntpath in the commit message/PR title. Right now these are the opposite way around.

@zooba
Copy link
Member

zooba commented Jan 15, 2024

@brettcannon Your objection was a while ago and I believe has been addressed. You don't need to re-review unless you want to, but I think we'd like an ack that you aren't blocking the PR anymore

@brettcannon brettcannon self-requested a review January 15, 2024 23:14
@brettcannon
Copy link
Member

@barneygale I refreshed my review and I'm not blocking this. 🙂

@barneygale barneygale changed the title gh-88569: add os.path.isreserved() gh-88569: add ntpath.isreserved() Jan 16, 2024
@brettcannon brettcannon removed their request for review January 16, 2024 19:28
@barneygale
Copy link
Contributor Author

@zooba just checking you're happy for me to merge?

@zooba
Copy link
Member

zooba commented Jan 26, 2024

Yeah, go ahead

@barneygale
Copy link
Contributor Author

Thanks everyone for the help with this

@barneygale barneygale merged commit 7e31d6d into python:main Jan 26, 2024
aisk pushed a commit to aisk/cpython that referenced this pull request Feb 11, 2024
Add `ntpath.isreserved()`, which identifies reserved pathnames such as "NUL", "AUX" and "CON".

Deprecate `pathlib.PurePath.is_reserved()`.

---------

Co-authored-by: Eryk Sun <eryksun@gmail.com>
Co-authored-by: Brett Cannon <brett@python.org>
Co-authored-by: Steve Dower <steve.dower@microsoft.com>
EliahKagan added a commit to EliahKagan/gitoxide that referenced this pull request Aug 30, 2024
When invoking `git` to find the configuration file path associated
with the `git` installation itself, this sets `GIT_DIR` to a path
that cannot be a `.git` directory for any repository, to keep
`git config -l` from including any local scope entries in the
output of the `git config -l ...` command that is used to find the
origin for the first Git configuration variable.

Specifically, a path to the null device is used. This is
`/dev/null` on Unix and `NUL` on Windows. This is not a directory,
and when treated as a file it is always treated as empty: reading
from it, if successful, reaches end-of-file immediately.

This problem is unlikely since GitoxideLabs#1523, which caused this `git`
invocation to use a `/tmp`-like location (varying by system and
environment) as its current working directory. Although the goal of
that change was just to improve performance, it pretty much fixed
the bug where local-scope configuration could be treated as
installation-level configuration when no configuration variables
are available from higher scopes.

This change further hardens against two edge cases:

- If the `git` command is an old and unpatched vulnerable version
  in which `safe.directory` is not yet implemented, or in which
  GHSA-j342-m5hw-rr3v
  or other vulnerabilities where `git` would perform operations on
  untrusted local repositories owned by other users are unpatched,
  then a `.git` subdirectory of a shared `/tmp` or `/tmp`-like
  directory could be created by another account, and its local
  configuration would still have been used. (This is not a bug in
  gitoxide per se; having vulnerable software installed that other
  software may use is inherently insecure. But it is nice to offer
  a small amount of protection against this when readily feasible.)

- If the `/tmp`-like location is a Git repository owned by the
  current user, then its local configuration would have been used.

Any path guaranteed to point to a nonexistent entry or one that is
guaranteed to be (or to be treated as) an empty file or directory
should be sufficient here. Using the null device, even though it is
not directory-like, seems like a reasonably intuitive way to do it.

A note for Windows: There is more than one reasonable path to the
null device. One is DOS-style relative path `NUL`, as used here.
One of the others, which `NUL` in effect resolves to when opened,
is the fully qualified Windows device namespace path `\\.\NUL`. I
used the former here to ensure we avoid any situation where `git`
would misinterpret a `\` in `\\.\NUL` in a POSIX-like fashion. This
seems unlikely, and it could be looked into further if reasons
surface to prefer `\\.\NUL`.

One possible reason to prefer `\\.\NUL` is that which names are
treated as reserved legacy DOS device names changes from version to
version of Windows, with Windows 11 treating some of them as
ordinary filenames. However, while this affects names such as
`CON`, it does not affect `NUL`, at least written unsuffixed. I'm
not sure if any Microsoft documentation has yet been updated to
explain this in detail, but see:

- dotnet/docs#41193
- python/cpython#95486 (comment)
- python/cpython#95486 (comment)

At least historically, it has been possible on Windows, though
rare, for the null device to be absent. This was the case on
Windows Fundamentals for Legacy PCs (WinFPE). Even if that somehow
were ever to happen today, this usage should be okay, because
attempting to open the device would still fail rather than open
some other file (as may even be happening in Git for Windows
already), the name `NUL` would still presumably be reserved (much
as the names `COM?` where `?` is replaced with a Unicode
superscript 1, 2, or 3 are reserved even though those devices don't
really exist), and I think `git config -l` commands should still
shrug off the error opening the file and give non-local-scope
configuration, as it does when `GIT_DIR` is set to a nonexistent
location.
Glyphack pushed a commit to Glyphack/cpython that referenced this pull request Sep 2, 2024
Add `ntpath.isreserved()`, which identifies reserved pathnames such as "NUL", "AUX" and "CON".

Deprecate `pathlib.PurePath.is_reserved()`.

---------

Co-authored-by: Eryk Sun <eryksun@gmail.com>
Co-authored-by: Brett Cannon <brett@python.org>
Co-authored-by: Steve Dower <steve.dower@microsoft.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants