KEMBAR78
gh-122356: restore the position of a file-like object after `zipfile.is_zipfile` by picnixz · Pull Request #122397 · python/cpython · GitHub
Skip to content

Conversation

@picnixz
Copy link
Member

@picnixz picnixz commented Jul 29, 2024

Not sure if you want it to be backported to 3.12 and 3.13, since the corresponding PR (#26488) for tarfile was not backported.

@picnixz picnixz added the stdlib Standard Library Python modules in the Lib/ directory label Jul 29, 2024
@picnixz picnixz requested a review from jaraco October 24, 2024 12:23
Copy link
Member

@serhiy-storchaka serhiy-storchaka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the behavior for unseekable file before and after this change?

@picnixz
Copy link
Member Author

picnixz commented Nov 15, 2024

Everything is wrapped in a huge except OSError: pass:

def is_zipfile(filename):
"""Quickly see if a file is a ZIP file by checking the magic number.
The filename argument may be a file or file-like object too.
"""
result = False
try:
if hasattr(filename, "read"):
result = _check_zipfile(fp=filename)
else:
with open(filename, "rb") as fp:
result = _check_zipfile(fp)
except OSError:
pass
return result

OTOH, _check_zipfile is also wrapped in except OSError: pass:

def _check_zipfile(fp):
try:
if _EndRecData(fp):
return True # file has correct magic number
except OSError:
pass
return False
.

And _EndRecData() calls .tell():

def _EndRecData(fpin):
"""Return data from the "End of Central Directory" record, or None.
The data is a list of the nine items in the ZIP "End of central dir"
record followed by a tenth item, the file seek offset of this record."""
# Determine file size
fpin.seek(0, 2)
filesize = fpin.tell()
.

So technically, before the exception would be raised from _EndRecData(), ignored in _check_zipfile and would return False in the end (for unseekable fds). Now, the exception would directly be raised in is_zipfile() and ignored and then would return False.

Do you prefer _EndRecData() to handle the position of the file-like object?

Copy link
Member

@serhiy-storchaka serhiy-storchaka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not though that this is important, but tarfile.is_tarfile() restores the file position (although this is a recent behavior). And imghdr.what() restored it. So it is safer to follow.

LGTM.

Comment on lines +244 to +246
pos = filename.tell()
result = _check_zipfile(fp=filename)
filename.seek(pos)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm guessing that a user will never call is_zipfile on a non-seekable stream (e.g. a web response stream)... but if they did, this change would break those cases. Oh, I just checked, and the current check already relies on a seekable fp, so this change has no effect on that behavior.

@jaraco jaraco merged commit e0ef08f into python:main Nov 24, 2024
40 checks passed
@picnixz picnixz deleted the feat-restore-pos-after-is-zipfile branch November 25, 2024 14:51
ebonnal pushed a commit to ebonnal/cpython that referenced this pull request Jan 12, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

stdlib Standard Library Python modules in the Lib/ directory

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants