Bug report
description
When gzip.compress() is called with mtime=0 on CPython 3.8 through 3.10, the OS byte, i.e. the 10th byte of the gzip header, is set to 255 ("unknown") (see also e.g. #83302):
Line 599 in dc0adb4:
    return struct.pack("<BBBBLBB", 0x1f, 0x8b, 8, 0, int(mtime), xfl, 255)
However, in cpython 3.11 and 3.12, the OS byte is suddenly set to a "known" value, e.g. 3 ("Unix") on Ubuntu.
This is not mentioned in the changelog for Python 3.11.
This may lead to problems in the context of reproducible builds. In our case, hash checking fails after decompressing and re-compressing a gzipped archive.
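As a possible stopgap for the hash mismatch described above, the OS byte can be overwritten after compression. This is only a minimal sketch, not something proposed in this report: `compress_reproducible` and `OS_BYTE_INDEX` are hypothetical names, and it assumes the header carries no header CRC (gzip.compress() does not set the FHCRC flag), so patching one header byte leaves the stream valid. It pins only the OS byte; it does not guarantee that the rest of the output is identical across zlib builds.

```python
import gzip

OS_BYTE_INDEX = 9   # the 10th byte of the gzip header holds the OS field
OS_UNKNOWN = 255    # value written by gzip.compress() in CPython 3.8 to 3.10


def compress_reproducible(data: bytes, compresslevel: int = 9) -> bytes:
    """Compress with mtime=0, then force the OS byte to 255 ("unknown").

    Hypothetical helper: the trailer CRC32 covers only the uncompressed
    data, so changing this header byte does not invalidate the stream.
    """
    out = bytearray(gzip.compress(data, compresslevel=compresslevel, mtime=0))
    out[OS_BYTE_INDEX] = OS_UNKNOWN
    return bytes(out)


if __name__ == "__main__":
    blob = compress_reproducible(b"hello world")
    print(hex(blob[OS_BYTE_INDEX]))
    print(gzip.decompress(blob))
```

With this helper the OS byte no longer depends on whether the interpreter builds the header itself or delegates to zlib.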
how to reproduce
Here's an example, where byte 10 is \xff in python 3.10 and \x03 in python 3.11:
~ $ python
Python 3.10.12 (main, Jun 11 2023, 05:26:28) [GCC 11.4.0] on linux
>>> import gzip
>>> gzip.compress(b'', mtime=0)
b'\x1f\x8b\x08\x00\x00\x00\x00\x00\x02\xff\x03\x00\x00\x00\x00\x00\x00\x00\x00\x00'
~ $ pyenv shell 3.11
~ $ python
Python 3.11.6 (main, Nov 23 2023, 17:30:16) [GCC 11.4.0] on linux
>>> import gzip
>>> gzip.compress(b'', mtime=0)
b'\x1f\x8b\x08\x00\x00\x00\x00\x00\x02\x03\x03\x00\x00\x00\x00\x00\x00\x00\x00\x00'
cause
I guess this is caused by python 3.11 delegating the gzip.compress() call to zlib if mtime=0, as mentioned in the docs:
Changed in version 3.11: Speed is improved by compressing all data at once instead of in a streamed fashion. Calls with mtime set to 0 are delegated to zlib.compress() for better speed.
and source:
Lines 609 to 612 in 89ddea4:
    if mtime == 0:
        # Use zlib as it creates the header with 0 mtime by default.
        # This is faster and with less overhead.
        return zlib.compress(data, level=compresslevel, wbits=31)
Apparently zlib does set the OS byte.
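To check this, the fixed 10-byte header can be unpacked with the same struct format that the 3.10 code uses to build it. The sketch below is an illustration under assumptions: `parse_gzip_header` and `zlib_gzip` are hypothetical helpers, and `zlib.compressobj` with wbits=31 stands in for the `zlib.compress(..., wbits=31)` call so that it also runs on interpreters older than 3.11. The OS value that zlib writes depends on how zlib was built, so no particular value is assumed for it.

```python
import struct
import zlib

# Only the two values mentioned in this report are named here.
OS_NAMES = {3: "Unix", 255: "unknown"}


def parse_gzip_header(blob: bytes) -> dict:
    """Unpack the fixed 10-byte gzip header (layout "<BBBBLBB")."""
    id1, id2, method, flags, mtime, xfl, os_code = struct.unpack(
        "<BBBBLBB", blob[:10]
    )
    return {
        "magic": (id1, id2),
        "method": method,
        "flags": flags,
        "mtime": mtime,
        "xfl": xfl,
        "os": os_code,
    }


def zlib_gzip(data: bytes, level: int = 9) -> bytes:
    """Let zlib write the gzip container itself (wbits=31 selects gzip framing)."""
    compressor = zlib.compressobj(level, zlib.DEFLATED, 31)
    return compressor.compress(data) + compressor.flush()


if __name__ == "__main__":
    # Header as built by the struct.pack() call quoted from 3.10.
    python_header = struct.pack("<BBBBLBB", 0x1F, 0x8B, 8, 0, 0, 2, 255)
    for label, blob in (("struct.pack", python_header), ("zlib", zlib_gzip(b""))):
        header = parse_gzip_header(blob)
        name = OS_NAMES.get(header["os"], "other")
        print(label, "mtime:", header["mtime"], "OS byte:", header["os"], name)
```

Comparing the two printed OS bytes on a given platform shows whether the zlib-written header differs from the one Python builds.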
CPython versions tested on:
3.8, 3.9, 3.10, 3.11, 3.12
Operating systems tested on:
Linux, macOS, Windows
Linked PRs
- gh-112346: Bugfix: Remove faster codepath from gzip.compress as it introduces behavioral inconsistencies #114116
- gh-112346: Document the OS byte in gzip.compress output change in 3.11 #120480
- gh-112346: Always set OS byte to 255, simpler gzip.compress function. #120486
- [3.13] gh-112346: Always set OS byte to 255, simpler gzip.compress function. (GH-120486) #120563
- [3.13] gh-112346: Document the OS byte in gzip.compress output change in 3.11 (GH-120480) #120612
- [3.12] gh-112346: Document the OS byte in gzip.compress output change in 3.11 (GH-120480) #120613
- [3.11] gh-112346: Document the OS byte in gzip.compress output change in 3.11 (GH-120480) #120614