-
-
Notifications
You must be signed in to change notification settings - Fork 33.2k
Description
Bug report
Bug description:
The BINSTRING opcode in Pickle has two arguments: a "4-byte little-endian signed int" length, and a string of that many bytes (see the code comment for BINSTRING in pickletools). Since it is signed, this means that any provided value over 0x7fffffff would be interpreted as a negative number. The Python pickle implementation specifically treats it as a signed 32-bit length and checks to see if the length is < 0:
Lines 1454 to 1458 in 754e7c9
| def load_binstring(self): | |
| # Deprecated BINSTRING uses signed 32-bit length | |
| len, = unpack('<i', self.read(4)) | |
| if len < 0: | |
| raise UnpicklingError("BINSTRING pickle has negative byte count") |
However, the C pickle implementation runs calc_binsize(s, 4) for BINSTRING and returns a Py_ssize_t. Since Py_ssize_t data types are the same size as the compiler's size_t type (PEP0353), this means a Py_ssize_t is 64-bits long on 64-bit systems. Since the size variable here is also a Py_ssize_t, that means the threshold for negative values is much higher.
Lines 5546 to 5558 in a58026a
| Py_ssize_t size; | |
| char *s; | |
| if (_Unpickler_Read(self, st, &s, nbytes) < 0) | |
| return -1; | |
| size = calc_binsize(s, nbytes); | |
| if (size < 0) { | |
| PyErr_Format(st->UnpicklingError, | |
| "BINSTRING exceeds system's maximum size of %zd bytes", | |
| PY_SSIZE_T_MAX); | |
| return -1; | |
| } |
This is all just the background to illustrate that because size is not an int, a pickle with the BINSTRING opcode using a length > 0x7fffffff will fail in the Python implementation (since it's negative), but deserialize just fine in the C implementation.
The following payload will demonstrate the difference:
payload: b'T\x00\x00\x00\x80'+b'a'*0x80000000 + b'.'
pickle: FAILURE BINSTRING pickle has negative byte count
_pickle.c: aaaaaaaaaaaaaaaaaaa....
pickletools: pickletools failed string4 byte count < 0: -2147483648
The required payload does need to be 2GB in size which is very large, but not impossible on modern systems.
Note that the LONG4 opcode is in a similar situation, except the output for calc_binint() is an int data type so this issue does not exist there.
CPython versions tested on:
CPython main branch
Operating systems tested on:
Linux
Linked PRs
- gh-135321: Changing data type of
sizevariable forBINSTRINGin_pickle#135322 - [3.14] gh-135321: Always raise a correct exception for BINSTRING argument > 0x7fffffff in pickle (GH-135322) #135382
- [3.13] gh-135321: Always raise a correct exception for BINSTRING argument > 0x7fffffff in pickle (GH-135322) #135383
Metadata
Metadata
Assignees
Labels
Projects
Status