Description
Feature or enhancement
Proposal:
Pure-Python code that reads data tends to create a buffer variable, call os.read() (which returns a separate, newly allocated buffer of data), then copy or append that data onto the pre-allocated buffer [0]. That creates unnecessary extra buffer objects as well as unnecessary copies. Provide os.readinto for directly filling a Buffer Protocol object.
os.readinto should closely mirror _Py_read, which underlies os.read, in order to get the same retry behavior as well as the same well-tested cross-platform support.
Move simple cases that use os.read (ex. [0]) to use the new API when it makes code simpler and more efficient. Potentially adding readinto to more readable/writeable file-like proxy objects or objects which transform the data (ex. Lib/_compression) is out of scope for this issue.
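As a rough illustration of the proposed call, the sketch below fills a caller-provided bytearray from a pipe. It assumes the os.readinto(fd, buffer) shape described in this issue, returning the number of bytes read. The `readinto_compat` helper is hypothetical and exists only so the snippet also runs on interpreters that lack os.readinto.

```python
import os


def readinto_compat(fd, buf):
    """Hypothetical shim: prefer os.readinto, else emulate it with os.read."""
    if hasattr(os, "readinto"):
        return os.readinto(fd, buf)
    view = memoryview(buf).cast("B")
    data = os.read(fd, len(view))  # the extra allocation and copy the proposal avoids
    view[:len(data)] = data
    return len(data)


r, w = os.pipe()
os.write(w, b"hello")
os.close(w)

buf = bytearray(16)               # allocated once by the caller
n = readinto_compat(r, buf)       # bytes land directly in buf
payload = bytes(buf[:n])          # only the first n bytes are valid data
at_eof = readinto_compat(r, buf)  # 0 signals end of file, as with os.read
os.close(r)
```

As with os.read, a call may return fewer bytes than the buffer can hold, so callers that need an exact amount still have to loop.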
[0]
Lines 1914 to 1921 in 298dda5
    # Wait for exec to fail or succeed; possibly raising an
    # exception (limited in size)
    errpipe_data = bytearray()
    while True:
        part = os.read(errpipe_read, 50000)
        errpipe_data += part
        if not part or len(errpipe_data) > 50000:
            break
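One possible way to express a size-limited loop like the one above with a single pre-allocated buffer is sketched here. The names `read_limited` and `readinto_compat` are hypothetical, and the sketch stops at exactly `limit` bytes, whereas the original loop can overshoot slightly, so it is not a drop-in replacement.

```python
import os


def readinto_compat(fd, buf):
    """Hypothetical shim: prefer os.readinto, else emulate it with os.read."""
    if hasattr(os, "readinto"):
        return os.readinto(fd, buf)
    view = memoryview(buf).cast("B")
    data = os.read(fd, len(view))
    view[:len(data)] = data
    return len(data)


def read_limited(fd, limit=50_000):
    """Read until EOF or until `limit` bytes have arrived, using one buffer."""
    buf = bytearray(limit)
    filled = 0
    with memoryview(buf) as view:
        while filled < limit:
            # Slicing a memoryview does not copy; it is a window onto buf.
            n = readinto_compat(fd, view[filled:])
            if n == 0:  # EOF
                break
            filled += n
    # The view is released here, so the bytearray may be shrunk in place.
    del buf[filled:]
    return buf
```

The trade-off is that the full `limit` is allocated up front even when only a few bytes arrive.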
cpython/Lib/multiprocessing/forkserver.py
Lines 384 to 392 in 298dda5
    def read_signed(fd):
        data = b''
        length = SIGNED_STRUCT.size
        while len(data) < length:
            s = os.read(fd, length - len(data))
            if not s:
                raise EOFError('unexpected EOF')
            data += s
        return SIGNED_STRUCT.unpack(data)[0]
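A fixed-length read such as read_signed maps naturally onto a buffer sized to the struct. The following is a sketch only, not necessarily what the linked PR does: `SIGNED_STRUCT` is assumed here to be `struct.Struct('q')`, and `readinto_compat` is a hypothetical shim for interpreters without os.readinto.

```python
import os
import struct

SIGNED_STRUCT = struct.Struct("q")  # assumption for this sketch


def readinto_compat(fd, buf):
    """Hypothetical shim: prefer os.readinto, else emulate it with os.read."""
    if hasattr(os, "readinto"):
        return os.readinto(fd, buf)
    view = memoryview(buf).cast("B")
    data = os.read(fd, len(view))
    view[:len(data)] = data
    return len(data)


def read_signed(fd):
    data = bytearray(SIGNED_STRUCT.size)
    unread = memoryview(data)  # window over the part not yet filled
    while len(unread) > 0:
        count = readinto_compat(fd, unread)
        if count == 0:
            raise EOFError("unexpected EOF")
        unread = unread[count:]  # advance the window; no bytes are copied
    return SIGNED_STRUCT.unpack(data)[0]
```

Compared with the `data += s` version, each short read no longer produces a new bytes object.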
Lines 1695 to 1701 in 298dda5
    def readinto(self, b):
        """Same as RawIOBase.readinto()."""
        m = memoryview(b).cast('B')
        data = self.read(len(m))
        n = len(data)
        m[:n] = data
        return n
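The readinto above goes through read() and then copies. With an os-level readinto, the relationship could be inverted so that readinto is the primitive. The class below is a hypothetical minimal reader, not the _pyio implementation. It relies on io.RawIOBase building read() on top of readinto(), and `readinto_compat` is again a hypothetical shim.

```python
import io
import os


def readinto_compat(fd, buf):
    """Hypothetical shim: prefer os.readinto, else emulate it with os.read."""
    if hasattr(os, "readinto"):
        return os.readinto(fd, buf)
    view = memoryview(buf).cast("B")
    data = os.read(fd, len(view))
    view[:len(data)] = data
    return len(data)


class FdReader(io.RawIOBase):
    """Hypothetical raw reader over an already-open file descriptor."""

    def __init__(self, fd):
        self._fd = fd

    def readable(self):
        return True

    def readinto(self, b):
        # Fill the caller's buffer directly; no intermediate bytes object.
        return readinto_compat(self._fd, b)
```

Because io.RawIOBase implements read() in terms of readinto(), `FdReader(fd).read(n)` works without any further code.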
os.read loops to migrate
Well contained os.read loops
- multiprocessing.forkserver read_signed - @cmaloney - gh-129205: Update multiprocessing.forkserver to use os.readinto #129425
- [x] subprocess Popen._execute_child - @cmaloney - gh-129205: Use os.readinto() in subprocess errpipe_read #129498
os.read loop interleaved with other code
- _pyio FileIO.read, FileIO.readall, FileIO.readinto: see Reduce copies when reading files in pyio, match behavior of _io #129005 -- @cmaloney
- _pyrepl.unix_console UnixConsole.input_buffer -- fixed-length underlying buffer with "pos" / window on top.
- pty _copy. Operates around a "high waterlevel" / attempt to have a fixed-ish size buffer. Wraps os.read with a _read function.
- subprocess Popen.communicate. Note: this feels like something a non-contiguous Py_buffer would be really good for, particularly in self.text_mode, where currently all the bytes are "copied" into a contiguous bytes to then turn into text...
- tarfile _Stream._read and _Stream.__read. Note: builds _LowLevelFile around os.read, but other read methods are also available.
Has this already been discussed elsewhere?
No response given
Links to previous discussion of this feature:
Linked PRs
- gh-129205: Add os.readinto API for reading data into a caller provided buffer #129211
- gh-129205: Modernize test_eintr #129316
- gh-129205: Update multiprocessing.forkserver to use os.readinto #129425
- gh-129205: Use os.readinto() in subprocess errpipe_read #129498
- gh-129205: Experiment BytesIO._readfrom() #130098