KEMBAR78
gh-95782: Fix `io.BufferedReader.tell` etc. being able to return offsets < 0 by 6t8k · Pull Request #99709 · python/cpython · GitHub
Skip to content

Conversation

@6t8k
Copy link
Contributor

@6t8k 6t8k commented Nov 22, 2022

lseek() always returns 0 for character pseudo-devices like /dev/urandom (for other non-regular files, e.g. /dev/stdin, it always returns -1, to which CPython reacts by raising appropriate exceptions). They are thus technically seekable despite not having seek semantics.

When calling read() on e.g. an instance of io.BufferedReader that wraps such a file, BufferedReader reads ahead, filling its buffer, creating a discrepancy between the number of bytes read and the internal tell() always returning 0, which results in e.g. BufferedReader.tell() or BufferedReader.seek() being able to return positions < 0 even though these are supposed to be always >= 0.

Invariably keep the return value non-negative by returning max(former_return_value, 0) instead, and add some corresponding tests.


Other possible solutions that I've decided against and why (aside from the aspect that they wouldn't reflect the underlying file's behavior):

  • Returning a positive offset, i.e. the number of bytes read, similar to how e.g. io.BufferedWriter.tell does for bytes written
    • Couldn't find a neat and correct way to do this for the current implementation at a glance
    • Wouldn't make much sense to implement this just for the special case at hand
  • Raising an exception if the return value would be negative
    • Much higher chance of breaking existing code
  • Changing the detection logic for unseekable files so that /dev/urandom etc. would be detected as unseekable
    • Much higher chance of breaking existing code
    • New detection logic could have (unintended) side effects

@ghost
Copy link

ghost commented Nov 22, 2022

All commit authors signed the Contributor License Agreement.
CLA signed

lseek() always returns 0 for character pseudo-devices like
`/dev/urandom` (for other non-regular files, e.g. `/dev/stdin`, it
always returns -1, to which CPython reacts by raising appropriate
exceptions). They are thus technically seekable despite not having seek
semantics.

When calling read() on e.g. an instance of `io.BufferedReader` that
wraps such a file, `BufferedReader` reads ahead, filling its buffer,
creating a discrepancy between the number of bytes read and the internal
`tell()` always returning 0, which previously resulted in e.g.
`BufferedReader.tell()` or `BufferedReader.seek()` being able to return
positions < 0 even though these are supposed to be always >= 0.

Invariably keep the return value non-negative by returning
max(former_return_value, 0) instead, and add some corresponding tests.
Copy link
Member

@serhiy-storchaka serhiy-storchaka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@serhiy-storchaka serhiy-storchaka enabled auto-merge (squash) February 17, 2024 10:43
@serhiy-storchaka serhiy-storchaka added needs backport to 3.11 only security fixes needs backport to 3.12 only security fixes labels Feb 17, 2024
@serhiy-storchaka serhiy-storchaka merged commit 26800cf into python:main Feb 17, 2024
@miss-islington-app
Copy link

Thanks @6t8k for the PR, and @serhiy-storchaka for merging it 🌮🎉.. I'm working now to backport this PR to: 3.11, 3.12.
🐍🍒⛏🤖 I'm not a witch! I'm not a witch!

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Feb 17, 2024
…n offsets < 0 (pythonGH-99709)

lseek() always returns 0 for character pseudo-devices like
`/dev/urandom` (for other non-regular files, e.g. `/dev/stdin`, it
always returns -1, to which CPython reacts by raising appropriate
exceptions). They are thus technically seekable despite not having seek
semantics.

When calling read() on e.g. an instance of `io.BufferedReader` that
wraps such a file, `BufferedReader` reads ahead, filling its buffer,
creating a discrepancy between the number of bytes read and the internal
`tell()` always returning 0, which previously resulted in e.g.
`BufferedReader.tell()` or `BufferedReader.seek()` being able to return
positions < 0 even though these are supposed to be always >= 0.

Invariably keep the return value non-negative by returning
max(former_return_value, 0) instead, and add some corresponding tests.
(cherry picked from commit 26800cf)

Co-authored-by: 6t8k <58048945+6t8k@users.noreply.github.com>
@bedevere-app
Copy link

bedevere-app bot commented Feb 17, 2024

GH-115599 is a backport of this pull request to the 3.12 branch.

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Feb 17, 2024
…n offsets < 0 (pythonGH-99709)

lseek() always returns 0 for character pseudo-devices like
`/dev/urandom` (for other non-regular files, e.g. `/dev/stdin`, it
always returns -1, to which CPython reacts by raising appropriate
exceptions). They are thus technically seekable despite not having seek
semantics.

When calling read() on e.g. an instance of `io.BufferedReader` that
wraps such a file, `BufferedReader` reads ahead, filling its buffer,
creating a discrepancy between the number of bytes read and the internal
`tell()` always returning 0, which previously resulted in e.g.
`BufferedReader.tell()` or `BufferedReader.seek()` being able to return
positions < 0 even though these are supposed to be always >= 0.

Invariably keep the return value non-negative by returning
max(former_return_value, 0) instead, and add some corresponding tests.
(cherry picked from commit 26800cf)

Co-authored-by: 6t8k <58048945+6t8k@users.noreply.github.com>
@bedevere-app bedevere-app bot removed the needs backport to 3.12 only security fixes label Feb 17, 2024
@bedevere-app
Copy link

bedevere-app bot commented Feb 17, 2024

GH-115600 is a backport of this pull request to the 3.11 branch.

@bedevere-app bedevere-app bot removed the needs backport to 3.11 only security fixes label Feb 17, 2024
serhiy-storchaka pushed a commit that referenced this pull request Feb 17, 2024
…rn offsets < 0 (GH-99709) (GH-115600)

lseek() always returns 0 for character pseudo-devices like
`/dev/urandom` (for other non-regular files, e.g. `/dev/stdin`, it
always returns -1, to which CPython reacts by raising appropriate
exceptions). They are thus technically seekable despite not having seek
semantics.

When calling read() on e.g. an instance of `io.BufferedReader` that
wraps such a file, `BufferedReader` reads ahead, filling its buffer,
creating a discrepancy between the number of bytes read and the internal
`tell()` always returning 0, which previously resulted in e.g.
`BufferedReader.tell()` or `BufferedReader.seek()` being able to return
positions < 0 even though these are supposed to be always >= 0.

Invariably keep the return value non-negative by returning
max(former_return_value, 0) instead, and add some corresponding tests.
(cherry picked from commit 26800cf)

Co-authored-by: 6t8k <58048945+6t8k@users.noreply.github.com>
serhiy-storchaka pushed a commit that referenced this pull request Feb 17, 2024
…rn offsets < 0 (GH-99709) (GH-115599)

lseek() always returns 0 for character pseudo-devices like
`/dev/urandom` (for other non-regular files, e.g. `/dev/stdin`, it
always returns -1, to which CPython reacts by raising appropriate
exceptions). They are thus technically seekable despite not having seek
semantics.

When calling read() on e.g. an instance of `io.BufferedReader` that
wraps such a file, `BufferedReader` reads ahead, filling its buffer,
creating a discrepancy between the number of bytes read and the internal
`tell()` always returning 0, which previously resulted in e.g.
`BufferedReader.tell()` or `BufferedReader.seek()` being able to return
positions < 0 even though these are supposed to be always >= 0.

Invariably keep the return value non-negative by returning
max(former_return_value, 0) instead, and add some corresponding tests.
(cherry picked from commit 26800cf)

Co-authored-by: 6t8k <58048945+6t8k@users.noreply.github.com>
@6t8k 6t8k deleted the gh-95782 branch February 18, 2024 17:38
woodruffw pushed a commit to woodruffw-forks/cpython that referenced this pull request Mar 4, 2024
…n offsets < 0 (pythonGH-99709)

lseek() always returns 0 for character pseudo-devices like
`/dev/urandom` (for other non-regular files, e.g. `/dev/stdin`, it
always returns -1, to which CPython reacts by raising appropriate
exceptions). They are thus technically seekable despite not having seek
semantics.

When calling read() on e.g. an instance of `io.BufferedReader` that
wraps such a file, `BufferedReader` reads ahead, filling its buffer,
creating a discrepancy between the number of bytes read and the internal
`tell()` always returning 0, which previously resulted in e.g.
`BufferedReader.tell()` or `BufferedReader.seek()` being able to return
positions < 0 even though these are supposed to be always >= 0.

Invariably keep the return value non-negative by returning
max(former_return_value, 0) instead, and add some corresponding tests.
diegorusso pushed a commit to diegorusso/cpython that referenced this pull request Apr 17, 2024
…n offsets < 0 (pythonGH-99709)

lseek() always returns 0 for character pseudo-devices like
`/dev/urandom` (for other non-regular files, e.g. `/dev/stdin`, it
always returns -1, to which CPython reacts by raising appropriate
exceptions). They are thus technically seekable despite not having seek
semantics.

When calling read() on e.g. an instance of `io.BufferedReader` that
wraps such a file, `BufferedReader` reads ahead, filling its buffer,
creating a discrepancy between the number of bytes read and the internal
`tell()` always returning 0, which previously resulted in e.g.
`BufferedReader.tell()` or `BufferedReader.seek()` being able to return
positions < 0 even though these are supposed to be always >= 0.

Invariably keep the return value non-negative by returning
max(former_return_value, 0) instead, and add some corresponding tests.
LukasWoodtli pushed a commit to LukasWoodtli/cpython that referenced this pull request Jan 22, 2025
…n offsets < 0 (pythonGH-99709)

lseek() always returns 0 for character pseudo-devices like
`/dev/urandom` (for other non-regular files, e.g. `/dev/stdin`, it
always returns -1, to which CPython reacts by raising appropriate
exceptions). They are thus technically seekable despite not having seek
semantics.

When calling read() on e.g. an instance of `io.BufferedReader` that
wraps such a file, `BufferedReader` reads ahead, filling its buffer,
creating a discrepancy between the number of bytes read and the internal
`tell()` always returning 0, which previously resulted in e.g.
`BufferedReader.tell()` or `BufferedReader.seek()` being able to return
positions < 0 even though these are supposed to be always >= 0.

Invariably keep the return value non-negative by returning
max(former_return_value, 0) instead, and add some corresponding tests.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants