KEMBAR78
gh-120608: Make reversed iterator work with free-threading by eendebakpt · Pull Request #120971 · python/cpython · GitHub
Skip to content

Conversation

@eendebakpt
Copy link
Contributor

@eendebakpt eendebakpt commented Jun 24, 2024

In this PR we make the reversed iterator thread-safe. In the free-threading build the index ro->index needs to be loaded and updated atomically. Also we need to prevent the pointer ro->seq from being released in one the thread while one of the other threads is still using it.

Update: based on review comments this PR makes the reversed work under free-threading, but it is not guaranteed to be thread-safe

  • There seems to be no dedicated file for the reversed iterator, so a new one was added,

@colesbury
Copy link
Contributor

Like we discussed in #120496, I do not think that we should make reversed thread safe.

@corona10
Copy link
Member

corona10 commented Jun 25, 2024

How impact for the single thread performance? I am +0 with this change.
(If the change is not that big but basically concur the Sam's opinion)

@eendebakpt
Copy link
Contributor Author

Like we discussed in #120496, I do not think that we should make reversed thread safe.

Maybe I misunderstood or the description of the PR is not clear enough. Without this PR the interpreter can lock or freeze in the free-threaded build. As a side effect of this PR the reversed would be thread-safe in the sense of returning expected results, but that is not the main goal.

@eendebakpt
Copy link
Contributor Author

eendebakpt commented Jun 25, 2024

Performance impact (bit noisy on my system):

list(reversed(range10)): Mean +- std dev: [ft_main] 257 ns +- 10 ns -> [ft_atomics_pr] 240 ns +- 10 ns: 1.07x faster
list(reversed(range1000)): Mean +- std dev: [ft_main] 10.3 us +- 0.3 us -> [ft_atomics_pr] 10.7 us +- 0.5 us: 1.04x slower
list(reversed(list100)): Mean +- std dev: [ft_main] 591 ns +- 15 ns -> [ft_atomics_pr] 572 ns +- 25 ns: 1.03x faster
list(reversed(tuple100)): Mean +- std dev: [ft_main] 606 ns +- 19 ns -> [ft_atomics_pr] 977 ns +- 40 ns: 1.61x slower
list(reversed(userlist100)): Mean +- std dev: [ft_main] 24.2 us +- 0.9 us -> [ft_atomics_pr] 22.9 us +- 0.9 us: 1.06x faster

Benchmark hidden because not significant (1): reversed(range10)

Geometric mean: 1.06x slower

Summary: reversed on range or list is unaffected since they have a __reversed__ method, reversed on a tuple becomes slower, reversed on a custom type is unchanged (overhead of the atomics is small)

Update: we could perhaps remove the atomics from this PR and keep the part where we do not clear ro->seq. That should give the same performance, but avoid corruption of the interpreter.

@corona10
Copy link
Member

corona10 commented Jun 25, 2024

The change requires some performance overhead, I think that we don't need to make reversed iterator thread-safe as same opinion as Sam if we need to make more implementation effort.

Copy link
Contributor

@colesbury colesbury left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without this PR the interpreter can lock or freeze in the free-threaded build

Ok, thanks I missed that.

The relaxed atomics loads and stores are free, but the _Py_atomic_add_ssize is relatively expensive. We want to avoid that, even at the cost of potentially returning the same item multiple times in a race condition.

  • Use relaxed atomics to load/store ro->index. (Or you can use the FT_ macros)
  • Update the NULL checks for ro->seq to instead check if ro->index is negative (in reversed_len and reversed_reduce)

eendebakpt and others added 2 commits June 25, 2024 21:01
Co-authored-by: Sam Gross <colesbury@gmail.com>
@eendebakpt
Copy link
Contributor Author

Should we also update the reversed_setstate method? With this PR we have a behaviour change even when running with just a single thread. Minimal example:

r = reversed((1,2,3))
list(r) # exhaust
r.__setstate__(1)
print(list(r)) # returns [] on normal build, returns [2, 1] on free-threaded build

I believe the behaviour on the free-threaded build makes more sense and that the behaviour on current main is a workaround for the case where ro->seq is cleared. I also doubt that people will actually run into this change in actual code, but it would be nice to have some confirmation on that.

@colesbury
Copy link
Contributor

Yes, I think both reversed_reduce and reversed_setstate should be updated.

I believe the behaviour on the free-threaded build makes more sense...

I would prefer to avoid changing the behavior, even if it doesn't make as much sense.

@eendebakpt eendebakpt changed the title gh-120608: Make reversed iterator thread-safe gh-120608: Make reversed iterator work with free-threading Jun 26, 2024
@eendebakpt eendebakpt requested a review from colesbury July 5, 2024 07:01
Copy link
Contributor

@colesbury colesbury left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Two minor suggestions.

Comment on lines 28 to 29
_ = [t.start() for t in worker_threads]
_ = [t.join() for t in worker_threads]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the preference I've heard from other core developers is to use for loops instead of using list comprehensions for their side-effects.

@ambv
Copy link
Contributor

ambv commented Mar 12, 2025

The merge with main helped with the "All green" check issue so I'm reverting my shot-in-the-dark trailing comma removal.

@ambv ambv merged commit 1fb7e2a into python:main Mar 12, 2025
42 checks passed
seehwan pushed a commit to seehwan/cpython that referenced this pull request Apr 16, 2025
…hon#120971)

Co-authored-by: Sam Gross <colesbury@gmail.com>
Co-authored-by: Kumar Aditya <kumaraditya@python.org>
Co-authored-by: Łukasz Langa <lukasz@langa.pl>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants