KEMBAR78
gh-124285: document assumptions on __bool__/__len__ behaviour by skirpichev · Pull Request #124723 · python/cpython · GitHub
Skip to content

Conversation

@skirpichev
Copy link
Contributor

@skirpichev skirpichev commented Sep 28, 2024

true.

Two successive calls to :meth:`!__bool__` on the same object must
return same value.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not if the object is mutated.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But that means - some other method was called on the object in between. Thus, calls to __bool__ weren't actually successive.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The object could have been mutated by another thread between the two calls...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

by another thread

... that calls some other object's method

between the two calls...

:)

Copy link
Contributor Author

@skirpichev skirpichev Sep 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, how about this: "The __bool__ method can't mutate any objects."? (Ditto for __len__.)

This probably is a more strong requirement than actually need in current optimizations, but I doubt it blocks something useful.

@JelleZijlstra
Copy link
Member

As @AlexWaygood implied above, even builtin types can't provide the guarantee your current text for: if you call bool() or len() twice on a list, another thread might have run in between and mutated the list.

I think instead, we should say that in cases where the interpreter implicitly calls bool(), it is unspecified how often bool() is called, and it is unspecified what happens if multiple calls happen and they return different results. Not sure where to put that, though.

@AlexWaygood
Copy link
Member

The language spec already says with regard to hashability:

The __hash__() method should return an integer. The only required property is that objects which compare equal have the same hash value

So some interesting questions that come to mind for me here are:

  1. Does it ever make sense for objects that compare equal to have different boolean values?
  2. Does it ever make sense for objects that have different boolean values to compare equal?
  3. Does it ever make sense for x == y but len(x) != len(y)?

If we think that well-behaved code should never do those things, and that doing those things could violate some assumptions made by Python in some places, then we could consider adding a similar note to the language spec for __bool__, i.e.

A well-defined __bool__() method should never report different results for two objects that compare equal

But I'm not sure if it's worth doing so for __len__ as well/I'm not sure what a good phrasing might be there.

Additionally, this sort-of feels like an implicit requirement for an awful lot of dunders, really. I would find it pretty surprising if bytes(x) and bytes(y) produced different results for two objects x and y where x == y. Similarly for __int__, __index__, __neg__, etc... So that's an argument for not putting a special note in the entry for either __bool__ or __len__.

@skirpichev
Copy link
Contributor Author

skirpichev commented Sep 29, 2024

another thread might have run in between and mutated the list.

Then this example will not satisfy to added remark: calls to __bool__ weren't actually successive.

doing those things could violate some assumptions made by Python in some places

Do you have examples? While your points look reasonable, I'm not sure that such assumptions are actually used somewhere.

But I'm not sure if it's worth doing so for __len__

Well, in current version I have to add symmetric note for __len__ just because that helper might be used instead of __bool__ in some implicit call to bool().

So that's an argument for not putting a special note in the entry for either __bool__ or __len__.

There is a difference. The __bool__ method in general can return different values for same object; but the reference implementation does assume it can't be in certain conditions.

@TeamSpen210
Copy link

Maybe instead of saying something about __bool__(), we could instead say it about and/or. In a chain of logical ops, the interpreter is allowed to calculate the boolean value of any given term only once. Or more broadly, say in any given expression the boolean value of any given sub-expression may be reused if already calculated.

@iritkatriel
Copy link
Member

I think the docs can just say that __bool__ and __len__ should be idempotent.

@skirpichev
Copy link
Contributor Author

I think the docs can just say that __bool__ and __len__ should be idempotent.

That's a more short version of the current statement, isn't? But I worry it might require explanation of the term. E.g. we don't expect that reader knows about complex numbers.

Copy link
Contributor

@ncoghlan ncoghlan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested wording added inline. Mutating objects shared between threads is another way of hitting this particular piece of implementation defined behaviour, so I think it's best just to state it that way.

I think "no mutation (of this object, or any other object) " is the right expected invariant to specify, since that's the assumption that gets violated in the multi-threading case.

Comment on lines +2046 to +2047
Two successive calls to :meth:`!__bool__` on the same object must
return same value.
Copy link
Contributor

@ncoghlan ncoghlan Oct 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Two successive calls to :meth:`!__bool__` on the same object must
return same value.
Note: to help optimize logical expressions, implementations are permitted to assume
that calls to :meth:`!__bool__` will not mutate that object, nor any other object.
While this expected invariant is not explicitly enforced, failing to abide
by it will result in implementation dependent runtime behaviour. This
implementation dependent behaviour may also be encountered when mutable
objects are shared across threads without appropriate synchronization.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm unsure if this needs to be explicitly documented as the behavior is dependent on the user's choice of implementation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is dependent on the user's choice of implementation.

Rather on bugs in the user code, if someone will implement __bool__(), doing crazy things (see issue). Added docs say that implementation may assume certain behaviour from the user code, just as for the __hash__() method (same hash value for equal objects - the invariant, which we also can't enforce, user code might break this).

PS: I think that part of discussion rather belongs to the issue thread, which has some other arguments on why we want document this. See e.g. this.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the note warn against side effects in general (like I/O), not just mutation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the note warn against side effects in general (like I/O), not just mutation?

Side effects, including in fact object mutation - are fine, unless they break idempotence.

But I think we don't loose something practically relevant if just forbid mutation of any objects in __bool__() (a shortened version of @ncoghlan suggestion):

Suggested change
Two successive calls to :meth:`!__bool__` on the same object must
return same value.
Calls to :meth:!__bool__` shouldn't mutate any objects.

Comment on lines +2953 to +2954
Two successive calls to :meth:`!__len__` on the same object must
return same value.
Copy link
Contributor

@ncoghlan ncoghlan Oct 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Two successive calls to :meth:`!__len__` on the same object must
return same value.
Note: to help optimize logical expressions, implementations are permitted to assume
that calls to :meth:`!__len__` will not mutate that object, nor any other object.
While this expected invariant is not explicitly enforced, failing to abide
by it will result in implementation dependent runtime behaviour. This
implementation dependent behaviour may also be encountered when mutable
objects are shared across threads without appropriate synchronization.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is a typo (__len__ -> __bool__). But if we are going with this lengthly wording - I think it's better just point to the __bool__ docs.

@skirpichev skirpichev removed the needs backport to 3.12 only security fixes label Apr 8, 2025
@skirpichev
Copy link
Contributor Author

I think that I can't make progress on this.

@skirpichev skirpichev closed this Apr 12, 2025
@skirpichev skirpichev deleted the truth-testing-assumptions-124285 branch April 12, 2025 10:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

awaiting review docs Documentation in the Doc dir needs backport to 3.13 bugs and security fixes skip news

Projects

Status: Todo

Development

Successfully merging this pull request may close these issues.

8 participants