KEMBAR78
Fix type inference for index expression with bounded TypeVar by ilyalabun · Pull Request #11434 · python/mypy · GitHub
Skip to content

Conversation

ilyalabun
Copy link
Contributor

@ilyalabun ilyalabun commented Nov 2, 2021

Closes #8231

When type of index expression (e.g. foo[bar]) is analyzed and left
expression (i.e. foo) has generic type (TypeVar) with upper bound,
for some upper bound types this type inference yields wrong result.

For example, if upper bound type is instance of TypeDict, mypy
considers return type of such index expression as object:

from typing import TypedDict, TypeVar

class Data(TypedDict):
    x: int

T = TypeVar("T", bound=Data)

def f(data: T) -> int:
    # line below leads to mypy error:
    # 'Unsupported operand types for + ("object" and "int")'
    return data["x"] + 1

The root cause of this issue was in visit_index_with_type method
from checkexpr.py which does type analysis for index expressions. For
TypeVar left expression code flow goes via default branch which just
returns return type of upper bound's __getitem__. For some types this
return type inference logic operates on a fallback type. For example,
for TypedDict fallback type is typing._TypedDict with __getitem__
returning just object.

To fix the issue we added special case to visit_index_with_type for
TypeVar left expression which recursively calls
visit_index_with_type with TypeVar upper bound parameter. This way
we always handle upper bounds requiring special treatment correctly.

Corner case -- recursive TypeVar T with upper bound having
__getitem__ method with self having type T and returning T.
In this case when we call visit_index_with_type recursively, TypeVar
types of upper bound __getitem__ method are erased and
check_method_call_by_name returns type of upper bound, not T. So we
don't do recursive visit_index_with_type call when upper bound has
__getitem__ method and handle this case in else branch. which
handles it as expected -- it return T as return type.

@ilyalabun
Copy link
Contributor Author

I couldn't prove formally that this recursive call to visit_index_with_type can't cause infinite loops, but also couldn't find instances when it can. Dear maintainers, could you please comment on this?

Closes python#8231

When type of index expression (e.g. `foo[bar]`) is analyzed and left
expression (i.e. `foo`) has generic type (`TypeVar`) with upper bound,
for some upper bound types this type inference yields wrong result.

For example, if upper bound type is instance of `TypeDict`, mypy
considers return type of such index expression as `object`:

```
from typing import TypedDict, TypeVar

class Data(TypedDict):
    x: int

T = TypeVar("T", bound=Data)

def f(data: T) -> int:
    # line below leads to mypy error:
    # 'Unsupported operand types for + ("object" and "int")'
    return data["x"] + 1
```

The root cause of this issue was in `visit_index_with_type` method
from `checkexpr.py` which does type analysis for index expressions. For
`TypeVar` left expression code flow goes via default branch which just
returns return type of upper bound's `__getitem__`. For some types this
return type inference logic operates on a fallback type. For example,
for `TypedDict` fallback type is `typing._TypedDict` with `__getitem__`
returning just `object`.

To fix the issue we added special case to `visit_index_with_type` for
`TypeVar` left expression which recursively calls
`visit_index_with_type` with `TypeVar` upper bound parameter. This way
we always handle upper bounds requiring special treatment correctly.
@github-actions

This comment has been minimized.

Copy link
Member

@sobolevn sobolevn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! 👍

elif (isinstance(left_type, CallableType)
and left_type.is_type_obj() and left_type.type_object().is_enum):
return self.visit_enum_index_expr(left_type.type_object(), e.index, e)
elif isinstance(left_type, TypeVarType):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am sure it cannot cause infinite recursion, because .upper_bound cannot be another TypeVar.

x: int


T = TypeVar("T", bound=Data)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will the same work for bound=Union[Data, Other]?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, it works because Union is handled in visit_index_with_type by processing Union items recursively. Added test for this case.

and left_type.is_type_obj() and left_type.type_object().is_enum):
return self.visit_enum_index_expr(left_type.type_object(), e.index, e)
elif isinstance(left_type, TypeVarType):
return self.visit_index_with_type(left_type.upper_bound, e, original_type)
Copy link
Member

@sobolevn sobolevn Nov 3, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if this type var has .values instead of .upper_bound? Will it still work?

Copy link
Contributor Author

@ilyalabun ilyalabun Nov 3, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

surprisingly it works. I'm not sure why, but I suspect it's related to TypeVars expansion in checker.py::check_func_def. Added test for this case as well

@github-actions

This comment has been minimized.

@ilyalabun
Copy link
Contributor Author

ilyalabun commented Nov 7, 2021

Diff from mypy_primer, showing the effect of this PR on open source code:

xarray (https://github.com/pydata/xarray.git)
+ xarray/core/variable.py:1193: error: Incompatible return value type (got "Variable", expected "T_Variable")  [return-value]

@sobolevn mypy_primer found interesting corner case with recursive TypeVar. Pushed the fix in 4c78757

@ilyalabun ilyalabun force-pushed the ilabun/fix-8231 branch 2 times, most recently from f191fab to 4c78757 Compare November 7, 2021 21:10
@ilyalabun
Copy link
Contributor Author

@sobolevn could you please approve CI workflows?

@sobolevn
Copy link
Member

sobolevn commented Nov 8, 2021

Sorry, I don't have permissions to do that 😞

CC @JelleZijlstra, please help us!

@JelleZijlstra
Copy link
Member

Done, there's a flake8 failure

Corner case -- recursive TypeVar `T` with upper bound having
`__getitem__` method with `self` having type `T` and returning `T`.
In this case when we call `visit_index_with_type` recursively, `TypeVar`
types of upper bound `__getitem__` method are erased and
`check_method_call_by_name` returns type of upper bound, not `T`. So we
don't do recursive `visit_index_with_type` call when upper bound has
`__getitem__` method and handle this case in `else` branch. which
handles it as expected -- it return `T` as return type.
@ilyalabun
Copy link
Contributor Author

Done, there's a flake8 failure

@JelleZijlstra thanks, pushed the fix

Copy link
Collaborator

@JukkaL JukkaL left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thanks!

@JukkaL JukkaL merged commit 231f7cf into python:master Nov 9, 2021
@ilyalabun ilyalabun deleted the ilabun/fix-8231 branch November 10, 2021 09:31
@ilyalabun
Copy link
Contributor Author

Thanks @JukkaL ! When are you going to release mypy with this fix?

tushar-deepsource pushed a commit to DeepSourceCorp/mypy that referenced this pull request Jan 20, 2022
…11434)

Closes python#8231

When type of index expression (e.g. `foo[bar]`) is analyzed and left
expression (i.e. `foo`) has generic type (`TypeVar`) with upper bound,
for some upper bound types this type inference yields wrong result.

For example, if upper bound type is instance of `TypeDict`, mypy
considers return type of such index expression as `object`:

```
from typing import TypedDict, TypeVar

class Data(TypedDict):
    x: int

T = TypeVar("T", bound=Data)

def f(data: T) -> int:
    # line below leads to mypy error:
    # 'Unsupported operand types for + ("object" and "int")'
    return data["x"] + 1
```

The root cause of this issue was in `visit_index_with_type` method
from `checkexpr.py` which does type analysis for index expressions. For
`TypeVar` left expression code flow goes via default branch which just
returns return type of upper bound's `__getitem__`. For some types this
return type inference logic operates on a fallback type. For example,
for `TypedDict` fallback type is `typing._TypedDict` with `__getitem__`
returning just `object`.

To fix the issue we added special case to `visit_index_with_type` for
`TypeVar` left expression which recursively calls
`visit_index_with_type` with `TypeVar` upper bound parameter. This way
we always handle upper bounds requiring special treatment correctly.

Corner case -- recursive TypeVar `T` with upper bound having
`__getitem__` method with `self` having type `T` and returning `T`.
In this case when we call `visit_index_with_type` recursively, `TypeVar`
types of upper bound `__getitem__` method are erased and
`check_method_call_by_name` returns type of upper bound, not `T`. So we
don't do recursive `visit_index_with_type` call when upper bound has
`__getitem__` method and handle this case in `else` branch. which
handles it as expected -- it return `T` as return type.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

TypeDict in bound TypeVar assumes parameter values are of type "object"

4 participants