KEMBAR78
Plugin for ctypes.Array by dgelessus · Pull Request #4869 · python/mypy · GitHub
Skip to content

Conversation

dgelessus
Copy link
Contributor

@dgelessus dgelessus commented Apr 7, 2018

Followup to python/typeshed#1986 (specifically this discussion). This adds a mypy.plugins.ctypes module, which provides accurate type info for ctypes.Array's methods. The corresponding Typeshed stub uses Any in lots of places, because ctypes' rules for what is accepted/returned are too complex. This plugin provides callbacks for the following ctypes.Array methods and attributes:

Note: ignore the following paragraphs, the issues described there have now been fixed in mypy.

This is still a work in progress. I added some debug prints showing which plugin hooks are called when, because it seems they are not always called properly. Specifically:

  • For __getitem__, get_method_hook is called as expected.
  • For __init__, get_method_hook is not called. Instead, get_function_hook is called with "Array" as the argument. This seems like a bug - shouldn't mypy figure out that Array is a class and look at its __init__ method? Even if this is intentional, the name should be "ctypes.Array" and not "Array".
  • For __setitem__, get_method_hook is not called either. Instead, get_function_hook is called with "__setitem__ of Array" as the argument. This also seems like a bug - __setitem__ is clearly a method and not a function, and "__setitem__ of Array" is not a fully qualified name.
  • For __iter__, the behavior is like with __setitem__.
  • get_method_signature_hook is never called for any of these dunder methods.

Are these bugs, or am I simply misunderstanding how the plugin hooks are supposed to behave?

Here's the test code with which I observed this behavior:

import ctypes

class MyCInt(ctypes.c_int):
    pass

intarr4 = ctypes.c_int * 4
reveal_type(intarr4)
a = intarr4(1, 2, 3, 4)
reveal_type(a)
reveal_type(a[0])
reveal_type(a[1:3])
a[0] = 42
a[1] = ctypes.c_int(42)
a[2] = MyCInt(42)
a[3] = b"bytes"  # type error
reveal_type(iter(a))
for x in a:
    reveal_type(x)

myintarr4 = MyCInt * 4
reveal_type(myintarr4)
mya = myintarr4(1, 2, 3, 4)
reveal_type(mya)
reveal_type(mya[0])
reveal_type(mya[1:3])
mya[0] = 42
mya[1] = ctypes.c_int(42)  # type error
mya[2] = MyCInt(42)
mya[3] = b"bytes"  # type error
reveal_type(iter(mya))
for myx in mya:
    reveal_type(myx)

This provides accurate return types for ctypes.Array.__getitem__
and __iter__. __init__ is partially implemented but not yet working.
@ilevkivskyi
Copy link
Member

Hm, it looks like a forgot to press "Comment" a week ago, sorry. Trying to remember, I think my conclusion was: behavior of __init__ is intentional (modulo name), other two are bugs. Fixing these should be a separate PR landing before this one.

@ilevkivskyi
Copy link
Member

... and thanks for working on this! :-)

@dgelessus
Copy link
Contributor Author

dgelessus commented Apr 23, 2018

Hm, it looks like a forgot to press "Comment" a week ago, sorry.

No worries :)

behavior of __init__ is intentional (modulo name), other two are bugs.

OK, that means the following bugs exist at the moment:

  • When a class is instantiated, the class name passed to get_function_hook is not fully qualified
  • For some dunder methods (__setitem__, __iter__), get_function_hook is incorrectly called instead of get_method_hook (and the method name is not fully qualified)
  • get_method_signature_hook is not called for some dunder methods (__getitem__, __setitem__, __iter__)

Is that correct? Should I put these points into a separate issue (or multiple issues)?

@ilevkivskyi
Copy link
Member

Is that correct? Should I put these points into a separate issue (or multiple issues)?

Yes, I think one common issue for these would be OK.

@gvanrossum
Copy link
Member

This work has been in progress for a mighty long time. Is it still relevant?

# Conflicts:
#	mypy/plugin.py
@dgelessus
Copy link
Contributor Author

As far as I can tell, this is still blocked by #4964. I merged master into my branch and checked, and the method signature hooks aren't looked up correctly for __init__, __setitem__ and __iter__.

I'm aware of #5379, but I think that doesn't help in this case. The ctypes.Array creation hook needs to know the element type of the array that's being created, so that it can check the types of the arguments. As far as I can tell, this information is not available in a function hook: the FunctionContext only includes the argument and return types, but not the type of the callable (in this case Type[ctypes.Array[E]], where E is the array's element type).

Because of this, array_init_callback is currently implemented as a method signature hook - that way the array element type can be determined by looking at the type of self (ctx.type). However, this approach currently doesn't work, because mypy doesn't call method hooks for __init__ when a class is instantiated.

@ilevkivskyi
Copy link
Member

Are interested in fixing #4964 first (or some parts of it)? This is an important issue and you have a good context of what is needed in real world.

@dgelessus
Copy link
Contributor Author

@ilevkivskyi I think I have a possible fix for the remaining #4964 issues, see #5700. Based on my local testing, those fixes should be enough to make the ctypes plugin work correctly.

@dgelessus dgelessus changed the title [WIP] Plugin for ctypes.Array (and possible bugs with the plugin hooks?) Plugin for ctypes.Array Nov 13, 2018
Copy link
Member

@ilevkivskyi ilevkivskyi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! This looks very good. I also tried this agains our internal codebases, and it doesn't generate any errors.

Here I have some comments (mostly minor).

def get_function_hook(self, fullname: str
) -> Optional[Callable[[FunctionContext], Type]]:
from mypy.plugins import ctypes

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is unfortunate, but it is not your fault. We already have an issue to refactor plugin modules to avoid these imports, see #5117. Could you please also add a comment (or comments?) about this also here?

Another idea is to move the ctypes plugin to plugin.py since all other stdlib plugins live there. In future we might add more stdlib plugins (e.g. for functools) so this file will grow, but for now this is probably a better solution.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another idea is to move the ctypes plugin to plugin.py since all other stdlib plugins live there

That would be possible, but I'm not sure if it's worth it here. The circular import issues aren't too big (the only affected parts are the argument type annotations of the hooks). And since the ctypes plugin is a bit larger than the other hooks in plugin.py, I think it makes more sense to have it in a separate module. (It's not the only stdlib plugin to have its own module - dataclasses is separated as well, and it's roughly the same size as the ctypes plugin.)

'Array constructor argument {} of type "{}"'
' is not convertible to the array element type "{}"'
.format(arg_num, arg_type, et),
ctx.context)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this also support calling Array with *args/**kwargs/etc? If yes, I would add a test for this, if no, I would add a comment explaining this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, it doesn't work with *args at the moment, but it definitely should. **kwargs should always be an error, since Array doesn't take keyword arguments. (Yes, **{} is technically allowed, but I really hope nobody does that.)

I'm not sure how to handle this in a function hook though - I don't see a way to find out if an argument is a regular arg, *args, or **kwargs.

In any case, I've added a (currently failing) test for *args.

Copy link
Member

@ilevkivskyi ilevkivskyi Nov 18, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the logic in function hook is sub-optimal, this is probably the most problematic hook currently, several people who tried it, found the API inconvenient. You also get corresponding expressions in ctx.args[0] for every type in ctx.arg_types[0], maybe you can try checking if it is a StarExpr(...) and then somehow "unpack" type if it maps to iterable supertype?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe you can try checking if it is a StarExpr(...)

This doesn't seem to work - if I print out ctx.args[0][0] for a call like intarr4(*int_values), I get NameExpr(int_values [__main__.int_values]), with no StarExpr around it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm... This is unfortunate, there is a trivial fix to apply_function_plugin() to pass arg_kinds to FunctionContext, but this looks like a partial solution of a bigger problem. Could you please add a TODO comment here? We can fix it later after we decide what to do with the FunctionContext API.

class slice: pass
class str: pass
class tuple(Generic[T]): pass
class unicode: pass
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these two fixtures needed, or you can slightly tweak one of existing fixtures to accommodate your needs? If yes, then I think it is better than adding two new fixtures.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The closest one would be floatdict.pyi, if I add definitions for unicode and slice there, I can use it for all of the ctypes tests.

@dgelessus
Copy link
Contributor Author

mypy/plugins/ctypes.py:169: error: Argument 1 to "make_simplified_union" of "UnionType" has incompatible type "List[Instance]"; expected "List[Type]"

This seems odd - shouldn't make_simplified_union accept any Sequence[Type]? (If necessary I can work around this by explicitly annotating my list, but it seems strange that an exact list type is required here.)

@ilevkivskyi
Copy link
Member

This seems odd - shouldn't make_simplified_union accept any Sequence[Type]? (If necessary I can work around this by explicitly annotating my list, but it seems strange that an exact list type is required here.)

You are totally allowed to update that signature to be more reasonable :-)

Copy link
Member

@ilevkivskyi ilevkivskyi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for updates! I have one more (hopefully last) round of comments here about few remaining corner cases (sorry for being pedantic here).

"""
# Every type can be converted from itself (obviously).
allowed_types = [tp]
if isinstance(tp, Instance):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, should we care about unions here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should, but this isn't as straightforward as with the attribute hooks. The type returned from _autoconvertible_to_cdata is not used as a return type, but an argument type, for example in Array.__setitem__. This makes the rules for union types a little more complicated here.

For example, if arr has the type Array[Union[c_int, c_uint]], arr[i] = x is only valid if x can be converted to both c_int and c_uint, i. e. the type of x should be the intersection of _autoconvertible_to_cdata(c_int) and _autoconvertible_to_cdata(c_uint).

As far as I know, mypy doesn't support intersection types, and I can't do an easy intersection calculation using sets, since Type objects are not hashable. So I guess I'd have to write a manual intersection algorithm based on is_subtype?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mypy types are hashable (we have a subtype cache for example that uses this), but I am not sure this is what you need. Also I am not sure we need an intersection here, it is indeed safe, but I am afraid it may cause false positives. I am fine with a less strict check (especially if it is easy to implement). IOW I don't worry about false negatives because we don't allow unions here, I worry about false positives.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mypy types are hashable

Ah, my mistake - I only looked at the Type base class and saw that it had no __hash__, but almost all of the concrete type classes define __hash__.

IOW I don't worry about false negatives because we don't allow unions here, I worry about false positives.

In that case I can simply add another "union-loop" thing. But tbh even if this is changed later, I don't think it will give many unexpected errors - an Array[Union[...]] type can only come from an explicit annotation, in which case the user perhaps wants this kind of strict checking.

if et is not None:
types = []
for tp in union_items(et):
if isinstance(tp, Instance) and tp.type.fullname() == 'ctypes.c_char':
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should double-check that Any is still valid here and elsewhere. Could you please add tests for this and update the code if necessary?

"""Callback to provide an accurate type for ctypes.Array.raw."""
et = _get_array_element_type(ctx.type)
if et is not None:
if isinstance(et, Instance) and et.type.fullname() == 'ctypes.c_char':
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should Union[c_char, Any] be still valid here? I think it should.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. I'm wondering though, what would a union like Union[c_char, Any] be useful for? Every c_char is an Any, so why doesn't the union get simplified to Any at some point? (The make_simplified_union docstring points out that Anys in unions are not simplified, but doesn't explain why.)

'Array constructor argument {} of type "{}"'
' is not convertible to the array element type "{}"'
.format(arg_num, arg_type, et),
ctx.context)
Copy link
Member

@ilevkivskyi ilevkivskyi Nov 18, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the logic in function hook is sub-optimal, this is probably the most problematic hook currently, several people who tried it, found the API inconvenient. You also get corresponding expressions in ctx.args[0] for every type in ctx.arg_types[0], maybe you can try checking if it is a StarExpr(...) and then somehow "unpack" type if it maps to iterable supertype?

@dgelessus
Copy link
Contributor Author

You are totally allowed to update that signature to be more reasonable :-)

Actually, after looking into it, it's not as easy to change as I thought. It turns out that make_simplified_union will sometimes reassign and/or mutate the list parameter, in which case the List[Type] type is actually correct. Refactoring this isn't trivial and I don't want to do it in an otherwise unrelated PR, so I simply added the necessary type comments in the plugin code instead.

@ilevkivskyi
Copy link
Member

@dgelessus I left two more comments, I would propose to apply simple fixes, land this (finally after many months :-) and then fix the remaining issues in a separate PR later.

(The test that currently fails can be just skipped by adding -skip suffix to its name.)

This case cannot be handled correctly yet, because the function hook
API doesn't expose the necessary information (see comment in code).
Copy link
Member

@ilevkivskyi ilevkivskyi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great! I think this is ready now. The problem with *args can be revisited after we improve the plugin hook API.

@ilevkivskyi ilevkivskyi merged commit d4729d9 into python:master Nov 19, 2018
# unpacked. However, FunctionContext currently doesn't provide a way to differentiate
# between normal arguments and *args, so the iterable type is considered invalid.
# Once FunctionContext has an API for this, *args should be allowed here if the
# iterable's element type is compatible with the array element type.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any workaround for this? I currently have a list of integers that I want to convert into a ctypes array of bytes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can just put a # type: ignore on the line where error appears. We are already working on something that will allow to fix this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Relevant issue: #5409 (comment))

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants