KEMBAR78
Detect and support module aliasing via assignment. by carljm · Pull Request #3435 · python/mypy · GitHub
Skip to content

Conversation

@carljm
Copy link
Member

@carljm carljm commented May 24, 2017

In semantic analysis, if we find a simple assignment statement where the rvalue
is a module, make the lvalue a direct alias to that same module.

Fixes #1778.

This PR is very much a trial balloon to see whether I'm even on roughly the right path.
Thanks in advance for review and advice!

In semantic analysis, if we find a simple assignment statement where the rvalue
is a module, make the lvalue a direct alias to that same module.

Fixes python#1778.
Copy link
Member

@ilevkivskyi ilevkivskyi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this idea. Maybe you can implement some items from your TODO in process_module_assignmanets?

@carljm
Copy link
Member Author

carljm commented May 24, 2017

Sure, if this looks like the right approach, I'll flesh it out to support additional cases.

@carljm
Copy link
Member Author

carljm commented May 24, 2017

@ilevkivskyi I took care of some of the TODOs. Propagating module aliases in non-global assignments is running into some strange issues that look like they might require deeper refactoring (see #3440). Maybe this can be merged without solving that problem yet? It's at least strictly an improvement over the status quo, and will cover the most common cases.

@carljm
Copy link
Member Author

carljm commented May 24, 2017

There may be some question about error handling here. In general I aimed for just silently not propagating module references in any case we don't understand, but this code will fail with an internal mypy exception if there are iterables of mismatched lengths between rvalue/lvalues. Such code would fail at runtime (and should fail later in type checking) anyway. Let me know if I should add some more defensive handling of this case.

@ilevkivskyi
Copy link
Member

mypy should never fail with an internal error. Also you don't need to care about all possible nested scenarios (I think unpacking modules is very rare). Just consider few simple scenarios. The code simplicity and robustness is much more important here.

@carljm
Copy link
Member Author

carljm commented May 24, 2017

Never mind, I mis-remembered the behavior of zip when given mismatched-length iterables. It doesn't fail, it just ignores extra elements in some of the iterables. So I don't believe there is a case here where mypy would fail with an internal exception.

I agree that nested unpacking of modules is probably quite rare (though one level of tuple unpacking assignment with modules is something I have seen in compatibility-shim code). However, I don't think the code here would be significantly simplified by removing support for that. Let me know if you think otherwise.

@carljm
Copy link
Member Author

carljm commented May 25, 2017

After #3450 is merged, the is_module_scope() guard here can be removed.

Copy link
Member

@ilevkivskyi ilevkivskyi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, this is almost ready. Just few suggestions.

mypy/semanal.py Outdated
litemlists = [v.items for v in cast(List[Union[TupleExpr, ListExpr]], lvals)]
ritems = cast(Union[TupleExpr, ListExpr], rval).items
for rv, *lvs in zip(ritems, *litemlists):
self._process_module_assignment(lvs, rv, ctx)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this looks a bit unclear and probably unnecessary. We need to support only x, y = mod1, mod2 (with equal length of l.h.s and r.h.s.) and x = y = mod but not nested cases.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you be more specific what you find unclear about this code so I can improve the clarity? When I rewrite this code to not support nesting, it doesn't make it any simpler or clearer. The recursive approach is still the simplest even for a single layer of iterable (otherwise you have to do something else weird to support both iterable and single rval), and once you have recursion, not supporting nesting is more complex than supporting it.

I agree that nested unpacked module assignment is likely very rare, but it seems strange to me to reduce compatibility with true Python semantics in order to achieve (AFAICS) a negligible or nonexistent gain in clarity/simplicity.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have to agree that with all the comprehensions, casts, repetitions of (TupleExpr, ListExpr) and cleverness with zip and *x, I have no confidence that I understand what this code does. And then I'm not even touching on recursion. So the code appears unmaintainable, and I think it needs to be improved (maybe some comments would help).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok! Pushed an update that breaks down the code a bit and adds comments to clarify the intent. Let me know if that helps.

[file n.py]
b = 'bar'

[builtins fixtures/module.pyi]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to see few more tests. For example, a test that shows this failing:

import m
x, y = m  # must be an error here (Module object is not iterable or similar)

and in general more failures, like failure on attempted access/assignment to a non-existing module attribute.

Also probably some chained assignments:

import m
x = m
y = x
reveal_type(y.a) # 'builtins.str'

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Very good point, will add these tests.

carljm added 6 commits May 30, 2017 13:54
* master: (23 commits)
  Make return type of open() more precise (python#3477)
  Add test cases that delete a file during incremental checking (python#3461)
  Parse each format-string component separately (python#3390)
  Don't warn about returning Any if it is a proper subtype of the return type (python#3473)
  Add __setattr__ support (python#3451)
  Remove bundled lib-typing (python#3337)
  Move version of extensions to post-release (python#3348)
  Fix None slice bounds with strict-optional (python#3445)
  Allow NewType subclassing NewType. (python#3465)
  Add console scripts (python#3074)
  Fix 'variance' label.
  Change label for variance section to just 'variance' (python#3429)
  Better error message for invalid package names passed to mypy (python#3447)
  Fix last character cut in html-report if file does not end with newline (python#3466)
  Print pytest output as it happens (python#3463)
  Add mypy roadmap (python#3460)
  Add flag to avoid interpreting arguments with a default of None as Optional (python#3248)
  Add type checking plugin support for functions (python#3299)
  Mismatch of inferred type and return type note (python#3428)
  Sync typeshed (python#3449)
  ...
* master:
  Support accessing modules imported in class bodies within methods. (python#3450)
  Make Type[X] compatible with metaclass of X (python#3346)
  Sync typeshed (python#3479)
  Handle flags in pytest passthrough (python#3467)
Copy link
Member

@ilevkivskyi ilevkivskyi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hope this will be the last round of review, just few minor comments.

mypy/semanal.py Outdated
def process_module_assignment(self, s: AssignmentStmt) -> None:
"""Check if s assigns a module an alias name; if so, update symbol table."""
# TODO support module alias assignment not in global scope
if not self.is_module_scope():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that #3450 is merged, you can remove this, or make it less restrictive.

# y = n = b
# and then we recursively call this method for each of those assignments.
# If the rval and all lvals are not all of the same length, zip will just ignore
# extra elements, so no error will be raised here; mypy will later complain
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is somehow not convincing. Could you please add a test that

import m
x, y = m, m, m

is flagged as an error? (Plus also some length mismatches in nested scenarios.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yesterday I added a test for the x, y, z = m, n case; will also push a test for x, y = m, m, m

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... and also for length mismatches in nested scenarios.

# seq_rval = (a, b)
# seq_lvals = [(x, y), (m, n)]
# We now zip this into:
# elementwise_assignments = [(a, x, m), (b, y, n)]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe put a and b on the right here and in the actual zip? (It will be more logical and will match the order of arguments in _process_module_assignment.)

@ilevkivskyi
Copy link
Member

@carljm Also please resolve the merge conflict.

@carljm
Copy link
Member Author

carljm commented May 31, 2017

Enabling tracking module assignment in class scopes causes one new typing error in stdlib-samples, where the test is checking to see that the class attribute module is not one particular module that misses lexists function, and mypy doesn't understand this check. I added # type: ignore to that line; is there a better way to handle this? Should there be a TODO for that? Is that even a reasonable if check for mypy to try to understand?

Also pushed more extensive tests for unpacking mismatches, and the lval/rval reordering you suggested. I think I've addressed all comments!

@carljm
Copy link
Member Author

carljm commented May 31, 2017

@ilevkivskyi Putting the rval on the right end of the elementwise assignment tuples relies on handling of star-args that is only available in Py35+. The workarounds seem to me to be harder to understand, so I've gone back to the previous ordering with rval in the left, hope that's OK. Let me know if you have an alternative suggestion.

@carljm
Copy link
Member Author

carljm commented Jun 2, 2017

@ilevkivskyi Thanks for the reviews here! Anything else you'd like to see changed?

Copy link
Member

@ilevkivskyi ilevkivskyi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for a delay. Here two small comments.

mypy/semanal.py Outdated
def fail_invalid_classvar(self, context: Context) -> None:
self.fail('ClassVar can only be used for assignments in class body', context)

def process_module_assignment(self, s: AssignmentStmt) -> None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you need a function that just unconditionally calls another function? Maybe one is enough?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, good point. This was to present a cleaner API (pass in only the AssignmentStmt), but given there's only one call site and will never be more, that's not very useful. Consolidated.

class GenericTest(unittest.TestCase):
# The path module to be tested
pathmodule = genericpath # type: Any
pathmodule = genericpath
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you should keep the Any annotation here and not add # type: ignore below. This makes me think about another important aspect, if there is an explicit type annotation in a module assignment, then it should not create a module alias, but just a normal variable with that type. This will be consistent with how type aliases behave (It would be great to document this behaviour, but not necessarily in this PR):

import m
import types
class A: ...
class B(A): ...
Alias = A # creates a type alias
Var: Type[A] = A # creates a normal variable
mod = m # creates a module alias
mod.a # OK
mod_var: types.ModuleType = m # creates a normal variable
mod_var.a # Fails, ModuleType has no attribute a

And don't forget to add a test for module assignment with an explicit type (both types.ModuleType and Any).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, so taking several things separately:

  1. Why is it better to keep # Any annotation for the module instead of # type: ignore in the one specific location mypy doesn't understand? It seems to me the latter provides a better test case for mypy, as it will validate all other attribute accesses on the module. If this were in my own code base, I would definitely prefer the # type: ignore on one special-case line over an # Any annotation that turns off all checking for that variable.

  2. Nonetheless, an explicit annotation of Any type should override the module-ref propagation; that was a bug in this PR, very good catch! Pushing fix and test.

  3. I don't understand why an explicit annotation of types.ModuleType should prevent propagation of the module-reference information. It doesn't seem consistent with how class references work. To extend the above example slightly:

from typing import Type
class A:
    foo = 'bar'
var: Type[A] = A
reveal_type(var.foo)

This code type-checks fine, and reveals a type of builtins.str. To me, this is perfectly parallel to:

import m
import types
mod: types.ModuleType = m
reveal_type(m.a)

I would definitely expect the latter code to succeed and reveal type of builtins.str, not fail with Module has no attribute "a". Why does giving an explicit type of types.ModuleType imply that I want mypy to play dumb about which module it is and what attributes it has?

I've pushed the behavior that makes sense to me for now; will be happy to update once I understand why.

Copy link
Member Author

@carljm carljm Jun 3, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, after further thought I think I understand your point: "type alias" is to "var of type Type" as "module alias" is to "var of type types.ModuleType." I'm still not sure that this provides the most intuitive or useful behavior, given that type aliases and module references are used in different ways. I just tested and I can still pass a real module reference to a function that takes a parameter annotated as types.ModuleType -- is there anything I can't do with a real module reference that I can do with an ordinary variable of types.ModuleType? Put differently, given your preferred behavior, what real-world use case is there to annotate a variable as types.ModuleType?

Anyway, changed my mind and pushing all changes as you suggest. Still would like to understand the reasoning, though.

carljm added 4 commits June 2, 2017 17:06
* master:
  Improve test output when a test doesn't specify the correct fixture (python#3488)
  Speed up tests by simplifying the stub for typing (python#3486)
  Clean up test fixtures (python#3485)
  Add additional quick mode test cases (python#3480)
  Add --strict-optional on by default to roadmap (python#3478)
  Minor speed-up in warn-return-any (python#3481)
Copy link
Member

@ilevkivskyi ilevkivskyi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here are replies to your questions an one more style comment.

class GenericTest(unittest.TestCase):
# The path module to be tested
pathmodule = genericpath # type: Any
pathmodule = genericpath # type: Any
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any is the most precise currently existing type that reflects the semantics of this variable. If we had dependent types/literal types, then this should be something like Union[Literal[genericpath], Literal[posixpath]].

mypy/semanal.py Outdated
lvals: List[Expression],
rval: Expression,
ctx: AssignmentStmt,
) -> None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typically this style is not used in mypy. You can just wrap argument list like this:

    def process_module_assignment(self, lvals: List[Expression], rval: Expression,
                                  ctx: AssignmentStmt) -> None:

mypy/semanal.py Outdated
# respect explicitly annotated type
if (
isinstance(lval.node, Var) and
lval.node.type is not None
Copy link
Member

@ilevkivskyi ilevkivskyi Jun 3, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is probably better to also directly check for if s.type is not None: ... in visit_assignment_stmt, and skip this whole function otherwise.

In general, here is why this is needed. The question is what should we do with things like this:

import mod1, mod2
if random() > 0.5:
    x = mod1
else:
    x = mod2

I would say this should either infer ModuleType or be an error, since we can't statically determine what will be x at runtime. I would prefer the latter since explicit is better than implicit, so that a user is forced to write:

x: ModuleType
if random() > 0.5:
    x = mod1
else:
    x = mod2

However, this could break some existing code, so inferring ModuleType seems to be safer for now. Currently, your PR seems to make x an alias to mod2 simply because it appears textually later. This could be fixed by checking for lvalue.is_def.

carljm added 2 commits June 3, 2017 04:43
* master:
  fix crash when serializing a type-ignored property setter (python#3493)
@carljm
Copy link
Member Author

carljm commented Jun 3, 2017

@ilevkivskyi I've implemented the behavior we discussed. The trickiest part is supporting

if bool():
    m = module1
else:
    m = module2

as a fallback to regular var of type ModuleType, rather than an error. This requires backtracking, which means adding some new storage of the original SymbolTableNode.kind and Var node that we had for m prior to considering m = module1 a module alias. Otherwise it's too brittle to try to reconstruct the original kind and node when we hit m = module2; it would mean reimplementing logic that already exists elsewhere.

This works and isn't too complex, but it does mean a new module_alias_backtrack dictionary attribute on SemanticAnalyzer. If you don't like this, we can get rid of it if we are OK with the same behavior as type aliases, that is, we simply fail at m = module2 in semantic analysis.

Let me know what you think.

mypy/semanal.py Outdated
for rv, *lvs in elementwise_assignments:
self.process_module_assignment(lvs, rv, ctx)
elif isinstance(rval, NameExpr):
# import pdb; pdb.set_trace()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can remove this comment probably. Sorry, I have to run now, will take a look this evening.

@carljm
Copy link
Member Author

carljm commented Jun 3, 2017

If the behavior you propose in #3494 is OK, then we can simplify this: remove module_alias_backtrack and just fail if we try to assign a second module to an existing module alias. The explicit-prior-annotation case will already be handled because we already don't convert an explicitly typed variable to a module alias.

@carljm
Copy link
Member Author

carljm commented Jun 3, 2017

I presume that if we want:

import m, n
if bool():
    x = m
else:
    x = n

to infer plain Var of ModuleType for x, then we also want:

import m, n

m = n

to also turn m into a plain Var of ModuleType? (In master this m = n assignment is ignored by mypy, it continues to typecheck m as a reference to its original module, which seems clearly wrong).

I just pushed code to support this join to ModuleType var. I'm not 100% confident of the correctness of the new code that creates the Var from scratch, and it seems to be partial duplication of logic that really belongs in analyze_lvalue, but can't go there because we don't know we need it until we also analyze the rvalue.

Again, this is implementation complexity we would avoid if we can go with your proposal from #3494, in which case I think this module reassignment would also just be an error.

@ilevkivskyi
Copy link
Member

It looks like this becomes too complicated. As I mentioned in #3494 we probably should go with an error if there is not explicit annotation.

@carljm
Copy link
Member Author

carljm commented Jun 3, 2017

Ok, that makes sense to me; will definitely simplify things (and bring it closer to the current implementation for type aliases). I will push an update.

@carljm
Copy link
Member Author

carljm commented Jun 3, 2017

Pushed the simplification. Hopefully we are getting close :-)

Copy link
Member

@ilevkivskyi ilevkivskyi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this looks good! Just two style comments, I will merge when you address them and test pass.

mypy/semanal.py Outdated
if lnode.kind == MODULE_REF and lnode.node is not rnode.node:
self.fail(
"Cannot assign multiple modules to name '{}' "
"without explicit annotation to ModuleType".format(lval.name),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not a native speaker, is it better to say without explicit 'types.ModuleType' annotation?

mypy/semanal.py Outdated
# respect explicitly annotated type
if (
isinstance(lval.node, Var) and
lval.node.type is not None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a minor style point: I think it is better to put it on one line here:

if isinstance(lval.node, Var) and lval.node.type is not None:
    ...

The maximum line width is 99 in mypy.

@carljm
Copy link
Member Author

carljm commented Jun 3, 2017

Done!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants