Make tensorboard.compat.{tf,tf2} lazily loaded #1781

Conversation
```python
from . import tensor_shape  # noqa

# Set a fake __version__ to help distinguish this as our own stub API.
__version__ = 'stub'
```
Is it strange that this makes `tf.__version__` and `tf.version.VERSION`
different? Do we want to change the latter as well?
The stub doesn't actually have a `version` module, so I'm not quite sure I follow? I picked `__version__` because it's the one place that's consistent across TF 1.x and 2.x, since 2.0 only has `tf.version.VERSION`, but most of 1.x used `tf.VERSION`.
Ah, got it. Thanks.
tensorboard/lazy.py
Outdated
```
def __getattr__(self, item):
  module = self._load()
  return getattr(module, item)
if not self._cached_module:
```
At a glance, it looks like it should be fine that this isn’t threadsafe,
but if there are problems that have to do with thread safety combined
with lazy imports then that sounds like a nightmare to debug. Up to you
whether to add (potentially double-checked) locking or not.
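A (potentially) double-checked locking variant of such a loader could look like the sketch below. This `LazyLoader` is illustrative only, not the code in this PR: the class shape, attribute names, and constructor signature are assumptions for demonstration.

```python
import threading
import types


class LazyLoader(types.ModuleType):
    """Illustrative module proxy that loads lazily with double-checked locking."""

    def __init__(self, name, load_fn):
        super(LazyLoader, self).__init__(name)
        self._load_fn = load_fn
        self._lock = threading.Lock()
        self._cached_module = None

    def _load(self):
        # Fast path: no lock needed once the module is cached.
        if self._cached_module is None:
            with self._lock:
                # Re-check under the lock: another thread may have finished
                # loading while we were waiting to acquire it.
                if self._cached_module is None:
                    self._cached_module = self._load_fn()
        return self._cached_module

    def __getattr__(self, item):
        # Only invoked when normal attribute lookup fails, i.e. for
        # attributes that live on the underlying module.
        return getattr(self._load(), item)
```

The double check matters because two threads can both pass the unlocked `if`; the second check under the lock ensures `load_fn` still runs at most once.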
This was also arguably an existing problem (in the previous code, the load function could be called many times, so the caching here was just for efficiency, not correctness), but I'll grant that inviting users to supply their own load functions makes it more of a concern, so done.
tensorboard/lazy.py
Outdated
```
    load_fn: callable that actually does the import and returns the module
  """
  super(LazyLoader, self).__init__(name)
  self._load_fn = load_fn
```
If I understand correctly, this does have the effect that `_load_fn` and
`_cached_module` leak onto the actual module being imported, and in the
event that the underlying module actually defines those properties they
will not be accessible (because of the `__getattr__`/`__getattribute__`
distinction). This might sound like “who would ever do that”, but it can
happen if you try to lazily load a module that is itself lazily loaded:

```
>>> from tensorboard import lazy
>>> @lazy.lazy_load("inner")
... def inner():
...   import sys
...   return sys
...
>>> @lazy.lazy_load("outer")
... def outer():
...   return inner
...
>>> outer.stdout
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "tensorboard/lazy.py", line 55, in __getattr__
    return getattr(self._cached_module, item)
AttributeError: 'NoneType' object has no attribute 'stdout'
>>> inner.stdout
<open file '<stdout>', mode 'w' at 0x7fa4487f5150>
>>> outer.stdout
<open file '<stdout>', mode 'w' at 0x7fa4487f5150>
```

This seems a bit insidious. Can we fix the leaky abstraction? For
instance, we could store `_load_fn` and `_cached_module` in a closure
value instead of on the object itself.
Noted offline that this is not new in this PR, so it’s fine with me if
you don’t address it and you or I fix it in a subsequent commit.
Refactored to use a closure.
tensorboard/__init__.py
Outdated
```python
from __future__ import division
from __future__ import print_function

import importlib as _importlib
```
nit: This exposes `tensorboard._importlib`; could resolve by lazily
importing `importlib` inside `program()` and `summary()` if you want.
Fun fact: we're currently exposing the symbols `absolute_import`, `division`, and `print_function` as well :p
The main reason to use `importlib` was avoiding an inline import that requires a lint suppression internally, so at that point we can just directly import.
Oh, forgot to note: the repr misleadingly claims the module is a built-in before the underlying module is actually imported, though it provides the correct path after the import is forced.
PTAL. I also added a custom __repr__() so it doesn't misleadingly claim to be a built-in, since I'd also noticed that.
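A repr along those lines might look like the following sketch. The `_loaded` flag and message format here are illustrative assumptions, not the actual implementation (which tracks loaded state via its cache).

```python
import types


class LazyModule(types.ModuleType):
    """Illustrative module proxy whose repr makes the laziness explicit."""

    def __init__(self, name):
        super(LazyModule, self).__init__(name)
        # Illustrative flag; real code would consult its load cache instead.
        self._loaded = False

    def __repr__(self):
        # Never trigger the actual import just to print the module.
        state = "loaded" if self._loaded else "not yet loaded"
        return "<lazy module %r (%s)>" % (self.__name__, state)
```

The key point is that `__repr__` inspects only local state, so printing a lazy module never forces the import it was built to defer.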
Thanks a ton! This looks better. Request response to one inline, then
it’ll look good to me.
If you don’t mind, I might submit follow-up PRs to
- add a test that the nested imports work,
- further improve the `repr` so that it prints the underlying `__file__` when the module is loaded, and
- detect the deadlock noted in an inline and raise an error instead—

but you’ve done enough work on this so I don’t want to block you. :-)
tensorboard/lazy.py
Outdated
```python
def _return_once(f):
  """Decorator that calls f() once, then returns that value repeatedly."""
```
Though used safely* within this module, this decorator seems dangerous in
general; it passes through arguments to `f` the first time, then
completely ignores them at all subsequent invocations because the cached
result already exists.

Given that the local uses are safe, I’d be fine with either of:

- documenting this with a warning, and changing the headline of the
  docstring from “calls f()” to “calls `f`”; or
- refactoring so that the decorator actually only calls `f()` (no
  args), and the `self` argument to `load_once` is passed in
  separately:

  ```python
  lazy_module_box = []

  def load_once():
    module = load_fn()
    lazy_module_box[0].__dict__.update(module.__dict__)
    return module

  class LazyModule(types.ModuleType):
    # …
    pass

  lazy_module = LazyModule(name)
  lazy_module_box.append(lazy_module)
  return lazy_module
  ```

(You could also remove the decorator altogether, but I agree that it’s
nice to have the locking logic encapsulated.)

\* Barring clients who do things like `type(mod).__dir__(not_mod)`,
which I think is fine to ignore.
It was meant to just be an implementation detail so I wasn't worrying about this, but fair enough. I rewrote the decorator to be a standard memoize that only supports a single hashable argument since that's all that we need.
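A standard single-argument memoizer along those lines can be sketched as follows. The name `memoize` and the exact structure are illustrative; the real `_memoize` in `lazy.py` may differ in detail.

```python
import functools
import threading


def memoize(f):
    """Memoizing decorator for f, which must take exactly one hashable argument."""
    lock = threading.RLock()
    cache = {}
    sentinel = object()  # distinguishes "never computed" from a cached None

    @functools.wraps(f)
    def wrapper(arg):
        # Double-checked: cheap unlocked read first, then re-check under
        # the lock so f runs at most once per distinct argument.
        value = cache.get(arg, sentinel)
        if value is sentinel:
            with lock:
                value = cache.get(arg, sentinel)
                if value is sentinel:
                    value = f(arg)
                    cache[arg] = value
        return value

    return wrapper
```

Because the cache is keyed on the argument, repeated calls with different arguments behave correctly, which is exactly the property `_return_once` lacked.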
tensorboard/lazy.py
Outdated
```
if cache[0] == not_called:
  with lock:
    if cache[0] == not_called:
      cache[0] = f(*args, **kwargs)
```
This call can deadlock if the `load_fn` references the very module
that it corresponds to:

```
>>> from tensorboard import lazy
>>> @lazy.lazy_load("foo")
... def foo():
...   print("loading")
...   return foo.x
...
>>> foo.y
loading
^C^C^C^\Quit
```

I can’t think of a valid reason to do this, so no change requested (if
you don’t mind, I’ll submit a follow-up PR to detect this and raise a
nice error message instead of hanging).
Actually this is a good point, since this could also be triggered by an import cycle:

```python
@lazy.lazy_load("inner")
def inner():
  return outer.foo

@lazy.lazy_load("outer")
def outer():
  return inner
```

And I've had enough headaches with those without deadlock too :p Changed to use a reentrant lock so you just get a recursion depth exceeded error instead, WDYT?
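The difference in failure mode can be shown with a toy cycle (a sketch, not the TensorBoard code; the names `load`/`_resolve` are made up): with a plain `Lock`, the re-entrant `load` call would block forever on the lock its own thread already holds, while an `RLock` lets the cycle recurse until Python's recursion limit trips.

```python
import threading

_lock = threading.RLock()  # a plain threading.Lock() here would deadlock
_cache = {}


def load(name):
    # The same thread may re-acquire an RLock, so a cyclic load recurses
    # (eventually raising RecursionError) instead of hanging forever.
    with _lock:
        if name not in _cache:
            _cache[name] = _resolve(name)
        return _cache[name]


def _resolve(name):
    # Deliberate cycle: resolving "inner" needs "outer" and vice versa.
    return load("outer" if name == "inner" else "inner")
```

Calling `load("outer")` bounces between the two names until a `RecursionError` is raised, which at least surfaces a stack trace pointing at the cycle rather than a silent hang.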
> this could also be triggered by an import cycle

Yep, good point.

> Changed to use a reentrant lock so you just get a recursion depth exceeded error instead, WDYT?

Yep, using an `RLock` is the right fix IMO, and a recursion depth error
is fine for now. Thanks!
PTAL. I also tweaked repr() as you mentioned. I'll leave it to you to write the nested case test in a followup if you so desire :)
tensorboard/lazy.py
Outdated
```
module = self._load()
return getattr(module, item)
def __repr__(self):
if hasattr(load_once, 'loaded'):
```
Why not initialize `load_once.loaded = False` at the beginning, then
just use a field dereference instead of a `hasattr` (which always makes
me nervous, though it’s safe here)?
Done. I didn't do `loaded = False` because then part of the definition/contract of the function exists outside the function body, where it's easier for it to get accidentally removed, but I don't feel strongly about it.
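The up-front-initialization variant under discussion can be sketched like this (illustrative only: `make_loader` is a made-up factory, not the code in this PR, and the merged code keeps the flag handling inside the function body):

```python
def make_loader(load_fn):
    """Illustrative factory: returns a load-once function with a .loaded flag."""
    def load_once():
        if not load_once.loaded:
            load_once.module = load_fn()
            load_once.loaded = True
        return load_once.module
    # Initializing the flag up front means callers can read
    # load_once.loaded directly with no hasattr() check.
    load_once.loaded = False
    return load_once
```

The trade-off is exactly the one noted above: the up-front assignment places part of the function's contract outside its body, but readers of `load_once.loaded` get a plain attribute read instead of `hasattr`.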
tensorboard/lazy.py
Outdated
| """Memoizing decorator for f, which must have exactly 1 hashable argument.""" | ||
| nothing = object() # Unique "no value" sentinel object. | ||
| cache = {} | ||
| # Use a reentrank lock so that if f references the resulting wrapper we die |
nit: reentrant (sp.)
Done.
Looks great. Thanks!
PTAL
(re-approved)
Summary: This commit adds a test case for [behavior implemented in #1781][1].

[1]: #1781 (comment)

Test Plan: `bazel test //tensorboard:lazy_test` passes at HEAD, but fails after checking out `lazy.py` from 976064d.

wchargin-branch: lazy-composition

Summary: This tests behavior [described in #1781][2], improving the error message along the way.

[2]: #1781 (comment)

Test Plan: `bazel test //tensorboard:lazy_test` passes at HEAD, but changing the implementation of `_memoize` to use `Lock` instead of `RLock` causes a deadlock (and so the test times out and fails eventually).

wchargin-branch: lazy-cycles

Summary: It was [noted in #1781][3] that the `__repr__` of `LazyModule`s deserves a modicum of special attention. This commit tests that, and improves the `__repr__` for already-loaded lazy modules so that it’s clear that they are still going through the `LazyModule` proxy. (Should there be a bug in the proxy interface—say, someone expects `__setattr__` to also pass through to the underlying module—this would make it much easier to track down.)

[3]: #1781 (review)

Test Plan: `bazel test //tensorboard:lazy_test` passes.

wchargin-branch: lazy-repr
This PR makes `tensorboard.compat.tf` into a lazy-loading module for accessing the TF API (either real or stub), and introduces a similar `tensorboard.compat.tf2` lazy-loading module for accessing the TF 2.0 API (either as `tf` or `tf.compat.v2`).

The first change (lazifying `tensorboard.compat.tf`) is necessary so that the `tensorboard.summary.v2` summary API can be imported without immediately importing TensorFlow, since we want TF to be able to import that without triggering a circular dependency. Previously, the V2 summary API transitively imported various pieces from under `tensorboard.compat`, and that module's `__init__.py` file was opportunistically importing `tensorflow` at module load time to provide `tensorboard.compat.tf`. Now that import is deferred until `tf` is actually used, and we avoid using it from within the V2 summary APIs. I've added `summary_dep_test.py` to enforce this so we don't have a regression.

To implement this lazy-loading behavior, I refactored `LazyLoader` to let you customize the module loading with your own function, and made it usable as a decorator for convenience. I also refactored the existing `LazyLoader` uses in `tensorboard/__init__.py` to use this, and replaced `tensorboard.compat.import_tf_v2()` with a lazy-loading `tensorboard.compat.tf2`, which in turn eliminates the need to call `import_tf_v2()` on demand in the various places that use it (necessary because we don't always have a TF 2.0 API available).
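The decorator pattern described above can be sketched as a minimal stand-in for `tensorboard.lazy` (this is an illustrative reimplementation under stated assumptions, not the actual module), showing both the decorator form and why keeping state in a closure rather than on the module object lets lazily-loaded modules nest:

```python
import types


def lazy_load(name):
    """Illustrative decorator: turns a load function into a lazy module proxy."""
    def wrapper(load_fn):
        cache = []  # closure state: nothing leaks onto the module object

        def load():
            if not cache:
                cache.append(load_fn())
            return cache[0]

        class LazyModule(types.ModuleType):
            def __getattr__(self, item):
                # The import is deferred until the first attribute access.
                return getattr(load(), item)

        return LazyModule(name)
    return wrapper
```

For example, applying `@lazy_load("tf")` to a function that imports and returns `tensorflow` yields a module-like object that defers the import until first attribute access; and because the loader state lives in the closure, one lazy module can safely wrap another.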