-
Notifications
You must be signed in to change notification settings - Fork 25.7k
[dynamo] implement framelocals mapping as c++ object #140063
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
[ghstack-poisoned]
🔗 Helpful Links🧪 See artifacts and rendered test results at hud.pytorch.org/pr/140063
Note: Links to docs will display an error until the docs builds have been completed. ✅ You can merge normally! (1 Unrelated Failure)As of commit c01e3fd with merge base 7ab3177 ( FLAKY - The following job failed but was likely due to flakiness present on trunk:
This comment was automatically generated by Dr. CI and updates every 15 minutes. |
Part of implementing #93753. Implement `FrameLocalsMapping` as a C++ `unordered_map` over a `py::dict`. This PR ensures we can pass around the C++ object around C Dynamo properly. It is still converted back into a `py::dict` for guards and convert_frame, but the next step will involve removing the `py::dict` conversion for guards. cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx chenyang78 kadeng chauhang amjames [ghstack-poisoned]
Part of implementing #93753. Implement `FrameLocalsMapping` as a C++ `unordered_map` over a `py::dict`. This PR ensures we can pass around the C++ object around C Dynamo properly. It is still converted back into a `py::dict` for guards and convert_frame, but the next step will involve removing the `py::dict` conversion for guards. cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx chenyang78 kadeng chauhang amjames [ghstack-poisoned]
Part of implementing #93753. Implement `FrameLocalsMapping` as a C++ `unordered_map` over a `py::dict`. This PR ensures we can pass around the C++ object around C Dynamo properly. It is still converted back into a `py::dict` for guards and convert_frame, but the next step will involve removing the `py::dict` conversion for guards. cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx chenyang78 kadeng chauhang amjames [ghstack-poisoned]
|
|
||
| void framelocals_mapping_free(FrameLocalsMapping* map); | ||
|
|
||
| // Borrowed reference |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here you say borrowed, but above you use py::reinterpret_steal which seems wrong if this is borrowed.
|
|
||
| typedef struct VISIBILITY_HIDDEN FrameLocalsMapping { | ||
| private: | ||
| std::unordered_map<std::string, PyObject*> _map; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is a std::unordered_map actually much faster than a Python dict?
The thing I was suggesting in #93753 was replacing:
f_locals["foo"]
with
fastlocals[3]
where the conversion to figure out that the variable "foo" is stored at index 3 can be done ahead of time. Thus making the lookup just a single load instruction (rather than a hashtable lookup).
Since both PyDict and unordered_map are hashtables, I'd expect similar performance.
You should compute the fastlocals offset ahead of time when the guard object is constructed, and store it on the guard. So when we evaluate the guard you can replace the hashtable lookup with an array lookup.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: you could use pybind11 to do this casting for you?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't believe there is a guarantee on the ordering of the fastlocals names (but in practice, this may be true). The safe thing to do would be to add a check that fastlocals name order is the same.
I'd also think that a C++ unordered has less overhead than using Python dict, although I'd have to measure to be sure. The point of this PR is not to introduce the complete optimization, but rather, is a step to moving framelocals/fastlocals lookup to C++. I plan on incrementally introducing optimizations in followup PRs. For this PR, we just need to make sure there are no regressions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't believe there is a guarantee on the ordering of the fastlocals names (but in practice, this may be true). The safe thing to do would be to add a check that fastlocals name order is the same.
Isn't the order defined on the code object which is a constant? How could the order change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I hadn't considered that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think directly using fastlocals will actually be easier to implement than a unordered_map. This also matches what CPython does, I believe:
LOAD_FAST 3 is just push(fastlocals[3])
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See comments above.
Part of implementing #93753. Implement `FrameLocalsMapping` as a C++ `unordered_map` over a `py::dict`. This PR ensures we can pass around the C++ object around C Dynamo properly. It is still converted back into a `py::dict` for guards and convert_frame, but the next step will involve removing the `py::dict` conversion for guards. cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx chenyang78 kadeng chauhang amjames [ghstack-poisoned]
Part of implementing #93753. Implement `FrameLocalsMapping` as a C++ `unordered_map` over a `py::dict`. This PR ensures we can pass around the C++ object around C Dynamo properly. It is still converted back into a `py::dict` for guards and convert_frame, but the next step will involve removing the `py::dict` conversion for guards. cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx chenyang78 kadeng chauhang amjames [ghstack-poisoned]
Part of implementing #93753. Implement `FrameLocalsMapping` as a C++ `unordered_map` over a `py::dict`. This PR ensures we can pass around the C++ object around C Dynamo properly. It is still converted back into a `py::dict` for guards and convert_frame, but the next step will involve removing the `py::dict` conversion for guards. cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx chenyang78 kadeng chauhang amjames [ghstack-poisoned]
Part of implementing #93753. Implement `FrameLocalsMapping` as a C++ `unordered_map` over a `py::dict`. This PR ensures we can pass around the C++ object around C Dynamo properly. It is still converted back into a `py::dict` for guards and convert_frame, but the next step will involve removing the `py::dict` conversion for guards. cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx chenyang78 kadeng chauhang amjames [ghstack-poisoned]
Part of implementing #93753. Implement `FrameLocalsMapping` as a C++ `unordered_map` over a `py::dict`. This PR ensures we can pass around the C++ object around C Dynamo properly. It is still converted back into a `py::dict` for guards and convert_frame, but the next step will involve removing the `py::dict` conversion for guards. cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx chenyang78 kadeng chauhang amjames [ghstack-poisoned]
Part of implementing #93753. Implement `FrameLocalsMapping` as a C++ `unordered_map` over a `py::dict`. This PR ensures we can pass around the C++ object around C Dynamo properly. It is still converted back into a `py::dict` for guards and convert_frame, but the next step will involve removing the `py::dict` conversion for guards. cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx chenyang78 kadeng chauhang amjames [ghstack-poisoned]
Part of implementing #93753. Implement `FrameLocalsMapping` as a C++ `unordered_map` over a `py::dict`. This PR ensures we can pass around the C++ object around C Dynamo properly. It is still converted back into a `py::dict` for guards and convert_frame, but the next step will involve removing the `py::dict` conversion for guards. cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx chenyang78 kadeng chauhang amjames [ghstack-poisoned]
Part of implementing #93753. Implement `FrameLocalsMapping` as a C++ `unordered_map` over a `py::dict`. This PR ensures we can pass around the C++ object around C Dynamo properly. It is still converted back into a `py::dict` for guards and convert_frame, but the next step will involve removing the `py::dict` conversion for guards. cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx chenyang78 kadeng chauhang amjames [ghstack-poisoned]
TODO: update PR description Benchmark update: - 713.2us -> 630.0us (3.10) - 598.8us -> 530.7us (3.12) cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx chenyang78 kadeng chauhang amjames [ghstack-poisoned]
Move frame local guard accessors to C++. Before, we used dict accessors on a Python dict representing the frame's fastlocals that we manually build. We move this accessor to C++ and additionally use the fastlocal index whenever possible. Some implementation notes: - `FrameLocalsMapping` is now initialized as a C++ vector of `PyObject`s. We do not just use the frame's localsplus/fastlocals buffer because we also unbox cells. - `FrameLocalsMapping` can still be converted into a Python dict representing the frame's fastlocals, but it is done lazily. - We update `LeafGuard`, `GuardAccessor`, and `GuardManager`'s `check_nopybind` methods to accept `FrameLocalsMapping`. By default, we convert the `FrameLocalsMapping` to a Python dict and run the original `check_nopybind` on it, but in some cases, conversion is not needed. - We add a new guard accessor `FrameLocalsGuardAccessor`, which is similar to `DictGetItemGuardAccessor` but has special handling for `FrameLocalsMapping`. dynamo_guard_eval.py microbenchmark update: - 713.2us -> 630.0us (3.10) - 598.8us -> 530.7us (3.12) cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx chenyang78 kadeng chauhang amjames [ghstack-poisoned]
| DICT_SUBCLASS_GUARD_MANAGER = 3 | ||
|
|
||
|
|
||
| @functools.lru_cache(None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need a limit here?
| framelocals_names_reversed = code_framelocals_names_reversed_cached( | ||
| self.f_code | ||
| ) | ||
| framelocals_idx = ( | ||
| len(framelocals_names_reversed) | ||
| - framelocals_names_reversed.index(source.local_name) | ||
| - 1 | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should the first one return a dict so you can compute framelocals_idx in O(1) time?
ghstack-source-id: f1f02aa Pull Request resolved: pytorch/pytorch#140063
|
@pytorchbot rebase |
|
@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here |
|
Successfully rebased |
|
@pytorchbot merge |
Merge startedYour change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team |
Summary: Implements pytorch/pytorch#93753 - move frame local guard accessors to C++. Before, we used dict accessors on a Python dict representing the frame's fastlocals that we manually build. We move this accessor to C++ and additionally use the fastlocal index whenever possible. Some implementation notes: - `FrameLocalsMapping` is now initialized as a C++ vector of `PyObject`s. We do not just use the frame's localsplus/fastlocals buffer because we also unbox cells. - `FrameLocalsMapping` can still be converted into a Python dict representing the frame's fastlocals, but it is done lazily. - We update `LeafGuard`, `GuardAccessor`, and `GuardManager`'s `check_nopybind` methods to accept `FrameLocalsMapping`. By default, we convert the `FrameLocalsMapping` to a Python dict and run the original `check_nopybind` on it, but in some cases, conversion is not needed. - We add a new guard accessor `FrameLocalsGuardAccessor`, which is similar to `DictGetItemGuardAccessor` but has special handling for `FrameLocalsMapping`. We create a separate class to emphasize different use cases, but we could probably combine these two (can do in a follow up) dynamo_guard_eval.py microbenchmark update: - 713.2us -> 630.0us (3.10) - 598.8us -> 530.7us (3.12) Other followups: - Add `FrameLocalsMapping` version for `check_verbose_nopybind` in order to match behavior between `check_nopybind` and `check_verbose_nopybind`. This can prevent difficult debugging situations where guards fail (`check_nopybind` returns false) but no guard error message is generated (`check_verbose_nopybind` succeeds). - Rewrite the `SHAPE_ENV` guard into C++ - it is a fairly common guard that results in `FrameLocalsMapping` needing to convert to a dict X-link: pytorch/pytorch#140063 Approved by: https://github.com/jansel ghstack dependencies: #142117, #142430 Reviewed By: huydhn Differential Revision: D67355697 Pulled By: williamwen42 fbshipit-source-id: 930855f49db629b2bc2d7b7ff14c065bd044e894
Stack from ghstack (oldest at bottom):
Implements #93753 - move frame local guard accessors to C++.
Before, we used dict accessors on a Python dict representing the frame's fastlocals that we manually build. We move this accessor to C++ and additionally use the fastlocal index whenever possible.
Some implementation notes:
FrameLocalsMappingis now initialized as a C++ vector ofPyObjects. We do not just use the frame's localsplus/fastlocals buffer because we also unbox cells.FrameLocalsMappingcan still be converted into a Python dict representing the frame's fastlocals, but it is done lazily.LeafGuard,GuardAccessor, andGuardManager'scheck_nopybindmethods to acceptFrameLocalsMapping. By default, we convert theFrameLocalsMappingto a Python dict and run the originalcheck_nopybindon it, but in some cases, conversion is not needed.FrameLocalsGuardAccessor, which is similar toDictGetItemGuardAccessorbut has special handling forFrameLocalsMapping. We create a separate class to emphasize different use cases, but we could probably combine these two (can do in a follow up)dynamo_guard_eval.py microbenchmark update:
Other followups:
FrameLocalsMappingversion forcheck_verbose_nopybindin order to match behavior betweencheck_nopybindandcheck_verbose_nopybind. This can prevent difficult debugging situations where guards fail (check_nopybindreturns false) but no guard error message is generated (check_verbose_nopybindsucceeds).SHAPE_ENVguard into C++ - it is a fairly common guard that results inFrameLocalsMappingneeding to convert to a dictcc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @chenyang78 @kadeng @chauhang @amjames