-
Notifications
You must be signed in to change notification settings - Fork 25.7k
Unify invoking JIT functions #57851
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Unify invoking JIT functions #57851
Conversation
The same as the previous PR, but for JIT functions. Differential Revision: [D28253841](https://our.internmc.facebook.com/intern/diff/D28253841/) [ghstack-poisoned]
💊 CI failures summary and remediationsAs of commit 9baf610 (more details on the Dr. CI page):
🕵️ 1 new failure recognized by patternsThe following CI failures do not appear to be due to upstream breakages:
|
| Job | Step | Action |
|---|---|---|
| Build | 🔁 rerun |
This comment was automatically generated by Dr. CI (expand for details).
Follow this link to opt-out of these comments for your Pull Requests.Please report bugs/suggestions to the (internal) Dr. CI Users group.
The same as the previous PR, but for JIT functions. Differential Revision: [D28253841](https://our.internmc.facebook.com/intern/diff/D28253841/) [ghstack-poisoned]
The same as the previous PR, but for JIT functions. Differential Revision: [D28253841](https://our.internmc.facebook.com/intern/diff/D28253841/) [ghstack-poisoned]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
| bool hasOp() const; | ||
| std::shared_ptr<Operator> op() const; | ||
| bool hasQualifiedName() const; | ||
| // FIXME It's weird to return a const _value_, should this return a reference? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, returning const reference look better to me too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll change it then
|
|
||
| // Helpers to run user-defined functions, operators and other computations. | ||
|
|
||
| c10::intrusive_ptr<JitFuture> runJitFunction( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
similar as the comment in the previous PR, this does not need to be a member function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will be used by the subclass later in the stack
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I see, we are adding a new subclass of RequestCallbackImpl? IIUC, it does not have a subclass today?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, no, got confused with the previous PR. This method in fact will just be used in here, so it could perhaps be a simple function, though I think it might use some other methods in turn, and also making a method is more "consistent" with runJitOperator. Does this make sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, LGTM
| .jitCompilationUnit() | ||
| ->get_function(scriptRemoteCall.qualifiedName()) | ||
| .runAsync(stack); | ||
| if (jitFuture->completed()) { // short-cut. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we are skipping fast passes, let's measure the impact.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What benchmarks would you like me to run?
Also, I don't remember if I said it clearly before, so let me explain why I think it's better to avoid these fastpaths/shortcuts: they are terribly dangerous for CUDA sync, because at no point we're synchronizing the events contained in the future with the current streams at the callsite. (It can be done, one just has to call future->wait(), but we didn't do it anywhere, and it's super easy to forget about).
Moreover, I believe they hurt readability, because they force us to lay out the code in a different order than how it's executed. Namely, now we need to do something like this:
continuation = []() { bar(); };
auto fut = foo();
if (fut->completed()) {
continuation();
} else {
fut->addCallback(continuation);
}
Note how in the code bar() appears before foo(), whereas in the "logical flow" of the program bar() comes after foo(). This requires the reader to "jump around" and makes it harder to quickly understand what the code is doing. Without shortcuts, the above code becomes:
auto fut = foo();
fut->addCallback([]() { bar(); });
As for the supposed performance benefit of these fastpaths, I honestly don't think that we're introducing a regression. The addCallback method itself already has such a shortcut logic internally, so in fact these fastpaths were redundant. Moreover, after the changes in #57638, invoking addCallback should come with literally zero overhead, and the compiler will most likely inline it all the time. (I said addCallback but I also mean then since they're basically the same).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIRC, @jjlilley added this a while ago when optimizing PyPer perf. So, I assume there was a reason to introduce fast path at least at that time.
Moreover, after the changes in #57638, invoking addCallback should come with literally zero overhead, and the compiler will most likely inline it all the time.
If we are confident there is no additional overhead introduce, this looks good to me.
The same as the previous PR, but for JIT functions. Differential Revision: [D28253841](https://our.internmc.facebook.com/intern/diff/D28253841/) [ghstack-poisoned]
The same as the previous PR, but for JIT functions. Differential Revision: [D28253841](https://our.internmc.facebook.com/intern/diff/D28253841/) [ghstack-poisoned]
The same as the previous PR, but for JIT functions. Differential Revision: [D28253841](https://our.internmc.facebook.com/intern/diff/D28253841/) [ghstack-poisoned]
The same as the previous PR, but for JIT functions. Differential Revision: [D28253841](https://our.internmc.facebook.com/intern/diff/D28253841/) [ghstack-poisoned]
Pull Request resolved: pytorch#57851 The same as the previous PR, but for JIT functions. ghstack-source-id: 129567069 Differential Revision: [D28253841](https://our.internmc.facebook.com/intern/diff/D28253841/)
|
This pull request has been merged in bfdc279. |
Stack from ghstack:
The same as the previous PR, but for JIT functions.
Differential Revision: D28253841