-
Notifications
You must be signed in to change notification settings - Fork 25.7k
[Inductor] Add Dynamic shape support to user defined triton kernels #112523
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
1) This PR moves the grid function codegen to wrapper so that we can use IndentBuffers as opposed to manually adding tabs for indentation. 2) In inductor, emits the grid function in the body of the kernel call so that it can use free symbols from dynamic shapes [ghstack-poisoned]
🔗 Helpful Links🧪 See artifacts and rendered test results at hud.pytorch.org/pr/112523
Note: Links to docs will display an error until the docs builds have been completed. ✅ No FailuresAs of commit 25684fb with merge base 5a6f801 ( This comment was automatically generated by Dr. CI and updates every 15 minutes. |
| grid, code = user_defined_kernel_grid_fn_code(kernel_name, configs, grid) | ||
| # Must happen after free symbols are already codegened | ||
| with self.prefix.indent(): | ||
| self.prefix.splice(code) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking for feedback here. I moved the grid function to the prefix (rather than the header) so that it can access the free symbols from dynamic shapes. This happens to work because prefix contains the free symbols. Is there a better solution or is there a way for me to assert that free symbols are already generated?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because the grid function is a function (and not some inlined block of code) would all the symbols defined in the outer scope, including the ones defined after this function, not be visible in the inner scope of this function's body? E.g., this works (in Python):
def fn(a):
return a * b
b = 123
c = fn(456)The main thing is that fn is called before the b is defined. But otherwise it's fine that b is defined after fn. So we need to make sure that the calls to Triton kernels are codegened after the required symbols' definitions, but the grid functions can as well be anywhere in the call function's body before the kernel is called?
As for whether the required symbols will be codegened before the call, as the Triton kernel is represented by a Buffer in the IR which has dependencies, I'd hope that the existing dependency management mechanics should take care of this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't know this is how python semantics worked.. C++ certainly does not work this way
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Depending on how we decide to deal with codegening the grid in the AOTInductor's C++ wrapper codegen, maybe we could opt for relying on this feature of Python.
| for grid, c in zip(grids, configs): | ||
| guards = [f"meta['{name}'] == {val}" for name, val in c.kwargs.items()] | ||
| guards = " and ".join(guards) | ||
| output.writeline(f"if {guards}: return {grid}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we also generate an exception at the end saying smth like f"no matching Triton config found for the kernel {name=} and {meta=}"? Otherwise, the function will return None and I'm not sure how clear the downstream error would be.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Triton will error saying that no suitable grid was found for {name}. But yes, we will not know what meta looked like. I could add the exception but by construction aren't we always guaranteed to match? Like what's scenario that we would fail to match?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure about the scenario, more like a defensive / informative code against the unexpected :)
|
Can you provide an example of what the generated code looks like? |
Full code: P870612754 |
|
|
||
| @triton_kernel_wrapper_mutation.py_impl(DispatchKey.CompositeExplicitAutograd) | ||
| def triton_kernel_wrapper_mutation_dense(*, kernel_idx, grid, kwargs): | ||
| from torch._inductor.codegen.wrapper import user_defined_kernel_grid_fn_code |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I missed some PRs: previously, we were evaluating the grid at Dynamo time. Did that change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are still doing the same, this is just for emitting the python code for multi option grid function.
From above
This PR moves the grid function codegen to wrapper so that we can use IndentBuffers as opposed to manually adding tabs for indentation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
gotcha, thanks for clarifying
|
@oulgen And what's the user code for that generated inductor code? |
from the test cases Is there anything particular you're looking at? |
…n kernels" 1) This PR moves the grid function codegen to wrapper so that we can use IndentBuffers as opposed to manually adding tabs for indentation. 2) In inductor, emits the grid function in the body of the kernel call so that it can use free symbols from dynamic shapes cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx peterbell10 ipiszy yf225 chenyang78 kadeng muchulee8 aakhundov ColinPeppler [ghstack-poisoned]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How is the grid function evaluated at dynamo time for all of the config options? is that code in this PR?
| t = torch.rand(4, 4, device="cuda") | ||
| t_view = t.view(16) | ||
|
|
||
| compiled_func = torch.compile( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should add some more concrete tests about recompilation here with dynamic shapes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will do as follow up
|
@pytorchbot merge |
Merge startedYour change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team |
…ytorch#112523) 1) This PR moves the grid function codegen to wrapper so that we can use IndentBuffers as opposed to manually adding tabs for indentation. 2) In inductor, emits the grid function in the body of the kernel call so that it can use free symbols from dynamic shapes Pull Request resolved: pytorch#112523 Approved by: https://github.com/Chillee
…ytorch#112523) 1) This PR moves the grid function codegen to wrapper so that we can use IndentBuffers as opposed to manually adding tabs for indentation. 2) In inductor, emits the grid function in the body of the kernel call so that it can use free symbols from dynamic shapes Pull Request resolved: pytorch#112523 Approved by: https://github.com/Chillee
Stack from ghstack (oldest at bottom):
IndentBuffers as opposed to manually adding tabs for indentation.
that it can use free symbols from dynamic shapes
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @aakhundov @ColinPeppler