[Inductor][ROCm][CK] add CK grouped conv2d fwd kernels to ROCm codegen by tenpercent · Pull Request #137947 · pytorch/pytorch · GitHub

Conversation

@tenpercent
Collaborator

@tenpercent tenpercent commented Oct 15, 2024

@pytorch-bot

pytorch-bot bot commented Oct 15, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/137947

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit 2472fbb with merge base 60c1433:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@tenpercent
Collaborator Author

@pytorchbot label "topic: not user facing"

@pytorch-bot added the "topic: not user facing" (topic category) label Oct 18, 2024
@tenpercent changed the title from "[Draft][Inductor][CK] add conv2d kernels" to "[Draft][Inductor][CK] add CK grouped conv2d fwd kernels to ROCm codegen" Oct 18, 2024
@pytorch-bot added the "ciflow/rocm" (Trigger "default" config CI on ROCm) and "module: rocm" (AMD GPU support for Pytorch) labels Oct 18, 2024
@tenpercent marked this pull request as ready for review October 18, 2024 17:51
@tenpercent changed the title from "[Draft][Inductor][CK] add CK grouped conv2d fwd kernels to ROCm codegen" to "[Inductor][ROCm][CK] add CK grouped conv2d fwd kernels to ROCm codegen" Oct 18, 2024

@chenyang78
Contributor

Thanks, Max! Could we add some tests?

@tenpercent
Collaborator Author

> Thanks, Max! Could we add some tests?

Could we add the tests later, along with lowering?

@tenpercent
Collaborator Author

@pytorchbot rebase -s

@pytorchmergebot
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot
Collaborator

Successfully rebased ck-conv onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout ck-conv && git pull --rebase)

@tenpercent
Collaborator Author

Link #125453

if (
    is_nonzero
    and use_ck_gemm_template(layout)
    and V.graph.sizevars.size_hint(m * n * k, fallback=-1) > 0
Contributor

was thinking whether it made sense to pull this into use_ck_gemm_template

def torch_layout_to_ck_layouts(torch_layout):
    # logically, torch tensors are always NCHW,
    # and channels-last memory layout is visible in the strides
    if torch_layout.stride[-1] == 1:
Contributor

we can use statically_known_equals(stride[-1], 1) here for dynamic shape support

https://github.com/pytorch/pytorch/blob/134f6cda7e192046921f44b9565a3b56cd28158d/torch/_inductor/sizevars.py#L340C9-L340C32
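
A minimal sketch of the two points in this thread, in plain Python with no PyTorch dependency: how a channels-last layout shows up in the strides of a logically NCHW tensor, and why a static check is safer than `== 1` once strides may be symbolic. The helper names (`contiguous_strides`, `channels_last_strides`, `classify_layout`), the returned labels, and the use of a string as a stand-in for an unresolved symbolic stride are assumptions for illustration. The local `statically_known_equals` only mimics the idea of the Inductor method linked above and is not its implementation.

```python
from typing import Optional, Sequence, Tuple, Union

# Assumption: a str stands in for a symbolic stride whose value is not known statically.
Stride = Union[int, str]


def contiguous_strides(n: int, c: int, h: int, w: int) -> Tuple[int, int, int, int]:
    """Strides of a contiguous NCHW tensor: W is the innermost dimension."""
    return (c * h * w, h * w, w, 1)


def channels_last_strides(n: int, c: int, h: int, w: int) -> Tuple[int, int, int, int]:
    """Strides of a logically NCHW tensor stored channels-last: C is innermost."""
    return (h * w * c, 1, w * c, c)


def statically_known_equals(value: Stride, expected: int) -> bool:
    """Hypothetical stand-in: True only for a concrete int equal to `expected`."""
    return isinstance(value, int) and value == expected


def classify_layout(stride: Sequence[Stride]) -> Optional[str]:
    """Return a layout label, or None when the layout cannot be proven statically."""
    if statically_known_equals(stride[-1], 1):
        return "contiguous"
    if statically_known_equals(stride[-3], 1):
        return "channels_last"
    return None


nchw = classify_layout(contiguous_strides(2, 8, 4, 4))
nhwc = classify_layout(channels_last_strides(2, 8, 4, 4))
unknown = classify_layout(("s0", "s1", "s2", "s3"))
```

With symbolic strides the sketch returns `None` instead of guessing, which is the behavior a dynamic-shape-aware check is meant to give.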

)
# NB: when using a fixed list order, most likely we will pick the subset of instances
# which are very similar to each other. Randomizing the choice seems to solve this.
random.seed(-11)
Contributor

also see this in ck_universal_gemm_template, might be nice to reference the same seed here!

Collaborator Author

This is a copy/paste artifact. I might change it if the instances picked for the test do not serve the testing purpose well, that is, once we introduce the end-to-end test.
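
The idea in the quoted code comment, sampling instances with a fixed seed instead of taking the first few from an ordered list, might be sketched as follows. This is an illustration under assumptions: `INSTANCE_SAMPLING_SEED`, `pick_instances`, and the instance names are hypothetical, and a local `random.Random` is used here rather than the global `random.seed` call in the diff so that the sampling does not disturb other users of the `random` module.

```python
import random
from typing import List, Sequence

# Hypothetical shared constant, so that several templates could reference the same seed.
INSTANCE_SAMPLING_SEED = -11


def pick_instances(
    instances: Sequence[str], k: int, seed: int = INSTANCE_SAMPLING_SEED
) -> List[str]:
    """Pick up to k instances reproducibly, without favoring the head of the list."""
    if k >= len(instances):
        return list(instances)
    rng = random.Random(seed)  # local generator: same seed gives the same subset
    return rng.sample(list(instances), k)


# Made-up instance names, ordered so that neighbors differ in only one parameter.
all_instances = [
    f"conv_instance_tile{t}_vec{v}" for t in (64, 128, 256) for v in (1, 2, 4, 8)
]

head = all_instances[:4]                    # fixed order: all four share one tile size
sampled = pick_instances(all_instances, 4)  # seeded sample drawn from the whole list
```

Because the generator is seeded, repeated runs select the same subset, which keeps autotuning and test results reproducible.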

@tenpercent
Collaborator Author

@pytorchbot rebase -s

@pytorchmergebot
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot
Collaborator

Successfully rebased ck-conv onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout ck-conv && git pull --rebase)

@tenpercent
Collaborator Author

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here

SamGinzburg pushed a commit that referenced this pull request Oct 28, 2024
#137947)

Plug into lowering and end to end test in a later PR

Instance parsing companion PR ROCm/composable_kernel#1585

Pull Request resolved: #137947
Approved by: https://github.com/ColinPeppler, https://github.com/chenyang78

Labels

ciflow/inductor, ciflow/rocm (Trigger "default" config CI on ROCm), ciflow/trunk (Trigger trunk jobs on your pull request), Merged, module: inductor, module: rocm (AMD GPU support for Pytorch), open source, topic: not user facing (topic category)

5 participants