[inductor] Fix ReinterpretView call in TMADescriptor IR by aakhundov · Pull Request #138759 · pytorch/pytorch · GitHub

Conversation

@aakhundov (Contributor) commented Oct 23, 2024

Stack from ghstack (oldest at bottom):

As a result of #137768, the `ReinterpretView` call in the `TMADescriptor` IR node has become invalid, which breaks several TMA tests in `test_triton_kernels.py`. This PR fixes the call.
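
For illustration, here is a minimal, self-contained sketch of the failure mode, not the PR's actual diff: it assumes (as hypothesized here) that the refactor in #137768 turned `ReinterpretView` into a keyword-only dataclass, so old positional call sites like the one in `TMADescriptor` raise a `TypeError` until updated. The stand-in class and field names below are hypothetical.

```python
# Illustrative stand-in, NOT PyTorch code (requires Python 3.10+ for kw_only):
# models ReinterpretView after a hypothetical kw-only dataclass conversion.
from dataclasses import dataclass


@dataclass(kw_only=True)  # fields can no longer be passed positionally
class ReinterpretViewLike:
    data: object    # hypothetical field: the underlying buffer
    layout: object  # hypothetical field: the layout to reinterpret it as


# Old-style positional call site: breaks after the conversion.
try:
    ReinterpretViewLike("buf0", "FixedLayout(...)")
except TypeError as err:
    print(f"positional call fails: {err}")

# Fixed call site: pass the fields by keyword.
view = ReinterpretViewLike(data="buf0", layout="FixedLayout(...)")
print(f"keyword call works: {view}")
```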

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang

[ghstack-poisoned]
@pytorch-bot bot commented Oct 23, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/138759

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit 8e797e2 with merge base 72ea7ba:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

aakhundov added a commit that referenced this pull request Oct 23, 2024
As a result of #137768, the `ReinterpretView` call in the `TMADescriptor`
has become invalid. This leads to some TMA tests breaking in
test_triton_kernels.py. This PR fixes that.

ghstack-source-id: 0f88d4f
Pull Request resolved: #138759
@aakhundov aakhundov requested review from Chillee and eellison October 23, 2024 22:45
@aakhundov aakhundov added the `topic: not user facing` label Oct 23, 2024
@Chillee (Collaborator) left a comment

Wait, how come I didn't see this on my PR?

@aakhundov (Contributor, Author)

> Wait, how come I didn't see this on my PR?

You landed your PR on 10/14, before I landed mine on 10/17. And, apparently, my base rev was older than 10/14 when I landed. I guess it's a good habit to rebase onto the newest viable/strict before landing. Will keep that in mind.

@Chillee (Collaborator) commented Oct 23, 2024

Eh, I should probably have warned the chat that this PR had a high chance of land races.

@aakhundov (Contributor, Author)

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the `ciflow/trunk` label (trigger trunk jobs on your pull request) Oct 24, 2024
@pytorchmergebot (Collaborator)

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status.

@pytorchmergebot (Collaborator)

Merge failed

Reason: 1 job has failed: inductor-periodic / cuda12.1-py3.10-gcc9-sm80 / test (inductor_torchbench_smoketest_perf, 1, 1, linux.gcp.a100)


@aakhundov (Contributor, Author)

@pytorchbot merge

@pytorchmergebot (Collaborator)

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status.

@pytorchmergebot (Collaborator)

The merge job was canceled or timed out. This most often happens if two merge requests were issued for the same PR, or if the merge job was waiting for more than 6 hours for tests to finish. In the latter case, please do not hesitate to reissue the merge command.
For more information, see the pytorch-bot wiki.

@aakhundov (Contributor, Author)

@pytorchbot merge -f "unrelated failing and hanging CI jobs"

@pytorchmergebot (Collaborator)

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as a last resort and instead consider -i/--ignore-current to continue the merge while ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status.

pytorchmergebot pushed a commit that referenced this pull request Oct 26, 2024
This fixes some leftover typos in `CreateTMADescriptorVariable.call_function` (and nearby code).

Pull Request resolved: #138877
Approved by: https://github.com/davidberard98, https://github.com/zou3519, https://github.com/Skylion007
ghstack dependencies: #138759
pytorchmergebot pushed a commit that referenced this pull request Oct 28, 2024
This adds host-side Triton TMA support to AOTInductor. Notes:

- Two helper functions, `init1DTMADescriptor` and `init2DTMADescriptor`, are added to the C++ wrapper codegen on GPU, conditioned on the model having user-defined Triton kernels with host-side TMA (CUDA-specific).
- The C++ wrapper codegen on GPU emits TMA descriptor initialization via the aforementioned helper functions.
- Special handling is added for the TMA descriptors (in the Python wrapper codegen) during compile-time autotuning, as the underlying tensor can't be passed directly to the user-defined Triton kernel. TMA descriptors are generated between the source tensor's buffer and the kernel call, as in the full Python wrapper codegen.
- This PR concludes the host-side Triton TMA support in PT2 (a usage sketch follows after this entry).

Pull Request resolved: #138878
Approved by: https://github.com/desertfire, https://github.com/chenyang78
ghstack dependencies: #138759, #138877
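
For context, here is a hedged sketch of the user-level pattern this stack has to support: a user-defined Triton kernel that consumes host-side TMA descriptors, which the wrapper codegen must recreate between the source buffer and the kernel call. It assumes Triton's experimental host-side TMA API (`triton.tools.experimental_descriptor`) and a Hopper-class (sm90) GPU; the kernel and helper below are illustrative, not code from this PR.

```python
# Hedged sketch: a user-defined Triton kernel using host-side TMA, the kind
# of kernel the Inductor/AOTInductor support described above must handle.
# Assumes Triton's experimental descriptor API on an H100-class GPU.
import torch
import triton
import triton.language as tl
from triton.tools.experimental_descriptor import create_1d_tma_descriptor


@triton.jit
def add_one_kernel(in_desc, out_desc, BLOCK: tl.constexpr):
    pid = tl.program_id(0)
    off = pid * BLOCK
    # Loads/stores go through the TMA descriptors rather than raw pointers.
    x = tl._experimental_descriptor_load(in_desc, [off], [BLOCK], tl.float32)
    tl._experimental_descriptor_store(out_desc, x + 1.0, [off])


def add_one(x: torch.Tensor, BLOCK: int = 256) -> torch.Tensor:
    out = torch.empty_like(x)
    # Host-side descriptor creation: the step the wrapper codegen emits
    # between the source buffer and the kernel call (in the C++ wrapper,
    # via the init1DTMADescriptor helper mentioned above).
    in_desc = create_1d_tma_descriptor(
        x.data_ptr(), x.numel(), BLOCK, x.element_size()
    )
    out_desc = create_1d_tma_descriptor(
        out.data_ptr(), out.numel(), BLOCK, out.element_size()
    )
    grid = (triton.cdiv(x.numel(), BLOCK),)
    add_one_kernel[grid](in_desc, out_desc, BLOCK=BLOCK)
    return out
```

Under torch.compile, Inductor's Python wrapper generates the equivalent descriptor-creation calls itself; this stack teaches the AOTInductor C++ wrapper to do the same.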
rahulsingh-intel pushed a commit to rahulsingh-intel/pytorch that referenced this pull request Oct 29, 2024 (same commit message as above; pull request resolved: pytorch#138878)
rahulsingh-intel pushed a commit to rahulsingh-intel/pytorch that referenced this pull request Nov 5, 2024 (same commit message as above; pull request resolved: pytorch#138878)
@github-actions github-actions bot deleted the gh/aakhundov/12/head branch November 25, 2024 02:11