Turn on static cuda launcher in OSS #151691

jamesjwu · 2025-04-18T18:46:55Z

Stack from ghstack (oldest at bottom):

-> Turn on static cuda launcher in OSS #151691

After a few small bugfixes on tests (so that we throw and catch exceptions similar to Triton's), I think we're ready to flip the switch and turn StaticCudaLauncher on by default in OSS.

The initial round of benchmarks looks good, with average compilation time going down by a few percent:

With no changes to runtime perf:

There are a few noisy models I want to double check, though, so will run some more tests before accepting review.

Full benchmark results, showing a ~5% compile time improvement across the board:
https://hud.pytorch.org/benchmark/huggingface/inductor_with_cudagraphs?dashboard=torchinductor&startTime=Wed%2C%2016%20Apr%202025%2002%3A31%3A12%20GMT&stopTime=Wed%2C%2023%20Apr%202025%2002%3A31%3A12%20GMT&granularity=hour&mode=training&dtype=amp&deviceName=cuda%20(a100)&lBranch=gh/jamesjwu/139/orig&lCommit=cc45c8667fa23dec16ca50002d9504a34688ca5c&rBranch=main&rCommit=2a9afdae81d0dde98e96d7e3c9ca840e241e5405

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov

pytorch-bot · 2025-04-18T18:46:58Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/151691

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit the bot commands wiki or our office hours

Note: Links to docs will display an error until the docs builds have been completed.

⏳ No Failures, 29 Pending

As of commit 01feeaf with merge base 6efc572:
💚 Looks good so far! There are no failures yet. 💚

UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:

⏳ pull / unstable-linux-focal-cuda12.6-py3.10-gcc11-sm89-xfail / build (gh)

This comment was automatically generated by Dr. CI and updates every 15 minutes.

jamesjwu · 2025-04-23T02:35:08Z

I double-checked the regression for BartForConditionalGenerati and saw that the 4/19 perf run on main was an outlier in terms of execution speedup. It usually floats around 1.4x, so it's unlikely that the static CUDA launcher is slowing it down.

For example, if compared to the 4/19 or 4/21 run, you get something like this:

jamesjwu · 2025-04-23T15:35:53Z

@pytorchbot merge

pytorchmergebot · 2025-04-23T15:37:49Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging

Check the merge workflow status here

seemethere · 2025-04-25T00:34:54Z

@pytorchbot merge

pytorchmergebot · 2025-04-25T00:36:40Z

Merge started


pytorchmergebot · 2025-04-25T02:00:18Z

Merge failed

Reason: 1 jobs have failed, first few of them are: trunk / linux-focal-rocm-py3.10 / test (default, 2, 2, linux.rocm.gpu.2)


etaf · 2025-04-25T02:39:47Z

Hi, may I suggest we disable use_static_cuda_launcher on XPU by default? The implementation is specific to CUDA, so this PR will break XPU. I'll generalize use_static_cuda_launcher into use_static_gpu_launcher and turn it on for XPU. Thanks.

jamesjwu · 2025-04-25T15:30:56Z

@pytorchbot merge

jamesjwu · 2025-04-25T15:31:14Z

Oops just saw that comment, will do that first

pytorchmergebot · 2025-04-25T15:32:50Z

Merge started


pytorchmergebot · 2025-04-25T15:33:12Z

The merge job was canceled or timed out. This most often happens if two merge requests were issued for the same PR, or if the merge job waited more than 6 hours for tests to finish. In the latter case, please do not hesitate to reissue the merge command.
For more information, see the pytorch-bot wiki.

jamesjwu · 2025-04-25T15:35:41Z

@etaf does XPU not change the device_type listed here? I had figured this would gate static cuda launcher to only cuda device type.

pytorch/torch/_inductor/runtime/triton_heuristics.py

Line 1213 in d4a8e4e

if triton_meta.get("device_type", None) != "cuda":
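
The guard quoted above can be read as an early-exit check on the kernel's metadata. Here is a minimal sketch of that gating, assuming a hypothetical helper `can_use_static_launcher` and a plain `enabled` boolean standing in for the `use_static_cuda_launcher` config flag. Neither the helper nor its signature comes from the PyTorch source; only the `triton_meta.get("device_type", None) != "cuda"` comparison does.

```python
from typing import Any, Mapping


def can_use_static_launcher(triton_meta: Mapping[str, Any], enabled: bool = True) -> bool:
    """Hypothetical helper: decide whether a kernel may take the static CUDA path."""
    if not enabled:
        # Stand-in for the config flag being switched off.
        return False
    # Same comparison as the quoted line: anything that is not "cuda",
    # including a missing device_type, is rejected.
    if triton_meta.get("device_type", None) != "cuda":
        return False
    return True


print(can_use_static_launcher({"device_type": "cuda"}))  # True
print(can_use_static_launcher({"device_type": "xpu"}))   # False
print(can_use_static_launcher({}))                       # False
```

Under these assumptions, a kernel whose `device_type` is "xpu" never reaches the static launcher, even with the flag turned on.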

jamesjwu · 2025-04-25T15:46:05Z

I'm somewhat confident that that check prevents XPU from using StaticCudaLauncher, as tests like inductor/test_xpu_basic.py pass with this PR. From this line we can see "xpu" does appear as a valid device_type:

pytorch/torch/_inductor/runtime/triton_heuristics.py

Line 909 in d4a8e4e

bin_type = {"hip": "hsaco", "xpu": "spv"}.get(self.device_props.type, "cubin")
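
The lookup on that line is a dictionary `get` with a default value. A small sketch, assuming a hypothetical standalone function `binary_type_for` in place of the `self.device_props.type` attribute access, shows which device types map to which binary format:

```python
# Mapping copied from the quoted line; the function wrapper is hypothetical.
_BIN_TYPES = {"hip": "hsaco", "xpu": "spv"}


def binary_type_for(device_type: str) -> str:
    """Return the compiled-binary key for a device type, defaulting to "cubin"."""
    return _BIN_TYPES.get(device_type, "cubin")


for device_type in ("cuda", "hip", "xpu"):
    print(device_type, "->", binary_type_for(device_type))
```

Because "xpu" has its own entry, it is a recognized device type here; any type missing from the dictionary falls back to "cubin".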

etaf · 2025-04-25T16:06:09Z

> I'm somewhat confident that that check prevents XPU from using StaticCudaLauncher, as tests like inductor/test_xpu_basic.py pass with this PR. From this line we can see "xpu" does appear as a valid device_type:
>
> pytorch/torch/_inductor/runtime/triton_heuristics.py
>
> Line 909 in d4a8e4e
>
> bin_type = {"hip": "hsaco", "xpu": "spv"}.get(self.device_props.type, "cubin")

Great, thanks!

etaf · 2025-04-25T16:10:03Z

> @etaf does XPU not change the device_type listed here? I had figured this would gate static cuda launcher to only cuda device type.
>
> pytorch/torch/_inductor/runtime/triton_heuristics.py
>
> Line 1213 in d4a8e4e
>
> if triton_meta.get("device_type", None) != "cuda":

Thanks! then this PR will not break XPU, please go ahead.

jamesjwu · 2025-04-25T16:10:27Z

@pytorchbot merge

pytorchmergebot · 2025-04-25T16:13:20Z

Merge started


malfet · 2025-04-25T17:46:38Z

@pytorchbot merge -f "Workflow has been scheduled?"

pytorchmergebot · 2025-04-25T17:46:56Z

The merge job was canceled or timed out. This most often happens if two merge requests were issued for the same PR, or if the merge job waited more than 6 hours for tests to finish. In the latter case, please do not hesitate to reissue the merge command.
For more information, see the pytorch-bot wiki.

pytorchmergebot · 2025-04-25T17:48:39Z

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.


Update

7850479


jamesjwu mentioned this pull request Apr 18, 2025

Lift guard checking logic to AOTAutogradCache #151563

Closed

pytorch-bot bot added the ciflow/inductor and "module: inductor" labels Apr 18, 2025

jamesjwu added a commit that referenced this pull request Apr 18, 2025

Turn on static cuda launcher

db268ce

ghstack-source-id: 4d0d534 Pull Request resolved: #151691

jamesjwu added the "topic: not user facing" label (topic category) Apr 18, 2025

Update

853ad25


jamesjwu marked this pull request as draft April 21, 2025 15:09

Update

ab6d92c


jamesjwu added a commit that referenced this pull request Apr 21, 2025

Turn on static cuda launcher

7fa4d05

ghstack-source-id: 606ac45 Pull Request resolved: #151691

Update

511d642


jamesjwu added a commit that referenced this pull request Apr 21, 2025

Turn on static cuda launcher

2efd181

ghstack-source-id: 68bcacf Pull Request resolved: #151691

jamesjwu added the ciflow/trunk label (Trigger trunk jobs on your pull request) Apr 22, 2025

Rebase

80ebcbe


Rebase onto a main

3ff3347


jamesjwu added a commit that referenced this pull request Apr 22, 2025

Turn on static cuda launcher

cc45c86

ghstack-source-id: c718c76 Pull Request resolved: #151691

jamesjwu changed the title from "Turn on static cuda launcher" to "Turn on static cuda launcher in OSS" Apr 22, 2025

jamesjwu requested review from eellison, jansel and oulgen April 23, 2025 02:33

oulgen approved these changes Apr 23, 2025


jamesjwu marked this pull request as ready for review April 23, 2025 15:24

pytorchmergebot added the merging label Apr 23, 2025

Update

01feeaf


jamesjwu added a commit that referenced this pull request Apr 24, 2025

Turn on static cuda launcher

c1b3f70

ghstack-source-id: cb1c5a8 Pull Request resolved: #151691

pytorchmergebot removed the merging label Apr 25, 2025

pytorchmergebot added the merging label Apr 25, 2025

EikanWang approved these changes Apr 25, 2025


malfet mentioned this pull request Apr 25, 2025

[CI] No workflows scheduled on PRs #151322

Open

pytorchmergebot closed this in 0dae27d Apr 25, 2025

pytorchmergebot removed the merging label Apr 25, 2025

jamesjwu mentioned this pull request Jun 9, 2025

Use variadic argument pre-compiled cuda launcher triton-lang/triton#6788

Open

github-actions bot deleted the gh/jamesjwu/139/head branch June 12, 2025 02:23
