[profiler] update CUDA runtime kernel identification logic #157890

namgyu-youn · 2025-07-09T02:46:34Z

Update CUDA kernel detection to exclude memory API calls

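References:
- https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__MEMORY.html
- https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__EXECUTION.html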

pytorch-bot · 2025-07-09T02:46:38Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/157890

✅ You can merge normally! (1 Unrelated Failure)

As of commit b0b748d with merge base ee72338:

UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:

pull / linux-jammy-py3_9-clang9-xla / test (xla, 1, 1, lf.linux.12xlarge, unstable) (gh) (#158876)
sccache: error: couldn't connect to server

This comment was automatically generated by Dr. CI and updates every 15 minutes.

namgyu-youn · 2025-07-09T02:47:38Z

@pytorchbot label "release notes: profiler"

sraikund16 · 2025-07-09T18:01:54Z

torch/profiler/_utils.py

+
+            name = str(getattr(e, "name", e)).lower()
+            # Exclude launcher and memory events
+            exclude_patterns = ["cudalaunch", "cudamem"]


Not sure if this is necessary. cudalaunch will already have DeviceType.CPU. I think we should just check if kernel is in the name...

namgyu-youn · 2025-07-10

Thanks, I misunderstood this because I lacked the background. In that case, how about keeping the current structure and removing the #TODO? (cc @davidchencsl)

sraikund16 · 2025-07-11

I think we should keep the TODO until we can determine procedurally that an event is a kernel, rather than by looking at its name.

namgyu-youn · 2025-07-23

@sraikund16 Thanks for your review. To focus on excluding memory API calls, I expanded the patterns to cover more edge cases in this commit. Could you take a look at this PR?
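
The reviewer's suggestion in this thread (rely on the device type, then look for "kernel" in the name) can be sketched roughly as below. This is only an illustration under stated assumptions: `DeviceType` here is a local stand-in enum, and `FakeEvent` and `looks_like_kernel` are hypothetical names, not the code in torch/profiler/_utils.py.

```python
from dataclasses import dataclass
from enum import Enum


class DeviceType(Enum):
    # Local stand-in for illustration only; not the profiler's own DeviceType.
    CPU = 0
    CUDA = 1


@dataclass
class FakeEvent:
    # Hypothetical minimal event carrying only the two fields the check needs.
    name: str
    device_type: DeviceType


def looks_like_kernel(event: FakeEvent) -> bool:
    """Name-based heuristic: a CUDA-device event whose name mentions 'kernel'."""
    # Launcher calls are recorded on the CPU side, so the device check
    # already filters them out without a name-based exclude list.
    if event.device_type is not DeviceType.CUDA:
        return False
    return "kernel" in event.name.lower()
```

The check is still a heuristic: a device kernel whose name does not contain "kernel" would be missed, which is consistent with the reviewer's request to keep the TODO until kernels can be identified procedurally.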

sraikund16 · 2025-07-23T21:06:01Z

@pytorchbot rebase

pytorchmergebot · 2025-07-23T21:07:28Z

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

pytorchmergebot · 2025-07-23T21:07:31Z

Successfully rebased profiler_kernel onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout profiler_kernel && git pull --rebase)

namgyu-youn · 2025-07-24T03:30:32Z

@pytorchbot merge

pytorch-bot · 2025-07-24T03:30:36Z

Pull workflow has not been scheduled for the PR yet. It could be because author doesn't have permissions to run those or skip-checks keywords were added to PR/commits, aborting merge. Please get/give approval for the workflows and/or remove skip ci decorators before next merge attempt. If you think this is a mistake, please contact PyTorch Dev Infra.

namgyu-youn · 2025-07-24T16:23:25Z

@pytorchmergebot merge

pytorchmergebot · 2025-07-24T16:25:22Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Update CUDA kernel detection to exclude memory API calls

References:
- https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__MEMORY.html
- https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__EXECUTION.html

Pull Request resolved: #157890
Approved by: https://github.com/sraikund16

namgyu-youn requested a review from sraikund16 as a code owner July 9, 2025 02:46

pytorch-bot bot added the release notes: profiler label Jul 9, 2025

pytorchbot added the open source label Jul 9, 2025

sraikund16 reviewed Jul 9, 2025

mikaylagawarecki added the triaged label Jul 14, 2025

namgyu-youn marked this pull request as draft July 15, 2025 03:30

namgyu-youn marked this pull request as ready for review July 15, 2025 10:14

namgyu-youn requested a review from sraikund16 July 15, 2025 10:14

sraikund16 approved these changes Jul 23, 2025

namgyu-youn added 2 commits July 23, 2025 21:07

update CUDA runtime kernel identification logic

f97305f

Expand exclude patterns for CUDA kernel detection

b0b748d

Use cases:
- "mem": "cudaMemcpy", "cudaMemset"
- "alloc": "cudaMalloc", "cudaMallocManaged"
- "free": "cudaFree"
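
The use cases in this commit message amount to case-insensitive substring matching on the event name. A minimal sketch, assuming only the three patterns listed above; `EXCLUDE_PATTERNS` and `is_memory_api_name` are hypothetical names chosen for illustration and may not match the merged code:

```python
# Substrings taken from the commit message's use cases.
EXCLUDE_PATTERNS = ("mem", "alloc", "free")


def is_memory_api_name(name: str) -> bool:
    """Return True if the lowercased name contains any exclude pattern."""
    lowered = name.lower()
    return any(pattern in lowered for pattern in EXCLUDE_PATTERNS)


if __name__ == "__main__":
    for api in ("cudaMemcpy", "cudaMemset", "cudaMalloc",
                "cudaMallocManaged", "cudaFree", "cudaLaunchKernel"):
        print(api, is_memory_api_name(api))
```

Substring matching is deliberately broad, so it can over-match: a compute kernel with a made-up name such as memory_efficient_attention_kernel would also be excluded because it contains "mem".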

pytorchmergebot force-pushed the profiler_kernel branch from 1f85809 to b0b748d July 23, 2025 21:07

pytorch-bot bot added the ciflow/trunk label Jul 24, 2025

pytorchmergebot added the merging label Jul 24, 2025

pytorchmergebot closed this in aeaa200 Jul 24, 2025

pytorchmergebot added Merged and removed merging labels Jul 24, 2025

namgyu-youn deleted the profiler_kernel branch July 24, 2025 19:15
