[Profiler] Add CUDA Overhead to Auto-trace by sraikund16 · Pull Request #142271 · pytorch/pytorch · GitHub
@sraikund16
Contributor

@sraikund16 sraikund16 commented Dec 6, 2024

Summary: We already have CUDA OVERHEAD events enabled in on-demand, so we should also add them to auto-trace.

Test Plan: Tested using internal performance suites and found no noticeable performance change

Differential Revision: D66904879

@pytorch-bot

pytorch-bot bot commented Dec 6, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/142271

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit 40845a2 with merge base d3d1a78:

UNSTABLE - The following job failed but was likely due to flakiness present on trunk and has been marked as unstable:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D66904879

@netlify

netlify bot commented Dec 6, 2024

Deploy Preview for chimerical-cranachan-793287 ready!

Name Link
🔨 Latest commit e52cb9b
🔍 Latest deploy log https://app.netlify.com/sites/chimerical-cranachan-793287/deploys/6753850ffcb6ef00083cd12c
😎 Deploy Preview https://deploy-preview-142271--chimerical-cranachan-793287.netlify.app
@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Dec 6, 2024
@ngimel ngimel added the release notes: profiler release notes category label Dec 9, 2024
pytorch-bot bot pushed a commit that referenced this pull request Dec 10, 2024
Summary:

We already have CUDA OVERHEAD events enabled in on-demand, so we should also add them to auto-trace.

Test Plan:
Tested using servicelab and found no performance difference:
kineto_benchmark
    duration_ms: 21668
    number_of_events: 26542
    profiler_prepare_call_duration_us: 970
    profiler_enable_call_duration_us: 616474
    profiling_window_duration_us: 2188525
    profiler_disable_call_duration_us: 148628
    parse_kineto_call_duration_us: 1672536
    function_events_build_tree_call_duration_us: 285939


kineto_benchmark
    duration_ms: 21718
    number_of_events: 26556
    profiler_prepare_call_duration_us: 885
    profiler_enable_call_duration_us: 7037
    profiling_window_duration_us: 1772481
    profiler_disable_call_duration_us: 174122
    parse_kineto_call_duration_us: 1983683
    function_events_build_tree_call_duration_us: 333582

Differential Revision: D66904879
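
For readers comparing the two `kineto_benchmark` blocks above, the following is a minimal sketch that computes the per-metric percentage change between them. It assumes the first block is the run without the change and the second is the run with it; the test plan does not label them. The names `first_run`, `second_run`, and `relative_change_pct` are hypothetical and are not part of the benchmark tooling.

```python
# Values copied verbatim from the two kineto_benchmark blocks in the test plan.
# Assumption: first block = without the change, second block = with the change.
# The PR text does not label the runs, so treat this ordering as a guess.
first_run = {
    "duration_ms": 21668,
    "number_of_events": 26542,
    "profiler_prepare_call_duration_us": 970,
    "profiler_enable_call_duration_us": 616474,
    "profiling_window_duration_us": 2188525,
    "profiler_disable_call_duration_us": 148628,
    "parse_kineto_call_duration_us": 1672536,
    "function_events_build_tree_call_duration_us": 285939,
}
second_run = {
    "duration_ms": 21718,
    "number_of_events": 26556,
    "profiler_prepare_call_duration_us": 885,
    "profiler_enable_call_duration_us": 7037,
    "profiling_window_duration_us": 1772481,
    "profiler_disable_call_duration_us": 174122,
    "parse_kineto_call_duration_us": 1983683,
    "function_events_build_tree_call_duration_us": 333582,
}


def relative_change_pct(before, after):
    """Return the percentage change of each metric from `before` to `after`."""
    return {name: 100.0 * (after[name] - before[name]) / before[name] for name in before}


changes = relative_change_pct(first_run, second_run)
for name, pct in changes.items():
    print(f"{name}: {pct:+.2f}%")
```

These are single runs, so individual timing metrics can move noticeably from run-to-run noise alone; the end-to-end `duration_ms` and `number_of_events` are the steadier figures to compare.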
@sraikund16
Contributor Author

@pytorchbot rebase

@pytorchmergebot
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot
Collaborator

Tried to rebase and push PR #142271, but it was already up to date. Try rebasing against main by issuing:
@pytorchbot rebase -b main

@sraikund16
Contributor Author

@pytorchbot rebase -b main

@pytorchmergebot
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/main. Check the current status here

@pytorchmergebot
Collaborator

Successfully rebased export-D66904879 onto refs/remotes/origin/main, please pull locally before adding more changes (for example, via git checkout export-D66904879 && git pull --rebase)

@sraikund16
Contributor Author

@pytorchmergebot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: Check the merge workflow status here

Labels

ciflow/trunk, fb-exported, Merged, release notes: profiler, topic: improvements

4 participants