[Profiler] Add CUDA Overhead to Auto-trace #142271
See artifacts and rendered test results at hud.pytorch.org/pr/142271
✅ You can merge normally! (1 Unrelated Failure)
As of commit 40845a2 with merge base d3d1a78, one job failed but was likely due to flakiness present on trunk and has been marked as unstable.
This comment was automatically generated by Dr. CI.
This pull request was exported from Phabricator. Differential Revision: D66904879
Force-pushed from e52cb9b to 02e2c3e.
Summary:
CUDA OVERHEAD events are already enabled in on-demand, so we should also add them to auto-trace.
Test Plan:
Tested using servicelab and found no performance difference:
kineto_benchmark
duration_ms: 21668
number_of_events: 26542
profiler_prepare_call_duration_us: 970
profiler_enable_call_duration_us: 616474
profiling_window_duration_us: 2188525
profiler_disable_call_duration_us: 148628
parse_kineto_call_duration_us: 1672536
function_events_build_tree_call_duration_us: 285939
kineto_benchmark
duration_ms: 21718
number_of_events: 26556
profiler_prepare_call_duration_us: 885
profiler_enable_call_duration_us: 7037
profiling_window_duration_us: 1772481
profiler_disable_call_duration_us: 174122
parse_kineto_call_duration_us: 1983683
function_events_build_tree_call_duration_us: 333582
Differential Revision: D66904879
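To make the comparison between the two kineto_benchmark runs above easier to read, here is a minimal sketch that computes the per-metric percent change from the first listed run to the second. The numbers are copied from the test plan. The names `RUN_1`, `RUN_2`, `percent_change`, and `compare_runs` are hypothetical helpers introduced for illustration, and the test plan does not say which run had the change applied, so the sketch only refers to them as "first" and "second".

```python
# Metrics copied verbatim from the two kineto_benchmark runs in the test plan.
# Which run includes the change is not stated, so they are labeled by order only.
RUN_1 = {
    "duration_ms": 21668,
    "number_of_events": 26542,
    "profiler_prepare_call_duration_us": 970,
    "profiler_enable_call_duration_us": 616474,
    "profiling_window_duration_us": 2188525,
    "profiler_disable_call_duration_us": 148628,
    "parse_kineto_call_duration_us": 1672536,
    "function_events_build_tree_call_duration_us": 285939,
}
RUN_2 = {
    "duration_ms": 21718,
    "number_of_events": 26556,
    "profiler_prepare_call_duration_us": 885,
    "profiler_enable_call_duration_us": 7037,
    "profiling_window_duration_us": 1772481,
    "profiler_disable_call_duration_us": 174122,
    "parse_kineto_call_duration_us": 1983683,
    "function_events_build_tree_call_duration_us": 333582,
}


def percent_change(first: float, second: float) -> float:
    """Relative change from `first` to `second`, in percent."""
    return (second - first) / first * 100.0


def compare_runs(first: dict, second: dict) -> dict:
    """Percent change for every metric present in the first run."""
    return {name: percent_change(first[name], second[name]) for name in first}


if __name__ == "__main__":
    for name, delta in compare_runs(RUN_1, RUN_2).items():
        print(f"{name}: {delta:+.2f}%")
```

With these inputs, the end-to-end `duration_ms` and `number_of_events` differ by well under 1% between the two runs, while individual phases such as `profiler_enable_call_duration_us` vary much more from run to run.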
@pytorchbot rebase
@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict.
@pytorchbot rebase -b main
@pytorchbot started a rebase job onto refs/remotes/origin/main.
Successfully rebased. Force-pushed from 02e2c3e to 40845a2.
@pytorchmergebot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours).
Summary: CUDA OVERHEAD events are already enabled in on-demand, so we should also add them to auto-trace.
Test Plan: Tested using internal performance suites and found no noticeable performance change
Differential Revision: D66904879
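One way to check whether an exported trace contains overhead events is to count its events by category. The sketch below is not part of this PR. It assumes the trace was exported in the Chrome Trace Event JSON format (a top-level `traceEvents` list whose entries carry a `cat` field), and it assumes overhead events appear under a category named `overhead`. The category name and the tiny `SAMPLE_TRACE` are illustrative assumptions, so verify them against a real trace before relying on this.

```python
import json
from collections import Counter

# Assumed category label for CUDA overhead events; verify against a real trace.
OVERHEAD_CATEGORY = "overhead"

# Tiny synthetic trace in Chrome Trace Event format, for illustration only.
SAMPLE_TRACE = json.loads("""
{
  "traceEvents": [
    {"ph": "X", "cat": "cpu_op", "name": "aten::mm", "ts": 0, "dur": 12},
    {"ph": "X", "cat": "kernel", "name": "gemm_kernel", "ts": 5, "dur": 7},
    {"ph": "X", "cat": "overhead", "name": "example overhead", "ts": 20, "dur": 3}
  ]
}
""")


def count_events_by_category(trace: dict) -> Counter:
    """Count trace events per `cat` value; events without one go under '<none>'."""
    events = trace.get("traceEvents", [])
    return Counter(
        event.get("cat", "<none>") for event in events if isinstance(event, dict)
    )


def has_overhead_events(trace: dict) -> bool:
    """True if at least one event uses the assumed overhead category."""
    return count_events_by_category(trace)[OVERHEAD_CATEGORY] > 0


if __name__ == "__main__":
    for category, count in sorted(count_events_by_category(SAMPLE_TRACE).items()):
        print(f"{category}: {count}")
```

To inspect a real trace, replace `SAMPLE_TRACE` with the result of `json.load` on the exported trace file.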