Switch to CUDA event based profiling #109338

ipiszy · 2023-09-15T00:18:19Z

In #107901, CUDA event based profiling was replaced with profiler based
profiling to avoid counting CPU-side kernel launch overhead in the final
latency numbers. However, it turns out that torch.profiler.profile() is
significantly slower than CUDA events, which noticeably slows down model
compilation. This PR switches back to CUDA event based profiling.
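For readers unfamiliar with the technique, here is a minimal sketch of CUDA event based timing. It is an illustration only, not the code changed by this PR: the function name `bench_with_cuda_events` and its `warmup` and `rep` parameters are hypothetical. It relies only on `torch.cuda.Event(enable_timing=True)`, `Event.record()`, `Event.elapsed_time()` and `torch.cuda.synchronize()`.

```python
import statistics


def bench_with_cuda_events(fn, warmup=3, rep=10):
    """Return the median time of fn() in milliseconds, measured with CUDA events.

    Hypothetical helper for illustration; not the Inductor benchmarking code.
    """
    # Imported lazily so this module still loads on machines without PyTorch.
    import torch

    if not torch.cuda.is_available():
        raise RuntimeError("CUDA event timing needs a CUDA device")

    # Warm-up runs keep one-time costs (lazy init, caching) out of the numbers.
    for _ in range(warmup):
        fn()
    torch.cuda.synchronize()

    # One start/end event pair per repetition, recorded on the current stream.
    starts = [torch.cuda.Event(enable_timing=True) for _ in range(rep)]
    ends = [torch.cuda.Event(enable_timing=True) for _ in range(rep)]
    for start, end in zip(starts, ends):
        start.record()
        fn()
        end.record()

    # Events complete asynchronously; wait before reading elapsed times.
    torch.cuda.synchronize()
    return statistics.median(s.elapsed_time(e) for s, e in zip(starts, ends))
```

Because each event pair brackets the launch of `fn()`, the measured interval can include CPU-side launch gaps for very short kernels. That is the overhead #107901 set out to exclude, and the trade-off this PR accepts in exchange for faster compilation.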

Follow-ups:

* Try CUDA event profiling with CUDAGraphs;
* Multi-GPU profiling.

Stack from ghstack (oldest at bottom):

-> Switch to CUDA event based profiling #109338

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx @peterbell10 @ngimel @yf225 @chenyang78 @kadeng @muchulee8 @aakhundov

[ghstack-poisoned]

pytorch-bot · 2023-09-15T00:18:21Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/109338

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit the bot commands wiki or our office hours

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit d025ffe with merge base 9021fb8:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

win-vs2019-cpu-py3 / test (default, 1, 3, windows.4xlarge.nonephemeral) (gh)

This comment was automatically generated by Dr. CI and updates every 15 minutes.

frank-wei

LGTM!

aakhundov

@ipiszy Should we also change these call sites:

pytorch/torch/_inductor/select_algorithm.py, line 648 in 0cbca85:

    return do_bench_using_profiling(lambda: algo(*args))

pytorch/torch/_inductor/codegen/common.py, line 1068 in 0cbca85:

    return do_bench_using_profiling(lambda: algo(*args, out=out))

ipiszy · 2023-09-15T20:05:27Z

@pytorchbot merge

ipiszy · 2023-09-15T20:06:53Z

Thanks @aakhundov! Good catch! Sorry I missed these.

pytorchmergebot · 2023-09-15T20:07:12Z

Merge failed

Reason: This PR needs a release notes: label
If your changes are user facing and intended to be a part of release notes, please use a label starting with release notes:.

If not, please add the topic: not user facing label.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "topic: not user facing"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

Details for Dev Infra team

Raised by workflow job

ipiszy · 2023-09-15T20:14:52Z

@pytorchbot label "topic: not user facing"

ipiszy · 2023-09-15T20:15:04Z

@pytorchbot merge

pytorchmergebot · 2023-09-15T20:16:38Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging

Check the merge workflow status here

pytorchmergebot · 2023-09-15T20:46:59Z

Merge failed

Reason: 1 mandatory check(s) failed. The first few are:

Lint / lintrunner / linux-job

Dig deeper by viewing the failures on hud

Failing merge rule: Core Maintainers

ipiszy · 2023-09-15T21:00:15Z

@pytorchbot merge

pytorchmergebot · 2023-09-15T21:03:56Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

pytorchmergebot · 2023-09-15T22:50:10Z

Merge failed

Reason: 1 jobs have failed, first few of them are: trunk / win-vs2019-cpu-py3 / test (default, 1, 3, windows.4xlarge.nonephemeral)

ipiszy · 2023-09-17T03:17:45Z

@pytorchbot merge

pytorchmergebot · 2023-09-17T03:19:21Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

pytorchmergebot · 2023-09-17T03:19:25Z

Merge failed

Reason: 1 jobs have failed, first few of them are: trunk / win-vs2019-cpu-py3 / test (default, 1, 3, windows.4xlarge.nonephemeral)

ipiszy · 2023-09-17T06:03:01Z

@pytorchbot merge -f "unrelated failure"

pytorchmergebot · 2023-09-17T06:04:36Z

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Switch to CUDA event based profiling

ee4db99

[ghstack-poisoned]

github-actions bot added module: inductor ciflow/inductor labels Sep 15, 2023

ipiszy requested review from aakhundov and kadeng September 15, 2023 03:22

frank-wei approved these changes Sep 15, 2023

aakhundov requested changes Sep 15, 2023

pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Sep 15, 2023

pytorchmergebot added the merging label Sep 15, 2023

pytorchmergebot removed the merging label Sep 15, 2023

ipiszy added a commit that referenced this pull request Sep 15, 2023

Switch to CUDA event based profiling

61471e4

ghstack-source-id: 4fab7c5 Pull Request resolved: #109338

pytorch-bot bot added the topic: not user facing topic category label Sep 15, 2023

ipiszy requested a review from aakhundov September 15, 2023 20:15

pytorchmergebot added the merging label Sep 15, 2023

pytorchmergebot removed the merging label Sep 15, 2023

ipiszy added a commit that referenced this pull request Sep 15, 2023

Switch to CUDA event based profiling

6dd83dc

ghstack-source-id: bfdfb2a Pull Request resolved: #109338

pytorchmergebot added the merging label Sep 15, 2023

pytorchmergebot removed the merging label Sep 15, 2023

pytorchmergebot added the merging label Sep 17, 2023

pytorchmergebot removed the merging label Sep 17, 2023

pytorchmergebot added the merging label Sep 17, 2023

pytorchmergebot added Merged and removed merging labels Sep 17, 2023

pytorchmergebot closed this in d8da2a7 Sep 17, 2023

facebook-github-bot deleted the gh/ipiszy@gmail.com/9/head branch September 20, 2023 14:22
