Switch to CUDA event based profiling by ipiszy · Pull Request #109338 · pytorch/pytorch · GitHub

@ipiszy
Contributor

@ipiszy ipiszy commented Sep 15, 2023

In #107901, CUDA-event-based profiling was replaced with profiler-based
profiling to avoid counting CPU-side kernel launch overhead in the final
latency numbers. However, it turns out that torch.profile() is significantly
slower than CUDA events, which noticeably slows down model compilation. This
PR switches back to CUDA-event-based profiling.
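
For readers unfamiliar with the technique, here is a minimal sketch of CUDA-event-based timing. It is not the code changed in this PR: `bench_ms`, its parameters, and the CPU fallback are hypothetical and exist only so the snippet runs on machines without torch or a GPU. The timed region sits between two `torch.cuda.Event` records, so it can include CPU-side launch gaps, which is the trade-off described above.

```python
import time

try:
    import torch
    HAVE_CUDA = torch.cuda.is_available()
except ImportError:  # torch not installed: only the CPU fallback is usable
    torch = None
    HAVE_CUDA = False


def bench_ms(fn, warmup=3, rep=10):
    """Hypothetical helper: mean latency of fn() in milliseconds."""
    # Warm up so one-time costs (lazy init, caches) are not measured.
    for _ in range(warmup):
        fn()

    if HAVE_CUDA:
        start = torch.cuda.Event(enable_timing=True)
        end = torch.cuda.Event(enable_timing=True)
        torch.cuda.synchronize()  # drain pending GPU work before timing
        start.record()
        for _ in range(rep):
            fn()
        end.record()
        torch.cuda.synchronize()  # wait until the end event has completed
        # elapsed_time() reports milliseconds between the two events.
        return start.elapsed_time(end) / rep

    # Assumed fallback for illustration only: wall-clock timing on the CPU.
    t0 = time.perf_counter()
    for _ in range(rep):
        fn()
    return (time.perf_counter() - t0) * 1000.0 / rep
```

A call such as `bench_ms(lambda: algo(*args))` would time one candidate. Compared with collecting and parsing a profiler trace, the only per-measurement cost is two event records and a synchronize, which is presumably why this path is cheaper during compilation.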

Follow-ups:

  • Try CUDA event profiling with CUDAGraphs;
  • Multi-GPU profiling;

Stack from ghstack (oldest at bottom):

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx @peterbell10 @ngimel @yf225 @chenyang78 @kadeng @muchulee8 @aakhundov

@pytorch-bot

pytorch-bot bot commented Sep 15, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/109338

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit d025ffe with merge base 9021fb8 (image):

FLAKY - The following job failed but was likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

Contributor

@frank-wei frank-wei left a comment


LGTM!

Contributor

@aakhundov aakhundov left a comment


@ipiszy Should we also change these call sites:

  1. return do_bench_using_profiling(lambda: algo(*args))

  2. return do_bench_using_profiling(lambda: algo(*args, out=out))

@ipiszy
Contributor Author

ipiszy commented Sep 15, 2023

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk label (Trigger trunk jobs on your pull request) Sep 15, 2023
@ipiszy
Contributor Author

ipiszy commented Sep 15, 2023

Thanks @adnanaziz! Good catch! Sorry, I missed these.

@pytorchmergebot
Collaborator

Merge failed

Reason: This PR needs a release notes: label
If your changes are user facing and intended to be a part of release notes, please use a label starting with release notes:.

If not, please add the topic: not user facing label.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "topic: not user facing"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

Details for Dev Infra team Raised by workflow job

[ghstack-poisoned]
ipiszy added a commit that referenced this pull request Sep 15, 2023
ghstack-source-id: 4fab7c5
Pull Request resolved: #109338
@ipiszy
Contributor Author

ipiszy commented Sep 15, 2023

@pytorchbot label "topic: not user facing"

@pytorch-bot pytorch-bot bot added the "topic: not user facing" label (topic category) Sep 15, 2023
@ipiszy
Contributor Author

ipiszy commented Sep 15, 2023

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@pytorchmergebot
Collaborator

Merge failed

Reason: 1 mandatory check(s) failed. The first few are:

Dig deeper by viewing the failures on hud

Details for Dev Infra team Raised by workflow job

Failing merge rule: Core Maintainers

[ghstack-poisoned]
ipiszy added a commit that referenced this pull request Sep 15, 2023
ghstack-source-id: bfdfb2a
Pull Request resolved: #109338
@ipiszy
Contributor Author

ipiszy commented Sep 15, 2023

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).


@pytorchmergebot
Collaborator

Merge failed

Reason: 1 jobs have failed, first few of them are: trunk / win-vs2019-cpu-py3 / test (default, 1, 3, windows.4xlarge.nonephemeral)

Details for Dev Infra team Raised by workflow job

@ipiszy
Contributor Author

ipiszy commented Sep 17, 2023

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).


@pytorchmergebot
Collaborator

Merge failed

Reason: 1 jobs have failed, first few of them are: trunk / win-vs2019-cpu-py3 / test (default, 1, 3, windows.4xlarge.nonephemeral)

Details for Dev Infra team Raised by workflow job

@ipiszy
Contributor Author

ipiszy commented Sep 17, 2023

@pytorchbot merge -f "unrelated failure"

@pytorchmergebot
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.


@facebook-github-bot facebook-github-bot deleted the gh/ipiszy@gmail.com/9/head branch September 20, 2023 14:22

4 participants