[AOTInductor] Update performance benchmark code #109560
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/109560. Note: links to docs will display an error until the docs builds have been completed. ✅ No failures as of commit 83eaaec with merge base b91ba22. This comment was automatically generated by Dr. CI and updates every 15 minutes.
Force-pushed from 35165da to 827bf15.
torch/_inductor/utils.py (Outdated)
```cpp
void run(
    std::vector<at::Tensor>& input_tensors,
    std::vector<at::Tensor>& output_tensors) {
```

```cpp
class AOTInductorModelContainer {
```
Please use a different name, as this can cause confusion with:

```cpp
using AOTInductorModelContainerHandle = AOTInductorModelContainerOpaque*;
```
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Merge failed. Reason: 1 mandatory check(s) failed. Dig deeper by viewing the failures on hud.
Force-pushed from b89ce6b to 83eaaec.
@angelayi has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Summary: Same as #109560; made a new PR because we need to land from internal.

Previously, during performance benchmark testing, we would create an AOTInductorModelContainerHandle every time the compiled function was run with new inputs. However, after #108473 we now load the constants needed at runtime when initializing the AOTInductorModelContainerHandle. This resulted in our benchmarks displaying a ~0.4x speedup.

This diff moves the initialization of AOTInductorModelContainerHandle outside of the code where we run the compiled function with different inputs. For example,

```
python benchmarks/dynamo/huggingface.py --performance --cold-start-latency --inference --bfloat16 --export-aot-inductor --disable-cudagraphs --device cuda --total-partitions 3 --partition-id 0 --only AlbertForMaskedLM
```

results in a `1.359x` speedup.

Specifically, this adds `create_container_handle` and `delete_container_handle` functions, which need to be called around `run`. We call `create_container_handle` to initialize the AOTInductorModelContainerHandle, call `run` to run the compiled .so with different inputs, and then call `delete_container_handle` to delete it.

[Updated dashboard results](https://hud.pytorch.org/benchmark/compilers?startTime=Wed%2C%2013%20Sep%202023%2021%3A03%3A55%20GMT&stopTime=Wed%2C%2020%20Sep%202023%2021%3A03%3A55%20GMT&granularity=hour&suite=torchbench&mode=inference&dtype=bfloat16&lBranch=angelayi/aot_inductor_benchmark&lCommit=f9aa49c4c9a1a140b6f0c4520d1d6d99b57e12fa&rBranch=main&rCommit=015be4cedba357eb931e24bf188479235db7c5c8)

Test Plan: CI

Differential Revision: D49513934

Pull Request resolved: #109820
Approved by: https://github.com/desertfire
Summary: Change AOTInductor to directly return output tensors instead of taking pre-allocated output tensors to return the results. This gives several benefits:

* It makes sure AOTInductor has the same behavior when managing output tensors as the default Inductor, which is widely tested and thus more reliable.
* As we have debugged before, there are cases where we still have to codegen extra copy_ ops to fill the pre-allocated output tensors, which doesn't make sense for performance.
* With the coming enhanced memory planning, this again will make sure the memory planning logic is the same between AOTInductor and Inductor, which will greatly simplify the problem and improve reliability.

This change also combines D49494954 from Yang and #109560 from Angela.

Differential Revision: D49502318

Pull Request resolved: #109790
Approved by: https://github.com/chenyang78
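The output-handling change described above can be illustrated with a small, hypothetical Python sketch. Plain Python lists stand in for tensors, and none of these function names are the real AOTInductor API; this only contrasts the two calling conventions the summary compares.

```python
# Hypothetical sketch (not the actual AOTInductor C++ API): contrasts
# pre-allocated output buffers with directly returned outputs.

def run_preallocated(inputs, outputs):
    # Old style: the caller pre-allocates output slots; the runtime must
    # write results into them, which can force extra copy_ ops.
    for i, x in enumerate(inputs):
        outputs[i] = x * 2  # stand-in for the compiled kernel's work

def run_returning(inputs):
    # New style: the runtime allocates and returns its own outputs,
    # matching default Inductor's behavior and simplifying memory planning.
    return [x * 2 for x in inputs]

inputs = [1, 2, 3]
outs = [None] * len(inputs)
run_preallocated(inputs, outs)
assert outs == run_returning(inputs)  # both conventions yield the same values
```

The observable results are identical; the difference is who owns the output allocation, which is what lets the memory planner treat AOTInductor and Inductor uniformly.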
Previously during performance benchmark testing, we would create an AOTInductorModelContainerHandle every time the compiled function is run with new inputs. However after #108473 we now load the constants needed in the runtime when initializing the AOTInductorModelContainerHandle. This resulted in our benchmarks displaying a ~0.4x speedup.
This diff moves the initialization of AOTInductorModelContainerHandle outside of the code where we run the compiled function with different inputs.
For example,

```
python benchmarks/dynamo/huggingface.py --performance --cold-start-latency --inference --bfloat16 --export-aot-inductor --disable-cudagraphs --device cuda --total-partitions 3 --partition-id 0 --only AlbertForMaskedLM
```

results in a `1.359x` speedup.

Specifically, this adds `create_container_handle` and `delete_container_handle` functions, which need to be called around `run`. We call `create_container_handle` to initialize the AOTInductorModelContainerHandle, call `run` to run the compiled .so with different inputs, and then call `delete_container_handle` to delete it.

[Updated dashboard results](https://hud.pytorch.org/benchmark/compilers?startTime=Wed%2C%2013%20Sep%202023%2021%3A03%3A55%20GMT&stopTime=Wed%2C%2020%20Sep%202023%2021%3A03%3A55%20GMT&granularity=hour&suite=torchbench&mode=inference&dtype=bfloat16&lBranch=angelayi/aot_inductor_benchmark&lCommit=f9aa49c4c9a1a140b6f0c4520d1d6d99b57e12fa&rBranch=main&rCommit=015be4cedba357eb931e24bf188479235db7c5c8)
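The benchmarking fix boils down to hoisting one-time setup out of the timed region. The following is a hedged Python sketch of that pattern; the function names mirror the ones in the PR description, but the bodies are mock stand-ins (handle creation simulated with a sleep, the compiled .so with a trivial function), not the real AOTInductor runtime.

```python
# Hypothetical sketch of the fix: create the container handle once, outside
# the measured loop, since handle creation now loads constants and is slow.
import time

def create_container_handle():
    time.sleep(0.01)  # stand-in for constant loading at handle creation
    return object()

def run(handle, x):
    return x * 2  # stand-in for running the compiled .so on one input

def delete_container_handle(handle):
    pass  # stand-in for releasing the handle

# Before the fix, create_container_handle was effectively inside the timed
# region on every call, deflating the measured speedup. After the fix:
handle = create_container_handle()          # one-time setup, untimed
start = time.perf_counter()
results = [run(handle, x) for x in range(100)]  # only run() is measured
elapsed = time.perf_counter() - start
delete_container_handle(handle)             # teardown after timing
```

Only the `run` calls land between the timer reads, so the constant-loading cost no longer counts against every measured iteration.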
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @ngimel @yf225 @chenyang78 @kadeng @muchulee8 @aakhundov