[AOTInductor] remove CUDA dependency for cpp backend by chenyang78 · Pull Request #110409 · pytorch/pytorch · GitHub

Conversation

@chenyang78
Contributor

@chenyang78 chenyang78 commented Oct 2, 2023

Summary:
Previously, we linked against CUDA libs even for the pure cpp backend.
This caused issues on inference platforms that do not have GPUs.
This diff removes the CUDA dependency for the cpp backend.

Reviewed By: bertmaher, muchulee8, mikekgfb

Differential Revision: D49800712

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @yf225 @kadeng @muchulee8 @aakhundov @ColinPeppler

@pytorch-bot

pytorch-bot bot commented Oct 2, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/110409

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit c40c326 with merge base 428cbd7:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D49800712

@chenyang78 chenyang78 changed the title remove CUDA dependency for cpp backend [AOTInductor] remove CUDA dependency for cpp backend Oct 2, 2023
@chenyang78 chenyang78 added the topic: not user facing label Oct 2, 2023
# For those cases, include the lpath and libs command as we do for pytorch above.
# This approach allows us to only pay for what we use.
ipaths = cpp_extension.include_paths(cuda) + [sysconfig.get_path("include")]
if aot_mode:
Contributor


I don't think we will get to else at line 854 for our use case?

Contributor Author


We will get to the else branch. I hit this with the internal e2e test, where none of the clauses include_pytorch, vec_isa != invalid_vec_isa, cuda, and config.cpp.enable_kernel_profile is True.

Contributor


vec_isa == invalid_vec_isa sounds problematic if we are talking about optimizing cpu models?

Contributor Author


vec_isa == invalid_vec_isa sounds problematic if we are talking about optimizing cpu models?

Hmm, not sure. I think we can revisit it if we run into issues with this.

functions=["run"],
extra_ldflags=[so_path],
with_cuda=True, # TODO: change this to not is_cpu
with_cuda=False,
Contributor


This should be with_cuda=not is_cpu,

Contributor Author


Ah, good catch!

_binary_constants_bin_start + bytes_read,
data_size,
cudaMemcpyHostToDevice));
#else // !USE_CUDA
Contributor


Probably don't need the else branch here? My general feeling is that USE_CUDA and is_cpu start to make the code fragmented. Some of the code can be cleaned up a bit, but it seems like using a macro like USE_CUDA is inevitable.

Contributor Author


Yeah, I have the same concern. I think USE_CUDA is inevitable. On the other hand, is_cpu seems useful for cases where we have mixed-arch execution. I will try to minimize the use of is_cpu for now and refine it later when we have a better design for supporting multiple arches.
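For illustration, here is a minimal sketch of what the guarded constants-copy path can look like (not the exact code in this PR; internal_ptr is a placeholder name for the destination buffer): the CUDA build copies the embedded blob to device memory, while the CPU-only build falls back to a plain host-side memcpy.

#ifdef USE_CUDA
  // CUDA build: copy the embedded constants blob to device memory.
  AOTI_RUNTIME_DEVICE_CHECK(cudaMemcpy(
      internal_ptr,
      _binary_constants_bin_start + bytes_read,
      data_size,
      cudaMemcpyHostToDevice));
#else // !USE_CUDA
  // CPU-only build: the constants blob stays in host memory.
  memcpy(internal_ptr, _binary_constants_bin_start + bytes_read, data_size);
#endif // USE_CUDA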

// multi devices.
#ifndef USE_CUDA

#define AOTI_RUNTIME_DEVICE_CHECK(EXPR) \
Contributor


Do we really need this for CPU? If not, I think making it an empty define could save some uses of USE_CUDA later.

Contributor Author


Previously, I was thinking of having exactly such an empty macro to reduce the uses of USE_CUDA, but I changed my mind later. One rationale is that I want to reduce any surprise. For example, I feel that turning something like AOTI_RUNTIME_DEVICE_CHECK(foo()) into an empty macro could be a surprise in cpu mode, because function foo may have both cuda and non-cuda side effects. Instead, having a dedicated macro explicitly for each arch makes the code easier to interpret and reduces the chance of such surprises.
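As an illustrative sketch only (not the exact definitions added in this PR), per-arch macros along these lines keep the check meaningful in both builds instead of silently dropping the call in cpu mode; the sketch assumes <stdexcept> and, for the CUDA branch, the CUDA runtime headers are available:

#ifdef USE_CUDA
// CUDA build: EXPR is expected to yield a cudaError_t.
#define AOTI_RUNTIME_DEVICE_CHECK(EXPR)                               \
  do {                                                                \
    const cudaError_t code = (EXPR);                                  \
    if (code != cudaSuccess) {                                        \
      throw std::runtime_error(cudaGetErrorString(code));             \
    }                                                                 \
  } while (0)
#else // !USE_CUDA
// CPU-only build: EXPR is expected to yield something convertible to bool.
#define AOTI_RUNTIME_DEVICE_CHECK(EXPR)                               \
  do {                                                                \
    if (!(EXPR)) {                                                    \
      throw std::runtime_error("CPU runtime check failed: " #EXPR);   \
    }                                                                 \
  } while (0)
#endif // USE_CUDA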

@@ -0,0 +1,4 @@
#pragma once

#include <torch/csrc/inductor/aoti_runtime/cpu_utils.h>
Contributor


The exclusiveness of the sets of macros is not obvious from this file. It may be better to just keep this file and merge the contents of the two files into this one.
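One possible shape for such a merged dispatch header, sketched here for illustration only (the cuda_utils.h name is an assumption and may not match the actual file names in this PR), makes the mutual exclusiveness explicit:

#pragma once

#ifdef USE_CUDA
// CUDA build: pull in the device-side utilities and checks.
#include <torch/csrc/inductor/aoti_runtime/cuda_utils.h>
#else
// CPU-only build: pull in the host-side counterparts of the same macros.
#include <torch/csrc/inductor/aoti_runtime/cpu_utils.h>
#endif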

@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D49800712



@facebook-github-bot
Contributor

@pytorchbot merge

(Initiating merge automatically since Phabricator Diff has merged)

@pytorch-bot pytorch-bot bot added the ciflow/trunk label Oct 3, 2023
@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: Check the merge workflow status here.

