[AOTInductor] remove CUDA dependency for cpp backend #110409
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/110409
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit c40c326 with merge base 428cbd7.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
This pull request was exported from Phabricator. Differential Revision: D49800712
# For those cases, include the lpath and libs command as we do for pytorch above.
# This approach allows us to only pay for what we use.
ipaths = cpp_extension.include_paths(cuda) + [sysconfig.get_path("include")]
if aot_mode:
I don't think we will get to the else at line 854 for our use case?
We will get to the else branch. I hit this with the internal e2e test, where none of the conditions include_pytorch, vec_isa != invalid_vec_isa, cuda, and config.cpp.enable_kernel_profile is True.
vec_isa == invalid_vec_isa sounds problematic if we are talking about optimizing cpu models?
vec_isa == invalid_vec_isa sounds problematic if we are talking about optimizing cpu models?

Hmm, not sure. I think we can revisit it if we run into issues with this.
test/inductor/test_aot_inductor.py (Outdated)
functions=["run"],
extra_ldflags=[so_path],
- with_cuda=True, # TODO: change this to not is_cpu
+ with_cuda=False,
This should be with_cuda=not is_cpu,
Ah, good catch!
_binary_constants_bin_start + bytes_read,
data_size,
cudaMemcpyHostToDevice));
#else // !USE_CUDA
Probably don't need the else branch here? My general feeling is that USE_CUDA and is_cpu start to make the code fragmented. Some of the code can be cleaned up a bit, but it seems like using a macro like USE_CUDA is inevitable.
Yeah, I have the same concern. I think USE_CUDA is inevitable. On the other hand, is_cpu seems useful for cases where we have mixed-arch execution. I will try to minimize the use of is_cpu for now and refine it later when we have a better design for supporting multiple arches.
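For context, a minimal sketch of the guard pattern being discussed, assuming a helper that loads the embedded constants blob; the function and parameter names here are illustrative, not the actual AOTInductor runtime symbols:

// Hypothetical sketch (not the real runtime code): guard every CUDA call
// behind USE_CUDA so a CPU-only build of the generated .so never references
// libcudart symbols.
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#ifdef USE_CUDA
#include <cuda_runtime.h>
#endif

static void copy_constants_blob(
    void* dst,
    const std::uint8_t* blob_start,
    std::size_t bytes_read,
    std::size_t data_size) {
#ifdef USE_CUDA
  // GPU build: constants live in device memory, so do a host-to-device copy.
  if (cudaMemcpy(dst, blob_start + bytes_read, data_size,
                 cudaMemcpyHostToDevice) != cudaSuccess) {
    throw std::runtime_error("cudaMemcpy failed while loading constants");
  }
#else  // !USE_CUDA
  // CPU build: the embedded constants can simply be copied in host memory;
  // no device transfer (and no CUDA dependency) is needed.
  std::memcpy(dst, blob_start + bytes_read, data_size);
#endif
}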
// multi devices.
#ifndef USE_CUDA

#define AOTI_RUNTIME_DEVICE_CHECK(EXPR) \
Do we really need this for CPU? If not, I think defining it as empty can save some uses of USE_CUDA later.
Previously, I was thinking of having exactly such an empty macro to reduce the uses of USE_CUDA, but I changed my mind later. One rationale is that I want to reduce surprises. For example, I feel turning something like AOTI_RUNTIME_DEVICE_CHECK(foo()) into a no-op could be a surprise in cpu mode, because function foo may have both cuda and non-cuda side effects. Instead, having a dedicated macro explicitly for each arch makes the code easier to interpret and reduces the chance of such surprises.
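For illustration, a rough sketch of what two dedicated per-arch definitions could look like; these are assumed shapes, not the exact headers. The point is that the CPU variant still evaluates the expression rather than compiling it out, which preserves any non-CUDA side effects of the wrapped call:

// Sketch only: assumed per-arch definitions of AOTI_RUNTIME_DEVICE_CHECK.
#ifdef USE_CUDA
#include <cuda_runtime.h>
#include <stdexcept>
#include <string>

// CUDA build: EXPR is expected to yield a cudaError_t, which is checked.
#define AOTI_RUNTIME_DEVICE_CHECK(EXPR)                                       \
  do {                                                                        \
    const cudaError_t aoti_check_code = (EXPR);                               \
    if (aoti_check_code != cudaSuccess) {                                     \
      throw std::runtime_error(                                               \
          std::string("CUDA error: ") + cudaGetErrorString(aoti_check_code)); \
    }                                                                         \
  } while (0)

#else  // !USE_CUDA

// CPU build: there is no device error code to inspect, but EXPR is still
// evaluated, so the wrapped call's side effects are not silently dropped
// the way a fully empty macro would drop them.
#define AOTI_RUNTIME_DEVICE_CHECK(EXPR) \
  do {                                  \
    (void)(EXPR);                       \
  } while (0)

#endif  // USE_CUDA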
@@ -0,0 +1,4 @@
#pragma once

#include <torch/csrc/inductor/aoti_runtime/cpu_utils.h>
The mutual exclusivity of the two sets of macros is not obvious from this file. It may be better to just keep this file and merge the contents of the two files into this one.
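A hedged sketch of the suggested merge, with all names assumed for illustration: a single header that dispatches on USE_CUDA, so the mutual exclusivity of the two macro/typedef sets is visible in one place instead of being split across two parallel files that must stay in sync.

// Hypothetical merged header (real file and type names may differ).
#pragma once

#ifdef USE_CUDA
// CUDA build: device helpers come from the CUDA runtime.
#include <cuda_runtime.h>
using AOTIDeviceStreamHandle = cudaStream_t;
#else  // !USE_CUDA
// CPU build: no CUDA headers are pulled in, so the generated .so has no
// libcudart dependency; a "stream" degenerates to an opaque placeholder.
using AOTIDeviceStreamHandle = void*;
#endif  // USE_CUDA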
Force-pushed 4120ca7 to 7041783
Force-pushed 7041783 to e53cb8b
This pull request was exported from Phabricator. Differential Revision: D49800712

1 similar comment

This pull request was exported from Phabricator. Differential Revision: D49800712
Force-pushed e53cb8b to 106b6f5
This pull request was exported from Phabricator. Differential Revision: D49800712
Summary: Previously, we linked against CUDA libs even for the pure cpp backend. This caused issues for cases where the inference platform does not have GPUs. This diff removes the CUDA dependency for the cpp backend. Reviewed By: bertmaher, muchulee8, mikekgfb Differential Revision: D49800712
Force-pushed 106b6f5 to c40c326
This pull request was exported from Phabricator. Differential Revision: D49800712
@pytorchbot merge (Initiating merge automatically since Phabricator Diff has merged)

Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Summary:
Previously, we linked against CUDA libs even for the pure cpp backend. This caused issues for cases where the inference platform does not have GPUs. This diff removes the CUDA dependency for the cpp backend.
Reviewed By: bertmaher, muchulee8, mikekgfb
Differential Revision: D49800712
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @yf225 @kadeng @muchulee8 @aakhundov @ColinPeppler