-
Notifications
You must be signed in to change notification settings - Fork 25.7k
[inductor] Enable floor_div indexing to work under ABI-compat mode #113276
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
We need these in order to compile indexing expressions. [ghstack-poisoned]
🔗 Helpful Links🧪 See artifacts and rendered test results at hud.pytorch.org/pr/113276
Note: Links to docs will display an error until the docs builds have been completed. ✅ No FailuresAs of commit ade1029 with merge base 4f2b288 ( This comment was automatically generated by Dr. CI and updates every 15 minutes. |
|
|
||
| } // namespace | ||
|
|
||
| int64_t aoti_torch_div_floor_int64(int64_t a, int64_t b) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it appears we only use div_floor_integer<int64_t> when emitting indexing expressions, so I only created this particular instantiation (since C doesn't support overloading)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I was doing the same thing for div_floor_integer at some point, but realized we need a way to capture exception here, like divide-by-zero. So making the API similar to others here by using AOTI_TORCH_CONVERT_EXCEPTION_TO_ERROR_CODE seems inevitable, unless @ezyang has a better idea.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@swolchok if you're doing a CPU model and there's some code where we do an integer divide, which potentially may have a zero denominator. What would you prefer the AOTInductor DSO behavior to be in this case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand -- div_floor_integer doesn't check for divide by zero.
I also want to clear up what is supposed to happen -- integer division by zero has undefined behavior (https://en.cppreference.com/w/c/language/operator_arithmetic#Division). Therefore: It is not guaranteed to crash the process. it is very unlikely to raise a C++ exception. C++ authors must guarantee that their code does not perform integer division by zero.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
supposing that div_floor_integer did check for divide by zero, I would prefer that it was compiled into the DSO, not going through an aoti_torch shim, because those are relatively expensive and it is a small function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in particular we could simply emit a zero check for the denominator inline and throw an exception, which would be translated into an error code at the boundary and then back into an exception on the other side
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess regardless of whether it does check or not, it's better to have it compiled into the DSO and/or inlined? Let me look into that...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
compiled into the DSO and/or inlined?
can't inline across a DSO boundary, gotta compile into the DSO
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I didn't really phrase things correctly... should not have said "or". I just meant that if we compile the div_floor_* functions into the DSO (I assume that means model.so), then they could possibly be inlined into their callers.
| #include <cstdio> | ||
| #include <iostream> | ||
| #include <memory> | ||
| #include <ATen/native/BinaryOps.h> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure if this include is OK, or if I should make it part of AT_PER_OPERATOR_HEADERS?
…ons" We need these in order to compile indexing expressions. cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx peterbell10 ipiszy yf225 chenyang78 kadeng muchulee8 aakhundov ColinPeppler [ghstack-poisoned]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks!
…ons" We need these in order to compile indexing expressions. cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx peterbell10 ipiszy yf225 chenyang78 kadeng muchulee8 aakhundov ColinPeppler [ghstack-poisoned]
|
|
||
| } // namespace | ||
|
|
||
| int64_t aoti_torch_div_floor_int64(int64_t a, int64_t b) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I was doing the same thing for div_floor_integer at some point, but realized we need a way to capture exception here, like divide-by-zero. So making the API similar to others here by using AOTI_TORCH_CONVERT_EXCEPTION_TO_ERROR_CODE seems inevitable, unless @ezyang has a better idea.
…ons" We need these in order to compile indexing expressions. cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx peterbell10 ipiszy yf225 chenyang78 kadeng muchulee8 aakhundov ColinPeppler [ghstack-poisoned]
These functions are needed for computing indexing expressions.
We were previously calling into
`aten::native::div_floor_{integer,floating}` when compiling the cpp
wrapper code in non-ABI-compatible mode. However, these functions were
not available when compiling under ABI-compatible mode as we did not
include ATen/native/BinaryOps.h. It is technically safe to call those
functions in ABI-compatible mode as they are header-only, but I didn't
want to include BinaryOps.h into the cpp wrapper file because it exposes
other potentially non-ABI-stable declarations.
As such, I have created a new torch/csrc/inductor/math.h file that
contains versions of div_floor_* that are clearly uncoupled from libc10
and libtorch. We only need to care about int64_t and double types when
computing indexing expressions, so I have only defined
div_floor_{int64_, double}. This makes it easy to decouple from c10
code.
I also noticed that when compiling the cpp kernels, we include
BinaryOps.h directly. I have swapped that out for the new math.h to
better enforce the ABI-compatible interface.
[ghstack-poisoned]
These functions are needed for computing indexing expressions.
We were previously calling into
`aten::native::div_floor_{integer,floating}` when compiling the cpp
wrapper code in non-ABI-compatible mode. However, these functions were
not available when compiling under ABI-compatible mode as we did not
include ATen/native/BinaryOps.h. It is technically safe to call those
functions in ABI-compatible mode as they are header-only, but I didn't
want to include BinaryOps.h into the cpp wrapper file because it exposes
other potentially non-ABI-stable declarations.
As such, I have created a new torch/csrc/inductor/math.h file that
contains versions of div_floor_* that are clearly uncoupled from libc10
and libtorch. We only need to care about int64_t and double types when
computing indexing expressions, so I have only defined
div_floor_{int64_, double}. This makes it easy to decouple from c10
code.
I also noticed that when compiling the cpp kernels, we include
BinaryOps.h directly. I have swapped that out for the new math.h to
better enforce the ABI-compatible interface.
ghstack-source-id: 3fbffa6
Pull Request resolved: #113276
These functions are needed for computing indexing expressions.
We were previously calling into
`aten::native::div_floor_{integer,floating}` when compiling the cpp
wrapper code in non-ABI-compatible mode. However, these functions were
not available when compiling under ABI-compatible mode as we did not
include ATen/native/BinaryOps.h. It is technically safe to call those
functions in ABI-compatible mode as they are header-only, but I didn't
want to include BinaryOps.h into the cpp wrapper file because it exposes
other potentially non-ABI-stable declarations.
As such, I have created a new torch/csrc/inductor/math.h file that
contains versions of div_floor_* that are clearly uncoupled from libc10
and libtorch. We only need to care about int64_t and double types when
computing indexing expressions, so I have only defined
div_floor_{int64_, double}. This makes it easy to decouple from c10
code.
I also noticed that when compiling the cpp kernels, we include
BinaryOps.h directly. I have swapped that out for the new math.h to
better enforce the ABI-compatible interface.
[ghstack-poisoned]
…pat mode" Previously, floor_div operations were defined in ATen/native/BinaryOps.h. Since this header was not included under ABI-compat mode, trying to use those indexing operations would result in compilation errors. Technically, it is safe to use aten::native::floor_div_* functions in ABI-compat mode as they are header-only; we could simply include BinaryOps.h. However, there are other declarations in BinaryOps.h that are not binary-compatible, so this is not ideal. Thus, I have moved those functions into a separate file, and put them under c10/util, since they don't really have tensor-specific logic. c10 functions are not all header-only, so this still isn't ideal, but this still seems like an improvement. Moreover, cpp_prefix.h -- used when compiling cpp kernels -- already includes c10 header files, so ABI-compatibility already depends on maintaining some c10 functions as header-only. [ghstack-poisoned]
|
Alternative approach (with rationale in commit mesage): 3d68d92 I think the current approach is cleaner though. |
torch/csrc/inductor/math.h
Outdated
| #include <cmath> | ||
| #include "c10/macros/Macros.h" | ||
|
|
||
| // Functions in this file should be header-only as it is used in ABI-compatible mode. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand. Is this OK to run from the DSO? Why does it include Macros.h then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Macros.h doesn't include anything that links to external C++ code, so it's safe
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your statement is true, but it also felt like a slippery slope to me. People who touch those recursively included heads won't necessarily know the ABI constraint of AOTI. Say someone decides to split a header into .h and .cpp, and then it will break our assumption here.
With that being said, I am ok with the current approach if you can also update the linker flags for the abi_compatible mode, e.g. https://github.com/pytorch/pytorch/blob/cb233dada43f016c8fa58979838939c1adda416b/torch/_inductor/codecache.py#L1301C24-L1301C24. Then if the aforementioned change happens, at least we should be able to catch it with our abi_compatible test cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, on a second thought, my suggestion may not be trivial to implement, because the C shim layout does live in libtorch and we need to link against that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your statement is true, but it also felt like a slippery slope to me.
Yes, I agree that the situation is not ideal. That said I don't think I am actually making the situation that much worse, because cpp_prefix.h already includes a bunch of c10 and aten headers directly (see updated message at the top of the PR.)
With that being said, I am ok with the current approach if you can also update the linker flags for the abi_compatible mode
It looks like we already avoid linking against c10 when building in fbcode, so I think we're good?
pytorch/torch/_inductor/codecache.py
Lines 1398 to 1399 in cb233da
| if not config.is_fbcode(): | |
| libs += ["c10"] |
I guess it would be better if we checked for abi_compatible instead of is_fbcode, but it seems they are being used interchangeably throughout the codebase...
(Side note: we are commenting on the "make an alternative implementation of div_floor_* functions" approach, which is not the latest version of this PR; the latest version just relocates the existing div_floor_* functions from ATen to c10. Either way, the non-linking of c10 makes this safe.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pp_prefix.h already includes a bunch of c10 and aten headers directly
This should be fixed.
I guess it would be better if we checked for abi_compatible instead of is_fbcode, but it seems they are being used interchangeably throughout the codebase...
The long-term goal is to only keep the abi-compatible mode for OSS as well. In this case, please change it to check abi_compatible.
…pat mode" Previously, floor_div operations were defined in ATen/native/BinaryOps.h. Since this header was not included under ABI-compat mode, trying to use those indexing operations would result in compilation errors. Technically, it is safe to use aten::native::floor_div_* functions in ABI-compat mode as they are header-only; we could simply include BinaryOps.h. However, there are other declarations in BinaryOps.h that are not binary-compatible, so this is not ideal. Thus, I have moved those functions into a separate file, and put them under c10/util, since they don't really have tensor-specific logic. c10 functions are not all header-only, so this still isn't ideal, but this still seems like an improvement. Moreover, cpp_prefix.h -- used when compiling cpp kernels -- already includes c10 header files, so ABI-compatibility already depends on maintaining some c10 functions as header-only. cc jgong5 mingfeima XiaobingSuper sanchitintel ashokei jingxu10 voznesenskym penguinwu EikanWang Guobing-Chen zhuhaozhe blzheng wenzhe-nrv jiayisunx peterbell10 ipiszy yf225 chenyang78 kadeng muchulee8 aakhundov ColinPeppler [ghstack-poisoned]
|
@pytorchbot merge |
Merge failedReason: This PR needs a If not, please add the To add a label, you can comment to pytorchbot, for example For more information, see Details for Dev Infra teamRaised by workflow job |
…pat mode" Previously, floor_div operations were defined in ATen/native/BinaryOps.h. Since this header was not included under ABI-compat mode, trying to use those indexing operations would result in compilation errors. Technically, it is safe to use aten::native::floor_div_* functions in ABI-compat mode as they are header-only; we could simply include BinaryOps.h. However, there are other declarations in BinaryOps.h that are not binary-compatible, so this is not ideal. Thus, I have moved those functions into a separate file, and put them under c10/util, since they don't really have tensor-specific logic. c10 functions are not all header-only, so this still isn't ideal, but this still seems like an improvement. Moreover, cpp_prefix.h -- used when compiling cpp kernels -- already includes c10 header files, so ABI-compatibility already depends on maintaining some c10 functions as header-only. cc jgong5 mingfeima XiaobingSuper sanchitintel ashokei jingxu10 voznesenskym penguinwu EikanWang Guobing-Chen zhuhaozhe blzheng wenzhe-nrv jiayisunx peterbell10 ipiszy yf225 chenyang78 kadeng muchulee8 aakhundov ColinPeppler [ghstack-poisoned]
Previously, floor_div operations were defined in ATen/native/BinaryOps.h. Since this header was not included under ABI-compat mode, trying to use those indexing operations would result in compilation errors. Technically, it is safe to use aten::native::floor_div_* functions in ABI-compat mode as they are header-only; we could simply include BinaryOps.h. However, there are other declarations in BinaryOps.h that are not binary-compatible, so this is not ideal. Thus, I have moved those functions into a separate file, and put them under c10/util, since they don't really have tensor-specific logic. c10 functions are not all header-only, so this still isn't ideal, but this still seems like an improvement. Moreover, cpp_prefix.h -- used when compiling cpp kernels -- already includes c10 header files, so ABI-compatibility already depends on maintaining some c10 functions as header-only. ghstack-source-id: 2fe8715 Pull Request resolved: #113276
|
@pytorchbot merge |
Merge startedYour change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team |
…ytorch#113276) Previously, floor_div operations were defined in ATen/native/BinaryOps.h. Since this header was not included under ABI-compat mode, trying to use those indexing operations would result in compilation errors. Technically, it is safe to use aten::native::floor_div_* functions in ABI-compat mode as they are header-only; we could simply include BinaryOps.h. However, there are other declarations in BinaryOps.h that are not binary-compatible, so this is not ideal. Thus, I have moved those functions into a separate file, and put them under c10/util, since they don't really have tensor-specific logic. c10 functions are not all header-only, so this still isn't ideal, but this still seems like an improvement. Moreover, cpp_prefix.h -- used when compiling cpp kernels -- already includes c10 header files, so ABI-compatibility already depends on maintaining some c10 functions as header-only. Pull Request resolved: pytorch#113276 Approved by: https://github.com/chenyang78, https://github.com/desertfire
|
Hey @int3, I think there could be an error coming from this PR. Here is a repro code and the error message: import torch
def func(x, a):
n = (a * 1.0) // 8.0
y = x + n
return y
cfunc = torch.compile(func, dynamic=True, fullgraph=True)
device = "cpu"
x = torch.tensor(0, dtype=torch.float32, device=device)
a = 12
out = cfunc(x, a)
expected = func(x, a)
torch.testing.assert_close(out, expected)and With #include "/tmp/torchinductor_root/26/c26eqbkuxvn72gf7p2xujmqjcwf4bo6lxmp6rwborxnf4gldnimh.h"
extern "C" void kernel(const float* in_ptr0,
float* out_ptr0,
const long ks0)
{
{
auto tmp0 = in_ptr0[static_cast<long>(0L)];
auto tmp1 = c10::convert<float>(c10::div_floor_floating(static_cast<double>((1.00000000000000*ks0)), static_cast<double>(8.00000000000000)));
auto tmp2 = decltype(tmp0)(tmp0 + tmp1);
out_ptr0[static_cast<long>(0L)] = tmp2;
}
} |
Introduced by #113276. I've added a test to catch future regressions. [ghstack-poisoned]
Introduced by #113276. I've added a test to catch future regressions. [ghstack-poisoned]
Introduced by #113276. I've added a test to catch future regressions. [ghstack-poisoned]
Introduced by #113276. I've added a test to catch future regressions. Pull Request resolved: #115647 Approved by: https://github.com/desertfire, https://github.com/vfdev-5
Introduced by pytorch#113276. I've added a test to catch future regressions. Pull Request resolved: pytorch#115647 Approved by: https://github.com/desertfire, https://github.com/vfdev-5
Stack from ghstack (oldest at bottom):
Previously, floor_div operations were defined in
ATen/native/BinaryOps.h. Since this header was not included under
ABI-compat mode, trying to use those indexing operations would result in
compilation errors.
Technically, it is safe to use aten::native::floor_div_* functions in
ABI-compat mode as they are header-only; we could simply include
BinaryOps.h. However, there are other declarations in BinaryOps.h that
are not binary-compatible, so this is not ideal. Thus, I have moved those
functions into a separate file, and put them under c10/util, since they
don't really have tensor-specific logic.
c10 functions are not all header-only, so this still isn't ideal, but
this still seems like an improvement. Moreover, cpp_prefix.h -- used
when compiling cpp kernels -- already includes c10 header files, so
ABI-compatibility already depends on maintaining some c10 functions as
header-only.
cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10 @voznesenskym @penguinwu @EikanWang @Guobing-Chen @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @aakhundov @ColinPeppler