
AOTI_TORCH_CHECK failed in aot_compile-d model #143498

@mstebelev

🐛 Describe the bug

I exported a model using torch.export(strict=False). The exported model itself works fine, but when I compile it with torch._inductor.aot_compile, the process crashes on an internal check in the generated code.
Reproducer:
https://colab.research.google.com/drive/1U8fe9k85_S4fRurxz_M7g9kYf7Yq2CRy?usp=sharing
The exported program to reproduce with is here:
exported_program.pt2.zip
Another reproducer with model generation:
https://colab.research.google.com/drive/1W930GmsJEDVMdsBKHuTQruqV6IyoLqBa?usp=sharing
The same reproducer on the PyTorch nightly build:
https://colab.research.google.com/drive/1WcRHyac8K2G6Ed4v1NywCoHSGDAMEstq?usp=sharing

The generated code is here:
c5dq6ajvevkzbzmo54sijfiqey4wp7tumw7wdk34ethlfoqcf2by.cpp.zip
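
For context, a minimal sketch of the export + AOT-compile flow the reproducers follow (ToyModel and its inputs are hypothetical stand-ins for the actual model in the Colab notebooks; with the real model only the final run step fails):

```python
import torch

# Hypothetical stand-in for the real model from the reproducers.
class ToyModel(torch.nn.Module):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.nn.functional.relu(x) + 1.0

model = ToyModel().eval()
example_inputs = (torch.randn(4, 8),)

# Export with strict=False, as in the report; the exported program runs fine.
exported = torch.export.export(model, example_inputs, strict=False)
print(exported.module()(*example_inputs))

# AOT-compile the exported graph; this returns the path to the generated .so.
so_path = torch._inductor.aot_compile(exported.module(), example_inputs)

# Load and run the compiled artifact. For the model from the issue, this step
# aborts with "AOTI_TORCH_CHECK failed ... index out of bounds".
compiled = torch._export.aot_load(so_path, device="cpu")
print(compiled(*example_inputs))
```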

Error logs

prediction/deployable_modules/tests/test_something.py !!! Uncaught exception: index out of bounds: 0 <= tmp7 < ks1
Exception raised from cpp_fused_index_index_put_stack_zeros_4 at /tmp/torchinductor_vscode/c5dq6ajvevkzbzmo54sijfiqey4wp7tumw7wdk34ethlfoqcf2by.cpp:1084 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x9e (0x7ff115c286de in /home/vscode/.cache/bazel/_bazel_vscode/93fd2cd9b3c5d87ae416561bff883334/execroot/__main__/bazel-out/k8-opt/bin/prediction/deployable_modules/tests/test_something.runfiles/pytorch/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, char const*) + 0x80 (0x7ff115bcd0a8 in /home/vscode/.cache/bazel/_bazel_vscode/93fd2cd9b3c5d87ae416561bff883334/execroot/__main__/bazel-out/k8-opt/bin/prediction/deployable_modules/tests/test_something.runfiles/pytorch/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #2: <unknown function> + 0x5024044 (0x7ff173653044 in /home/vscode/.cache/bazel/_bazel_vscode/93fd2cd9b3c5d87ae416561bff883334/execroot/__main__/bazel-out/k8-opt/bin/prediction/deployable_modules/tests/test_something.runfiles/pytorch/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so)
frame #3: cpp_fused_index_index_put_stack_zeros_4 + 0xdcd (0x7ff0bf4a4acd in /tmp/torchinductor_vscode/cxe2kfjxcuzlrkqbjd6yr6psu3h364iewxfja7zwiewr56krpm3n.so)
frame #4: torch::aot_inductor::AOTInductorModel::run_impl(AtenTensorOpaque**, AtenTensorOpaque**, void*, AOTIProxyExecutorOpaque*) + 0xded (0x7ff0bf4a621d in /tmp/torchinductor_vscode/cxe2kfjxcuzlrkqbjd6yr6psu3h364iewxfja7zwiewr56krpm3n.so)
frame #5: torch::aot_inductor::AOTInductorModelContainer::run(AtenTensorOpaque**, AtenTensorOpaque**, void*, AOTIProxyExecutorOpaque*) + 0xf1 (0x7ff0bf4b4c01 in /tmp/torchinductor_vscode/cxe2kfjxcuzlrkqbjd6yr6psu3h364iewxfja7zwiewr56krpm3n.so)
frame #6: AOTInductorModelContainerRun + 0x86 (0x7ff0bf4a9686 in /tmp/torchinductor_vscode/cxe2kfjxcuzlrkqbjd6yr6psu3h364iewxfja7zwiewr56krpm3n.so)
frame #7: torch::inductor::AOTIModelContainerRunner::run(std::vector<at::Tensor, std::allocator<at::Tensor> >&, AOTInductorStreamOpaque*) + 0x115 (0x7ff1736394e5 in /home/vscode/.cache/bazel/_bazel_vscode/93fd2cd9b3c5d87ae416561bff883334/execroot/__main__/bazel-out/k8-opt/bin/prediction/deployable_modules/tests/test_something.runfiles/pytorch/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so)
frame #8: torch::inductor::AOTIModelContainerRunnerCpu::run(std::vector<at::Tensor, std::allocator<at::Tensor> >&) + 0x22 (0x7ff17363a192 in /home/vscode/.cache/bazel/_bazel_vscode/93fd2cd9b3c5d87ae416561bff883334/execroot/__main__/bazel-out/k8-opt/bin/prediction/deployable_modules/tests/test_something.runfiles/pytorch/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so)
frame #9: <unknown function> + 0x8cbcbf (0x7ff17c609cbf in /home/vscode/.cache/bazel/_bazel_vscode/93fd2cd9b3c5d87ae416561bff883334/execroot/__main__/bazel-out/k8-opt/bin/prediction/deployable_modules/tests/test_something.runfiles/pytorch/lib/python3.10/site-packages/torch/lib/libtorch_python.so)
frame #10: <unknown function> + 0x48edbf (0x7ff17c1ccdbf in /home/vscode/.cache/bazel/_bazel_vscode/93fd2cd9b3c5d87ae416561bff883334/execroot/__main__/bazel-out/k8-opt/bin/prediction/deployable_modules/tests/test_something.runfiles/pytorch/lib/python3.10/site-packages/torch/lib/libtorch_python.so)
<omitting python frames>
, terminating !!!

Versions

I hit the problem with torch 2.5.1 built from nixpkgs, but it also reproduces with the vanilla torch 2.5.1+cu121 from Google Colab.
I also checked the 2.6 nightly build; the difference there is that the error surfaces as a Python exception and the ipynb kernel does not crash.

cc @chauhang @penguinwu @avikchaudhuri @gmagogsfm @zhxchen17 @tugsbayasgalan @angelayi @suo @ydwu4 @desertfire @chenyang78
