[ONNX] Fix index_put_ usage by ironsided777 · Pull Request #161263 · pytorch/pytorch · GitHub

Conversation

Contributor

@ironsided777 ironsided777 commented Aug 22, 2025

Summary:
It is hard to trace exactly how this shows up in most of our models, but in general it looks like `aten::copy_` is replaced incorrectly during ONNX export.
There are two schemas for `aten::copy_`:

  1. `aten::copy_.Tensor(Tensor(a!) self, Tensor other) -> Tensor(a!)`
  2. `aten::copy_(Tensor(a!) self, Tensor src, bool non_blocking=False) -> Tensor(a!)`

According to the logic in the code comments, one of these parameters is not needed for the `aten::index_put_` replacement.

It seems the logic was inferred from an ordinary `aten::copy_` call, where a third parameter, the `non_blocking` flag, may be present.

Depending on the execution environment, the sliced copy can be replaced either by the first schema or by the second schema with the default parameter explicitly set to `False`.

If the first schema is selected, the export crashes (which is easy to catch in our prod env). If the second schema is selected, there is no crash, but the third parameter is treated as the `accumulate` parameter of `index_put_`, which makes no sense.

So, in either case, the third parameter must be removed from the `aten::copy_` replacement.
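To make the argument-count mismatch concrete, here is a small eager-mode sketch (illustration only, not the exporter pass itself) showing that `Tensor.index_put_` takes exactly three things: indices, values, and `accumulate`. There is no slot for a `non_blocking` argument, so forwarding a third `copy_` input can only land in `accumulate` or run out of bounds:

```
import torch

to_fill = torch.ones((3, 3))
src = torch.zeros((3, 3))

# Rough eager-mode equivalent of `to_fill[:, 0] = src[:, 0]` expressed
# through index_put_: (indices, values, accumulate) -- three parameters only.
rows = torch.arange(3)
cols = torch.zeros(3, dtype=torch.long)
to_fill.index_put_((rows, cols), src[:, 0], accumulate=False)

print(to_fill)  # column 0 becomes zeros, the other columns stay ones
```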

For more details, see this post:
https://fb.workplace.com/groups/1405155842844877/permalink/25337687649165028/

Test Plan:

The test fails in the production environment only.
In the test environment the `non_blocking` flag is mapped as `False` onto the `accumulate` flag, which does not cause the test to fail, but makes no sense in terms of flag mapping.

After the fix, the export works without errors; before the fix it failed with an out-of-bounds vector access, like this:

```
   1095     _C._jit_onnx_log("Torch IR graph at exception: ", graph)
File ~/.bento/kernels/bento_kernel_gaia_ml/1578/bento_kernel_gaia_ml_binary-inplace#link-tree/torch/onnx/utils.py:636, in _optimize_graph(graph, operator_export_type, _disable_torch_constant_prop, fixed_batch_size, params_dict, dynamic_axes, input_names, module)
    629 _C._jit_pass_lower_all_tuples(graph)
    630 # in _jit_pass_onnx, symbolic functions are called for each node for conversion.
    631 # However, there are nodes that cannot be converted without additional context.
    632 # For example, the number of outputs from split (and whether it is static or dynamic) is unknown
    633 # until the point where it is unpacked by listUnpack node.
    634 # This pass does a preprocess, and prepares the nodes such that enough context can be received
    635 # by the symbolic function.
--> 636 _C._jit_pass_onnx_remove_inplace_ops_for_onnx(graph, module)
    637 _C._jit_pass_onnx_preprocess(graph)
    639 # onnx does not support tuples, so try to remove them
RuntimeError: vector::_M_range_check: __n (which is 2) >= this->size() (which is 2)
```

The test script:

```
import torch as th
import tempfile

class CopyTest(th.nn.Module):
    def forward(
        self,
        input_th: th.Tensor
    ):
        to_fill = th.ones((3, 3))
        to_fill[:, 0] = input_th[:, 0]
        return to_fill

m = CopyTest()

test_tensor = th.zeros((3, 3))

with tempfile.NamedTemporaryFile() as f:
    th.onnx.export(
        m,
        (test_tensor,),
        f,
        export_params=True,
        opset_version=17,
        do_constant_folding=True,
        input_names=["input"],
        output_names=["features"],
        dynamo=False,
    )
```

The exported model test:

```
import torch
import onnx
import onnxruntime

model_name = '/home/ironsided/test_model.onnx'
onnx_model = onnx.load(model_name)
onnx.checker.check_model(onnx_model)

example_inputs = (torch.zeros(3, 3),)

onnx_inputs = [tensor.numpy(force=True) for tensor in example_inputs]
print(f"Input length: {len(onnx_inputs)}")
print(f"Sample input: {onnx_inputs}")

ort_session = onnxruntime.InferenceSession(
    model_name, providers=["CPUExecutionProvider"]
)

onnxruntime_input = {
    input_arg.name: input_value
    for input_arg, input_value in zip(ort_session.get_inputs(), onnx_inputs)
}

# ONNX Runtime returns a list of outputs
onnxruntime_outputs = ort_session.run(None, onnxruntime_input)[0]

print(onnxruntime_outputs)
```

The produced result is correct:

```
Input length: 1
Sample input: [array([[0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.]], dtype=float32)]
[[0. 1. 1.]
 [0. 1. 1.]
 [0. 1. 1.]]
```
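
As an extra sanity check (not part of the original test plan), the ONNX Runtime result can be compared against the eager PyTorch module. Assuming `m`, `test_tensor`, and `onnxruntime_outputs` from the scripts above are available in the same session, something along these lines would do:

```
import numpy as np

# Eager reference: run the original module on the same input.
expected = m(test_tensor).detach().numpy()

# Element-wise comparison against the ONNX Runtime output.
np.testing.assert_allclose(onnxruntime_outputs, expected, rtol=1e-5, atol=1e-6)
print("ONNX Runtime output matches eager PyTorch output")
```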

Rollback Plan:

Differential Revision: D80797028

cc @EikanWang @jgong5 @wenzhe-nrv @sanchitintel


pytorch-bot bot commented Aug 22, 2025

This appears to be a diff that was exported from phabricator, but the PR author does not have sufficient permissions to run CI. @ironsided777, please do step 2 of internal wiki to get write access so you do not need to get CI approvals in the future. If you think this is a mistake, please contact the Pytorch Dev Infra team.

@pytorch-bot pytorch-bot bot added the release notes: onnx torch.onnx related changes that should show up in the release notes label Aug 22, 2025

pytorch-bot bot commented Aug 22, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/161263

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

✅ No Failures

As of commit 016389c with merge base 379ebda:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.


linux-foundation-easycla bot commented Aug 22, 2025

CLA Signed

The committers listed above are authorized under a signed CLA.

  • ✅ login: ironsided777 / name: Roman Bobniev (016389c)

@facebook-github-bot facebook-github-bot added the oncall: jit Add this issue/PR to JIT oncall triage queue label Aug 22, 2025
@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D80797028

ironsided777 added a commit to ironsided777/pytorch that referenced this pull request Aug 25, 2025
@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D80797028

ironsided777 added a commit to ironsided777/pytorch that referenced this pull request Aug 25, 2025
@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D80797028

ironsided777 added a commit to ironsided777/pytorch that referenced this pull request Aug 26, 2025
Collaborator

@justinchuby justinchuby left a comment


Thanks!

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Aug 26, 2025
Contributor

@jermenkoo jermenkoo left a comment


LGTM

pytorch-bot bot pushed a commit that referenced this pull request Aug 27, 2025
@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D80797028

ironsided777 added a commit to ironsided777/pytorch that referenced this pull request Aug 27, 2025
@ironsided777 ironsided777 force-pushed the export-D80797028 branch 2 times, most recently from a7aca0d to aaa433f on August 27, 2025 15:56
ironsided777 added a commit to ironsided777/pytorch that referenced this pull request Aug 27, 2025
@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D80797028

@ironsided777
Contributor Author

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here

@ironsided777 ironsided777 deleted the export-D80797028 branch August 28, 2025 11:36
markc-614 pushed a commit to markc-614/pytorch that referenced this pull request Sep 17, 2025
Pull Request resolved: pytorch#161263
Approved by: https://github.com/justinchuby, https://github.com/jermenkoo

Labels

  • ciflow/trunk (Trigger trunk jobs on your pull request)
  • fb-exported
  • Merged
  • oncall: jit (Add this issue/PR to JIT oncall triage queue)
  • release notes: onnx (torch.onnx related changes that should show up in the release notes)
