KEMBAR78
[ONNX] Insert contiguous node between transpose and view before calling run_decompositions by xadupre · Pull Request #137340 · pytorch/pytorch · GitHub
Skip to content

Conversation

@xadupre
Copy link
Collaborator

@xadupre xadupre commented Oct 4, 2024

Works around #136543.

This fix solves the issue only in the context of the ONNX exporter but this issue happens in other context.

The bug happens when method run_decompositions is called. The failing pattern is assumed to be view(transpose(x, ...)). This pattern is replaced by view(flatten(transpose(x, ..))). By changing the dimensions, the strides are updated as well and run_decompositions does not fail anymore. It would be inefficient on a 1D tensor but then transpose would not be used. The extra node appears in the final onnx graph but is removed after optimization. The final onnx graph should not be impacted and no performance loss should be observed for the onnx model.

@pytorch-bot
Copy link

pytorch-bot bot commented Oct 4, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/137340

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 65057d6 with merge base a063a82 (image):
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added the release notes: onnx torch.onnx related changes that should show up in the release notes label Oct 4, 2024
@justinchuby
Copy link
Collaborator

Is this intended to be merged in? If we can address the root cause that would be the best imo, since we have had issues with too many FX passes where we don’t know which one to look when an error is introduced. I can run a git bisect today to try to pin down the commit.

If we do decide to merge this as a temporary fix, I recommend moving the logic to _fx_passes, where all fx passes live.

@xadupre
Copy link
Collaborator Author

xadupre commented Oct 4, 2024

The fix is just one pass on the fx graph. It is done inplace. It should be very fast. I can move the function to _fx_passes.py if that's what you mean.

@justinchuby
Copy link
Collaborator

The fix is just one pass on the fx graph. It is done inplace. It should be very fast. I can move the function to _fx_passes.py if that's what you mean.

I am more concerned about complexity and robustness over speed. But since this should be a temporary fix, it should be useful to unblock us for now. I would just make it very clear that it should be removed when the issue is resolved and should be removed before the 2.6 release

xadupre and others added 4 commits October 4, 2024 20:36
Co-authored-by: Justin Chu <justinchuby@users.noreply.github.com>
Co-authored-by: Justin Chu <justinchuby@users.noreply.github.com>
@justinchuby justinchuby added the topic: bug fixes topic category label Oct 4, 2024
@justinchuby
Copy link
Collaborator

justinchuby commented Oct 5, 2024

I tested that changing to contiguous works.

        with graph.inserting_after(node):
            new_node = graph.call_function(torch.ops.aten.contiguous.default, args=(node,))
            node.replace_all_uses_with(new_node)
            # new_node is replaced as well so we manually revert the replacement
            new_node.update_arg(0, node)
            node.users = {new_node: None}

@justinchuby justinchuby changed the title [ONNX] Insert flatten node between transpose and view before calling run_decompositions [ONNX] Insert contiguous node between transpose and view before calling run_decompositions Oct 7, 2024
@justinchuby
Copy link
Collaborator

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Oct 8, 2024
@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ciflow/trunk Trigger trunk jobs on your pull request Merged module: onnx Related to torch.onnx open source release notes: onnx torch.onnx related changes that should show up in the release notes topic: bug fixes topic category

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants