[inductor] force contiguous layout for implicit fallback by shunting314 · Pull Request #140996 · pytorch/pytorch · GitHub

Conversation

@shunting314
Contributor

@shunting314 shunting314 commented Nov 19, 2024

Stack from ghstack (oldest at bottom):

Fix #140462.

Horace found that when we implicitly fall back to eager, some eager kernels may not work correctly if Inductor provides non-contiguous inputs (due to padding, etc.). The original issue was found in the backward op of weight_norm. The fix in this PR is a general one: we force the inputs to all implicit fallback kernels to be contiguous.
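As a tensor-level illustration (not the actual #140462 repro), padding keeps the logical shape but changes the strides, which is exactly the kind of input an eager kernel written for contiguous tensors may not handle:

```python
import torch

# Illustrative only: slicing a padded buffer yields the same shape as the eager
# tensor but different, non-contiguous strides.
eager_input = torch.randn(8, 10)
padded_storage = torch.randn(8, 16)        # row stride padded from 10 to 16
inductor_input = padded_storage[:, :10]    # same logical shape as eager_input

assert inductor_input.shape == eager_input.shape
assert not inductor_input.is_contiguous()

# Forcing contiguity restores the layout the eager kernel was written for.
assert inductor_input.contiguous().stride() == eager_input.stride()
```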

I had to refactor the code a bit to make this work. Previously we applied layout constraints in `GraphLowering.run_node`, but we look for implicit fallbacks in `call_function`. The problem is that when we set up an implicit fallback with a layout constraint inside `call_function`, we never get a chance to apply that constraint. The refactor moves the code that applies layout constraints into `call_function`.
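A minimal sketch of the constraint at the tensor level (the helper below is hypothetical; the real change applies a layout constraint on Inductor IR inputs inside `call_function`):

```python
import torch

def force_contiguous_inputs(op, *args, **kwargs):
    # Hypothetical helper: make every tensor argument contiguous before
    # dispatching an implicitly-fallen-back op to its eager kernel.
    def fix(a):
        return a.contiguous() if isinstance(a, torch.Tensor) else a

    args = tuple(fix(a) for a in args)
    kwargs = {k: fix(v) for k, v in kwargs.items()}
    return op(*args, **kwargs)

# The eager kernel now always sees contiguous inputs, even if Inductor padded them.
out = force_contiguous_inputs(torch.relu, torch.randn(4, 8)[:, ::2])
```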

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang @aakhundov

@pytorch-bot

pytorch-bot bot commented Nov 19, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/140996

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

❌ 1 New Failure

As of commit 5a54d8e with merge base eff2217:

NEW FAILURE - The following job has failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

shunting314 added a commit that referenced this pull request Nov 19, 2024
@shunting314 shunting314 requested review from ezyang and jansel November 19, 2024 02:29
@Chillee
Collaborator

Chillee commented Nov 19, 2024

Should we force them to be contiguous or force them to follow the FX strides?

@shunting314
Contributor Author

shunting314 commented Nov 19, 2024

Should we force them to be contiguous or force them to follow the FX strides?

The problem is that we don't have the real eager strides. The strides stored in meta['val'] may be the ones left after optimization of the backward graph.

Richard also mentioned that having the real eager strides is needed for custom ops. That gives us more motivation to pass the real eager strides to Inductor.
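As a tensor-level aside (not Inductor's IR code), the two options look roughly like this; `fx_strides` stands in for `node.meta['val'].stride()`, which, as noted above, may already reflect layout optimizations rather than the true eager strides:

```python
import torch

def constrain_to_fx_strides(t: torch.Tensor, fx_strides) -> torch.Tensor:
    # Only as trustworthy as the strides recorded in meta['val'].
    if t.stride() == tuple(fx_strides):
        return t
    out = torch.empty_strided(t.shape, fx_strides, dtype=t.dtype, device=t.device)
    out.copy_(t)
    return out

def constrain_to_contiguous(t: torch.Tensor) -> torch.Tensor:
    # The route this PR takes for implicit fallbacks.
    return t if t.is_contiguous() else t.contiguous()
```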

@Chillee
Collaborator

Chillee commented Nov 19, 2024

@shunting314 If that's true then we need to avoid changing eager strides I think haha.

@shunting314
Contributor Author

If that's true then we need to avoid changing eager strides I think haha.

But we should normally be free to change the strides of saved activations, since they are not user visible. Keeping them at eager strides may be too restrictive and would leave little opportunity for padding/layout optimizations.

@shunting314 shunting314 added the topic: not user facing topic category label Nov 19, 2024
shunting314 added a commit that referenced this pull request Nov 20, 2024
@shunting314
Contributor Author

@pytorchbot merge -f

Line error for nested tensors is unrelated.

@pytorch-bot

pytorch-bot bot commented Nov 20, 2024

❌ 🤖 pytorchbot command failed:

@pytorchbot merge: error: argument -f/--force: expected one argument

usage: @pytorchbot merge [-f MESSAGE | -i] [-ic] [-r [{viable/strict,main}]]

Try @pytorchbot --help for more info.

@shunting314
Contributor Author

@pytorchbot merge -f "Line error for nested tensors is unrelated."

@pytorchmergebot
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here

Ryo-not-rio pushed a commit to Ryo-not-rio/pytorch that referenced this pull request Dec 2, 2024
Pull Request resolved: pytorch#140996
Approved by: https://github.com/jansel
pobin6 pushed a commit to pobin6/pytorch that referenced this pull request Dec 5, 2024
@github-actions github-actions bot deleted the gh/shunting314/188/head branch December 21, 2024 02:05
@zou3519
Contributor

zou3519 commented Feb 26, 2025

Btw, by default we want custom operators to match their strides in eager mode. Forcing .contiguous() is not the best. This unfortunately broke the vllm-compile integration, but at least there's a workaround (setting the tags on the custom ops).
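For reference, one hedged sketch of that workaround (the namespace, op name, and body below are made up; `torch.Tag.needs_fixed_stride_order` and the `tags=` argument to `define` are assumed to be available in your PyTorch build):

```python
import torch

# Hypothetical custom op, tagged so the compiler keeps its eager input strides
# instead of re-laying-out (or forcing contiguity on) its inputs.
lib = torch.library.Library("mylib", "DEF")
lib.define(
    "my_op(Tensor x) -> Tensor",
    tags=(torch.Tag.needs_fixed_stride_order,),
)

def my_op_impl(x: torch.Tensor) -> torch.Tensor:
    # Placeholder body; a real custom op would call a kernel that depends on
    # the exact stride layout of `x`.
    return x.clone()

lib.impl("my_op", my_op_impl, "CompositeExplicitAutograd")
```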
