KEMBAR78
[Inductor] short-term fix for needs_fixed_stride_order silent incorrectness by zou3519 · Pull Request #133452 · pytorch/pytorch · GitHub
Skip to content

Conversation

@zou3519
Copy link
Contributor

@zou3519 zou3519 commented Aug 14, 2024

Stack from ghstack (oldest at bottom):

This is a low-risk short-term fix for
#128084, for the purposes of
2.4.1. The actual fix for that issue is more risky and we'll target 2.5.

needs_fixed_stride_order is silently incorrect with args that are
mutable because it creates clones of those args, writes into them, and
doesn't update the original args.

This PR makes it so that needs_fixed_stride_order doesn't apply to
inputs that are being mutated.

This PR doesn't completely fix the problem, but it makes it less
incorrect: most of the time the input already has the correct strides
but inductor fails to recognize it, and in those cases writing directly
to the input is fine.

Test Plan:

  • new test

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang

…ctness

This is a short-term fix for
#128084, for the purposes of a
2.4.1.
needs_fixed_stride_order is silently incorrect with args that are
mutable because it creates clones of those args, writes into them, and
doesn't update the original args.

This PR doesn't completely fix the problem, but it makes it less
incorrect: most of the time the input already has the correct strides
but inductor fails to recognize it, and in those cases writing directly
to the input is fine.

Test Plan:
- new test

[ghstack-poisoned]
@pytorch-bot
Copy link

pytorch-bot bot commented Aug 14, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/133452

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure

As of commit 04a6738 with merge base cd565bc (image):

NEW FAILURE - The following job has failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@zou3519 zou3519 added the ci-no-td Do not run TD on this PR label Aug 14, 2024
…ent incorrectness"

This is a low-risk short-term fix for
#128084, for the purposes of
2.4.1. The actual fix for that issue is more risky and we'll target 2.5.

needs_fixed_stride_order is silently incorrect with args that are
mutable because it creates clones of those args, writes into them, and
doesn't update the original args.

This PR makes it so that needs_fixed_stride_order doesn't apply to
inputs that are being mutated.

This PR doesn't completely fix the problem, but it makes it less
incorrect: most of the time the input already has the correct strides
but inductor fails to recognize it, and in those cases writing directly
to the input is fine.

Test Plan:
- new test

[ghstack-poisoned]
…ent incorrectness"

This is a low-risk short-term fix for
#128084, for the purposes of
2.4.1. The actual fix for that issue is more risky and we'll target 2.5.

needs_fixed_stride_order is silently incorrect with args that are
mutable because it creates clones of those args, writes into them, and
doesn't update the original args.

This PR makes it so that needs_fixed_stride_order doesn't apply to
inputs that are being mutated.

This PR doesn't completely fix the problem, but it makes it less
incorrect: most of the time the input already has the correct strides
but inductor fails to recognize it, and in those cases writing directly
to the input is fine.

Test Plan:
- new test

[ghstack-poisoned]
zou3519 added a commit that referenced this pull request Aug 14, 2024
…ctness

This is a low-risk short-term fix for
#128084, for the purposes of
2.4.1. The actual fix for that issue is more risky and we'll target 2.5.

needs_fixed_stride_order is silently incorrect with args that are
mutable because it creates clones of those args, writes into them, and
doesn't update the original args.

This PR makes it so that needs_fixed_stride_order doesn't apply to
inputs that are being mutated.

This PR doesn't completely fix the problem, but it makes it less
incorrect: most of the time the input already has the correct strides
but inductor fails to recognize it, and in those cases writing directly
to the input is fine.

Test Plan:
- new test

ghstack-source-id: 8d19392
Pull Request resolved: #133452
@zou3519 zou3519 requested a review from eellison August 14, 2024 19:56
Copy link
Contributor

@eellison eellison left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one comment

new_args = []
new_kwargs = {}
schema_args, schema_kwargs = torch._library.utils.schema_args_kwargs(schema)
for arg, fx_arg, schema_arg in zip(args, fx_node.args, schema_args):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no contract that the fx node at this point is using fx_node.args instead of fx_node.kwargs. What I would expect to see here is torch.fx.operator_schemas.normalize_function and then iterate over the schema by argname

Copy link
Contributor Author

@zou3519 zou3519 Aug 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. I added this to the list of things we might want to tighten restrictions for (#133250) -- it is the case that after AOTDispatcher (args, kwargs) are normalized, but I guess the inductor passes can do whatever they want

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By the way... the code as previously written assumes that fx_node.{args, kwargs} are zip-able with {args, kwargs}. Are you saying that's incorrect too?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated the code to handle the case where the schema_args/schema_kwargs can be different from args/kwargs. The way constrain_to_fx_strides is called (and from how the code was written previously) it looks like fx_node.{args, kwargs} being zip-able with {args, kwargs} is a reasonable assumption (but if we think that's a bug I can fix it in a follow-up -- I want this PR to go in without changing too much since it needs to be cherry-picked)

…ent incorrectness"

This is a low-risk short-term fix for
#128084, for the purposes of
2.4.1. The actual fix for that issue is more risky and we'll target 2.5.

needs_fixed_stride_order is silently incorrect with args that are
mutable because it creates clones of those args, writes into them, and
doesn't update the original args.

This PR makes it so that needs_fixed_stride_order doesn't apply to
inputs that are being mutated.

This PR doesn't completely fix the problem, but it makes it less
incorrect: most of the time the input already has the correct strides
but inductor fails to recognize it, and in those cases writing directly
to the input is fine.

Test Plan:
- new test

[ghstack-poisoned]
…ent incorrectness"

This is a low-risk short-term fix for
#128084, for the purposes of
2.4.1. The actual fix for that issue is more risky and we'll target 2.5.

needs_fixed_stride_order is silently incorrect with args that are
mutable because it creates clones of those args, writes into them, and
doesn't update the original args.

This PR makes it so that needs_fixed_stride_order doesn't apply to
inputs that are being mutated.

This PR doesn't completely fix the problem, but it makes it less
incorrect: most of the time the input already has the correct strides
but inductor fails to recognize it, and in those cases writing directly
to the input is fine.

Test Plan:
- new test

[ghstack-poisoned]
@zou3519 zou3519 requested a review from eellison August 15, 2024 14:55
@zou3519 zou3519 mentioned this pull request Aug 15, 2024
…ent incorrectness"

This is a low-risk short-term fix for
#128084, for the purposes of
2.4.1. The actual fix for that issue is more risky and we'll target 2.5.

needs_fixed_stride_order is silently incorrect with args that are
mutable because it creates clones of those args, writes into them, and
doesn't update the original args.

This PR makes it so that needs_fixed_stride_order doesn't apply to
inputs that are being mutated.

This PR doesn't completely fix the problem, but it makes it less
incorrect: most of the time the input already has the correct strides
but inductor fails to recognize it, and in those cases writing directly
to the input is fine.

Test Plan:
- new test

[ghstack-poisoned]
…ent incorrectness"

This is a low-risk short-term fix for
#128084, for the purposes of
2.4.1. The actual fix for that issue is more risky and we'll target 2.5.

needs_fixed_stride_order is silently incorrect with args that are
mutable because it creates clones of those args, writes into them, and
doesn't update the original args.

This PR makes it so that needs_fixed_stride_order doesn't apply to
inputs that are being mutated.

This PR doesn't completely fix the problem, but it makes it less
incorrect: most of the time the input already has the correct strides
but inductor fails to recognize it, and in those cases writing directly
to the input is fine.

Test Plan:
- new test

[ghstack-poisoned]
…ent incorrectness"

This is a low-risk short-term fix for
#128084, for the purposes of
2.4.1. The actual fix for that issue is more risky and we'll target 2.5.

needs_fixed_stride_order is silently incorrect with args that are
mutable because it creates clones of those args, writes into them, and
doesn't update the original args.

This PR makes it so that needs_fixed_stride_order doesn't apply to
inputs that are being mutated.

This PR doesn't completely fix the problem, but it makes it less
incorrect: most of the time the input already has the correct strides
but inductor fails to recognize it, and in those cases writing directly
to the input is fine.

Test Plan:
- new test

[ghstack-poisoned]
@zou3519 zou3519 added the ciflow/trunk Trigger trunk jobs on your pull request label Aug 16, 2024
…ent incorrectness"

This is a low-risk short-term fix for
#128084, for the purposes of
2.4.1. The actual fix for that issue is more risky and we'll target 2.5.

needs_fixed_stride_order is silently incorrect with args that are
mutable because it creates clones of those args, writes into them, and
doesn't update the original args.

This PR makes it so that needs_fixed_stride_order doesn't apply to
inputs that are being mutated.

This PR doesn't completely fix the problem, but it makes it less
incorrect: most of the time the input already has the correct strides
but inductor fails to recognize it, and in those cases writing directly
to the input is fine.

Test Plan:
- new test

[ghstack-poisoned]
@zou3519
Copy link
Contributor Author

zou3519 commented Aug 16, 2024

@pytorchbot merge -f "unrelated failure"

@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

zou3519 added a commit that referenced this pull request Aug 19, 2024
…ctness (#133452)

This is a low-risk short-term fix for
#128084, for the purposes of
2.4.1. The actual fix for that issue is more risky and we'll target 2.5.

needs_fixed_stride_order is silently incorrect with args that are
mutable because it creates clones of those args, writes into them, and
doesn't update the original args.

This PR makes it so that needs_fixed_stride_order doesn't apply to
inputs that are being mutated.

This PR doesn't completely fix the problem, but it makes it less
incorrect: most of the time the input already has the correct strides
but inductor fails to recognize it, and in those cases writing directly
to the input is fine.

Test Plan:
- new test

Pull Request resolved: #133452
Approved by: https://github.com/eellison
zou3519 added a commit that referenced this pull request Aug 19, 2024
…ctness (#133452)

This is a low-risk short-term fix for
#128084, for the purposes of
2.4.1. The actual fix for that issue is more risky and we'll target 2.5.

needs_fixed_stride_order is silently incorrect with args that are
mutable because it creates clones of those args, writes into them, and
doesn't update the original args.

This PR makes it so that needs_fixed_stride_order doesn't apply to
inputs that are being mutated.

This PR doesn't completely fix the problem, but it makes it less
incorrect: most of the time the input already has the correct strides
but inductor fails to recognize it, and in those cases writing directly
to the input is fine.

Test Plan:
- new test

Pull Request resolved: #133452
Approved by: https://github.com/eellison
atalman pushed a commit that referenced this pull request Aug 21, 2024
…ctness (#133452) (#133888)

This is a low-risk short-term fix for
#128084, for the purposes of
2.4.1. The actual fix for that issue is more risky and we'll target 2.5.

needs_fixed_stride_order is silently incorrect with args that are
mutable because it creates clones of those args, writes into them, and
doesn't update the original args.

This PR makes it so that needs_fixed_stride_order doesn't apply to
inputs that are being mutated.

This PR doesn't completely fix the problem, but it makes it less
incorrect: most of the time the input already has the correct strides
but inductor fails to recognize it, and in those cases writing directly
to the input is fine.

Test Plan:
- new test

Pull Request resolved: #133452
Approved by: https://github.com/eellison
atalman added a commit to atalman/pytorch that referenced this pull request Aug 22, 2024
@github-actions github-actions bot deleted the gh/zou3519/1044/head branch September 17, 2024 01:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants