Fix ConvolutionBinaryInplace using target node by oulgen · Pull Request #114436 · pytorch/pytorch

Conversation

@oulgen
Contributor

@oulgen oulgen commented Nov 23, 2023

@pytorch-bot

pytorch-bot bot commented Nov 23, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/114436

Note: Links to docs will display an error until the docs builds have been completed.

❌ 35 New Failures, 39 Unrelated Failures

As of commit e75592d with merge base 8f8722e:

NEW FAILURES - The following jobs have failed:

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

UNSTABLE - The following jobs failed but were likely due to flakiness present on trunk and have been marked as unstable:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

oulgen added a commit that referenced this pull request Nov 23, 2023
Fixes #113440

ghstack-source-id: d056bda
Pull Request resolved: #114436
@oulgen
Contributor Author

oulgen commented Nov 23, 2023

There's a bug in convolution_unary where it uses the target of ConvolutionBinaryInplace even though it needs to use `inputs[1]`. I am having a hard time finding where this happens, so I am partially reverting this part until I can find it.

cc: @Chillee

EDIT: updated the PR with the correct fix
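
For illustration, here is a minimal sketch of the bug pattern being described, using hypothetical stand-in classes rather than the actual Inductor IR types: an in-place binary op's result must alias the argument it mutates, not the op node itself.

```python
# Minimal sketch of the bug pattern, using hypothetical stand-in classes,
# not the actual torch._inductor IR types.
class FakeNode:
    def __init__(self, target, inputs):
        self.target = target  # the op being lowered
        self.inputs = inputs  # inputs[1] is the tensor mutated in place


def lower_binary_inplace(node):
    # Buggy version returned the target, i.e. the op node itself:
    #     return node.target
    # Fixed version: an in-place op's result is the argument it mutated.
    return node.inputs[1]
```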

@oulgen oulgen added the ciflow/trunk (Trigger trunk jobs on your pull request) and topic: not user facing (topic category) labels Nov 23, 2023
@oulgen oulgen changed the title from "Partially revert #112925 to unblock mkldnn" to "Fix ConvolutionBinaryInplace using target node" Nov 23, 2023
This IR node mutates in place; it needs to use the argument, not the
target.

Fixes #113440

[ghstack-poisoned]
oulgen added a commit that referenced this pull request Nov 23, 2023
This IR node mutates in place; it needs to use the argument, not the
target.

Fixes #113440

Pull Request resolved: #114436
ghstack-source-id: f631332
Collaborator

@leslie-fang-intel leslie-fang-intel left a comment


Thanks for the fix. The UT passes after this fix, but the model still fails. For example:

    python benchmarks/dynamo/torchbench.py --performance --float32 -dcpu -n50 --no-skip --dashboard --only resnet50 --inference --freezing --timeout 9000 --backend=inductor --output=test.csv

- Error message:

      return getattr(self, n.op)(n.target, args, kwargs)
        File "/home/leslie/inductor/pytorch/torch/_inductor/graph.py", line 589, in call_function
          return target(*args, **kwargs)
        File "/home/leslie/inductor/pytorch/torch/_inductor/fx_passes/mkldnn_fusion.py", line 397, in fn
          assert isinstance(other, ir.TensorBox)
      torch._dynamo.exc.BackendCompilerFailed: backend='inductor' raised: AssertionError:

- I think this is because we need a TensorBox instead of an ExternKernelAlloc. Instead of the current fix, how about changing `return packed` here to `return packed.inputs[0]`, since the return value will be wrapped into a TensorBox.
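
A toy model of the failure mode being described, assuming hypothetical stand-in classes for Inductor's TensorBox and the packed kernel node (not the real `torch._inductor.ir` API): the lowering machinery boxes whatever the fusion returns, so the fusion should hand back the raw buffer and let it be boxed exactly once.

```python
# Toy model only; TensorBoxSketch and PackedSketch are hypothetical
# stand-ins, not the real torch._inductor.ir classes.
class TensorBoxSketch:
    def __init__(self, inner):
        self.inner = inner


class PackedSketch:
    def __init__(self, inputs):
        self.inputs = inputs  # inputs[0]: the raw buffer being mutated


def caller_wraps(result):
    # The lowering machinery boxes whatever the fusion function returns.
    return TensorBoxSketch(result)


packed = PackedSketch(inputs=["buf0"])
out = caller_wraps(packed.inputs[0])  # box the raw buffer exactly once
assert isinstance(out, TensorBoxSketch)
```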

@leslie-fang-intel
Collaborator

leslie-fang-intel commented Nov 23, 2023

cc @chuanqi129: since more than one model failed in the fp32/bf16 testing, please help confirm that all the failures are fixed by this PR.

This IR node mutates in place; it needs to use the argument, not the
target.

Fixes #113440

[ghstack-poisoned]
oulgen added a commit that referenced this pull request Nov 23, 2023
This IR node mutates in place; it needs to use the argument, not the
target.

Fixes #113440

Pull Request resolved: #114436
ghstack-source-id: bc5a7ab
@oulgen
Contributor Author

oulgen commented Nov 23, 2023

@leslie-fang-intel I updated the PR; can you check whether this fixes the other issue? If not, please share the full assertion failure.

@leslie-fang-intel
Collaborator

> @leslie-fang-intel I updated the PR; can you check whether this fixes the other issue? If not, please share the full assertion failure.

cc @chuanqi129, please help to confirm

@leslie-fang-intel
Collaborator

leslie-fang-intel commented Nov 23, 2023

> @leslie-fang-intel I updated the PR; can you check whether this fixes the other issue? If not, please share the full assertion failure.

Thanks @oulgen, hmm... I think we should not use `inputs[1]`. Per my understanding, `inputs[1]` and `packed.inputs[0]` might be different types, since `inputs[1]` might be a TensorBox that gets unwrapped into `packed.inputs[0]` (a Buffer) here:

    None, layout, self.unwrap_storage(inputs), constant_args, kwargs or {}
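
A hedged illustration of the unwrapping being described, with toy classes in place of the real ones: the boxed inputs are stripped down to raw buffers when they are stored on the kernel node, which is why `inputs[1]` and `packed.inputs[0]` end up as different types.

```python
# Toy classes only; the real unwrapping lives in torch/_inductor/ir.py.
class BufferSketch:
    pass


class StorageBoxSketch:
    def __init__(self, data):
        self.data = data


class TensorBoxSketch:
    def __init__(self, data):
        self.data = data


def unwrap_storage(inputs):
    # Strip the TensorBox/StorageBox wrappers so the kernel node
    # holds raw buffers rather than boxed handles.
    return [box.data.data for box in inputs]


boxed = TensorBoxSketch(StorageBoxSketch(BufferSketch()))
(buf,) = unwrap_storage([boxed])
assert isinstance(buf, BufferSketch)  # the node stores the buffer, not the box
```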

@oulgen
Contributor Author

oulgen commented Nov 23, 2023

@leslie-fang-intel I don't quite follow what you mean. The op mutates `inputs[1]`, and previously, via MutationLayout, it was returning `inputs[1]`. So returning `inputs[1]` is the correct result.

@leslie-fang-intel
Collaborator

leslie-fang-intel commented Nov 24, 2023

> @leslie-fang-intel I don't quite follow what you mean. The op mutates `inputs[1]`, and previously, via MutationLayout, it was returning `inputs[1]`. So returning `inputs[1]` is the correct result.

I think `inputs[1]` equals `TensorBox(StorageBox(packed.inputs[0]))`, and we should return the buffer object (same as previously) instead of the TensorBox here. Otherwise the model-level testing (and also this test case) will still fail. So maybe return `packed.inputs[0]` here instead of `inputs[1]`.
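
Continuing the toy model from above (hypothetical classes, not the real API): returning the already-boxed `inputs[1]` would get boxed a second time, whereas returning the raw buffer yields the single level of boxing the callers expect.

```python
# Continuing the toy model: demonstrate the double-boxing hazard.
class TensorBoxSketch:
    def __init__(self, inner):
        self.inner = inner


buf = object()                    # stands in for packed.inputs[0], a Buffer
inputs_1 = TensorBoxSketch(buf)   # stands in for inputs[1], already boxed

wrong = TensorBoxSketch(inputs_1)  # returning inputs[1]: a box in a box
right = TensorBoxSketch(buf)       # returning packed.inputs[0]: one box

assert isinstance(wrong.inner, TensorBoxSketch)      # double-wrapped
assert not isinstance(right.inner, TensorBoxSketch)  # single-wrapped
```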

This IR node mutates in place; it needs to use the argument, not the
target.

Fixes #113440

[ghstack-poisoned]
oulgen added a commit that referenced this pull request Nov 24, 2023
This IR node mutates in place; it needs to use the argument, not the
target.

Fixes #113440

Pull Request resolved: #114436
ghstack-source-id: 53e247a
@oulgen
Contributor Author

oulgen commented Nov 24, 2023

@leslie-fang-intel Thanks! I missed the buffer wrapping when I moved the code around. Updated the code.

Copy link
Contributor

@jansel jansel left a comment


Please double-check that ConvolutionBinaryInplace sets get_mutation_names() properly.
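
A hedged sketch of what this check amounts to, with illustrative names only (not the actual ConvolutionBinaryInplace code): an in-place kernel node should report the name of the buffer it mutates, so the scheduler can order reads and writes around it.

```python
# Illustrative only; this is not the actual ConvolutionBinaryInplace code.
class InplaceKernelSketch:
    MUTATED_ARG_INDEX = 1  # must match the argument order after __init__!

    def __init__(self, inputs):
        self.inputs = inputs

    def get_mutation_names(self):
        # Report the buffer this kernel writes in place; pointing at the
        # wrong index silently breaks the scheduler's dependency tracking
        # (exactly what the follow-up discussion below uncovers).
        return [self.inputs[self.MUTATED_ARG_INDEX].get_name()]
```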

@leslie-fang-intel
Collaborator

@chuanqi129 is working on verifying this fix against the failures we saw in the fp32/bf16 static/dynamic shape tests. I expect he will post the results early next week.

@oulgen
Contributor Author

oulgen commented Nov 24, 2023

@pytorchbot merge

@oulgen
Contributor Author

oulgen commented Nov 24, 2023

@pytorchbot abort

@pytorch-bot

pytorch-bot bot commented Nov 24, 2023

❌ 🤖 pytorchbot command failed:

@pytorchbot: error: argument command: invalid choice: 'abort' (choose from 'merge', 'revert', 'rebase', 'label', 'drci')

usage: @pytorchbot [-h] {merge,revert,rebase,label,drci} ...

Try @pytorchbot --help for more info.

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here.

@oulgen
Contributor Author

oulgen commented Nov 24, 2023

Actually, @jansel is correct: since `__init__` reorders the arguments, `get_mutation_names` is incorrect. I will put up a follow-up PR to fix this. In general, reordering arguments will lead to bugs like this.
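
A minimal sketch of the reordering hazard being described, with hypothetical names rather than the real Inductor code: if `__init__` swaps the argument order, any code that remembers the pre-swap index of the mutated tensor now points at the wrong buffer.

```python
# Hypothetical sketch of the reordering hazard; not the real Inductor code.
class ReorderingKernelSketch:
    def __init__(self, inputs):
        # Suppose __init__ swaps the two arguments for kernel convenience:
        # the tensor that was inputs[1] at the call site becomes inputs[0].
        self.inputs = [inputs[1], inputs[0]]

    def get_mutation_names(self):
        # Correct: after the swap, the mutated tensor lives at index 0.
        # The pre-fix bug was to keep using the pre-swap index 1 here.
        return [self.inputs[0]]
```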

@oulgen oulgen reopened this Nov 24, 2023
@oulgen oulgen closed this Nov 24, 2023
oulgen added a commit that referenced this pull request Nov 24, 2023
The init function reorders the arguments, so the mutation actually happens on
argument inputs[0].

Unfortunately, I am not sure if there's a good way to test this. Added
tests in #114436.

[ghstack-poisoned]
oulgen added a commit that referenced this pull request Nov 24, 2023
The init function reorders the arguments, so the mutation actually happens on
argument inputs[0].

Unfortunately, I am not sure if there's a good way to test this. Added
tests in #114436.

ghstack-source-id: 41d8a07
Pull Request resolved: #114501
@oulgen
Contributor Author

oulgen commented Nov 24, 2023

#114501 fixes the mutation tracking bug. My apologies. I tried to cancel the merge but couldn't figure out how to do it.

@aakhundov
Contributor

> I tried to cancel the merge but couldn't figure out how to do it.

@eellison once mentioned that closing and reopening the PR should stop the merge, though I never tried it myself.

pytorchmergebot pushed a commit that referenced this pull request Nov 24, 2023
The init function reorders the arguments, so the mutation actually happens on
argument inputs[0].

Unfortunately, I am not sure if there's a good way to test this. Added
tests in #114436.

Pull Request resolved: #114501
Approved by: https://github.com/leslie-fang-intel, https://github.com/aakhundov
@facebook-github-bot facebook-github-bot deleted the gh/oulgen/38/head branch November 27, 2023 15:29
