[MPS] Fix conv backward pass for channels last #141009

malfet · 2024-11-19T08:04:48Z

This looks like a regression caused by the use of the strided API. Adding the test revealed (at least in CI) that on Ventura the backward pass ran but returned garbage results. The fix removes all the channels-last logic, which is irrelevant in the strided API case because the placeholder already turns the tensor into a correct one.

This also allows removing `mem_format_key` and `ns_shape_key` (the latter was redundant even back then, as `mem_format_key` + `getTensorsStringKey(grad_output_t)` already uniquely identified the operation).
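As a rough illustration of why memory format stops mattering once strides are honored, the sketch below stores the same logical NCHW values in a contiguous buffer and in a channels-last buffer, then reads both back through their strides. It is plain Python, not code from this PR; the helper names (`contiguous_strides`, `channels_last_strides`, `store`, `load`) are made up for the example.

```python
from itertools import product


def contiguous_strides(shape):
    """Element strides of a dense NCHW tensor."""
    n, c, h, w = shape
    return (c * h * w, h * w, w, 1)


def channels_last_strides(shape):
    """Element strides of the same logical NCHW tensor stored as NHWC."""
    n, c, h, w = shape
    return (h * w * c, 1, w * c, c)


def indices(shape):
    """All logical (n, c, h, w) indices in row-major order."""
    return product(*(range(d) for d in shape))


def offset(index, strides):
    """Position of a logical index inside the flat buffer."""
    return sum(i * s for i, s in zip(index, strides))


def store(values, shape, strides):
    """Write logical values into a flat buffer using the given strides."""
    buf = [None] * len(values)
    for idx in indices(shape):
        buf[offset(idx, strides)] = values[idx]
    return buf


def load(buf, shape, strides):
    """Read the logical tensor back out of a flat buffer via its strides."""
    return {idx: buf[offset(idx, strides)] for idx in indices(shape)}


shape = (1, 2, 2, 2)
logical = {idx: k for k, idx in enumerate(indices(shape))}

nchw = store(logical, shape, contiguous_strides(shape))
nhwc = store(logical, shape, channels_last_strides(shape))

print("contiguous buffer:   ", nchw)
print("channels-last buffer:", nhwc)
# The raw buffers differ, but a stride-aware reader sees the same tensor.
print(load(nchw, shape, contiguous_strides(shape))
      == load(nhwc, shape, channels_last_strides(shape)))
```

The two flat buffers are ordered differently, yet any consumer that indexes through the strides recovers identical logical values, which is the sense in which extra layout handling on top of a strided API is unnecessary.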

Fixes #140902
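For readers who want to check the behavior locally, here is a minimal sketch of a comparison in the spirit of the description above. It is not the test added by this PR: the shapes, tolerances, and the helper name `conv_backward_grads` are assumptions chosen for illustration, and the script falls back to CPU when MPS is not available.

```python
import torch
import torch.nn.functional as F


def conv_backward_grads(device, channels_last):
    """Return (grad_input, grad_weight) of a small conv2d run on `device`."""
    torch.manual_seed(0)  # identical values on every call
    # Create everything on CPU first so all devices see the same numbers.
    x = torch.randn(2, 3, 8, 8)
    w = torch.randn(4, 3, 3, 3)
    grad_out = torch.randn(2, 4, 8, 8)
    x, w, grad_out = x.to(device), w.to(device), grad_out.to(device)
    if channels_last:
        x = x.to(memory_format=torch.channels_last)
        grad_out = grad_out.to(memory_format=torch.channels_last)
    x.requires_grad_(True)
    w.requires_grad_(True)
    out = F.conv2d(x, w, padding=1)
    out.backward(grad_out)
    return x.grad.cpu(), w.grad.cpu()


device = "mps" if torch.backends.mps.is_available() else "cpu"

# Reference: contiguous tensors on CPU.
ref_gx, ref_gw = conv_backward_grads("cpu", channels_last=False)
# Candidate: channels-last tensors on the device under test.
got_gx, got_gw = conv_backward_grads(device, channels_last=True)

print("device:", device)
print("grad_input matches: ", torch.allclose(ref_gx, got_gx, rtol=1e-3, atol=1e-3))
print("grad_weight matches:", torch.allclose(ref_gw, got_gw, rtol=1e-3, atol=1e-3))
```

Comparing against a contiguous CPU reference is what catches the Ventura failure mode described above, where the call succeeds but the values are wrong.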

pytorch-bot · 2024-11-19T08:04:52Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/141009

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit the bot commands wiki or our office hours

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There is 1 currently active SEV. If your PR is affected, please view it below:

[DomainsOnly] Jobs fail with GLIBC version not found

✅ You can merge normally! (1 Unrelated Failure)

As of commit ed563b4 with merge base a440a01:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

Mac MPS / macos-py3-arm64-mps / test (test_mps, 1, 1, macos-m2-15) (gh) (detected as infra flaky with no runner)

This comment was automatically generated by Dr. CI and updates every 15 minutes.


malfet · 2024-11-20T19:48:41Z

@pytorchbot merge -f "Lint + MPS tests are green"

pytorchmergebot · 2024-11-20T19:50:17Z

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging

Check the merge workflow status here

Looks like a regression caused by use of strided API, but adding the test revealed (at least in CI), that on Ventura it worked but returned garbage results, so fixed by removing all the logic about channels last (as it's irrelevant for strided API case and placeholder already turns tensor into a correct one)

This also allows one to remove `mem_format_key` and `ns_shape_key` (it was redundant even back then, as `mem_format_key` + `getTensorsStringKey(grad_output_t)` already uniquely identified the operation)

Fixes pytorch#140902

Pull Request resolved: pytorch#141009

Approved by: https://github.com/manuelcandales

malfet requested a review from kulinseth as a code owner November 19, 2024 08:04

pytorch-bot bot added the ciflow/mps (Run MPS tests, subset of trunk) and release notes: mps (Release notes category) labels Nov 19, 2024

malfet added 2 commits November 20, 2024 07:34

[MPS] Fix conv backward pass for channels last

0ce82a9

Looks like a regression caused by use of strided API. Fixed by ignoring memory layout, as for strided API it should not matter, should it? Fixes #140902

Get rid of all memory format craziness

d902d9a

malfet force-pushed the malfet/fix-conv-backward-cl branch from b4c3618 to d902d9a Compare November 20, 2024 16:32

malfet added 2 commits November 20, 2024 09:26

Fix lint

930934d

And delete this one as well

ed563b4

malfet added the topic: bug fixes (topic category) label Nov 20, 2024

manuelcandales self-requested a review November 20, 2024 18:20

manuelcandales approved these changes Nov 20, 2024


pytorchmergebot added the merging label Nov 20, 2024

pytorchmergebot added the Merged label Nov 20, 2024

pytorchmergebot closed this in a8794fd Nov 20, 2024

pytorchmergebot removed the merging label Nov 20, 2024

malfet mentioned this pull request Dec 1, 2024

[MPS] Convert channels_last_3d to contiguous for input tensor in nn.Conv3d #141780

Closed

github-actions bot deleted the malfet/fix-conv-backward-cl branch December 22, 2024 02:10
