KEMBAR78
fix unbacked + view incorrectness by eellison · Pull Request #145548 · pytorch/pytorch · GitHub
Skip to content

Conversation

@eellison
Copy link
Contributor

@eellison eellison commented Jan 23, 2025

Stack from ghstack (oldest at bottom):

fix for #143498

We were incorrectly using contiguous strides for a non-contiguous tensor. There are two separate causes:

  1. Fix size_hint call sites failing on unbacked SymInts #110520 made it so we turn Views contiguous with unbacked symints becuase
    dynamic_reshape_indexer below will fail due to the size_hint's inability to process unbacked SymInts. Seems like we should fix. Regardless - it will make the input contiguous if input is unbacked to workaround this.

  2. We weren't actually making it contiguous! I filed an issue for this here: Confusing as_storage_and_layout(x, want_contiguous=True) behavior #145561.

This is still worth landing as a fix, even though we should those issues.

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang @aakhundov

@pytorch-bot
Copy link

pytorch-bot bot commented Jan 23, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/145548

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure

As of commit e2be08b with merge base 639dd54 (image):

NEW FAILURE - The following job has failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy yf225 chenyang78 kadeng muchulee8 ColinPeppler amjames desertfire chauhang aakhundov

[ghstack-poisoned]
cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy yf225 chenyang78 kadeng muchulee8 ColinPeppler amjames desertfire chauhang aakhundov

[ghstack-poisoned]
eellison added a commit that referenced this pull request Jan 24, 2025
ghstack-source-id: d70000d
Pull Request resolved: #145548
@eellison eellison requested review from desertfire and ezyang January 24, 2025 00:09
@pytorch-bot pytorch-bot bot temporarily deployed to upload-benchmark-results January 24, 2025 00:38 Inactive
@pytorch-bot pytorch-bot bot temporarily deployed to upload-benchmark-results January 24, 2025 00:38 Inactive
@ezyang
Copy link
Contributor

ezyang commented Jan 24, 2025

I don't understand what these API functions do well enough to review this, but I would be happy to if someone educates me :)

@eellison eellison requested a review from shunting314 January 24, 2025 06:29
# TODO: unbacked should not diverge from backed in determining striding
# Need to require contiguous here instead of realize, see:
# https://github.com/pytorch/pytorch/issues/145561
x = ExternKernel.require_contiguous(x)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fine following the existing logic, but I think unbacked_symbols_in_sizes doesn't necessarily mean x should be contiguous, so a better way is to realize x and create different view based on x's strides?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will create a contiguous copy if x is not contiguous or having symints? (e.g. an non contiguous ReinterpretView of a contiguous buffer)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Initially, this branch is operating under the assumption that x is contiguous. Then that was extended to contiguous or has unbacked symints: https://github.com/pytorch/pytorch/pull/110520/files#diff-cf6ca00beddc32a2a6a2933fb9913b6a2b925ffc3b745488967210e4343134acR1735-R1739 bc dynamic_reshape_indexer fails with unbacked. The current api call (want_contiguous) does not enforce that it is contiguous, which I'm fixing in this pr.

I agree, in the future, we should add support for dynamic_reshape_indexer & unbacked symints. I will file an issue for that.

@eellison
Copy link
Contributor Author

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Jan 27, 2025
@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@eellison
Copy link
Contributor Author

@pytorchbot rebase

@pytorchmergebot
Copy link
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

[ghstack-poisoned]
@pytorchmergebot
Copy link
Collaborator

Successfully rebased gh/eellison/754/orig onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via ghstack checkout https://github.com/pytorch/pytorch/pull/145548)

@pytorch-bot pytorch-bot bot had a problem deploying to upload-benchmark-results January 27, 2025 21:29 Error
[ghstack-poisoned]
eellison added a commit that referenced this pull request Jan 28, 2025
ghstack-source-id: a804c91
Pull Request resolved: #145548
@eellison
Copy link
Contributor Author

@pytorchbot merge

@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@pytorchmergebot
Copy link
Collaborator

Merge failed

Reason: 1 mandatory check(s) failed. The first few are:

Dig deeper by viewing the failures on hud

Details for Dev Infra team Raised by workflow job

Failing merge rule: Core Maintainers

@eellison
Copy link
Contributor Author

@pytorchbot merge -i

@pytorch-bot pytorch-bot bot temporarily deployed to upload-benchmark-results January 28, 2025 00:31 Inactive
@pytorch-bot pytorch-bot bot temporarily deployed to upload-benchmark-results January 28, 2025 00:31 Inactive
@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged while ignoring the following 1 checks: Lint / lintrunner-noclang / linux-job

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@pytorch-bot pytorch-bot bot temporarily deployed to upload-benchmark-results January 28, 2025 00:32 Inactive
@pytorch-bot pytorch-bot bot temporarily deployed to upload-benchmark-results January 28, 2025 00:32 Inactive
@pytorch-bot pytorch-bot bot temporarily deployed to upload-benchmark-results January 28, 2025 00:32 Inactive
@pytorchmergebot
Copy link
Collaborator

The merge job was canceled or timed out. This most often happen if two merge requests were issued for the same PR, or if merge job was waiting for more than 6 hours for tests to finish. In later case, please do not hesitate to reissue the merge command
For more information see pytorch-bot wiki.

@eellison
Copy link
Contributor Author

@pytorchbot merge

@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@pytorchmergebot
Copy link
Collaborator

Merge failed

Reason: 1 mandatory check(s) failed. The first few are:

Dig deeper by viewing the failures on hud

Details for Dev Infra team Raised by workflow job

Failing merge rule: Core Maintainers

@eellison
Copy link
Contributor Author

@pytorchbot merge -i

@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged while ignoring the following 1 checks: Lint / lintrunner-noclang / linux-job

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants