[inductor] Triton codegen: Use scalar when creating f64 constant instead of 1-element tensor by masnesral · Pull Request #136594 · pytorch/pytorch

Conversation

@masnesral
Contributor

masnesral commented Sep 25, 2024

Stack from ghstack (oldest at bottom):

Summary: We have an internal report of a Triton compiler error `ValueError: Cannot broadcast, rank mismatch: [1], [1, 2048]` coming from a line like this:

`tmp25 = tl.broadcast_to(((tl.full([1], 1.00000000000000, tl.float64)) + ((ks0 // 3278).to(tl.float64))) / (((tl.full([1], 0.500000000000000, tl.float64))*(libdevice.sqrt((1 + ((ks0 // 3278)*(ks0 // 3278)) + ((-2)*(ks0 // 3278))).to(tl.float64).to(tl.float32)))) + ((tl.full([1], 0.500000000000000, tl.float64))*((1 + (ks0 // 3278)).to(tl.float64)))), [XBLOCK, RBLOCK])`

#135260 is the cause, presumably because we turn a constant into a 1-element tensor with `(tl.full([1], const, tl.float64))`. It looks like changing the syntax to `(tl.full([], const, tl.float64))` gives us what we want?
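
To illustrate the rank behavior outside of Inductor, here is a minimal hypothetical sketch; this is not the actual generated code, and the kernel name, grid, and block sizes are invented for illustration:

```python
import torch
import triton
import triton.language as tl

@triton.jit
def _rank_demo(out_ptr, XBLOCK: tl.constexpr, RBLOCK: tl.constexpr):
    xindex = tl.arange(0, XBLOCK)[:, None]   # shape [XBLOCK, 1]
    rindex = tl.arange(0, RBLOCK)[None, :]   # shape [1, RBLOCK]
    # tl.full([], ...) yields a rank-0 scalar; broadcasting it to the
    # rank-2 [XBLOCK, RBLOCK] block works fine:
    tmp = tl.broadcast_to(tl.full([], 0.5, tl.float64), [XBLOCK, RBLOCK])
    # With tl.full([1], 0.5, tl.float64) -- a rank-1, one-element tensor --
    # the same broadcast fails with the rank-mismatch ValueError above.
    tl.store(out_ptr + xindex * RBLOCK + rindex, tmp.to(tl.float32))

out = torch.empty((4, 8), device="cuda", dtype=torch.float32)
_rank_demo[(1,)](out, XBLOCK=4, RBLOCK=8)
```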

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang

Differential Revision: D63465169
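
The shape of the fix itself is tiny: the codegen switches the shape argument of `tl.full` from `[1]` to `[]` when materializing an f64 constant. A hedged sketch of the kind of helper change involved (the function name below is illustrative, not the actual one in torch/_inductor/codegen/triton.py):

```python
def f64_constant_repr(value: float) -> str:
    # Before: f"(tl.full([1], {value}, tl.float64))" -- a rank-1,
    # one-element tensor that triggers rank-mismatch errors when
    # broadcast against rank-2 [XBLOCK, RBLOCK] blocks.
    # After: an empty shape gives a rank-0 scalar that broadcasts
    # to any rank.
    return f"(tl.full([], {value}, tl.float64))"
```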

@pytorch-bot

pytorch-bot bot commented Sep 25, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/136594

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit b98deb9 with merge base d5e4a20:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

masnesral added a commit that referenced this pull request Sep 25, 2024
[inductor] Triton codegen: Use scalar when creating f64 constant instead of 1-element tensor

ghstack-source-id: fc004c4
Pull Request resolved: #136594
@masnesral
Contributor Author

@masnesral has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

masnesral added the topic: not user facing label Sep 25, 2024
pytorch-bot bot added the ciflow/trunk label Sep 25, 2024
masnesral marked this pull request as ready for review September 25, 2024 17:04
masnesral requested a review from jansel September 25, 2024 17:05
@masnesral
Contributor Author

@jansel would you mind taking a look? mengluy0125 verified this fixes the internal usage. Also, I've so far failed to reverse-engineer what PyTorch code would generate this particular pattern of a broadcast_to with an fp64 arg. If you know what would produce that, I could also add a unit test.

@facebook-github-bot
Contributor

@pytorchbot merge -f 'Landed internally'

(Initiating merge automatically since Phabricator Diff has merged, using force because this PR might not pass merge_rules.json but landed internally)

@pytorchmergebot
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as a last resort and instead consider -i/--ignore-current to continue the merge while ignoring current failures. This allows currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

@pytorchmergebot
Collaborator

Merge failed

Reason: Command `git -C /home/runner/work/pytorch/pytorch cherry-pick -x fa57ff70960901190f710aa3d7d5a7c119978ee0` returned non-zero exit code 1

Auto-merging torch/_inductor/codegen/triton.py
CONFLICT (content): Merge conflict in torch/_inductor/codegen/triton.py
error: could not apply fa57ff7096... [inductor] Triton codegen: Use scalar when creating f64 constant instead of 1-element tensor
hint: After resolving the conflicts, mark them with
hint: "git add/rm <pathspec>", then run
hint: "git cherry-pick --continue".
hint: You can instead skip this commit with "git cherry-pick --skip".
hint: To abort and get back to the state before "git cherry-pick",
hint: run "git cherry-pick --abort".
hint: Disable this message with "git config advice.mergeConflict false"
Details for Dev Infra team: raised by workflow job.

…nstant instead of 1-element tensor"

Summary: We have an internal report of a Triton compiler error `ValueError: Cannot broadcast, rank mismatch: [1], [1, 2048]` coming from a line like this:

`tmp25 = tl.broadcast_to(((tl.full([1], 1.00000000000000, tl.float64)) + ((ks0 // 3278).to(tl.float64))) / (((tl.full([1], 0.500000000000000, tl.float64))*(libdevice.sqrt((1 + ((ks0 // 3278)*(ks0 // 3278)) + ((-2)*(ks0 // 3278))).to(tl.float64).to(tl.float32)))) + ((tl.full([1], 0.500000000000000, tl.float64))*((1 + (ks0 // 3278)).to(tl.float64)))), [XBLOCK, RBLOCK])
`

#135260 is the cause, presumably because we turn a constant into a 1-element tensor with: `(tl.full([1], const, tl.float64))`. It looks like changing the syntax to `(tl.full([], const, tl.float64))` gives us what we want?

cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy yf225 chenyang78 kadeng muchulee8 ColinPeppler amjames desertfire chauhang

Differential Revision: [D63360293](https://our.internmc.facebook.com/intern/diff/D63360293)

[ghstack-poisoned]
masnesral added a commit that referenced this pull request Sep 26, 2024
[inductor] Triton codegen: Use scalar when creating f64 constant instead of 1-element tensor

Pull Request resolved: #136594
ghstack-source-id: 4f2c28d
@masnesral
Contributor Author

@masnesral has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@masnesral
Contributor Author

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

@facebook-github-bot
Contributor

@pytorchbot revert -m="Diff reverted internally" -c="ghfirst"

This Pull Request has been reverted by a revert inside Meta. To re-land this change, please open another pull request, assign the same reviewers, fix the CI failures that caused the revert, and make sure that the failing CI runs on the PR by applying the proper ciflow label (e.g., ciflow/trunk).

@pytorchmergebot
Collaborator

@pytorchbot successfully started a revert job. Check the current status here.
Questions? Feedback? Please reach out to the PyTorch DevX Team

pytorchmergebot added a commit that referenced this pull request Sep 27, 2024
Revert "[inductor] Triton codegen: Use scalar when creating f64 constant instead of 1-element tensor (#136594)"

This reverts commit 2c5f5e3.

Reverted #136594 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally
@pytorchmergebot
Collaborator

@masnesral your PR has been successfully reverted.

@facebook-github-bot
Contributor

@pytorchbot merge

(Initiating merge automatically since Phabricator Diff has merged)

@pytorchmergebot
Collaborator

Merge failed

Reason: This PR has internal changes and must be landed via Phabricator! Please try reimporting/re-exporting the PR!

Details for Dev Infra team: raised by workflow job.

masnesral added a commit that referenced this pull request Sep 27, 2024
[inductor] Triton codegen: Use scalar when creating f64 constant instead of 1-element tensor

This is a retry of #136594, which is having trouble landing.

[ghstack-poisoned]
masnesral added a commit that referenced this pull request Sep 27, 2024
[inductor] Triton codegen: Use scalar when creating f64 constant instead of 1-element tensor

ghstack-source-id: 141efbd
Pull Request resolved: #136858
@masnesral
Contributor Author

Starting fresh with: #136858

masnesral closed this Sep 27, 2024
pytorchmergebot pushed a commit that referenced this pull request Sep 27, 2024
[inductor] Triton codegen: Use scalar when creating f64 constant instead of 1-element tensor (#136858)

Differential Revision: [D63540693](https://our.internmc.facebook.com/intern/diff/D63540693)
Pull Request resolved: #136858
Approved by: https://github.com/atalman
@github-actions github-actions bot deleted the gh/masnesral/117/head branch October 28, 2024 02:08