KEMBAR78
Allow inplacing buffer when other users are inconsequential by exclamaforte · Pull Request #138383 · pytorch/pytorch · GitHub
Skip to content

Conversation

@exclamaforte
Copy link
Contributor

@exclamaforte exclamaforte commented Oct 19, 2024

Summary:
I think we can inplace a buffer if all of the users of said buffer are "inconsequential", defined as having been removed, being completed, or being part of the ancestors set. In particular, this allows LayerNorm to inplace its input buffer.

Implements:
#132826

Test Plan:
New unit test of matmul followed by LayerNorm, make sure there's an inplaced buffer.

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang @aakhundov

@pytorch-bot
Copy link

pytorch-bot bot commented Oct 19, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/138383

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit e9fe5cc with merge base e080c89 (image):
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@exclamaforte exclamaforte force-pushed the exclamaforte/inplacing-reductions branch 2 times, most recently from 6a512a4 to 5a1d35a Compare October 19, 2024 01:39
@exclamaforte exclamaforte changed the title WIP Allow inplacing buffer when other users are inconsequential Allow inplacing buffer when other users are inconsequential Oct 19, 2024
@exclamaforte exclamaforte requested a review from eellison October 19, 2024 01:42
@exclamaforte exclamaforte removed the request for review from eellison October 19, 2024 20:28
@exclamaforte exclamaforte changed the title Allow inplacing buffer when other users are inconsequential WIP Allow inplacing buffer when other users are inconsequential Oct 19, 2024
@exclamaforte exclamaforte force-pushed the exclamaforte/inplacing-reductions branch 3 times, most recently from be4fc34 to 6d063e9 Compare October 22, 2024 19:19
@exclamaforte exclamaforte requested a review from eellison October 22, 2024 22:24
@exclamaforte exclamaforte changed the title WIP Allow inplacing buffer when other users are inconsequential Allow inplacing buffer when other users are inconsequential Oct 22, 2024
@exclamaforte exclamaforte force-pushed the exclamaforte/inplacing-reductions branch 2 times, most recently from 05c4937 to ed2f275 Compare October 23, 2024 17:45
Copy link
Contributor

@eellison eellison left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool! nice to see this required minimal changes. cc @shunting314

Before we land - maybe let's do a dashboard run for safety. this should be motivation for landing donated buffers and turning on #133368 cc @BoyuanFeng

ordered_reads = sorted(self.read_writes.reads, key=lambda x: x.name)
# NOTE remove V.graph.removed_operations once deps issue is fixed
inconsequential_nodes = (
(self.ancestors - {self.get_name()})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm surprised a node is an ancestor of itself...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if it's possible, do you know if the IR has been de-cycled when it gets to the scheduler? I added it defensibly in case there was a cycle, to prevent inplacing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pretty sure a node should not be an ancestor of itself - you can add an assertion, run tests, then remove

self.assertEqual(fn_opt(*inps), fn(*inps))

@config.patch(inplace_buffers=True)
def test_layer_norm_inplaces_after_matmul(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit - can make test work for both devices by passing in self.device and using above

@eellison eellison requested a review from shunting314 October 23, 2024 22:06
Copy link
Contributor

@shunting314 shunting314 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool!

Would you share the generated in-place kernel?

Can we do some benchmarking on the toy examples regarding

  1. memory saving
  2. perf impact

_, (code,) = run_and_get_code(torch.compile(fn), inp)
FileCheck().check("copy_").check_same("True").run(code)

@config.patch(inplace_buffers=True)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we remove these config overriding since 'inplace_buffers' is on by default

self.assertTrue("in_out_ptr" not in code)
self.assertEqual(fn_opt(*inps), fn(*inps))

@config.patch(inplace_buffers=True)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same

@exclamaforte exclamaforte force-pushed the exclamaforte/inplacing-reductions branch 2 times, most recently from 51486ee to ce0a7f7 Compare October 25, 2024 19:26
@exclamaforte
Copy link
Contributor Author

Looks like the perf impact is slightly negative 🤔
Screenshot 2024-10-28 at 10 59 57 AM

@eellison
Copy link
Contributor

Is the base commit of your benchmarking run the same one you are comparing to ?

@exclamaforte
Copy link
Contributor Author

Yeah unfortunately
Screenshot 2024-10-28 at 11 14 35 AM

@eellison
Copy link
Contributor

eellison commented Oct 28, 2024

@exclamaforte
Copy link
Contributor Author

Ah cool! Sorry, still getting used to the perf dashboard, so I got the order flipped. Anyways, good to merge on perf after tests pass?

@shunting314
Copy link
Contributor

Can you share a generated inplace kernel? In some odd situations, Inductor codegen an 'in_out_ptr' but does not really do load/store..

@exclamaforte exclamaforte force-pushed the exclamaforte/inplacing-reductions branch from 066b2d7 to a316a64 Compare November 4, 2024 19:02
@exclamaforte exclamaforte force-pushed the exclamaforte/inplacing-reductions branch from a316a64 to f759d6b Compare November 4, 2024 22:03
@exclamaforte exclamaforte force-pushed the exclamaforte/inplacing-reductions branch from f759d6b to a9b6b6e Compare November 4, 2024 22:07
exclamaforte added a commit that referenced this pull request Nov 4, 2024
Summary:
I think we can inplace a buffer if all of the users of said buffer are "inconsequential", defined as having been removed, being completed, or being part of the ancestors set. In particular, this allows LayerNorm to inplace its input buffer.

Implements:
#132826

resolves: #138383
@exclamaforte exclamaforte force-pushed the exclamaforte/inplacing-reductions branch from a9b6b6e to 6be89ca Compare November 4, 2024 22:21
@exclamaforte
Copy link
Contributor Author

@pytorchbot merge

@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

Summary:
I think we can inplace a buffer if all of the users of said buffer are "inconsequential", defined as having been removed, being completed, or being part of the ancestors set. In particular, this allows LayerNorm to inplace its input buffer.

Implements:
#132826

resolves: #138383
@exclamaforte exclamaforte force-pushed the exclamaforte/inplacing-reductions branch from 6be89ca to e9fe5cc Compare November 4, 2024 23:59
@pytorchmergebot
Copy link
Collaborator

Merge failed

Reason: 1 jobs have failed, first few of them are: pull / linux-focal-py3.11-clang10 / test (default, 1, 5, linux.4xlarge)

Details for Dev Infra team Raised by workflow job

@exclamaforte
Copy link
Contributor Author

@pytorchbot merge

@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

rahulsingh-intel pushed a commit to rahulsingh-intel/pytorch that referenced this pull request Nov 5, 2024
…138383)

Summary:
I think we can inplace a buffer if all of the users of said buffer are "inconsequential", defined as having been removed, being completed, or being part of the ancestors set. In particular, this allows LayerNorm to inplace its input buffer.

Implements:
pytorch#132826

Test Plan:
New unit test of matmul followed by LayerNorm, make sure there's an inplaced buffer.

Pull Request resolved: pytorch#138383
Approved by: https://github.com/eellison
rahulsingh-intel pushed a commit to rahulsingh-intel/pytorch that referenced this pull request Nov 5, 2024
…ytorch#138383)"

This reverts commit 8840889.

Reverted pytorch#138383 on behalf of https://github.com/huydhn due to Sorry for reverting your change, but it seems to break trunk after landing ([comment](pytorch#138383 (comment)))
rahulsingh-intel pushed a commit to rahulsingh-intel/pytorch that referenced this pull request Nov 5, 2024
…138383)

Summary:
I think we can inplace a buffer if all of the users of said buffer are "inconsequential", defined as having been removed, being completed, or being part of the ancestors set. In particular, this allows LayerNorm to inplace its input buffer.

Implements:
pytorch#132826

Test Plan:
New unit test of matmul followed by LayerNorm, make sure there's an inplaced buffer.

Pull Request resolved: pytorch#138383
Approved by: https://github.com/eellison
rahulsingh-intel pushed a commit to rahulsingh-intel/pytorch that referenced this pull request Nov 5, 2024
…ytorch#138383)"

This reverts commit 030f70b.

Reverted pytorch#138383 on behalf of https://github.com/huydhn due to Sorry for reverting this again, but I think it has a test failing internally and also on ROCm ([comment](pytorch#138383 (comment)))
pobin6 pushed a commit to pobin6/pytorch that referenced this pull request Dec 5, 2024
…138383)

Summary:
I think we can inplace a buffer if all of the users of said buffer are "inconsequential", defined as having been removed, being completed, or being part of the ancestors set. In particular, this allows LayerNorm to inplace its input buffer.

Implements:
pytorch#132826

Test Plan:
New unit test of matmul followed by LayerNorm, make sure there's an inplaced buffer.

Pull Request resolved: pytorch#138383
Approved by: https://github.com/eellison
@github-actions github-actions bot deleted the exclamaforte/inplacing-reductions branch December 6, 2024 02:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants