[AOTI] Fix memory leak from all_reduce #159818

desertfire · 2025-08-04T23:44:06Z

Stack from ghstack (oldest at bottom):

-> [AOTI] Fix memory leak from all_reduce #159818

Summary: This PR fixes two issues:

1. When lowering the all_reduce op, Inductor is expected to convert it to the in-place version, all_reduce_, but it was calling ir._AllReduceKernel.create_inplace instead of ir._AllReduce_Kernel.create_inplace. This triggers a subtle bug in AOTI: the generated C++ calls the functional version, aoti_torch_cpu__c10d_functional_all_reduce, but the corresponding wait operation still waits on the input to that call rather than on its output. The output tensor is therefore never waited on, which leads to a memory leak.
2. Now that AOTI generates the in-place version, aoti_torch_cpu__c10d_functional_all_reduce_, the tensor it returns is never used. It is released when the program exits, so it is not a memory leak, but holding it unnecessarily raises the memory high-water mark. This PR generates a tensor delete operation right after the call to aoti_torch_cpu__c10d_functional_all_reduce_.
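
The two issues can be illustrated with a small toy model in plain Python. This is only a sketch under stated assumptions, not Inductor or c10d code: `Storage`, `Tensor`, `ToyCollectives`, and both helper functions are hypothetical names. It assumes that a collective stays "pending" until the tensor it is tied to is waited on, and that the in-place call returns a new handle aliasing the input's storage.

```python
import weakref


class Storage:
    """Hypothetical stand-in for the memory behind a tensor."""


class Tensor:
    """Hypothetical tensor handle; several handles may share one Storage."""

    def __init__(self, storage=None):
        self.storage = storage if storage is not None else Storage()


class ToyCollectives:
    """Toy registry of tensors that still have an un-waited collective."""

    def __init__(self):
        self.pending = {}  # id(tensor) -> tensor (kept alive until waited on)

    def all_reduce(self, t):
        # Functional form: the result is a fresh tensor, and that result
        # is the one that has to be waited on.
        out = Tensor()
        self.pending[id(out)] = out
        return out

    def all_reduce_(self, t):
        # In-place form: the pending work is tied to the input itself.
        # Assumption: the call returns a new handle aliasing the input's storage.
        self.pending[id(t)] = t
        return Tensor(t.storage)

    def wait_tensor(self, t):
        # Waiting clears the pending entry for exactly this tensor, if any.
        self.pending.pop(id(t), None)
        return t


def unwaited_after_wait_on_input(use_inplace):
    """Issue 1: count tensors still pending after waiting on the input."""
    c = ToyCollectives()
    buf = Tensor()
    if use_inplace:
        c.all_reduce_(buf)
    else:
        c.all_reduce(buf)
    c.wait_tensor(buf)  # the wait targets the input buffer in both cases
    return len(c.pending)


def storage_outlives_input(delete_result_early):
    """Issue 2: does an unused return handle keep the storage alive?"""
    c = ToyCollectives()
    buf = Tensor()
    probe = weakref.ref(buf.storage)
    result = c.all_reduce_(buf)
    c.wait_tensor(buf)
    if delete_result_early:
        del result  # models the delete emitted right after the call
    del buf
    # Relies on CPython reference counting freeing objects immediately.
    return probe() is not None
```

In this model, `unwaited_after_wait_on_input(False)` leaves one tensor pending while `unwaited_after_wait_on_input(True)` leaves none, which mirrors the first issue. `storage_outlives_input(False)` shows the unused return handle keeping the storage alive after the input is gone, and `storage_outlives_input(True)` shows the storage being freed once that handle is deleted early.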

cc @H-Huang @awgu @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @pragupta @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben

[ghstack-poisoned]

pytorch-bot · 2025-08-04T23:44:09Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/159818

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit the bot commands wiki or our office hours

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There is 1 currently active SEV. If your PR is affected, please view it below:

ghstack-mergeability-check and Check labels failing with 'Resource not accessible by integration'

✅ You can merge normally! (3 Unrelated Failures)

As of commit 6b6b37c with merge base f946b25:

UNSTABLE - The following jobs are marked as unstable, possibly due to flakiness on trunk:

Check Labels / Check labels (gh) (#159894)
RuntimeError: GraphQL query
Check mergeability of ghstack PR / ghstack-mergeability-check (gh) (#159899)
RuntimeError: GraphQL query
pull / linux-jammy-py3_9-clang9-xla / test (xla, 1, 1, lf.linux.12xlarge, unstable) (gh) (#158876)
sccache: error: couldn't connect to server

This comment was automatically generated by Dr. CI and updates every 15 minutes.

ghstack-source-id: 65b0cd1
Pull-Request: #159818

desertfire · 2025-08-06T00:15:55Z

@pytorchbot merge

pytorchmergebot · 2025-08-06T00:18:02Z

Merge failed

Reason: Approvers from one of the following sets are needed:

superuser (pytorch/metamates)
Core Reviewers (mruberry, lezcano, Skylion007, ngimel, peterbell10, ...)
Core Maintainers (soumith, gchanan, ezyang, dzhulgakov, malfet, ...)

Details for Dev Infra team

Raised by workflow job

Failing merge rule: Core Maintainers

desertfire · 2025-08-06T18:09:03Z

@pytorchbot merge

pytorchmergebot · 2025-08-06T18:10:52Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging

Check the merge workflow status here

Pull Request resolved: pytorch#159818
Approved by: https://github.com/henryhu6, https://github.com/yushangdi

Update

6b6b37c

[ghstack-poisoned]

pytorch-bot bot added the ciflow/inductor, module: inductor, oncall: distributed, release notes: distributed (c10d), and release notes: inductor (aoti) labels Aug 4, 2025

henryhu6 approved these changes Aug 5, 2025

pytorch-bot bot added the ciflow/trunk label Aug 6, 2025

pytorchmergebot added the merging label Aug 6, 2025

pytorchmergebot removed the merging label Aug 6, 2025

desertfire requested a review from yushangdi August 6, 2025 04:10

yushangdi approved these changes Aug 6, 2025

pytorchmergebot added the merging label Aug 6, 2025

pytorchmergebot closed this in 44dd368 Aug 6, 2025

pytorchmergebot added Merged and removed merging labels Aug 6, 2025

github-actions bot deleted the gh/desertfire/596/head branch September 6, 2025 02:06
