KEMBAR78

Fix unbacked symint and memory leak in inductor memory planning by yushangdi · Pull Request #159839 · pytorch/pytorch · GitHub

Fix unbacked symint and memory leak in inductor memory planning #159839

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Sign up for GitHub

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jump to bottom

Closed

yushangdi wants to merge 1 commit into pytorch:main from yushangdi:export-D79603119

Contributor

yushangdi commented Aug 5, 2025 •

edited

Loading

Summary:

In memory planning, some allocation sizes involve unbacked symints. These unbacked symints are not known before they are computed in run time, so allocation pools that involve unbacked symints cannot be allocated until we have the values of the unbacked symints .

So we add a notion of earliest_available to Allocation nodes. If an allocation node has unbacked symint, it is available at only when its live range begin.

Then in AllocationPool, if a pool involves an Allocation node that has an earliest available time, we restrict its life range.

If a block's earliest available time is later than a pool's life range's start time, we cannot allocate it from the pool.

We also fix a memory leak that's caused by allocating tensor without wrapping it with RAIIAtenTensor.

In python wrapper for JIT inductor, codegen_alloc_from_pool doesn't actually write the alloc lines to wrapper, it just returns the string to alloc. However, in cpp_wrapper, codegen_alloc_from_pool actually write to the wrapper. Specifically, it writes the following and returns string RAIIAtenTensorHandle.

AtenTensorHandle handle_name;
AOTI_TORCH_ERROR_CODE_CHECK(aoti_torch__alloc_from_pool(....);

This is bug prune. If you write aoti_torch__alloc_from_pool lines, you must write the RAIIAtenTensorHandle as well, otherwise you get memory leaks.

We remove the alloc_from_pool call from codegen_create, because this doesn't work for AOTI. In python wrapper, we can generate the same alloc_from_pool variable name for the same block, but cpp_wrapper will generate a different variable name for each call to alloc_from_pool.

Test Plan:

 python test/inductor/test_memory_planning.py

Rollback Plan:

Differential Revision: D79603119

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben

pytorch-bot bot commented Aug 5, 2025 •

edited

Loading

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/159839

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit the bot commands wiki or our office hours

Note: Links to docs will display an error until the docs builds have been completed.

⏳ No Failures, 1 Pending

As of commit cc9d1ff with merge base 86eb65f ():
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

pytorch-bot bot added ciflow/inductor module: inductor labels

Contributor

facebook-github-bot commented Aug 5, 2025

This pull request was exported from Phabricator. Differential Revision: D79603119

facebook-github-bot added the fb-exported label

yushangdi requested review from desertfire and jansel

August 5, 2025 04:33

yushangdi added the topic: not user facing label

yushangdi marked this pull request as draft

August 5, 2025 04:35

yushangdi removed request for desertfire and jansel

August 5, 2025 04:35

yushangdi force-pushed the export-D79603119 branch from 659bce3 to 3af7ffc Compare

August 8, 2025 16:41

pytorch-bot bot added the release notes: inductor (aoti) label

pytorch-bot bot pushed a commit that referenced this pull request


          Fix unbacked symint in inductor memory planning (#159839)

3af7ffc

Summary:

In memory planning, some allocation sizes involve unbacked symints. These
unbacked symints are not known before they are computed in run time, so allocation pools that involve unbacked symints cannot be allocated until we have the values of the unbacked symints.

So we add a notion of `earliest_available` to Allocation nodes. If an allocation node has unbacked symint, it is available at only when its live range begin.

Then in AllocationPool, if a pool involves an Allocation node that has an earliest available time, we restrict its life range.

If a block's earliest available time is later than a pool's life range's start time, we cannot allocate it from the pool.

We also fix a memory leak that's caused by allocating tensor without wrapping it with RAIIAtenTensor.

In python wrapper for JIT inductor, `codegen_alloc_from_pool` doesn't actually write the alloc lines to wrapper, it just returns the string to alloc. However, in cpp_wrapper, `codegen_alloc_from_pool` actually write to the wrapper. Specifically, it writes the following and returns string `RAIIAtenTensorHandle`.

```
AtenTensorHandle handle_name;
AOTI_TORCH_ERROR_CODE_CHECK(aoti_torch__alloc_from_pool(....);
```

This is bug prune. If you write aoti_torch__alloc_from_pool lines, you must write the RAIIAtenTensorHandle as well, otherwise you get memory leaks.

We remove the alloc_from_pool call from codegen_create, because this doesn't work for AOTI. In python wrapper, we can generate the same alloc_from_pool variable name for the same block, but cpp_wrapper will generate a different variable name for each call to alloc_from_pool.

Test Plan:
```
python test/inductor/test_memory_planning.py
```

Rollback Plan:

Differential Revision: D79603119

Contributor

facebook-github-bot commented Aug 8, 2025

This pull request was exported from Phabricator. Differential Revision: D79603119

yushangdi marked this pull request as ready for review

August 8, 2025 16:45

yushangdi requested review from desertfire and jansel

August 8, 2025 16:45

yushangdi changed the title ~~Fix unbacked symint in inductor memory planning~~ Fix unbacked symint and memory leak in inductor memory planning

pytorch-bot bot added the ciflow/trunk label

yushangdi added a commit to yushangdi/pytorch that referenced this pull request


          Fix unbacked symint in inductor memory planning (pytorch#159839)

d2e8c8e

Summary:

In memory planning, some allocation sizes involve unbacked symints. These
unbacked symints are not known before they are computed in run time, so allocation pools that involve unbacked symints cannot be allocated until we have the values of the unbacked symints.

So we add a notion of `earliest_available` to Allocation nodes. If an allocation node has unbacked symint, it is available at only when its live range begin.

Then in AllocationPool, if a pool involves an Allocation node that has an earliest available time, we restrict its life range.

If a block's earliest available time is later than a pool's life range's start time, we cannot allocate it from the pool.

We also fix a memory leak that's caused by allocating tensor without wrapping it with RAIIAtenTensor.

In python wrapper for JIT inductor, `codegen_alloc_from_pool` doesn't actually write the alloc lines to wrapper, it just returns the string to alloc. However, in cpp_wrapper, `codegen_alloc_from_pool` actually write to the wrapper. Specifically, it writes the following and returns string `RAIIAtenTensorHandle`.

```
AtenTensorHandle handle_name;
AOTI_TORCH_ERROR_CODE_CHECK(aoti_torch__alloc_from_pool(....);
```

This is bug prune. If you write aoti_torch__alloc_from_pool lines, you must write the RAIIAtenTensorHandle as well, otherwise you get memory leaks.

We remove the alloc_from_pool call from codegen_create, because this doesn't work for AOTI. In python wrapper, we can generate the same alloc_from_pool variable name for the same block, but cpp_wrapper will generate a different variable name for each call to alloc_from_pool.

Test Plan:
```
python test/inductor/test_memory_planning.py
```

Rollback Plan:

Differential Revision: D79603119

yushangdi force-pushed the export-D79603119 branch from 3af7ffc to d2e8c8e Compare

August 8, 2025 16:48

Contributor

facebook-github-bot commented Aug 8, 2025

This pull request was exported from Phabricator. Differential Revision: D79603119

yushangdi force-pushed the export-D79603119 branch from d2e8c8e to c29411a Compare

August 8, 2025 16:59

yushangdi added a commit to yushangdi/pytorch that referenced this pull request


          Fix unbacked symint in inductor memory planning (pytorch#159839)

c29411a

Summary:

In memory planning, some allocation sizes involve unbacked symints. These
unbacked symints are not known before they are computed in run time, so allocation pools that involve unbacked symints cannot be allocated until we have the values of the unbacked symints.

So we add a notion of `earliest_available` to Allocation nodes. If an allocation node has unbacked symint, it is available at only when its live range begin.

Then in AllocationPool, if a pool involves an Allocation node that has an earliest available time, we restrict its life range.

If a block's earliest available time is later than a pool's life range's start time, we cannot allocate it from the pool.

We also fix a memory leak that's caused by allocating tensor without wrapping it with RAIIAtenTensor.

In python wrapper for JIT inductor, `codegen_alloc_from_pool` doesn't actually write the alloc lines to wrapper, it just returns the string to alloc. However, in cpp_wrapper, `codegen_alloc_from_pool` actually write to the wrapper. Specifically, it writes the following and returns string `RAIIAtenTensorHandle`.

```
AtenTensorHandle handle_name;
AOTI_TORCH_ERROR_CODE_CHECK(aoti_torch__alloc_from_pool(....);
```

This is bug prune. If you write aoti_torch__alloc_from_pool lines, you must write the RAIIAtenTensorHandle as well, otherwise you get memory leaks.

We remove the alloc_from_pool call from codegen_create, because this doesn't work for AOTI. In python wrapper, we can generate the same alloc_from_pool variable name for the same block, but cpp_wrapper will generate a different variable name for each call to alloc_from_pool.

Test Plan:
```
python test/inductor/test_memory_planning.py
```

Rollback Plan:

Differential Revision: D79603119

yushangdi force-pushed the export-D79603119 branch from c29411a to 1ced89c Compare

August 8, 2025 16:59

yushangdi added a commit to yushangdi/pytorch that referenced this pull request


          Fix unbacked symint in inductor memory planning (pytorch#159839)

1ced89c

Summary:

In memory planning, some allocation sizes involve unbacked symints. These
unbacked symints are not known before they are computed in run time, so allocation pools that involve unbacked symints cannot be allocated until we have the values of the unbacked symints.

So we add a notion of `earliest_available` to Allocation nodes. If an allocation node has unbacked symint, it is available at only when its live range begin.

Then in AllocationPool, if a pool involves an Allocation node that has an earliest available time, we restrict its life range.

If a block's earliest available time is later than a pool's life range's start time, we cannot allocate it from the pool.

We also fix a memory leak that's caused by allocating tensor without wrapping it with RAIIAtenTensor.

In python wrapper for JIT inductor, `codegen_alloc_from_pool` doesn't actually write the alloc lines to wrapper, it just returns the string to alloc. However, in cpp_wrapper, `codegen_alloc_from_pool` actually write to the wrapper. Specifically, it writes the following and returns string `RAIIAtenTensorHandle`.

```
AtenTensorHandle handle_name;
AOTI_TORCH_ERROR_CODE_CHECK(aoti_torch__alloc_from_pool(....);
```

This is bug prune. If you write aoti_torch__alloc_from_pool lines, you must write the RAIIAtenTensorHandle as well, otherwise you get memory leaks.

We remove the alloc_from_pool call from codegen_create, because this doesn't work for AOTI. In python wrapper, we can generate the same alloc_from_pool variable name for the same block, but cpp_wrapper will generate a different variable name for each call to alloc_from_pool.

Test Plan:
```
python test/inductor/test_memory_planning.py
```

Rollback Plan:

Differential Revision: D79603119

Contributor

facebook-github-bot commented Aug 8, 2025

This pull request was exported from Phabricator. Differential Revision: D79603119


          Fix unbacked symint in inductor memory planning (pytorch#159839)

cc9d1ff

Summary:
Pull Request resolved: pytorch#159839

In memory planning, some allocation sizes involve unbacked symints. These
 unbacked symints are not known before they are computed in run time, so allocation pools that involve unbacked symints cannot be allocated until we have the values of the unbacked symints.

So we add a notion of `earliest_available` to Allocation nodes. If an allocation node has unbacked symint, it is available at only when its live range begin.

Then in AllocationPool, if a pool involves an Allocation node that has an earliest available time, we restrict its life range.

If a block's earliest available time is later than a pool's life range's start time, we cannot allocate it from the pool.

We also fix a memory leak that's caused by allocating tensor without wrapping it with RAIIAtenTensor.

In python wrapper for JIT inductor, `codegen_alloc_from_pool` doesn't actually write the alloc lines to wrapper, it just returns the string to alloc. However, in cpp_wrapper, `codegen_alloc_from_pool`  actually write to the wrapper. Specifically, it writes the following and returns string `RAIIAtenTensorHandle`.

```
AtenTensorHandle handle_name;
AOTI_TORCH_ERROR_CODE_CHECK(aoti_torch__alloc_from_pool(....);
```

This is bug prune. If you write aoti_torch__alloc_from_pool lines, you must write the RAIIAtenTensorHandle as well, otherwise you get memory leaks.

We remove the alloc_from_pool call from codegen_create, because this doesn't work for AOTI. In python wrapper, we can generate the same alloc_from_pool variable name for the same block, but cpp_wrapper will generate a different variable name for each call to alloc_from_pool.

Test Plan:
```
 python test/inductor/test_memory_planning.py
```

Rollback Plan:

Differential Revision: D79603119

Contributor

facebook-github-bot commented Aug 8, 2025

This pull request was exported from Phabricator. Differential Revision: D79603119

yushangdi force-pushed the export-D79603119 branch from 1ced89c to cc9d1ff Compare

August 8, 2025 17:04

yushangdi requested a review from eellison

August 9, 2025 00:58

jansel approved these changes

View reviewed changes

Contributor

facebook-github-bot commented Aug 11, 2025

@pytorchbot merge

(Initiating merge automatically since Phabricator Diff has merged)

pytorchmergebot added the merging label

Collaborator

pytorchmergebot commented Aug 11, 2025

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging

Check the merge workflow status
here

pytorchmergebot closed this in

9ccd0f5

pytorchmergebot added Merged and removed merging labels

samanklesaria pushed a commit to samanklesaria/pytorch that referenced this pull request


          Fix unbacked symint and memory leak in inductor memory planning (pyto…

594e362

…rch#159839)

Summary:

In memory planning, some allocation sizes involve unbacked symints. These unbacked symints are not known before they are computed in run time, so **allocation pools that involve unbacked symints cannot be allocated until we have the values of the unbacked symints** .

So we add a notion of `earliest_available` to Allocation nodes. If an allocation node has unbacked symint, it is available at only when its live range begin.

Then in AllocationPool, if a pool involves an Allocation node that has an earliest available time, we restrict its life range.

If a block's earliest available time is later than a pool's life range's start time, we cannot allocate it from the pool.

We also fix a memory leak that's caused by allocating tensor without wrapping it with RAIIAtenTensor.

In python wrapper for JIT inductor, `codegen_alloc_from_pool` doesn't actually write the alloc lines to wrapper, it just returns the string to alloc. However, in cpp_wrapper, `codegen_alloc_from_pool`  actually write to the wrapper. Specifically, it writes the following and returns string `RAIIAtenTensorHandle`.

```
AtenTensorHandle handle_name;
AOTI_TORCH_ERROR_CODE_CHECK(aoti_torch__alloc_from_pool(....);
```

This is bug prune. **If you write aoti_torch__alloc_from_pool lines, you must write the RAIIAtenTensorHandle as well**, otherwise you get memory leaks.

We remove the alloc_from_pool call from codegen_create, because this doesn't work for AOTI. In python wrapper, we can generate the same alloc_from_pool variable name for the same block, but cpp_wrapper will generate a different variable name for each call to alloc_from_pool.

Test Plan:
```
 python test/inductor/test_memory_planning.py
```

Rollback Plan:

Differential Revision: D79603119

Pull Request resolved: pytorch#159839
Approved by: https://github.com/jansel

markc-614 pushed a commit to markc-614/pytorch that referenced this pull request


          Fix unbacked symint and memory leak in inductor memory planning (pyto…

2b01e72

…rch#159839)

Summary:

In memory planning, some allocation sizes involve unbacked symints. These unbacked symints are not known before they are computed in run time, so **allocation pools that involve unbacked symints cannot be allocated until we have the values of the unbacked symints** .

So we add a notion of `earliest_available` to Allocation nodes. If an allocation node has unbacked symint, it is available at only when its live range begin.

Then in AllocationPool, if a pool involves an Allocation node that has an earliest available time, we restrict its life range.

If a block's earliest available time is later than a pool's life range's start time, we cannot allocate it from the pool.

We also fix a memory leak that's caused by allocating tensor without wrapping it with RAIIAtenTensor.

In python wrapper for JIT inductor, `codegen_alloc_from_pool` doesn't actually write the alloc lines to wrapper, it just returns the string to alloc. However, in cpp_wrapper, `codegen_alloc_from_pool`  actually write to the wrapper. Specifically, it writes the following and returns string `RAIIAtenTensorHandle`.

```
AtenTensorHandle handle_name;
AOTI_TORCH_ERROR_CODE_CHECK(aoti_torch__alloc_from_pool(....);
```

This is bug prune. **If you write aoti_torch__alloc_from_pool lines, you must write the RAIIAtenTensorHandle as well**, otherwise you get memory leaks.

We remove the alloc_from_pool call from codegen_create, because this doesn't work for AOTI. In python wrapper, we can generate the same alloc_from_pool variable name for the same block, but cpp_wrapper will generate a different variable name for each call to alloc_from_pool.

Test Plan:
```
 python test/inductor/test_memory_planning.py
```

Rollback Plan:

Differential Revision: D79603119

Pull Request resolved: pytorch#159839
Approved by: https://github.com/jansel

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ciflow/inductor ciflow/trunk fb-exported Merged module: inductor release notes: inductor (aoti) topic: not user facing