Fix memory leak in `ModuleTracker` #141960

danthe3rd · 2024-12-03T12:27:12Z

Thanks @drisspg and @albanD for finding the fix
cc @pragupta

TEST PLAN

import gc
import torch
import torch.nn as nn
from torch.utils.module_tracker import ModuleTracker


class MyModel(nn.Module):
    def forward(self, x):
        return x * x

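print(f"torch=={torch.__version__}")
m = MyModel()
m.cuda()
m.to(torch.bfloat16)
mt = ModuleTracker()
for i in range(1000):
    if i % 100 == 0:
        # Collect garbage first so the reading only counts memory that is still referenced.
        gc.collect()
        # A leak shows up here as a value that keeps growing across iterations.
        print("memory_allocated:", torch.cuda.memory_allocated())
    x = torch.randn([128, 256], device="cuda", dtype=torch.bfloat16, requires_grad=True)
    # Run the forward pass under the tracker; the output is intentionally discarded.
    with mt:
        m(x)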

pytorch-bot · 2024-12-03T12:27:16Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/141960

✅ No Failures

As of commit b6227e4 with merge base 78543e6:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

danthe3rd · 2024-12-03T12:32:31Z

@pytorchbot label "topic: bug fixes"

albanD

We could make one of the tests in test/test_module_tracker.py run on a CUDA device and enable leak detection on it to catch this. That might be a bit too much for this PR, though; it sounds OK as is if you don't have time.
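A minimal sketch of the kind of leak check suggested above, assuming a generic "warm up, measure, repeat, measure again" pattern. The names `measure_growth`, `Payload`, `LeakyTracker`, and `CleanTracker` are hypothetical stand-ins invented for illustration; they are not part of PyTorch and do not describe how `ModuleTracker` actually leaked or how PyTorch's own test suite detects leaks. The sketch uses only the standard library so it runs without a GPU.

```python
import gc
import weakref
from typing import Callable


def measure_growth(
    step: Callable[[], None],
    probe: Callable[[], int],
    warmup: int = 10,
    iters: int = 100,
) -> int:
    """Return how much `probe()` grew across `iters` calls of `step()`."""
    # Warm up so one-time allocations are not counted as a leak.
    for _ in range(warmup):
        step()
    gc.collect()
    before = probe()
    for _ in range(iters):
        step()
    gc.collect()
    return probe() - before


class Payload:
    """Hypothetical stand-in for a tensor allocated on every iteration."""


# Weak references let us count live payloads without keeping them alive.
live_payloads = weakref.WeakSet()


def make_payload() -> Payload:
    payload = Payload()
    live_payloads.add(payload)
    return payload


class LeakyTracker:
    """Hypothetical tracker that keeps a strong reference to every input."""

    def __init__(self) -> None:
        self._seen = []

    def track(self, payload: Payload) -> None:
        self._seen.append(payload)


class CleanTracker:
    """Hypothetical tracker that does not retain its inputs."""

    def track(self, payload: Payload) -> None:
        pass


leaky = LeakyTracker()
clean = CleanTracker()

leaky_growth = measure_growth(
    lambda: leaky.track(make_payload()), lambda: len(live_payloads)
)
clean_growth = measure_growth(
    lambda: clean.track(make_payload()), lambda: len(live_payloads)
)

print("leaky growth:", leaky_growth)
print("clean growth:", clean_growth)
```

On CPython this reports a growth of 100 for the leaky tracker and 0 for the clean one. To apply the same pattern to the test plan in this PR, one could pass `torch.cuda.memory_allocated` as `probe` and a `step` that creates `x` and runs `with mt: m(x)`, then assert that the growth is zero. That variant needs a CUDA device and is not shown here.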

danthe3rd · 2024-12-03T16:15:33Z

@pytorchbot merge

pytorchmergebot · 2024-12-03T16:18:25Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).


@drisspg

Four commits referencing this pull request repeat the thanks line and test plan from the PR description as their commit message, followed by these trailers:

Pull Request resolved: pytorch#141960
Approved by: https://github.com/albanD

The third and fourth commits add "(cherry picked from commit 9125e91)". The fourth also ends with "Fixes #ISSUE_NUMBER" and "Co-authored-by: dan_the_3rd <43445237+danthe3rd@users.noreply.github.com>".

Fix memory leak in ModuleTracker

d20d768

danthe3rd requested a review from albanD December 3, 2024 12:27

pytorch-bot bot added the topic: bug fixes topic category label Dec 3, 2024

danthe3rd added the release notes: python_frontend python frontend release notes category label Dec 3, 2024

albanD approved these changes Dec 3, 2024

danthe3rd added 3 commits December 3, 2024 15:28

lint

aed283e

Update module_tracker.py

ea91eca

Update module_tracker.py

b6227e4

pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Dec 3, 2024

pytorchmergebot added the merging label Dec 3, 2024

pytorchmergebot added the Merged label Dec 3, 2024

pytorchmergebot closed this in 9125e91 Dec 3, 2024

pytorchmergebot removed the merging label Dec 3, 2024

pragupta mentioned this pull request Dec 4, 2024

ModuleTracker: Add explicit garbage collection #139214

Closed

github-actions bot deleted the dhaziza-mod-tracker-memleak branch January 3, 2025 02:07

pragupta mentioned this pull request Jan 16, 2025

[release/2.5] Fix memory leak in ModuleTracker (#141960) ROCm/pytorch#1841

Merged
