[MPS] Fix memory leak by malfet · Pull Request #142052 · pytorch/pytorch · GitHub
Contributor

@malfet malfet commented Dec 4, 2024

Stack from ghstack (oldest at bottom):

`NSProcessInfo` was allocated inside an autorelease pool but was never added to the pool, so the instance leaked.

Test plan: `leaks --atExit -- ./bin/mps_test_print`

Before the fix, `leaks` reported the following:

leaks Report Version: 4.0, multi-line stacks
Process 30066: 39595 nodes malloced for 5034 KB
Process 30066: 7 leaks for 448 total leaked bytes.

STACK OF 1 INSTANCE OF 'ROOT LEAK: <NSProcessInfo>':
29  dyld                                  0x197a94274 start + 2840
28  mps_test_print                        0x10224440c main + 68
27  mps_test_print                        0x1022733e4 testing::UnitTest::Run() + 124
26  mps_test_print                        0x102273468 bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) + 80
25  mps_test_print                        0x102273cac testing::internal::UnitTestImpl::RunAllTests() + 1588
24  mps_test_print                        0x102262990 testing::TestSuite::Run() + 1032
23  mps_test_print                        0x1022616e4 testing::TestInfo::Run() + 960
22  mps_test_print                        0x1022601b8 testing::Test::Run() + 812
21  mps_test_print                        0x10226025c void testing::internal::HandleExceptionsInMethodIfSupported<testing::TestSuite, void>(testing::TestSuite*, void (testing::TestSuite::*)(), char const*) + 80
20  mps_test_print                        0x102240f88 MPSPrintTest_PrintFloatMatrix_Test::TestBody() + 88
19  mps_test_print                        0x1022414f4 torch::randn(c10::ArrayRef<long long>, c10::TensorOptions) + 72
18  libtorch_cpu.dylib                    0x10de1cb34 at::_ops::randn::call(c10::ArrayRef<c10::SymInt>, std::__1::optional<c10::ScalarType>, std::__1::optional<c10::Layout>, std::__1::optional<c10::Device>, std::__1::optional<bool>) + 280
17  libtorch_cpu.dylib                    0x10de1cf1c at::_ops::randn::redispatch(c10::DispatchKeySet, c10::ArrayRef<c10::SymInt>, std::__1::optional<c10::ScalarType>, std::__1::optional<c10::Layout>, std::__1::optional<c10::Device>, std::__1::optional<bool>) + 152
16  libtorch_cpu.dylib                    0x10d9b1078 at::native::randn(c10::ArrayRef<long long>, std::__1::optional<c10::ScalarType>, std::__1::optional<c10::Layout>, std::__1::optional<c10::Device>, std::__1::optional<bool>) + 60
15  libtorch_cpu.dylib                    0x10d9b1220 at::native::randn(c10::ArrayRef<long long>, std::__1::optional<at::Generator>, std::__1::optional<c10::ScalarType>, std::__1::optional<c10::Layout>, std::__1::optional<c10::Device>, std::__1::optional<bool>) + 256
14  libtorch_cpu.dylib                    0x10e0151f8 at::_ops::normal_::call(at::Tensor&, double, double, std::__1::optional<at::Generator>) + 476
13  libtorch_cpu.dylib                    0x10f08ceac c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<at::Tensor& (at::Tensor&, double, double, std::__1::optional<at::Generator>), &at::(anonymous namespace)::(anonymous namespace)::wrapper_MPS__normal_(at::Tensor&, double, double, std::__1::optional<at::Generator>)>, at::Tensor&, c10::guts::typelist::typelist<at::Tensor&, double, double, std::__1::optional<at::Generator>>>, at::Tensor& (at::Tensor&, double, double, std::__1::optional<at::Generator>)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor&, double, double, std::__1::optional<at::Generator>) + 84
12  libtorch_cpu.dylib                    0x10f037674 at::(anonymous namespace)::(anonymous namespace)::wrapper_MPS__normal_(at::Tensor&, double, double, std::__1::optional<at::Generator>) + 72
11  libtorch_cpu.dylib                    0x111d8bde8 at::native::normal_mps_(at::Tensor&, double, double, std::__1::optional<at::Generator>) + 132
10  libtorch_cpu.dylib                    0x111d8c334 at::native::mps::normal_mps_impl(at::Tensor&, double, double, std::__1::optional<at::Tensor> const&, std::__1::optional<at::Tensor> const&, std::__1::optional<at::Generator>, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>) + 884
9   libtorch_cpu.dylib                    0x111d8b8d8 at::Tensor& at::native::mps::random_mps_impl<double>(at::Tensor&, double, double, std::__1::optional<at::Tensor> const&, std::__1::optional<at::Tensor> const&, MPSGraphRandomDistribution, std::__1::optional<at::Generator>, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, MPSGraphTensor* (at::native::mps::RandomCachedGraph*, MPSGraphTensor*) block_pointer) + 2508
8   libtorch_cpu.dylib                    0x111d453bc at::native::mps::Placeholder::Placeholder(MPSGraphTensor*, at::Tensor const&, NSArray<NSNumber*>*, bool, MPSDataType, bool) + 5120
7   libtorch_cpu.dylib                    0x111d2dbc8 at::mps::MPSDevice::isMacOS13Plus(at::mps::MacOSVersion) const + 404
6   libtorch_cpu.dylib                    0x111d2ddf0 at::mps::MPSDevice::isMacOS13Plus(at::mps::MacOSVersion) const::$_0::operator()(int, int) const + 48
5   libobjc.A.dylib                       0x197a7b3f4 objc_alloc_init + 80
4   com.apple.Foundation                  0x19995fbe4 +[NSProcessInfo alloc] + 112
3   com.apple.Foundation                  0x19995faec +[NSProcessInfo allocWithZone:] + 120
2   libobjc.A.dylib                       0x197a49ddc _objc_rootAllocWithZone + 48
1   libsystem_malloc.dylib                0x197c3baf8 _calloc + 88
0   libsystem_malloc.dylib                0x197c4e9bc _malloc_zone_calloc_instrumented_or_legacy + 128
====
    1 (64 bytes) ROOT LEAK: <NSProcessInfo 0x102ce4de0> [64]

After the fix, the test run finishes with no leaks reported:

Process 29875 is not debuggable. Due to security restrictions, leaks can only show or save contents of readonly memory of restricted processes.

Process:         mps_test_print [29875]
Path:            /Users/USER/*/mps_test_print
Load Address:    0x10223c000
Identifier:      mps_test_print
Version:         0
Code Type:       ARM64
Platform:        macOS
Parent Process:  leaks [29874]

Date/Time:       2024-12-04 07:43:15.287 -0800
Launch Time:     2024-12-04 07:43:14.400 -0800
OS Version:      macOS 15.1.1 (24B2091)
Report Version:  7
Analysis Tool:   /usr/bin/leaks

Physical footprint:         172.0M
Physical footprint (peak):  234.1M
Idle exit:                  untracked
----

leaks Report Version: 4.0, multi-line stacks
Process 29875: 39508 nodes malloced for 5021 KB
Process 29875: 0 leaks for 0 total leaked bytes.
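
The before/after comparison above comes down to the summary line that `leaks` prints. The following is a minimal sketch, not part of this PR, of how that line could be checked from a script. The helper name `parse_leaks_summary` is hypothetical, and the regular expression assumes the summary format exactly as it appears in the two reports quoted here.

```python
import re
from typing import NamedTuple, Optional

# Summary line format as seen in the reports quoted in this PR, e.g.
#   "Process 30066: 7 leaks for 448 total leaked bytes."
# Treating this format as stable is an assumption based on those reports only.
_SUMMARY_RE = re.compile(
    r"^Process (?P<pid>\d+): (?P<leaks>\d+) leaks? "
    r"for (?P<bytes>\d+) total leaked bytes\.$",
    re.MULTILINE,
)


class LeakSummary(NamedTuple):
    pid: int
    leak_count: int
    leaked_bytes: int


def parse_leaks_summary(report: str) -> Optional[LeakSummary]:
    """Return the leak summary found in a `leaks` report, or None if absent."""
    match = _SUMMARY_RE.search(report)
    if match is None:
        return None
    return LeakSummary(
        int(match["pid"]), int(match["leaks"]), int(match["bytes"])
    )


# Excerpts copied from the "before" and "after" reports in this PR.
before = """leaks Report Version: 4.0, multi-line stacks
Process 30066: 39595 nodes malloced for 5034 KB
Process 30066: 7 leaks for 448 total leaked bytes.
"""
after = """leaks Report Version: 4.0, multi-line stacks
Process 29875: 39508 nodes malloced for 5021 KB
Process 29875: 0 leaks for 0 total leaked bytes.
"""

print(parse_leaks_summary(before))
print(parse_leaks_summary(after))
```

A check of this kind could fail a local run whenever `leak_count` is nonzero. Whether that suits CI is a separate question that this PR does not address.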

@malfet malfet requested a review from kulinseth as a code owner December 4, 2024 15:57
pytorch-bot bot commented Dec 4, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/142052

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 9403bd2 with merge base 61dc5e9:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added the `ciflow/mps` (Run MPS tests (subset of trunk)) and `release notes: mps` (Release notes category) labels Dec 4, 2024
pytorchmergebot pushed a commit that referenced this pull request Dec 4, 2024
The leak is fixed by releasing the retained `id<MTLFunction>` and `id<MTLComputePipelineState>` objects.
Note that the `id<MTLLibrary>` objects associated with the class are currently leaked. This is by design: all dynamic shader allocations should use `DynamicMetalShaderLibrary`.

Test plan: `leaks --atExit -- ./bin/mps_test_metal_library`

Before:
```
STACK OF 1 INSTANCE OF 'ROOT LEAK: <_MTLFunctionInternal>':
18  dyld                                  0x197a94274 start + 2840
17  mps_test_metal_library                0x1002cb420 main + 68
16  mps_test_metal_library                0x1002fa388 testing::UnitTest::Run() + 124
15  mps_test_metal_library                0x1002fa40c bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) + 80
14  mps_test_metal_library                0x1002fac50 testing::internal::UnitTestImpl::RunAllTests() + 1588
13  mps_test_metal_library                0x1002e9934 testing::TestSuite::Run() + 1032
12  mps_test_metal_library                0x1002e8688 testing::TestInfo::Run() + 960
11  mps_test_metal_library                0x1002e715c testing::Test::Run() + 812
10  mps_test_metal_library                0x1002e7200 void testing::internal::HandleExceptionsInMethodIfSupported<testing::TestSuite, void>(testing::TestSuite*, void (testing::TestSuite::*)(), char const*) + 80
9   mps_test_metal_library                0x1002c5518 MPSTestMetalLibrary_ArangeShader_Test::TestBody() + 420
8   libtorch_cpu.dylib                    0x10fdd3804 at::native::mps::MetalShaderLibrary::getKernelFunction(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&) + 56
7   libtorch_cpu.dylib                    0x10fdd3394 at::native::mps::MetalShaderLibrary::getLibraryPipelineState(id<MTLLibrary>, std::__1::basic_string<char, id<MTLLibrary>::char_traits<char>, id<MTLLibrary>::allocator<char>> const&) + 268
6   com.apple.Metal                       0x1a2be43b4 -[_MTLLibrary newFunctionWithName:] + 28
5   com.apple.Metal                       0x1a2be4498 -[_MTLLibrary newFunctionWithNameInternal:] + 148
4   com.apple.Metal                       0x1a2be4580 MTLLibraryContainer::functionWithName(NSString*, id<MTLDevice>) + 68
3   com.apple.Metal                       0x1a2be4724 MTLLibraryDataWithArchive::newFunction(NSString*, id<MTLDevice>) + 368
2   libobjc.A.dylib                       0x197a49ddc _objc_rootAllocWithZone + 48
1   libsystem_malloc.dylib                0x197c3baf8 _calloc + 88
0   libsystem_malloc.dylib                0x197c4e9bc _malloc_zone_calloc_instrumented_or_legacy + 128
====
    2 (592 bytes) ROOT LEAK: <_MTLFunctionInternal 0x1325e5550> [448]
       1 (144 bytes) _functionQueue --> <dispatch_queue_t (serial) 0x13254c340> [144]  "function queue" (from Metal)
```
After:
```
Process:         mps_test_metal_library [30687]
Path:            /Users/USER/*/mps_test_metal_library
Load Address:    0x100f74000
Identifier:      mps_test_metal_library
Version:         0
Code Type:       ARM64
Platform:        macOS
Parent Process:  leaks [30686]

Date/Time:       2024-12-04 07:57:01.020 -0800
Launch Time:     2024-12-04 07:56:59.030 -0800
OS Version:      macOS 15.1.1 (24B2091)
Report Version:  7
Analysis Tool:   /usr/bin/leaks

Physical footprint:         177.2M
Physical footprint (peak):  236.5M
Idle exit:                  untracked
----

leaks Report Version: 4.0, multi-line stacks
Process 30687: 40691 nodes malloced for 5575 KB
Process 30687: 0 leaks for 0 total leaked bytes.
```
Pull Request resolved: #142053
Approved by: https://github.com/manuelcandales
ghstack dependencies: #142052
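
To see at a glance which Objective-C class a report blames, the `ROOT LEAK` lines can be tallied. The following is a minimal sketch, not part of either PR. The helper name `leaked_bytes_by_class` is hypothetical, and the pattern assumes the line format shown in the two reports quoted in this conversation.

```python
import re
from collections import Counter

# Root-leak line format as seen in the reports quoted in this conversation, e.g.
#   "    1 (64 bytes) ROOT LEAK: <NSProcessInfo 0x102ce4de0> [64]"
# Treating this format as stable is an assumption based on those reports only.
_ROOT_LEAK_RE = re.compile(
    r"^[ \t]*(?P<nodes>\d+) \((?P<bytes>\d+) bytes\) "
    r"ROOT LEAK: <(?P<cls>\S+) 0x[0-9a-fA-F]+>",
    re.MULTILINE,
)


def leaked_bytes_by_class(report: str) -> Counter:
    """Sum the parenthesized byte counts of ROOT LEAK lines, keyed by class."""
    totals: Counter = Counter()
    for match in _ROOT_LEAK_RE.finditer(report):
        totals[match["cls"]] += int(match["bytes"])
    return totals


# Lines copied from the two "before" reports. In the second entry the
# parenthesized total (592) equals the root node (448) plus its child (144).
sample = """====
    1 (64 bytes) ROOT LEAK: <NSProcessInfo 0x102ce4de0> [64]
====
    2 (592 bytes) ROOT LEAK: <_MTLFunctionInternal 0x1325e5550> [448]
       1 (144 bytes) _functionQueue --> <dispatch_queue_t (serial) 0x13254c340> [144]  "function queue" (from Metal)
"""

totals = leaked_bytes_by_class(sample)
for cls, size in totals.most_common():
    print(f"{cls}: {size} bytes")
```

Child nodes such as `_functionQueue` are deliberately not matched, since their bytes already appear in the parent's parenthesized total in the sample above.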
pobin6 pushed a commit to pobin6/pytorch that referenced this pull request Dec 5, 2024
Pull Request resolved: pytorch#142052
Approved by: https://github.com/manuelcandales
pobin6 pushed a commit to pobin6/pytorch that referenced this pull request Dec 5, 2024
Pull Request resolved: pytorch#142053
Approved by: https://github.com/manuelcandales
ghstack dependencies: pytorch#142052
AmdSampsa pushed a commit to AmdSampsa/pytorch that referenced this pull request Dec 9, 2024
Pull Request resolved: pytorch#142052
Approved by: https://github.com/manuelcandales
AmdSampsa pushed a commit to AmdSampsa/pytorch that referenced this pull request Dec 9, 2024
Pull Request resolved: pytorch#142053
Approved by: https://github.com/manuelcandales
ghstack dependencies: pytorch#142052
@github-actions github-actions bot deleted the gh/malfet/74/head branch January 4, 2025 02:06
Labels

`ciflow/mps` (Run MPS tests (subset of trunk)), Merged, `release notes: mps` (Release notes category)

3 participants