[MPS] Fix memory leak #142052

malfet · 2024-12-04T15:57:27Z

Stack from ghstack (oldest at bottom):

`NSProcessInfo` was allocated inside an autorelease pool, but it was never added to that pool (it was not autoreleased), so draining the pool did not free it.
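
To make the failure mode concrete, here is a minimal C++ model of manual reference counting. It assumes the affected Objective-C code is compiled without ARC, which the leak report implies. `Object`, `AutoreleasePool`, `autorelease`, `leaky_query` and `fixed_query` are hypothetical names, not the Foundation API. The point it illustrates is that draining a pool releases only the objects that were autoreleased into it. An owned object created inside the pool scope survives unless it is autoreleased or released explicitly.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model of Objective-C manual reference counting (MRC), for
// illustration only; these are not Foundation/Objective-C runtime APIs.
struct Object {
  static int live;   // objects allocated but not yet freed
  int refcount = 1;  // like alloc/init: the caller owns one reference
  Object() { ++live; }
  ~Object() { --live; }
};
int Object::live = 0;

// Drop one reference; free the object when the last one goes away.
void release(Object* o) {
  assert(o->refcount > 0);
  if (--o->refcount == 0) delete o;
}

// An autorelease pool only drains objects that were explicitly added to it.
class AutoreleasePool {
 public:
  AutoreleasePool() : prev_(top()) { top() = this; }
  ~AutoreleasePool() {  // "drain": release each autoreleased object once
    for (Object* o : pending_) release(o);
    top() = prev_;
  }
  AutoreleasePool(const AutoreleasePool&) = delete;
  AutoreleasePool& operator=(const AutoreleasePool&) = delete;

  static AutoreleasePool*& top() {
    static AutoreleasePool* current = nullptr;
    return current;
  }
  void add(Object* o) { pending_.push_back(o); }

 private:
  AutoreleasePool* prev_;
  std::vector<Object*> pending_;
};

// Hand ownership of `o` to the innermost pool (like -[NSObject autorelease]).
Object* autorelease(Object* o) {
  assert(AutoreleasePool::top() != nullptr);
  AutoreleasePool::top()->add(o);
  return o;
}

// Buggy pattern: allocated inside the pool scope, but never added to it.
void leaky_query() {
  AutoreleasePool pool;
  Object* info = new Object();  // analogous to [[NSProcessInfo alloc] init]
  (void)info;                   // no release/autorelease: survives the drain
}

// Fixed pattern: the object is autoreleased, so the pool frees it on drain.
void fixed_query() {
  AutoreleasePool pool;
  Object* info = autorelease(new Object());  // analogous to [[[... alloc] init] autorelease]
  (void)info;
}
```

Calling `fixed_query()` leaves `Object::live` unchanged, while each call to `leaky_query()` increases it by one and leaves an unreachable object behind, the kind of allocation `leaks --atExit` reports as a root leak. In the Objective-C code, the corresponding fix would presumably be to autorelease the `NSProcessInfo` instance or release it explicitly after use.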

Test plan: leaks --atExit -- ./bin/mps_test_print

Before the fix, leaks reported the following:

leaks Report Version: 4.0, multi-line stacks
Process 30066: 39595 nodes malloced for 5034 KB
Process 30066: 7 leaks for 448 total leaked bytes.

STACK OF 1 INSTANCE OF 'ROOT LEAK: <NSProcessInfo>':
29  dyld                                  0x197a94274 start + 2840
28  mps_test_print                        0x10224440c main + 68
27  mps_test_print                        0x1022733e4 testing::UnitTest::Run() + 124
26  mps_test_print                        0x102273468 bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) + 80
25  mps_test_print                        0x102273cac testing::internal::UnitTestImpl::RunAllTests() + 1588
24  mps_test_print                        0x102262990 testing::TestSuite::Run() + 1032
23  mps_test_print                        0x1022616e4 testing::TestInfo::Run() + 960
22  mps_test_print                        0x1022601b8 testing::Test::Run() + 812
21  mps_test_print                        0x10226025c void testing::internal::HandleExceptionsInMethodIfSupported<testing::TestSuite, void>(testing::TestSuite*, void (testing::TestSuite::*)(), char const*) + 80
20  mps_test_print                        0x102240f88 MPSPrintTest_PrintFloatMatrix_Test::TestBody() + 88
19  mps_test_print                        0x1022414f4 torch::randn(c10::ArrayRef<long long>, c10::TensorOptions) + 72
18  libtorch_cpu.dylib                    0x10de1cb34 at::_ops::randn::call(c10::ArrayRef<c10::SymInt>, std::__1::optional<c10::ScalarType>, std::__1::optional<c10::Layout>, std::__1::optional<c10::Device>, std::__1::optional<bool>) + 280
17  libtorch_cpu.dylib                    0x10de1cf1c at::_ops::randn::redispatch(c10::DispatchKeySet, c10::ArrayRef<c10::SymInt>, std::__1::optional<c10::ScalarType>, std::__1::optional<c10::Layout>, std::__1::optional<c10::Device>, std::__1::optional<bool>) + 152
16  libtorch_cpu.dylib                    0x10d9b1078 at::native::randn(c10::ArrayRef<long long>, std::__1::optional<c10::ScalarType>, std::__1::optional<c10::Layout>, std::__1::optional<c10::Device>, std::__1::optional<bool>) + 60
15  libtorch_cpu.dylib                    0x10d9b1220 at::native::randn(c10::ArrayRef<long long>, std::__1::optional<at::Generator>, std::__1::optional<c10::ScalarType>, std::__1::optional<c10::Layout>, std::__1::optional<c10::Device>, std::__1::optional<bool>) + 256
14  libtorch_cpu.dylib                    0x10e0151f8 at::_ops::normal_::call(at::Tensor&, double, double, std::__1::optional<at::Generator>) + 476
13  libtorch_cpu.dylib                    0x10f08ceac c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<at::Tensor& (at::Tensor&, double, double, std::__1::optional<at::Generator>), &at::(anonymous namespace)::(anonymous namespace)::wrapper_MPS__normal_(at::Tensor&, double, double, std::__1::optional<at::Generator>)>, at::Tensor&, c10::guts::typelist::typelist<at::Tensor&, double, double, std::__1::optional<at::Generator>>>, at::Tensor& (at::Tensor&, double, double, std::__1::optional<at::Generator>)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor&, double, double, std::__1::optional<at::Generator>) + 84
12  libtorch_cpu.dylib                    0x10f037674 at::(anonymous namespace)::(anonymous namespace)::wrapper_MPS__normal_(at::Tensor&, double, double, std::__1::optional<at::Generator>) + 72
11  libtorch_cpu.dylib                    0x111d8bde8 at::native::normal_mps_(at::Tensor&, double, double, std::__1::optional<at::Generator>) + 132
10  libtorch_cpu.dylib                    0x111d8c334 at::native::mps::normal_mps_impl(at::Tensor&, double, double, std::__1::optional<at::Tensor> const&, std::__1::optional<at::Tensor> const&, std::__1::optional<at::Generator>, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>) + 884
9   libtorch_cpu.dylib                    0x111d8b8d8 at::Tensor& at::native::mps::random_mps_impl<double>(at::Tensor&, double, double, std::__1::optional<at::Tensor> const&, std::__1::optional<at::Tensor> const&, MPSGraphRandomDistribution, std::__1::optional<at::Generator>, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, MPSGraphTensor* (at::native::mps::RandomCachedGraph*, MPSGraphTensor*) block_pointer) + 2508
8   libtorch_cpu.dylib                    0x111d453bc at::native::mps::Placeholder::Placeholder(MPSGraphTensor*, at::Tensor const&, NSArray<NSNumber*>*, bool, MPSDataType, bool) + 5120
7   libtorch_cpu.dylib                    0x111d2dbc8 at::mps::MPSDevice::isMacOS13Plus(at::mps::MacOSVersion) const + 404
6   libtorch_cpu.dylib                    0x111d2ddf0 at::mps::MPSDevice::isMacOS13Plus(at::mps::MacOSVersion) const::$_0::operator()(int, int) const + 48
5   libobjc.A.dylib                       0x197a7b3f4 objc_alloc_init + 80
4   com.apple.Foundation                  0x19995fbe4 +[NSProcessInfo alloc] + 112
3   com.apple.Foundation                  0x19995faec +[NSProcessInfo allocWithZone:] + 120
2   libobjc.A.dylib                       0x197a49ddc _objc_rootAllocWithZone + 48
1   libsystem_malloc.dylib                0x197c3baf8 _calloc + 88
0   libsystem_malloc.dylib                0x197c4e9bc _malloc_zone_calloc_instrumented_or_legacy + 128
====
    1 (64 bytes) ROOT LEAK: <NSProcessInfo 0x102ce4de0> [64]

After the fix, the test run finishes with no leaks reported:

Process 29875 is not debuggable. Due to security restrictions, leaks can only show or save contents of readonly memory of restricted processes.

Process:         mps_test_print [29875]
Path:            /Users/USER/*/mps_test_print
Load Address:    0x10223c000
Identifier:      mps_test_print
Version:         0
Code Type:       ARM64
Platform:        macOS
Parent Process:  leaks [29874]

Date/Time:       2024-12-04 07:43:15.287 -0800
Launch Time:     2024-12-04 07:43:14.400 -0800
OS Version:      macOS 15.1.1 (24B2091)
Report Version:  7
Analysis Tool:   /usr/bin/leaks

Physical footprint:         172.0M
Physical footprint (peak):  234.1M
Idle exit:                  untracked
----

leaks Report Version: 4.0, multi-line stacks
Process 29875: 39508 nodes malloced for 5021 KB
Process 29875: 0 leaks for 0 total leaked bytes.

[ghstack-poisoned]

pytorch-bot · 2024-12-04T15:57:31Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/142052

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit the bot commands wiki or our office hours

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 9403bd2 with merge base 61dc5e9:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

Release `MetalShaderLibrary` cached resources by releasing the retained `id<MTLFunction>` and `id<MTLComputePipelineState>` objects.

Please note that the `id<MTLLibrary>` objects associated with the class are currently leaked by design; all dynamic shader allocations should use `DynamicMetalShaderLibrary`.

Test plan: `leaks --atExit -- ./bin/mps_test_metal_library`

Before:
```
STACK OF 1 INSTANCE OF 'ROOT LEAK: <_MTLFunctionInternal>':
18  dyld                                  0x197a94274 start + 2840
17  mps_test_metal_library                0x1002cb420 main + 68
16  mps_test_metal_library                0x1002fa388 testing::UnitTest::Run() + 124
15  mps_test_metal_library                0x1002fa40c bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) + 80
14  mps_test_metal_library                0x1002fac50 testing::internal::UnitTestImpl::RunAllTests() + 1588
13  mps_test_metal_library                0x1002e9934 testing::TestSuite::Run() + 1032
12  mps_test_metal_library                0x1002e8688 testing::TestInfo::Run() + 960
11  mps_test_metal_library                0x1002e715c testing::Test::Run() + 812
10  mps_test_metal_library                0x1002e7200 void testing::internal::HandleExceptionsInMethodIfSupported<testing::TestSuite, void>(testing::TestSuite*, void (testing::TestSuite::*)(), char const*) + 80
9   mps_test_metal_library                0x1002c5518 MPSTestMetalLibrary_ArangeShader_Test::TestBody() + 420
8   libtorch_cpu.dylib                    0x10fdd3804 at::native::mps::MetalShaderLibrary::getKernelFunction(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&) + 56
7   libtorch_cpu.dylib                    0x10fdd3394 at::native::mps::MetalShaderLibrary::getLibraryPipelineState(id<MTLLibrary>, std::__1::basic_string<char, id<MTLLibrary>::char_traits<char>, id<MTLLibrary>::allocator<char>> const&) + 268
6   com.apple.Metal                       0x1a2be43b4 -[_MTLLibrary newFunctionWithName:] + 28
5   com.apple.Metal                       0x1a2be4498 -[_MTLLibrary newFunctionWithNameInternal:] + 148
4   com.apple.Metal                       0x1a2be4580 MTLLibraryContainer::functionWithName(NSString*, id<MTLDevice>) + 68
3   com.apple.Metal                       0x1a2be4724 MTLLibraryDataWithArchive::newFunction(NSString*, id<MTLDevice>) + 368
2   libobjc.A.dylib                       0x197a49ddc _objc_rootAllocWithZone + 48
1   libsystem_malloc.dylib                0x197c3baf8 _calloc + 88
0   libsystem_malloc.dylib                0x197c4e9bc _malloc_zone_calloc_instrumented_or_legacy + 128
====
    2 (592 bytes) ROOT LEAK: <_MTLFunctionInternal 0x1325e5550> [448]
       1 (144 bytes) _functionQueue --> <dispatch_queue_t (serial) 0x13254c340> [144] "function queue" (from Metal)
```

After:
```
Process:         mps_test_metal_library [30687]
Path:            /Users/USER/*/mps_test_metal_library
Load Address:    0x100f74000
Identifier:      mps_test_metal_library
Version:         0
Code Type:       ARM64
Platform:        macOS
Parent Process:  leaks [30686]

Date/Time:       2024-12-04 07:57:01.020 -0800
Launch Time:     2024-12-04 07:56:59.030 -0800
OS Version:      macOS 15.1.1 (24B2091)
Report Version:  7
Analysis Tool:   /usr/bin/leaks

Physical footprint:         177.2M
Physical footprint (peak):  236.5M
Idle exit:                  untracked
----

leaks Report Version: 4.0, multi-line stacks
Process 30687: 40691 nodes malloced for 5575 KB
Process 30687: 0 leaks for 0 total leaked bytes.
```

Pull Request resolved: #142053
Approved by: https://github.com/manuelcandales
ghstack dependencies: #142052
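
The ownership rule behind this change can also be sketched in C++. This is a minimal model, not the actual `MetalShaderLibrary` code: `Handle`, `KernelCache` and `get` are hypothetical names. It assumes the cached objects come from `new...` methods such as `newFunctionWithName:`, which return owned (+1) references under manual reference counting, so whatever the cache retains, its destructor has to release.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for a retained Objective-C object under MRC
// (e.g. id<MTLFunction>); not the Metal API.
struct Handle {
  static int live;   // handles allocated but not yet freed
  int refcount = 1;  // a "new..." method returns an owned (+1) reference
  Handle() { ++live; }
  ~Handle() { --live; }
};
int Handle::live = 0;

// Drop one reference; free the handle when the last one goes away.
void release(Handle* h) {
  assert(h->refcount > 0);
  if (--h->refcount == 0) delete h;
}

// Hypothetical cache keyed by kernel name, holding one owned reference to a
// function object and one to its pipeline state.
class KernelCache {
 public:
  KernelCache() = default;
  KernelCache(const KernelCache&) = delete;
  KernelCache& operator=(const KernelCache&) = delete;

  // Create the pair on first lookup and keep the owned references.
  std::pair<Handle*, Handle*> get(const std::string& name) {
    auto it = cache_.find(name);
    if (it == cache_.end()) {
      Handle* function = new Handle();  // like newFunctionWithName:
      Handle* pipeline = new Handle();  // like newComputePipelineStateWithFunction:error:
      it = cache_.emplace(name, std::make_pair(function, pipeline)).first;
    }
    return it->second;
  }

  // The behavior being modeled: give back every reference the cache retained.
  ~KernelCache() {
    for (auto& entry : cache_) {
      release(entry.second.first);
      release(entry.second.second);
    }
  }

 private:
  std::map<std::string, std::pair<Handle*, Handle*>> cache_;
};
```

Repeated lookups of the same name reuse the cached pair, and destroying the cache returns `Handle::live` to its previous value. Library objects are deliberately left out of this model, since the commit notes that the `id<MTLLibrary>` instances are leaked by design.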

Update

9403bd2

[ghstack-poisoned]

malfet requested a review from kulinseth as a code owner December 4, 2024 15:57

pytorch-bot added the `ciflow/mps` (Run MPS tests (subset of trunk)) and `release notes: mps` (Release notes category) labels Dec 4, 2024

This was referenced Dec 4, 2024:

[MPS] Release MetalShaderLibrary cached resources #142053 (Closed)

[MPS] Add CompileShader method #141478 (Closed)

malfet requested review from Skylion007 and manuelcandales December 4, 2024 15:58

manuelcandales approved these changes Dec 4, 2024

pytorchmergebot closed this in e8200a5 Dec 4, 2024

pytorchmergebot added the Merged label Dec 4, 2024

github-actions bot deleted the gh/malfet/74/head branch January 4, 2025 02:06