AP can deadlock the JVM when used alongside tcmalloc

### Describe the bug

There appears to be a deadlock that reproduces reliably when tcmalloc (i.e. via `LD_PRELOAD`) and AP are used together.



![Image](https://github.com/user-attachments/assets/deb65da2-3ffd-42bf-9f48-d3c67c12ce42)

## AP side

The profiler calls glibc’s [dl_iterate_phdr()](https://man7.org/linux/man-pages/man3/dl_iterate_phdr.3.html), which takes a lock inside glibc to ensure that no library is unloaded with [dlclose()](https://man7.org/linux/man-pages/man3/dlclose.3.html) while we iterate through the symbols, i.e. it:

- locks `dl_load_write_lock`

- performs some [malloc()](https://man7.org/linux/man-pages/man3/free.3.html) calls in its callback, which may grow TCMalloc’s local cache and in turn reserve memory from the OS, locking `tcmalloc::Static::pageheap_lock_`
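
The pattern above can be sketched as follows. This is a minimal illustration assuming Linux with glibc, not async-profiler's actual code: `collect_name` and `list_loaded_objects` are hypothetical names. The point is that the `std::string` copy inside the callback goes through the process allocator (tcmalloc under `LD_PRELOAD`) while glibc still holds its loader lock.

```cpp
#include <link.h>   // dl_iterate_phdr, struct dl_phdr_info (glibc)

#include <cstddef>
#include <string>
#include <vector>

// Hypothetical callback: glibc invokes it once per loaded object while
// holding its internal loader lock.
static int collect_name(struct dl_phdr_info* info, size_t /*size*/, void* data) {
    auto* names = static_cast<std::vector<std::string>*>(data);
    // Copying the name allocates, so malloc runs under the loader lock.
    names->emplace_back(info->dlpi_name != nullptr ? info->dlpi_name : "");
    return 0;  // returning 0 continues the iteration
}

// Hypothetical helper: returns the names of all loaded ELF objects.
std::vector<std::string> list_loaded_objects() {
    std::vector<std::string> names;
    dl_iterate_phdr(collect_name, &names);
    return names;
}
```

With the default glibc allocator this is harmless. It only becomes a problem when the allocator itself can end up waiting for the loader lock, as described in the next section.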

## TCMalloc side

When TCMalloc actually reserves some memory from the OS:

- it locks `tcmalloc::Static::pageheap_lock_` to perform the memory allocations

- when this happens, it records the current stack trace for the heap profiler

- the heap profiler then fetches the stack trace, which goes through glibc’s ELF symbol lookups and locks `dl_load_write_lock` (this can perform some mallocs, but they do not re-trigger the heap profiler thanks to a [recursive call check](https://github.com/gperftools/gperftools/blob/gperftools-2.9.1/src/stacktrace_libunwind-inl.h#L61), so tcmalloc cannot deadlock itself)
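
Taken together, the two sides form a classic lock-order inversion: one thread takes `dl_load_write_lock` and then wants `pageheap_lock_`, while another takes them in the opposite order. The sketch below models this with two plain `std::mutex` objects standing in for the real locks. All names in it are hypothetical, and it uses `try_lock()` so that it reports the cycle instead of actually hanging.

```cpp
#include <atomic>
#include <mutex>
#include <thread>

// Hypothetical stand-ins for the two real locks named in this report.
static std::mutex loader_lock;    // models glibc's dl_load_write_lock
static std::mutex pageheap_lock;  // models tcmalloc::Static::pageheap_lock_

struct InversionResult {
    bool profiler_blocked;   // profiler thread could not get pageheap_lock
    bool allocator_blocked;  // allocator thread could not get loader_lock
};

// Takes `first`, waits until both threads hold their first lock, then probes
// `second` with try_lock(). Returns true if `second` was unavailable.
static bool would_block(std::mutex& first, std::mutex& second,
                        std::atomic<int>& holding, std::atomic<int>& probed) {
    std::lock_guard<std::mutex> hold(first);
    holding.fetch_add(1);
    while (holding.load() < 2) std::this_thread::yield();
    bool acquired = second.try_lock();
    if (acquired) second.unlock();
    probed.fetch_add(1);
    // Keep `first` held until the other thread has probed as well.
    while (probed.load() < 2) std::this_thread::yield();
    return !acquired;
}

InversionResult simulate_inversion() {
    std::atomic<int> holding{0};
    std::atomic<int> probed{0};
    bool profiler_blocked = false;
    bool allocator_blocked = false;
    // AP side: loader lock first, then the allocator's page heap lock.
    std::thread profiler([&] {
        profiler_blocked = would_block(loader_lock, pageheap_lock, holding, probed);
    });
    // TCMalloc side: page heap lock first, then the loader lock.
    std::thread allocator([&] {
        allocator_blocked = would_block(pageheap_lock, loader_lock, holding, probed);
    });
    profiler.join();
    allocator.join();
    return {profiler_blocked, allocator_blocked};
}
```

If both fields come back `true`, each thread was holding the lock the other one needed. With blocking `lock()` calls in place of `try_lock()`, that state would be the deadlock. Build with `-pthread`.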

### Reproduction Steps

Using `LD_PRELOAD` with tcmalloc on x86_64, specifically so that tcmalloc relies on the deadlock-prone `libunwind`-based stacktrace mechanism, should reproduce the deadlock at a fairly high rate. See https://github.com/gperftools/gperftools/wiki/gperftools'-stacktrace-capturing-methods-and-their-issues

### Async-profiler version

Latest and, in theory, every version since the dl_iterate locks were introduced

### Environment details

x86_64 machine, `libunwind`-based stacktrace mechanism for tcmalloc, tcmalloc via `LD_PRELOAD`


Thanks to @trazfr for identifying & root-causing this internally. For our Datadog fork of AP, we are looking at writing a custom allocator for use within the callback function + eliminating the explicit `malloc` calls used by the CodeCache.
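
To illustrate the custom allocator idea, here is a minimal sketch of a fixed-capacity bump allocator that a callback could use instead of `malloc`. This is an assumption about what such an allocator might look like, not the implementation in the Datadog fork. `BumpArena` and its methods are hypothetical names, and the sketch is not thread-safe.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical fixed-capacity bump allocator. Its storage is part of the
// object, so allocate() never calls malloc and never takes an allocator lock.
template <std::size_t Capacity>
class BumpArena {
  public:
    // Returns nullptr when the arena is exhausted rather than falling back
    // to malloc.
    void* allocate(std::size_t size) {
        const std::size_t align = alignof(std::max_align_t);
        const std::size_t start = (used_ + align - 1) / align * align;
        if (start > Capacity || size > Capacity - start) return nullptr;
        used_ = start + size;
        return buffer_ + start;
    }

    // Copies a C string into the arena; returns nullptr if it does not fit.
    const char* copy_string(const char* s) {
        const std::size_t len = std::strlen(s) + 1;
        char* dst = static_cast<char*>(allocate(len));
        if (dst != nullptr) std::memcpy(dst, s, len);
        return dst;
    }

    std::size_t used() const { return used_; }

  private:
    alignas(std::max_align_t) unsigned char buffer_[Capacity];
    std::size_t used_ = 0;
};
```

A callback passed to `dl_iterate_phdr()` could copy library names into such an arena and defer any heap-allocating work until after the iteration returns and the loader lock has been released.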
