KEMBAR78
Skip last 10% allocations for leak detection by apangin · Pull Request #1299 · async-profiler/async-profiler · GitHub
Skip to content

Conversation

apangin
Copy link
Member

@apangin apangin commented May 18, 2025

Description

For native leak profiling, ignore allocations made in the last 10% of the profiling session.

Motivation and context

#1064 introduced the feature to create a flamegraph of native memory allocations. With --leak option, only those allocations are displayed that do not have matching free calls - such allocations are considered memory leaks. However, this algorithm is biased towards allocations made closer to the end of the profiling session, because such allocations are more likely to be freed only after the session has ended.

To avoid such a bias, it is proposed to simply ignore younger allocations. For example, if a profiling session runs for 10 minutes, allocations made during the last minute won't be included in the leak flamegraph.

How has this been tested?

Created an artificial memory leak by not closing DirectoryStream and ran

jfrconv --nativemem --leak --total --lines leak.jfr

Before the change:
leak-old

After the change:
leak-new


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@Baraa-Hasheesh
Copy link
Contributor

@apangin
Using a % could lead to a very large ignore window for long profiling sessions
For example if the profiling session is 1 hour => 60Minutes, we will ignore the last 6 minutes of allocations for memory leakage

At the same time using a static value like last minute could also lead to bad outcomes for short profiling sessions

I think we should have both a base static ignore window & % ignore window

We can define a base max ignore window of for example 10 seconds
And evaluate against the min value between the 10% & our base window

So for short profiling sessions we would use the % evaluation while for longer profiling sessions we would use the base max ignore window we define.

@apangin apangin changed the title Leak tail Skip last 10% allocations for leak detection May 18, 2025
@apangin
Copy link
Member Author

apangin commented May 18, 2025

if the profiling session is 1 hour => 60Minutes, we will ignore the last 6 minutes of allocations

On the other hand, profiler will collect useful allocation data for 54 minutes - that's a lot!
If there is a continuous leak, then finding a meaningful signal in 54 minutes is just as efficient as in 59 minutes, but in the latter case, the noise will be likely higher due to young allocations that have not been freed. The problem with a constant size tail is that there always be such allocations, no matter how long profiler runs, whereas with the fixed ratio, results accuracy increases with the duration of profiling session.

The approach proposed in this PR is simple and efficient. There is no point in complicating it without obvious benefits.

@fandreuz
Copy link
Contributor

@apangin As a user, I'd want to have this behavior documented, and to have the tail size configurable

@apangin
Copy link
Member Author

apangin commented May 19, 2025

@fandreuz I agree on the documentation point - will add a sentence.
As to making it configurable, I'm not sure. In my paradigm, software should just work well by default; the less things require manual configuration, the better. That said, we may add a knob in the future, if there is a demand for it.

@krk
Copy link
Contributor

krk commented May 19, 2025

There is a use case to record all allocations, in case the process is short-running or if this is executed as part of a CI, to verify lack of leaks. We could make the tail feature optiona, opt-out perhaps.

@apangin
Copy link
Member Author

apangin commented May 21, 2025

OK, made this configurable with --tail option. Updated docs.

@fandreuz
Copy link
Contributor

Looks good to me

@apangin apangin merged commit d89ab7a into master May 21, 2025
36 checks passed
@apangin apangin deleted the leak-tail branch May 21, 2025 12:28
roy-soumadipta pushed a commit to roy-soumadipta/async-profiler that referenced this pull request Jun 20, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants