KEMBAR78
Per-thread flamegraph option in JFR heatmap converter by fandreuz · Pull Request #1414 · async-profiler/async-profiler · GitHub
Skip to content

Conversation

fandreuz
Copy link
Contributor

Description

In this PR I introduce a new feature in the JFR heatmap converter to have an artificial frame at the base of the flamegraph containing the thread name where the stacktrace happened.

Related issues

#1316

Motivation and context

Motivation in the original issue.

How has this been tested?

Manual testing.

image

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

}

private Integer getMethodIndex(MethodKey key) {
return methodCache.computeIfAbsent(key, this::makeMethod);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good. Now comes the trick: instead of this::makeMethod pass a lambda/method reference cached in final Function<MethodKey, Integer> field, and you'll get 5% performance improvement out of thin air.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively, you can replace computeIfAbsent with get+put pair without lambda and get the same effect.

Copy link
Contributor Author

@fandreuz fandreuz Aug 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cb506dd

I had a quick look at the flamegraph and it looks a tiny bit faster, but why can't the JVM make this optimization on its own? I don't see the difference between a final Function<MethodKey, Integer> and the method reference. Is it just the additional reference to the class?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but why can't the JVM make this optimization on its own

Function object created by this::someMethod depends on the instance. JVM does do per-instance caching. I can understand why: let's say there are 10 method references in the code - this would require JVM to add 10 extra synthetic fields, which may or may not be used, for every instance of an otherwise small object.

So, it's up to developer to decide whether to cache non-static lambdas or not.

@fandreuz fandreuz force-pushed the per-thread-flamegraph branch from c4569a0 to cb506dd Compare August 12, 2025 08:10
@fandreuz fandreuz force-pushed the per-thread-flamegraph branch from cb506dd to 041cfb7 Compare August 12, 2025 11:03
stackTracesCache.put(id, stackTracesRemap.index(cachedStackTrace, size));
}

private Integer getMethodIndex(MethodKey key) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why Integer? Return value is always used as int.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return methodIdx;
}

private Integer makeMethod(MethodKey key) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

int. Or just inline this function in the above method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

private final MethodKeyType keyType;

public MethodKey(MethodKeyType keyType, long methodId, int location, byte type, boolean firstInStack) {
if (type < 0) throw new IllegalArgumentException("Unexpected type: " + type);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unnecessary limitation. Just replace (long) type with (type & 0xffL).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

}

public void beforeChunk() {
state.methodsCache.clear();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like this corrupted frames in multi-chunk recordings.
Cleaning method cache between chunks is essential.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How did you find this out? I converted a 4gb JFR and the output looked ok. I'd assume it contains more than one chunk, will check

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Works like a charm now, thanks!

@fandreuz fandreuz force-pushed the per-thread-flamegraph branch from 0b9e7a4 to 568425b Compare August 13, 2025 07:51
Comment on lines 55 to 56
jfr.stackTraces.forEach(new Dictionary.Visitor<StackTrace>() {
@Override
public void visit(long key, StackTrace trace) {
heatmap.beforeChunk();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for each stacktrace??

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dumb mistake, sorry. 9d970af

fandreuz and others added 2 commits August 13, 2025 13:07
Signed-off-by: Andrei Pangin <1749416+apangin@users.noreply.github.com>
@apangin apangin merged commit 89ead82 into async-profiler:master Aug 13, 2025
19 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants