KEMBAR78
Use JavaFrameAnchor to find top Java frame with cstack=vm by apangin · Pull Request #1517 · async-profiler/async-profiler · GitHub
Skip to content

Conversation

apangin
Copy link
Member

@apangin apangin commented Oct 1, 2025

Description

If a thread has JavaFrameAnchor set, prefer getting PC/SP/FP from the anchor, since the anchor's values are always reliable.

Related issues

#1317, #1179, #1513

Motivation and context

Async-profiler may fail to unwind native part of the stack when a library is compiled without frame pointers AND with no DWARF unwinding info in .eh_frame section (typically on Alpine Linux). In this case, stack walker may use incorrect values of SP/FP once it reaches the topmost Java frame or it can miss the top Java frame completely. This in turn results in a large number of truncated stack traces with [unknown] or [break_interpreted] frame.

The solution is to use JavaFrameAnchor saved in the VM's JavaThread structure to discover the top Java frame even when the native part of the stack cannot be traversed.

How has this been tested?

Tested manually on Liberica JDK 11 with sample programs provided in #1179.
Also, updated stackwalker integration tests: previously they checked that stack traces were truncated, now they check that stack traces are complete.


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

// retry stack walking from the anchor
if (anchor != NULL && anchor->getFrame(pc, sp, fp)) {
anchor = NULL;
while (depth > 0 && frames[depth - 1].method_id == NULL) depth--; // pop unknown frames
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • shouldn't we also pop error frames as well -> top frame can be "unknown_nmethod"
  • I think we should add an error frame to indicate that we recovered from anchor (I.E. we unwinding failed in native layer before reaching the first proper java frame)

Copy link
Member Author

@apangin apangin Oct 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I no longer add unknown_nmethod error if anchor is available.
All other erros happen in Java context, where there is no anchor to recover from.

Copy link
Member Author

@apangin apangin Oct 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Recovering from anchor is a normal situation, we don't need to mark it with a synthetic error frame.
This typically happens at the border between a native Java method and its JNI implementation, like in your StackWalker tests:

    Java_test_stackwalker_StackGenerator_largeFrame
-->
    test/stackwalker/StackGenerator.largeFrame
    test/stackwalker/StackGenerator.main

Copy link
Contributor

@Baraa-Hasheesh Baraa-Hasheesh Oct 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My concern is that it's possible for the recovery to happen way before that boundary
For example:

NativeMethod3
NativeMethod2 --> Stack walker can fail here due to missing FPs & dwarf info
NativeMethod1
NativeJniMethod
JavaJniMethod
JavaMainMethod

The final collected stack will look like

NativeMethod3
NativeMethod2
                ------> Here we missed NativeMethod1 & NativeJniMethod
JavaJniMethod
JavaMainMethod

I understand there is no real way to know if we really missed any frames, so this suggested "synthetic error frame" could be wrong, but I think there's value to know it's possible to have missing frames as that does tell us there is potential to enhance our unwinding logic

My point in short is I don't want to hide potential unwinding bugs in DWARF that we can fix/enhance

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a user, I'd prefer not to have these error frames. They add noise and give a feeling that something works wrong, although this is exactly how the algorithm is designed to work.
Often, especially on arm64 where there is a dedicated register for frame pointers, native stack will be fully unwound.

After all, this behavior replicates how cstack=dwarf + AsyncGetCallTrace work right now, so there will be no surprises for users.


// Walk until the bottom of the stack or until the first Java frame
JavaFrameAnchor* anchor = NULL;
if (detail > VM_BASIC && vm_thread != NULL && vm_thread->isJavaThread()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not related to this change

Question: once VM becomes the default stack walker can we delete this VM_BASIC mode or do we still need it?

=> It's only used for alloc_tracer samples when VM structs are available by using the JavaFrameAnchor

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

VM_BASIC mode mimics what JVM TI GetStackTrace or AsyncGetCallTrace produce.
This can be probably removed, but only after the next release, when there will be a large cleanup of AGCT-related code.

@apangin apangin merged commit ea09546 into master Oct 3, 2025
53 checks passed
@apangin apangin deleted the anchor branch October 3, 2025 13:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants