-
Notifications
You must be signed in to change notification settings - Fork 937
Use JavaFrameAnchor to find top Java frame with cstack=vm #1517
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
// retry stack walking from the anchor | ||
if (anchor != NULL && anchor->getFrame(pc, sp, fp)) { | ||
anchor = NULL; | ||
while (depth > 0 && frames[depth - 1].method_id == NULL) depth--; // pop unknown frames |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- shouldn't we also pop error frames as well -> top frame can be "unknown_nmethod"
- I think we should add an error frame to indicate that we recovered from anchor (I.E. we unwinding failed in native layer before reaching the first proper java frame)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I no longer add unknown_nmethod
error if anchor is available.
All other erros happen in Java context, where there is no anchor to recover from.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Recovering from anchor is a normal situation, we don't need to mark it with a synthetic error frame.
This typically happens at the border between a native Java method and its JNI implementation, like in your StackWalker tests:
Java_test_stackwalker_StackGenerator_largeFrame
-->
test/stackwalker/StackGenerator.largeFrame
test/stackwalker/StackGenerator.main
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My concern is that it's possible for the recovery to happen way before that boundary
For example:
NativeMethod3
NativeMethod2 --> Stack walker can fail here due to missing FPs & dwarf info
NativeMethod1
NativeJniMethod
JavaJniMethod
JavaMainMethod
The final collected stack will look like
NativeMethod3
NativeMethod2
------> Here we missed NativeMethod1 & NativeJniMethod
JavaJniMethod
JavaMainMethod
I understand there is no real way to know if we really missed any frames, so this suggested "synthetic error frame" could be wrong, but I think there's value to know it's possible to have missing frames as that does tell us there is potential to enhance our unwinding logic
My point in short is I don't want to hide potential unwinding bugs in DWARF that we can fix/enhance
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a user, I'd prefer not to have these error frames. They add noise and give a feeling that something works wrong, although this is exactly how the algorithm is designed to work.
Often, especially on arm64 where there is a dedicated register for frame pointers, native stack will be fully unwound.
After all, this behavior replicates how cstack=dwarf
+ AsyncGetCallTrace
work right now, so there will be no surprises for users.
|
||
// Walk until the bottom of the stack or until the first Java frame | ||
JavaFrameAnchor* anchor = NULL; | ||
if (detail > VM_BASIC && vm_thread != NULL && vm_thread->isJavaThread()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not related to this change
Question: once VM becomes the default stack walker can we delete this VM_BASIC mode or do we still need it?
=> It's only used for alloc_tracer samples when VM structs are available by using the JavaFrameAnchor
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
VM_BASIC
mode mimics what JVM TI GetStackTrace or AsyncGetCallTrace produce.
This can be probably removed, but only after the next release, when there will be a large cleanup of AGCT-related code.
Description
If a thread has
JavaFrameAnchor
set, prefer getting PC/SP/FP from the anchor, since the anchor's values are always reliable.Related issues
#1317, #1179, #1513
Motivation and context
Async-profiler may fail to unwind native part of the stack when a library is compiled without frame pointers AND with no DWARF unwinding info in
.eh_frame
section (typically on Alpine Linux). In this case, stack walker may use incorrect values of SP/FP once it reaches the topmost Java frame or it can miss the top Java frame completely. This in turn results in a large number of truncated stack traces with[unknown]
or[break_interpreted]
frame.The solution is to use JavaFrameAnchor saved in the VM's JavaThread structure to discover the top Java frame even when the native part of the stack cannot be traversed.
How has this been tested?
Tested manually on Liberica JDK 11 with sample programs provided in #1179.
Also, updated
stackwalker
integration tests: previously they checked that stack traces were truncated, now they check that stack traces are complete.By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.