Add sanity tests for various idling/blocking operations by jbachorik · Pull Request #1179 · async-profiler/async-profiler · GitHub

@jbachorik

Description

This adds two additional tests based on what we have in Datadog Java Profiler.

Motivation and context

These tests reliably fail on AArch64 when running on JDK 8/11 from the Temurin or Liberica distributions. On other distributions (Corretto and Zulu; others don't seem to provide up-to-date builds of 8 and 11, so hopefully their usage will be phased out) the tests pass just fine.

The main issue seems to be a somehow mangled FP/LR when calling into some JIT-compiled methods (most frequently this fails for things like `Thread.sleep()` or `Object.wait()`), where following the standard rules for obtaining FP and LR as SP[-8] and SP[-16] yields bogus values.
The exact location is here - https://github.com/DataDog/async-profiler/blob/f71c31af7b6972fc5432d0be33ed75a34ab3f710/src/stackWalker.cpp#L292

I checked the instructions for the affected methods and the frame size is correct, so the FP and LR values already arrive mangled.
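
To make the failing step concrete, here is a minimal, self-contained sketch of an SP-based unwind of one compiled frame. It is not the actual `stackWalker.cpp` code: the `CallerFrame` type, the `unwindCompiledFrame` name, the code-range check, and the exact slot assignment (saved FP two words and saved LR one word below the caller's SP) are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical result type for this sketch; not part of async-profiler.
struct CallerFrame {
    uintptr_t sp;    // caller's stack pointer
    uintptr_t fp;    // value read from the saved-FP slot
    uintptr_t lr;    // value read from the saved-LR slot (return address)
    bool plausible;  // true if lr falls inside the given code range
};

// Unwinds one compiled frame from its SP and its frame size in words.
// Assumed layout: saved FP at callerSp - 16 bytes, saved LR at callerSp - 8 bytes.
inline CallerFrame unwindCompiledFrame(uintptr_t sp, size_t frameSizeWords,
                                       uintptr_t codeLow, uintptr_t codeHigh) {
    assert(sp % sizeof(uintptr_t) == 0);  // stack slots are word-aligned

    // Step over the whole frame to reach the caller's SP.
    uintptr_t callerSp = sp + frameSizeWords * sizeof(uintptr_t);
    const uintptr_t* slots = reinterpret_cast<const uintptr_t*>(callerSp);

    CallerFrame frame;
    frame.sp = callerSp;
    frame.fp = slots[-2];
    frame.lr = slots[-1];
    // A mangled linkage typically shows up as a return address outside any code region.
    frame.plausible = frame.lr >= codeLow && frame.lr < codeHigh;
    return frame;
}
```

With a correct frame size, a `plausible == false` result means the slots themselves hold unexpected values, which matches the symptom described above.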

I have very experimental, WIP code in a branch where I am trying to wrap my head around what is actually mangling the linkage, and how.
A desperate attempt is to fall back to standard FP walking to recover in such a situation and, to my surprise, it mostly works, at least when paired with opportunistic recovery from the thread's JavaFrameAnchor (if there is one). It can still produce broken or 'surprising' stack traces, though.
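
For reference, the FP-walking fallback can be sketched as below. This is a simplified illustration, not the code from the branch: it assumes the AArch64 frame-record convention (the word at FP holds the previous FP, the next word holds LR), and the `walkFramePointers` name, the stack-bounds parameters, and the depth limit are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Follows the chain of frame records starting at fp and collects return addresses.
// Stops at the first record that looks invalid instead of reading bogus memory.
inline std::vector<uintptr_t> walkFramePointers(uintptr_t fp,
                                                uintptr_t stackLow, uintptr_t stackHigh,
                                                size_t maxDepth) {
    assert(stackLow <= stackHigh);
    std::vector<uintptr_t> returnAddresses;

    while (returnAddresses.size() < maxDepth) {
        // The two-word frame record must be aligned and lie fully inside the stack.
        if (fp % sizeof(uintptr_t) != 0) break;
        if (fp < stackLow || fp >= stackHigh) break;
        if (stackHigh - fp < 2 * sizeof(uintptr_t)) break;

        const uintptr_t* record = reinterpret_cast<const uintptr_t*>(fp);
        uintptr_t previousFp = record[0];
        uintptr_t lr = record[1];
        if (lr == 0) break;

        returnAddresses.push_back(lr);

        // The chain must move toward the stack base; anything else is a loop or garbage.
        if (previousFp <= fp) break;
        fp = previousFp;
    }
    return returnAddresses;
}
```

The strict "previous FP must be higher" check is what keeps a corrupted chain from looping, but it also means the walk can end early and yield a truncated trace.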

I want to add these three tests (well, two tests and one extra configuration of the existing test) so that perhaps someone more knowledgeable about how different toolchains or compilation settings might mangle FP/LR, such that they are not directly usable from a compiled method, can investigate.


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@krk
Contributor

krk commented Jul 1, 2025

Thank you for your contribution.

I tested these changes locally on x86_64. With Corretto 21.0.7, WallTests.waitingWallVM fails ~50% of the time; with Corretto 24.0.1, both WallTests.waitingWallVM and pingPongWallVM fail.

As they are now, the tests are not stable enough to run in CI.

Corretto 21.0.7

FAIL [3/4] WallTests.waitingWallVM took 3.791 s
java.lang.AssertionError: Expected 0.0 == 2.0
        >  test/test/wall/WallTests.java:55
        >  Assert.isEqual(0, unknown);
        at one.profiler.test.Assert.assertComparison(Assert.java:39)
        at one.profiler.test.Assert.isEqual(Assert.java:44)
        at test.wall.WallTests.waitingWallVM(WallTests.java:55)
        at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
        at java.base/java.lang.reflect.Method.invoke(Method.java:580)
        at one.profiler.test.Runner.run(Runner.java:133)
        at one.profiler.test.Runner.main(Runner.java:230)

Corretto 24.0.1

FAIL [4/4] WallTests.pingPongWallVM took 3.790 s
java.lang.AssertionError: Expected 0.0 == 3006.0
        >  test/test/wall/WallTests.java:68
        >  Assert.isEqual(0, unknown);
        at one.profiler.test.Assert.assertComparison(Assert.java:39)
        at one.profiler.test.Assert.isEqual(Assert.java:44)
        at test.wall.WallTests.pingPongWallVM(WallTests.java:68)
        at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104)
        at java.base/java.lang.reflect.Method.invoke(Method.java:565)
        at one.profiler.test.Runner.run(Runner.java:133)
        at one.profiler.test.Runner.main(Runner.java:230)

@jbachorik
Author

Hi @krk - were you able to check the profiles to see whether the failures are spurious or whether we do, indeed, have a large number of unresolved stack traces?

I would guess it is the second case, judging from my experience. The tests are here to expose the gaps in the VM structs-based stack walking, so that we can then decide whether it is something we want to fix (and how), or whether we will just go 'meh, it's as good as it gets'.
