Add sanity tests for various idling/blocking operations #1179

jbachorik · 2025-03-13T17:37:08Z

Description

This adds two additional tests based on what we have in the Datadog Java Profiler.

Motivation and context

These tests reliably fail on AArch64 when running on JDK 8/11 from the Temurin or Liberica distributions. On other distributions (Corretto and Zulu; the others don't seem to provide up-to-date builds of 8 and 11, so hopefully their usage will be phased out) the tests pass just fine.

The main issue seems to be a somehow mangled FP/LR when calling into some JIT-compiled methods (most frequently this fails for things like `Thread.sleep()` or `Object.wait()`), where following the standard rules for obtaining FP and LR as SP[-8] and SP[-16] yields bogus values.
The exact location is here - https://github.com/DataDog/async-profiler/blob/f71c31af7b6972fc5432d0be33ed75a34ab3f710/src/stackWalker.cpp#L292

I checked the instructions for the affected methods and the frame size is correct, so the FP and LR values already arrive mangled.
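
To make the failure mode concrete, here is a toy Java model of the check described above. It is only a sketch: the real logic lives in the C++ `stackWalker.cpp` linked above, and the class name `LinkageCheck`, the address ranges, and the plausibility rules are all hypothetical. The slot assignment (FP at SP-8, LR at SP-16) simply follows the order given in the description.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model: the stack is a map from 8-byte-aligned addresses to 64-bit words. */
public class LinkageCheck {
    // Hypothetical bounds. A real profiler would take these from the thread
    // and from the JVM code cache rather than from constants.
    static final long STACK_LOW = 0x7000_0000L;
    static final long STACK_HIGH = 0x7000_1000L;
    static final long CODE_LOW = 0x4000_0000L;
    static final long CODE_HIGH = 0x4100_0000L;

    /** Saved FP/LR pair recovered for one frame. */
    static final class Linkage {
        final long fp;
        final long lr;

        Linkage(long fp, long lr) {
            this.fp = fp;
            this.lr = lr;
        }
    }

    /** Reads the saved FP and LR from the two slots just below the caller's SP. */
    static Linkage readLinkage(Map<Long, Long> stack, long callerSp) {
        long fp = stack.getOrDefault(callerSp - 8, 0L);   // assumed FP slot
        long lr = stack.getOrDefault(callerSp - 16, 0L);  // assumed LR slot
        return new Linkage(fp, lr);
    }

    /** Heuristic: FP must be an aligned stack address, LR must point into code. */
    static boolean isPlausible(Linkage l) {
        boolean fpOk = l.fp >= STACK_LOW && l.fp < STACK_HIGH && (l.fp & 7) == 0;
        boolean lrOk = l.lr >= CODE_LOW && l.lr < CODE_HIGH;
        return fpOk && lrOk;
    }

    public static void main(String[] args) {
        Map<Long, Long> stack = new HashMap<>();
        long callerSp = 0x7000_0800L;
        stack.put(callerSp - 8, 0x7000_0900L);   // sane saved FP
        stack.put(callerSp - 16, 0x4000_1234L);  // sane return address
        System.out.println(isPlausible(readLinkage(stack, callerSp)));

        stack.put(callerSp - 8, 0xdeadL);        // mangled FP
        System.out.println(isPlausible(readLinkage(stack, callerSp)));
    }
}
```

In this model a correct frame size does not help once the stored words themselves are wrong, which matches the observation that the values already arrive mangled.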

I have very experimental WIP code in a branch where I am trying to wrap my head around what is actually mangling the linkage, and how.
A desperate attempt is to use standard FP walking to recover in such a situation and, to my surprise, it mostly works, at least when paired with opportunistic recovery from the thread's JavaFrameAnchor (if there is one). Still, it can create broken or 'surprising' stack traces.
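
The FP-walking fallback mentioned above can be sketched as a toy Java model. This is not the code from the WIP branch: the class name `FpWalk`, the depth limit, and the stack-as-map representation are hypothetical. It relies only on the AArch64 frame record layout, where FP points at a pair of words holding the previous FP and the return address. JavaFrameAnchor recovery is not modeled here.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of frame-pointer walking over a stack stored as address -> word. */
public class FpWalk {
    static final int MAX_DEPTH = 128; // hypothetical safety limit

    /**
     * Follows the chain of frame records starting at fp and collects return
     * addresses. Stops as soon as the chain looks broken.
     */
    static List<Long> walk(Map<Long, Long> stack, long fp, long stackLow, long stackHigh) {
        List<Long> pcs = new ArrayList<>();
        while (pcs.size() < MAX_DEPTH
                && fp >= stackLow && fp + 16 <= stackHigh
                && (fp & 7) == 0) {
            Long prevFp = stack.get(fp);      // [fp]     = caller's FP
            Long lr = stack.get(fp + 8);      // [fp + 8] = return address
            if (prevFp == null || lr == null || lr == 0) {
                break;                        // unreadable or empty record
            }
            pcs.add(lr);
            if (prevFp <= fp) {
                break;                        // chain must move toward the stack base
            }
            fp = prevFp;
        }
        return pcs;
    }

    public static void main(String[] args) {
        Map<Long, Long> stack = new HashMap<>();
        stack.put(0x1000L, 0x1040L); stack.put(0x1008L, 0xAL);
        stack.put(0x1040L, 0x1080L); stack.put(0x1048L, 0xBL);
        stack.put(0x1080L, 0L);      stack.put(0x1088L, 0xCL);
        System.out.println(walk(stack, 0x1000L, 0x1000L, 0x2000L));
    }
}
```

The monotonicity check is what keeps a mangled chain from looping, but it also shows why the result can be 'surprising': the walk silently truncates at the first bad record and reports whatever it collected so far.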

I want to add these three tests (well, two tests and one extra configuration of the existing test) so that someone more knowledgeable about how different toolchains or compilation settings might mangle FP/LR, such that they are not directly usable from a compiled method, can perhaps look into it.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

krk · 2025-07-01T11:18:30Z

Thank you for your contribution.

I tested these changes locally on x86_64. With Corretto 21.0.7, `WallTests.waitingWallVM` fails about 50% of the time; with Corretto 24.0.1, both `WallTests.waitingWallVM` and `pingPongWallVM` fail.

As it is now, the tests are not stable enough to be executed in the CI.

Corretto 21.0.7

FAIL [3/4] WallTests.waitingWallVM took 3.791 s
java.lang.AssertionError: Expected 0.0 == 2.0
        >  test/test/wall/WallTests.java:55
        >  Assert.isEqual(0, unknown);
        at one.profiler.test.Assert.assertComparison(Assert.java:39)
        at one.profiler.test.Assert.isEqual(Assert.java:44)
        at test.wall.WallTests.waitingWallVM(WallTests.java:55)
        at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
        at java.base/java.lang.reflect.Method.invoke(Method.java:580)
        at one.profiler.test.Runner.run(Runner.java:133)
        at one.profiler.test.Runner.main(Runner.java:230)

Corretto 24.0.1

FAIL [4/4] WallTests.pingPongWallVM took 3.790 s
java.lang.AssertionError: Expected 0.0 == 3006.0
        >  test/test/wall/WallTests.java:68
        >  Assert.isEqual(0, unknown);
        at one.profiler.test.Assert.assertComparison(Assert.java:39)
        at one.profiler.test.Assert.isEqual(Assert.java:44)
        at test.wall.WallTests.pingPongWallVM(WallTests.java:68)
        at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104)
        at java.base/java.lang.reflect.Method.invoke(Method.java:565)
        at one.profiler.test.Runner.run(Runner.java:133)
        at one.profiler.test.Runner.main(Runner.java:230)

jbachorik · 2025-07-01T11:25:06Z

Hi @krk - were you able to check the profiles to see whether the failures are spurious or whether we do, indeed, have a large number of unresolved stack traces?

I would guess it is the latter, judging from my experience. The tests are here to show the gaps in the VMStructs-based stack walking, so that we can decide whether it is something we want to fix (and how), or whether we will just go 'meh, it's as good as it gets'.

Add sanity tests for various idling/blocking operations

7bb70f5

apangin force-pushed the master branch from ee69cf1 to a78793b Compare March 20, 2025 10:38

apangin mentioned this pull request Oct 1, 2025

Use JavaFrameAnchor to find top Java frame with cstack=vm #1517

Merged
