Unwind checksum and digest intrinsics on ARM64 by apangin · Pull Request #1400 · async-profiler/async-profiler · GitHub

apangin
Member

@apangin apangin commented Jul 20, 2025

Description

Unwind certain ARM64 intrinsics.

Related issues

#1385

Motivation and context

HotSpot JVM has hand-coded assembly implementations of some CPU-intensive checksum and digest algorithms such as CRC32, MD5, and SHA256. These compiler intrinsics are generated at runtime and do not follow the commonly used frame layout, especially on ARM64. That is why async-profiler (as well as JDK Flight Recorder and other profilers) cannot unwind the Java stack trace while these functions are executing.

I divided stub routines for these intrinsics into 4 categories:

  1. Stubs that do not modify the stack pointer or the frame pointer. Unwinding these stubs is trivial: the return address stays in the x30 register.
  2. Stubs that have a fixed-size frame layout and begin with an stp instruction that decrements the stack pointer by this fixed amount. To unwind these stubs, I read the frame size directly from the instruction stream and increment sp back by this amount.
  3. Stubs that set up a regular frame pointer link. For them, I use the fp (x29) register to find the previous frame.
  4. All other stubs, which are either non-leaf functions or have a dynamic frame size (they modify sp in the middle). These cases are rare, and I do not attempt to unwind such frames.
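
To illustrate category 2, here is a minimal Java sketch of reading the frame size from a pre-indexed `stp ..., [sp, #-N]!` instruction word. It is not the code from this PR: the class `StpFrameSize` and its methods are hypothetical names, and the sketch assumes the stub's first instruction is the 64-bit pre-indexed form of stp with sp as the base register, and that the stp has already executed when the sample is taken.

```java
public final class StpFrameSize {
    // Hypothetical helper for illustration only, not async-profiler code.
    // A64 encoding of the 64-bit pre-indexed store pair:
    //   bits 31..22 = 1010100110, bits 21..15 = imm7 (signed, scaled by 8),
    //   bits 14..10 = Rt2, bits 9..5 = Rn (31 means sp), bits 4..0 = Rt.
    private static final int STP_PRE_SP_MASK = 0xFFC003E0;
    private static final int STP_PRE_SP_BITS = 0xA98003E0;

    /** Returns the frame size in bytes, or -1 if insn is not "stp Xa, Xb, [sp, #-N]!". */
    public static int frameSize(int insn) {
        if ((insn & STP_PRE_SP_MASK) != STP_PRE_SP_BITS) {
            return -1;
        }
        // Move bit 21 to the sign position, then shift back to sign-extend imm7.
        int imm7 = (insn << 10) >> 25;
        // Only a negative offset allocates stack space.
        return imm7 < 0 ? -imm7 * 8 : -1;
    }

    /** Caller's sp, assuming the stp has executed and sp has not changed since. */
    public static long callerSp(long sp, int firstInsn) {
        int size = frameSize(firstInsn);
        if (size < 0) {
            throw new IllegalArgumentException("not a fixed-size stp prologue");
        }
        return sp + size;
    }

    public static void main(String[] args) {
        int insn = 0xA9BF7BFD;  // stp x29, x30, [sp, #-16]!
        System.out.println(frameSize(insn));  // prints 16
    }
}
```

A real unwinder would also have to handle the case where the sampled pc still points at the stp itself, since sp has not been decremented yet at that point.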

How has this been tested?

Added a new test, CodingIntrinsics.java, that executes the following intrinsics:

  • updateBytesCRC32
  • updateBytesAdler32
  • md5_implCompress, md5_implCompressMB
  • sha1_implCompress, sha1_implCompressMB
  • sha256_implCompress, sha256_implCompressMB
  • encodeBlock
  • zero_blocks
  • _large_arrays_hashcode_byte
  • jbyte_disjoint_arraycopy
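
For readers who want to reproduce this kind of workload, the sketch below shows one way to reach several of the listed intrinsics through public JDK APIs (`CRC32`, `Adler32`, `MessageDigest`, `Base64`). It is not the actual CodingIntrinsics.java from this PR; the class name `IntrinsicWorkload`, the buffer size, and the iteration count are assumptions, and whether the JIT really substitutes the intrinsic stubs depends on the JDK build and CPU.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;
import java.util.zip.Adler32;
import java.util.zip.CRC32;

public final class IntrinsicWorkload {
    // Hypothetical workload, not the test added in this PR.

    static long crc32(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);   // may be compiled to updateBytesCRC32
        return crc.getValue();
    }

    static long adler32(byte[] data) {
        Adler32 adler = new Adler32();
        adler.update(data, 0, data.length); // may be compiled to updateBytesAdler32
        return adler.getValue();
    }

    static byte[] digest(String algorithm, byte[] data) {
        try {
            // "MD5", "SHA-1" and "SHA-256" map to the *_implCompress intrinsics.
            return MessageDigest.getInstance(algorithm).digest(data);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    static String base64(byte[] data) {
        return Base64.getEncoder().encodeToString(data); // may use encodeBlock
    }

    public static void main(String[] args) {
        byte[] data = new byte[1 << 20];    // assumed size, large enough to stay in the stubs
        long sink = 0;                      // consume results so the JIT keeps the calls
        for (int i = 0; i < 2000; i++) {    // assumed iteration count
            data[i % data.length]++;
            sink += crc32(data) + adler32(data);
            sink += digest("MD5", data)[0] + digest("SHA-1", data)[0] + digest("SHA-256", data)[0];
            sink += base64(data).length();
        }
        System.out.println(sink);
    }
}
```

Profiling such a loop with and without `--cstack vm` on ARM64 is a quick way to check whether Java frames appear above the intrinsic stubs.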

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@apangin apangin changed the title Arm64 intrinsics Unwind certain ARM64 intrinsics Jul 20, 2025
@apangin apangin changed the title Unwind certain ARM64 intrinsics Unwind checksum and digest intrinsics on ARM64 Jul 20, 2025
@apangin apangin merged commit 843f1d9 into master Jul 21, 2025
39 checks passed

@Test(mainClass = CodingIntrinsics.class, debugNonSafepoints = true, arch = {Arch.ARM64, Arch.X64}, inputs = "")
@Test(mainClass = CodingIntrinsics.class, debugNonSafepoints = true, arch = {Arch.ARM64, Arch.X64}, inputs = "--cstack vm", nameSuffix = "VM")
public void intrinsics(TestProcess p) throws Exception {
Contributor

@Baraa-Hasheesh Baraa-Hasheesh Jul 21, 2025

@apangin I've observed that this test can be quite fragile on systems where we don't have DWARF support.

This can be seen on macOS in our GitHub Actions runs.

I would recommend disabling this test on macOS until we have proper DWARF support there. What do you think?

Example => https://github.com/async-profiler/async-profiler/actions/runs/16412814698/job/46371796767

Member Author

Thanks for pointing this out. The problem has nothing to do with DWARF support; more likely, the test reveals an actual gap in the profiler's stack walking algorithm. I'll take a look.

@apangin apangin deleted the arm64-intrinsics branch July 30, 2025 01:05