Optimize method tracing when the function is not profiled #1471

fandreuz · 2025-09-10T08:21:23Z

Description

The current implementation always makes a JNI call to recordExit, regardless of whether the current function call exceeds the latency threshold. Moving the check from C++ to Java saves that JNI call when the method is not going to be profiled.
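The idea can be sketched roughly as follows. This is a minimal illustration, not the actual Instrument class: `ExitGate`, `recordExitNative`, and the explicit `nowNs` parameter are hypothetical names, and the JNI method is replaced by a plain Java method that only counts invocations.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for the instrumentation helper; not the actual Instrument class.
class ExitGate {
    // Counts how often the (simulated) expensive native path is taken.
    static final AtomicLong nativeCalls = new AtomicLong();

    // Stand-in for the JNI method; in the real profiler this would be a native call.
    static void recordExitNative(long startTimeNs) {
        nativeCalls.incrementAndGet();
    }

    // Threshold check done in Java: the expensive call is skipped for fast invocations.
    static void recordExit(long startTimeNs, long nowNs, long minLatencyNs) {
        if (nowNs - startTimeNs >= minLatencyNs) {
            recordExitNative(startTimeNs);
        }
    }

    public static void main(String[] args) {
        recordExit(0, 50, 100);   // 50 ns is below the 100 ns threshold: skipped
        recordExit(0, 150, 100);  // 150 ns reaches the 100 ns threshold: recorded
        System.out.println("native calls: " + nativeCalls.get());
    }
}
```

Only the second invocation reaches the simulated native method, which is where the saving comes from when most calls stay below the threshold.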

Related issues

#1421

Motivation and context

This simple class shows a significant improvement in terms of throughput:

public class Test {
    private static volatile long x = 0;

    public static void doStuff() throws Exception {
        ++x;
    }

    public static void main(String[] args) throws Exception {
        long start = System.nanoTime();
        while (System.nanoTime() - start < 10_000_000_000L) {
            doStuff();
        }
        System.out.println(x);
    }
}

Branch	doStuff() calls in 10 s
`master`	83084633
`latency-jni-optimize`	119116380

So we get approximately 43% more method calls when the method is never profiled (in this case the workload is negligible).
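For reference, the relative gain can be recomputed from the two counts in the table. The class and method names below are made up for this illustration.

```java
// Recomputes the relative throughput gain from the two measurements in the table.
class ThroughputGain {
    // Relative gain of `after` over `before`, in percent.
    static double gainPercent(long before, long after) {
        return (after - before) * 100.0 / before;
    }

    public static void main(String[] args) {
        long master = 83_084_633L;      // calls in 10 s on master
        long optimized = 119_116_380L;  // calls in 10 s on latency-jni-optimize
        System.out.println("gain in percent: " + gainPercent(master, optimized));
    }
}
```

The result is a little over 43%, which is consistent with the rounded figure quoted above.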

How has this been tested?

The current set of tests is good enough to validate this change.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

src/helper/one/profiler/Instrument.java

src/instrument.cpp

src/profiler.cpp

apangin · 2025-09-11T19:17:48Z

src/instrument.cpp

+void JNICALL Instrument::recordExit0(JNIEnv* jni, jobject unused, jlong durationNs) {
    if (!_enabled) return;

    u64 now_ticks = TSC::ticks();


What worries me most in this change is that it introduces inaccuracy in reported timings.
There can be a potentially large delay between the last call to System.nanoTime and this TSC::ticks(), especially if the thread is de-scheduled from the CPU in this time window. This delay can shift the reported event start time, which may in turn break correlation with other events.

I see, how about 027ba68 ?

We'll record a slightly higher durationNs than what we use to test the latency. But if anything, the measurement is going to be more realistic, since it'll also include the JNI penalty.

That's better. Now timings should be more accurate, at the price of an extra nanotime call.
I know how to improve this further by leveraging the jdk.jfr.internal.JVM.counterTime() intrinsic in Java code, but this will require lots of changes. We can discuss it and probably leave it for a separate PR.
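The trade-off discussed in this thread can be illustrated with a small sketch. It assumes a single clock domain and a made-up 200 ns gap between the Java-side and native-side clock reads; the real code mixes System.nanoTime with TSC ticks, which this sketch does not model. All names are hypothetical.

```java
// Illustrates how a delayed native clock read affects the reported start time.
class StartTimeShift {
    // Variant A: Java passes only the duration; the native side derives the start
    // from its own, later clock reading.
    static long startFromDuration(long nativeNowNs, long durationNs) {
        return nativeNowNs - durationNs;
    }

    // Variant B: Java passes the start time; the native side derives the duration.
    static long durationFromStart(long nativeNowNs, long startTimeNs) {
        return nativeNowNs - startTimeNs;
    }

    public static void main(String[] args) {
        long startNs = 1_000;     // method entry
        long javaExitNs = 1_500;  // clock read in Java at method exit
        long delayNs = 200;       // hypothetical gap before the native clock read
        long nativeNowNs = javaExitNs + delayNs;

        long shiftedStart = startFromDuration(nativeNowNs, javaExitNs - startNs);
        long longerDuration = durationFromStart(nativeNowNs, startNs);
        System.out.println("A: start=" + shiftedStart + " (true start " + startNs + ")");
        System.out.println("B: start=" + startNs + " duration=" + longerDuration);
    }
}
```

In variant A the start time moves forward by the delay, which is the correlation problem raised in the review. In variant B the start time stays exact and the delay is absorbed into a slightly longer duration.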

src/profiler.cpp

src/instrument.cpp

apangin · 2025-09-12T18:13:43Z

src/instrument.cpp

+            case Result::CLASS_DOES_NOT_MATCH:
+                Log::debug("Skipping instrumentation of %s: class does not match", class_name.c_str());
+                break;
+            case Result::PROFILER_CLASS:
+                Log::trace("Skipping instrumentation of %s: internal profiler class", class_name.c_str());
+                break;


Why debug in one case and trace in another?
But in my opinion, neither of these logs is useful.

Also, add

default: break;

to suppress compiler warning.

> But in my opinion, neither of these logs is useful.

I agree, that's why I made them invisible by default. The idea is to avoid silently skipping any decision taken during the flow, and to keep this information available upon closer inspection.

> to suppress compiler warning.

e87c715

But these messages get recorded in .jfr anyway.
I suggest removing them completely, or changing them both to trace and adjusting logging so that trace-level logs are not written to JFR.

I see, I didn't know they go there. 74053fc

src/instrument.cpp

apangin · 2025-09-12T18:21:27Z

src/instrument.cpp

+void JNICALL Instrument::recordExit0(JNIEnv* jni, jobject unused, jlong durationNs) {
    if (!_enabled) return;

    u64 now_ticks = TSC::ticks();


That's better. Now timings should be more accurate, at the price of an extra nanotime call.
I know how to improve this further by leveraging the jdk.jfr.internal.JVM.counterTime() intrinsic in Java code, but this will require lots of changes. We can discuss it and probably leave it for a separate PR.

apangin · 2025-09-15T17:28:21Z

src/flightRecorder.cpp

+        buf->putVar32(LOG_ERROR - FlightRecorder::MIN_LOG_LEVEL + 1);
+        for (int i = FlightRecorder::MIN_LOG_LEVEL; i <= LOG_ERROR; i++) {


Leave the code as it was. There's nothing bad in listing all available levels, especially since there is a (theoretical) possibility to call writeLog directly bypassing checks in Log::log.

I see, reverted in 148e6b3

apangin · 2025-09-15T17:35:09Z

src/log.cpp

 }

 void Log::log(LogLevel level, const char* msg, va_list args) {
+    if (_level > level && FlightRecorder::MIN_LOG_LEVEL > level) {


nit: I'd reverse the check: when an input argument participates in a condition, it reads better with the argument on the left side. CONSTANT > arg looks unnatural to me.

if (level < _level && level < FlightRecorder::MIN_LOG_LEVEL)
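The two spellings of the check are logically equivalent, which a small model can confirm. The sketch below uses hypothetical numeric levels and an assumed JFR minimum of DEBUG. It also assumes the condition guards an early return, since the body of the `if` is not shown in the diff.

```java
// Models the early-exit check in both spellings and compares them.
class LogFilter {
    // Hypothetical numeric levels, ordered from most to least verbose.
    static final int TRACE = 0, DEBUG = 1, INFO = 2, WARN = 3, ERROR = 4;
    // Assumed stand-in for FlightRecorder::MIN_LOG_LEVEL.
    static final int JFR_MIN_LEVEL = DEBUG;

    // Original spelling: configured level and constant on the left.
    static boolean dropOriginal(int configured, int level) {
        return configured > level && JFR_MIN_LEVEL > level;
    }

    // Suggested spelling: the input argument on the left.
    static boolean dropReversed(int configured, int level) {
        return level < configured && level < JFR_MIN_LEVEL;
    }

    public static void main(String[] args) {
        boolean same = true;
        for (int configured = TRACE; configured <= ERROR; configured++) {
            for (int level = TRACE; level <= ERROR; level++) {
                same &= dropOriginal(configured, level) == dropReversed(configured, level);
            }
        }
        System.out.println("both spellings agree: " + same);
    }
}
```

Under these assumptions a message is dropped only when it is below both the configured level and the JFR minimum, so a DEBUG message still passes when the configured level is INFO.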

apangin · 2025-09-15T17:44:19Z

src/log.cpp


-    // Write all messages to JFR, if active
-    if (level < LOG_ERROR) {
+    if (level >= FlightRecorder::MIN_LOG_LEVEL) {


Have you removed < LOG_ERROR intentionally?
An error message typically indicates a situation where the profiler is unable to execute a user command. Errors don't happen out of the blue at runtime and don't need to be recorded.

Got it, 148e6b3
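One way to read the outcome of this thread is that a message goes to JFR only if it is at or above the JFR minimum level and below the error level. The sketch below models that reading. The final condition in 148e6b3 is not shown here, so the combined check, the numeric levels, and the DEBUG minimum are all assumptions.

```java
// Models which log levels would be written to JFR under the assumed combined check.
class JfrLogRouting {
    // Hypothetical numeric levels, ordered from most to least verbose.
    static final int TRACE = 0, DEBUG = 1, INFO = 2, WARN = 3, ERROR = 4;
    // Assumed stand-in for FlightRecorder::MIN_LOG_LEVEL.
    static final int JFR_MIN_LEVEL = DEBUG;

    // Assumed combination of the new lower bound and the restored upper bound.
    static boolean writeToJfr(int level) {
        return level >= JFR_MIN_LEVEL && level < ERROR;
    }

    public static void main(String[] args) {
        String[] names = {"TRACE", "DEBUG", "INFO", "WARN", "ERROR"};
        for (int level = TRACE; level <= ERROR; level++) {
            System.out.println(names[level] + " -> JFR: " + writeToJfr(level));
        }
    }
}
```

With these assumed values, trace and error messages stay out of the recording, while debug, info, and warn messages are written.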

apangin · 2025-09-15T18:50:46Z

src/helper/one/profiler/Instrument.java

+
+    public static void recordExit(long startTimeNs, long minLatency) {
+        if (System.nanoTime() - startTimeNs >= minLatency) {
+            recordExit0(startTimeNs);


If latency == 0, you can call recordExit0 directly from the generated code. Even the name of the method suits well :)

Ah, this will complicate the decision of how many frames to skip in the stacktrace.
But I think the optimization to skip redundant nanoTime call is useful. Maybe we can add another variant of recordExit call? wdyt?

> Maybe we can add another variant of recordExit call?

Yeah I think this is the easiest way: a9a8604

Or we add something like EventType::METHOD_TRACE_ZERO.
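A possible shape for the extra variant is sketched below. The one-argument overload, the counters, and the fake clock are hypothetical; the actual change in a9a8604 is not shown in this thread. Keeping the zero-latency path inside a Java wrapper, rather than calling recordExit0 from generated code, leaves the same number of helper frames on the stack in both cases.

```java
// Two entry points: one with a threshold check, one for latency == 0.
class ExitVariants {
    static long clockReads;   // how many times the clock was consulted
    static long nativeCalls;  // how many times the simulated native method ran
    static long fakeNowNs;    // controllable clock value for the demo

    // Counted stand-in for System.nanoTime().
    static long nanoTime() {
        clockReads++;
        return fakeNowNs;
    }

    // Stand-in for the JNI method.
    static void recordExit0(long startTimeNs) {
        nativeCalls++;
    }

    // Used when a latency threshold is configured.
    static void recordExit(long startTimeNs, long minLatency) {
        if (nanoTime() - startTimeNs >= minLatency) {
            recordExit0(startTimeNs);
        }
    }

    // Hypothetical variant for latency == 0: no clock read, no comparison.
    static void recordExit(long startTimeNs) {
        recordExit0(startTimeNs);
    }

    public static void main(String[] args) {
        fakeNowNs = 500;
        recordExit(100, 1_000);  // 400 ns elapsed, below threshold: one clock read, no native call
        recordExit(100);         // always recorded, without touching the clock
        System.out.println("clockReads=" + clockReads + " nativeCalls=" + nativeCalls);
    }
}
```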

fandreuz added 7 commits September 9, 2025 19:00

rE0

0527639

take duration

9057458

no bool

8c6bdc9

do stuff

211cb3c

logs

f36b5b3

ns

048fb57

comment and logs

2b50ebf

fandreuz commented Sep 10, 2025

src/helper/one/profiler/Instrument.java

fandreuz added 2 commits September 10, 2025 08:40

fix cmpl

46e8eb5

ops

afedffe

fandreuz force-pushed the latency-jni-optimize-v2 branch from 6bf8adb to afedffe on September 10, 2025 09:07

Baraa-Hasheesh reviewed Sep 10, 2025

src/instrument.cpp

fandreuz force-pushed the latency-jni-optimize-v2 branch from 28079b3 to afedffe on September 10, 2025 14:01

Baraa-Hasheesh reviewed Sep 11, 2025

src/profiler.cpp (outdated)

fix switch fmt

b7fec95

apangin reviewed Sep 11, 2025

fandreuz added 3 commits September 12, 2025 09:16

pass startTimeNs

027ba68

no switch

454e908

inline

377c867

apangin reviewed Sep 12, 2025

fandreuz added 3 commits September 15, 2025 08:35

review comments

e87c715

dont write trace

74053fc

ops

bd4b91b

apangin reviewed Sep 15, 2025

fandreuz added 2 commits September 15, 2025 18:52

review

148e6b3

zero

a9a8604

apangin merged commit 6f2a9b8 into async-profiler:master Sep 15, 2025
35 of 37 checks passed

krk pushed a commit to krk/async-profiler that referenced this pull request Sep 22, 2025

Optimize method tracing when the function is not profiled (async-profiler#1471)

28340ae
