KEMBAR78
Optimize method tracing when the function is not profiled by fandreuz · Pull Request #1471 · async-profiler/async-profiler · GitHub
Skip to content

Conversation

@fandreuz
Copy link
Contributor

@fandreuz fandreuz commented Sep 10, 2025

Description

In the current implementation, we always make a JNI call to recordExit regardless of whether the current function call breaks the latency threshold or not. We could move the check from C++ to Java, so we could save a JNI call when the method is not going to be profiled.

Related issues

#1421

Motivation and context

This simple class shows a significant improvement in terms of throughput:

public class Test {
    private static volatile long x = 0;

    public static void doStuff() throws Exception {
        ++x;
    }

    public static void main(String[] args) throws Exception {
        long start = System.nanoTime();
        while (System.nanoTime() - start < 10_000_000_000l) {
            doStuff();
        }
        System.out.println(x);
    }
}
Branch Value
master 83084633
latency-jni-optimize 119116380

So we get approximately 40% more method calls when the method is never profiled (in this case the workload is negligible).

How has this been tested?

The current set of tests is good enough to validate this change.


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@fandreuz fandreuz force-pushed the latency-jni-optimize-v2 branch from 6bf8adb to afedffe Compare September 10, 2025 09:07
@fandreuz fandreuz force-pushed the latency-jni-optimize-v2 branch from 28079b3 to afedffe Compare September 10, 2025 14:01
void JNICALL Instrument::recordExit0(JNIEnv* jni, jobject unused, jlong durationNs) {
if (!_enabled) return;

u64 now_ticks = TSC::ticks();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What worries me most in this change is that it introduces inaccuracy in reported timings.
There can be a potentially large delay between the last call to System.nanoTime and this TSC::ticks()`, especially if a thread is de-scheduled from CPU in this time window. This delay can cause shift in the reported event start time, which may in turn break correlation with other events.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, how about 027ba68 ?

We'll record a slightly higher durationNs than what we use to test the latency. But if anything, the measurement is going to be more realistic, since it'll also include the JNI penalty.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's better. Now timings should be more accurate, for the price of an extra nanotime call.
I know how to improve this further by leveraging jdk.jfr.internal.JVM.counterTime() intrinsic in Java code, but this will require lots of changes. We can discuss it and probably leave for a separate PR.

Comment on lines 353 to 358
case Result::CLASS_DOES_NOT_MATCH:
Log::debug("Skipping instrumentation of %s: class does not match", class_name.c_str());
break;
case Result::PROFILER_CLASS:
Log::trace("Skipping instrumentation of %s: internal profiler class", class_name.c_str());
break;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why debug in one case and trace in another?
But in my opinion, neither of these logs is useful.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, add

default:
    break;

to suppress compiler warning.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But in my opinion, neither of these logs is useful.

I agree, that's why I made them invisible by default. The idea is not to skip quietly any decision I take during the flow, and let this information be available upon closer inspection.

to suppress compiler warning.

e87c715

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But these messages get recorded in .jfr anyway.
I suggest to remove them completely or change them both to trace and adjust logging not write trace level logs to jfr.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, I didn't know they go there. 74053fc

void JNICALL Instrument::recordExit0(JNIEnv* jni, jobject unused, jlong durationNs) {
if (!_enabled) return;

u64 now_ticks = TSC::ticks();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's better. Now timings should be more accurate, for the price of an extra nanotime call.
I know how to improve this further by leveraging jdk.jfr.internal.JVM.counterTime() intrinsic in Java code, but this will require lots of changes. We can discuss it and probably leave for a separate PR.

Comment on lines 1023 to 1024
buf->putVar32(LOG_ERROR - FlightRecorder::MIN_LOG_LEVEL + 1);
for (int i = FlightRecorder::MIN_LOG_LEVEL; i <= LOG_ERROR; i++) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leave the code as it was. There's nothing bad in listing all available levels, especially since there is a (theoretical) possibility to call writeLog directly bypassing checks in Log::log.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, reverted in 148e6b3

src/log.cpp Outdated
}

void Log::log(LogLevel level, const char* msg, va_list args) {
if (_level > level && FlightRecorder::MIN_LOG_LEVEL > level) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I'd reverse the check: when an input argument participates in some condition, it better reads when an argument is on the left side. CONSTANT > arg looks unnatural to me.

if (level < _level && level < FlightRecorder::MIN_LOG_LEVEL)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

src/log.cpp Outdated

// Write all messages to JFR, if active
if (level < LOG_ERROR) {
if (level >= FlightRecorder::MIN_LOG_LEVEL) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you removed < LOG_ERROR intentionally?
Error message typically indicates a situation when profiler is unable to execute a user command. Errors don't happen out of the blue at runtime and don't need to be recorded.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, 148e6b3


public static void recordExit(long startTimeNs, long minLatency) {
if (System.nanoTime() - startTimeNs >= minLatency) {
recordExit0(startTimeNs);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If latency == 0, you can call recordExit0 directly from the generated code. Even the name of the method suits well :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, this will complicate decision how many frames to skip in the stacktrace.
But I think the optimization to skip redundant nanoTime call is useful. Maybe we can add another variant of recordExit call? wdyt?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can add another variant of recordExit call?

Yeah I think this is the easiest way: a9a8604

Or we add something like EventType::METHOD_TRACE_ZERO.

@apangin apangin merged commit 6f2a9b8 into async-profiler:master Sep 15, 2025
35 of 37 checks passed
krk pushed a commit to krk/async-profiler that referenced this pull request Sep 22, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants