-
Notifications
You must be signed in to change notification settings - Fork 938
Optimize method tracing when the function is not profiled #1471
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize method tracing when the function is not profiled #1471
Conversation
6bf8adb
to
afedffe
Compare
28079b3
to
afedffe
Compare
src/instrument.cpp
Outdated
void JNICALL Instrument::recordExit0(JNIEnv* jni, jobject unused, jlong durationNs) { | ||
if (!_enabled) return; | ||
|
||
u64 now_ticks = TSC::ticks(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What worries me most in this change is that it introduces inaccuracy in reported timings.
There can be a potentially large delay between the last call to System.nanoTime
and this TSC::ticks()`, especially if a thread is de-scheduled from CPU in this time window. This delay can cause shift in the reported event start time, which may in turn break correlation with other events.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, how about 027ba68 ?
We'll record a slightly higher durationNs
than what we use to test the latency. But if anything, the measurement is going to be more realistic, since it'll also include the JNI penalty.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's better. Now timings should be more accurate, for the price of an extra nanotime
call.
I know how to improve this further by leveraging jdk.jfr.internal.JVM.counterTime()
intrinsic in Java code, but this will require lots of changes. We can discuss it and probably leave for a separate PR.
case Result::CLASS_DOES_NOT_MATCH: | ||
Log::debug("Skipping instrumentation of %s: class does not match", class_name.c_str()); | ||
break; | ||
case Result::PROFILER_CLASS: | ||
Log::trace("Skipping instrumentation of %s: internal profiler class", class_name.c_str()); | ||
break; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why debug
in one case and trace
in another?
But in my opinion, neither of these logs is useful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, add
default:
break;
to suppress compiler warning.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But in my opinion, neither of these logs is useful.
I agree, that's why I made them invisible by default. The idea is not to skip quietly any decision I take during the flow, and let this information be available upon closer inspection.
to suppress compiler warning.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But these messages get recorded in .jfr anyway.
I suggest to remove them completely or change them both to trace
and adjust logging not write trace
level logs to jfr.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, I didn't know they go there. 74053fc
src/instrument.cpp
Outdated
void JNICALL Instrument::recordExit0(JNIEnv* jni, jobject unused, jlong durationNs) { | ||
if (!_enabled) return; | ||
|
||
u64 now_ticks = TSC::ticks(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's better. Now timings should be more accurate, for the price of an extra nanotime
call.
I know how to improve this further by leveraging jdk.jfr.internal.JVM.counterTime()
intrinsic in Java code, but this will require lots of changes. We can discuss it and probably leave for a separate PR.
src/flightRecorder.cpp
Outdated
buf->putVar32(LOG_ERROR - FlightRecorder::MIN_LOG_LEVEL + 1); | ||
for (int i = FlightRecorder::MIN_LOG_LEVEL; i <= LOG_ERROR; i++) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Leave the code as it was. There's nothing bad in listing all available levels, especially since there is a (theoretical) possibility to call writeLog
directly bypassing checks in Log::log
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, reverted in 148e6b3
src/log.cpp
Outdated
} | ||
|
||
void Log::log(LogLevel level, const char* msg, va_list args) { | ||
if (_level > level && FlightRecorder::MIN_LOG_LEVEL > level) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I'd reverse the check: when an input argument participates in some condition, it better reads when an argument is on the left side. CONSTANT > arg
looks unnatural to me.
if (level < _level && level < FlightRecorder::MIN_LOG_LEVEL)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
src/log.cpp
Outdated
|
||
// Write all messages to JFR, if active | ||
if (level < LOG_ERROR) { | ||
if (level >= FlightRecorder::MIN_LOG_LEVEL) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you removed < LOG_ERROR
intentionally?
Error message typically indicates a situation when profiler is unable to execute a user command. Errors don't happen out of the blue at runtime and don't need to be recorded.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it, 148e6b3
|
||
public static void recordExit(long startTimeNs, long minLatency) { | ||
if (System.nanoTime() - startTimeNs >= minLatency) { | ||
recordExit0(startTimeNs); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If latency == 0
, you can call recordExit0
directly from the generated code. Even the name of the method suits well :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, this will complicate decision how many frames to skip in the stacktrace.
But I think the optimization to skip redundant nanoTime
call is useful. Maybe we can add another variant of recordExit
call? wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can add another variant of recordExit call?
Yeah I think this is the easiest way: a9a8604
Or we add something like EventType::METHOD_TRACE_ZERO
.
Description
In the current implementation, we always make a JNI call to
recordExit
regardless of whether the current function call breaks the latency threshold or not. We could move the check from C++ to Java, so we could save a JNI call when the method is not going to be profiled.Related issues
#1421
Motivation and context
This simple class shows a significant improvement in terms of throughput:
master
latency-jni-optimize
So we get approximately 40% more method calls when the method is never profiled (in this case the workload is negligible).
How has this been tested?
The current set of tests is good enough to validate this change.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.