-
Notifications
You must be signed in to change notification settings - Fork 937
Log when no samples are collected #1167
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Log when no samples are collected #1167
Conversation
src/callTraceStorage.cpp
Outdated
} | ||
} | ||
if (samples.size() == initialSize) { | ||
Log::info("No samples collected"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it is warning why label it as info?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes look fine not adding too much of an overhead, however Andrei is the best person to comment on it. Also I think creating an issue and linking the change to it would be better. Again would like Andrei to weigh in on it.
src/callTraceStorage.cpp
Outdated
} | ||
if (!samplesCollected) { | ||
Log::info("No samples collected"); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CallTraceStorage
is just a data structure - a kind of specialized hash table. It's not a job of a storage to report UX-related stuff. Furthermore, it does not have enough context to do so: e.g., a JFR recording may have multiple chunks; samples are collected for each chunk, and even though one chunk may contain no samples, other chunks probably do.
It's better to emit a message at the same place where we dump results and where we know for sure that no samples were reported.
Other things to consider:
- In continuous profiling mode (i.e. with
--loop
option), we keep dumping output periodically with no user interaction involved. A log message will be redundant in this case. - If output filter is specified (
-I
/-X
options), output may be empty even when number of collected samples is non-zero. It makes sense to emit a different message in this case, e.g.All %d samples were filtered out
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@apangin this makes total sense. Thank you for your insights, I will make the changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense @apangin, will make the changes.
Some thoughts/questions:
- The method
dump()
inProfiler.cpp
class takes care of dumping the output. Now, this method is only invoked once per invocation of async-profiler. So far, this is what I understood by navigating the code. Is this the correct understanding? Or do we have cases where for a single invocation of async profiler, we invoke thedump()
method multiple times? - One of the benefits I was getting by adding the log in the
collectSamples()
method is that all the "dump" methods invoke it so one stop place to log our message. If I add it in the individualdump
methods, the logic of logging needs to be added in all those methods. Are we okay with that or did you have something else in mind? (I am thinking of creating a common method which contains the logging logic and invoke it from differentdump
methods likedumpCollapsed()
,dumpText()
etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@apangin please see the updated PR for reference to the above questions. I dont think the current diff covers JFR based dump option, need to dig deep into that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
JFR is tricky. It contains some useful info even if no samples are collected. JFR may also record multiple event types at the same time.
As mentioned in the review, I think only collapsed
and flamegraph/tree
output deserve a log message. It's ok to add a utility function to do this logging.
src/profiler.cpp
Outdated
if (trace == NULL || excludeTrace(&fn, trace)) continue; | ||
if (trace == NULL) continue; | ||
if (excludeTrace(&fn, trace)) { | ||
filtered_sample_count++; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comparing filtered_sample_count
to _total_samples
is not reliable: filtered_sample_count
does not take into account samples with trace == NULL
or counter == 0
.
Why not counting printed samples instead? And simply emit a message if no samples were printed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@apangin we can do that. I am assuming we count whenever we hit this LOC and compare it with _total_samples
. However, that means that we won't be able to print statements like %d samples were filtered out
but rather it will just be blanket statement like no samples collected
. Another option is we count both: Number of printed samples and number of filtered samples (because of -I or -X). Based on these numbers we can print the both the messages.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Effectively we can have a logic like the following:
- Count samples filtered if
excludeTrace
returnstrue
to a var calledfiltered_sample_count
- Count samples skipped because of trace being
NULL
orcounter = 0
to a var calledskipped_sample_count
- Count samples being printed to the output buffer to a var called
printed_sample_count
Then thelogIfNoSamples()
method can look like this:
void logIfNoSamples() {
if (args._loop) return;
if (filtered_sample_count > 0 && skipped_sample_count + filtered_sample_count == _total_samples) {
print("%d samples were filtered out", filtered_sample_count);
return;
}
if (printed_sample_count == 0) {
print("No samples collected");
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's stay simple. The reason for a message is to let user know that the output is empty expectedly, not because of some I/O error. We may also print a warning if we know there was an I/O error (dumpCollapsed
already does this; may be a good idea to merge this into your function).
I propose to identify 3 cases:
- there was an I/O error:
!out.good()
; - no samples were collected:
_total_samples - _failures[-ticks_skipped] == 0
; - some samples were collected, but nothing was printed (actual number of filtered samples does not really matter):
printed_samples == 0
. So, all you need is to countprinted_samples
.
WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah makes sense. Quick question though: is it safe to assume that skipped_sample_count
var if computed based on the above logic, is the same as _failures[-ticks_skipped]
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@apangin updated the PR btw.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, these are different things. A failure of type ticks_skipped
is counted when profiler was unable to record a sample at all (failed to acquire a lock).
trace == NULL
is an extremely rare situation when dump is requested while profiler is running. It denotes a race when a new stack trace is being recorded concurrently with dumping.
counter == 0
is when a stack trace has been found in the CallTraceStorage, but has no counted samples associated with it. For example, when dumping results of live object profiling, a stack trace is recorded on object allocation, but it does not count unless this object survives throughout profiling session.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I see, thank you for the explanation. I will dive deep into the code once more now with this information to gain a better understanding of the code.
PS: In the current diff, we check !out.good()
for the dumpFlameGraph()
method as well. It seems redundant to me (you can correct me if I am wrong) as we don't really interact with it. Instead, we interact with the FlameGraph
class. I did go through the FlameGraph
class to see if we can do something similar there but could not really come up with something. Do you have any suggestions around that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
flamegraph.dump
interacts with out
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ohh! I have no idea how I missed it. Thank you.
src/profiler.cpp
Outdated
} | ||
samples.push_back(it->second); | ||
} | ||
logIfNoSamples(args, filtered_sample_count); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Text output already includes the number of collected samples in the summary. I'm not sure if it's worth logging in addition to that. I'd say log matters mostly for collapsed
and flamegraph
output. Let me know if you think otherwise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It prints _total_samples
but that does not consider samples excluded because of -I or -X
flag right? Am I missing something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's correct. But this information allows user to identify the cases mentioned above (no collected samples vs. no printed samples).
src/profiler.h
Outdated
void stopTimer(); | ||
void timerLoop(void* timer_id); | ||
|
||
void logIfNoSamples(Arguments& args, u64 excluded_samples) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the function is too big to be inlined in the header file. I'd move implementation to .cpp instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your contribution. I left several minor comments. They may sound scrupulous, but since you are new to the project, this is to calibrate attitude to the coding style.
src/profiler.cpp
Outdated
std::vector<CallTraceSample*> samples; | ||
_call_trace_storage.collectSamples(samples); | ||
|
||
u64 printed_sample_count = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A small wish: do not remove empty lines that were left for readability purpose.
|
||
void Profiler::logSampleCollectionStatus(Arguments& args, u64 printed_samples_count, Writer& out) { | ||
if (!out.good()) { | ||
Log::warn("Output file may be incomplete"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In case of I/O error, the output is undefined, so return here without printing further messages.
src/profiler.cpp
Outdated
if (args._loop) { | ||
return; | ||
} | ||
// If total samples count match the samples which were skipped - we log a message |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please avoid comments that just repeat what is written in the code.
IMO, in this case, the code is pretty much self-describing. I'd keep it concise like
if (_total_samples == _failures[-ticks_skipped]) {
Log::info("No samples were collected");
} else if (printed_samples_count == 0) {
Log::info("All samples were filtered out");
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Btw, although I typically prefer simpler expressions, in this particular case
if (_total_samples - _failures[-ticks_skipped] == 0)
would be more obvious to a reader (i.e. "No samples" means that some value is zero).
src/profiler.h
Outdated
void stopTimer(); | ||
void timerLoop(void* timer_id); | ||
|
||
void logSampleCollectionStatus(Arguments& args, u64 printed_samples_count, Writer& out); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The name of the function is long but vague. I'd call it something like logEmptyOutput
which is closer to its original purpose.
src/profiler.cpp
Outdated
f->_self += counter; | ||
printed_sample_count++; | ||
} | ||
logSampleCollectionStatus(args, printed_sample_count, out); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be called after flamegraph.dump
Thank you @apangin for the checkstyle suggestions. I have incorporated them in the current PR and will ensure that they are followed in the PRs going forward. |
Description
Adding a log message when there are no samples collected by the profiler.
Related issues
N/A
Motivation and context
During some UT failure investigations earlier, I noticed that the issue was because there were no samples being collected. It was not obvious by looking just at the Test output that this was the case. Thus, adding this log line to provide insight for any future investigations.
How has this been tested?
The CI builds successfully pass thus ensuring that this is not a breaking change.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.