KEMBAR78
Log when no samples are collected by visheshruparelia · Pull Request #1167 · async-profiler/async-profiler · GitHub
Skip to content

Conversation

visheshruparelia
Copy link
Contributor

Description

Adding a log message when there are no samples collected by the profiler.

Related issues

N/A

Motivation and context

During some UT failure investigations earlier, I noticed that the issue was because there were no samples being collected. It was not obvious by looking just at the Test output that this was the case. Thus, adding this log line to provide insight for any future investigations.

How has this been tested?

The CI builds successfully pass thus ensuring that this is not a breaking change.


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

}
}
if (samples.size() == initialSize) {
Log::info("No samples collected");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it is warning why label it as info?

Copy link
Contributor

@roy-soumadipta roy-soumadipta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes look fine not adding too much of an overhead, however Andrei is the best person to comment on it. Also I think creating an issue and linking the change to it would be better. Again would like Andrei to weigh in on it.

}
if (!samplesCollected) {
Log::info("No samples collected");
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CallTraceStorage is just a data structure - a kind of specialized hash table. It's not a job of a storage to report UX-related stuff. Furthermore, it does not have enough context to do so: e.g., a JFR recording may have multiple chunks; samples are collected for each chunk, and even though one chunk may contain no samples, other chunks probably do.

It's better to emit a message at the same place where we dump results and where we know for sure that no samples were reported.

Other things to consider:

  • In continuous profiling mode (i.e. with --loop option), we keep dumping output periodically with no user interaction involved. A log message will be redundant in this case.
  • If output filter is specified (-I/-X options), output may be empty even when number of collected samples is non-zero. It makes sense to emit a different message in this case, e.g. All %d samples were filtered out.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@apangin this makes total sense. Thank you for your insights, I will make the changes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense @apangin, will make the changes.

Some thoughts/questions:

  • The method dump() in Profiler.cpp class takes care of dumping the output. Now, this method is only invoked once per invocation of async-profiler. So far, this is what I understood by navigating the code. Is this the correct understanding? Or do we have cases where for a single invocation of async profiler, we invoke the dump() method multiple times?
  • One of the benefits I was getting by adding the log in the collectSamples() method is that all the "dump" methods invoke it so one stop place to log our message. If I add it in the individual dump methods, the logic of logging needs to be added in all those methods. Are we okay with that or did you have something else in mind? (I am thinking of creating a common method which contains the logging logic and invoke it from different dump methods like dumpCollapsed(), dumpText() etc.

Copy link
Contributor Author

@visheshruparelia visheshruparelia Mar 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@apangin please see the updated PR for reference to the above questions. I dont think the current diff covers JFR based dump option, need to dig deep into that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JFR is tricky. It contains some useful info even if no samples are collected. JFR may also record multiple event types at the same time.
As mentioned in the review, I think only collapsed and flamegraph/tree output deserve a log message. It's ok to add a utility function to do this logging.

src/profiler.cpp Outdated
if (trace == NULL || excludeTrace(&fn, trace)) continue;
if (trace == NULL) continue;
if (excludeTrace(&fn, trace)) {
filtered_sample_count++;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comparing filtered_sample_count to _total_samples is not reliable: filtered_sample_count does not take into account samples with trace == NULL or counter == 0.

Why not counting printed samples instead? And simply emit a message if no samples were printed.

Copy link
Contributor Author

@visheshruparelia visheshruparelia Mar 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@apangin we can do that. I am assuming we count whenever we hit this LOC and compare it with _total_samples. However, that means that we won't be able to print statements like %d samples were filtered out but rather it will just be blanket statement like no samples collected. Another option is we count both: Number of printed samples and number of filtered samples (because of -I or -X). Based on these numbers we can print the both the messages.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Effectively we can have a logic like the following:

  • Count samples filtered if excludeTrace returns true to a var called filtered_sample_count
  • Count samples skipped because of trace being NULL or counter = 0to a var called skipped_sample_count
  • Count samples being printed to the output buffer to a var called printed_sample_count
    Then the logIfNoSamples() method can look like this:
void logIfNoSamples() {
    if (args._loop) return;
    if (filtered_sample_count > 0 && skipped_sample_count + filtered_sample_count == _total_samples) {
        print("%d samples were filtered out", filtered_sample_count); 
        return;
    }
    if (printed_sample_count == 0) {
        print("No samples collected");
    }
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's stay simple. The reason for a message is to let user know that the output is empty expectedly, not because of some I/O error. We may also print a warning if we know there was an I/O error (dumpCollapsed already does this; may be a good idea to merge this into your function).

I propose to identify 3 cases:

  1. there was an I/O error: !out.good();
  2. no samples were collected: _total_samples - _failures[-ticks_skipped] == 0;
  3. some samples were collected, but nothing was printed (actual number of filtered samples does not really matter): printed_samples == 0. So, all you need is to count printed_samples.

WDYT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah makes sense. Quick question though: is it safe to assume that skipped_sample_count var if computed based on the above logic, is the same as _failures[-ticks_skipped]?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@apangin updated the PR btw.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, these are different things. A failure of type ticks_skipped is counted when profiler was unable to record a sample at all (failed to acquire a lock).

trace == NULL is an extremely rare situation when dump is requested while profiler is running. It denotes a race when a new stack trace is being recorded concurrently with dumping.

counter == 0 is when a stack trace has been found in the CallTraceStorage, but has no counted samples associated with it. For example, when dumping results of live object profiling, a stack trace is recorded on object allocation, but it does not count unless this object survives throughout profiling session.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I see, thank you for the explanation. I will dive deep into the code once more now with this information to gain a better understanding of the code.

PS: In the current diff, we check !out.good() for the dumpFlameGraph() method as well. It seems redundant to me (you can correct me if I am wrong) as we don't really interact with it. Instead, we interact with the FlameGraph class. I did go through the FlameGraph class to see if we can do something similar there but could not really come up with something. Do you have any suggestions around that?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

flamegraph.dump interacts with out.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ohh! I have no idea how I missed it. Thank you.

src/profiler.cpp Outdated
}
samples.push_back(it->second);
}
logIfNoSamples(args, filtered_sample_count);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Text output already includes the number of collected samples in the summary. I'm not sure if it's worth logging in addition to that. I'd say log matters mostly for collapsed and flamegraph output. Let me know if you think otherwise.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It prints _total_samples but that does not consider samples excluded because of -I or -X flag right? Am I missing something?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's correct. But this information allows user to identify the cases mentioned above (no collected samples vs. no printed samples).

src/profiler.h Outdated
void stopTimer();
void timerLoop(void* timer_id);

void logIfNoSamples(Arguments& args, u64 excluded_samples) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the function is too big to be inlined in the header file. I'd move implementation to .cpp instead.

Copy link
Member

@apangin apangin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your contribution. I left several minor comments. They may sound scrupulous, but since you are new to the project, this is to calibrate attitude to the coding style.

src/profiler.cpp Outdated
std::vector<CallTraceSample*> samples;
_call_trace_storage.collectSamples(samples);

u64 printed_sample_count = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A small wish: do not remove empty lines that were left for readability purpose.


void Profiler::logSampleCollectionStatus(Arguments& args, u64 printed_samples_count, Writer& out) {
if (!out.good()) {
Log::warn("Output file may be incomplete");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In case of I/O error, the output is undefined, so return here without printing further messages.

src/profiler.cpp Outdated
if (args._loop) {
return;
}
// If total samples count match the samples which were skipped - we log a message
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please avoid comments that just repeat what is written in the code.
IMO, in this case, the code is pretty much self-describing. I'd keep it concise like

    if (_total_samples == _failures[-ticks_skipped]) {
        Log::info("No samples were collected");
    } else if (printed_samples_count == 0) {
        Log::info("All samples were filtered out");
    }

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw, although I typically prefer simpler expressions, in this particular case
if (_total_samples - _failures[-ticks_skipped] == 0) would be more obvious to a reader (i.e. "No samples" means that some value is zero).

src/profiler.h Outdated
void stopTimer();
void timerLoop(void* timer_id);

void logSampleCollectionStatus(Arguments& args, u64 printed_samples_count, Writer& out);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name of the function is long but vague. I'd call it something like logEmptyOutput which is closer to its original purpose.

src/profiler.cpp Outdated
f->_self += counter;
printed_sample_count++;
}
logSampleCollectionStatus(args, printed_sample_count, out);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be called after flamegraph.dump

@visheshruparelia
Copy link
Contributor Author

Thank you for your contribution. I left several minor comments. They my sound scrupulous, but since you are new to the project, this is to calibrate attitude to the coding style.

Thank you @apangin for the checkstyle suggestions. I have incorporated them in the current PR and will ensure that they are followed in the PRs going forward.

@apangin apangin merged commit b0dde79 into async-profiler:master Mar 14, 2025
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants