Log when no samples are collected #1167

visheshruparelia · 2025-03-10T14:29:52Z

Description

Adding a log message when there are no samples collected by the profiler.

Related issues

N/A

Motivation and context

During some UT failure investigations earlier, I noticed that the issue was because there were no samples being collected. It was not obvious by looking just at the Test output that this was the case. Thus, adding this log line to provide insight for any future investigations.

How has this been tested?

The CI builds successfully pass thus ensuring that this is not a breaking change.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

roy-soumadipta · 2025-03-10T14:57:19Z

src/callTraceStorage.cpp

        }
    }
+    if (samples.size() == initialSize) {
+        Log::info("No samples collected");


If it is warning why label it as info?

roy-soumadipta

Changes look fine not adding too much of an overhead, however Andrei is the best person to comment on it. Also I think creating an issue and linking the change to it would be better. Again would like Andrei to weigh in on it.

apangin · 2025-03-10T15:39:21Z

src/callTraceStorage.cpp

    }
+    if (!samplesCollected) {
+        Log::info("No samples collected");
+    }


CallTraceStorage is just a data structure - a kind of specialized hash table. It's not a job of a storage to report UX-related stuff. Furthermore, it does not have enough context to do so: e.g., a JFR recording may have multiple chunks; samples are collected for each chunk, and even though one chunk may contain no samples, other chunks probably do.

It's better to emit a message at the same place where we dump results and where we know for sure that no samples were reported.

Other things to consider:

In continuous profiling mode (i.e. with --loop option), we keep dumping output periodically with no user interaction involved. A log message will be redundant in this case.

If output filter is specified (-I/-X options), output may be empty even when number of collected samples is non-zero. It makes sense to emit a different message in this case, e.g. All %d samples were filtered out.

@apangin this makes total sense. Thank you for your insights, I will make the changes.

Makes sense @apangin, will make the changes.

Some thoughts/questions:

The method dump() in Profiler.cpp class takes care of dumping the output. Now, this method is only invoked once per invocation of async-profiler. So far, this is what I understood by navigating the code. Is this the correct understanding? Or do we have cases where for a single invocation of async profiler, we invoke the dump() method multiple times?

One of the benefits I was getting by adding the log in the collectSamples() method is that all the "dump" methods invoke it so one stop place to log our message. If I add it in the individual dump methods, the logic of logging needs to be added in all those methods. Are we okay with that or did you have something else in mind? (I am thinking of creating a common method which contains the logging logic and invoke it from different dump methods like dumpCollapsed(), dumpText() etc.

@apangin please see the updated PR for reference to the above questions. I dont think the current diff covers JFR based dump option, need to dig deep into that.

JFR is tricky. It contains some useful info even if no samples are collected. JFR may also record multiple event types at the same time.
As mentioned in the review, I think only collapsed and flamegraph/tree output deserve a log message. It's ok to add a utility function to do this logging.

apangin · 2025-03-12T01:09:39Z

src/profiler.cpp

-        if (trace == NULL || excludeTrace(&fn, trace)) continue;
+        if (trace == NULL) continue;
+        if (excludeTrace(&fn, trace)) {
+            filtered_sample_count++;


Comparing filtered_sample_count to _total_samples is not reliable: filtered_sample_count does not take into account samples with trace == NULL or counter == 0.

Why not counting printed samples instead? And simply emit a message if no samples were printed.

@apangin we can do that. I am assuming we count whenever we hit this LOC and compare it with _total_samples. However, that means that we won't be able to print statements like %d samples were filtered out but rather it will just be blanket statement like no samples collected. Another option is we count both: Number of printed samples and number of filtered samples (because of -I or -X). Based on these numbers we can print the both the messages.

Effectively we can have a logic like the following:

Count samples filtered if excludeTrace returns true to a var called filtered_sample_count

Count samples skipped because of trace being NULL or counter = 0to a var called skipped_sample_count

Count samples being printed to the output buffer to a var called printed_sample_count
Then the logIfNoSamples() method can look like this:

void logIfNoSamples() { if (args._loop) return; if (filtered_sample_count > 0 && skipped_sample_count + filtered_sample_count == _total_samples) { print("%d samples were filtered out", filtered_sample_count); return; } if (printed_sample_count == 0) { print("No samples collected"); } }

Let's stay simple. The reason for a message is to let user know that the output is empty expectedly, not because of some I/O error. We may also print a warning if we know there was an I/O error (dumpCollapsed already does this; may be a good idea to merge this into your function).

I propose to identify 3 cases:

there was an I/O error: !out.good();

no samples were collected: _total_samples - _failures[-ticks_skipped] == 0;

some samples were collected, but nothing was printed (actual number of filtered samples does not really matter): printed_samples == 0. So, all you need is to count printed_samples.

WDYT?

Yeah makes sense. Quick question though: is it safe to assume that skipped_sample_count var if computed based on the above logic, is the same as _failures[-ticks_skipped]?

@apangin updated the PR btw.

No, these are different things. A failure of type ticks_skipped is counted when profiler was unable to record a sample at all (failed to acquire a lock).

trace == NULL is an extremely rare situation when dump is requested while profiler is running. It denotes a race when a new stack trace is being recorded concurrently with dumping.

counter == 0 is when a stack trace has been found in the CallTraceStorage, but has no counted samples associated with it. For example, when dumping results of live object profiling, a stack trace is recorded on object allocation, but it does not count unless this object survives throughout profiling session.

Oh I see, thank you for the explanation. I will dive deep into the code once more now with this information to gain a better understanding of the code.

PS: In the current diff, we check !out.good() for the dumpFlameGraph() method as well. It seems redundant to me (you can correct me if I am wrong) as we don't really interact with it. Instead, we interact with the FlameGraph class. I did go through the FlameGraph class to see if we can do something similar there but could not really come up with something. Do you have any suggestions around that?

flamegraph.dump interacts with out.

Ohh! I have no idea how I missed it. Thank you.

apangin · 2025-03-12T01:14:56Z

src/profiler.cpp

+            }
            samples.push_back(it->second);
        }
+        logIfNoSamples(args, filtered_sample_count);


Text output already includes the number of collected samples in the summary. I'm not sure if it's worth logging in addition to that. I'd say log matters mostly for collapsed and flamegraph output. Let me know if you think otherwise.

It prints _total_samples but that does not consider samples excluded because of -I or -X flag right? Am I missing something?

That's correct. But this information allows user to identify the cases mentioned above (no collected samples vs. no printed samples).

apangin · 2025-03-12T01:18:33Z

src/profiler.h

    void stopTimer();
    void timerLoop(void* timer_id);

+    void logIfNoSamples(Arguments& args, u64 excluded_samples) {


I think the function is too big to be inlined in the header file. I'd move implementation to .cpp instead.

apangin

Thank you for your contribution. I left several minor comments. They may sound scrupulous, but since you are new to the project, this is to calibrate attitude to the coding style.

apangin · 2025-03-14T18:54:28Z

src/profiler.cpp

    std::vector<CallTraceSample*> samples;
    _call_trace_storage.collectSamples(samples);
-
+    u64 printed_sample_count = 0;


A small wish: do not remove empty lines that were left for readability purpose.

apangin · 2025-03-14T18:56:21Z

src/profiler.cpp


+void Profiler::logSampleCollectionStatus(Arguments& args, u64 printed_samples_count, Writer& out) {
+    if (!out.good()) {
+        Log::warn("Output file may be incomplete");


In case of I/O error, the output is undefined, so return here without printing further messages.

apangin · 2025-03-14T19:05:23Z

src/profiler.cpp

+    if (args._loop) {
+        return;
+    }
+    // If total samples count match the samples which were skipped - we log a message


Please avoid comments that just repeat what is written in the code.
IMO, in this case, the code is pretty much self-describing. I'd keep it concise like

if (_total_samples == _failures[-ticks_skipped]) { Log::info("No samples were collected"); } else if (printed_samples_count == 0) { Log::info("All samples were filtered out"); }

Btw, although I typically prefer simpler expressions, in this particular case
if (_total_samples - _failures[-ticks_skipped] == 0) would be more obvious to a reader (i.e. "No samples" means that some value is zero).

apangin · 2025-03-14T19:09:45Z

src/profiler.h

    void stopTimer();
    void timerLoop(void* timer_id);

+    void logSampleCollectionStatus(Arguments& args, u64 printed_samples_count, Writer& out);


The name of the function is long but vague. I'd call it something like logEmptyOutput which is closer to its original purpose.

apangin · 2025-03-14T19:30:51Z

src/profiler.cpp

            f->_self += counter;
+            printed_sample_count++;
        }
+        logSampleCollectionStatus(args, printed_sample_count, out);


Should be called after flamegraph.dump

visheshruparelia · 2025-03-14T20:35:34Z

Thank you for your contribution. I left several minor comments. They my sound scrupulous, but since you are new to the project, this is to calibrate attitude to the coding style.

Thank you @apangin for the checkstyle suggestions. I have incorporated them in the current PR and will ensure that they are followed in the PRs going forward.

visheshruparelia added 4 commits March 10, 2025 10:32

add a warn message if no samples are collected

47573f9

Merge branch 'master' into add_warn_message

e192087

add Log header

25adf3e

use info

5aa64c0

visheshruparelia assigned apangin and unassigned apangin Mar 10, 2025

visheshruparelia requested a review from apangin March 10, 2025 14:32

roy-soumadipta reviewed Mar 10, 2025

View reviewed changes

apangin reviewed Mar 10, 2025

View reviewed changes

visheshruparelia added 3 commits March 11, 2025 16:41

move logging out of trace storage DS

9c972ed

remove unsued vars

24db598

verify tests pass in local

4ae92fe

apangin reviewed Mar 12, 2025

View reviewed changes

visheshruparelia added 2 commits March 12, 2025 14:21

revamp logic when counting skipped samples and actually printed samples

b0f48e4

remove whitespace

1d099b3

apangin reviewed Mar 14, 2025

View reviewed changes

refactor code to meet checkstyle standards

c99efb2

apangin merged commit b0dde79 into async-profiler:master Mar 14, 2025
10 checks passed

Log when no samples are collected #1167

Log when no samples are collected #1167

Uh oh!

Conversation

visheshruparelia commented Mar 10, 2025

Description

Related issues

Motivation and context

How has this been tested?

Uh oh!

Choose a reason for hiding this comment

Uh oh!

roy-soumadipta left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

visheshruparelia Mar 11, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

visheshruparelia Mar 12, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

apangin left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

visheshruparelia commented Mar 14, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

visheshruparelia Mar 11, 2025 •

edited

Loading

visheshruparelia Mar 12, 2025 •

edited

Loading

apangin left a comment •

edited

Loading