-
Notifications
You must be signed in to change notification settings - Fork 937
Add time_nanos and duration_nanos to OTLP profiles with test #1413
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add time_nanos and duration_nanos to OTLP profiles with test #1413
Conversation
60c24f6 to
52a154e
Compare
a9f84a0 to
52a154e
Compare
|
It is not obvious to me how the fact that profile is dumped for the first time affects Example 1: User starts profiling (e.g. using an agent, but this does not matter), then, after a few seconds, runs Example 2: Consider an agent that restarts profiling session and after 30 seconds calls |
|
Yeah, I thought about such a case. I'm not sure there's a clean solution to this problem. You could look things from this perspective: dump formats may support the possibility to tell the consumer whether it's a cumulative or a delta dump. OTLP does, so we set it. But if the first dump happens to use a format that does not support it, the information is simply lost. This seems more like misuse on the user side, which could also happen in other ways: what if the user forgets to export the packet containing the first dump after a restart? Regarding your second example, I see just one flaw in the reasoning: profiles produced by Async-Profiler itself won't adhere to the protocol. The existence of an external agent handling the flow is implicitly assumed, which makes this solution a bit less convenient. Other options we considered:
|
|
My point is, If I stop profiler and then run it again after 1 hour - why it should be treated as "delta" from the previous report, if there was a long gap when no data was collected at all? Or if the agent submits 3 packets: cumulative, delta, cumulative - how will receiver recognize that the last To me, a scenario which you've described as "misuse" is completely valid. Imagine I've configured an agent to report profiles to a remote server, but don't see data in the UI. To check if the issue is on the collection side, I run I believe the following approach makes up a simple and unambiguous solution:
The example in profiles.proto describes exactly our case, if we replace "fault" with "restart". |
|
Yes, I see now that I'll also check with the OTEL profiling community, maybe this aggregation temporality thing could be removed from the protocol altogether, or its semantics clarified. The eBPF profiler does not use it. |
|
@apangin are you ok with changing |
src/profiler.cpp
Outdated
| #include "profiler.h" | ||
| #include "perfEvents.h" | ||
| #include "ctimer.h" | ||
| #include <time.h> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The last minor thing: do not interleave standard headers with project header files.
The recommended #include order is the following:
<standard c++ headers>
<standard c headers>
"currentModule.h"
"other headers"
all in alphabetical order.
|
All good now, thanks |
Description
Implemented proper temporal aggregation support where consecutive dumps maintain the same time_nanos (session start) but show increasing duration_nanos, while profiler restarts generate new time_nanos values. Converted _start_time to nanosecond precision.
Related issues
Not available
Motivation and context
This change ensures proper temporal data interpretation across profiler sessions by providing precise timing information that allows OTLP consumers to distinguish between different profiling periods.
How has this been tested?
Added a new test
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.