KEMBAR78
#1007: Optimize wall clock profiling by apangin · Pull Request #1008 · async-profiler/async-profiler · GitHub
Skip to content

Conversation

@apangin
Copy link
Member

@apangin apangin commented Sep 25, 2024

Description

Optimize wall clock profiling by skipping redundant samples when we know thread CPU time has not changed since the last sample.

Related issues

Resolves #1007.

Motivation and context

Make wall clock profiling safe for production use, even for applications with thousands of threads.

How has this been tested?

Run benchmarks with new wall clock mechanism and legacy wall clock (nobatch) and compare flame graphs.
Verify the number of ExecutionSample and WallClockSample events manually.


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@apangin apangin merged commit f53bfd4 into master Sep 26, 2024
2 checks passed
@apangin apangin deleted the wall-optimization branch September 26, 2024 17:44
@petrbouda
Copy link

Hi, this is a great improvement! :)

Screenshot 2024-10-29 at 15 29 37

Does this mean that a correct handling of CPU and WALL events is something like this:
CPU: jdk.ExecutionSamples events and filter out all events except from STATE_DEFAULT
WALL: profiler.WallClockSample events -> cummulate over samples field

There is no need for any kind of mapping between jdk.ExecutionSamples and profiler.WallClockSample, am I right?

Thanks, Petr

@petrbouda
Copy link

Does it make sense to record other events than STATE_DEFAULT into jdk.ExecutionSample ?
Screenshot 2024-10-29 at 15 52 19

@apangin
Copy link
Member Author

apangin commented Nov 2, 2024

@petrbouda The meaning of jdk.ExecutionSample remains unchanged: it may represent both CPU and Wall clock events depending on state field as before. profiler.WallClockSample is an additional way to represent multiple wall clock samples as one event.

JfrReader transparently converts both event types to ExecutionSample.

I agree it would be clearer to leave jdk.ExecutionSample only for CPU events and profiler.WallClockSample for Wall clock events. I will probably change this in the future.

@petrbouda
Copy link

Hi Andrei, thank you for your answer. I've already looked into JfrReader to understand your parsing logic.
Preparing support in Jeffrey :) ... this is a cool feature, thanks a lot!

@apangin
Copy link
Member Author

apangin commented Nov 3, 2024

It was actually trivial: I modified profiler to always emit different events for CPU vs. Wall clock samples.
Now wall clock engine writes profiler.WallClockSample only, and CPU engine writes jdk.ExecutionSample with STATE_DEFAULT.

Note: if nobatch option is specified, both CPU and Wall clock profiler emit jdk.ExecutionSample as before.

@petrbouda
Copy link

Cool! It makes recordings much smaller 🙂

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Optimize wall clock profiling

2 participants