Optimize wall clock profiling · Issue #1007 · async-profiler/async-profiler · GitHub
Optimize wall clock profiling #1007

@apangin

Description

Describe the feature

Wall clock profiling mode (-e wall) is currently implemented by a dedicated thread that periodically samples all application threads, regardless of their status. This means that if an application has 1000 threads and the wall clock profiler runs with a 100ms interval, the profiler will record 10K samples per second, sending 10K signals and writing about 150kB of data every second. This may cause noticeable performance overhead, both in CPU usage (signals are expensive) and in JFR file size.
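The arithmetic behind these figures can be checked with a short sketch. The function name and the 15 bytes per sample are assumptions for illustration; the per-sample size is only inferred from the numbers above (150kB divided by 10K samples), not measured.

```python
def wall_clock_load(threads: int, interval_ms: float, bytes_per_sample: float):
    """Return (samples per second, bytes per second) when every thread
    is sampled once per interval, regardless of its state."""
    samples_per_thread_per_sec = 1000.0 / interval_ms
    samples_per_sec = threads * samples_per_thread_per_sec
    return samples_per_sec, samples_per_sec * bytes_per_sample


# Figures from the description: 1000 threads, 100ms interval.
# bytes_per_sample=15 is an assumption derived from 150kB / 10K samples.
samples, size = wall_clock_load(threads=1000, interval_ms=100, bytes_per_sample=15)
print(f"{samples:.0f} samples/s, {size / 1000:.0f} kB/s")
# prints "10000 samples/s, 150 kB/s"
```

Since every sample also costs one signal, the signal rate equals the sample rate, and both scale linearly with the thread count.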

In wall clock mode, most of the sampled threads are sleeping, e.g., idle thread pools waiting for incoming tasks. Signals wake them up, causing even more overhead by making the kernel schedule threads that would otherwise be left untouched.

We can significantly reduce the impact of wall clock profiling if we skip sampling threads that are known to be idle.

Use Case

  • Lower wall clock profiling overhead for applications with thousands of threads.
  • Make smaller wall clock intervals feasible, thus improving profile accuracy.
  • Reduce size of JFR recordings when wall clock mode is enabled.

Proposed Solution

After sampling a thread in the IDLE state, record its total CPU time.
Next time, if the CPU time of this thread has not changed, we will not sample it again, assuming the thread is still IDLE at the very same location where we saw it before. We will start sampling this thread again if its CPU time changes or after 1000 skipped iterations, whichever occurs first.
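The skip rule can be sketched as a small state machine per thread. This is only an illustration of the rule as described, not the profiler's actual code; `ThreadState`, `should_sample`, and `on_sampled` are hypothetical names, and CPU time is modeled as a plain integer.

```python
from dataclasses import dataclass
from typing import Optional

MAX_SKIPPED = 1000  # from the proposal: resample after 1000 skipped iterations


@dataclass
class ThreadState:
    # CPU time recorded when the thread was last sampled in IDLE state;
    # None means the thread is not currently a candidate for skipping.
    last_cpu_time: Optional[int] = None
    skipped: int = 0


def should_sample(state: ThreadState, cpu_time_now: int) -> bool:
    """Decide whether to signal the thread on this iteration."""
    if state.last_cpu_time is None:
        return True                      # not known to be idle
    if cpu_time_now != state.last_cpu_time:
        return True                      # thread has run since the last sample
    if state.skipped >= MAX_SKIPPED:
        return True                      # periodic refresh
    state.skipped += 1                   # still idle: count the skipped sample
    return False


def on_sampled(state: ThreadState, was_idle: bool, cpu_time_now: int) -> None:
    """Update the state after a real sample has been taken."""
    state.skipped = 0
    state.last_cpu_time = cpu_time_now if was_idle else None
```

The `skipped` counter doubles as the value that would be reported for the batched samples once the thread is sampled again.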

Introduce a new JFR event, profiler.WallClockSample, with all the fields of the jdk.ExecutionSample event plus a new int field, samples, containing a counter of skipped samples. Instead of recording 500 similar ExecutionSample events for a sleeping thread, we will issue only one ExecutionSample event and one WallClockSample event with samples=499.
Amend JfrReader to seamlessly handle WallClockSample events as if there were a single event per sample, as before. Nothing should change from the user's perspective; all implementation details should be encapsulated.
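A reader could hide the batching by expanding each WallClockSample back into individual samples. The sketch below only illustrates that idea; `Event` and `expand` are hypothetical, and the real JfrReader is a Java class with its own event model.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class Event:
    type: str          # "jdk.ExecutionSample" or "profiler.WallClockSample"
    thread: str
    stack_trace: tuple
    samples: int = 1   # only meaningful for profiler.WallClockSample


def expand(events: Iterable[Event]) -> Iterator[Event]:
    """Yield one ExecutionSample-like event per logical sample."""
    for e in events:
        count = e.samples if e.type == "profiler.WallClockSample" else 1
        for _ in range(count):
            yield Event("jdk.ExecutionSample", e.thread, e.stack_trace)


# The example from the proposal: one real sample plus 499 batched ones.
stack = ("Object.wait", "Worker.run")
recorded = [
    Event("jdk.ExecutionSample", "pool-1-thread-1", stack),
    Event("profiler.WallClockSample", "pool-1-thread-1", stack, samples=499),
]
print(len(list(expand(recorded))))  # prints 500
```

A consumer that only aggregates counts could skip the expansion and add `samples` to the total directly, which gives the same result with less work.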

If for some reason the previous behavior is desired, the wall clock optimization can be disabled by adding the nobatch profiler argument.

Acknowledgements

  • I may be able to implement this feature request
  • This feature might incur a breaking change
