Optimize wall clock profiling #1007

### Describe the feature

Wall clock profiling mode (`-e wall`) is currently implemented by a dedicated thread that periodically samples *all* application threads, regardless of their state. For example, if an application has 1000 threads and the wall clock profiler runs with a 100 ms interval, the profiler records 10K samples per second, sending 10K signals and writing about 150 kB of data every second. This can cause noticeable overhead, both in CPU usage (signals are expensive) and in JFR file size.
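The arithmetic above can be reproduced with a short sketch. The class and method names are hypothetical, and the 15 bytes per sample is only what the quoted figures imply (150 kB / 10K samples), not a measured event size.

```java
// Hypothetical helper, not part of the profiler: reproduces the estimate above.
final class WallClockCost {
    // Samples (and signals) per second when every thread is sampled once per interval.
    static long samplesPerSecond(int threads, int intervalMs) {
        return (long) threads * 1000 / intervalMs;
    }

    // Approximate bytes written per second for an assumed per-sample size.
    static long bytesPerSecond(int threads, int intervalMs, int bytesPerSample) {
        return samplesPerSecond(threads, intervalMs) * bytesPerSample;
    }

    public static void main(String[] args) {
        int bytesPerSample = 15; // assumption derived from 150 kB / 10K samples
        System.out.println(samplesPerSecond(1000, 100) + " samples/s");
        System.out.println(bytesPerSecond(1000, 100, bytesPerSample) + " bytes/s");
    }
}
```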

In wall clock mode, most of the sampled threads are sleeping, e.g., idle thread pools waiting for incoming tasks. Signals wake them up, causing even more overhead by making the kernel schedule threads that would otherwise be left undisturbed.

We can significantly reduce the impact of wall clock profiling by skipping threads that are known to be idle.

### Use Case

- Lower wall clock profiling overhead for applications with thousands of threads.
- Make smaller wall clock intervals feasible, thus improving profile accuracy.
- Reduce size of JFR recordings when wall clock mode is enabled.

### Proposed Solution

After sampling a thread in IDLE state, record its total CPU time.
On the next iteration, if the CPU time of this thread has not changed, skip sampling it, on the assumption that the thread is still IDLE at the very same location where we last saw it. Resume sampling the thread when its CPU time changes or after 1000 skipped iterations, whichever occurs first.
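The skip rule could be sketched roughly as follows. This is an illustration only: `IdleSkipTracker`, its methods, and the use of a map keyed by thread ID are assumptions made for the example, not the profiler's actual code, and the caller is assumed to supply the thread's CPU time and IDLE state.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the proposed skip logic; all names are illustrative.
final class IdleSkipTracker {
    static final int MAX_SKIPPED = 1000; // limit taken from the proposal

    private static final class State {
        long cpuTimeNanos; // CPU time recorded when the thread was last sampled as IDLE
        int skipped;       // iterations skipped since that sample
    }

    private final Map<Long, State> idleThreads = new HashMap<>();

    // Returns true if the thread should be sampled in this iteration.
    boolean shouldSample(long threadId, long cpuTimeNanos) {
        State s = idleThreads.get(threadId);
        if (s == null) {
            return true; // thread is not known to be idle
        }
        if (s.cpuTimeNanos != cpuTimeNanos || s.skipped >= MAX_SKIPPED) {
            idleThreads.remove(threadId); // thread ran, or the skip limit was reached
            return true;
        }
        s.skipped++;
        return false;
    }

    // Called after a sample has been taken, to remember threads seen in IDLE state.
    void onSampled(long threadId, boolean idle, long cpuTimeNanos) {
        if (idle) {
            State s = new State();
            s.cpuTimeNanos = cpuTimeNanos;
            idleThreads.put(threadId, s);
        } else {
            idleThreads.remove(threadId);
        }
    }
}
```

In this sketch the skipped counter lives next to the recorded CPU time, so it could also feed the counter of skipped samples that the proposal wants to report.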

Introduce a new JFR event `profiler.WallClockSample` that has all the fields of the `jdk.ExecutionSample` event plus a new `int samples` field containing the number of skipped samples. Instead of recording 500 similar `ExecutionSample` events for a sleeping thread, we will emit only one `ExecutionSample` event and one `WallClockSample` event with `samples=499`.
Amend `JfrReader` to handle `WallClockSample` events transparently, as if there were a single event per sample as before. Nothing should change from the user's perspective; all implementation details should be encapsulated.
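To illustrate what "a single event per sample" means for a consumer, here is a minimal sketch that counts logical samples from a mixed event stream. The `Event` class and its factory methods are a hypothetical stand-in for this example and do not reflect the real `JfrReader` API.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical event model for illustration; not the JfrReader API.
final class WallClockExpansion {
    static final class Event {
        final String type;
        final long threadId;
        final int samples; // only meaningful for profiler.WallClockSample

        private Event(String type, long threadId, int samples) {
            this.type = type;
            this.threadId = threadId;
            this.samples = samples;
        }

        static Event executionSample(long threadId) {
            return new Event("jdk.ExecutionSample", threadId, 1);
        }

        static Event wallClockSample(long threadId, int samples) {
            return new Event("profiler.WallClockSample", threadId, samples);
        }
    }

    // Counts samples as if every skipped sample had been recorded individually.
    static long logicalSamples(List<Event> events) {
        long total = 0;
        for (Event e : events) {
            total += e.type.equals("profiler.WallClockSample") ? e.samples : 1;
        }
        return total;
    }

    public static void main(String[] args) {
        List<Event> events = Arrays.asList(
                Event.executionSample(7L),
                Event.wallClockSample(7L, 499));
        System.out.println(logicalSamples(events));
    }
}
```

With the example from the proposal, one `ExecutionSample` plus one `WallClockSample` with `samples=499` adds up to the same 500 samples that would have been recorded without batching.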

If the previous behavior is desired for some reason, the wall clock optimization can be disabled by adding the `nobatch` profiler argument.

### Acknowledgements

- [X] I may be able to implement this feature request
- [X] This feature might incur a breaking change
