Background
There are two commonly used profiling modes: cpu, which samples actively running threads, and wall, which collects stack traces of all threads, whether they are running, sleeping or blocked. cpu mode works best for highlighting hot spots, while wall mode can be useful for analyzing latency issues.
Wall clock mode can replace CPU profiling to some extent. With jfr output, async-profiler records the thread state for each execution sample: STATE_RUNNABLE or STATE_SLEEPING. If we keep only the STATE_RUNNABLE samples, we get a CPU-like profile. This can be achieved with the --cpu option of the jfr2flame converter. However, this approach has several downsides:
- kernel stack traces are not available in wall clock mode;
- wall mode generally has higher overhead than cpu;
- this approach is less accurate compared to perf_events based sampling.
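The filtering idea behind this approach can be illustrated with a minimal sketch. This is not the jfr2flame implementation: the Sample record and the sample data are hypothetical, and only the two state names come from the description above.

```python
from collections import Counter
from dataclasses import dataclass

# Thread states recorded for each wall clock execution sample (names from the text above).
STATE_RUNNABLE = "STATE_RUNNABLE"
STATE_SLEEPING = "STATE_SLEEPING"


@dataclass(frozen=True)
class Sample:
    """Hypothetical stand-in for one execution sample read from a JFR file."""
    top_frame: str
    state: str


def cpu_like_profile(samples):
    """Count samples per top frame, keeping only samples taken in the runnable state."""
    return Counter(s.top_frame for s in samples if s.state == STATE_RUNNABLE)


# Made-up data for illustration only.
samples = [
    Sample("Parser.parse", STATE_RUNNABLE),
    Sample("Object.wait", STATE_SLEEPING),
    Sample("Parser.parse", STATE_RUNNABLE),
    Sample("Socket.read", STATE_SLEEPING),
    Sample("Hash.update", STATE_RUNNABLE),
]

profile = cpu_like_profile(samples)
print(profile.most_common())
```

Sleeping samples drop out entirely, so frames such as Object.wait never appear in the resulting profile. The result is only an approximation of a CPU profile, for the reasons listed above.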
Proposal
This is a proposal to make joint cpu and wall profiling possible. Async-profiler already supports profiling multiple events together (cpu+alloc+lock), if the output format is set to jfr. We will add an option to include wall in the set of profiled events. The interval of wall clock profiling can be set independently of cpu interval, e.g.
-e cpu -i 10ms --wall 200ms -f out.jfr
Equivalent agent arguments:
event=cpu,interval=10ms,wall=200ms,file=out.jfr
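The correspondence between the two forms can be sketched with a small helper. The function to_agent_args is hypothetical and is not part of async-profiler; it covers only the four options that appear in the example above, including the proposed --wall option.

```python
# Mapping from command-line options to agent argument names,
# taken from the example above. Other options are out of scope for this sketch.
OPTION_TO_AGENT_ARG = {
    "-e": "event",
    "-i": "interval",
    "--wall": "wall",
    "-f": "file",
}


def to_agent_args(cli_args):
    """Convert a flat list such as ['-e', 'cpu', ...] into 'event=cpu,...'."""
    if len(cli_args) % 2 != 0:
        raise ValueError("expected option/value pairs")
    parts = []
    # Walk the list two items at a time: option, then its value.
    for option, value in zip(cli_args[::2], cli_args[1::2]):
        if option not in OPTION_TO_AGENT_ARG:
            raise ValueError(f"unsupported option: {option}")
        parts.append(f"{OPTION_TO_AGENT_ARG[option]}={value}")
    return ",".join(parts)


agent_args = to_agent_args("-e cpu -i 10ms --wall 200ms -f out.jfr".split())
print(agent_args)  # event=cpu,interval=10ms,wall=200ms,file=out.jfr
```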
Joint output
cpu and wall can be profiled together only with jfr output.
Both modes produce events of the same jdk.ExecutionSample type. Events coming from wall profiling have STATE_RUNNABLE or STATE_SLEEPING thread state. Events from other modes now have STATE_DEFAULT thread state.
The jfr2flame converter no longer has a --cpu option. It has been replaced with the --state option, which accepts a comma-separated list of thread states to keep. E.g.
- --state default extracts execution samples produced by the cpu (perf) engine;
- --state runnable,sleeping shows only wall clock samples.
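The semantics of such a state filter can be sketched as follows. This is an illustration under assumptions, not the converter's code: the functions parse_state_option and filter_samples, the tuple layout of a sample, and the sample data are all hypothetical. Only the state names come from the text above.

```python
# Option values accepted by this sketch, mapped to the thread states described above.
STATE_BY_NAME = {
    "default": "STATE_DEFAULT",
    "runnable": "STATE_RUNNABLE",
    "sleeping": "STATE_SLEEPING",
}


def parse_state_option(value):
    """Turn 'runnable,sleeping' into the set of thread states to keep."""
    names = [name.strip().lower() for name in value.split(",") if name.strip()]
    unknown = [name for name in names if name not in STATE_BY_NAME]
    if unknown:
        raise ValueError(f"unknown state(s): {', '.join(unknown)}")
    return {STATE_BY_NAME[name] for name in names}


def filter_samples(samples, state_option):
    """Keep only (top_frame, state) samples whose state was requested."""
    wanted = parse_state_option(state_option)
    return [sample for sample in samples if sample[1] in wanted]


# Made-up mix of samples from the cpu engine and the wall clock engine.
samples = [
    ("A.run", "STATE_DEFAULT"),
    ("B.wait", "STATE_SLEEPING"),
    ("C.loop", "STATE_RUNNABLE"),
    ("A.run", "STATE_DEFAULT"),
]

cpu_samples = filter_samples(samples, "default")
wall_samples = filter_samples(samples, "runnable,sleeping")
print(len(cpu_samples), len(wall_samples))
```

Because both engines emit the same event type, the thread state is the only thing that tells the two kinds of samples apart in this model.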
Notes
- The new feature has an effect on Linux only. The CPU profiling engine on macOS is already based on the wall clock mode, so there is no additional gain compared to the existing -e wall option.
- Enabling wall profiling in addition to cpu results in higher overhead. It makes sense to increase the wall interval when profiling both events together.