Background
There are two commonly used profiling modes: cpu, which samples actively running threads, and wall, which captures stack traces of all threads, whether they are running, sleeping, or blocked. cpu mode works best for highlighting hot spots, while wall can be useful for analyzing latency issues.
Wall clock mode can replace CPU profiling to some extent. With jfr output, async-profiler records the thread state for each execution sample: STATE_RUNNABLE or STATE_SLEEPING. If we keep only the STATE_RUNNABLE samples, we get a CPU-like profile. This can be achieved by using the --cpu option in the jfr2flame converter. However, this approach has several downsides:
- kernel stack traces are not available in wall clock mode;
- wall mode generally has higher overhead than cpu;
- this approach is less accurate compared to perf_events based sampling.
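The filtering idea described above can be sketched on a simplified in-memory model. The Sample class and the cpu_like_profile function are hypothetical names used only for illustration; they do not reflect the actual JFR record layout or the jfr2flame implementation.

```python
from collections import Counter
from dataclasses import dataclass


# Hypothetical model of one wall clock sample; not the real JFR layout.
@dataclass(frozen=True)
class Sample:
    state: str    # "STATE_RUNNABLE" or "STATE_SLEEPING"
    stack: tuple  # frames, outermost first


def cpu_like_profile(samples):
    """Count stacks of runnable samples only, dropping sleeping ones."""
    return Counter(s.stack for s in samples if s.state == "STATE_RUNNABLE")


samples = [
    Sample("STATE_RUNNABLE", ("main", "compute")),
    Sample("STATE_SLEEPING", ("main", "sleep")),
    Sample("STATE_RUNNABLE", ("main", "compute")),
]
profile = cpu_like_profile(samples)
```

Only the runnable stacks survive the filter, which is why the result resembles a CPU profile while still inheriting the downsides listed above.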
Proposal
This is a proposal to make joint cpu and wall profiling possible. Async-profiler already supports profiling multiple events together (cpu+alloc+lock) if the output format is set to jfr. We will add an option to include wall in the set of profiled events. The interval of wall clock profiling can be set independently of the cpu interval, e.g.
-e cpu -i 10ms --wall 200ms -f out.jfr
Agent arguments equivalent:
event=cpu,interval=10ms,wall=200ms,file=out.jfr
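The correspondence between the two forms can be sketched as a small conversion helper. The function to_agent_args and its mapping table are hypothetical and cover only the four options shown in this example; this is not part of async-profiler.

```python
# Mapping for the four options used in the example above (illustrative only).
OPTION_TO_AGENT_KEY = {
    "-e": "event",
    "-i": "interval",
    "--wall": "wall",
    "-f": "file",
}


def to_agent_args(cli: str) -> str:
    """Convert 'option value' pairs into a comma-separated agent argument string."""
    tokens = cli.split()
    if len(tokens) % 2 != 0:
        raise ValueError("expected option/value pairs")
    pairs = zip(tokens[0::2], tokens[1::2])
    # An option missing from the table raises KeyError.
    return ",".join(f"{OPTION_TO_AGENT_KEY[opt]}={value}" for opt, value in pairs)


agent_args = to_agent_args("-e cpu -i 10ms --wall 200ms -f out.jfr")
```

The helper assumes that every option takes exactly one value, which holds for the options in this example but not necessarily for others.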
Joint output
cpu and wall can be profiled together only with jfr output.
Both modes produce events of the same jdk.ExecutionSample type. Events coming from wall profiling have the STATE_RUNNABLE or STATE_SLEEPING thread state. Events from other modes now have the STATE_DEFAULT thread state.
There is no more --cpu option in the jfr2flame converter. It has been replaced with the --state option, which accepts a comma-separated list of thread states to select. E.g.
- --state default extracts execution samples produced by the cpu (perf) engine;
- --state runnable,sleeping shows only wall clock samples.
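The selection semantics of --state can be sketched as follows. This is a minimal model, assuming samples are plain dictionaries with a state field; parse_state_option and select_samples are hypothetical names, and the real converter may parse and validate its argument differently.

```python
# Option values mapped to the thread state names described above.
KNOWN_STATES = {
    "default": "STATE_DEFAULT",
    "runnable": "STATE_RUNNABLE",
    "sleeping": "STATE_SLEEPING",
}


def parse_state_option(value: str) -> set:
    """Turn a value such as 'runnable,sleeping' into a set of state names."""
    names = [n.strip().lower() for n in value.split(",") if n.strip()]
    unknown = [n for n in names if n not in KNOWN_STATES]
    if unknown:
        raise ValueError(f"unknown state(s): {unknown}")
    return {KNOWN_STATES[n] for n in names}


def select_samples(samples, value: str):
    """Keep only the samples whose thread state is listed in the option value."""
    wanted = parse_state_option(value)
    return [s for s in samples if s["state"] in wanted]


# Hypothetical samples from a joint cpu + wall recording.
samples = [
    {"state": "STATE_DEFAULT", "top_frame": "compute"},
    {"state": "STATE_RUNNABLE", "top_frame": "compute"},
    {"state": "STATE_SLEEPING", "top_frame": "park"},
]
cpu_samples = select_samples(samples, "default")
wall_samples = select_samples(samples, "runnable,sleeping")
```

Because cpu engine samples carry STATE_DEFAULT, the two selections never overlap, so one recording can yield both profiles.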
Notes
- The new feature has an effect on Linux only. The CPU profiling engine on macOS is already based on wall clock mode, so there is no additional gain compared to the existing -e wall option.
- Enabling wall profiling in addition to cpu results in higher overhead. It makes sense to increase the wall interval when profiling both events together.
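A rough back-of-the-envelope calculation illustrates why a longer wall interval helps. It is an idealized model that assumes every thread is sampled once per wall interval; the real engine may schedule samples differently, and the thread count of 100 is an arbitrary assumption.

```python
def samples_per_second(interval_ms: float, threads: int = 1) -> float:
    """Idealized sample rate: one sample per thread per interval."""
    return threads * 1000.0 / interval_ms


THREADS = 100  # assumed thread count, for illustration only

wall_at_10ms = samples_per_second(10, threads=THREADS)
wall_at_200ms = samples_per_second(200, threads=THREADS)
reduction = wall_at_10ms / wall_at_200ms
```

Under these assumptions, raising the wall interval from 10ms to 200ms cuts the number of wall samples by a factor of 20, while the cpu interval stays unchanged.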