Profile CPU + Wall clock together · Issue #740 · async-profiler/async-profiler · GitHub
Profile CPU + Wall clock together #740

@apangin

Background

There are two commonly used profiling modes: cpu, which samples actively running threads, and wall, which collects stack traces of all threads, whether they are running, sleeping or blocked. cpu mode works best for highlighting hot spots, while wall is useful for analyzing latency issues.

Wall clock mode can replace CPU profiling to some extent. With jfr output, async-profiler records the thread state for each execution sample: STATE_RUNNABLE or STATE_SLEEPING. If we keep only the STATE_RUNNABLE samples, we get a CPU-like profile. This can be done with the --cpu option of the jfr2flame converter. However, this approach has several downsides:

  • kernel stack traces are not available in wall clock mode;
  • wall mode generally has higher overhead than cpu;
  • this approach is less accurate than perf_events-based sampling.
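
The filtering idea described above can be illustrated with a minimal sketch. It does not read a real JFR file: the `samples` list and the `cpu_like_profile` helper are hypothetical stand-ins for execution samples that carry a thread state, and the output is a simple count per collapsed stack.

```python
from collections import Counter

# Hypothetical in-memory samples: (thread state, stack frames from root to leaf).
samples = [
    ("STATE_RUNNABLE", ("main", "handle", "parse")),
    ("STATE_SLEEPING", ("main", "handle", "Thread.sleep")),
    ("STATE_RUNNABLE", ("main", "handle", "parse")),
    ("STATE_SLEEPING", ("worker", "Object.wait")),
    ("STATE_RUNNABLE", ("worker", "compress")),
]


def cpu_like_profile(samples):
    """Keep only runnable samples and count identical stacks."""
    return Counter(
        ";".join(stack) for state, stack in samples if state == "STATE_RUNNABLE"
    )


profile = cpu_like_profile(samples)
for stack, count in profile.most_common():
    print(stack, count)
```

Sleeping and waiting stacks drop out of the result, which is why the filtered profile resembles a CPU profile even though it was collected by the wall clock engine.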

Proposal

This is a proposal to make joint cpu and wall profiling possible. Async-profiler already supports profiling multiple events together (cpu+alloc+lock) if the output format is set to jfr. We will add an option to include wall in the set of profiled events. The interval of wall clock profiling can be set independently of the cpu interval, e.g.

-e cpu -i 10ms --wall 200ms -f out.jfr

Agent arguments equivalent:

event=cpu,interval=10ms,wall=200ms,file=out.jfr
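
To make the mapping between the two forms concrete, here is a small sketch that assembles the agent argument string from the same four settings. The `to_agent_args` helper is hypothetical and only illustrates the comma-separated `key=value` layout shown above; it is not part of async-profiler.

```python
def to_agent_args(event, interval, wall, file):
    """Join profiler settings into a comma-separated key=value string."""
    options = {"event": event, "interval": interval, "wall": wall, "file": file}
    return ",".join(f"{key}={value}" for key, value in options.items())


agent_args = to_agent_args("cpu", "10ms", "200ms", "out.jfr")
print(agent_args)
```

A string of this shape is what would follow the agent library path when the profiler is attached at JVM startup.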

Joint output

cpu and wall can be profiled together only with jfr output.

Both modes produce events of the same jdk.ExecutionSample type. Events coming from wall profiling have STATE_RUNNABLE or STATE_SLEEPING thread state. Events from other modes now have STATE_DEFAULT thread state.

The jfr2flame converter no longer has a --cpu option. It has been replaced with a --state option that accepts a comma-separated list of thread states to include. E.g.

  • --state default extracts execution samples produced by cpu (perf) engine;
  • --state runnable,sleeping shows only wall clock samples.
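
The semantics of the two examples above can be sketched as a simple filter. This is an illustration under assumptions, not the converter's code: `STATE_NAMES`, `parse_state_option`, `filter_by_state` and the sample records are all hypothetical, and the mapping from option values to state constants is inferred from the names used in this proposal.

```python
# Assumed mapping from --state values to the thread states named in the proposal.
STATE_NAMES = {
    "default": "STATE_DEFAULT",
    "runnable": "STATE_RUNNABLE",
    "sleeping": "STATE_SLEEPING",
}


def parse_state_option(value):
    """Turn a comma-separated --state value into a set of thread states."""
    return {STATE_NAMES[name.strip().lower()] for name in value.split(",")}


def filter_by_state(samples, value):
    """Keep only the samples whose thread state was requested."""
    wanted = parse_state_option(value)
    return [sample for sample in samples if sample["state"] in wanted]


# Hypothetical execution samples from a joint cpu + wall recording.
samples = [
    {"state": "STATE_DEFAULT", "top_frame": "parse"},
    {"state": "STATE_RUNNABLE", "top_frame": "parse"},
    {"state": "STATE_SLEEPING", "top_frame": "Object.wait"},
    {"state": "STATE_DEFAULT", "top_frame": "compress"},
]

cpu_samples = filter_by_state(samples, "default")
wall_samples = filter_by_state(samples, "runnable,sleeping")
print(len(cpu_samples), len(wall_samples))
```

Because both engines emit the same event type, the thread state is the only field that tells the two sample streams apart in this model.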

Notes

  1. The new feature takes effect on Linux only. The CPU profiling engine on macOS is already based on the wall clock mode, so there is no additional gain compared to the existing -e wall option.
  2. Enabling wall profiling in addition to cpu results in higher overhead. It makes sense to increase the wall interval when profiling both events together.
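
A back-of-the-envelope model shows why a longer wall interval helps. It assumes that cpu mode samples only busy threads and that wall mode samples every thread once per interval; the real profiler may batch or limit wall samples, so treat the numbers as rough upper bounds. The thread counts and the `samples_per_second` helper are hypothetical.

```python
def samples_per_second(threads, interval_ms):
    """Samples per second if each thread is sampled once per interval."""
    return threads * 1000 / interval_ms


# Hypothetical workload: a few busy threads, many idle ones.
busy_threads, all_threads = 8, 400

cpu_rate = samples_per_second(busy_threads, 10)        # cpu mode at 10ms
wall_rate_10ms = samples_per_second(all_threads, 10)   # wall mode at 10ms
wall_rate_200ms = samples_per_second(all_threads, 200) # wall mode at 200ms

print(cpu_rate, wall_rate_10ms, wall_rate_200ms)
```

Under these assumptions, wall sampling at the cpu interval would produce 50 times as many samples as cpu sampling, while a 200ms wall interval brings the two rates to the same order of magnitude.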
