-
Notifications
You must be signed in to change notification settings - Fork 937
Add cpu sampling #1286
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add cpu sampling #1286
Conversation
src/frameName.cpp
Outdated
case BCI_CPU: { | ||
int cpu = (int)(uintptr_t)frame.method_id; | ||
char buf[32]; | ||
snprintf(buf, sizeof(buf), "%d", cpu-1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
an alternative idea I had was to make the line in the beginning of this function (Instead of having the +1 offset)
if (frame.method_id == NULL) {
be
if(frrame.bci != BCI_CPU && frame.method_id == NULL)
so it will skip this if method_id is zero, not familiar with the codebase though so not sure if that would have a bad side effect?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd set some bit instead - to make sure method_id
is never NULL and to distinguish it from BCI_THREAD, which uses the same encoding.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Like just set the left most bit then unset it when we are reading it? that works
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hm what if I shift left by one and set, then shift right when reading? don't have to make assumptions on the size of a pointer on the system
src/arguments.cpp
Outdated
CASE("inverted") | ||
_inverted = true; | ||
|
||
CASE("record-cpu") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a section for FlameGraph options as the comment above states.
I'd put new option next to sched
, which is kind of similar behavior-wise.
src/frameName.cpp
Outdated
case BCI_CPU: { | ||
int cpu = (int)(uintptr_t)frame.method_id; | ||
char buf[32]; | ||
snprintf(buf, sizeof(buf), "%d", cpu-1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd set some bit instead - to make sure method_id
is never NULL and to distinguish it from BCI_THREAD, which uses the same encoding.
src/frameName.cpp
Outdated
int cpu = (int)(uintptr_t)frame.method_id; | ||
char buf[32]; | ||
snprintf(buf, sizeof(buf), "%d", cpu-1); | ||
return _str.assign("[cpu").append(buf).append("]").c_str(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why building string part-by-part, if you can format the whole thing in the above snprintf
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
originally I was using std::to_string() and appending, will change it to snprintf
src/perfEvents_linux.cpp
Outdated
#ifdef PERF_ATTR_SIZE_VER5 | ||
if (_cstack == CSTACK_LBR) { | ||
attr.sample_type |= PERF_SAMPLE_BRANCH_STACK | PERF_SAMPLE_REGS_USER; | ||
attr.sample_type |= PERF_SAMPLE_BRANCH_STACK | PERF_SAMPLE_REGS_USER | PERF_SAMPLE_CPU; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you checked this actually works? It seems that the field is always added, but the reader only parses it when _record_cpu
is set.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah this is from when i was hacking it together not sure why I left it in (it should only be enabled in the if statement block below
src/profiler.cpp
Outdated
// +1 is to prevent the method_id from being read as a null pointer | ||
// when cpu = 0 | ||
num_frames += makeFrame(frames + num_frames, BCI_CPU, java_ctx.cpu + 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please fix indentation.
const void* pc; | ||
uintptr_t sp; | ||
uintptr_t fp; | ||
u64 cpu; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did you change it to 64-bit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ring.next() returns u64 so i thought it was nicer to just keep it as the type that was originally returned. Though I don't think it makes much of a difference
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, thanks for the explanation
src/stackWalker.h
Outdated
this->cpu = 0; | ||
} | ||
|
||
void set(const void* pc, uintptr_t sp, uintptr_t fp, u64 cpu) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where is this function used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not, since it seems like we have a function to set all fields in that class I thought it would make sense to add another function with the option of passing CPU for completion sake/future usage. Can remove if you feel its not needed
src/perfEvents_linux.cpp
Outdated
u64 cpu = ring.next(); | ||
java_ctx->cpu = cpu; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use 4 spaces indent.
IMO, local variable is redundant here, just set java_ctx->cpu = ring.next();
src/profiler.cpp
Outdated
num_frames += makeFrame(frames + num_frames, BCI_ERROR, OS::schedPolicy(0)); | ||
} | ||
if (_add_cpu_frame) { | ||
num_frames += makeFrame(frames + num_frames, BCI_CPU, (java_ctx.cpu << 1) | 0x1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought of setting a higher bit instead. In this case, there will be no need for shift, and frame values for BCI_CPU will not clash with values for BCI_THREAD.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I liked this approach as it prevents as errors from occurring from mistakes in casting, changes in intermediary datatypes and different pointer/int sizes on various platforms (i.e one of them being smaller than whatever nth bit we set) in practice just setting the 16th bit would probably have the same effect so I can do that
Im curious what you mean by "clashing". they are differentiated by the bci enum, but is there some problem with them being the same value in the method_id pointer? maybe I'm missing something.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The same value for method_id can be a problem as long as MethodMap
uses jmethodID
as a key.
_add_event_frame = args._output != OUTPUT_JFR; | ||
_add_thread_frame = args._threads && args._output != OUTPUT_JFR; | ||
_add_sched_frame = args._sched; | ||
_add_cpu_frame = args._record_cpu; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since record_cpu
is only supported with PerfEvents, do you want to return an error when record_cpu
is requested for any other engine?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah that sounds like a good idea
src/stackWalker.h
Outdated
this->pc = pc; | ||
this->sp = sp; | ||
this->fp = fp; | ||
this->cpu = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to overwrite cpu
value set earlier. Did you test different stack walking modes (fp, dwarf, vm, default)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hm yeah default,vm,vmx work but fp, dwarf do not, think i'll just remove setting it as 0
src/arguments.cpp
Outdated
// filter=FILTER - thread filter | ||
// threads - profile different threads separately | ||
// sched - group threads by scheduling policy | ||
// record-cpu - record which cpu a sample was taken on (default: false) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mind updating ProfilingOptions.md
doc as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. Can you also add a sanity test for --record-cpu
? Ideally, all new features should be covered by at least one test. Alternatively, you may modify one of existing --target-cpu
tests by adding --record-cpu
to it.
Signed-off-by: JugadK <68133163+JugadK@users.noreply.github.com>
test/test/cpu/CpuTests.java
Outdated
public void perfEventsRecordCpuEventsCount(TestProcess p) throws Exception { | ||
|
||
Output outRightCpu = p.profile("-d 2 -e cpu-clock -i 100ms --total -o collapsed --record-cpu"); | ||
assertCloseTo(outRightCpu.total(), 2_000_000_000, "perf_events total should match profiling duration"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This does not actually test the feature.
I suggest you pin the process to CPU 1 and check that output contains [CPU-1]
, but not [CPU-0]
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah thought you mean't by "sanity test" just something that sees if it does not crash, will do
All good now. Thank you for the contribution! |
Description
Adds a "cpu" virtual frame at the bottom of each stack when one passes in --record-cpu as a flag
Related issues
#1240
Motivation and context
Discussed in the issue, but I want to be able to see which cpu a sample was taken on, this is done through enabling PERF_EVENT_CPU for the perf profile when --record-cpu is passed as a flag. This is read from the ring buffer into StackContext which is made into a virtual frame at the bottom of the stack (Similar to --sched).
How has this been tested?
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.