Reduce resonance between methods in CPU smoke test by Baraa-Hasheesh · Pull Request #1465 · async-profiler/async-profiler · GitHub

Conversation

@Baraa-Hasheesh
Contributor

Description

It was observed that in certain runs either method1 or method2 may not be sampled at all. When checking the logs, the sample distribution was 50% on method3 and the remaining 50% on either method1 or method2, but not both.

This change makes the loop counter for each method unique to avoid possible synchronization between them and method3.
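
Roughly, the shape of the change looks like the sketch below; only method1's new count (314159) is taken from the actual diff, while method2's count and the loop bodies are illustrative:

```java
public class CpuSmokeSketch {
    private static volatile long sink;   // keep the loops from being optimized away

    private static void method1() {
        long s = 0;
        for (int i = 0; i < 314159; i++) {   // previously a round 1000000; now a distinct non-round count
            s += i;
        }
        sink = s;
    }

    private static void method2() {
        long s = 0;
        for (int i = 0; i < 271828; i++) {   // illustrative: a different non-round count than method1's
            s += i;
        }
        sink = s;
    }

    public static void main(String[] args) {
        for (int i = 0; i < 1000; i++) {
            method1();
            method2();
            // method3() keeps its fixed minimum run time (10ms)
        }
    }
}
```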

Related issues

N/A

Motivation and context

Reduce the possibility of timing coordination between the methods.

How has this been tested?

The test was re-run hundreds of times on various systems, including the ones where the failure was observed.


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.


private static void method1() {
-   for (int i = 0; i < 1000000; i++) {
+   for (int i = 0; i < 314159; i++) {
Contributor


Why does this number improve the likelihood of getting samples in all methods?

If there's a resonance problem, how about we let all three run in parallel?

Contributor Author


It reduces possible resonance, hence reducing the possibility of all 3 methods synchronizing and increasing the probability of sampling all of them.

As discussed with @apangin internally, that would defeat the intention of the test, which is that hot methods should be sampled even if the duration of each call is less than the sampling interval.

Running all 3 in parallel would, in practice, give them a higher run time than the sampling interval and completely change the nature of the test from a single-threaded app to a multi-threaded one.

Contributor

@fandreuz Sep 8, 2025


Could you explain how you choose the numbers?

Contributor


Discussed in person, thanks

@apangin
Member

apangin commented Sep 8, 2025

What about method3? Should its duration also be changed? (maybe 3 or 7 milliseconds)

@Baraa-Hasheesh
Contributor Author

What about method3? Should its duration also be changed? (maybe 3 or 7 milliseconds)

Sure, let me re-run the tests with the new duration to make sure everything is working as expected.

@Baraa-Hasheesh
Contributor Author

What about method3? Should its duration also be changed? (maybe 3 or 7 milliseconds)

@apangin neither 7 nor 3 ms is stable with the current iteration counts; I would recommend keeping it at 10.

@apangin
Member

apangin commented Sep 8, 2025

What does it mean that it's not stable? And how does a 10ms duration make it "more stable", given that it correlates with the sampling interval?

@Baraa-Hasheesh
Contributor Author

What does it mean that it's not stable? And how does a 10ms duration make it "more stable", given that it correlates with the sampling interval?

@apangin I test the stability of a given option by simply re-running the same test hundreds of times, then measuring how likely it is to fail.

In the case of 7ms and 3ms the failure rate was quite high; neither managed to go a full 100 runs without failing.

As for why 10ms is more stable, answering that would require a full analysis of how the CPU executes the Java bytecode and some models to predict when samples would happen, and I honestly didn't see the value of going through that given that the current numbers were observed to be quite stable on multiple systems.

@krk
Contributor

krk commented Sep 8, 2025

Are these values tuned to a specific CPU type/size?

@Baraa-Hasheesh
Contributor Author

Are these values tuned to a specific CPU type/size?

I ran these changes across multiple CPU types and sizes and didn't observe any instabilities on the configurations I ran against.

@apangin
Member

apangin commented Sep 9, 2025

neither 7 nor 3 ms is stable with the current iteration counts; I would recommend keeping it at 10

This is contrary to the declared intention of this PR, which is supposed to reduce correlation with the sampling interval. You are saying that the test is more stable when method3 runs for 10ms, which is exactly the sampling interval. This sounds counterintuitive.

I don't mind skipping time consuming analysis if test behavior matches expectation, but here I see the opposite. I've always said that the purpose of a test is not to "pass", but to validate assumptions. My assumption here is, if a test runs 3 methods for a comparable amount of time each, all 3 methods should be visible in a profile. Where am I wrong? What makes you sure that it's a test issue rather than a profiler bug or a JVM bug?

@Baraa-Hasheesh
Contributor Author

This is contrary to the declared intention of this PR, which is supposed to reduce correlation with the sampling interval

The intention here is to reduce possible synchronization of all 3 methods together, as a single unit, with the sampling interval, not to approach each method individually.

I would describe it as reducing the chance that all 3 methods coordinate their run times in a way that actively prevents one of them from being sampled by the profiler.

My assumption here is, if a test runs 3 methods for a comparable amount of time each, all 3 methods should be visible in a profile

This is a fair assumption, but it somewhat goes against what the test actually runs, as the methods do not strictly run for comparable times in a controlled manner.

method1 and method2 can take different times to complete depending on the system they run on (CPU resources, JDK version and optimizations), while method3 is written with a strict minimum run time (currently 10ms).

For example, let's assume we change method3 to run for 3ms and method1/method2 have a combined run time of 7ms on a given system (total run time will be 3 + 7 = 10ms = sampling interval). We would end up with all 3 methods (as a single unit) synchronized against the 10ms sampling interval, which might cause sampling issues.
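
To make the arithmetic concrete, here is a minimal simulation of that scenario (the 7ms/3ms split and the 10ms interval are just the numbers from the example above, not measurements):

```java
// Simulate a strictly periodic workload: each cycle spends 7ms in method1+method2
// and 3ms in method3, while the profiler takes a sample every 10ms.
public class ResonanceSketch {
    public static void main(String[] args) {
        final long cycleMs = 10;            // 7ms (method1+method2) + 3ms (method3)
        final long samplingIntervalMs = 10; // equals the cycle length -> resonance
        int samplesInMethod12 = 0, samplesInMethod3 = 0;

        for (long t = 0; t < 10_000; t += samplingIntervalMs) {
            long phase = t % cycleMs;       // where inside the cycle the sample lands
            if (phase < 7) {
                samplesInMethod12++;
            } else {
                samplesInMethod3++;
            }
        }
        // Because the cycle length equals the sampling interval, every sample lands
        // at the same phase: one bucket gets all the samples, the other gets none.
        System.out.println("method1+2: " + samplesInMethod12 + ", method3: " + samplesInMethod3);
    }
}
```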

What makes you sure that it's a test issue rather than a profiler bug or a JVM bug

If it were a JVM bug, it would have been visible in the generated bytecode (for example, one method being completely optimized out).
As for it being an AGCT bug, that's quite unlikely.

Technically speaking, it's a limitation of the profiler due to the realities of how sampling is collected.
=> The profiler collects samples via a time-based approach (for CPU samples), so it's possible for that approach to produce biased sampling results for strictly periodic/consistent workloads like the one in this test.

While this is indeed a limitation of the profiler, I wouldn't personally consider it a bug per se.


Now, regarding this test:
It does expose a real limitation of the profiler when profiling strictly periodic/consistent workloads.
This issue is real and can happen with all profiling engines/modes of the profiler.
=> For example, nativemem tracing with an interval of 10000 bytes, and two methods running back to back in a loop with methodA allocating 9999 bytes and methodB allocating 1 byte: methodA will never be sampled.
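
A minimal sketch of that nativemem example (the methodA/methodB names and byte counts are the hypothetical numbers above, and the threshold logic is a simplified model of interval-based sampling, not real async-profiler code):

```java
// Model an interval-based allocation sampler: a sample is attributed to whichever
// method's allocation pushes the cumulative byte count across the next threshold.
public class AllocationBiasSketch {
    public static void main(String[] args) {
        final long interval = 10_000;       // sample every 10000 allocated bytes
        long allocated = 0, nextSample = interval;
        int samplesInA = 0, samplesInB = 0;

        for (int i = 0; i < 1_000; i++) {
            allocated += 9_999;             // methodA allocates 9999 bytes
            if (allocated >= nextSample) { samplesInA++; nextSample += interval; }

            allocated += 1;                 // methodB allocates 1 byte
            if (allocated >= nextSample) { samplesInB++; nextSample += interval; }
        }
        // After methodA the counter is always exactly one byte short of the threshold,
        // so methodB's single byte crosses it every time: methodA gets zero samples.
        System.out.println("methodA: " + samplesInA + ", methodB: " + samplesInB);
    }
}
```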

If this is to be considered a bug/limitation that needs to be fixed, that would need to be discussed in depth, and I don't believe this PR is the proper place for that discussion.

For this test in particular, to reduce the chance of failures caused by biased sampling of periodic/consistent workloads, we should rewrite the test to no longer be strictly periodic/consistent.
In other words, introduce some randomness to shift the run time of each method at each iteration (roughly sketched below).
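
Something along these lines, just as a sketch of the idea (the bounds and loop body are illustrative, not the actual test code):

```java
import java.util.concurrent.ThreadLocalRandom;

// Sketch: jitter the per-call work so consecutive iterations don't stay aligned
// with the sampling interval.
public class JitteredSmokeTestSketch {
    private static volatile long sink;      // keep the busy work from being optimized away

    private static void method1() {
        // vary the iteration count by up to ~10% on every call
        int iterations = 300_000 + ThreadLocalRandom.current().nextInt(30_000);
        long s = 0;
        for (int i = 0; i < iterations; i++) {
            s += i;                         // placeholder busy work
        }
        sink = s;
    }

    public static void main(String[] args) {
        for (int i = 0; i < 1_000; i++) {
            method1();
            // method2() and method3() would get the same treatment
        }
    }
}
```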

IMO that would make it a better smoke test for async-profiler, as most real applications don't have strictly periodic/consistent loads, and it would reduce the possibility of time bias for the whole workload as a single unit.

What are your thoughts about this?

@Baraa-Hasheesh Baraa-Hasheesh marked this pull request as draft September 10, 2025 16:10
@Baraa-Hasheesh Baraa-Hasheesh marked this pull request as ready for review September 17, 2025 16:19
Signed-off-by: Bara' Hasheesh <bara.hasheesh@gmail.com>
@apangin apangin merged commit 07b3e74 into async-profiler:master Sep 17, 2025
20 checks passed
krk pushed a commit to krk/async-profiler that referenced this pull request Sep 22, 2025