Wall interference #1417

Baraa-Hasheesh · 2025-07-28T11:07:18Z

Description

If an external actor interrupts the sleep inside the wall engine that would result in incorrect sampling frequency,
This can be easily observed by starting two different profilers in wall mode or simply manually signaling the process to interrupt the sleep

This is a simple change to make the sleep uninterruptible,
I've avoided declaring this as a common function to allow interruption when the profiler is stopped when stop is called (need to check _running)

Related issues

N/A

Motivation and context

avoid changing the sampling frequency when an external actor interrupts the profiler sleep

How has this been tested?

manual testing

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

apangin · 2025-08-11T00:05:28Z

src/wallClock.cpp


+void WallClock::uninterruptibleSleep(u64 nanos) {
+    struct timespec ts = {(time_t)(nanos / 1000000000), (long)(nanos % 1000000000)};
+    while (nanosleep(&ts, &ts) < 0 && errno == EINTR && _running);


Have you read BUGS section of the nanosleep man page? I think it describes exactly this case.

Just to confirm you mean this bug right?

If a program that catches signals and uses nanosleep() receives signals at a very high rate, then scheduling delays and rounding errors in the kernel's calculation of the sleep interval and the returned remain value mean that the remain value may steadily increase on successive restarts of the nanosleep() call. To avoid such problems, use clock_nanosleep(2) with the TIMER_ABSTIME flag to sleep to an absolute deadline.

-> As I don't believe we should care about the Linux 2.4 problem

Regarding the "very high rate" it sounds pretty vague & I don't know if async-profiler maximum rate is considered "very high" (async-profiler has a minimum sleep that basically caps the max rate)

But it does make sense to change the logic for external actors other than the profiler

Yeah, I also don't know how high the rate should be to see the effect of the bug - maybe it's not an issue at all, but at least it's worth checking.

I did find that using nanosleep could potentially lead to an infinite loop, as for some reason the remaining time is reported as a value higher than the originally requested sleep time when the interruption happens too quickly (Yup 🤦)
=> I noticed the issue happens when the interruption happens in periods less than 60 micro seconds
=> Please note if you try to make the sleep uninterruptible while (nanosleep(&ts, &ts) < 0 && errno == EINTR) , the remainining time could potentially keep increasing if the interruption keep happening too frequently

I would vote to change to the clock_nanosleep function as the documentation suggests, as the TIMER_ABSTIME has build in protection against such possible cases

If flags is TIMER_ABSTIME, then t is interpreted as an absolute time as measured by the clock, clockid. If t is less than or equal to the current value of the clock, then clock_nanosleep() returns immediately without suspending the calling thread.

=> I will check what's the macOS equivalent/needed changes

I did my testing using the following code

#include <cstdio> #include <time.h> #include <pthread.h> #include <unistd.h> #include <signal.h> typedef unsigned long long u64; void sigio_handler(int signum) { } typedef struct { pthread_t thread_id; u64 time; } thread_args_t; u64 nanotime() { struct timespec ts; clock_gettime(CLOCK_MONOTONIC, &ts); return (u64)ts.tv_sec * 1000000000 + ts.tv_nsec; } void* sleepyThread(void *args) { while (true) { u64 start = nanotime(); struct timespec ts = {1, 0}; nanosleep(&ts, &ts); u64 sleptTime = nanotime() - start; u64 remaining = (ts.tv_sec * 1000000000ULL) + ts.tv_nsec; fprintf(stderr, "Slept for %llu, Remaining = %llu\n", sleptTime, remaining); } } void* alarmThread(void* args) { thread_args_t thread_args = *(thread_args_t*)args; u64 time = thread_args.time; u64 start = nanotime(); while (true) { if (nanotime() - start >= time) { start = nanotime(); pthread_kill(thread_args.thread_id, SIGIO); } } } int main() { signal(SIGIO, sigio_handler); pthread_t thread1; pthread_create(&thread1, NULL, sleepyThread, NULL); thread_args_t args; args.thread_id = thread1; args.time = 1000 * 60; // Change to adjust how frequently we signal the thread pthread_t thread2; pthread_create(&thread2, NULL, alarmThread, &args); pthread_join(thread1, NULL); pthread_join(thread2, NULL); }

The nanosleep issue doesn't seem to really effect macOS from my testing

The Apple documentation doesn't reference similar issues https://developer.apple.com/library/archive/documentation/System/Conceptual/ManPages_iPhoneOS/man2/nanosleep.2.html

Updated here ba1f5c5

Baraa-Hasheesh · 2025-08-13T07:50:40Z

src/j9WallClock.cpp

 }

+void J9WallClock::uninterruptibleSleep(u64 nanos) const {
+#ifdef __APPLE__


@apangin I did play around with creating this function in the os files
But decided against it for 2 main reasons:

I want to avoid creating an uninterruptible sleep in a common location as that looks like a possible function that could be miss used in future developments

As mentioned previously we should check the _running flag, while it's possible to pass to a theoretical common function I didn't like any of the implementations I thought of (lambda, pass by reference)

I see two issues with your current implementation:

non-trivial piece of code is copy-pasted verbatim across two places;

platform-specific #ifdefs in shared code.

Individually they are not too terrible, but overall solution becomes not very nice.

I agree that passing a pointer to _running flag is not ideal, but IMO it's better than the current approach.
I'd propose to move uninterruptibleSleep to os_<platform>.cpp with a pointer to the running flag as an argument. In the future, if we need uninterruptibleSleep somewhere else, we can generify this with a function argument, but until then, let's keep it simple.

Updated here 1c9f6c7

Signed-off-by: Andrei Pangin <1749416+apangin@users.noreply.github.com>

Baraa-Hasheesh added 2 commits July 28, 2025 09:47

prevent external interrupts from effecting the wall sampling frequency

e5254ff

includes

682452b

Baraa-Hasheesh marked this pull request as draft July 28, 2025 11:54

clang-tidy

0a64589

Baraa-Hasheesh marked this pull request as ready for review July 28, 2025 14:48

apangin reviewed Aug 11, 2025

View reviewed changes

Merge branch 'master' into wall-interference

aed498e

Baraa-Hasheesh marked this pull request as draft August 12, 2025 13:45

change logic to use clock_nanosleep for linux

ba1f5c5

Baraa-Hasheesh commented Aug 13, 2025

View reviewed changes

Baraa-Hasheesh marked this pull request as ready for review August 13, 2025 07:53

Baraa-Hasheesh and others added 4 commits August 13, 2025 13:38

create common function in os files

1c9f6c7

macos

598f7a9

Add comment about nanosleep bug

f6205fa

Signed-off-by: Andrei Pangin <1749416+apangin@users.noreply.github.com>

change condition order to avoid sleep on wakeup

dfeeb5f

apangin merged commit 0e551b0 into async-profiler:master Aug 13, 2025
20 checks passed

fandreuz mentioned this pull request Aug 19, 2025

WallTests.cpuWall fails sometimes #1332

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Wall interference #1417

Wall interference #1417

Uh oh!

Baraa-Hasheesh commented Jul 28, 2025 •

edited

Loading

Uh oh!

apangin Aug 11, 2025

Uh oh!

Baraa-Hasheesh Aug 11, 2025

Uh oh!

apangin Aug 11, 2025

Uh oh!

Baraa-Hasheesh Aug 12, 2025 •

edited

Loading

Uh oh!

Baraa-Hasheesh Aug 12, 2025

Uh oh!

Baraa-Hasheesh Aug 13, 2025

Uh oh!

Baraa-Hasheesh Aug 13, 2025 •

edited

Loading

Uh oh!

apangin Aug 13, 2025 •

edited

Loading

Uh oh!

Baraa-Hasheesh Aug 13, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Wall interference #1417

Wall interference #1417

Uh oh!

Conversation

Baraa-Hasheesh commented Jul 28, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Related issues

Motivation and context

How has this been tested?

Uh oh!

apangin Aug 11, 2025

Choose a reason for hiding this comment

Uh oh!

Baraa-Hasheesh Aug 11, 2025

Choose a reason for hiding this comment

Uh oh!

apangin Aug 11, 2025

Choose a reason for hiding this comment

Uh oh!

Baraa-Hasheesh Aug 12, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Baraa-Hasheesh Aug 12, 2025

Choose a reason for hiding this comment

Uh oh!

Baraa-Hasheesh Aug 13, 2025

Choose a reason for hiding this comment

Uh oh!

Baraa-Hasheesh Aug 13, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

apangin Aug 13, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Baraa-Hasheesh Aug 13, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Baraa-Hasheesh commented Jul 28, 2025 •

edited

Loading

Baraa-Hasheesh Aug 12, 2025 •

edited

Loading

Baraa-Hasheesh Aug 13, 2025 •

edited

Loading

apangin Aug 13, 2025 •

edited

Loading