-
Notifications
You must be signed in to change notification settings - Fork 937
Wall interference #1417
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Wall interference #1417
Conversation
src/wallClock.cpp
Outdated
|
||
void WallClock::uninterruptibleSleep(u64 nanos) { | ||
struct timespec ts = {(time_t)(nanos / 1000000000), (long)(nanos % 1000000000)}; | ||
while (nanosleep(&ts, &ts) < 0 && errno == EINTR && _running); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you read BUGS section of the nanosleep
man page? I think it describes exactly this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to confirm you mean this bug right?
If a program that catches signals and uses nanosleep() receives
signals at a very high rate, then scheduling delays and rounding
errors in the kernel's calculation of the sleep interval and the
returned remain value mean that the remain value may steadily
increase on successive restarts of the nanosleep() call. To avoid
such problems, use clock_nanosleep(2) with the TIMER_ABSTIME flag
to sleep to an absolute deadline.
-> As I don't believe we should care about the Linux 2.4 problem
Regarding the "very high rate" it sounds pretty vague & I don't know if async-profiler maximum rate is considered "very high" (async-profiler has a minimum sleep that basically caps the max rate)
But it does make sense to change the logic for external actors other than the profiler
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I also don't know how high the rate should be to see the effect of the bug - maybe it's not an issue at all, but at least it's worth checking.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did find that using nanosleep
could potentially lead to an infinite loop, as for some reason the remaining time is reported as a value higher than the originally requested sleep time when the interruption happens too quickly (Yup 🤦)
=> I noticed the issue happens when the interruption happens in periods less than 60 micro seconds
=> Please note if you try to make the sleep uninterruptible while (nanosleep(&ts, &ts) < 0 && errno == EINTR)
, the remainining time could potentially keep increasing if the interruption keep happening too frequently
I would vote to change to the clock_nanosleep
function as the documentation suggests, as the TIMER_ABSTIME
has build in protection against such possible cases
If flags is TIMER_ABSTIME, then t is interpreted as an absolute
time as measured by the clock, clockid. If t is less than or
equal to the current value of the clock, then clock_nanosleep()
returns immediately without suspending the calling thread.
=> I will check what's the macOS equivalent/needed changes
I did my testing using the following code
#include <cstdio>
#include <time.h>
#include <pthread.h>
#include <unistd.h>
#include <signal.h>
typedef unsigned long long u64;
void sigio_handler(int signum) {
}
typedef struct {
pthread_t thread_id;
u64 time;
} thread_args_t;
u64 nanotime() {
struct timespec ts;
clock_gettime(CLOCK_MONOTONIC, &ts);
return (u64)ts.tv_sec * 1000000000 + ts.tv_nsec;
}
void* sleepyThread(void *args) {
while (true) {
u64 start = nanotime();
struct timespec ts = {1, 0};
nanosleep(&ts, &ts);
u64 sleptTime = nanotime() - start;
u64 remaining = (ts.tv_sec * 1000000000ULL) + ts.tv_nsec;
fprintf(stderr, "Slept for %llu, Remaining = %llu\n", sleptTime, remaining);
}
}
void* alarmThread(void* args) {
thread_args_t thread_args = *(thread_args_t*)args;
u64 time = thread_args.time;
u64 start = nanotime();
while (true) {
if (nanotime() - start >= time) {
start = nanotime();
pthread_kill(thread_args.thread_id, SIGIO);
}
}
}
int main() {
signal(SIGIO, sigio_handler);
pthread_t thread1;
pthread_create(&thread1, NULL, sleepyThread, NULL);
thread_args_t args;
args.thread_id = thread1;
args.time = 1000 * 60; // Change to adjust how frequently we signal the thread
pthread_t thread2;
pthread_create(&thread2, NULL, alarmThread, &args);
pthread_join(thread1, NULL);
pthread_join(thread2, NULL);
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The nanosleep issue doesn't seem to really effect macOS from my testing
The Apple documentation doesn't reference similar issues https://developer.apple.com/library/archive/documentation/System/Conceptual/ManPages_iPhoneOS/man2/nanosleep.2.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated here ba1f5c5
src/j9WallClock.cpp
Outdated
} | ||
|
||
void J9WallClock::uninterruptibleSleep(u64 nanos) const { | ||
#ifdef __APPLE__ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@apangin I did play around with creating this function in the os
files
But decided against it for 2 main reasons:
-
I want to avoid creating an uninterruptible sleep in a common location as that looks like a possible function that could be miss used in future developments
-
As mentioned previously we should check the
_running
flag, while it's possible to pass to a theoretical common function I didn't like any of the implementations I thought of (lambda, pass by reference)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see two issues with your current implementation:
- non-trivial piece of code is copy-pasted verbatim across two places;
- platform-specific #ifdefs in shared code.
Individually they are not too terrible, but overall solution becomes not very nice.
I agree that passing a pointer to _running
flag is not ideal, but IMO it's better than the current approach.
I'd propose to move uninterruptibleSleep
to os_<platform>.cpp
with a pointer to the running flag as an argument. In the future, if we need uninterruptibleSleep
somewhere else, we can generify this with a function argument, but until then, let's keep it simple.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated here 1c9f6c7
Signed-off-by: Andrei Pangin <1749416+apangin@users.noreply.github.com>
Description
If an external actor interrupts the
sleep
inside thewall
engine that would result in incorrect sampling frequency,This can be easily observed by starting two different profilers in wall mode or simply manually signaling the process to interrupt the sleep
This is a simple change to make the sleep uninterruptible,
I've avoided declaring this as a common function to allow interruption when the profiler is stopped when
stop
is called (need to check _running)Related issues
N/A
Motivation and context
avoid changing the sampling frequency when an external actor interrupts the profiler sleep
How has this been tested?
manual testing
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.