-
Notifications
You must be signed in to change notification settings - Fork 937
add asprof_get_thread_local_data #1169
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add asprof_get_thread_local_data #1169
Conversation
r? @apangin |
49e8c0d
to
ef09cd5
Compare
ef09cd5
to
06e1490
Compare
06e1490
to
d66a089
Compare
src/profiler.cpp
Outdated
|
||
uintptr_t counter = (uintptr_t)pthread_getspecific(sample_counter_key); | ||
counter++; | ||
pthread_setspecific(sample_counter_key, (void*)counter); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If a value has not been previously set (i.e. pthread_getspecific
returned 0
), pthread_setspecific
may allocate. Therefore, calling it in a signal handler is unsafe.
I suggest pthread_setspecific
is called just once per thread at a safe point. It will be used to install a pointer to the variable to be incremented. Besides signal safety, I see additional benefits of this approach:
- A counter will be incremented only for those threads that have "subscribed" to the event by installing a thread-local pointer.
- If an application caches the pointer, it can query the counter efficiently without making an API call. This is especially useful for Java, since it will be possible to read the counter directly in Java without JNI overhead.
- In the future, this can be easily extended to pass additional context information between async-profiler and an appication via the same thread-local.
The question remains, who will call pthread_setspecific
and when.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
turned out you can do it in asprof_get_sample_counter
since the API lets you
Tested the new version locally |
docs/ProfilingNonJavaApplications.md
Outdated
// Gets the thread-local sample counter, which increments (not necessarily by 1) every time a signal handler is run. | ||
DLLEXPORT uint64_t asprof_get_sample_counter(void); | ||
typedef uint64_t (*asprof_get_sample_counter_t)(void); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We discussed this would be an experimental (unstable) API? No need to document it then... or there should be an explicit notice.
src/profiler.cpp
Outdated
} | ||
} | ||
|
||
DLLEXPORT uint64_t asprof_get_sample_counter(void) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Implementation of asprof_
APIs should be in asprof.cpp.
src/profiler.cpp
Outdated
|
||
u64 Profiler::recordSample(void* ucontext, u64 counter, EventType event_type, Event* event) { | ||
atomicInc(_total_samples); | ||
incrementSampleCounter(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm thinking where the right place to increment the counter will be.
The counter is only relevant with JFR output, isn't it? Then recordEvent
(or the places where it is called) seems a better alternative. Or can you imagine a different use case?
Note that some recorded events (e.g. free
calls in native memory profiling mode) bypass Profiler::recordSample
.
src/asprof.h
Outdated
// `asprof_get_sample_counter` on a given thread will of course not reset the counter. | ||
// | ||
// This function is *not* async-signal safe. | ||
DLLEXPORT uint64_t asprof_get_sample_counter(void); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just an idea: what do you think about asprof_get_tls
function instead? It will return a pointer to the entire thread local area, which besides the counter may include other context-related information.
I'll certainly need something like this for Java API.
14e37be
to
23b56b4
Compare
eb61136
to
f21c648
Compare
f21c648
to
717e697
Compare
waiting for review |
src/asprof.cpp
Outdated
// calling `malloc` in weird places. | ||
// | ||
// Using std::atomic instead of "native" std::call_once to ensure signal safety. | ||
static std::atomic<pthread_key_t> _profiler_data_key {(pthread_key_t)-1}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Implementation-wise, your previous revision was cleaner, IMO. Why have you switched to a lazy initialization of pthread_key? This seems to add unnecessary complication to ensure thread safety etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- I couldn't find a good non-lazy place to initialize it
- people might call asprof_get_thread_local_data concurrently with initializing async-profiler, I want that to work.
src/asprof.cpp
Outdated
|
||
// Initialize the *global* profiler data key. | ||
static pthread_key_t asprof_init_global_profiler_data_key() { | ||
const std::lock_guard<std::mutex> lock(_init_mutex); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Async-profiler has its own Mutex
and MutexLocker
classes. I'm generally cautious in adopting more libstdc++ features: libstdc++ caused many issues in the past, so I even wanted to get rid of it entirely, see #872.
src/asprof.h
Outdated
// This function can return NULL in case of an allocation failure. | ||
// | ||
// This function is *not* async-signal safe. | ||
DLLEXPORT asprof_unstable_thread_local_data *asprof_unstable_get_thread_local_data(void); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You've explicitly documented the function as unstable in the code and in the docs, which is totally fine; adding unstable
in the name is going to be too much ) The name is already too long.
src/asprofPrivate.h
Outdated
|
||
#include "asprof.h" | ||
|
||
// Private functions for the native API. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This "private" header looks somewhat artificial design-wise. There is no real division into public/private headers in the sources.
It's true that asprof.h
lists publicly exported functions, and it's OK for other parts of the profiler to use these functions, but if some source needs to call an internal function, then this function has likely nothing to do with asprof*
API.
In your particular case, FlightRecorder
is the only caller of asprofIncrementThreadLocalSampleCounter
, so why can't this function be a part of FlightRecorder
then? Or perhaps a member of Profiler
? When we have more thread local data in the future, it may be reasonable to extract these functions to a separate threadLocal.{h,cpp}
or tls.{h,cpp}
.
src/asprofPrivate.h
Outdated
// If the counter is not already initialized, this function is a no-op. | ||
// | ||
// This function is signal-safe and safe to call racily with initialization. | ||
void asprofIncrementThreadLocalSampleCounter(void); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With rare exceptions, non-local top-level functions are discouraged in async-profiler.
Static classes generally serve as namespaces. Actual C++ namespaces would be probably better, but that was a historical choice similar to the coding style adopted in HotSpot.
280d44f
to
03fefd2
Compare
98cd605
to
e5d8eea
Compare
cc72eb3
to
0b9b856
Compare
src/asprof.h
Outdated
// This function is *not* async-signal-safe. However, it is safe to call concurrently | ||
// with async-profiler operations, including initialization. | ||
DLLEXPORT asprof_thread_local_data *asprof_get_thread_local_data(void); | ||
typedef asprof_thread_local_data *(*asprof_unstable_get_thread_local_data_t)(void); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unstable
in the name remained
e4147f9
to
efaff2b
Compare
efaff2b
to
9823966
Compare
Description
Add support to
asprof_get_sample_counter
to allow native code to detect when a sample happened.Motivation and context
Currently, there is no easy way for code to attach metadata about a sample. This is especially annoying for large multi-tenant applications since knowing the tenant that triggered a sample makes it easier to find out what usage pattern is involved, or to attach pollcatch metadata to catch long Tokio poll times.
Uploading metadata for every request in the system is likely to cause overload. This change allows attaching metadata only for requests that have been sampled.
How has this been tested?
This has been tested locally using a version of pollcatch.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.