KEMBAR78
add asprof_get_thread_local_data by arielb1 · Pull Request #1169 · async-profiler/async-profiler · GitHub
Skip to content

Conversation

arielb1
Copy link
Collaborator

@arielb1 arielb1 commented Mar 11, 2025

Description

Add support to asprof_get_sample_counter to allow native code to detect when a sample happened.

Motivation and context

Currently, there is no easy way for code to attach metadata about a sample. This is especially annoying for large multi-tenant applications since knowing the tenant that triggered a sample makes it easier to find out what usage pattern is involved, or to attach pollcatch metadata to catch long Tokio poll times.

Uploading metadata for every request in the system is likely to cause overload. This change allows attaching metadata only for requests that have been sampled.

How has this been tested?

This has been tested locally using a version of pollcatch.


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@arielb1
Copy link
Collaborator Author

arielb1 commented Mar 11, 2025

r? @apangin

@arielb1 arielb1 force-pushed the asprof-get-sample-counter branch from 49e8c0d to ef09cd5 Compare March 11, 2025 13:08
@arielb1 arielb1 requested a review from apangin March 11, 2025 13:08
@arielb1 arielb1 force-pushed the asprof-get-sample-counter branch from ef09cd5 to 06e1490 Compare March 11, 2025 15:42
@arielb1 arielb1 force-pushed the asprof-get-sample-counter branch from 06e1490 to d66a089 Compare March 11, 2025 15:42
src/profiler.cpp Outdated

uintptr_t counter = (uintptr_t)pthread_getspecific(sample_counter_key);
counter++;
pthread_setspecific(sample_counter_key, (void*)counter);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a value has not been previously set (i.e. pthread_getspecific returned 0), pthread_setspecific may allocate. Therefore, calling it in a signal handler is unsafe.

I suggest pthread_setspecific is called just once per thread at a safe point. It will be used to install a pointer to the variable to be incremented. Besides signal safety, I see additional benefits of this approach:

  • A counter will be incremented only for those threads that have "subscribed" to the event by installing a thread-local pointer.
  • If an application caches the pointer, it can query the counter efficiently without making an API call. This is especially useful for Java, since it will be possible to read the counter directly in Java without JNI overhead.
  • In the future, this can be easily extended to pass additional context information between async-profiler and an appication via the same thread-local.

The question remains, who will call pthread_setspecific and when.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

turned out you can do it in asprof_get_sample_counter since the API lets you

@arielb1
Copy link
Collaborator Author

arielb1 commented Mar 12, 2025

Tested the new version locally

// Gets the thread-local sample counter, which increments (not necessarily by 1) every time a signal handler is run.
DLLEXPORT uint64_t asprof_get_sample_counter(void);
typedef uint64_t (*asprof_get_sample_counter_t)(void);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We discussed this would be an experimental (unstable) API? No need to document it then... or there should be an explicit notice.

src/profiler.cpp Outdated
}
}

DLLEXPORT uint64_t asprof_get_sample_counter(void) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implementation of asprof_ APIs should be in asprof.cpp.

src/profiler.cpp Outdated

u64 Profiler::recordSample(void* ucontext, u64 counter, EventType event_type, Event* event) {
atomicInc(_total_samples);
incrementSampleCounter();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking where the right place to increment the counter will be.
The counter is only relevant with JFR output, isn't it? Then recordEvent (or the places where it is called) seems a better alternative. Or can you imagine a different use case?
Note that some recorded events (e.g. free calls in native memory profiling mode) bypass Profiler::recordSample.

src/asprof.h Outdated
// `asprof_get_sample_counter` on a given thread will of course not reset the counter.
//
// This function is *not* async-signal safe.
DLLEXPORT uint64_t asprof_get_sample_counter(void);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just an idea: what do you think about asprof_get_tls function instead? It will return a pointer to the entire thread local area, which besides the counter may include other context-related information.
I'll certainly need something like this for Java API.

@arielb1 arielb1 force-pushed the asprof-get-sample-counter branch from 14e37be to 23b56b4 Compare March 16, 2025 15:51
@arielb1 arielb1 changed the title add asprof_get_sample_counter add asprof_unstable_get_thread_local_data Mar 16, 2025
@arielb1 arielb1 force-pushed the asprof-get-sample-counter branch 3 times, most recently from eb61136 to f21c648 Compare March 16, 2025 15:53
@arielb1 arielb1 force-pushed the asprof-get-sample-counter branch from f21c648 to 717e697 Compare March 17, 2025 10:34
@arielb1
Copy link
Collaborator Author

arielb1 commented Mar 18, 2025

waiting for review

src/asprof.cpp Outdated
// calling `malloc` in weird places.
//
// Using std::atomic instead of "native" std::call_once to ensure signal safety.
static std::atomic<pthread_key_t> _profiler_data_key {(pthread_key_t)-1};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implementation-wise, your previous revision was cleaner, IMO. Why have you switched to a lazy initialization of pthread_key? This seems to add unnecessary complication to ensure thread safety etc.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. I couldn't find a good non-lazy place to initialize it
  2. people might call asprof_get_thread_local_data concurrently with initializing async-profiler, I want that to work.

src/asprof.cpp Outdated

// Initialize the *global* profiler data key.
static pthread_key_t asprof_init_global_profiler_data_key() {
const std::lock_guard<std::mutex> lock(_init_mutex);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Async-profiler has its own Mutex and MutexLocker classes. I'm generally cautious in adopting more libstdc++ features: libstdc++ caused many issues in the past, so I even wanted to get rid of it entirely, see #872.

src/asprof.h Outdated
// This function can return NULL in case of an allocation failure.
//
// This function is *not* async-signal safe.
DLLEXPORT asprof_unstable_thread_local_data *asprof_unstable_get_thread_local_data(void);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You've explicitly documented the function as unstable in the code and in the docs, which is totally fine; adding unstable in the name is going to be too much ) The name is already too long.


#include "asprof.h"

// Private functions for the native API.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This "private" header looks somewhat artificial design-wise. There is no real division into public/private headers in the sources.
It's true that asprof.h lists publicly exported functions, and it's OK for other parts of the profiler to use these functions, but if some source needs to call an internal function, then this function has likely nothing to do with asprof* API.
In your particular case, FlightRecorder is the only caller of asprofIncrementThreadLocalSampleCounter, so why can't this function be a part of FlightRecorder then? Or perhaps a member of Profiler? When we have more thread local data in the future, it may be reasonable to extract these functions to a separate threadLocal.{h,cpp} or tls.{h,cpp}.

// If the counter is not already initialized, this function is a no-op.
//
// This function is signal-safe and safe to call racily with initialization.
void asprofIncrementThreadLocalSampleCounter(void);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With rare exceptions, non-local top-level functions are discouraged in async-profiler.
Static classes generally serve as namespaces. Actual C++ namespaces would be probably better, but that was a historical choice similar to the coding style adopted in HotSpot.

@arielb1 arielb1 force-pushed the asprof-get-sample-counter branch 3 times, most recently from 280d44f to 03fefd2 Compare March 19, 2025 12:56
@arielb1 arielb1 changed the title add asprof_unstable_get_thread_local_data add asprof_get_thread_local_data Mar 20, 2025
@arielb1 arielb1 force-pushed the asprof-get-sample-counter branch 2 times, most recently from 98cd605 to e5d8eea Compare March 20, 2025 10:50
@arielb1 arielb1 force-pushed the asprof-get-sample-counter branch 2 times, most recently from cc72eb3 to 0b9b856 Compare March 20, 2025 11:27
src/asprof.h Outdated
// This function is *not* async-signal-safe. However, it is safe to call concurrently
// with async-profiler operations, including initialization.
DLLEXPORT asprof_thread_local_data *asprof_get_thread_local_data(void);
typedef asprof_thread_local_data *(*asprof_unstable_get_thread_local_data_t)(void);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unstable in the name remained

@arielb1 arielb1 force-pushed the asprof-get-sample-counter branch 3 times, most recently from e4147f9 to efaff2b Compare March 20, 2025 15:07
@arielb1 arielb1 force-pushed the asprof-get-sample-counter branch from efaff2b to 9823966 Compare March 20, 2025 15:08
@apangin apangin merged commit 7152ba0 into async-profiler:master Mar 20, 2025
9 checks passed
krk pushed a commit to krk/async-profiler that referenced this pull request Mar 24, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants