KEMBAR78
Add nativemem test with dlopen after profiler start. by krk · Pull Request #1243 · async-profiler/async-profiler · GitHub
Skip to content

Conversation

@krk
Copy link
Contributor

@krk krk commented Apr 17, 2025

Description

Parse symbols and hook functions of the main executable, when profiler is not loaded via LD_PRELOAD.

Without this fix, profiler is unable to hook non-java executables if dlopen is called after starting the profiler.

Related issues

Issue was introduced in #1220 as dlerror may not return NULL for the main executable.

Thanks for the repro in #1241 @Baraa-Hasheesh and for discussions @fandreuz and @Baraa-Hasheesh.

Fixes #1233 and #1241.

How has this been tested?

make test-java TESTS=nativemem


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

// Main executable will return NULL from handle, we need to parse it anyway.
// dlopen is not required in this case as the main exe cannot be unloaded.
bool isMainExe = handle == NULL && exePath != NULL && strcmp(lib.file, exePath) == 0;
bool dlopenSuccess = handle != NULL || dlerror_str == NULL || OS::isMusl();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we don't need OS::isMusl() anymore? The only situation when this was appropriate is for the main executable, in which case isMainExe is true anyways


#include <stdlib.h>

__attribute__((visibility("default"))) void* call_malloc(size_t size) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe you could reuse Java_test_nativemem_Native_malloc from jnimalloc.c instead of creating a new file?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this will be confusing to the reader. We do not use JNI but method name looks like JNI and has this signature:

JNIEXPORT jlong JNICALL Java_test_nativemem_Native_malloc(JNIEnv* env, jclass clazz, jlong size)

as opposed to what we need:

void* call_malloc(size_t size)

Comment on lines 28 to 30
return test.nameSuffix().isEmpty()
? className() + '.' + m.getName()
: className() + '.' + m.getName() + ' ' + test.nameSuffix();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return test.nameSuffix().isEmpty()
? className() + '.' + m.getName()
: className() + '.' + m.getName() + ' ' + test.nameSuffix();
return String.format("%s.%s %s", className(), m.getName(), test.nameSuffix()).trim();

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kerem's version is better: it's more obvious and far more efficient (although the latter is not important here, let's not support bad practice).

#define ASSERT_NO_DLERROR() \
err = dlerror(); \
if (err != NULL) { \
printf("%s\n", err); \
Copy link
Contributor

@fandreuz fandreuz Apr 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

stderr? Same for L21 and L39

#include <stdlib.h>
#include <string.h>

#define ASSERT_NO_DLERROR() \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the advantage of #define over a function?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can exit directly, without checking the return value of a new function. I used macros here to reduce repetition.

Copy link
Member

@apangin apangin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for root causing the issue.

buf[size] = 0;
exePath = buf;
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a simpler and more efficient way to find main executable:

    const void* main_phdr = NULL;
    dl_iterate_phdr([](struct dl_phdr_info* info, size_t size, void* data) {
        *(const void**)data = info->dlpi_phdr;
        return 1;
    }, &main_phdr);

Then check if main_phdr >= lib.image_base && main_phdr < lib.map_end

Calling dl_iterate_phdr should be fine as long as there are no allocations inside.


// Main executable will return NULL from handle, we need to parse it anyway.
// dlopen is not required in this case as the main exe cannot be unloaded.
bool isMainExe = handle == NULL && exePath != NULL && strcmp(lib.file, exePath) == 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

snake_case for variable names.

@Test(os = Os.LINUX, sh = "%testbin/profile_with_dlopen dlopen_first %f.jfr", output = true, env = {"LD_LIBRARY_PATH=build/test/lib:build/lib"}, nameSuffix = "dlopen_first")
@Test(os = Os.LINUX, sh = "%testbin/profile_with_dlopen profile_first %f.jfr", output = true, env = {"LD_LIBRARY_PATH=build/test/lib:build/lib"}, nameSuffix = "profile_first")
@Test(os = Os.LINUX, sh = "LD_PRELOAD=%lib %testbin/profile_with_dlopen dlopen_first %f.jfr", output = true, env = {"LD_LIBRARY_PATH=build/test/lib:build/lib"}, nameSuffix = "dlopen_first with LD_PRELOAD")
@Test(os = Os.LINUX, sh = "LD_PRELOAD=%lib %testbin/profile_with_dlopen profile_first %f.jfr", output = true, env = {"LD_LIBRARY_PATH=build/test/lib:build/lib"}, nameSuffix = "profile_first with LD_PRELOAD")
Copy link
Contributor

@Baraa-Hasheesh Baraa-Hasheesh Apr 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should add a new test that checks the dlopen patching on both MacOs & Linux
As this issue of not patching the main executable is also relevant to MacOs

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Macos support is "best effort", will leave it out from the current PR.

array->add(cc);
}

free(exePath);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this safe? as in some cases it points toward the buf[PATH_MAX] which is defined in the stack not the heap

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! This will be deleted in the new version.

exit(1); \
}

asprof_error_str_t _asprof_error_str;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need for this global variable, you can define it in the main function

@Baraa-Hasheesh
Copy link
Contributor

@krk @apangin @fandreuz

The above fix doesn't fully fix the issue of patching libs,
Due to how the dynamic linker work & common optimization it does -> if you do the following steps

void* lib1 = dlopen("libcallsmalloc.so", RTLD_NOW);
dlclose(lib1);
void2 lib2 = dlopen("libcallsmalloc.so", RTLD_NOW);

You will actually find that the lib1 & lib2 point to the same address even though the shared objects were deleted & recreated

=> you can try this in the new test by doing the following
image

This means due to the Async-profiler built in optimization of skipping already patched libs the second dlopen will never be patched as it will be seen as already patched from the first dlopen

This is quite an old issue (Even the old logic before #1220 has the issue) that could require significant changes to address

  • Hooking the dlclose calls & actually removing the patches from the async profiler cache if reference counts reaches 0 for a lib.
  • We could need locking logic as it's possible race conditions may happen when dlclose-ing & dlopen-ing from multiple threads.

Interesting edge cases will be handling the RTLD_NODELETE flag as even if the reference counter reaches 0, there is no need to cleanup the async-profiler local cache as the shared objects are never unloaded.

@Baraa-Hasheesh
Copy link
Contributor

Creating a new issue here #1245

// dlopen is not required in this case as the main exe cannot be unloaded.
bool isMainExe = handle == NULL && main_phdr >= lib.image_base && main_phdr < lib.map_end;
bool dlopenSuccess = handle != NULL || dlerror_str == NULL;
if (isMainExe || dlopenSuccess) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about

bool is_main_exe = main_phdr >= lib.image_base && main_phdr < lib.map_end;
if (handle != NULL || dlerror() == NULL || is_main_exe) {

return className() + '.' + m.getName();
return test.nameSuffix().isEmpty()
? className() + '.' + m.getName()
: className() + '.' + m.getName() + ' ' + test.nameSuffix();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

testName() is also a name of a log subdirectory; spaces in directory names are not convenient.

@apangin apangin merged commit fa417c8 into async-profiler:master Apr 19, 2025
10 checks passed
krk added a commit to krk/async-profiler that referenced this pull request Apr 25, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

NativememTests.malloc_plt_dyn failing "sometimes"

4 participants