KEMBAR78
Fix `NativememTests#dlopenCustomLib` on Alpine by fandreuz · Pull Request #1254 · async-profiler/async-profiler · GitHub
Skip to content

Conversation

fandreuz
Copy link
Contributor

@fandreuz fandreuz commented Apr 23, 2025

Description

This PR fixes NativememTests#dlopenCustomLib on Alpine.

Hooks::init includes the following three lines:

_orig_pthread_create = ADDRESS_OF(pthread_create);
_orig_pthread_exit = ADDRESS_OF(pthread_exit);
_orig_dlopen = ADDRESS_OF(dlopen);

ADDRESS_OF is defined as follows:

#define ADDRESS_OF(sym) ({ \
    void* addr = dlsym(RTLD_NEXT, #sym); \
    addr != NULL ? (sym##_t)addr : sym;  \
})

and ignores dlerror, which could then leak in subsequent parts of the code, even though sym is eventually patched using the original symbol rather than the one resolved by dlsym. This is why the ASSERT_NO_DLERROR at L48 fails with the following error:

Symbol not found: dlopen

A similar problem happens a bit below, in the ASSERT_NO_DLERROR after (call_malloc_t)dlsym(lib, "call_malloc"):

Library libjvm.so is not already loaded

This is coming from Profiler::start -> VM::tryAttach, which does not find any libjvm.so (as expected) and does not reset the dlerror.

Related issues

#1226

Motivation and context

Tests introduced in #1243 do not work properly on Alpine, presumably because dlerror is not used in the same way in glibc and musl.

How has this been tested?

docker run \
   -it --privileged --rm \
   -v $(pwd):/async-profiler amazoncorretto:11-alpine-jdk \
   sh -c 'apk add --no-cache musl-dev util-linux make gcc g++ linux-headers && \
      cd /async-profiler && \
      rm -rf build && \
      make -j test-java TESTS=jfr'

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@fandreuz
Copy link
Contributor Author

Should we instead make sure dlerror() is reset appropriately by Async-Profiler? I believe this kind of failures could also happen in different parts of the code, and is not specific to this test

@fandreuz fandreuz requested review from apangin and krk April 23, 2025 10:31
@fandreuz fandreuz added the test label Apr 23, 2025
@fandreuz
Copy link
Contributor Author

@apangin
Copy link
Member

apangin commented Apr 23, 2025

I think a better fix (and a more canonical way to check for errors) will be not to call dlerror() unless the previous function fails, i.e. returns NULL.

@fandreuz
Copy link
Contributor Author

fandreuz commented Apr 24, 2025

I think a better fix (and a more canonical way to check for errors) will be not to call dlerror() unless the previous function fails, i.e. returns NULL.

If you mean that we should not always call dlerror() in the test code, the dlerror buffer can be full even with no detectable failure for the code calling Async-Profiler API, or even with no failure at all, e.g. when calling asprof_execute for a non-JVM process. Am I missing something?

@krk
Copy link
Contributor

krk commented Apr 24, 2025

Should we instead make sure dlerror() is reset appropriately by Async-Profiler?

I would think we can log such errors, or ignore them purposefully by calling dlerror from async-profiler. Ideally, returning from asprof_execute should not leave a dlerror that was already handled or ignored.

@fandreuz
Copy link
Contributor Author

@krk @apangin moved the fix in Async-Profiler main source set: f5ae5ce

@apangin
Copy link
Member

apangin commented Apr 24, 2025

I mean, calling dlerror() alone is not a proper way of checking errors. dlerror() should be only called if the function that you are testing fails. So I'd modify ASSERT_NO_DLERROR() and its usages instead.

@krk
Copy link
Contributor

krk commented Apr 24, 2025

How do you determine if a dl function is failed, without calling dlerror? The man example always calls dlerror: https://linux.die.net/man/3/dlopen

@apangin
Copy link
Member

apangin commented Apr 24, 2025

How do you determine if a dl function is failed, without calling dlerror?

By checking return value. dlopen and dlsym return NULL when fail. If they return non-null, they've succeeded, even though dlerror may return junk.

@krk
Copy link
Contributor

krk commented Apr 24, 2025

That sounds good, but in the example we can see that dlerror is called regardless of dlsym return value:

   *(void **) (&cosine) = dlsym(handle, "cos");

   if ((error = dlerror()) != NULL)  {
        fprintf(stderr, "%s\n", error);
        exit(EXIT_FAILURE);
    }

@fandreuz
Copy link
Contributor Author

@krk @apangin what's the conclusion here?

@krk
Copy link
Contributor

krk commented Apr 25, 2025

We can change the tests not to be too strict about dlerror return values, as long as dlopen/dlsym functions do not return null.

@apangin apangin merged commit f5fd5b0 into async-profiler:master Apr 28, 2025
14 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants