-
Notifications
You must be signed in to change notification settings - Fork 937
Description
Describe the bug
If the needed imports happens to be residing in rela.dyn and rela.plt both, not only in one of them, async-profiler fails to start with the message [ERROR] Could not set pthread hook.
This is becauseElfParser::parseDynamicSection() does not parse rela.dyn if it sees that rela.plt is present in the shared library.
Please see the Additional Information/Context.
Expected vs. actual behavior
Expected Behavior
The async-profiler to be run normally.
Actual Behavior
Error stating async-profiler couldn't set pthread hook
$ docker run async-profiler-liberica:latest
[ERROR] Could not set pthread hook
|\ _,,,--,,_
/,`.-'`' ._ \-;;,_
_______ __|,4- ) )_ .;.(__`'-'__ ___ __ _ ___ _______
| | '---''(_/._)-'(_\_) | | | | | | | | |
| _ | ___|_ _| | | | | |_| | | | __ _ _
| |_| | |___ | | | | | | | | | | \ \ \ \
| ___| ___| | | | _| |___| | _ | | _| \ \ \ \
| | | |___ | | | |_| | | | | | | |_ ) ) ) )
|___| |_______| |___| |_______|_______|___|_| |__|___|_______| / / / /
==================================================================/_/_/_/
:: Built with Spring Boot :: 3.3.5
2024-11-25T11:23:56.744Z INFO 1 --- [ main] o.s.s.petclinic.PetClinicApplication : Starting PetClinicApplication v3.3.0 using Java 21.0.5 with PID 1 (/app/petclinic.jar started by root in /app)
2024-11-25T11:23:56.747Z INFO 1 --- [ main] o.s.s.petclinic.PetClinicApplication : No active profile set, falling back to 1 default profile: "default"
2024-11-25T11:23:57.162Z INFO 1 --- [ main] .s.d.r.c.RepositoryConfigurationDelegate : Bootstrapping Spring Data JPA repositories in DEFAULT mode.
2024-11-25T11:23:57.187Z INFO 1 --- [ main] .s.d.r.c.RepositoryConfigurationDelegate : Finished Spring Data repository scanning in 21 ms. Found 2 JPA repository interfaces.
2024-11-25T11:23:57.553Z INFO 1 --- [ main] o.s.b.w.embedded.tomcat.TomcatWebServer : Tomcat initialized with port 8080 (http)
2024-11-25T11:23:57.563Z INFO 1 --- [ main] o.apache.catalina.core.StandardService : Starting service [Tomcat]
2024-11-25T11:23:57.563Z INFO 1 --- [ main] o.apache.catalina.core.StandardEngine : Starting Servlet engine: [Apache Tomcat/10.1.31]
2024-11-25T11:23:57.593Z INFO 1 --- [ main] o.a.c.c.C.[Tomcat].[localhost].[/] : Initializing Spring embedded WebApplicationContext
2024-11-25T11:23:57.594Z INFO 1 --- [ main] w.s.c.ServletWebServerApplicationContext : Root WebApplicationContext: initialization completed in 827 ms
2024-11-25T11:23:57.715Z INFO 1 --- [ main] com.zaxxer.hikari.HikariDataSource : HikariPool-1 - Starting...
2024-11-25T11:23:57.817Z INFO 1 --- [ main] com.zaxxer.hikari.pool.HikariPool : HikariPool-1 - Added connection conn0: url=jdbc:h2:mem:be5f7d7d-a36a-4953-a8cf-b39f70b5d7cb user=SA
2024-11-25T11:23:57.818Z INFO 1 --- [ main] com.zaxxer.hikari.HikariDataSource : HikariPool-1 - Start completed.
2024-11-25T11:23:57.903Z INFO 1 --- [ main] o.hibernate.jpa.internal.util.LogHelper : HHH000204: Processing PersistenceUnitInfo [name: default]
2024-11-25T11:23:57.939Z INFO 1 --- [ main] org.hibernate.Version : HHH000412: Hibernate ORM core version 6.5.3.Final
2024-11-25T11:23:57.957Z INFO 1 --- [ main] o.h.c.internal.RegionFactoryInitiator : HHH000026: Second-level cache disabled
2024-11-25T11:23:58.079Z INFO 1 --- [ main] o.s.o.j.p.SpringPersistenceUnitInfo : No LoadTimeWeaver setup: ignoring JPA class transformer
2024-11-25T11:23:58.525Z INFO 1 --- [ main] o.h.e.t.j.p.i.JtaPlatformInitiator : HHH000489: No JTA platform available (set 'hibernate.transaction.jta.platform' to enable JTA platform integration)
2024-11-25T11:23:58.526Z INFO 1 --- [ main] j.LocalContainerEntityManagerFactoryBean : Initialized JPA EntityManagerFactory for persistence unit 'default'
2024-11-25T11:23:58.711Z INFO 1 --- [ main] o.s.d.j.r.query.QueryEnhancerFactory : Hibernate is in classpath; If applicable, HQL parser will be used.
2024-11-25T11:23:59.282Z INFO 1 --- [ main] o.s.b.a.e.web.EndpointLinksResolver : Exposing 14 endpoints beneath base path '/actuator'
2024-11-25T11:23:59.320Z INFO 1 --- [ main] o.s.b.w.embedded.tomcat.TomcatWebServer : Tomcat started on port 8080 (http) with context path '/'
2024-11-25T11:23:59.326Z INFO 1 --- [ main] o.s.s.petclinic.PetClinicApplication : Started PetClinicApplication in 2.758 seconds (process running for 3.015)
^C2024-11-25T11:24:00.807Z INFO 1 --- [ionShutdownHook] j.LocalContainerEntityManagerFactoryBean : Closing JPA EntityManagerFactory for persistence unit 'default'
2024-11-25T11:24:00.808Z INFO 1 --- [ionShutdownHook] com.zaxxer.hikari.HikariDataSource : HikariPool-1 - Shutdown initiated...
2024-11-25T11:24:00.810Z INFO 1 --- [ionShutdownHook] com.zaxxer.hikari.HikariDataSource : HikariPool-1 - Shutdown completed.
Reproduction Steps
-
Build the image(see Dockerfile below):
docker build -t async-profiler-liberica . -
Run the container:
docker run async-profiler-liberica:latest
FROM bellsoft/liberica-runtime-container:jdk-21-stream-musl as builder
WORKDIR /app
RUN apk add libstdc++ git make g++ linux-headers
RUN git clone https://github.com/spring-projects/spring-petclinic.git \
&& cd spring-petclinic \
&& ./gradlew -x test clean build
RUN git clone https://github.com/async-profiler/async-profiler.git \
&& cd async-profiler \
&& git checkout b174feb \
&& make \
&& mkdir -p /app/profiler \
&& cp -r build/* /app/profiler
FROM bellsoft/liberica-runtime-container:jre-21-stream-musl
WORKDIR /app
EXPOSE 8082
RUN apk add libstdc++
COPY --from=builder /app/spring-petclinic/build/libs/spring-petclinic-*.jar /app/petclinic.jar
COPY --from=builder /app/profiler /app/profiler
ENTRYPOINT ["java", "-agentpath:/app/profiler/lib/libasyncProfiler.so=start,event=cpu,file=/tmp/profile.html", \
"-jar", "-XX:MaxRAMPercentage=80.0", "/app/petclinic.jar"]Additional Information/Context
More info about the issue
When starting the profiler, Profiler::start() calls perfEvents:start(). This then in turn checks if thread hooks are set up properly and then returns:
return lib != NULL && (_pthread_entry = lib->findImport(im_pthread_setspecific)) != NULL;In order it to return true it checks if Java libjvm.so(in this case) is defined and if import [im_pthread_setspecific](https://github.com/async-profiler/async-profiler/blob/master/src/codeCache.h#L25) in _imports is not null.
In lib->findImport(im_pthread_setspecific), it fails to find pthread_setspecific because it was not added previously in CodeCache::addImport(). I will describe this next.
When the async-profiler gets first started, these calls are made in the following order: VM::init()->Profiler::updateSymbols()->Symbols::parseLibraries()...->ElfParser::parseDynamicSection()->CodeCache::addImport().
Important thing is parseDynamicSection() calls addImport() in these lines. (_cc->addimport(...)):
const char* base = this->base();
if (jmprel != NULL && pltrelsz != 0) {
// Parse .rela.plt table
for (size_t offs = 0; offs < pltrelsz; offs += relent) {
ElfRelocation* r = (ElfRelocation*)(jmprel + offs);
ElfSymbol* sym = (ElfSymbol*)(symtab + ELF_R_SYM(r->r_info) * syment);
if (sym->st_name != 0) {
_cc->addImport((void**)(base + r->r_offset), strtab + sym->st_name);
}
}
//Here if the first conditional (jmprel != NULL && pltrelsz != 0) evaluates to true,
//it skips the .rela.dyn table, since the code below will be unreachable
} else if (rel != NULL && relsz != 0) {
// Shared library was built without PLT (-fno-plt)
// Relocation entries have been moved from .rela.plt to .rela.dyn
for (size_t offs = relcount * relent; offs < relsz; offs += relent) {
ElfRelocation* r = (ElfRelocation*)(rel + offs);
if (ELF_R_TYPE(r->r_info) == R_GLOB_DAT) {
ElfSymbol* sym = (ElfSymbol*)(symtab + ELF_R_SYM(r->r_info) * syment);
if (sym->st_name != 0) {
_cc->addImport((void**)(base + r->r_offset), strtab + sym->st_name);
}
}
}
}Specifically, lines 394 and 405, for .rela.ptl and .rela.dyn respectively. This works fine when all entries in the addImport() function is defined in the one relocation table i.e .rela.plt or in .rela.dyn exclusively. But when the both tables has entries, it fails to import entries from the .rela.dyn. See the comment in the code.
Sample gdb session that demos the issue
Here's a gdb session which I tried to demonstrate the issue. In _imports, there's no pthread_setspecific entry, hence setupThreadHook() fails.:
(gdb) b setupThreadHook()
Breakpoint 2 at 0x7fffedcca58a: file src/cpuEngine.cpp, line 58.
(gdb) n
62 if (!VM::loaded()) {
...
CpuEngine::setupThreadHook () at src/cpuEngine.cpp:77
77 return lib != NULL && (_pthread_entry = lib->findImport(im_pthread_setspecific)) != NULL;
(gdb) s
CodeCache::findImport (this=0x7fffecf2f250, id=im_pthread_setspecific) at src/codeCache.cpp:207
207 if (!_imports_patchable) {
(gdb) s
211 return _imports[id];
(gdb) p _imports
$1 = {0x7ffff7e5b758 <dlopen@got[plt]>, 0x7ffff7e5b4e0 <pthread_create@got[plt]>, 0x0, 0x0, 0x7ffff7e5b4b0 <poll@got[plt]>}
(gdb)
Relocation table of the libjvm.so
The aforementioned relocation tables of the liberica java:
$ cd /usr/lib/jvm/jdk-21.0.5-bellsoft-x86_64/lib/server/
$ readelf libjvm.so --relocs|grep -e "Relocation s" -e 'dlopen' -e 'pthread_setspecific' -e 'pthread_exit' -e 'pthread_create' -e 'poll'
Relocation section '.rela.dyn' at offset 0x5dc8 contains 114095 entries:
00000156df10 00d000000006 R_X86_64_GLOB_DAT 0000000000000000 pthread_setspecific + 0
Relocation section '.rela.plt' at offset 0x2a2630 contains 210 entries:
00000156d4b0 000b00000007 R_X86_64_JUMP_SLO 0000000000000000 poll + 0
00000156d4e0 001300000007 R_X86_64_JUMP_SLO 0000000000000000 pthread_create + 0
00000156d758 007500000007 R_X86_64_JUMP_SLO 0000000000000000 dlopen + 0We see that pthread_setspefic is on rela.dyn and poll, pthread_create and dlopen is on rela.plt
Async-profiler version
commit b174feb, from branch master
Environment details
async-profilerrunning inside a docker container. SeeDockerfileabove- OS: Alpaquita(Alpine derivative) Linux 6.12.1
- JDK version: 21
- C flavor: musl
- CPU architecture: x86_64
$ java --version
openjdk 21.0.5 2024-10-15 LTS
OpenJDK Runtime Environment (build 21.0.5+11-LTS)
OpenJDK 64-Bit Server VM (build 21.0.5+11-LTS, mixed mode)