KEMBAR78
Profiler fails to start if both `rela.plt` and `rela.dyn` have needed imports · Issue #1060 · async-profiler/async-profiler · GitHub
Skip to content

Profiler fails to start if both rela.plt and rela.dyn have needed imports #1060

@swivelmargarita

Description

@swivelmargarita

Describe the bug

If the needed imports happens to be residing in rela.dyn and rela.plt both, not only in one of them, async-profiler fails to start with the message [ERROR] Could not set pthread hook.

This is becauseElfParser::parseDynamicSection() does not parse rela.dyn if it sees that rela.plt is present in the shared library.

Please see the Additional Information/Context.

Expected vs. actual behavior

Expected Behavior

The async-profiler to be run normally.

Actual Behavior

Error stating async-profiler couldn't set pthread hook

$ docker run async-profiler-liberica:latest
[ERROR] Could not set pthread hook


              |\      _,,,--,,_
             /,`.-'`'   ._  \-;;,_
  _______ __|,4-  ) )_   .;.(__`'-'__     ___ __    _ ___ _______
 |       | '---''(_/._)-'(_\_)   |   |   |   |  |  | |   |       |
 |    _  |    ___|_     _|       |   |   |   |   |_| |   |       | __ _ _
 |   |_| |   |___  |   | |       |   |   |   |       |   |       | \ \ \ \
 |    ___|    ___| |   | |      _|   |___|   |  _    |   |      _|  \ \ \ \
 |   |   |   |___  |   | |     |_|       |   | | |   |   |     |_    ) ) ) )
 |___|   |_______| |___| |_______|_______|___|_|  |__|___|_______|  / / / /
 ==================================================================/_/_/_/

:: Built with Spring Boot :: 3.3.5


2024-11-25T11:23:56.744Z  INFO 1 --- [           main] o.s.s.petclinic.PetClinicApplication     : Starting PetClinicApplication v3.3.0 using Java 21.0.5 with PID 1 (/app/petclinic.jar started by root in /app)
2024-11-25T11:23:56.747Z  INFO 1 --- [           main] o.s.s.petclinic.PetClinicApplication     : No active profile set, falling back to 1 default profile: "default"
2024-11-25T11:23:57.162Z  INFO 1 --- [           main] .s.d.r.c.RepositoryConfigurationDelegate : Bootstrapping Spring Data JPA repositories in DEFAULT mode.
2024-11-25T11:23:57.187Z  INFO 1 --- [           main] .s.d.r.c.RepositoryConfigurationDelegate : Finished Spring Data repository scanning in 21 ms. Found 2 JPA repository interfaces.
2024-11-25T11:23:57.553Z  INFO 1 --- [           main] o.s.b.w.embedded.tomcat.TomcatWebServer  : Tomcat initialized with port 8080 (http)
2024-11-25T11:23:57.563Z  INFO 1 --- [           main] o.apache.catalina.core.StandardService   : Starting service [Tomcat]
2024-11-25T11:23:57.563Z  INFO 1 --- [           main] o.apache.catalina.core.StandardEngine    : Starting Servlet engine: [Apache Tomcat/10.1.31]
2024-11-25T11:23:57.593Z  INFO 1 --- [           main] o.a.c.c.C.[Tomcat].[localhost].[/]       : Initializing Spring embedded WebApplicationContext
2024-11-25T11:23:57.594Z  INFO 1 --- [           main] w.s.c.ServletWebServerApplicationContext : Root WebApplicationContext: initialization completed in 827 ms
2024-11-25T11:23:57.715Z  INFO 1 --- [           main] com.zaxxer.hikari.HikariDataSource       : HikariPool-1 - Starting...
2024-11-25T11:23:57.817Z  INFO 1 --- [           main] com.zaxxer.hikari.pool.HikariPool        : HikariPool-1 - Added connection conn0: url=jdbc:h2:mem:be5f7d7d-a36a-4953-a8cf-b39f70b5d7cb user=SA
2024-11-25T11:23:57.818Z  INFO 1 --- [           main] com.zaxxer.hikari.HikariDataSource       : HikariPool-1 - Start completed.
2024-11-25T11:23:57.903Z  INFO 1 --- [           main] o.hibernate.jpa.internal.util.LogHelper  : HHH000204: Processing PersistenceUnitInfo [name: default]
2024-11-25T11:23:57.939Z  INFO 1 --- [           main] org.hibernate.Version                    : HHH000412: Hibernate ORM core version 6.5.3.Final
2024-11-25T11:23:57.957Z  INFO 1 --- [           main] o.h.c.internal.RegionFactoryInitiator    : HHH000026: Second-level cache disabled
2024-11-25T11:23:58.079Z  INFO 1 --- [           main] o.s.o.j.p.SpringPersistenceUnitInfo      : No LoadTimeWeaver setup: ignoring JPA class transformer
2024-11-25T11:23:58.525Z  INFO 1 --- [           main] o.h.e.t.j.p.i.JtaPlatformInitiator       : HHH000489: No JTA platform available (set 'hibernate.transaction.jta.platform' to enable JTA platform integration)
2024-11-25T11:23:58.526Z  INFO 1 --- [           main] j.LocalContainerEntityManagerFactoryBean : Initialized JPA EntityManagerFactory for persistence unit 'default'
2024-11-25T11:23:58.711Z  INFO 1 --- [           main] o.s.d.j.r.query.QueryEnhancerFactory     : Hibernate is in classpath; If applicable, HQL parser will be used.
2024-11-25T11:23:59.282Z  INFO 1 --- [           main] o.s.b.a.e.web.EndpointLinksResolver      : Exposing 14 endpoints beneath base path '/actuator'
2024-11-25T11:23:59.320Z  INFO 1 --- [           main] o.s.b.w.embedded.tomcat.TomcatWebServer  : Tomcat started on port 8080 (http) with context path '/'
2024-11-25T11:23:59.326Z  INFO 1 --- [           main] o.s.s.petclinic.PetClinicApplication     : Started PetClinicApplication in 2.758 seconds (process running for 3.015)
^C2024-11-25T11:24:00.807Z  INFO 1 --- [ionShutdownHook] j.LocalContainerEntityManagerFactoryBean : Closing JPA EntityManagerFactory for persistence unit 'default'
2024-11-25T11:24:00.808Z  INFO 1 --- [ionShutdownHook] com.zaxxer.hikari.HikariDataSource       : HikariPool-1 - Shutdown initiated...
2024-11-25T11:24:00.810Z  INFO 1 --- [ionShutdownHook] com.zaxxer.hikari.HikariDataSource       : HikariPool-1 - Shutdown completed.

Reproduction Steps

  1. Build the image(see Dockerfile below):
    docker build -t async-profiler-liberica .

  2. Run the container:
    docker run async-profiler-liberica:latest

FROM bellsoft/liberica-runtime-container:jdk-21-stream-musl as builder

WORKDIR /app

RUN apk add libstdc++ git make g++ linux-headers

RUN git clone https://github.com/spring-projects/spring-petclinic.git \
    && cd spring-petclinic \
    && ./gradlew -x test clean build

RUN git clone https://github.com/async-profiler/async-profiler.git \
    && cd async-profiler \
    &&  git checkout b174feb \
    && make \
    && mkdir -p /app/profiler \
    && cp -r build/* /app/profiler


FROM bellsoft/liberica-runtime-container:jre-21-stream-musl
WORKDIR /app
EXPOSE 8082
RUN apk add libstdc++

COPY --from=builder /app/spring-petclinic/build/libs/spring-petclinic-*.jar /app/petclinic.jar
COPY --from=builder /app/profiler /app/profiler

ENTRYPOINT ["java", "-agentpath:/app/profiler/lib/libasyncProfiler.so=start,event=cpu,file=/tmp/profile.html", \
    "-jar", "-XX:MaxRAMPercentage=80.0", "/app/petclinic.jar"]

Additional Information/Context

More info about the issue

When starting the profiler, Profiler::start() calls perfEvents:start(). This then in turn checks if thread hooks are set up properly and then returns:

return lib != NULL && (_pthread_entry = lib->findImport(im_pthread_setspecific)) != NULL;

In order it to return true it checks if Java libjvm.so(in this case) is defined and if import [im_pthread_setspecific](https://github.com/async-profiler/async-profiler/blob/master/src/codeCache.h#L25) in _imports is not null.

In lib->findImport(im_pthread_setspecific), it fails to find pthread_setspecific because it was not added previously in CodeCache::addImport(). I will describe this next.

When the async-profiler gets first started, these calls are made in the following order: VM::init()->Profiler::updateSymbols()->Symbols::parseLibraries()...->ElfParser::parseDynamicSection()->CodeCache::addImport().
Important thing is parseDynamicSection() calls addImport() in these lines. (_cc->addimport(...)):

const char* base = this->base();
if (jmprel != NULL && pltrelsz != 0) {
	// Parse .rela.plt table
	for (size_t offs = 0; offs < pltrelsz; offs += relent) {
		ElfRelocation* r = (ElfRelocation*)(jmprel + offs);
		ElfSymbol* sym = (ElfSymbol*)(symtab + ELF_R_SYM(r->r_info) * syment);
		if (sym->st_name != 0) {
			_cc->addImport((void**)(base + r->r_offset), strtab + sym->st_name);
		}
	}
//Here if the first conditional (jmprel != NULL && pltrelsz != 0) evaluates to true, 
//it skips the .rela.dyn table, since the code below will be unreachable
} else if (rel != NULL && relsz != 0) { 
	// Shared library was built without PLT (-fno-plt)
	// Relocation entries have been moved from .rela.plt to .rela.dyn
	for (size_t offs = relcount * relent; offs < relsz; offs += relent) {
		ElfRelocation* r = (ElfRelocation*)(rel + offs);
		if (ELF_R_TYPE(r->r_info) == R_GLOB_DAT) {
			ElfSymbol* sym = (ElfSymbol*)(symtab + ELF_R_SYM(r->r_info) * syment);
			if (sym->st_name != 0) {
				_cc->addImport((void**)(base + r->r_offset), strtab + sym->st_name);
			}
	   }
	}
}

Specifically, lines 394 and 405, for .rela.ptl and .rela.dyn respectively. This works fine when all entries in the addImport() function is defined in the one relocation table i.e .rela.plt or in .rela.dyn exclusively. But when the both tables has entries, it fails to import entries from the .rela.dyn. See the comment in the code.

Sample gdb session that demos the issue

Here's a gdb session which I tried to demonstrate the issue. In _imports, there's no pthread_setspecific entry, hence setupThreadHook() fails.:

(gdb) b setupThreadHook()
Breakpoint 2 at 0x7fffedcca58a: file src/cpuEngine.cpp, line 58.

(gdb) n
62	   if (!VM::loaded()) {

...


CpuEngine::setupThreadHook () at src/cpuEngine.cpp:77
77	   return lib != NULL && (_pthread_entry = lib->findImport(im_pthread_setspecific)) != NULL;
(gdb) s
CodeCache::findImport (this=0x7fffecf2f250, id=im_pthread_setspecific) at src/codeCache.cpp:207
207	   if (!_imports_patchable) {
(gdb) s
211	   return _imports[id];
(gdb) p _imports
$1 = {0x7ffff7e5b758 <dlopen@got[plt]>, 0x7ffff7e5b4e0 <pthread_create@got[plt]>, 0x0, 0x0, 0x7ffff7e5b4b0 <poll@got[plt]>}
(gdb)

Relocation table of the libjvm.so

The aforementioned relocation tables of the liberica java:

$ cd /usr/lib/jvm/jdk-21.0.5-bellsoft-x86_64/lib/server/

$ readelf libjvm.so --relocs|grep -e "Relocation s" -e 'dlopen' -e 'pthread_setspecific' -e 'pthread_exit' -e 'pthread_create' -e 'poll'
Relocation section '.rela.dyn' at offset 0x5dc8 contains 114095 entries:
00000156df10  00d000000006 R_X86_64_GLOB_DAT 0000000000000000 pthread_setspecific + 0
Relocation section '.rela.plt' at offset 0x2a2630 contains 210 entries:
00000156d4b0  000b00000007 R_X86_64_JUMP_SLO 0000000000000000 poll + 0
00000156d4e0  001300000007 R_X86_64_JUMP_SLO 0000000000000000 pthread_create + 0
00000156d758  007500000007 R_X86_64_JUMP_SLO 0000000000000000 dlopen + 0

We see that pthread_setspefic is on rela.dyn and poll, pthread_create and dlopen is on rela.plt

Async-profiler version

commit b174feb, from branch master

Environment details

  • async-profiler running inside a docker container. See Dockerfile above
  • OS: Alpaquita(Alpine derivative) Linux 6.12.1
  • JDK version: 21
  • C flavor: musl
  • CPU architecture: x86_64
$ java --version
openjdk 21.0.5 2024-10-15 LTS
OpenJDK Runtime Environment (build 21.0.5+11-LTS)
OpenJDK 64-Bit Server VM (build 21.0.5+11-LTS, mixed mode)

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions