New Algorithms in Java
Krystian Zybała
@k_zybala
(Modern) New? GC algorithms
in Java
# whoami
• Java Principal Engineer / Java Performance Engineer
• Specialization : JVM, Apache Kafka, Cassandra
• Hobby : JVM Performance, Reactive Systems
• Workshops, Consultations
Twitter: @k_zybala
Blog: https://kzybala.pl
Mail: kontakt@kzybala.pl
Before we start…
Let's hold our hands
and
wait for the next GC cycle
What is GC?
GC is your friend 🤗
Previous options
Previous GC algorithms
• SerialGC
• ParallelGC
• CMS - not available in current JDK builds
• G1GC
How does GC work?
The Generational hypothesis
Most objects die young: young objects are much more
likely to die than old objects
Heap layout (diagram): the young generation consists of Eden, Survivor 1 and Survivor 2; the old generation is Tenured. New objects are allocated in Eden.
G1
G1
• Low-pause collector - First papers date back to 2004
• Supported since JDK 7u4 (April 2012)
• Replacement for CMS
• Low pauses valued more than max throughput
• Designed to be really easy to tune
• Tuning based on max Stop-The-World pause (-XX:MaxGCPauseMillis=X)
Default: 200ms
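To see which pause goal a running JVM actually uses, the flag can be read at runtime. This is a minimal sketch, assuming a HotSpot-based JVM where `com.sun.management.HotSpotDiagnosticMXBean` is available; the class name `PauseGoal` and the helper `vmOption` are hypothetical names chosen for illustration.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;
import java.util.Optional;

final class PauseGoal {
    /** Returns the current value of a HotSpot flag, or empty if it cannot be read. */
    static Optional<String> vmOption(String name) {
        try {
            HotSpotDiagnosticMXBean bean =
                    ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            return Optional.ofNullable(bean).map(b -> b.getVMOption(name).getValue());
        } catch (RuntimeException e) {
            // Unknown flag or non-HotSpot JVM: report "not available" instead of failing.
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println("MaxGCPauseMillis = "
                + vmOption("MaxGCPauseMillis").orElse("n/a"));
    }
}
```

Started with an explicit -XX:MaxGCPauseMillis=X, the program should report that X; without the flag it reports the JVM's default.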
G1
• Default GC since Java 9
• Uses regions instead of contiguous generations
• Dynamic size of region
• Parallel Full GC since Java 10 (jdk-10-ea+34)
G1
Cycle (diagram, GC thread and application thread lanes): Init Marking -> Concurrent Marking -> Remark -> Clean up (with concurrent liveness summary) -> Copy or Space reclamation
Ergonomic
bool UseG1GC = false {product} {default}
bool UseSerialGC = true {product} {ergonomic}
http://hg.openjdk.java.net/jdk/jdk10/file/b09e56145e11/src/hotspot/share/runtime/os.cpp#l1606
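The flags above show ergonomics picking SerialGC rather than G1. One way to check what a given JVM actually selected is to ask the standard management API. A minimal sketch; the class name `ActiveCollectors` is hypothetical, and the exact collector names returned depend on the JVM and its flags.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

final class ActiveCollectors {
    /** Names of the collectors registered by the running JVM. */
    static List<String> names() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        // Print one collector name per line.
        names().forEach(System.out::println);
    }
}
```

Running it in a small container and on a large host is a quick way to notice when ergonomics chose a different collector than expected.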
Heap regions X-Large
Why do we need something new?
GC | Optimized for | Best for | Pause | Multi-threaded | Number of cores
SerialGC | Memory footprint | Single core, small heaps (micro-service?) | Yes | No | 1
ParallelGC | Throughput | Batch job | Yes | Yes | 2+
G1GC | Throughput / latency balance | request-response, db interactions | Yes | Yes | 2+*
ZGC | Low latency | request-response, db interactions | Yes | Yes | 2+*
ShenandoahGC | Low latency | request-response, db interactions | Yes | Yes | 2+*
How does it work?
How does it work?
• Requires storing metadata in the object
• Requires JIT support: a load GC barrier or a store GC barrier
Mutator problems
Diagram 1: Object A and Object B; the object shown has headers and fields x = 1, y = 2, z = 1.
Diagram 2: the object is copied from from-space to to-space; both copies hold x = 1, y = 2, z = 1 (Objects A, B and C shown).
Diagram 3: two application threads write to different copies; the from-space copy now holds x = 1, y = 4, z = 1 and the to-space copy holds x = 3, y = 2, z = 1.
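The mutator problem from the slides above can be replayed in a few lines. This is a deliberately single-threaded sketch with hypothetical names (`MutatorProblem`, `Obj`, `simulate`); it only models two writers that see different copies of the same object, not a real collector.

```java
final class MutatorProblem {
    /** A plain object with three fields, as on the slides. */
    static final class Obj {
        int x, y, z;
        Obj(int x, int y, int z) { this.x = x; this.y = y; this.z = z; }
        Obj copy() { return new Obj(x, y, z); }
        @Override public String toString() { return "x=" + x + " y=" + y + " z=" + z; }
    }

    /** Returns {fromSpaceCopy, toSpaceCopy} after two uncoordinated writes. */
    static Obj[] simulate() {
        Obj fromSpace = new Obj(1, 2, 1);
        Obj toSpace = fromSpace.copy();   // the collector copies the object
        fromSpace.y = 4;                  // one mutator still holds the old address
        toSpace.x = 3;                    // another mutator already uses the new address
        return new Obj[] { fromSpace, toSpace };
    }

    public static void main(String[] args) {
        Obj[] copies = simulate();
        System.out.println("from-space: " + copies[0]);
        System.out.println("to-space:   " + copies[1]);
    }
}
```

Neither copy ends up with both updates, which is why a concurrent copying collector needs barriers that steer every access to a single copy.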
Shenandoah
Shenandoah
• Available in AdoptOpenJDK builds
• NOT available in Oracle’s OpenJDK builds
• Present in Red Hat OpenJDK 8, 11+
• Low pause GC
• Concurrent Compaction, Single Generation
Shenandoah
• Available for x86 32-bit and 64-bit
• ARM ports available
• Linux, macOS, Windows
• Region Based
Shenandoah
1.0
* Maintains a weak to-space invariant
* Load barrier
* Store barrier
Shenandoah
1.0
Diagram: Objects A, B and C, each laid out as Mark, Class, Field1, Field2.
Shenandoah
1.0
Diagram: Objects A, B and C again, now with copies; objects carry an extra Fwd Ptr (forwarding pointer) word next to Mark, Class, Field1, Field2.
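The extra forwarding word shown for Shenandoah 1.0 can be sketched as a field that points to the object itself until the object is moved. This is a simplified, single-threaded illustration with hypothetical names (`BrooksPointerSketch`, `resolve`, `evacuate`); the real collector installs the pointer atomically and emits the barriers from the JIT.

```java
final class BrooksPointerSketch {
    static final class Obj {
        Obj fwd = this;      // forwarding pointer: points to itself until the object moves
        int field1, field2;
        Obj(int f1, int f2) { field1 = f1; field2 = f2; }
    }

    /** Barrier: every access goes through the forwarding pointer. */
    static Obj resolve(Obj ref) { return ref.fwd; }

    /** Copies the object and redirects the old copy to the new one. */
    static Obj evacuate(Obj from) {
        Obj to = new Obj(from.field1, from.field2);
        from.fwd = to;
        return to;
    }

    static int demo() {
        Obj a = new Obj(1, 2);
        Obj stale = a;                 // reference that still points at the old copy
        evacuate(a);
        resolve(stale).field1 = 42;    // store barrier: the write lands in the new copy
        return resolve(stale).field1;  // load barrier: the read sees the same copy
    }
}
```

The cost is visible in the layout: every object pays one extra word, which is the memory overhead listed under "Problems".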
Shenandoah
Problems
* Needs more memory (due to the forwarding pointer): worst case 50%,
common case 5-10%
* Complicated GC barriers (store and load)
Shenandoah
2.0
* Maintains a strong to-space invariant
* Load barrier (LRB - Load Reference Barrier)
Shenandoah
2.0
Diagram: Objects A, B and C, each laid out as Mark, Class, Field1, Field2.
Shenandoah
2.0
Diagram (From Space / To Space): in the from-space copy the Mark word is replaced by a Fwd Ptr to the to-space copy, so no extra word is needed.
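Storing the forwarding pointer in the mark word, as the Shenandoah 2.0 layout above shows, needs a way to tell a normal mark word from a forwarding pointer. The sketch below assumes the low two bits of the word act as a tag and that the value 0b11 means "forwarded"; that encoding, like the names `MarkWordForwarding` and `loadReferenceBarrier`, is an assumption made for illustration. A real load reference barrier also checks whether the object is in the collection set and may evacuate it.

```java
final class MarkWordForwarding {
    static final long TAG_MASK  = 0b11L;   // assumed: low two bits tag the word
    static final long FORWARDED = 0b11L;   // assumed tag value meaning "forwarded"

    /** Overwrites the mark word with a tagged to-space address (address must be 4-byte aligned). */
    static long encodeForwarding(long toSpaceAddress) { return toSpaceAddress | FORWARDED; }

    static boolean isForwarded(long markWord) { return (markWord & TAG_MASK) == FORWARDED; }

    static long forwardee(long markWord) { return markWord & ~TAG_MASK; }

    /** Simplified load reference barrier: follow the forwarding pointer only if one is installed. */
    static long loadReferenceBarrier(long address, long markWord) {
        return isForwarded(markWord) ? forwardee(markWord) : address;
    }
}
```

With this encoding an object that has not moved costs no extra header space, in contrast to the separate forwarding word of 1.0.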
Shenandoah
Phases (diagram, GC thread and application thread lanes): Init Mark -> Concurrent Mark -> Final Mark -> Concurrent Evacuation -> Init U-R -> Concurrent Update References -> Final U-R
Shenandoah
Cycle (diagram): Concurrent Mark -> Concurrent Evacuation -> Concurrent Update References, repeated
Shenandoah
Marking + Evacuating + Update References
Shenandoah
Evacuation (diagram sequence, from-space / to-space):
1. Object B (Mark, Class, Field1, Field2) sits in from-space next to Object A.
2. An application thread makes its own copy, Object B'.
3. A GC thread makes a competing copy, Object B''.
4. A Fwd Ptr is installed in the from-space Object B in place of its Mark word.
5. Only Object B' remains in to-space; Object B forwards to it.
Shenandoah
Update references (diagram, from-space / to-space): remaining references are pointed at the to-space copy via the Fwd Ptr.
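The race between the application thread copy and the GC thread copy in the evacuation slides above is typically resolved with a compare-and-set on the forwarding slot. A minimal sketch, assuming an `AtomicReference` stands in for that slot; the names `EvacuationRace` and `evacuate` are hypothetical.

```java
import java.util.concurrent.atomic.AtomicReference;

final class EvacuationRace {
    static final class Obj {
        final int field1, field2;
        // Stands in for the forwarding pointer slot of the from-space copy.
        final AtomicReference<Obj> fwd = new AtomicReference<>();
        Obj(int f1, int f2) { field1 = f1; field2 = f2; }
    }

    /** Called by GC and application threads alike; every caller gets the same copy back. */
    static Obj evacuate(Obj from) {
        Obj existing = from.fwd.get();
        if (existing != null) return existing;           // already forwarded
        Obj copy = new Obj(from.field1, from.field2);    // speculative private copy
        // Only one thread installs its copy; a loser drops its own and uses the winner's.
        return from.fwd.compareAndSet(null, copy) ? copy : from.fwd.get();
    }
}
```

Whichever thread loses the compare-and-set simply abandons its copy, which matches the slide where only one of B' and B'' survives.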
Tuning
-Xms -Xmx
Shenandoah Tuning
• -XX:+AlwaysPreTouch
• -XX:+UseLargePages
• -XX:+UseNUMA (not supported yet)
• -XX:-UseBiasedLocking
• -XX:+DisableExplicitGC
ZGC
ZGC
• Sub-millisecond max pause times (less than 1 ms; previously less than 10 ms)
• Pauses are O(1)
• Pause times do not increase with the heap, live-set or root-set size
• Handles heaps ranging from 8 MB to 16 TB in size
• ZGC was initially introduced as an experimental feature in JDK 11, and was
declared Production Ready in JDK 15.
ZGC
• Only for Linux x86 64-bit at first (macOS and Windows since JDK 14)
• compressed class pointers
• No support for compressed OOPs
• Region based
• ZPages
• Small ( 2MiB - object size up to 256KiB)
• Medium ( 32MiB - object size up to 4MiB)
• Large ( 4+ MiB - object size > 4MiB)
ZGC
• Concurrent
• Region-based
• Compacting
• NUMA-aware
• Using colored pointers
• Using load barriers
ZGC
Uses colored pointers. Color indicates GC metadata
64-bit pointer layout: 16 unused bits | 4 color bits | 44 object address bits
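The pointer layout above translates directly into bit masks. A minimal sketch, assuming the four color bits sit directly above the 44 address bits and are named Marked0, Marked1, Remapped and Finalizable, in that order; the class `ColoredPointer` and its helpers are hypothetical.

```java
final class ColoredPointer {
    static final int  ADDRESS_BITS = 44;                       // from the slide
    static final long ADDRESS_MASK = (1L << ADDRESS_BITS) - 1;

    // Four color bits directly above the address (assumed order).
    static final long MARKED0     = 1L << ADDRESS_BITS;
    static final long MARKED1     = 1L << (ADDRESS_BITS + 1);
    static final long REMAPPED    = 1L << (ADDRESS_BITS + 2);
    static final long FINALIZABLE = 1L << (ADDRESS_BITS + 3);
    static final long COLOR_MASK  = MARKED0 | MARKED1 | REMAPPED | FINALIZABLE;

    /** Object address with all metadata bits stripped. */
    static long address(long ptr) { return ptr & ADDRESS_MASK; }

    /** Only the metadata (color) bits of the pointer. */
    static long color(long ptr) { return ptr & COLOR_MASK; }

    /** Same address, different color. */
    static long withColor(long ptr, long color) { return address(ptr) | color; }

    /** A pointer is "bad" if any bit of the current bad mask is set. */
    static boolean isBad(long ptr, long badMask) { return (ptr & badMask) != 0; }
}
```

Because the metadata lives in the pointer, checking an object's GC state needs no memory access to the object itself.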
ZGC Multi-Mapping
Diagram: three virtual memory views, 001<addr>, 010<addr> and 100<addr>, map to the same physical memory.
ZGC
ZGC uses a load GC barrier
String n = person.name;
<load_barrier>
// String n = person.name;
mov 0x10(%rax), %rbx
// Bad color?
test %rbx, 0x16(%r15)
// If yes, enter slow path
jnz slow_path
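The three instructions above (load, test against a mask, conditional jump) can be mirrored in plain Java. This is a sketch under stated assumptions: the heap is simulated by a `long[]`, the color bit positions follow the 44-bit layout from the earlier slide, and the slow path only recolors the pointer and writes it back, whereas a real slow path may also mark or relocate the object. All names (`LoadBarrierSketch`, `goodColor`, `badMask`, `load`) are hypothetical.

```java
final class LoadBarrierSketch {
    static final long ADDRESS_MASK = (1L << 44) - 1;
    static final long MARKED0 = 1L << 44, MARKED1 = 1L << 45, REMAPPED = 1L << 46;

    // The "good" color changes with the GC phase; the other colors are "bad".
    static long goodColor = REMAPPED;

    static long badMask() { return (MARKED0 | MARKED1 | REMAPPED) & ~goodColor; }

    /** Loads slot i of a simulated object and heals the slot if its color is bad. */
    static long load(long[] fields, int i) {
        long ref = fields[i];                             // mov: load the reference
        if ((ref & badMask()) == 0) return ref;           // test + jnz: fast path
        long healed = (ref & ADDRESS_MASK) | goodColor;   // slow path (simplified)
        fields[i] = healed;                               // heal: the next load takes the fast path
        return healed;
    }
}
```

Writing the corrected pointer back means each stale reference pays for the slow path at most once per phase.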
ZGC
Phases (diagram, GC thread and application thread lanes). Pauses: Mark Start, Mark End, Relocate. Concurrent phases: Concurrent Mark -> reference processing, unload classes, free memory pages, prepare relocation -> Concurrent Relocate -> Concurrent Remap
ZGC
Cycle (diagram): Concurrent Mark -> Concurrent reference processing, unload classes, free memory pages, prepare relocation -> Concurrent Relocate -> Remap -> Concurrent Mark -> …
ZGC
Relocation (diagram sequence): the forwarding table maps old to new addresses and grows as objects are copied.
1. Forwarding table: A -> A1, B -> B1, C -> C1
2. An application thread copies D: D -> D1 is added
3. A GC thread copies objects at the same time
4. E -> E1 is added while both application and GC threads copy
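The forwarding table from the relocation slides can be approximated with a concurrent map in which the first thread to register a new address wins. A minimal sketch with hypothetical names (`ForwardingTableSketch`, `relocate`, `remap`); addresses are plain `long` values and the actual object copy is left out.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

final class ForwardingTableSketch {
    private final ConcurrentHashMap<Long, Long> table = new ConcurrentHashMap<>();
    private final AtomicLong nextFree;   // bump pointer in the target page (hypothetical)

    ForwardingTableSketch(long targetPageStart) {
        nextFree = new AtomicLong(targetPageStart);
    }

    /** Relocates the object at oldAddr, or returns the address another thread already chose. */
    long relocate(long oldAddr, long size) {
        Long known = table.get(oldAddr);
        if (known != null) return known;
        long candidate = nextFree.getAndAdd(size);            // the copy would go here
        Long winner = table.putIfAbsent(oldAddr, candidate);  // first insertion wins
        return winner != null ? winner : candidate;           // a loser's space is wasted in this sketch
    }

    /** Remapping: translate a stale address; unknown addresses are returned unchanged. */
    long remap(long addr) {
        return table.getOrDefault(addr, addr);
    }
}
```

Application threads and GC threads can both call `relocate` for the same object; whoever inserts first defines the new address, and later remapping reads the same table.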
ZGC
Remapping
ZGC
Future changes
* Generational
* Sub-millisecond max pause time
* Graal JIT support
Tuning
-Xmx<size>
ZGC Tuning
• -XX:ZAllocationSpikeTolerance=factor
• -XX:ZCollectionInterval=seconds
• -XX:ZFragmentationLimit=percent
• -XX:ZUncommitDelay=seconds
Logging
-Xlog:gc (basic)
-Xlog:gc* (details)
Java 19
Project Loom
JEP 376
ZGC Concurrent Thread-Stack Processing
Concurrent Thread-Stack Processing
▪Remove thread-stack processing from ZGC safepoints.
▪Make stack processing lazy, cooperative, concurrent, and incremental.
▪Remove all other per-thread root processing from ZGC safepoints.
▪Provide a mechanism by which other HotSpot subsystems can lazily process stacks.
Conclusions
Conclusions
* Still JDK8 ? - Shenandoah
* JDK 11+ ? - ZGC or Shenandoah
* Nothing is free - you need more resources to achieve low pauses
Is G1 dead?
JEP-423 Region Pinning for G1
JEP-423
• No stalling of threads due to JNI critical regions.
• No additional latency to start a garbage collection due to JNI critical
regions.
• No regressions in GC pause times when no JNI critical regions are
active.
• Minimal regressions in GC pause times when JNI critical regions are
active.
Over 30
major performance improvements
Q&A
Thank you!
