Shared Memory Performance: Beyond TCP/IP with Ben Cotton, JPMorgan
Advanced Java Data Locality and
Data IPC Transport Solutions:
An Introduction to OpenHFT
ben.cotton@jpmorgan.com
ben.cotton@alumni.rutgers.edu
Jan 27, 2015
For real-time Java deployments
(with the strictest SLAs) the
problem of JVM “Stop the World”
GC activity (on medium-lived on-
Heap objects) is a MONSTROUS
problem.
2
Simple Remedy?
Design Java developments and deployments so that medium-lived
Collections (e.g. that “old dog” HashMap) object instance(s) are taught a
“new trick” …. That “new trick” is simple: take HashMap completely Off-Heap.
[Cartoon: "The Heap" / "Good Boy! Off-Heap you go …"]
3
But what kind of HashMap are we putting Off-Heap?
•  java.util.HashMap ?
•  Collections.synchronizedMap( java.util.HashMap ); ?
•  java.util.concurrent.ConcurrentHashMap ?
•  something entirely different ?
ANSWER: something very different (indeed)!
•  OpenHFT’s Chronicle Map SOLUTION
•  net.openhft.chronicle.map.ChronicleMap
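To make "Off-Heap" concrete, here is a minimal sketch that uses only the JDK's direct ByteBuffer. It is not ChronicleMap and does not claim to show how OpenHFT implements it; the class name OffHeapDoubleTable and its fixed-slot layout are assumptions made for illustration. The point is only that the payload lives in native memory, so the garbage collector has nothing to trace or move apart from one small handle object.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical illustration (not an OpenHFT class): a fixed-capacity table of
// doubles whose storage lives outside the Java heap.
final class OffHeapDoubleTable {
    private final ByteBuffer slots;   // small on-heap handle; the payload is native memory
    private final int capacity;

    OffHeapDoubleTable(int capacity) {
        this.capacity = capacity;
        // allocateDirect reserves zero-filled native (off-heap) memory.
        this.slots = ByteBuffer.allocateDirect(capacity * Double.BYTES)
                               .order(ByteOrder.nativeOrder());
    }

    void put(int index, double value) {
        slots.putDouble(check(index) * Double.BYTES, value);
    }

    double get(int index) {
        return slots.getDouble(check(index) * Double.BYTES);
    }

    boolean isOffHeap() {
        return slots.isDirect();
    }

    private int check(int index) {
        if (index < 0 || index >= capacity) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return index;
    }
}
```

A real off-heap map also needs hashing, variable-size entries and concurrency control, none of which this sketch attempts.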
4
What exactly is OpenHFT?
•  100% Open Source
•  Designed to empower Higher Frequency Trading (HFT)
•  https://github.com/OpenHFT (developer source repo)
•  http://www.openhft.net (Products. Services. Training)
•  Provides modules that empower ultra low latency Java deployments to achieve
REAL-TIME compliance (with even their strictest of SLAs)
•  Java-Lang (Marshalling / GC Free De-Marshalling / Thread-SAFE / IPC-SAFE /
Off-Heap/ 64-bit ByteBuffers)
•  Chronicle-Queue (persisted low-latency Queue messaging and Logging)
•  Chronicle-Map (ChronicleMap, Thread-SAFE/IPC-SAFE/Off-Heap)
•  Chronicle-Engine Fast Data Framework.
•  Java-Runtime-Compiler (builds OpenHFT native impl classes – in process –
of user supplied JBI interfaces)
•  Java-Thread-Affinity (allows JVM Threads to be pinned by affinity to specific
OS CPUs)
•  TransFIX (ultra low latency FIX engine)
5
What really is OpenHFT?
OpenHFT is a 100% OSS solution that empowers Java developers to deliver
the highest performing and most flexible
•  Data Locality (Optimised memory layout)
And
•  Data IPC Transport (waaay faster than UDP/
TCP)
Capabilities.
6
PART 1
OpenHFT as an Advanced Java Data Locality Provider
(That’s right folks! We’re going Off-Heap)
7
java.util.HashMap
8
Collections.synchronizedMap( java.util.HashMap );
9
10
11
Java Heap Layout: Through the Generations View
12
OpenHFT: Off-Heap ChronicleMap … an Architectural View
13
OpenHFT : Step #1 (PID 1)
14
OpenHFT : Step #2 (PID 1) – thread safe write to off heap
15
OpenHFT : Step #3 (PID 1)
16
OpenHFT : Step #4 (PID 2) – thread safe read (concurrent)
17
OpenHFT : Step #5 (PID 2) update an existing entry (thread safe)
18
OpenHFT : Step #6 (PID 2)
19
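The step slides survive here only as titles, so the following is a rough sketch of the same sequence (write from one side, read and update from the other) using a plain JDK memory-mapped file. It is not the ChronicleMap API. The class name, the file name, and the one-double layout are all hypothetical, and both mappings are opened inside one JVM purely to keep the example runnable; in the deck's scenario they would belong to two separate processes (PID 1 and PID 2). The sketch also uses plain, unsynchronized accesses, so it does not show the thread-safety the slides mention.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

final class SharedMappingSketch {
    static final int COUPON_OFFSET = 0;   // hypothetical layout: one double at byte 0
    static final int REGION_BYTES = 64;

    // Map the first REGION_BYTES of the file; each call returns an independent view.
    static MappedByteBuffer map(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // A mapping remains valid after its channel is closed.
            return ch.map(FileChannel.MapMode.READ_WRITE, 0, REGION_BYTES);
        }
    }

    static double demo() {
        try {
            // Prefer /dev/shm (RAM-backed on Linux); fall back to the temp directory.
            Path shm = Paths.get("/dev/shm");
            Path dir = Files.isDirectory(shm) && Files.isWritable(shm)
                    ? shm : Paths.get(System.getProperty("java.io.tmpdir"));
            Path file = Files.createTempFile(dir, "shared-mapping-sketch", ".dat");
            try {
                MappedByteBuffer pid1 = map(file);   // stands in for PID 1
                MappedByteBuffer pid2 = map(file);   // stands in for PID 2
                pid1.putDouble(COUPON_OFFSET, 4.0);            // "PID 1" writes
                double seen = pid2.getDouble(COUPON_OFFSET);   // "PID 2" reads the same bytes
                pid2.putDouble(COUPON_OFFSET, seen + 1.0);     // "PID 2" updates the entry
                return pid1.getDouble(COUPON_OFFSET);          // "PID 1" observes the update
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```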
OpenHFT : Step #6 (PID 2)
OpenHFT code sample demo Summary:
-  MT-SAFE operations
-  IPC-SAFE operations
-  IPC-ATOMIC operations
-  ZERO-COPY (*)
-  GC-Free Marshalling/De-Marshalling
-  Ambition to provide the symmetry of
java.util.concurrent.* API support across native Linux
processes (not just Threads!)
-  Nanosecond transport latency (stay tuned for details)
* (note on Map<K,V> type domain support status).
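One way to picture an atomic operation on off-heap memory is a compare-and-set on a long stored in a direct buffer. The sketch below uses the JDK's VarHandle buffer views (Java 9 or later). It is an analogy under stated assumptions, not OpenHFT's implementation: the class name OffHeapCounter is hypothetical, and the demo runs threads inside one JVM. The same access pattern applied to a MappedByteBuffer over a /dev/shm file is what would extend it across processes.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical illustration: a lock-free counter stored in off-heap memory.
final class OffHeapCounter {
    // Treats 8-byte slots of a ByteBuffer as longs and exposes atomic access modes.
    private static final VarHandle LONG_VIEW =
            MethodHandles.byteBufferViewVarHandle(long[].class, ByteOrder.nativeOrder());

    private final ByteBuffer memory;

    OffHeapCounter(ByteBuffer directMemory) {
        this.memory = directMemory;
    }

    // Retry loop: re-read and re-attempt whenever another thread won the race.
    long increment() {
        while (true) {
            long current = (long) LONG_VIEW.getVolatile(memory, 0);
            if (LONG_VIEW.compareAndSet(memory, 0, current, current + 1)) {
                return current + 1;
            }
        }
    }

    long get() {
        return (long) LONG_VIEW.getVolatile(memory, 0);
    }

    // Runs `threads` workers that each increment `perThread` times; returns the final count.
    static long demo(int threads, int perThread) {
        OffHeapCounter counter = new OffHeapCounter(ByteBuffer.allocateDirect(Long.BYTES));
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int n = 0; n < perThread; n++) {
                    counter.increment();
                }
            });
            workers[i].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter.get();
    }
}
```

No increment is lost, because a compare-and-set only succeeds when the slot still holds the value the caller read.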
20
Performance Results: CHM vs. SHM
On Linux 14.04, dual E5-2650 v2 @ 2.60GHz, 128 GB memory, each entry updated 32 times.
ConcurrentHashMap -Xmx110g -Xms110g -verbose:gc
21
Performance Results: CHM vs. SHM
On Linux 14.04, dual E5-2650 v2 @ 2.60GHz, 128 GB memory, each entry updated 32 times.
22
OpenHFT as an Off-Heap JCACHE Provider (e.g. RedHat JDG Infinispan)
23
OpenHFT empowers developers to use OpenJDK and Native Linux OS to
100% protect their medium-lived Java Collections (ChronicleMap) from being
impacted by STW GC pauses.
24
PART 2
OpenHFT as an Advanced Java IPC Transport Provider
(That’s right folks! UDP/TCP now joined by
native Linux /dev/shm IPC)
25
OSI Model of Networking Layers:
26
Java 7 Sockets Direct Protocol: Delivering SDP/IB as a Transport
27
Java 7 Sockets Direct Protocol: Delivering to Java its first RDMA capability
28
Intel iWARP: potential to empower Java 9 with SDP/10gE as a Transport
With SDP/10gE Java 9 will be able to deliver RDMA to the Java Ethernet
masses!
29
OpenHFT as a /dev/shm IPC Transport Provider:
peter.lawrey@higherfrequencytrading.com
“I want to be disruptive rather than rehash or just slightly improve existing products.”
08/14/2014
Ah, the glory!
30
OpenHFT as a /dev/shm IPC Transport Provider:
ZERO COPY capability
Advanced BytesMarshallable impl of Externalizable – for CBV Copy tolerant
parts of Liquidity Risk AE.
ZERO GC
CAPACITY LIMITED ONLY BY PHYSICAL RAM
CONSISTENTLY 438% Faster than On JVM Heap Cache<K,V> like operands
JPM tests show a mean /dev/shm latency of 350 nanoseconds (ZC
Entry<K,V> transport, RDR_DIM Mock)
DOES NOT SUPPORT FULLY-TRANSITIVE GENERIC V=Object Graph as
Cache<K,V> Operand. Currently {String, primitive} for ZC. NO immediate
Plug-N-Play w/ RDR_DIM Operands used by Liquidity Risk AE
NOT YET ADAPTED w/in RedHat JDG as JSR-107 compliant Cache<K,V>
Operand.
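The zero-copy restriction to {String, primitive} values is easier to see with a flyweight: an accessor object that reads fields straight out of the shared bytes instead of deserializing a copy. The sketch below is a guess at the general pattern, not OpenHFT code. BondFlyweight, its two fields, and the byte layout are hypothetical; only the getter name getCoupon is borrowed from the test output quoted in this deck.

```java
import java.nio.ByteBuffer;

// Hypothetical fixed layout for one entry: [ coupon: double | quantity: long ]
final class BondFlyweight {
    static final int COUPON_OFFSET = 0;
    static final int QUANTITY_OFFSET = COUPON_OFFSET + Double.BYTES;
    static final int ENTRY_BYTES = QUANTITY_OFFSET + Long.BYTES;

    private ByteBuffer memory;
    private int base;

    // Re-point this one object at another entry; nothing is copied or allocated.
    BondFlyweight wrap(ByteBuffer memory, int entryIndex) {
        this.memory = memory;
        this.base = entryIndex * ENTRY_BYTES;
        return this;
    }

    // Each accessor is a direct read or write of the underlying bytes.
    double getCoupon() {
        return memory.getDouble(base + COUPON_OFFSET);
    }

    void setCoupon(double coupon) {
        memory.putDouble(base + COUPON_OFFSET, coupon);
    }

    long getQuantity() {
        return memory.getLong(base + QUANTITY_OFFSET);
    }

    void setQuantity(long quantity) {
        memory.putLong(base + QUANTITY_OFFSET, quantity);
    }
}
```

Fixed-width primitives map cleanly onto offsets like these. An arbitrary object graph has no fixed layout, which is one plausible reading of why the slide lists it as unsupported for zero-copy use.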
31
PROOF IS IN THE TEST RESULTS: What do we get using OpenHFT vs.
RedHat JDG?
OpenHFT /dev/shm/SharedHashMap<K,V> as operand provider:
To try, run the following commands in 2 separate terminals (after rm /dev/shm/*; the Left
player must be started first!):
java org.junit.runner.JUnitCore net.openhft.collections.fromdocs.com.jpmorgan.pingpong_latency.PingPongPlayerLeft
java org.junit.runner.JUnitCore net.openhft.collections.fromdocs.com.jpmorgan.pingpong_latency.PingPongPlayerRight
32424: 1 x _bondEntryV.getCoupon() (last _couponL=[5.00 %]) in 37.0 nanos
32425: 1 x _bondEntryV.getCoupon() (last _couponL=[5.00 %]) in 37.5 nanos
32423: 1 x _bondEntryV.getCoupon() (last _couponR=[4.00 %]) in 37.0 nanos
32424: 1 x _bondEntryV.getCoupon() (last _couponR=[4.00 %]) in 31.0 nanos
Full results at
https://github.com/Cotton-Ben/HugeCollections/tree/master/collections/src/test/java/net/openhft/collections/fromdocs/com/jpmorgan/pingpong_latency
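For readers without the linked repository at hand, here is a rough in-JVM analogue of a ping-pong latency test: two threads bounce a flag back and forth through off-heap memory and the mean round-trip time is reported. This is not the source of PingPongPlayerLeft or PingPongPlayerRight. The class name, the flag protocol, and the use of Thread.yield() while waiting (chosen for portability, at the cost of latency) are assumptions, and the number it prints depends entirely on the machine.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical illustration: two threads pass a "ball" through one off-heap int.
final class PingPongSketch {
    private static final VarHandle INT_VIEW =
            MethodHandles.byteBufferViewVarHandle(int[].class, ByteOrder.nativeOrder());
    private static final int LEFT = 0;    // ball is on the left player's side
    private static final int RIGHT = 1;   // ball is on the right player's side

    // Wait until the ball is on side `from`, then atomically move it to side `to`.
    private static void hit(ByteBuffer ball, int from, int to) {
        while (!INT_VIEW.compareAndSet(ball, 0, from, to)) {
            Thread.yield();
        }
    }

    // Returns the mean nanoseconds per round trip (includes thread start-up cost).
    static double run(int rounds) {
        ByteBuffer ball = ByteBuffer.allocateDirect(Integer.BYTES);   // zeroed, so starts LEFT
        Thread right = new Thread(() -> {
            for (int i = 0; i < rounds; i++) {
                hit(ball, RIGHT, LEFT);
            }
        });
        long start = System.nanoTime();
        right.start();
        for (int i = 0; i < rounds; i++) {
            hit(ball, LEFT, RIGHT);
        }
        try {
            right.join();   // the right player's last return completes the final round trip
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return (double) (System.nanoTime() - start) / rounds;
    }

    public static void main(String[] args) {
        System.out.printf("mean round trip: %.1f nanos%n", run(100_000));
    }
}
```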
32
OpenHFT /dev/shm/SharedHashMap<K,V> as operand provider:
33
PROOF IS IN THE TEST RESULTS: What do we get using OpenHFT vs.
RedHat JDG?
RedHat JDG and JCACHE<K,V> as operand provider:
To try with a distributed cache, run the following commands in separate terminals:
java -cp "target/classes:target/dependency/*" org.infinispan.quickstart.clusteredcache.Node -d LEFT
java -cp "target/classes:target/dependency/*" org.infinispan.quickstart.clusteredcache.Node -d RIGHT
counter=[217924] cache.put('369604103',3.000%); took 92,599 nanos
counter=[217925] cache.put('369604103',6.000%); took 90,062 nanos
counter=[42529] fl=[5%] = cache.get('369604103'); took 52,624 nanos
counter=[42530] fl=[6%] = cache.get('369604103'); took 47,981 nanos
(Full results at
https://github.com/Cotton-Ben/infinispan-quickstart/tree/master/clustered-cache)
34
RedHat JDG and JCACHE<K,V> as operand provider (1,000x slower)
35
Bottom Line = Tests by Real-Time Liquidity Risk Technology AggEng team
empirically demonstrate that OpenHFT off-heap over /dev/shm IPC transport is
1,000x faster than RedHat JDG on-heap over UDP OSI-Loopback IPC transport.
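As a quick sanity check on the order of magnitude, the eight sample log lines quoted earlier in this deck can be averaged and compared. The sketch below uses only those eight numbers, not the full result sets, and it mixes JDG put and get timings against OpenHFT getCoupon() reads, so it is a rough comparison rather than a like-for-like benchmark. The class name SpeedupCheck is hypothetical.

```java
// Hypothetical helper: ratio of mean latencies from the sample log lines in this deck.
final class SpeedupCheck {
    // OpenHFT /dev/shm samples (getCoupon() reads), in nanoseconds.
    static final double[] OPENHFT_NANOS = {37.0, 37.5, 37.0, 31.0};
    // RedHat JDG samples (two puts, two gets), in nanoseconds.
    static final double[] JDG_NANOS = {92_599, 90_062, 52_624, 47_981};

    static double mean(double[] samples) {
        double sum = 0.0;
        for (double s : samples) {
            sum += s;
        }
        return sum / samples.length;
    }

    static double speedup() {
        return mean(JDG_NANOS) / mean(OPENHFT_NANOS);
    }

    public static void main(String[] args) {
        System.out.printf("mean JDG / mean OpenHFT = %.0fx%n", speedup());
    }
}
```

On these samples the means are 70,816.5 and 35.625 nanoseconds, a ratio of a little under 2,000, which is consistent with the slide's round figure of three orders of magnitude.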
IMMEDIATE NEXT STEPS:
NO DOUBT ABOUT IT = We need the OpenHFT off heap capability made
available to us via the RedHat JDG product and its JCACHE API!
Explicit commits from Mircea re: adapting Peter’s OpenHFT SHM as RedHat JDG
interoperable JSR-107 Cache<K,V>. RedHat customer support case?
Explicit commits from Bela re: “short circuiting” all node ↔ node transport
resolution to use /dev/shm IPC as transport (instead of TCP/UDP) whenever
possible … RedHat customer support case?
Explicit commits from Peter re: supporting above with OpenHFT as the Off-Heap
provider. JPM retain OpenHFT via support subscription?
Continued commits/time planning re: Ben, Dmitry, Xiao efforts to maintain the
Fork’d repo and build sound/complete/confirming tests.
36
THE END
Note: For all things re OpenHFT
Please contact:
Peter.Lawrey@higherfrequencytrading.com
www.openhft.net
37
