JAVA
PERFORMANCE
PUZZLERS
DOUGLAS Q. HAWKINS
LEAD JIT DEVELOPER
AZUL SYSTEMS
@dougqh
dougqh@gmail.com
AGENDA
LOOK AT SOME INTERESTING
OFTEN UNINTUITIVE PERFORMANCE CASES
SEE WHAT WE CAN LEARN FROM THEM
A Adding 1..1395 Numbers
B Adding 1..1396 Numbers
C Adding 1..1397 Numbers
[Chart: time in ns (log scale, 1 to 10000) vs. iterations (0 to 450). Phases: Interpreter, 1st JIT, 2nd JIT, Deoptimize & Repeat…]
01A
JMH (JAVA MICROBENCHMARK HARNESS):
@State(Scope.Benchmark)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
public class MyBenchmark {
    @Setup
    public void setup() {}

    @Benchmark
    public void benchmark1() { … }

    @Benchmark
    public void benchmark2() { … }
    …
}
01A
RESULTS
add1395  avgt  10     3.790 ±   0.300  ns/op
add1396  avgt  10     3.784 ±   0.336  ns/op
add1397  avgt  10  2767.567 ± 351.909  ns/op
PERFORMANCE IS
FULL OF SURPRISES.
PERFORMANCE CLIFF:
HUGE METHOD LIMIT
add1395 7993 bytes
add1396 7999 bytes
add1397 8005 bytes
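HotSpot does not JIT-compile methods whose bytecode exceeds 8000 bytes by default (the HugeMethodLimit, which can be lifted with -XX:-DontCompileHugeMethods), so add1397 stays in the interpreter. The sizes above can be reproduced with a rough estimate. This is a sketch only: the class and method names are hypothetical, and it assumes javac compiles each `x += i` to `iinc` (3 bytes) for constants up to 127 and to `wide iinc` (6 bytes) above that.

```java
public class HugeMethodEstimate {
    // HotSpot's default HugeMethodLimit, in bytes of bytecode.
    static final int HUGE_METHOD_LIMIT = 8000;

    // Estimated bytecode size of: int x = 0; x += 1; ... x += n; return x;
    // Assumption: iinc is 3 bytes for constants <= 127, wide iinc is 6 bytes above that.
    static int estimatedBytecodeSize(int n) {
        int size = 2; // iconst_0 + istore_1
        for (int i = 1; i <= n; i++) {
            size += (i <= 127) ? 3 : 6;
        }
        return size + 2; // iload_1 + ireturn
    }

    public static void main(String[] args) {
        for (int n = 1395; n <= 1397; n++) {
            int size = estimatedBytecodeSize(n);
            System.out.printf("add%d: %d bytes, over limit: %b%n",
                    n, size, size > HUGE_METHOD_LIMIT);
        }
    }
}
```

Under these assumptions the estimate matches the sizes on the slide, and only add1397 crosses the limit.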
01B
A
int x = 0;
for ( int i = 1; i <= 1395; ++i ) {
    x += i;
}
return x;
B
int x = 0;
for ( int i = 1; i <= 1396; ++i ) {
    x += i;
}
return x;
C
int x = 0;
for ( int i = 1; i <= 1397; ++i ) {
    x += i;
}
return x;
01B
add1395  avgt  10  435.372 ± 19.225  ns/op
add1396  avgt  10  436.955 ± 18.688  ns/op
add1397  avgt  10  434.245 ± 11.635  ns/op
NOW EQUALLY BAD
02A
A System.arraycopy
B
int[] dest = new int[src.length];
for ( int i = 0; i < src.length; ++i ) {
    dest[i] = src[i];
}
C array.clone
02A
RESULTS (100k elements)
arrayCopy   avgt  10  95598.200  ns/op
cloneArray  avgt  10  94848.814  ns/op
manual      avgt  10  93309.838  ns/op
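For reference, the three candidates can be written side by side as below. This is only a sketch of the code under test, not the benchmark itself; the class and method names are hypothetical.

```java
import java.util.Arrays;

public class ArrayCopies {
    // A: System.arraycopy into a freshly allocated array
    static int[] viaArraycopy(int[] src) {
        int[] dest = new int[src.length];
        System.arraycopy(src, 0, dest, 0, src.length);
        return dest;
    }

    // B: manual element-by-element loop
    static int[] viaManualLoop(int[] src) {
        int[] dest = new int[src.length];
        for (int i = 0; i < src.length; ++i) {
            dest[i] = src[i];
        }
        return dest;
    }

    // C: clone
    static int[] viaClone(int[] src) {
        return src.clone();
    }

    public static void main(String[] args) {
        int[] src = {3, 1, 4, 1, 5, 9, 2, 6};
        System.out.println(Arrays.equals(viaArraycopy(src), viaManualLoop(src))
                && Arrays.equals(viaManualLoop(src), viaClone(src)));
    }
}
```

All three produce equal copies; the results above suggest that, once warmed up, they also cost about the same.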
02B
int[] original = randomInts(100_000);

int[] copy1 = new int[original.length];
long startTime1 = System.nanoTime();
for ( int i = 0; i < original.length; ++i ) {
    copy1[i] = original[i];
}
System.out.printf("copy loop: % 10d ns%n",
    System.nanoTime() - startTime1);

VS.

int[] copy2 = new int[original.length];
long startTime2 = System.nanoTime();
System.arraycopy(
    original, 0,
    copy2, 0, original.length);
System.out.printf("arraycopy: % 10d ns%n",
    System.nanoTime() - startTime2);
02B
RESULTS
copy loop: 1646957 ns
arraycopy: 54827 ns
PERFORMANCE IS
FULL OF INTRICACIES.
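The 02B numbers come from timing each copy exactly once, so the hand-written loop is most likely measured while it still runs in the interpreter. A minimal sketch of repeating the measurement over several rounds is shown below; the class and method names are hypothetical, and this is still not a substitute for a harness such as JMH.

```java
public class WarmupTiming {
    // Times `task` once per round. Early rounds typically include
    // interpreter execution and JIT compilation; later rounds usually do not.
    static long[] timeRounds(Runnable task, int rounds) {
        long[] nanos = new long[rounds];
        for (int r = 0; r < rounds; r++) {
            long start = System.nanoTime();
            task.run();
            nanos[r] = System.nanoTime() - start;
        }
        return nanos;
    }

    public static void main(String[] args) {
        int[] original = new int[100_000];
        for (int i = 0; i < original.length; i++) {
            original[i] = i;
        }
        int[] copy = new int[original.length];

        long[] loop = timeRounds(() -> {
            for (int i = 0; i < original.length; ++i) {
                copy[i] = original[i];
            }
        }, 20);
        long[] arraycopy = timeRounds(
                () -> System.arraycopy(original, 0, copy, 0, original.length), 20);

        for (int r = 0; r < loop.length; r++) {
            System.out.printf("round %2d  copy loop: %10d ns  arraycopy: %10d ns%n",
                    r, loop[r], arraycopy[r]);
        }
    }
}
```

Printing every round, rather than a single number, makes the warm-up effect visible instead of hiding it in one measurement.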
O(N) != O(N)
03
A Integer.valueOf(x)
B new Integer(x)
C x (autoboxed)
03
javap -c
public java.lang.Integer valueOf();
  Code:
     0: aload_0
     1: getfield offset:I
     4: aload_0
     5: getfield nums:[I
     8: arraylength
     9: if_icmplt 17
    12: aload_0
    13: iconst_0
    14: putfield offset:I
    17: aload_0
    18: getfield nums:[I
    21: aload_0
    22: dup
    23: getfield offset:I
    26: dup_x1
    27: iconst_1
    28: iadd
    29: putfield offset:I
    32: iaload
    33: invokestatic Integer.valueOf(I)
    36: areturn

public java.lang.Integer auto();
  Code:
     0: aload_0
     1: getfield offset:I
     4: aload_0
     5: getfield nums:[I
     8: arraylength
     9: if_icmplt 17
    12: aload_0
    13: iconst_0
    14: putfield offset:I
    17: aload_0
    18: getfield nums:[I
    21: aload_0
    22: dup
    23: getfield offset:I
    26: dup_x1
    27: iconst_1
    28: iadd
    29: putfield offset:I
    32: iaload
    33: invokestatic Integer.valueOf(I)
    36: areturn
public static Integer valueOf(int i) {
    if (i >= IntegerCache.low && i <= IntegerCache.high)
        return IntegerCache.cache[i + (-IntegerCache.low)];
    return new Integer(i);
}

static final int low = -128;
// configurable through
// -Djava.lang.Integer.IntegerCache.high
static final int high = 127;
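The effect of the cache is easy to observe with an identity comparison. The sketch below uses hypothetical names; only values in -128..127 are guaranteed to be cached, and what happens above that depends on the JVM and on the IntegerCache.high setting.

```java
public class IntegerCacheDemo {
    // True when two boxings of the same value yield the same object.
    // The identity comparison (==) is deliberate here.
    static boolean sameBoxedInstance(int value) {
        Integer a = Integer.valueOf(value);
        Integer b = Integer.valueOf(value);
        return a == b;
    }

    public static void main(String[] args) {
        System.out.println("127 cached:  " + sameBoxedInstance(127));
        // Outside the guaranteed range; the answer depends on the cache size.
        System.out.println("1000 cached: " + sameBoxedInstance(1000));
    }
}
```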
03
PARAMETERIZING JMH
@Param({"100", "1000", "10000"})
private int range;

@Setup
public void setUp() {
    ThreadLocalRandom random = ThreadLocalRandom.current();
    nums = new int[1_000_000];
    for ( int i = 0; i < nums.length; ++i ) {
        nums[i] = random.nextInt(-range, range);
    }
}
03
RESULTS
         (range)
auto         100  avgt  10  5.132 ± 0.316  ns/op
auto        1000  avgt  10  8.184 ± 1.551  ns/op
auto       10000  avgt  10  6.996 ± 1.401  ns/op
new_         100  avgt  10  6.328 ± 0.973  ns/op
new_        1000  avgt  10  6.083 ± 0.651  ns/op
new_       10000  avgt  10  6.243 ± 1.031  ns/op
valueOf      100  avgt  10  5.096 ± 0.116  ns/op
valueOf     1000  avgt  10  8.488 ± 1.957  ns/op
valueOf    10000  avgt  10  7.155 ± 1.382  ns/op
03
RESULTS (ns/op)
          -100 to 100     -1,000 to 1,000   -10,000 to 10,000
new       6.328 ± 0.973   6.083 ± 0.651     6.243 ± 1.031
autobox   5.132 ± 0.316   8.184 ± 1.551     6.996 ± 1.401
valueOf   5.096 ± 0.116   8.488 ± 1.957     7.155 ± 1.382
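One way to read this table is to ask how many of the random inputs fall inside the Integer cache for each range. The sketch below (hypothetical names, default cache bounds assumed) computes that fraction for values drawn uniformly from [-range, range), which is what random.nextInt(-range, range) produces.

```java
public class CacheHitFraction {
    static final int CACHE_LOW = -128;
    static final int CACHE_HIGH = 127; // default; assumes IntegerCache.high is not overridden

    // Fraction of the integers in [-range, range) that lie inside the cache bounds.
    static double cachedFraction(int range) {
        int lo = Math.max(-range, CACHE_LOW);
        int hi = Math.min(range - 1, CACHE_HIGH);
        int cached = Math.max(0, hi - lo + 1);
        return cached / (2.0 * range);
    }

    public static void main(String[] args) {
        for (int range : new int[] {100, 1000, 10000}) {
            System.out.printf("range %5d: %.2f%% of values are cached%n",
                    range, 100 * cachedFraction(range));
        }
    }
}
```

With range 100 every value is a cache hit. With range 1000 only 256 of 2000 values are, so valueOf pays for both the range check and the allocation most of the time, which is consistent with it falling behind `new` there.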
EVERYTHING MATTERS:
HARDWARE INCLUDED
[Chart: ints (array) vs. ArrayList vs. LinkedList for sizes 100, 1000, 10000, 100000, 1000000; y-axis 0 to 8000000]
http://cr.openjdk.java.net/~shade/scratch/ArrayVsLinked.java
LINKED LIST
ARRAY LIST
ARRAY
O(N) != O(N)
04
A list.toArray()
B list.toArray(new Object[0])
C list.toArray(new Object[list.size()])
D list.toArray(new String[0])
E list.toArray(new String[list.size()])
https://shipilev.net/blog/2016/arrays-wisdom-ancients/
SAYS…
04
RESULTS
toArray               avgt  10   54.084 ± 10.000  ns/op
toArraySized          avgt  10   58.555 ±  0.745  ns/op
toArrayUnsized        avgt  10   54.025 ±  0.343  ns/op
toStringArraySize     avgt  10  154.291 ±  2.060  ns/op
toStringArrayUnsized  avgt  10  135.603 ±  2.115  ns/op
STRING[] SLOWER THAN OBJECT[]?
Object[] objects = new Integer[20];
objects[0] = "foo";
Possible Runtime Check
Sometimes JIT Eliminates It
https://shipilev.net/blog/2016/arrays-wisdom-ancients/
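The runtime check exists because Java arrays are covariant: the two lines above compile, but the store fails at run time. A small sketch with hypothetical names:

```java
public class ArrayStoreCheckDemo {
    // Returns true if storing `value` into `array` is rejected by the
    // runtime array store check.
    static boolean storeFails(Object[] array, Object value) {
        try {
            array[0] = value;
            return false;
        } catch (ArrayStoreException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(storeFails(new Integer[20], "foo")); // Integer[] cannot hold a String
        System.out.println(storeFails(new Object[20], "foo"));  // Object[] accepts anything
    }
}
```

Every store into a reference array needs this check unless the JIT can prove it always succeeds.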
TRICKY WHEN ARRAY IS PASSED IN
list.toArray(new String[…]);
OBJECT[] IS COMMONLY USED,
SO SPECIAL CASE.
toArray avgt 10 54.084 ± 10.000 ns/op
toArraySized avgt 10 58.555 ± 0.745 ns/op
toArrayUnsized avgt 10 54.025 ± 0.343 ns/op
toStringArraySize avgt 10 154.291 ± 2.060 ns/op
toStringArrayUnsized avgt 10 135.603 ± 2.115 ns/op
IS WRONG.
https://shipilev.net/blog/2016/arrays-wisdom-ancients/ 04
WHY IS NO ARRAY / UNSIZED FASTER?
// allocate
dest = malloc(sizeof(E) * len);
// zero-initialize
// (dead stores: every element is overwritten by the copy below)
for ( int i = 0; i < len; ++i ) {
    dest[i] = null;
}
// copy
for ( int i = 0; i < len; ++i ) {
    dest[i] = src[i];
}
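In application code the two typed variants look like this. The sketch uses hypothetical names and only shows that both forms return the same contents; which one is faster depends on the JVM, as the results above illustrate.

```java
import java.util.Arrays;
import java.util.List;

public class ToArrayVariants {
    // D: pass an empty array and let toArray allocate one of the right size
    static String[] unsized(List<String> list) {
        return list.toArray(new String[0]);
    }

    // E: allocate the right size up front and pass it in
    static String[] sized(List<String> list) {
        return list.toArray(new String[list.size()]);
    }

    public static void main(String[] args) {
        List<String> list = Arrays.asList("a", "b", "c");
        System.out.println(Arrays.equals(unsized(list), sized(list)));
    }
}
```

When toArray allocates the array itself, the allocation and the copy sit next to each other, which gives the JIT a chance to skip the zero-initialization.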
05A
A Integer.toString(NUM)
B "" + NUM
C
StringBuilder builder = new StringBuilder();
builder.append(NUM);
builder.toString();
05A
static final int NUM = ThreadLocalRandom.current().nextInt();
builder   avgt  10  47.688 ± 3.866  ns/op
concat    avgt  10  33.118 ± 0.840  ns/op
toString  avgt  10  41.105 ± 2.005  ns/op
05A
javap -c
public java.lang.String concat();
  Code:
     0: new StringBuilder
     3: dup
     4: invokespecial StringBuilder."<init>":()V
     7: ldc ""
     9: invokevirtual StringBuilder.append(LString;)
    12: getstatic NUM:I
    15: invokevirtual StringBuilder.append(I)
    18: invokevirtual StringBuilder.toString()
    21: areturn
05B
IN JAVA...
public java.lang.String concat() {
    return new StringBuilder().
        append("").
        append(NUM).
        toString();
}
05B
return new StringBuilder().
    append("").
    append(NUM).
    toString();

VS.

StringBuilder builder =
    new StringBuilder();
builder.append(NUM);
return builder.toString();
05B
RESULTS
builder avgt 10 41.122 ± 1.688 ns/op
concat avgt 10 35.173 ± 3.092 ns/op
concatLikeBuilder avgt 10 32.536 ± 2.302 ns/op
COMPILERS ARE
GLORIFIED REGEX-ES.
static final int NUM = 1 << 20
builder avgt 10 40.734 ± 0.997 ns/op
concat avgt 10 3.343 ± 0.205 ns/op
toString avgt 10 32.877 ± 0.450 ns/op
05C
javap -c
public java.lang.String concat();
  Code:
     0: ldc "1048576"
     2: areturn

public java.lang.String concat() {
    return "1048576";
}
05C
CONSTANT FOLDING & PROPAGATION
static final int NUM = 1 << 20
    -> constant fold ->
static final int NUM = 1_048_576

"" + NUM
    -> constant propagate ->
"" + 1_048_576
    -> constant fold ->
"1048576"
05C
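The folding can also be observed from Java without javap. Because `"" + NUM` is a compile-time constant expression when NUM is a constant variable, the result is the same interned object as the literal "1048576". A sketch with hypothetical names:

```java
public class ConstantFoldingDemo {
    static final int NUM = 1 << 20; // compile-time constant

    static String concat() {
        return "" + NUM; // folded by javac into the literal "1048576"
    }

    // Identity comparison on purpose: true only if concat() returns the
    // interned literal rather than a string built at run time.
    static boolean foldedToLiteral() {
        return concat() == "1048576";
    }

    // Same expression with a run-time value: built at run time.
    static String concatAtRuntime(int n) {
        return "" + n;
    }

    public static void main(String[] args) {
        System.out.println("constant NUM folded: " + foldedToLiteral());
        System.out.println("run-time value equal by content: "
                + concatAtRuntime(NUM).equals("1048576"));
    }
}
```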
PERFORMANCE
COMPOSES
UNINTUITIVELY.
06
A
String str = "";
for ( int i = 0; i < 100; ++i ) {
    str += i;
}
B
StringBuilder builder = new StringBuilder();
for ( int i = 0; i < 100; ++i ) {
    builder.append(i);
}
06
RESULTS
builder  avgt  10   979.112 ±  59.893  ns/op
concat   avgt  10  2661.941 ± 135.251  ns/op
FAST IN ONE CONTEXT
CAN BE SLOW IN ANOTHER.
06
RESULTS
              speed                        objects allocated*   memory consumed*
concat        2661.941 ± 135.251 ns/op     400 objects          71800 bytes
builder        979.112 ±  59.893 ns/op       9 objects           1640 bytes
sizedBuilder   887.717 ±  62.569 ns/op       4 objects           1464 bytes
* Memory usage measured separately with Caliper
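The slides do not show the sizedBuilder source. A plausible sketch is below, assuming the builder is given the exact final length: the numbers 0 to 99 contain 10 one-digit and 90 two-digit values, so 10 + 90 * 2 = 190 characters. Class and method names are hypothetical.

```java
public class SizedBuilderDemo {
    // A: repeated String concatenation; each += builds a new String
    static String concatLoop() {
        String str = "";
        for (int i = 0; i < 100; ++i) {
            str += i;
        }
        return str;
    }

    // Pre-sized builder. The capacity of 190 is an assumption, not taken
    // from the slides: it is the exact length of "0123...9899".
    static String sizedBuilder() {
        StringBuilder builder = new StringBuilder(190);
        for (int i = 0; i < 100; ++i) {
            builder.append(i);
        }
        return builder.toString();
    }

    public static void main(String[] args) {
        System.out.println(sizedBuilder().length());
        System.out.println(sizedBuilder().equals(concatLoop()));
    }
}
```

With the capacity known up front the builder never has to grow its internal array, which fits the lower object count in the table.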
APPLIES TO COLLECTIONS, TOO.
HashSet 16
HashMap 16
Hashtable 16
LinkedList 1
ArrayList 10
Vector 10
StringBuilder 16
StringBuffer 16
http://www.slideshare.net/cnbailey/memory-efficient-java
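The same pre-sizing idea carries over to collections when the element count is known in advance. The sketch below uses hypothetical names; the HashMap capacity formula assumes the default load factor of 0.75.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PresizedCollections {
    // Initial capacity that lets a HashMap hold `expectedEntries` entries
    // without resizing, assuming the default load factor of 0.75.
    static int hashCapacityFor(int expectedEntries) {
        return (int) (expectedEntries / 0.75f) + 1;
    }

    static List<Integer> presizedList(int n) {
        List<Integer> list = new ArrayList<>(n); // no growth while filling
        for (int i = 0; i < n; i++) {
            list.add(i);
        }
        return list;
    }

    static Map<Integer, Integer> presizedMap(int n) {
        Map<Integer, Integer> map = new HashMap<>(hashCapacityFor(n));
        for (int i = 0; i < n; i++) {
            map.put(i, i * i);
        }
        return map;
    }

    public static void main(String[] args) {
        System.out.println(presizedList(1000).size());
        System.out.println(presizedMap(1000).size());
    }
}
```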
O(N)?
07A
A obj.invoke with 1 type (monomorphic)
B obj.invoke with 2 types (bimorphic)
C obj.invoke with 3 types (trimorphic) (megamorphic)
THERE IS A CLIFF
AT 3 TYPES.
https://shipilev.net/blog/2015/black-magic-method-dispatch/
func.apply(x);

<= 2 types:
if ( func.getClass() == Square.class ) {
    x * x;
} else if ( func.getClass() == Cube.class ) {
    x * x * x;
} else {
    …
}

> 2 types:
func.apply(x);
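Written out by hand, the guarded form looks roughly like the sketch below. It is a source-level analogue of what the JIT may generate from its type profile, not actual compiler output; the Func interface and the fallback branch are assumptions made for illustration.

```java
public class GuardedDispatch {
    // Hypothetical interface standing in for the type of `func` on the slides.
    interface Func {
        int apply(int x);
    }

    static final class Square implements Func {
        public int apply(int x) { return x * x; }
    }

    static final class Cube implements Func {
        public int apply(int x) { return x * x * x; }
    }

    // Type checks guard inlined bodies for the two profiled receiver types;
    // anything else falls back to an ordinary virtual call.
    static int applyGuarded(Func func, int x) {
        if (func.getClass() == Square.class) {
            return x * x;
        } else if (func.getClass() == Cube.class) {
            return x * x * x;
        } else {
            return func.apply(x);
        }
    }

    public static void main(String[] args) {
        System.out.println(applyGuarded(new Square(), 3));
        System.out.println(applyGuarded(new Cube(), 3));
        System.out.println(applyGuarded(x -> x + 1, 3));
    }
}
```

The guards are cheap and the inlined bodies can be optimized together with the caller, which is why the one- and two-type cases are fast.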
@Setup
public void setup() {
    for ( int i = 0; i < 20_000; ++i ) {
        if ( morphism >= 1 ) func = new Square();
        call();
        if ( morphism >= 2 ) func = new Cube();
        call();
        if ( morphism >= 3 ) …
        call();
        if ( morphism >= 4 ) …
        call();
    }
    // regardless of morphism --
    // use Square in the end
    func = new Square();
}

@Benchmark
public int call() {
    int x = nums[index];
    index =
        (index + 1) % nums.length;
    return func.apply(x);
}
07A
RESULTS
      (morphism)
call           1  avgt  10  8.120 ± 0.103  ns/op
call           2  avgt  10  8.225 ± 0.113  ns/op
call           3  avgt  10  8.170 ± 0.329  ns/op
call           4  avgt  10  8.189 ± 0.241  ns/op
NO CLIFF?
07B
@Benchmark
public int callLoop() {
    int sum = 0;
    for ( int x: nums ) {
        sum += call(x);
    }
    return sum;
}

public int call(int x) {
    return func.apply(x);
}
07B
RESULTS
          (morphism)
callLoop           1  avgt  10   4079.187 ±   269.537  ns/op
callLoop           2  avgt  10   6090.224 ±   209.573  ns/op
callLoop           3  avgt  10  20508.673 ± 18484.645  ns/op
callLoop           4  avgt  10  20271.124 ± 17767.914  ns/op
PERFORMANCE
ISN’T ADDITIVE.
FAST + FAST = SLOW
PERFORMANCE IS FULL OF SURPRISES.
UNINTUITIVE
INTRICACIES
SENSITIVITIES
NON-OBVIOUS CLIFFS
NOT ADDITIVE
PERFORMANCE IS FULL OF SURPRISES.
DON’T WORRY *TOO* MUCH.
JUST WRITE CLEAN CODE.
ONLY WORRY ABOUT THE HOTTEST CODE,
IMPROVE AND MEASURE *CAREFULLY*.
REMEMBER BEST WAY TO ADD 1…1396
int x = 0;
x += 1; x += 2; x += 3; x += 4; x += 5; x += 6; x += 7; x += 8; x += 9; x += 10;
x += 11; x += 12; x += 13; x += 14; x += 15; x += 16; x += 17; x += 18; x += 19; x += 20;
x += 21; x += 22; x += 23; x += 24; x += 25; x += 26; x += 27; x += 28; x += 29; x += 30;
x += 31; x += 32; x += 33; x += 34; x += 35; x += 36; x += 37; x += 38; x += 39; x += 40;
x += 41; x += 42; x += 43; x += 44; x += 45; x += 46; x += 47; x += 48; x += 49; x += 50;
x += 51; x += 52; x += 53; x += 54; x += 55; x += 56; x += 57; x += 58; x += 59; x += 60;
x += 61; x += 62; x += 63; x += 64; x += 65; x += 66; x += 67; x += 68; x += 69; x += 70;
x += 71; x += 72; x += 73; x += 74; x += 75; x += 76; x += 77; x += 78; x += 79; x += 80;
x += 81; x += 82; x += 83; x += 84; x += 85; x += 86; x += 87; x += 88; x += 89; x += 90;
x += 91; x += 92; x += 93; x += 94; x += 95; x += 96; x += 97; x += 98; x += 99; x += 100;
x += 101; x += 102; x += 103; x += 104; x += 105; x += 106; x += 107; x += 108; x += 109; x += 110;
x += 111; x += 112; x += 113; x += 114; x += 115; x += 116; x += 117; x += 118; x += 119; x += 120;
x += 121; x += 122; x += 123; x += 124; x += 125; x += 126; x += 127; x += 128; x += 129; x += 130;
x += 131; x += 132; x += 133; x += 134; x += 135; x += 136; x += 137; x += 138; x += 139; x += 140;
x += 141; x += 142; x += 143; x += 144; x += 145; x += 146; x += 147; x += 148; x += 149; x += 150;
x += 151; x += 152; x += 153; x += 154; x += 155; x += 156; x += 157; x += 158; x += 159; x += 160;
x += 161; x += 162; x += 163; x += 164; x += 165; x += 166; x += 167; x += 168; x += 169; x += 170;
x += 171; x += 172; x += 173; x += 174; x += 175; x += 176; x += 177; x += 178; x += 179; x += 180;
x += 181; x += 182; x += 183; x += 184; x += 185; x += 186; x += 187; x += 188; x += 189; x += 190;
x += 191; x += 192; x += 193; x += 194; x += 195; x += 196; x += 197; x += 198; x += 199; x += 200;
x += 201; x += 202; x += 203; x += 204; x += 205; x += 206; x += 207; x += 208; x += 209; x += 210;
01C
A
int n = 1395;
return n * (n + 1) / 2;
B
int n = 1396;
return n * (n + 1) / 2;
C
int n = 1397;
return n * (n + 1) / 2;

2.515 ± 0.043 ns/op
2.532 ± 0.089 ns/op
2.580 ± 0.042 ns/op
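The closed form gives the same answers as the loop for all three inputs, which a short check confirms. Names are hypothetical; for these values of n the product n * (n + 1) stays well inside the int range.

```java
public class GaussSum {
    // Straightforward loop: 1 + 2 + ... + n
    static int loopSum(int n) {
        int x = 0;
        for (int i = 1; i <= n; ++i) {
            x += i;
        }
        return x;
    }

    // Closed form: n * (n + 1) / 2
    static int closedForm(int n) {
        return n * (n + 1) / 2;
    }

    public static void main(String[] args) {
        for (int n = 1395; n <= 1397; n++) {
            System.out.printf("n=%d loop=%d closed=%d%n",
                    n, loopSum(n), closedForm(n));
        }
    }
}
```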
REFERENCES
ALEKSEY SHIPILËV
http://shipilev.net/
PSYCHOMATIC LOBOTOMY SAW
http://psy-lob-saw.blogspot.com/
MECHANICAL SYMPATHY
http://mechanical-sympathy.blogspot.com/
JAVA SPECIALIST NEWSLETTER
http://www.javaspecialists.eu/
