Unity's Evolving Best Practices
This slide deck was presented at Unite Berlin 2018.
This offline version includes numerous additional slides, cut
from the original presentation for brevity and/or time.
These extra slides contain more examples and data, but are
not essential for understanding the presentation.
Optimization & Best Practices:
Through The Ages
Ian Dundore
Unity Technologies
This guy again?
Spoiler Alert
• Scripting Performance
• Transforms
• Audio
• Animations
First:
An important message.
Even me. Especially me.
Profile everything.
Remember this?
oops.
• In the specific case of String.Equals, that advice is wrong!
• From a performance perspective, at least.
• For all other string comparisons, it’s right!
• Compare, StartsWith, EndsWith, IndexOf, etc.
• Again, from a performance perspective.
• (Psst! This is documented!)
https://docs.microsoft.com/en-us/dotnet/standard/base-types/best-practices-strings#common-string-comparison-methods-in-net
Let’s test it.
Testing Considerations
• How does the code path differ with different inputs?
• What is the environment around the executing code?
• Runtime
• IL2CPP/Mono? .NET version?
• Hardware
• Pipeline depth, cache size, cache-line length
• # of cores, core affinity settings on threads, throttling
• What exactly is your test measuring?
Your Test Harness Matters!
Test A (more loop overhead):

    Profiler.BeginSample("Test A");
    for (int i=0; i<NUM_TESTS; ++i) {
        DoAThing(i);
    }
    Profiler.EndSample();

Test B (less loop overhead):

    int i = 0;
    Profiler.BeginSample("Test B");
    DoAThing(0);
    while (i<NUM_TESTS) {
        DoAThing(++i);
        DoAThing(++i);
        DoAThing(++i);
        // … repeat a lot …
        DoAThing(++i);
    }
    Profiler.EndSample();
    public bool Equals(String value) {
        if (this == null)
            throw new NullReferenceException();
        if (value == null)
            return false;
        if (Object.ReferenceEquals(this, value))
            return true;
        if (this.Length != value.Length)
            return false;
        return EqualsHelper(this, value);
    }
Mono’s String.cs (1)
What does EqualsHelper do?
• Uses unsafe code to pin strings to memory addresses.
• C-style integer comparison of raw bytes of the strings.
• Core is a special cache-friendly loop.
• 64-bit: Step through strings with a stride of 12 characters (24 bytes).
    while (length >= 12)
    {
        if (*(long*)a != *(long*)b) return false;
        if (*(long*)(a+4) != *(long*)(b+4)) return false;
        if (*(long*)(a+8) != *(long*)(b+8)) return false;
        a += 12; b += 12; length -= 12;
    }
    public bool Equals(String value, StringComparison comparisonType) {
        if (comparisonType < StringComparison.CurrentCulture ||
            comparisonType > StringComparison.OrdinalIgnoreCase)
            throw new ArgumentException(…);
        Contract.EndContractBlock();
        if ((Object)this == (Object)value) {
            return true;
        }
        if ((Object)value == null) {
            return false;
        }
Mono’s String.cs (2)
        switch (comparisonType) {
            case StringComparison.CurrentCulture:
                return (CultureInfo.CurrentCulture.CompareInfo.Compare(this,
                    value, CompareOptions.None) == 0);
            case StringComparison.CurrentCultureIgnoreCase:
                return (CultureInfo.CurrentCulture.CompareInfo.Compare(this,
                    value, CompareOptions.IgnoreCase) == 0);
            case StringComparison.InvariantCulture:
                return (CultureInfo.InvariantCulture.CompareInfo.Compare(this,
                    value, CompareOptions.None) == 0);
            case StringComparison.InvariantCultureIgnoreCase:
                return (CultureInfo.InvariantCulture.CompareInfo.Compare(this,
                    value, CompareOptions.IgnoreCase) == 0);
Mono’s String.cs (3)
            case StringComparison.Ordinal:
                if (this.Length != value.Length)
                    return false;
                return EqualsHelper(this, value);
Mono’s String.cs (4)
But wait!
• For non-matching strings, length will often differ.
• But for length-invariant strings, first character usually differs.
• This optimization is found in CompareOrdinal, but not Equals.
    public static int CompareOrdinal(String strA, String strB) {
        if ((Object)strA == (Object)strB)
            return 0;
        if (strA == null)
            return -1;
        if (strB == null)
            return 1;
        // Most common case, first character is different.
        if ((strA.m_firstChar - strB.m_firstChar) != 0)
            return strA.m_firstChar - strB.m_firstChar;
        return CompareOrdinalHelper(strA, strB);
    }
This is getting silly.
    public static int CompareOrdinal(String strA, int indexA,
                                     String strB, int indexB, int length) {
        if (strA == null || strB == null) {
            if ((Object)strA==(Object)strB) { // they're both null
                return 0;
            }
            return (strA==null)? -1 : 1; // -1 if A is null, 1 if B is null.
        }
        return nativeCompareOrdinalEx(strA, indexA, strB, indexB, length);
    }
An overload that goes almost directly to native code!
Test Design: 4 cases
• Case 1: Two identical strings.
• Case 2: Two strings of random characters of same length.
• Case 3: Two strings of random characters of same length.
• First characters identical, to bypass check in Compare.
• Case 4: Two strings of random characters, different lengths.
• Comparison’s worst case is bounded by the shorter string.
• Constrained range to 15-25 characters to be similar to above tests.
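The four cases above can be sketched as a small timing harness. This is a minimal sketch, not the harness behind the numbers in these slides: the names StringCompareBench, RandomString, TimeMs and Run are hypothetical, it uses Stopwatch rather than the Unity Profiler, and it assumes the "identical" case used two distinct string instances with the same content.

```csharp
using System;
using System.Diagnostics;
using System.Text;

public static class StringCompareBench
{
    const int NumTests = 100000;
    static readonly Random Rng = new Random(12345); // fixed seed: repeatable inputs

    // Builds a string of random lowercase letters with the given length.
    public static string RandomString(int length)
    {
        var sb = new StringBuilder(length);
        for (int i = 0; i < length; ++i)
            sb.Append((char)('a' + Rng.Next(26)));
        return sb.ToString();
    }

    // Runs one comparison NumTests times and returns the elapsed milliseconds.
    // The delegate call adds overhead of its own to every method timed.
    public static double TimeMs(Func<string, string, bool> compare, string a, string b)
    {
        bool sink = false;
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < NumTests; ++i)
            sink ^= compare(a, b);
        sw.Stop();
        GC.KeepAlive(sink); // keep the accumulated result in use
        return sw.Elapsed.TotalMilliseconds;
    }

    public static void Run()
    {
        string s = RandomString(20);
        var cases = new[]
        {
            // Case 1: identical content in two distinct instances, so the
            // reference-equality shortcut does not end the comparison early.
            new[] { s, new string(s.ToCharArray()) },
            // Case 2: random content, same length.
            new[] { RandomString(20), RandomString(20) },
            // Case 3: random content, same length, first character identical.
            new[] { "a" + RandomString(19), "a" + RandomString(19) },
            // Case 4: random content, random lengths in the 15-25 range.
            new[] { RandomString(Rng.Next(15, 26)), RandomString(Rng.Next(15, 26)) },
        };

        foreach (var c in cases)
        {
            Console.WriteLine(
                "Equals: {0:F2} ms, Equals(Ordinal): {1:F2} ms, CompareOrdinal: {2:F2} ms",
                TimeMs((x, y) => x.Equals(y), c[0], c[1]),
                TimeMs((x, y) => x.Equals(y, StringComparison.Ordinal), c[0], c[1]),
                TimeMs((x, y) => string.CompareOrdinal(x, y) == 0, c[0], c[1]));
        }
    }
}
```

Calling StringCompareBench.Run() prints one line per case. Inside Unity, Debug.Log would replace Console.WriteLine, and the absolute numbers will differ by runtime and hardware, as the testing considerations earlier suggest.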
Mono 3.5
                                  Identical Content  Random Content     First Char Equal   Random Content
                                  Identical Length   Identical Length   Identical Length   Random Length
String.Equals                     2.97               1.75               1.73               1.30
String.Equals with Ordinal type   5.87               3.46               3.56               3.39
String.Compare                    37.52              33.29              64.66              31.35
String.Compare with Ordinal type  6.23               3.35               3.35               3.26
CompareOrdinal                    5.68               3.10               3.18               2.99
CompareOrdinal with Indices       5.53               3.30               3.42               3.95
Simple Hand-Coded                 5.46               1.75               2.18               1.40
100,000 comparisons. Timings in milliseconds.
Unity 2018.1.0f2, Windows Standalone, Mono 3.5, Core i7-3500K
Mono 4.6
                                  Identical Content  Random Content     First Char Equal   Random Content
                                  Identical Length   Identical Length   Identical Length   Random Length
String.Equals                     3.23               1.80               1.82               1.21
String.Equals with Ordinal type   3.84               2.13               2.03               1.38
String.Compare                    34.72              28.70              63.03              29.74
String.Compare with Ordinal type  5.16               1.75               2.68               1.65
CompareOrdinal                    4.93               1.55               2.21               1.40
CompareOrdinal with Indices       4.77               3.59               3.59               4.41
Simple Hand-Coded                 4.40               1.66               1.95               1.28
100,000 comparisons. Timings in milliseconds.
Unity 2018.1.0f2, Windows Standalone, Mono 4.6, Core i7-3500K
IL2CPP
                                  Identical Content  Random Content     First Char Equal   Random Content
                                  Identical Length   Identical Length   Identical Length   Random Length
String.Equals                     2.61               1.26               1.27               0.95
String.Equals with Ordinal type   5.38               3.80               3.84               3.66
String.Compare                    39.12              29.32              60.56              28.01
String.Compare with Ordinal type  4.84               3.58               3.62               3.52
CompareOrdinal                    4.78               3.55               3.58               3.51
CompareOrdinal with Indices       4.93               3.71               3.72               4.17
Simple Hand-Coded                 13.83              3.52               3.93               2.16
100,000 comparisons. Timings in milliseconds.
Unity 2018.1.0f2, Windows Standalone, IL2CPP 3.5, Core i7-6700K
IL2CPP
                                  Identical Content  Random Content     First Char Equal   Random Content
                                  Identical Length   Identical Length   Identical Length   Random Length
String.Equals                     2.64               1.92               1.93               0.96
String.Equals with Ordinal type   2.94               2.26               2.73               1.49
String.Compare                    40.98              30.61              60.82              29.26
String.Compare with Ordinal type  3.18               1.46               2.29               1.32
CompareOrdinal                    2.99               1.18               2.06               1.12
CompareOrdinal with Indices       5.56               3.93               4.08               4.41
Simple Hand-Coded                 14.14              3.78               4.14               2.35
100,000 comparisons. Timings in milliseconds.
Unity 2018.1.0f2, Windows Standalone, IL2CPP 4.6, Core i7-6700K
Raw Data
String.Equals/Random on Mono 3.5 = 1
Conclusions & more questions
• String.Equals clearly wins for plain string comparison.
• .NET 4.6 has improvements for String.Compare variants.
• Ordinal comparisons clearly win on culture-sensitive APIs.
• Use String.CompareOrdinal instead of String.Compare.
• Use StringComparison.Ordinal on other String APIs.
• How does this map across platforms?
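In code, these conclusions might look like the following sketch. The class OrdinalExamples and its method names are hypothetical and chosen only for illustration. The static string.Equals(a, b) is used here as the null-safe form of the plain ordinal equality check.

```csharp
using System;

public static class OrdinalExamples
{
    // Plain equality: String.Equals without a StringComparison argument.
    public static bool SameTag(string a, string b)
    {
        return string.Equals(a, b);
    }

    // Ordering: CompareOrdinal instead of the culture-sensitive String.Compare.
    public static int OrderByName(string a, string b)
    {
        return string.CompareOrdinal(a, b);
    }

    // Other string APIs: pass StringComparison.Ordinal explicitly.
    public static bool IsTexturePath(string path)
    {
        return path.StartsWith("Assets/", StringComparison.Ordinal)
            && path.EndsWith(".png", StringComparison.Ordinal);
    }

    public static int FindSeparator(string path)
    {
        return path.IndexOf("/", StringComparison.Ordinal);
    }
}
```

Ordinal comparison ignores culture rules, so it suits identifiers, paths and keys rather than text that is sorted or matched for display to users.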
IL2CPP
                                  Identical Content  Random Content     First Char Equal   Random Content
                                  Identical Length   Identical Length   Identical Length   Random Length
String.Equals                     13.48              5.08               5.01               5.26
String.Equals with Ordinal type   25.42              19.46              19.85              14.16
String.Compare                    118.80             128.69             254.30             124.81
String.Compare with Ordinal type  24.23              11.49              11.57              10.95
CompareOrdinal                    23.92              11.09              11.54              10.75
CompareOrdinal with Indices       23.79              14.76              18.62              15.05
Simple Hand-Coded                 58.02              12.04              21.86              8.13
100,000 comparisons. Timings in milliseconds.
Unity 2018.1.0f2, iOS, IL2CPP 3.5, iPad Mini 3
String.Equals/Random on given platform = 1
Very similar results, in this case.
Another tip!
• See a lot of time going to NullCheck in IL2CPP builds?
• Disable these checks in release builds!
• Works on types, methods & properties.
• Code is in Il2CppSetOptionAttribute.cs, under Unity install folder
    [Il2CppSetOption(Option.NullChecks, false)]
    public bool MyEquals(String strA, String strB) {
        // …
    }
IL2CPP
                    Identical Content  Random Content
                    Identical Length   Random Length
Normal              58.02              8.13
NullCheck Disabled  53.02              7.03
100,000 comparisons. Timings in milliseconds.
Unity 2018.1.0f2, iOS, IL2CPP 3.5, iPad Mini 3
Small, but helpful.
Transforms
* no, not this kind of transform
5.3: Discrete Objects
Hierarchy order: A, B, C, D
Memory order: A, D, C, B
OnTransformChanged
• Internal message, broadcast each time a Transform changes
• Position, rotation, scale, parent, sibling order, etc.
• Tells other components to update their internal state
• PhysX/Box2D, UnityUI, Renderers (AABBs), etc.
• Repeated messages can cause performance problems
• Use Transform.SetPositionAndRotation (5.6+)
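A minimal sketch of the last bullet, assuming a hypothetical Mover component whose target values are set in the Inspector. It replaces two separate Transform assignments with one call.

```csharp
using UnityEngine;

public class Mover : MonoBehaviour
{
    public Vector3 targetPosition;     // hypothetical target, set in the Inspector
    public Vector3 targetEulerAngles;  // hypothetical target rotation, in degrees

    void Update()
    {
        Quaternion targetRotation = Quaternion.Euler(targetEulerAngles);

        // Two separate assignments change the Transform twice:
        // transform.position = targetPosition;
        // transform.rotation = targetRotation;

        // One call applies both values as a single Transform change.
        transform.SetPositionAndRotation(targetPosition, targetRotation);
    }
}
```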
5.4+: Contiguous buffers
Hierarchy order: A, B, C, D
Memory order: A, B, C, D (TransformHierarchy structure)
Enter the Dispatch
• TransformChangeDispatch was first introduced in 5.4
• Other systems migrated to use it, slowly.
• Renderers in 5.6
• Animators in 2017.1
• Physics in 2017.2
• RectTransforms in 2017.3
• OnTransformChanged was removed entirely in 2018.1
How Transforms are structured
• 1 TransformHierarchy structure represents a root Transform
• Contains buffers tracking data of all transforms in hierarchy
• TRS, indices for parents & siblings
• Interest bitmask & dirty bitmask
• Internal systems register interest & track state via specific bits
• Physics is one bit, renderer is another bit, etc.
• System walks affected parts of TransformHierarchy structure
• dirtyMask |= -1 & interestMask
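The interest and dirty bitmasks can be pictured with a small sketch. Everything here is hypothetical: the SystemBits values, the TransformEntry class and its methods are illustrative names only, since Unity's internal layout is not shown in these slides.

```csharp
using System;

// Hypothetical system bits, one per interested system.
[Flags]
public enum SystemBits : uint
{
    None = 0,
    Physics = 1u << 0,
    Renderer = 1u << 1,
    Animator = 1u << 2,
}

public sealed class TransformEntry
{
    public SystemBits InterestMask; // systems that registered interest in this transform
    public SystemBits DirtyMask;    // systems that have not yet consumed the latest change

    // Mirrors "dirtyMask |= -1 & interestMask": set every bit, filtered by interest.
    public void MarkChanged()
    {
        DirtyMask |= (SystemBits)0xFFFFFFFFu & InterestMask;
    }

    // A system checks its own bit, then clears it once the change is handled.
    public bool ConsumeChange(SystemBits system)
    {
        bool wasDirty = (DirtyMask & system) != 0;
        DirtyMask &= ~system;
        return wasDirty;
    }
}
```

With this shape, a change on a transform that only the renderer cares about never sets the physics bit, so the physics system has nothing to process for it.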
When are async changes applied?
• TCD keeps a list of dirty TransformHierarchy pointers
• Systems request list of changed Transforms before running.
• e.g. Before FixedUpdate, before rendering, before animating.
• Use list to update internal system state.
• TCD iterates over list of dirty TransformHierarchies.
• Iterates over all Transforms to check each dirty bit.
Quick Reminder
• Buffer size: Transform.hierarchyCapacity
• Set before mass reparenting operations!
• Reparent & reposition during instantiate!
• GameObject.Instantiate( prefab, parent );
• GameObject.Instantiate( prefab, position, rotation, parent );
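Both reminders can be combined in one spawning routine. This is a sketch under stated assumptions: the Spawner component, its prefab field and its count are hypothetical, and each prefab instance is assumed to contribute a single Transform. Note that Unity's Instantiate overload takes position and rotation before the parent.

```csharp
using UnityEngine;

public class Spawner : MonoBehaviour
{
    public GameObject prefab;   // hypothetical prefab reference, set in the Inspector
    public int count = 500;     // hypothetical number of children to create

    void Start()
    {
        Transform parent = transform;

        // Reserve buffer space up front, so the hierarchy's buffers are not
        // resized repeatedly while the children are added.
        int needed = parent.hierarchyCount + count;
        if (parent.hierarchyCapacity < needed)
            parent.hierarchyCapacity = needed;

        for (int i = 0; i < count; ++i)
        {
            Vector3 position = new Vector3(i, 0f, 0f);
            // Position, rotation and parent are all applied during instantiation.
            Instantiate(prefab, position, Quaternion.identity, parent);
        }
    }
}
```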
Split your hierarchies.
• Changing any Transform marks the whole Hierarchy dirty.
• Dirty hierarchies must be fully examined for change bits.
• Smaller hierarchies = more granular Hierarchy tracking.
• Smaller hierarchies = fewer Transforms to check.
• Fewer roots = more Transforms to check for changes.
• Change checks are jobified, but operate on roots.
Extreme cases.
Parented:
100 Root GameObjects
+ 1000 empty GameObjects
+ 1 Cube w/ Rotation script
Unparented:
100,000 empty GameObjects
100 Cubes w/ Rotation script
A welcome effect.
                Parented  Unparented
Main Thread     553 ms    32 ms
Worker Threads  139 ms    14 ms
100 Rotating Cubes, 100k Empty GameObjects.
iPad Mini 3. CPU time used over 10 seconds.
This is just checking hierarchies!
             Parented  Unparented
Main Thread  1.77 ms   0.11 ms
100 Rotating Cubes, 100k Empty GameObjects.
iPad Mini 3. CPU time used over 10 seconds.
“PostLateUpdate.UpdateAllRenderers”
Transforms & Physics: 2017.2+
• 2017.1/older: Physics components were synced to Transforms.
• Each Transform change = expensive update of Physics scene.
• 2017.2/newer: Updates can be delayed to next FixedUpdate.
• Update Physics entities from set of changed Transforms.
• Re-indexing computations are batched.
• This could have side-effects!
• Move a Collider + immediately Raycast towards it? No bueno.
Physics.autoSyncTransforms
• When true, forces legacy behavior.
• Colliders/Rigidbodies check for syncs on every Physics call.
• Yes, every Raycast, Spherecast, etc.
• Huge performance regression, if you’re not batching updates.
• When false, uses delayed-update behavior.
• Can force updates: Physics.SyncTransforms
• Default value is true in 2017.2 through 2018.2
• 2018.3 is the first version where the default is false.
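The "move a Collider, then immediately Raycast" case could be handled as in this sketch. The MoveThenRaycast component and its target field are hypothetical. Physics.autoSyncTransforms and Physics.SyncTransforms are the scripting API names for the setting and the forced update described above.

```csharp
using UnityEngine;

public class MoveThenRaycast : MonoBehaviour
{
    public Collider target;   // hypothetical collider that is moved, then queried

    void Start()
    {
        // Opt in to the delayed-update behavior.
        Physics.autoSyncTransforms = false;
    }

    void Update()
    {
        target.transform.position += Vector3.right * Time.deltaTime;

        // Without this call, the physics scene would keep the collider's
        // old pose until the next FixedUpdate.
        Physics.SyncTransforms();

        Vector3 origin = transform.position;
        Vector3 direction = target.transform.position - origin;
        RaycastHit hit;
        if (Physics.Raycast(origin, direction, out hit))
            Debug.Log("Hit " + hit.collider.name);
    }
}
```

Calling Physics.SyncTransforms once after a batch of moves keeps the cost to a single sync, rather than one per physics query.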
A test.

    void Update()
    {
        float rotAmt = 2f * Time.deltaTime;
        Vector3 up = Vector3.up;
        if (batched)
        {
            // "Batched": rotate everything first, then cast all rays.
            for(int i = 0; i < NUM_PARENTS; ++i)
                rotators[i].Rotate(up, rotAmt);
            for(int i = 0; i < NUM_PARENTS; ++i)
                Physics.Raycast(Vector3.zero, Random.insideUnitSphere);
        }
        else
        {
            // "Immediate": alternate one rotation with one raycast.
            for (int i = 0; i < NUM_PARENTS; ++i)
            {
                rotators[i].Rotate(up, rotAmt);
                Physics.Raycast(Vector3.zero, Random.insideUnitSphere);
            }
        }
    }
Seriously, a big effect.
         Parented   Unparented  Parented  Unparented
         Immediate  Immediate   Batched   Batched
Script   4450 ms    4270 ms     1980 ms   882 ms
Physics  1410 ms    1820 ms     1840 ms   1770 ms
100 Rotating Cubes, Rigidbodies, Trigger Box Colliders. 100k Empty GameObjects.
App Framerate: 30. Physics Timestep 0.04 sec.
iPad Mini 3. CPU time used over 10 seconds.
Audio
The Basics
• Unity uses FMOD internally.
• Audio decoding & playback occurs on separate threads.
• Unity supports a handful of codecs.
• PCM
• ADPCM
• Vorbis
• MP3
Audio “Load Type” Setting
• Decompress On Load
• Decoding & file I/O happen at load time only.
• Compressed In Memory
• Decoding happens during playback.
• Streamed
• File I/O & decoding happen during playback.
Every frame…
• Unity iterates over all active Audio Sources.
• Calculates distance to Listener(s).
• FMOD mixes active Audio Sources (“voices”).
• True volume = Volume setting * distance to listener * clip.
• If the clip is compressed, FMOD must decode audio data
• Chooses X loudest voices to mix together.
• X = “Real Voices” audio setting.
Everything is done in software.
• Decoding & mixing are done entirely in software.
• Mixing occurs on the FMOD thread.
• Decoding occurs at loading time or on the FMOD thread.
• All playing voices are evaluated and mixed.
• Max number of voices is controlled by Audio settings.
A trap.
This voice is Muted.
This voice is Active.
This voice will not be heard,
but the Clip must be processed.
A warning.
• AudioSystem.Update is Unity updating the AudioSources which are submitted to FMOD for playback.
• Audio decoding does not show up in the Unity CPU Profiler.
Check both places!
• Decoding & mixing audio is in the details of the Audio profiler.
Audio CPU usage by codec.
        10 Voices  100 Voices  500 Voices
PCM     1.5%       5.0%        5.7%
ADPCM   5.2%       16.6%       11.6%
MP3     13.3%      35.0%       23.3%
Vorbis  12.5%      30.3%       21.2%
Identical AudioClip, multiple AudioSources. MP3 & Vorbis Quality = 100.
WTF?
Oh. Profiler interference.
~Test time~ <(^^<) (>^^)>
• Identical 4 minute audio clip, copied 4 times.
• Once per codec under test.
• Varying number of AudioSources.
• Captured CPU time on main & FMOD threads
• Sum of CPU time consumed over 10 seconds real-time
Again.
        10 Clips  100 Clips  500 Clips
PCM     95 ms     467 ms     2040 ms
ADPCM   89 ms     474 ms     2070 ms
MP3     84 ms     469 ms     2030 ms
Vorbis  93 ms     473 ms     1990 ms
CPU time on main thread, 10 seconds real-time.
With intensity.
        10 Voices  100 Voices  500 Voices
PCM     214 ms     451 ms      634 ms
ADPCM   485 ms     1391 ms     1591 ms
MP3     1058 ms    4061 ms     4167 ms
Vorbis  1161 ms    3408 ms     3629 ms
CPU time on all FMOD threads, 10 seconds real-time.
Principles.
• Avoid having many audio sources set to Mute.
• Disable/Stop instead of Mute, if possible.
• If you can afford the memory overhead, Decompress on Load.
• Best for short clips that are frequently played.
• Avoid playing lots of compressed Clips, especially on mobile.
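The "Stop instead of Mute" principle might be applied as in this sketch. The AmbientSound component and its SetAudible method are hypothetical names used for illustration.

```csharp
using UnityEngine;

public class AmbientSound : MonoBehaviour
{
    public AudioSource source;   // hypothetical source, assigned in the Inspector

    public void SetAudible(bool audible)
    {
        if (audible)
        {
            if (!source.isPlaying)
                source.Play();
        }
        else
        {
            // Setting source.mute = true would keep the voice active,
            // so its clip would still have to be processed.
            source.Stop();
        }
    }
}
```

Stop discards the playback position, so a clip that has to resume where it left off would need AudioSource.Pause and UnPause instead.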
Or clamp the voice count.
        10 Playing Clips  100 Playing Clips  500 Playing Clips
512 VV  318 ms            923 ms             2708 ms
100 VV  304 ms            905 ms             1087 ms
10 VV   315 ms            350 ms             495 ms
1 VV    173 ms            210 ms             361 ms
PCM. CPU time on FMOD + Main threads, 10 seconds real-time.
How, you ask?
    public void SetNumVoices(int nv) {
        var config = AudioSettings.GetConfiguration();
        if(config.numVirtualVoices == nv)
            return;
        config.numVirtualVoices = nv;
        config.numRealVoices = Mathf.Clamp(config.numRealVoices,
            1, config.numVirtualVoices);
        AudioSettings.Reset(config);
    }
Just an example! Probably too simple for real use.
Animation & Animator
Animator
• Formerly called Mecanim.
• Graph of logical states.
• Blends between states.
• States contain animation clips and/or blend trees.
• Animator component attached to GameObject
• AnimatorController referenced by Animator component.
Playables
• Technology underlying Animator & Timeline.
• Generic framework for “stuff that can be played back”.
• Animation clips, audio clips, video clips, etc.
• Docs: https://docs.unity3d.com/Manual/Playables.html
Animation
• Unity’s original animation system.
• Custom code
• Not based on Playables.
• Very simple: plays an animation clip.
• Can crossfade, loop.
Let’s test it.
The Test
• 100 GameObjects with Animator or Animation component
• Animator uses simple AnimatorController: 1 state, looping
• Animation plays back 1 AnimationClip, looping
[Chart: Time per Frame (0 to 30 ms) against curve count (1 to 800), Animation vs. Animator. 100 Components, Variable Curve Count, iPad Mini 3.]
How do cores affect it?
[Chart: Time per Frame (0 to 3 ms) against curve count (1 to 800), Animation vs. Animator, with the crossover on iPad Mini 3 marked. 100 Components, Variable Curve Count, Win10/Core i7.]
[Chart: Time per Frame (0 to 3 ms) against component count (1 to 500), Animation vs. Animator. 100 Curves, Variable Component Count, Win10/Core i7.]
Scaling Factors
• Performance is heavily dependent on curve & core count.
• Fewer cores: Animation retains advantage longer.
• More cores: Animator rapidly outperforms Animation.
• Both systems scale linearly as number of Components rises.
• “Best” system determined by target hardware vs curve count.
• Use Animation for simple animations.
• Use Animators for high curve counts or complex scenarios.
[Chart: Time per Frame (0 to 40 ms) against component count (1 to 500), Animation vs. Animator. 100 Curves, Variable Component Count, iPad Mini 3.]
What about “constant” curves?
• Still interpolated at runtime.
• No measurable impact on CPU usage.
• Significant memory/file savings.
• Example: 11kb vs. 3.7kb for 100 position curves (XYZ)
What about Animator’s cool features?
Be careful with Layers!
• The active state on each layer will be evaluated once per frame.
• Layer Weight does not matter.
• Weight=0? Still evaluated!
• This is to ensure that state is correct.
• Zero-weight layers = wasted work
• Use layers sparingly!
(Yes, the docs are wrong.)
The Cost of Layering
           1 Layer   2 Layers  3 Layers  4 Layers  5 Layers
Aggregate  1966 ms   2260 ms   2510 ms   2690 ms   2890 ms
Per Frame  10.08 ms  11.77 ms  12.86 ms  14.31 ms  17.65 ms
50 x “Ellen” from 3D Gamekit. Unity 2018.1.0f2.
Main Thread CPU time consumed during 10 Seconds Realtime.
iPad Mini 3.
What about Layer Masks?
Nope.
50 x “Ellen” from 3D Gamekit. Layers 2-5 Masked.
Main Thread CPU time consumed during 10 Seconds Realtime.
Unity 2018.1.0f2. iPad Mini 3.
          1 Layer  2 Layers  3 Layers  4 Layers  5 Layers
Unmasked  1966 ms  2260 ms   2510 ms   2690 ms   2890 ms
Masked    1992 ms  2230 ms   2530 ms   2740 ms   2920 ms
Use the right rig!
• The Humanoid rig runs IK & retargeting calculations.
• The Generic rig does not.
          1 Layer  2 Layers  3 Layers  4 Layers  5 Layers
Generic   1966 ms  2260 ms   2510 ms   2690 ms   2890 ms
Humanoid  2775 ms  3210 ms   3510 ms   3730 ms   4020 ms
Identical test to previous slide, different Rig import settings.
The pooling problem
• Animators reset their state when their GameObject is disabled.
• The only workaround? Disable Animator component, not GameObject.
• Leads to messy side effects, like having to manage other components (e.g. Colliders/Rigidbodies) manually.
• This made Animator-driven objects difficult to pool.
There’s an API to fix it, now!
• Animator.keepAnimatorControllerStateOnDisable
• Available in 2018.1+
• If true, Animators do not discard data buffers when their GameObject is disabled.
• Awesome for pooling!
• Careful of the higher memory usage of disabled Animators!
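A pooled object could opt in as in the sketch below. The PooledCharacter component and its two methods are hypothetical. The property is written here as keepAnimatorControllerStateOnDisable, which is, to the best of my knowledge, its spelling in the 2018.x scripting API; check the name against the documentation for your Unity version.

```csharp
using UnityEngine;

public class PooledCharacter : MonoBehaviour
{
    Animator animator;

    void Awake()
    {
        animator = GetComponent<Animator>();
        // Keep the Animator's state while the GameObject is disabled.
        animator.keepAnimatorControllerStateOnDisable = true;
    }

    // Hypothetical pool hooks: the whole GameObject can now be disabled,
    // so Colliders and Rigidbodies no longer need separate handling.
    public void ReturnToPool()
    {
        gameObject.SetActive(false);
    }

    public void TakeFromPool()
    {
        gameObject.SetActive(true);
    }
}
```

Because the state survives disabling, a pool would likely still want to reset the state machine explicitly when an object is reused for something new.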
One Last Thing.
Thank these people!
Danke Schön!
Fragen?
Thank you!
Questions?

Unity's Evolving Best Practices

  • 1.
    This slide deckwas presented at Unite Berlin 2018. This offline version includes numerous additional slides, cut from the original presentation for brevity and/or time. These extra slides contains more examples and data, but are not essential for understanding the presentation.
  • 2.
    Optimization & BestPractices: Through The Ages Ian Dundore Unity Technologies
  • 3.
  • 4.
    Spoiler Alert • ScriptingPerformance • Transforms • Audio • Animations
  • 5.
  • 6.
  • 7.
  • 8.
  • 9.
    oops. • In thespecific case of String.Equals, that advice is wrong! • From a performance perspective, at least. • For all other string comparisons, it’s right! • Compare, StartsWith, EndsWith, IndexOf, etc. • Again, from a performance perspective. • (Psst! This is documented!) https://docs.microsoft.com/en-us/dotnet/standard/base-types/best-practices-strings#common-string-comparison-methods-in-net
  • 10.
  • 11.
    Testing Considerations • Howdoes the code path differ with different inputs? • What is the environment around the executing code? • Runtime • IL2CPP/Mono? .Net version? • Hardware • Pipeline depth, cache size, cache-line length • # of cores, core affinity settings on threads, throttling • What exactly is your test measuring?
  • 12.
    Your Test HarnessMatters! Profiler.BeginSample(“Test A”); for (int i=0; i<NUM_TESTS; ++i) { DoAThing(i); } Profiler.EndSample(); int i = 0; Profiler.BeginSample(“Test B”); DoAThing(0); while (i<NUM_TESTS) { DoAThing(++i); DoAThing(++i); DoAThing(++i); // … repeat a lot … DoAThing(++i); } Profiler.EndSample(); Less Loop OverheadMore Loop Overhead
  • 13.
    public bool Equals(Stringvalue) { if (this == null) throw new NullReferenceException(); if (value == null) return false; if (Object.ReferenceEquals(this, value)) return true; if (this.Length != value.Length) return false; return EqualsHelper(this, value); } Mono’s String.cs (1)
  • 14.
    What does EqualsHelperdo? • Uses unsafe code to pin strings to memory addresses. • C-style integer comparison of raw bytes of the strings. • Core is a special cache-friendly loop. • 64-bit: Step through strings with a stride of 12 bytes. while (length >= 12) { if (*(long*)a != *(long*)b) return false; if (*(long*)(a+4) != *(long*)(b+4)) return false; if (*(long*)(a+8) != *(long*)(b+8)) return false; a += 12; b += 12; length -= 12; }
  • 15.
    public bool Equals(Stringvalue, StringComparison comparisonType) { if (comparisonType < StringComparison.CurrentCulture || comparisonType > StringComparison.OrdinalIgnoreCase) throw new ArgumentException(…); Contract.EndContractBlock(); if ((Object)this == (Object)value) { return true; } if ((Object)value == null) { return false; } Mono’s String.cs (2)
  • 16.
    switch (comparisonType) { caseStringComparison.CurrentCulture: return (CultureInfo.CurrentCulture.CompareInfo.Compare(this, value, CompareOptions.None) == 0); case StringComparison.CurrentCultureIgnoreCase: return (CultureInfo.CurrentCulture.CompareInfo.Compare(this, value, CompareOptions.IgnoreCase) == 0); case StringComparison.InvariantCulture: return (CultureInfo.InvariantCulture.CompareInfo.Compare(this, value, CompareOptions.None) == 0); case StringComparison.InvariantCultureIgnoreCase: return (CultureInfo.InvariantCulture.CompareInfo.Compare(this, value, CompareOptions.IgnoreCase) == 0); Mono’s String.cs (3)
  • 17.
    case StringComparison.Ordinal: if (this.Length!= value.Length) return false; return EqualsHelper(this, value); Mono’s String.cs (4)
  • 18.
    But wait! • Fornon-matching strings, length will often differ. • But for length-invariant strings, first character usually differs. • This optimization is found in CompareOrdinal, but not Equals. public static int CompareOrdinal(String strA, String strB) { if ((Object)strA == (Object)strB) return 0; if (strA == null) return -1; if (strB == null) return 1; // Most common case, first character is different. if ((strA.m_firstChar - strB.m_firstChar) != 0) return strA.m_firstChar - strB.m_firstChar; return CompareOrdinalHelper(strA, strB); }
  • 19.
    This is gettingsilly. public static int CompareOrdinal(String strA, int indexA, String strB, int indexB, int length) { if (strA == null || strB == null) { if ((Object)strA==(Object)strB) { //they're both null; return 0; } return (strA==null)? -1 : 1; //-1 if A is null, 1 if B is null. } return nativeCompareOrdinalEx(strA, indexA, strB, indexB, length); } An overload that goes almost directly to native code!
  • 20.
    Test Design: 4cases • Case 1: Two identical strings. • Case 2: Two strings of random characters of same length. • Case 3: Two strings of random characters of same length. • First characters identical, to bypass check in Compare. • Case 4: Two strings of random characters, different lengths. • Comparison’s worst case is bounded by the shorter string. • Constrained range to 15-25 characters to be similar to above tests.
  • 21.
    Mono 3.5 Identical Content IdenticalLength Random Content Identical Length First Char Equal Identical Length Random Content Random Length String.Equals 2.97 1.75 1.73 1.30 String.Equals with Ordinal type 5.87 3.46 3.56 3.39 String.Compare 37.52 33.29 64.66 31.35 String.Compare with Ordinal type 6.23 3.35 3.35 3.26 CompareOrdinal 5.68 3.10 3.18 2.99 CompareOrdinal with Indices 5.53 3.30 3.42 3.95 Simple Hand-Coded 5.46 1.75 2.18 1.40 100,000 comparisons. Timings in milliseconds. Unity 2018.1.0f2, Windows Standalone, Mono 3.5, Core i7-3500K
  • 22.
    Mono 3.5 Identical Content IdenticalLength Random Content Identical Length First Char Equal Identical Length Random Content Random Length String.Equals 3.23 1.80 1.82 1.21 String.Equals with Ordinal type 3.84 2.13 2.03 1.38 String.Compare 34.72 28.70 63.03 29.74 String.Compare with Ordinal type 5.16 1.75 2.68 1.65 CompareOrdinal 4.93 1.55 2.21 1.40 CompareOrdinal with Indices 4.77 3.59 3.59 4.41 Simple Hand-Coded 4.40 1.66 1.95 1.28 100,000 comparisons. Timings in milliseconds. Unity 2018.1.0f2, Windows Standalone, Mono 4.6, Core i7-3500K
  • 23.
    IL2CPP Identical Content Identical Length RandomContent Identical Length First Char Equal Identical Length Random Content Random Length String.Equals 2.61 1.26 1.27 0.95 String.Equals with Ordinal type 5.38 3.80 3.84 3.66 String.Compare 39.12 29.32 60.56 28.01 String.Compare with Ordinal type 4.84 3.58 3.62 3.52 CompareOrdinal 4.78 3.55 3.58 3.51 CompareOrdinal with Indices 4.93 3.71 3.72 4.17 Simple Hand-Coded 13.83 3.52 3.93 2.16 100,000 comparisons. Timings in milliseconds. Unity 2018.1.0f2, Windows Standalone, IL2CPP 3.5, Core i7-6700K
  • 24.
    IL2CPP Identical Content Identical Length RandomContent Identical Length First Char Equal Identical Length Random Content Random Length String.Equals 2.64 1.92 1.93 0.96 String.Equals with Ordinal type 2.94 2.26 2.73 1.49 String.Compare 40.98 30.61 60.82 29.26 String.Compare with Ordinal type 3.18 1.46 2.29 1.32 CompareOrdinal 2.99 1.18 2.06 1.12 CompareOrdinal with Indices 5.56 3.93 4.08 4.41 Simple Hand-Coded 14.14 3.78 4.14 2.35 100,000 comparisons. Timings in milliseconds. Unity 2018.1.0f2, Windows Standalone, IL2CPP 4.6, Core i7-6700K
  • 25.
  • 26.
  • 27.
    Conclusions & morequestions • String.Equals clearly wins for plain string comparison. • .NET 4.6 has improvements for String.Compare variants. • Ordinal comparisons clearly win on culture-sensitive APIs. • Use String.CompareOrdinal instead of String.Compare. • Use StringComparison.Ordinal on other String APIs. • How does this map across platforms?
  • 28.
    IL2CPP Identical Content Identical Length RandomContent Identical Length First Char Equal Identical Length Random Content Random Length String.Equals 13.48 5.08 5.01 5.26 String.Equals with Ordinal type 25.42 19.46 19.85 14.16 String.Compare 118.80 128.69 254.30 124.81 String.Compare with Ordinal type 24.23 11.49 11.57 10.95 CompareOrdinal 23.92 11.09 11.54 10.75 CompareOrdinal with Indices 23.79 14.76 18.62 15.05 Simple Hand-Coded 58.02 12.04 21.86 8.13 100,000 comparisons. Timings in milliseconds. Unity 2018.1.0f2, iOS, IL2CPP 3.5, iPad Mini 3
  • 29.
    String.Equals/Random on givenplatform = 1 Very similar results, in this case.
  • 30.
    Another tip! • Seea lot of time going to NullCheck in IL2CPP builds? • Disable these checks in release builds! • Works on types, methods & properties. • Code is in IL2CppSetOptionAttribute.cs, under Unity install folder [Il2CppSetOption(Option.NullChecks, false)] public bool MyEquals(String strA, String strB) { // … }
  • 31.
    IL2CPP Identical Content Identical Length RandomContent Random Length Normal 58.02 8.13 NullCheck Disabled 53.02 7.03 100,000 comparisons. Timings in milliseconds. Unity 2018.1.0f2, iOS, IL2CPP 3.5, iPad Mini 3 Small, but helpful.
  • 32.
    Transforms * no, notthis kind of transform
  • 33.
  • 34.
    OnTransformChanged • Internal message,broadcast each time a Transform changes • Position, rotation, scale, parent, sibling order, etc. • Tells other components to update their internal state • PhysX/Box2D, UnityUI, Renderers (AABBs), etc. • Repeated messages can cause performance problems • Use Transform.SetPositionAndRotation (5.6+)
  • 35.
  • 36.
    Enter the Dispatch •TransformChangeDispatch was first introduced in 5.4 • Other systems migrated to use it, slowly. • Renderers in 5.6 • Animators in 2017.1 • Physics in 2017.2 • RectTransforms in 2017.3 • OnTransformChanged was removed entirely in 2018.1
  • 37.
    How Transforms arestructured • 1 TransformHierarchy structure represents a root Transform • Contains buffers tracking data of all transforms in hierarchy • TRS, indices for parents & siblings • Interest bitmask & dirty bitmask • Internal systems register interest & track state via specific bits • Physics is one bit, renderer is another bit, etc. • System walks affected parts of TransformHierarachy structure • dirtyMask |= -1 & interestMask
  • 38.
    When are asyncchanges applied? • TCD keeps a list of dirty TransformHierarchy pointers • Systems request list of changed Transforms before running. • e.g. Before FixedUpdate, before rendering, before animating. • Use list to update internal system state. • TCD iterates over list of dirty TransformHierarchies. • Iterates over all Transforms to check each dirty bit.
  • 40.
    Quick Reminder • Buffersize: Transform.hierarchyCapacity • Set before mass reparenting operations! • Reparent & reposition during instantiate! • GameObject.Instantiate( prefab, parent ); • GameObject.Instantiate( prefab, parent, position, rotation );
  • 41.
    Split your hierarchies. •Changing any Transform marks the whole Hierarchy dirty. • Dirty hierarchies must be fully examined for change bits. • Smaller hierarchies = more granular Hierarchy tracking. • Smaller hierarchies = fewer Transforms to check. • Fewer roots = more Transforms to check for changes. • Change checks are jobified, but operate on roots.
  • 42.
    Extreme cases. UnparentedParented 100 RootGameObjects + 1000 empty GameObjects + 1 Cube w/ Rotation script 100,000 empty GameObjects 100 Cubes w/ Rotation script
  • 43.
    A welcome effect. ParentedUnparented Main Thread 553 ms 32 ms Worker Threads 139 ms 14 ms 100 Rotating Cubes, 100k Empty GameObjects. iPad Mini 3. CPU time used over 10 seconds.
  • 44.
    This is justchecking hierarchies! Parented Unparented Main Thread 1.77 ms 0.11 ms 100 Rotating Cubes, 100k Empty GameObjects. iPad Mini 3. CPU time used over 10 seconds. “PostLateUpdate.UpdateAllRenderers”
  • 45.
    Transforms & Physics:2017.2+ • 2017.1/older: Physics components were synced to Transforms. • Each Transform change = expensive update of Physics scene. • 2017.2/newer: Updates can be delayed to next FixedUpdate. • Update Physics entities from set of changed Transforms. • Re-indexing computations are batched. • This could have side-effects! • Move a Collider + immediately Raycast towards it? No bueno.
  • 46.
    Physics.autoSyncTransforms
    • When true, forces legacy behavior.
    • Colliders/Rigidbodies check for syncs on every Physics call.
    • Yes, every Raycast, Spherecast, etc.
    • Huge performance regression, if you’re not batching updates.
    • When false, uses delayed-update behavior.
    • Can force updates: Physics.SyncTransforms()
    • Default value is true in 2017.2 through 2018.2
    • 2018.3 is the first version where the default is false.
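    One way to handle the "move a Collider, then immediately Raycast at it" case with delayed updates is to sync explicitly, once, after the moves. The sketch below assumes a hypothetical target Transform that carries a Collider; the class name ManualSyncExample is also made up for illustration.

```csharp
using UnityEngine;

// Illustrative only: class and field names are assumptions, not from the talk.
public class ManualSyncExample : MonoBehaviour
{
    public Transform target; // assumed: has a Collider attached

    void Awake()
    {
        // Opt in to the delayed-update behavior described above.
        Physics.autoSyncTransforms = false;
    }

    void Update()
    {
        // 1. Apply all Transform changes for this frame first.
        target.position += Vector3.right * Time.deltaTime;

        // 2. Push the pending Transform changes to the physics scene in one batch.
        Physics.SyncTransforms();

        // 3. Queries issued now see the Collider at its new position.
        Vector3 origin = Vector3.zero;
        Vector3 direction = (target.position - origin).normalized;
        RaycastHit hit;
        if (Physics.Raycast(origin, direction, out hit))
            Debug.Log("Hit " + hit.collider.name);
    }
}
```

    Without the explicit sync, the Raycast in step 3 would query the physics scene as it stood before this frame's move.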
  • 47.
    A test.
    void Update() {
        float rotAmt = 2f * Time.deltaTime;
        Vector3 up = Vector3.up;
        if (batched) {
            // “Batched”: all rotations first, then all raycasts.
            for (int i = 0; i < NUM_PARENTS; ++i)
                rotators[i].Rotate(up, rotAmt);
            for (int i = 0; i < NUM_PARENTS; ++i)
                Physics.Raycast(Vector3.zero, Random.insideUnitSphere);
        } else {
            // “Immediate”: each rotation is followed at once by a raycast.
            for (int i = 0; i < NUM_PARENTS; ++i) {
                rotators[i].Rotate(up, rotAmt);
                Physics.Raycast(Vector3.zero, Random.insideUnitSphere);
            }
        }
    }
  • 48.
    Seriously, a big effect.
               Parented     Unparented   Parented   Unparented
               Immediate    Immediate    Batched    Batched
    Script     4450 ms      4270 ms      1980 ms    882 ms
    Physics    1410 ms      1820 ms      1840 ms    1770 ms
    100 Rotating Cubes, Rigidbodies, Trigger Box Colliders. 100k Empty GameObjects.
    App Framerate: 30. Physics Timestep 0.04 sec. iPad Mini 3. CPU time used over 10 seconds.
  • 49.
  • 50.
    The Basics
    • Unity uses FMOD internally.
    • Audio decoding & playback occurs on separate threads.
    • Unity supports a handful of codecs.
    • PCM
    • ADPCM
    • Vorbis
    • MP3
  • 51.
    Audio “Load Type” Setting
    • Decompress On Load
    • Decoding & file I/O happen at load time only.
    • Compressed In Memory
    • Decoding happens during playback.
    • Streamed
    • File I/O & decoding happen during playback.
  • 52.
    Every frame…
    • Unity iterates over all active Audio Sources.
    • Calculates distance to Listener(s).
    • FMOD mixes active Audio Sources (“voices”).
    • True volume = Volume setting * distance to listener * clip.
    • If the clip is compressed, FMOD must decode audio data.
    • Chooses X loudest voices to mix together.
    • X = “Real Voices” audio setting.
  • 53.
    Everything is done in software.
    • Decoding & mixing are done entirely in software.
    • Mixing occurs on the FMOD thread.
    • Decoding occurs at loading time or on the FMOD thread.
    • All playing voices are evaluated and mixed.
    • Max number of voices is controlled by Audio settings.
  • 54.
    A trap.
    This voice is Muted.
    This voice is Active.
    This voice will not be heard, but the Clip must be processed.
  • 55.
    A warning.
    • AudioSystem.Update is Unity updating the AudioSources which are submitted to FMOD for playback.
    • Audio decoding does not show up in the Unity CPU Profiler.
  • 56.
    Check both places!
    • Decoding & mixing audio is in the details of the Audio profiler.
  • 57.
    Audio CPU usage by codec.
              10 Voices    100 Voices    500 Voices
    PCM       1.5%         5.0%          5.7%
    ADPCM     5.2%         16.6%         11.6%
    MP3       13.3%        35.0%         23.3%
    Vorbis    12.5%        30.3%         21.2%
    Identical AudioClip, multiple AudioSources. MP3 & Vorbis Quality = 100.
  • 58.
    WTF?
              10 Voices    100 Voices    500 Voices
    PCM       1.5%         5.0%          5.7%
    ADPCM     5.2%         16.6%         11.6%
    MP3       13.3%        35.0%         23.3%
    Vorbis    12.5%        30.3%         21.2%
    Identical AudioClip, multiple AudioSources. MP3 & Vorbis Quality = 100.
  • 59.
  • 60.
    ~Test time~ <(^^<)(>^^)>
    • Identical 4-minute audio clip, copied 4 times.
    • Once per codec under test.
    • Varying number of AudioSources.
    • Captured CPU time on main & FMOD threads.
    • Sum of CPU time consumed over 10 seconds real-time.
  • 61.
    Again.
              10 Clips    100 Clips    500 Clips
    PCM       95 ms       467 ms       2040 ms
    ADPCM     89 ms       474 ms       2070 ms
    MP3       84 ms       469 ms       2030 ms
    Vorbis    93 ms       473 ms       1990 ms
    CPU time on main thread, 10 seconds real-time.
  • 62.
    With intensity.
              10 Voices    100 Voices    500 Voices
    PCM       214 ms       451 ms        634 ms
    ADPCM     485 ms       1391 ms       1591 ms
    MP3       1058 ms      4061 ms       4167 ms
    Vorbis    1161 ms      3408 ms       3629 ms
    CPU time on all FMOD threads, 10 seconds real-time.
  • 63.
    Principles.
    • Avoid having many audio sources set to Mute.
    • Disable/Stop instead of Mute, if possible.
    • If you can afford the memory overhead, Decompress on Load.
    • Best for short clips that are frequently played.
    • Avoid playing lots of compressed Clips, especially on mobile.
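    The "Stop instead of Mute" principle might look like the sketch below. The class name SilenceSource and its source field are hypothetical; only AudioSource.Stop, AudioSource.Play and AudioSource.mute are Unity API.

```csharp
using UnityEngine;

// Illustrative only: class and field names are assumptions, not from the talk.
public class SilenceSource : MonoBehaviour
{
    public AudioSource source; // assumed: assigned in the Inspector

    public void Silence()
    {
        // Avoided here: source.mute = true;
        // A muted voice is inaudible, but its Clip must still be processed.

        // Stopping ends playback, so the source is no longer a playing voice.
        source.Stop();
    }

    public void Resume()
    {
        // Note: after Stop(), Play() restarts the clip from the beginning.
        source.Play();
    }
}
```

    Whether restarting from the beginning is acceptable depends on the sound; looping ambience usually tolerates it better than music.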
  • 64.
    Or clamp the voice count.
              10 Playing Clips    100 Playing Clips    500 Playing Clips
    512 VV    318 ms              923 ms               2708 ms
    100 VV    304 ms              905 ms               1087 ms
    10 VV     315 ms              350 ms               495 ms
    1 VV      173 ms              210 ms               361 ms
    PCM. CPU time on FMOD + Main threads, 10 seconds real-time.
  • 65.
    How, you ask?
    public void SetNumVoices(int nv) {
        var config = AudioSettings.GetConfiguration();
        if (config.numVirtualVoices == nv)
            return;
        config.numVirtualVoices = nv;
        // Keep the real voice count within [1, numVirtualVoices].
        config.numRealVoices = Mathf.Clamp(config.numRealVoices, 1, config.numVirtualVoices);
        AudioSettings.Reset(config);
    }
    Just an example! Probably too simple for real use.
  • 66.
  • 67.
    Animator
    • Formerly called Mecanim.
    • Graph of logical states.
    • Blends between states.
    • States contain animation clips and/or blend trees.
    • Animator component attached to GameObject.
    • AnimatorController referenced by Animator component.
  • 68.
    Playables
    • Technology underlying Animator & Timeline.
    • Generic framework for “stuff that can be played back”.
    • Animation clips, audio clips, video clips, etc.
    • Docs: https://docs.unity3d.com/Manual/Playables.html
  • 69.
    Animation
    • Unity’s original animation system.
    • Custom code
    • Not based on Playables.
    • Very simple: plays an animation clip.
    • Can crossfade, loop.
  • 70.
  • 71.
    The Test
    • 100 GameObjects with Animator or Animation component
    • Animator uses simple AnimatorController: 1 state, looping
    • Animation plays back 1 AnimationClip, looping
  • 72.
    [Chart: Time per Frame (0 ms to 30 ms) vs. curve count (1 to 800). Series: Animation, Animator. 100 Components, Variable Curve Count, iPad Mini 3.]
  • 73.
    How do cores affect it?
  • 74.
    [Chart: Time per Frame (0 ms to 3 ms) vs. curve count (1 to 800). Series: Animation, Animator. 100 Components, Variable Curve Count, Win10/Core i7. Annotation: Crossover on iPad Mini 3.]
  • 75.
    [Chart: Time per Frame (0 ms to 3 ms) vs. component count (1 to 500). Series: Animation, Animator. 100 Curves, Variable Component Count, Win10/Core i7.]
  • 76.
    Scaling Factors
    • Performance is heavily dependent on curve & core count.
    • Fewer cores: Animation retains advantage longer.
    • More cores: Animator rapidly outperforms Animation.
    • Both systems scale linearly as number of Components rises.
    • “Best” system determined by target hardware vs curve count.
    • Use Animation for simple animations.
    • Use Animators for high curve counts or complex scenarios.
  • 77.
    [Chart: Time per Frame (0 ms to 40 ms) vs. component count (1 to 500). Series: Animation, Animator. 100 Curves, Variable Component Count, iPad Mini 3.]
  • 78.
    What about “constant” curves?
    • Still interpolated at runtime.
    • No measurable impact on CPU usage.
    • Significant memory/file savings.
    • Example: 11kb vs. 3.7kb for 100 position curves (XYZ)
  • 79.
  • 80.
    Be careful with Layers!
    • The active state on each layer will be evaluated once per frame.
    • Layer Weight does not matter.
    • Weight=0? Still evaluated!
    • This is to ensure that state is correct.
    • Zero-weight layers = wasted work
    • Use layers sparingly! (Yes, the docs are wrong.)
  • 81.
    The Cost of Layering
                 1 Layer     2 Layers    3 Layers    4 Layers    5 Layers
    Aggregate    1966 ms     2260 ms     2510 ms     2690 ms     2890 ms
    Per Frame    10.08 ms    11.77 ms    12.86 ms    14.31 ms    17.65 ms
    50 x “Ellen” from 3D Gamekit. Unity 2018.1.0f2. Main Thread CPU time consumed during 10 Seconds Realtime. iPad Mini 3.
  • 82.
  • 83.
    Nope.
                     1 Layer    2 Layers    3 Layers    4 Layers    5 Layers
    Unmasked         1966 ms    2260 ms     2510 ms     2690 ms     2890 ms
    60/108 Masked    1992 ms    2230 ms     2530 ms     2740 ms     2920 ms
    50 x “Ellen” from 3D Gamekit. Layers 2-5 Masked. Main Thread CPU time consumed during 10 Seconds Realtime. Unity 2018.1.0f2. iPad Mini 3.
  • 84.
    Use the right rig!
    • The Humanoid rig runs IK & retargeting calculations.
    • The Generic rig does not.
                1 Layer    2 Layers    3 Layers    4 Layers    5 Layers
    Generic     1966 ms    2260 ms     2510 ms     2690 ms     2890 ms
    Humanoid    2775 ms    3210 ms     3510 ms     3730 ms     4020 ms
    Identical test to previous slide, different Rig import settings.
  • 85.
    The pooling problem
    • Animators reset their state when their GameObject is disabled.
    • The only workaround? Disable Animator component, not GameObject.
    • Leads to messy side effects, like having to manage other components (e.g. Colliders/Rigidbodies) manually.
    • This made Animator-driven objects difficult to pool.
  • 86.
    There’s an API to fix it, now!
    • Animator.keepAnimatorControllerStateOnDisable
    • Available in 2018.1+
    • If true, Animators do not discard data buffers when their GameObject is disabled.
    • Awesome for pooling!
    • Careful of the higher memory usage of disabled Animators!
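    A minimal pooling sketch under stated assumptions: the class PooledCharacter and its two methods are hypothetical. The property is written here as keepAnimatorControllerStateOnDisable, the name used in the 2018-era scripting reference; check the reference for your Unity version, as the exact name may differ.

```csharp
using UnityEngine;

// Illustrative only: class and method names are assumptions, not from the talk.
[RequireComponent(typeof(Animator))]
public class PooledCharacter : MonoBehaviour
{
    Animator animator;

    void Awake()
    {
        animator = GetComponent<Animator>();

        // Keep the Animator's state across disable/enable of the GameObject,
        // at the cost of higher memory usage while the object sits in the pool.
        animator.keepAnimatorControllerStateOnDisable = true;
    }

    public void ReturnToPool()
    {
        // The whole GameObject can be disabled; no per-component juggling.
        gameObject.SetActive(false);
    }

    public void TakeFromPool()
    {
        gameObject.SetActive(true);
    }
}
```

    Because state is preserved, a pooled object resumes in whatever state it was in when it was returned, so callers may want to trigger an explicit reset transition on reuse.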
  • 87.
  • 88.
  • 89.