Introduction to Data-Oriented Design
So what is this Data-Oriented Design?
It’s about shifting focus to how data is read and written
Why should we care?
Performance
A read from memory takes ~600 cycles at 3.2 GHz
A read from memory takes 40 cycles at 300 MHz
Performance
Memory hierarchy and latency:
  Disks (Blu-ray/DVD/HDD): :(
  Main Memory: 600 cycles
  L2 Cache: 40 cycles
  L1 Cache: 1 – 2 cycles
  CPU / Registers
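To put the two memory-read figures side by side, the cycle counts can be converted to wall-clock time. The sketch below only does that arithmetic; the helper name `cyclesToNanoseconds` and the constant names are made up for illustration, and the cycle counts are the ones quoted on the slides, not new measurements.

```cpp
// Converts a latency in clock cycles to nanoseconds.
// A 1 GHz clock runs one cycle per nanosecond, so ns = cycles / GHz.
constexpr double cyclesToNanoseconds(double cycles, double clockGHz)
{
    return cycles / clockGHz;
}

// ~600 cycles at 3.2 GHz is about 187.5 ns.
constexpr double kReadAt3200MHzNs = cyclesToNanoseconds(600.0, 3.2);

// 40 cycles at 300 MHz is about 133.3 ns.
constexpr double kReadAt300MHzNs = cyclesToNanoseconds(40.0, 0.3);
```

Under these figures the clock is roughly ten times faster, yet a memory read takes about as long in nanoseconds as before, which is why its cost in cycles grew so much.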
Multithreading
Cannot multithread without knowing how data is touched
Adding locks always protects data, not code
(Diagram: Object update() touches other Objects. Read? Write?)
Offloading to co-unit
If the data is unknown, it is hard or impossible to run on a co-unit (SPU/GPU/APU)
Better design
Data focus can lead to isolated, self-contained, interchangeable pieces of data and code
This can make it easier to test data and code in isolation
Example - OOD
    class Bot
    {
        ...
        Vec3 m_position;
        ...
        float m_mod;
        ...
        float m_aimDirection;
        ...
        void updateAim(Vec3 target)
        {
            m_aimDirection = dot3(m_position, target) * m_mod;
        }
    };
Problems: icache-miss, data-miss, unused cached data
Very hard to optimize!
Example - OOD
    void updateAim(Vec3 target)
    {
        m_aimDirection = dot3(m_position, target) * m_mod;
    }
Let's say we call this code 4 times (4 different Bots). Each call costs, in cycles:
  iCache       600
  m_position   600
  m_mod        600
  aimDir       100
  computation  ~20
Every one of the 4 calls pays all of these again: 4 x 1920 = 7680 cycles
Example - DOD
Design "back to front" and focus on the output data
Then add the minimal amount of data needed to do the transform to create the correct output
Example - DOD
    void updateAims(float* aimDir, const AimingData* aim,
                    Vec3 target, uint count)
    {
        for (uint i = 0; i < count; ++i)
        {
            aimDir[i] = dot3(aim->positions[i], target) * aim->mod[i];
        }
    }
What has changed?
  Only read needed inputs
  Write to linear array
  Loop over all the data
  Actual code unchanged
  Code separated
Example - DOD
Cost of the same updateAims loop over 4 Bots, in cycles:
  iCache       600   (once)
  positions    600   (once)
  mod          600   (once)
  aimDir       100   (once)
  computation  ~20   (per Bot, 4 times)
The misses are paid once for the whole batch: 1900 + 4 x 20 = 1980 cycles
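The two totals on the slides (7680 for OOD, 1980 for DOD) can be reproduced from the per-item costs. The sketch below is only that arithmetic, checked at compile time; the constant and function names are made up here, and the cycle figures are the slides' illustrative numbers rather than measurements.

```cpp
// Cycle costs quoted on the slides (illustrative, not measured).
constexpr int kICache    = 600;
constexpr int kPositions = 600;
constexpr int kMod       = 600;
constexpr int kAimDir    = 100;
constexpr int kCompute   = 20;
constexpr int kBotCount  = 4;

// OOD: every call pays for every miss again.
constexpr int oodCycles()
{
    return kBotCount * (kICache + kPositions + kMod + kAimDir + kCompute);
}

// DOD: misses are paid once for the batch; only the computation repeats.
constexpr int dodCycles()
{
    return kICache + kPositions + kMod + kAimDir + kBotCount * kCompute;
}

static_assert(oodCycles() == 7680, "matches the OOD total on the slide");
static_assert(dodCycles() == 1980, "matches the DOD total on the slide");
```

With these figures the batched version costs roughly a quarter of the per-object version, and the gap widens as the number of Bots grows because only the ~20-cycle term scales with the count.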
Data layout OOD vs DOD
  OOD: pos0 ... mod0 ... aimDir0 ... | pos1 ... mod1 ... aimDir1 ... | (one object after another)
  DOD: pos0 pos1 pos2 pos3 | mod0 mod1 mod2 mod3 | aimDir0 aimDir1 aimDir2 aimDir3
Each color block in the original diagram is one 128 byte cache line
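The layout difference can be made concrete by counting how many 128-byte cache lines each layout touches. This is a rough sketch under stated assumptions: `BotAoS` is a hypothetical object padded to exactly one cache line (the `otherState` member stands in for the "..." fields on the slide), all arrays are assumed to start on a cache-line boundary, and `float` is assumed to be 4 bytes.

```cpp
#include <cstddef>

constexpr std::size_t kCacheLineBytes = 128; // figure from the slide

struct Vec3 { float x, y, z; };

// OOD-style layout: the three fields we need share an object with
// unrelated state. The padding member is a made-up placeholder.
struct alignas(128) BotAoS
{
    Vec3  position;
    char  otherState[100];
    float mod;
    float aimDirection;
};
static_assert(sizeof(BotAoS) == kCacheLineBytes,
              "one Bot per cache line in this sketch");

// Cache lines needed to hold `bytes` contiguous, line-aligned bytes.
constexpr std::size_t linesFor(std::size_t bytes)
{
    return (bytes + kCacheLineBytes - 1) / kCacheLineBytes;
}

// OOD: updating `count` Bots walks over every whole object.
constexpr std::size_t aosLinesTouched(std::size_t count)
{
    return linesFor(count * sizeof(BotAoS));
}

// DOD: three tightly packed arrays (positions, mod, aimDir).
constexpr std::size_t soaLinesTouched(std::size_t count)
{
    return linesFor(count * sizeof(Vec3))
         + linesFor(count * sizeof(float))
         + linesFor(count * sizeof(float));
}
```

Under these assumptions, 4 Bots touch 4 lines in the object layout and 3 in the array layout; at 32 Bots the counts are 32 and 5, since each extra Bot adds a full line in the first case and only 20 bytes in the second.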
It's all about memory
Optimize for data first, then code
Most code is likely bound by memory access
Not everything needs to be an object
Remember
We are doing games, we know our data.
Pre-format. Source data and native data don't need to be the same
Example: Area Triggers
  Source data (Linked List): [position position position position next] -> [position position position position next] -> ...
  Native data (Array): count, followed by all positions stored contiguously
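The pre-formatting step for area triggers can be sketched as a one-time conversion from the linked source form to a flat array. The type names `SourceTrigger` and `NativeTriggers`, the choice of four positions per node (read off the diagram), and the use of `std::vector` for the native array are all assumptions for illustration; the slides do not show the real data structures.

```cpp
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };

// Hypothetical source format: one node per trigger, linked by pointer.
struct SourceTrigger
{
    Vec3 positions[4];
    const SourceTrigger* next;
};

// Hypothetical native format: every position packed into one array.
// The "count" from the slide is positions.size().
struct NativeTriggers
{
    std::vector<Vec3> positions;
};

// Walks the list once (for example at build or load time) and copies
// each node's positions to the end of the contiguous array.
NativeTriggers flattenTriggers(const SourceTrigger* head)
{
    NativeTriggers out;
    for (const SourceTrigger* node = head; node != nullptr; node = node->next)
    {
        out.positions.insert(out.positions.end(),
                             node->positions, node->positions + 4);
    }
    return out;
}
```

Runtime code then iterates over `positions` linearly instead of chasing `next` pointers, so the pointer-chasing cost is paid once during conversion.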
Example: Culling System
Old System
New System (Linear arrays and brute force)
3x faster, 1/5 code size, simpler
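The slides do not show the new culling code, so the following is only a guess at what "linear arrays and brute force" could look like: every bounding sphere in a flat array is tested against every plane, with no tree or hierarchy. The `Sphere` and `Plane` types, the plane convention and the function name `cullSpheres` are all assumptions made for this sketch.

```cpp
#include <cstddef>
#include <cstdint>

// Plane in the form n.p + d = 0; points with n.p + d >= 0 count as inside.
struct Plane  { float nx, ny, nz, d; };

// Bounding sphere for one object.
struct Sphere { float x, y, z, radius; };

// Brute force: tests each sphere against each plane in order.
// Writes 1 to visible[i] when sphere i is not completely behind any
// plane, 0 otherwise, and returns the number of visible spheres.
std::size_t cullSpheres(const Sphere* spheres, std::size_t sphereCount,
                        const Plane* planes, std::size_t planeCount,
                        std::uint8_t* visible)
{
    std::size_t visibleCount = 0;
    for (std::size_t i = 0; i < sphereCount; ++i)
    {
        const Sphere& s = spheres[i];
        std::uint8_t inside = 1;
        for (std::size_t p = 0; p < planeCount; ++p)
        {
            const Plane& pl = planes[p];
            const float distance = pl.nx * s.x + pl.ny * s.y + pl.nz * s.z + pl.d;
            if (distance < -s.radius)
            {
                inside = 0; // completely behind this plane
                break;
            }
        }
        visible[i] = inside;
        visibleCount += inside;
    }
    return visibleCount;
}
```

Both inputs and the output are plain arrays walked front to back, which is the access pattern the earlier slides argue for; whether this matches the actual system behind the "3x faster" figure is not stated in the source.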
Data Oriented Design Delivers:
Better Performance
Often simpler code
More parallelizable code
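One way to see why linear arrays parallelize more easily: when each element of the output depends only on the same index of the input, the index range can be cut into disjoint chunks and each chunk handed to its own thread, with no locks. The sketch below uses a simple scaling transform and `std::thread`; the function name `scaleParallel` and the chunking scheme are assumptions for illustration, not something taken from the slides.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Computes out[i] = in[i] * scale, splitting [0, count) into disjoint
// chunks with one thread per chunk. No locking is needed because each
// thread only reads `in` and only writes its own slice of `out`.
void scaleParallel(const float* in, float* out, std::size_t count,
                   float scale, std::size_t threadCount)
{
    threadCount = std::max<std::size_t>(1, threadCount);
    const std::size_t chunk = (count + threadCount - 1) / threadCount;

    std::vector<std::thread> workers;
    for (std::size_t begin = 0; begin < count; begin += chunk)
    {
        const std::size_t end = std::min(count, begin + chunk);
        workers.emplace_back([=] {
            for (std::size_t i = begin; i < end; ++i)
            {
                out[i] = in[i] * scale;
            }
        });
    }
    for (std::thread& worker : workers)
    {
        worker.join();
    }
}
```

Because it is known exactly which data each chunk reads and writes, the split is safe by construction; a real engine would more likely feed such chunks to an existing job system than spawn threads per call.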
Questions?
Links
Data-Oriented Design (Or Why You Might Be Shooting Yourself in The Foot With OOP)
  http://gamesfromwithin.com/data-oriented-design
Practical Examples in Data Oriented Design
  http://bitsquid.blogspot.com/2010/05/practical-examples-in-data-oriented.html
The Latency Elephant
  http://seven-degrees-of-freedom.blogspot.com/2009/10/latency-elephant.html
Pitfalls of Object Oriented Programming
  http://seven-degrees-of-freedom.blogspot.com/2009/12/pitfalls-of-object-oriented-programming.html
Insomniac R&D
  http://www.insomniacgames.com/research_dev
CellPerformance
Image credits
Cat image: http://icanhascheezburger.com/2007/06/24/uninterested-cat (photo by: Arinn, capped and submitted by: Andy)
Playstation 3 and Playstation 2: Copyright to Sony Computer Entertainment
Xbox 360 image: Copyright to Microsoft
"WTF" code quality image: Copyright by Thom Holwerda, http://www.osnews.com/comics
