Building Dynamic
Instrumentation Tools
         with
     DynamoRIO
     Saman Amarasinghe
       Derek Bruening
         Qin Zhao
                                              Tutorial Outline
•     1:30-1:40              Welcome + DynamoRIO History
•     1:40-2:40              DynamoRIO Internals
•     2:40-3:00              Examples, Part 1
•     3:00-3:15              Break
•     3:15-4:15              DynamoRIO API
•     4:15-5:15              Examples, Part 2
•     5:15-5:30              Feedback
    DynamoRIO Tutorial at CGO 24 April 2010
                                      DynamoRIO History
• Dynamo
     – HP Labs: PA-RISC late 1990’s
     – x86 Dynamo: 2000
• RIO → DynamoRIO
     – MIT: 2001-2004
• Prior releases
     –    0.9.1: Jun 2002 (PLDI tutorial)
     –    0.9.2: Oct 2002 (ASPLOS tutorial)
     –    0.9.3: Mar 2003 (CGO tutorial)
     –    0.9.4: Feb 2005
     – 0.9.5: Apr 2008 (CGO tutorial)
     – 1.0 (0.9.6): Sep 2008 (GoVirtual.org launch)
• Determina
     – 2003-2007
     – Security company
• VMware
     – Acquired Determina (and DynamoRIO) in 2007
• Open-source BSD license
     – Feb 2009: 1.3.1 release
     – Dec 2009: 1.5.0 release
     – Apr 2010: 2.0.0 release
DynamoRIO Internals
                      Typical Modern Application: IIS
                           Runtime Interposition Layer
                                           Design Goals
• Efficient
     – Near-native performance
• Transparent
     – Match native behavior
• Comprehensive
     – Control every instruction, in any application
• Customizable
     – Adapt to satisfy disparate tool needs
                       Challenges of Real-World Apps
• Multiple threads
     – Synchronization
• Application introspection
     – Reading of return address
• Transparency corner cases are the norm
     – Example: access beyond top of stack
• Scalability
     – Must adapt to varying code sizes, thread counts, etc.
• Dynamically generated code
     – Performance challenges
                                           Internals Outline
• Efficient
     –    Software code cache overview
     –    Thread-shared code cache
     –    Cache capacity limits
     –    Data structures
• Transparent
• Comprehensive
• Customizable
                               Direct Code Modification
e9 37 6f 48 92                       jmp <callout>
                  Kernel32!TerminateProcess:
                  7d4d1028 7c 05                     jl    7d4d102f
                  7d4d102a 33 c0                     xor   %eax,%eax
                  7d4d102c 40                        inc   %eax
                  7d4d102d eb 08                     jmp   7d4d1037
                  7d4d102f 50                        push %eax
                  7d4d1030 e8 ed 7c 00 00            call 7d4d8d22
                       Debugger Trap Too Expensive
cc         int3 (breakpoint)
                  Kernel32!TerminateProcess:
                  7d4d1028 7c 05               jl    7d4d102f
                  7d4d102a 33 c0               xor   %eax,%eax
                  7d4d102c 40                  inc   %eax
                  7d4d102d eb 08               jmp   7d4d1037
                  7d4d102f 50                  push %eax
                  7d4d1030 e8 ed 7c 00 00      call 7d4d8d22
         Variable-Length Instruction Complications
e9 37 6f 48 92                       jmp <callout>
                  Kernel32!TerminateProcess:
                  7d4d1028 7c 05                     jl    7d4d102f
                  7d4d102a 33 c0                     xor   %eax,%eax
                  7d4d102c 40                        inc   %eax
                  7d4d102d eb 08                     jmp   7d4d1037
                  7d4d102f 50                        push %eax
                  7d4d1030 e8 ed 7c 00 00            call 7d4d8d22
                             Entry Point Complications
e9 37 6f 48 92                       jmp <callout>
                  Kernel32!TerminateProcess:
                  7d4d1028 7c 05                     jl    7d4d102f
                  7d4d102a 33 c0                     xor   %eax,%eax
                  7d4d102c 40                        inc   %eax
                  7d4d102d eb 08                     jmp   7d4d1037
                  7d4d102f 50                        push %eax
                  7d4d1030 e8 ed 7c 00 00            call 7d4d8d22
               Direct Code Modification: Too Limited
• Not transparent
     – Cannot write jump atomically if crosses cache line
     – Even if write is atomic, not safe if overwrites part of next
       instruction
     – Jump may span code entry point
• Too limited
     – Not safe w/o suspending all threads and knowing all entry points
     – Limited to inserting callouts
• Code displaced by jump is a mini code cache
     – All the same consistency challenges of larger cache
• Inter-operation issues with other hooks
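
The hazards listed above can be checked mechanically for a given hook site. The following is a minimal sketch, not DynamoRIO code: the function names, the 64-byte cache line size, and the idea of passing in known instruction starts and entry points are all assumptions made for illustration.

```python
import struct

JMP_REL32_LEN = 5  # opcode 0xE9 plus a 4-byte displacement


def encode_jmp_rel32(src, dst):
    """Encode `jmp dst`, placed at address `src`, as 5 raw bytes."""
    # The displacement is relative to the address of the next instruction.
    disp = (dst - (src + JMP_REL32_LEN)) & 0xFFFFFFFF
    return b"\xe9" + struct.pack("<I", disp)


def crosses_cache_line(addr, length, line_size=64):
    """True if the byte range [addr, addr + length) spans two cache lines."""
    return addr // line_size != (addr + length - 1) // line_size


def hook_problems(hook_addr, insn_starts, entry_points, line_size=64):
    """List the hazards of overwriting 5 bytes at hook_addr with a jump.

    insn_starts:  start addresses of the original instructions (assumed known).
    entry_points: addresses other code may branch to (assumed known).
    """
    end = hook_addr + JMP_REL32_LEN
    problems = []
    if crosses_cache_line(hook_addr, JMP_REL32_LEN, line_size):
        problems.append("write is not atomic: crosses a cache line")
    # The jump must end on an instruction boundary; otherwise the tail of a
    # partially overwritten instruction is left behind.
    if end not in insn_starts:
        problems.append("overwrites part of the next instruction")
    # A branch target strictly inside the overwritten bytes now lands in the
    # middle of the jump.
    if any(hook_addr < ep < end for ep in entry_points):
        problems.append("jump spans a code entry point")
    return problems
```

With the TerminateProcess listing from the earlier slides, a hook at 7d4d1028 ends exactly on an instruction boundary, while a hook at 7d4d102c would cut into the call at 7d4d1030 and span the jl target 7d4d102f. In practice the hard part is that the full set of entry points is generally not known.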
                                     We Need Indirection
• Avoid transparency and granularity limitations of directly
  modifying application code
• Allow arbitrary modifications at unrestricted points in code
  stream
• Allow systematic, fine-grained modifications to code stream
• Guarantee that all code is observed
                                          Basic Interpreter
[Diagram: START → fetch → decode → execute]
Slowdown: ~300x
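
To make the fetch, decode, execute cycle concrete, here is a minimal sketch of a pure interpreter for a toy instruction set. The three opcodes (addi, jlt, halt) and the register names are invented for this example and have nothing to do with x86; the point is only that every instruction pays the fetch and decode cost every time it runs.

```python
def interpret(program, regs):
    """Run a toy program one instruction at a time; return steps executed."""
    pc = 0
    steps = 0
    while True:
        insn = program[pc]              # fetch
        op, args = insn[0], insn[1:]    # decode
        steps += 1
        if op == "addi":                # execute: reg += imm
            reg, imm = args
            regs[reg] += imm
            pc += 1
        elif op == "jlt":               # execute: branch if reg < limit
            reg, limit, target = args
            pc = target if regs[reg] < limit else pc + 1
        elif op == "halt":
            return steps
        else:
            raise ValueError(f"unknown opcode: {op}")


# A ten-iteration loop: three instructions per iteration plus the final halt.
LOOP = [
    ("addi", "sum", 2),     # address 0
    ("addi", "i", 1),       # address 1
    ("jlt", "i", 10, 0),    # address 2: loop back while i < 10
    ("halt",),              # address 3
]
```

Calling `interpret(LOOP, {"i": 0, "sum": 0})` decodes the same three loop instructions ten times over, which is the repeated work a code cache is meant to remove.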
Improvement #1: Interpreter + Basic Block Cache
[Diagram: START → basic block builder → dispatch → context switch → BASIC BLOCK CACHE (non-control-flow instructions)]
• Non-control-flow instructions executed from software code cache
Slowdown: 300x → 25x
Improvement #1: Interpreter + Basic Block Cache
[Diagram: application blocks (B, C, D visible) beside START → basic block builder → dispatch → context switch; the BASIC BLOCK CACHE holds A, B, D]
                      Example Basic Block Fragment
Application code:
  add   %eax, %ecx
  cmp   $4, %eax
  jle   $0x40106f

Code cache fragment:
  frag7: add   %eax, %ecx
         cmp   $4, %eax
         jle   <stub0>
         jmp   <stub1>
  stub0: mov   %eax, eax-slot
         mov   &dstub0, %eax
         jmp   context_switch
  stub1: mov   %eax, eax-slot
         mov   &dstub1, %eax
         jmp   context_switch

Exit stub data:
  dstub0   target: 0x40106f
  dstub1   target: fall-thru
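
The control flow of a basic block cache can be sketched as follows. This is an illustrative model over an invented toy instruction set (addi, jlt, halt), not DynamoRIO's implementation: each block is built once, then reused, and every block exit returns to dispatch, much as the exit stubs above jump to context_switch.

```python
LOOP = [
    ("addi", "sum", 2),     # address 0
    ("addi", "i", 1),       # address 1
    ("jlt", "i", 10, 0),    # address 2: loop back while i < 10
    ("halt",),              # address 3
]


def build_block(program, start):
    """Copy instructions from `start` through the next control transfer."""
    block = []
    pc = start
    while True:
        insn = program[pc]
        block.append(insn)
        pc += 1
        if insn[0] in ("jlt", "halt"):
            return block, pc            # pc is now the fall-through address


def run_with_block_cache(program, regs):
    cache = {}                          # application address -> cached block
    stats = {"builds": 0, "dispatches": 0}
    pc = 0
    while pc is not None:
        stats["dispatches"] += 1        # every block exit comes back here
        if pc not in cache:
            cache[pc] = build_block(program, pc)
            stats["builds"] += 1
        block, fall_through = cache[pc]
        for insn in block[:-1]:         # body: only addi in this toy ISA
            regs[insn[1]] += insn[2]
        last = block[-1]
        if last[0] == "halt":
            pc = None
        else:                           # jlt: taken exit or fall-through exit
            _, reg, limit, target = last
            pc = target if regs[reg] < limit else fall_through
    return stats
```

For the ten-iteration loop, the loop body is built once and reused, but dispatch is still entered on every iteration. That remaining cost is what linking addresses.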
          Improvement #2: Linking Direct Branches
[Diagram: START → basic block builder → dispatch → context switch → BASIC BLOCK CACHE (non-control-flow instructions)]
• Direct branch to existing block can bypass dispatch
Slowdown: 300x → 25x → 3x
         Improvement #2: Linking Direct Branches
[Diagram: application blocks (B, C, D visible) beside START → basic block builder → dispatch → context switch; the BASIC BLOCK CACHE holds A, B, D]
                                          Direct Linking
Application code:
  add   %eax, %ecx
  cmp   $4, %eax
  jle   $0x40106f

Code cache fragment:
  frag7: add   %eax, %ecx
         cmp   $4, %eax
         jle   <frag8>
         jmp   <stub1>
  stub0: mov   %eax, eax-slot
         mov   &dstub0, %eax
         jmp   context_switch
  stub1: mov   %eax, eax-slot
         mov   &dstub1, %eax
         jmp   context_switch

Exit stub data:
  dstub0   target: 0x40106f
  dstub1   target: fall-thru
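
The effect of linking can be sketched by counting trips through dispatch. The model below is an assumption-laden illustration, not DynamoRIO's data structures: a Fragment is reduced to a tag and a table of patched exits, and the input is simply the sequence of direct-branch targets the application follows.

```python
class Fragment:
    def __init__(self, tag):
        self.tag = tag      # application address the fragment was built from
        self.links = {}     # exit target address -> linked Fragment


def count_dispatches(path, link=True):
    """Follow a sequence of direct-branch targets; count dispatch entries."""
    cache = {}
    dispatches = 0
    current = None
    for tag in path:
        if current is not None and tag in current.links:
            current = current.links[tag]    # linked exit: stay in the cache
            continue
        dispatches += 1                     # unlinked exit: stub, then dispatch
        target = cache.get(tag)
        if target is None:
            target = cache[tag] = Fragment(tag)
        if link and current is not None:
            # Patch the exit so it targets the fragment directly,
            # as jle <stub0> becomes jle <frag8> above.
            current.links[tag] = target
        current = target
    return dispatches
```

For two blocks that branch to each other 100 times, the unlinked model enters dispatch on all 200 transitions, while the linked model enters it only three times: once to start, and once for each exit the first time it is taken.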
        Improvement #3: Linking Indirect Branches
[Diagram: START → basic block builder → dispatch → context switch → BASIC BLOCK CACHE (non-control-flow instructions), with an indirect branch lookup]
• Application address mapped to code cache
Slowdown: 300x → 25x → 3x → 1.2x
                       Indirect Branch Transformation
Application code:
  ret

Code cache fragment:
  frag8:     mov   %ecx, ecx-slot
             pop   %ecx
             jmp   <ib_lookup>
  ib_lookup: ...
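
The body of ib_lookup is elided above. One plausible shape for it is a hashtable keyed by application address, sketched below. The table layout, hash function (low address bits), and linear probing are assumptions chosen for brevity, not a description of DynamoRIO's actual lookup routine.

```python
class IndirectBranchTable:
    """Open-addressed table: application address -> code cache address.

    Sketch only: assumes the table never fills, so probing always terminates.
    """

    def __init__(self, bits=4):
        self.mask = (1 << bits) - 1
        self.slots = [None] * (1 << bits)

    def add(self, app_addr, cache_addr):
        i = app_addr & self.mask
        while self.slots[i] is not None and self.slots[i][0] != app_addr:
            i = (i + 1) & self.mask                 # linear probing
        self.slots[i] = (app_addr, cache_addr)

    def lookup(self, app_addr):
        i = app_addr & self.mask
        while self.slots[i] is not None:
            if self.slots[i][0] == app_addr:
                return self.slots[i][1]             # hit: jump within the cache
            i = (i + 1) & self.mask
        return None                                 # miss: back to dispatch


def translated_ret(stack, table):
    """Model of the translated `ret`: pop the target, then look it up."""
    app_target = stack.pop()                        # pop %ecx
    return app_target, table.lookup(app_target)     # jmp <ib_lookup>
```

Note that the stack still holds application addresses, never code cache addresses. The extra compare and probe instructions on every indirect branch are the overhead this transformation introduces.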
                       Improvement #4: Trace Building
[Figure: the Basic Block Cache holds blocks A through L; the Trace Cache holds two traces, A B E F H D and G K J]
• Traces reduce branching, improve layout and locality, and
  facilitate optimizations across blocks
     – We avoid indirect branch lookup
• Next Executing Tail (NET) trace building scheme [Duesterwald
  2000]
                     Incremental NET Trace Building
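[Figure: three snapshots of incremental trace building. The Basic Block Cache holds G, J, and then K; the trace in the Trace Cache grows from G to G K to G K J]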
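
The NET idea can be sketched in a few lines: count executions of potential trace heads, and once a head is hot, record whatever blocks execute next until the path loops back. The sketch below makes several simplifying assumptions that are not taken from the slides: trace heads are only targets of backward branches, "backward" is judged by comparing block addresses, and the threshold of 3 is arbitrary.

```python
def net_traces(block_path, threshold=3):
    """Build traces from a sequence of executed basic block addresses."""
    counters = {}       # trace head -> execution count
    traces = {}         # trace head -> tuple of block addresses
    recording = None    # blocks of the trace currently being built
    prev = None
    for addr in block_path:
        backward = prev is not None and addr <= prev
        if recording is not None:
            if backward or addr in traces:
                # The tail ends at a backward branch or an existing trace.
                traces[recording[0]] = tuple(recording)
                recording = None
            else:
                recording.append(addr)      # next executing tail
        if recording is None and backward and addr not in traces:
            counters[addr] = counters.get(addr, 0) + 1
            if counters[addr] >= threshold:
                recording = [addr]          # head is hot: start a trace
        prev = addr
    return traces
```

For a loop that executes blocks at addresses 10, 20, 30 repeatedly (standing in for G, K, J), the head at 10 becomes hot after three backward branches and the single trace (10, 20, 30) is recorded on the following iteration.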
                      Improvement #4: Trace Building
[Diagram: START → basic block builder and trace selector → dispatch → context switch → BASIC BLOCK CACHE and TRACE CACHE (non-control-flow instructions), with an indirect branch lookup and an "indirect branch stays on trace?" check]
Slowdown: 300x → 25x → 3x → 1.2x → 1.1x
                                      Base Performance
[Chart: base performance on SPEC CPU2000, server, and desktop benchmarks]
                                    Sources of Overhead
• Extra instructions
     – Indirect branch target comparisons
     – Indirect branch hashtable lookups
• Extra data cache pressure
     – Indirect branch hashtable
• Branch mispredictions
     – ret becomes jmp*
• Application code modification
                       Time Breakdown for SPECINT
Component                                                       Share of time
Basic block builder, trace selector, dispatch, context switch   < 1%
Basic block cache: non-control-flow instructions                ~ 0%
Indirect branch lookup                                          ~ 4%
Trace cache: non-control-flow instructions                      ~ 94%
Trace cache: indirect branch stays on trace?                    ~ 2%
                            Not An Ordinary Application
• An application executing in DynamoRIO’s code cache looks
  different from what the underlying hardware has been tuned
  for
• The hardware expects:
     – Little or no dynamic code modification
             • Writes to code are expensive
     – call and ret instructions
             • Return Stack Buffer predictor
                            Performance Counter Data
                                           Internals Outline
• Efficient
     –    Software code cache overview
     –    Thread-shared code cache
     –    Cache capacity limits
     –    Data structures
• Transparent
• Comprehensive
• Customizable
                                          Threading Model
                                          Running Program
          Thread1                   Thread2       Thread3    …   ThreadN
                             Code Caching Runtime System
          Thread1                   Thread2       Thread3    …   ThreadN
                                          Operating System
          Thread1                   Thread2       Thread3    …   ThreadN
                                             Code Space
[Diagram: threads of the Running Program sit above the Operating System threads. With Thread-Private Code Caches, each thread has its own cache; with a Thread-Shared Code Cache, the threads use one cache]
               Thread-Private versus Thread-Shared
• Thread-private
     – Less synchronization needed
     – Absolute addressing for thread-local storage
     – Thread-specific optimization and instrumentation
• Thread-shared
     – Scales to many-threaded apps
                     Database and Web Server Suite
   Benchmark   Server                                  Processes
   ab low      IIS low isolation                       inetinfo.exe
   ab med      IIS medium isolation                    inetinfo.exe, dllhost.exe
   guest low   IIS low isolation, SQL Server 2000      inetinfo.exe, sqlservr.exe
   guest med   IIS medium isolation, SQL Server 2000   inetinfo.exe, dllhost.exe, sqlservr.exe
                                           Memory Impact
[Chart: memory impact for ab med, guest low, and guest med]
                                    Performance Impact
                                          Scalability Limit
                                           Internals Outline
• Efficient
     –    Software code cache overview
     –    Thread-shared code cache
     –    Cache capacity limits
     –    Data structures
• Transparent
• Comprehensive
• Customizable
                            Added Memory Breakdown
                                          Code Expansion
• Original code: 66%
• Exit stubs: 19%
• Net jumps: 8%
• Indirect branch target handling: 7%
                            Cache Capacity Challenges
• How to set an upper limit on the cache size
     – Different applications have different working sets and different
       total code sizes
• Which fragments to evict when that limit is reached
     – Without expensive profiling or extensive fragmentation
                             Adaptive Sizing Algorithm
• Enlarge cache if warranted by percentage of new fragments that
  are regenerated
• Target working set of application: don’t enlarge for once-only
  code
• Low-overhead, incremental, and reactive
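
The sizing rule above can be sketched as a small simulation. Everything concrete here is an assumption for illustration: the FIFO eviction order, the 50% regeneration threshold, the window of 4 new fragments, and doubling as the growth step. The set of evicted tags is also kept unbounded, which a real system would avoid.

```python
class AdaptiveCache:
    def __init__(self, capacity, regen_threshold=0.5, window=4):
        self.capacity = capacity
        self.regen_threshold = regen_threshold
        self.window = window
        self.resident = []      # fragment tags in FIFO order
        self.evicted = set()    # tags that were built before and then evicted
        self.new_count = 0      # fragments built since the last check
        self.regen_count = 0    # of those, how many were rebuilt after eviction

    def access(self, tag):
        """Return True on a cache hit; build the fragment on a miss."""
        if tag in self.resident:
            return True
        self._add(tag)
        return False

    def _add(self, tag):
        self.new_count += 1
        if tag in self.evicted:
            self.regen_count += 1
            self.evicted.discard(tag)
        if len(self.resident) >= self.capacity:
            self._maybe_grow()
        if len(self.resident) >= self.capacity:
            self.evicted.add(self.resident.pop(0))  # still full: evict oldest
        self.resident.append(tag)

    def _maybe_grow(self):
        if self.new_count < self.window:
            return                                  # not enough samples yet
        ratio = self.regen_count / self.new_count
        self.new_count = self.regen_count = 0
        if ratio >= self.regen_threshold:
            self.capacity *= 2                      # working set does not fit
```

A stream of once-only fragments never regenerates anything, so the cache stays at its initial size. A loop over six fragments in a four-entry cache regenerates most of what it builds, so the cache grows until the working set fits.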
                                Cache Capacity Settings
• Thread-private:
     – Working set size matching is on by default
     – Client may see blocks or traces being deleted in the absence of
       any cache consistency event
     – Can disable capacity management via
             • -no_finite_bb_cache
             • -no_finite_trace_cache
• Thread-shared:
     – Set to infinite size by default
     – Can enable capacity management via
             • -finite_shared_bb_cache
             • -finite_shared_trace_cache
                                           Internals Outline
• Efficient
     –    Software code cache overview
     –    Thread-shared code cache
     –    Cache capacity limits
     –    Data structures
• Transparent
• Comprehensive
• Customizable
               Two Modes of Code Cache Operation
• Fine-grained scheme
     – Supports individual code fragment unlink and removal
     – Separate data structure per code fragment and each of its exits,
       memory regions spanned, and incoming links
• Coarse-grained scheme
     –    No individual code fragment control
     –    Permanent intra-cache links
     –    No per-fragment data structures at all
     –    Treat entire cache as a unit for consistency
                                           Data Structures
• Fine-grained scheme
     – Data structures are highly tuned and compact
• Coarse-grained scheme
     – There are no data structures
     – Savings on applications with large amounts of code are typically
       15%-25% of committed memory and 5%-15% of working set
                              Status in Current Release
• Fine-grained scheme
     – Current default
• Coarse-grained scheme
     –    Select with -opt_memory runtime option
     –    Possible performance hit on certain benchmarks
     –    In the future will be the default option
     –    Required for persisted and process-shared caches
                          Adaptive Level of Granularity
• Start with coarse-grain caches
     – Plus freezing and sharing/persisting
• Switch to fine-grain for individual modules or sub-regions of
  modules after significant consistency events, to avoid
  expensive entire-module flushes
     – Support simultaneous fine-grain fragments within coarse-grain
       regions for corner cases
• Match amount of bookkeeping to amount of code change
     – Majority of application code does not need fine-grain
                       Many Varieties of Code Caches
• Coarse-grained versus fine-grained
• Thread-shared versus thread-private
• Basic blocks versus traces
                                           Internals Outline
• Efficient
• Transparent
     – Rules of transparency
     – Cache consistency
     – Synchronization
• Comprehensive
• Customizable
                                           Transparency
• Do not want to interfere with the semantics of the program
• Dangerous to make any assumptions about:
     –    Register usage
     –    Calling conventions
     –    Stack layout
     –    Memory/heap usage
     –    I/O and other system call use
                                  Painful, But Necessary
• Difficult and costly to handle corner cases
• Many applications will not notice…
• …but some will!
     – Microsoft Office: Visual Basic generated code, stack convention
       violations
     – COM, Star Office, MMC: trampolines
     – Adobe Premiere: self-modifying code
     – VirtualDub: UPX-packed executable
     – etc.
                     Rule 1: Avoid Resource Conflicts
• DynamoRIO system code executes at arbitrary points during
  application execution
• If DynamoRIO uses the same library routine as the
  application, it may call that routine in the middle of the same
  routine being called by the application
• Most library routines are not re-entrant!
     – Many are thread-safe, but that does not help us
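
A small single-threaded sketch shows why thread safety does not help. The routine below is invented for illustration and stands in for a library routine that keeps shared static state; the `interrupt` callback stands in for the runtime gaining control in the middle of the application's call and using the same routine itself.

```python
_scratch = []   # shared static state, as in many C library routines


def format_number(n, interrupt=None):
    """Build the decimal string for n in the shared scratch buffer."""
    _scratch.clear()
    for digit in str(n):
        _scratch.append(digit)
        if interrupt is not None:
            interrupt()         # the runtime runs here, mid-routine
            interrupt = None
    return "".join(_scratch)


def runtime_code():
    """Hypothetical runtime code that calls the same library routine."""
    format_number(99)
```

Calling `format_number(123)` alone yields the expected string, but `format_number(123, interrupt=runtime_code)` returns a corrupted result because the nested call clears and refills the buffer. No second thread is involved, so a lock would not prevent this (and could deadlock instead).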
                    Rule 1: Avoid Resource Conflicts
                    Linux                   Windows
          Rule 2: If It’s Not Broken, Don’t Change It
• Threads
• Executable on disk
• Application data
     – Including the stack!
                     Example Transparency Violation
[Chart: SPEC CPU2000, server, and desktop benchmarks, with ten entries marked "Error"]
                       Rule 3: If You Change It, Emulate
                       Original Behavior’s Visible Effects
•     Application addresses
•     Address space
•     Error transparency
•     Code cache consistency
                                           Internals Outline
• Efficient
• Transparent
     – Rules of transparency
     – Cache consistency
     – Synchronization
• Comprehensive
• Customizable
                           Code Change Mechanisms
[Diagram: I-Cache and D-Cache, each holding lines A through D, for RISC and for x86]
• RISC: Store B, Flush B, Jump B
• x86: Store B, Jump B
                      How Often Does Code Change?
• Not just modification of code!
• Removal of code
     – Shared library unloading
• Replacement of code
     – JIT region re-use
     – Trampoline on stack
                                   Code Change Events
Benchmark    Memory Unmappings   Generated Code Regions   Modified Code Regions
SPECFP                     112                        0                       0
SPECINT                     29                        0                       0
SPECJVM                      7                     3373                    4591
Excel                      144                       21                      20
Photoshop                 1168                       40                       0
Powerpoint                 367                       28                      33
Word                       345                       20                       6
                               Detecting Code Removal
• Example: shared library being unloaded
• Requires explicit request by application to operating system
• Detect by monitoring system calls (munmap,
  NtUnmapViewOfSection)
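
A sketch of the reaction to an unmap follows. It is a simplification under stated assumptions: the code cache is modeled as a dictionary keyed by one application address per fragment, `on_syscall` is a hypothetical hook rather than a DynamoRIO API, and the hook is handed a (base, size) pair for both calls even though the real NtUnmapViewOfSection takes only a base address, so a real runtime would have to look up the region size itself.

```python
def flush_unmapped(cache, base, size):
    """Delete every fragment built from code in [base, base + size)."""
    stale = [tag for tag in cache if base <= tag < base + size]
    for tag in stale:
        del cache[tag]
    return len(stale)


def on_syscall(cache, name, args):
    """Hypothetical pre-syscall hook: react only to unmap requests."""
    if name in ("munmap", "NtUnmapViewOfSection"):
        base, size = args
        return flush_unmapped(cache, base, size)
    return 0
```

Because removal always goes through an explicit system call, this check is cheap: nothing needs to be done on ordinary memory writes.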
                           Detecting Code Modification
• On x86, no explicit app request required, as the icache is kept
  consistent in hardware – so any memory write could modify code!
[Diagram: x86 I-Cache and D-Cache, each holding lines A through D; Store B, then Jump B]
                Page Protection Plus Instrumentation
• Invariant: application code copied to code cache must be
  read-only
     – If writable, hide read-only status from application
• Some code cannot or should not be made read-only
     – Self-modifying code
     – Windows stack
     – Code on a page with frequently written data
• Use per-fragment instrumentation to ensure code is not stale
  on entry and to catch self-modification
                       Adaptive Consistency Algorithm
• Use page protection by default
     – Most code regions are always read-only
• Subdivide written-to regions to reduce flushing cost of write-
  execute cycle
     – Large read-only regions, small written-to regions
• Switch to instrumentation if write-execute cycle repeats too
  often (or on same page)
     – Switch back to page protection if writes decrease
                        Bruening et al. “Maintaining Consistency and Bounding Capacity
                                        of Software Code Caches” CGO’05
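The switching rule can be sketched as a small per-region state machine. This is an illustration only: the `region_t` type, the function names, and the threshold of 3 are assumptions, not the values or structures used by DynamoRIO or the cited paper.

```c
/* How a code region is kept consistent with the code cache. */
typedef enum { POLICY_PAGE_PROTECT, POLICY_INSTRUMENT } policy_t;

/* Hypothetical per-region state. */
typedef struct {
    policy_t policy;
    int write_exec_cycles;  /* write-then-execute cycles seen recently */
} region_t;

/* Illustrative threshold; a real system would tune this. */
enum { SWITCH_TO_INSTRUMENT = 3 };

/* Record one write-execute cycle: the region was written (forcing a
 * flush) and then executed again. Too many cycles make page protection
 * more expensive than per-fragment checks. */
static void region_on_write_exec(region_t *r)
{
    r->write_exec_cycles++;
    if (r->policy == POLICY_PAGE_PROTECT &&
        r->write_exec_cycles >= SWITCH_TO_INSTRUMENT)
        r->policy = POLICY_INSTRUMENT;
}

/* Called after a period with no writes: decay the count and fall back
 * to page protection once the region is quiet again. */
static void region_on_quiet_period(region_t *r)
{
    if (r->write_exec_cycles > 0)
        r->write_exec_cycles--;
    if (r->write_exec_cycles == 0)
        r->policy = POLICY_PAGE_PROTECT;
}
```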
                                           Internals Outline
• Efficient
• Transparent
     – Rules of transparency
     – Cache consistency
     – Synchronization
• Comprehensive
• Customizable
                        Synchronization Transparency
• Application thread management should not interfere with the
  runtime system, and vice versa
     – Cannot allow the app to suspend a thread holding a runtime
       system lock
     – Runtime system cannot use app locks
                                   Code Cache Invariant
• App thread suspension requires safe spots where no runtime
  system locks are held
• Time spent in the code cache can be unbounded
  → Our invariant: no runtime system lock can be held while
     executing in the code cache
                                              Internals Outline
•     Efficient
•     Transparent
•     Comprehensive
•     Customizable
                         Above the Operating System
                   Kernel-Mediated Control Transfers
[Figure: timeline of user mode versus kernel mode. Kernel: message pending, save user context. User: message handler. Kernel: no message pending, restore context. Annotation: majority of executed code in a typical Windows application.]
                              Intercepting Linux Signals
[Figure: timeline of user mode versus kernel mode. Kernel: signal pending, save user context. User: DynamoRIO handler, then signal handler. Kernel: no signal pending, restore context. Annotation: register our own signal handler.]
                                     Windows Messages
[Figure: timeline of user mode versus kernel mode. Kernel: message pending, save user context. User: dispatcher, then message handler. Kernel: no message pending, restore context.]
                         Intercepting Windows Messages
[Figure: timeline of user mode versus kernel mode. Kernel: message pending, save user context. User: dispatcher, dispatcher, then message handler. Kernel: no message pending, restore context. Annotation: modify shared library memory image.]
                             Must Monitor System Calls
• To maintain control:
     – Calls that affect the flow of control: register signal handler,
       create thread, set thread context, etc.
• To maintain transparency:
     – Queries that would expose modified state the app should not see
• To maintain cache consistency:
     – Calls that affect the address space
• To support cache eviction:
     – Interruptible system calls must be redirected
                     Operating System Dependencies
• System calls and their numbers
     – Monitor application’s usage, as well as for our own resource
       management
     – Windows changes the numbers each major release
• Details of kernel-mediated control flow
     – Must emulate how kernel delivers events
• Initial injection
     – Once in, follow child processes
                                           Clients
• The engine exports an API for
  building a client
• System details abstracted away:
  client focuses on manipulating
  the code stream
                                             Client Events
[Figure: DynamoRIO flow diagram with client hooks at START, at the basic block builder, and at the trace selector. Dispatch and context switch connect to the BASIC BLOCK CACHE (non-control-flow instructions, indirect branch lookup) and the TRACE CACHE (non-control-flow instructions, "indirect branch stays on trace?" check).]
Examples: Part I
                DynamoRIO Examples Part I Outline
• Common Steps of writing a DynamoRIO client
• Dynamic Instruction Counting Example
                                           Common Steps
• Step 1: Register Events
     – DR_EXPORT void dr_init(client_id_t id)
          Register Function                    Events
          dr_register_bb_event                 Basic Block Building
          dr_register_thread_init_event        Thread Initialization
          dr_register_exit_event               Process Exit
• Step 2: Implementation
     – Initialization
     – Finalization
     – Instrumentation
• Step 3: Optimization
     – Optimize the instrumentation to improve the performance
                    A Simplified View of DynamoRIO
[Figure: simplified DynamoRIO flow. START leads to the basic block builder; dispatch and context switch connect to the BASIC BLOCK CACHE.]
                                Step 1: Register Events
    #include "dr_api.h"
    uint num_dyn_instrs;
    static void event_init(void);
    static void event_exit(void);
    static dr_emit_flags_t event_basic_block(void *drcontext, void *tag, instrlist_t *ilist,
                                             bool for_trace, bool translating);
    /* helpers defined in Step 2 */
    static int ilist_num_instrs(instrlist_t *ilist);
    static void insert_count_code(void *drcontext, instrlist_t *ilist, int num_instrs);
    DR_EXPORT void dr_init(client_id_t id) {
      /* register events */
      dr_register_bb_event(event_basic_block);
      dr_register_exit_event(event_exit);
      /* process initialization event */
      event_init();
    }
                             Step 2: Implementation (I)
    static void event_init(void) {
       num_dyn_instrs = 0;
    }
    static void event_exit(void) {
       dr_printf("Total number of instructions executed: %u\n", num_dyn_instrs);
    }
    static dr_emit_flags_t event_basic_block(void *drcontext, void *tag, instrlist_t *ilist,
                                               bool for_trace, bool translating) {
       int num_instrs;
       num_instrs = ilist_num_instrs(ilist);
       insert_count_code(drcontext, ilist, num_instrs);
       return DR_EMIT_DEFAULT;
    }
                            Step 2: Implementation (II)
    static int ilist_num_instrs(instrlist_t *ilist) {
       instr_t *instr;
       int       num_instrs = 0;
       /* iterate over instruction list to count number of instructions */
       for (instr = instrlist_first(ilist); instr != NULL; instr = instr_get_next(instr))
           num_instrs++;
       return num_instrs;
    }
    static void do_ins_count(int num_instrs) { num_dyn_instrs += num_instrs; }
    static void insert_count_code(void * drcontext, instrlist_t * ilist, int num_instrs) {
       /* false: do not save floating-point state; 1: one argument follows */
       dr_insert_clean_call(drcontext, ilist, instrlist_first(ilist),
                            (void *)do_ins_count, false, 1,
                            OPND_CREATE_INT32(num_instrs));
    }
                              Instrumented Basic Block
                                     # switch stack
                                     # switch aflags and errno
                                     # save all registers
                                     # call do_ins_count
                                     push $0x00000003
                                     call $0xb7ef73e4 (do_ins_count)
                                     # restore registers
                                     # switch aflags and errno back
                                     # switch stack back
                                     # application code
                                     add $0x0000e574 %ebx -> %ebx
                                     test %al $0x08
                                     jz    $0xb80e8a98
  Step 3: Optimization (I): counter update inlining
    static void insert_count_code (void * drcontext, instrlist_t * ilist, int num_instrs) {
       instr_t *instr, *where;
       opnd_t opnd1, opnd2;
        where = instrlist_first(ilist);
        /* save aflags */
        dr_save_arith_flags(drcontext, ilist, where, SPILL_SLOT_1);
        /* num_dyn_instrs += num_instrs */
        /* counter is a 32-bit uint, so use a 4-byte memory operand */
        opnd1 = OPND_CREATE_ABSMEM(&num_dyn_instrs, OPSZ_4);
        opnd2 = OPND_CREATE_INT32(num_instrs);
        instr = INSTR_CREATE_add(drcontext, opnd1, opnd2);
        instrlist_meta_preinsert(ilist, where, instr);
        /* restore aflags */
        dr_restore_arith_flags(drcontext, ilist, where, SPILL_SLOT_1);
    }
                              Instrumented Basic Block
                                mov %eax -> %fs:0x0c
                                lahf -> %ah
                                seto -> %al
                                add $0x00000003, 0xb7d25030
                                add $0x7f %al -> %al
                                sahf %ah
                                mov %fs:0x0c -> %eax
                                # application code
                                add $0x0000e574 %ebx -> %ebx
                                test %al $0x08
                                jz    $0xb7f14a98
            Step 3: Optimization (II): aflags stealing
    static void insert_count_code (void * drcontext, instrlist_t * ilist, int num_instrs) {
       …
       save_aflags = aflags_analysis(ilist);
       /* save aflags */
       if (save_aflags)
           dr_save_arith_flags(drcontext, ilist, where, SPILL_SLOT_1);
       /* num_dyn_instrs += num_instrs */
       /* counter is a 32-bit uint, so use a 4-byte memory operand */
       opnd1 = OPND_CREATE_ABSMEM(&num_dyn_instrs, OPSZ_4);
       opnd2 = OPND_CREATE_INT32(num_instrs);
       instr = INSTR_CREATE_add(drcontext, opnd1, opnd2);
       instrlist_meta_preinsert(ilist, where, instr);
       /* restore aflags */
       if (save_aflags)
           dr_restore_arith_flags(drcontext, ilist, where, SPILL_SLOT_1);
    }
                              Instrumented Basic Block
                                add $0x00000003, 0xb7d25030
                                # application code
                                add $0x0000e574 %ebx -> %ebx
                                test %al $0x08
                                jz    $0xb7f14a98
      Step 3: Optimization (III): more optimizations
• Using lea (load effective address) instead of add
      lea [%reg, num_instr] -> %reg
• Register liveness analysis
      – Use a dead register to avoid register save/restore for lea
• Global aflags/registers analysis
      – Analyze aflags/registers liveness over CFG
• Trace Optimization
      – Trace: single-entry multi-exit
      – Update counters only at trace exits
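The flag liveness check behind aflags stealing can be sketched as a forward scan over one basic block. This is a standalone model, not DynamoRIO code: `insn_flags_t`, the bit encoding, and `aflags_need_save` are hypothetical, and a real client would derive the read/write sets from the decoded instructions.

```c
#include <stdbool.h>
#include <stddef.h>

/* One bit per arithmetic flag (C, P, A, Z, S, O). The encoding is
 * illustrative and does not match DynamoRIO's constants. */
enum { FLAGS_NONE = 0x00, FLAGS_ALL6 = 0x3f };

/* Hypothetical summary of one instruction's flag behavior. */
typedef struct {
    unsigned reads;   /* flags the instruction reads */
    unsigned writes;  /* flags the instruction writes */
} insn_flags_t;

/* Returns true if code inserted at the top of the block that clobbers
 * all six flags must save and restore them. A flag is dead at block
 * entry if the block writes it before reading it. Flags still
 * unresolved at the end of the block are treated as live. */
static bool aflags_need_save(const insn_flags_t *bb, size_t n)
{
    unsigned unknown = FLAGS_ALL6;  /* flags not yet overwritten */
    for (size_t i = 0; i < n && unknown != 0; i++) {
        if (bb[i].reads & unknown)
            return true;             /* app reads a flag we would clobber */
        unknown &= ~bb[i].writes;    /* these flags are now known dead */
    }
    return unknown != 0;             /* conservative at block end */
}
```

For the example block, the leading `add` writes all six flags before anything reads them, so no save or restore is needed. Extending the scan across the CFG is what the global analysis bullet refers to.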
                                           Other Issues
• Data race on counter update in multithreaded programs
     – Global lock for every update
     – Atomic update (lock prefixed add)
             • LOCK(instr);
     – Thread private counter
             • Thread-private code cache: different variable at different address
             • Thread-shared code cache: thread local storage
• 32-bit counter overflow
     – 64-bit counter:
             • Two instructions on 32-bit architecture: add, adc
     – One 32-bit local counter and one 64-bit global counter
             • Instrument to update 32-bit local counter
             • Update 64-bit global counter using a timer interrupt
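A minimal sketch of the two-level counter scheme, assuming one counter pair per thread so that no locking is needed. The `counter_t` type and function names are hypothetical; in a real client `counter_add` would be the inlined 32-bit add and `counter_flush` would run from a timer event.

```c
#include <stdint.h>

/* Hypothetical counter pair: inlined instrumentation touches only the
 * cheap 32-bit field; a periodic event folds it into the 64-bit total. */
typedef struct {
    uint32_t local;   /* updated by a single 32-bit add per block */
    uint64_t global;  /* updated rarely, off the hot path */
} counter_t;

/* What the inlined instrumentation does: one 32-bit addition. */
static void counter_add(counter_t *c, uint32_t num_instrs)
{
    c->local += num_instrs;
}

/* What the timer handler does: drain the local counter before it can
 * wrap. It must run before 2^32 instructions have been counted. */
static void counter_flush(counter_t *c)
{
    c->global += c->local;
    c->local = 0;
}

/* Current total, including counts not yet flushed. */
static uint64_t counter_total(const counter_t *c)
{
    return c->global + c->local;
}
```

The hot path stays a single 32-bit add on a 32-bit architecture, while the total no longer wraps at 2^32.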
DynamoRIO API
                                   DynamoRIO API Outline
•     Building and Deploying
•     Events
•     Utilities
•     Instruction Manipulation
•     State Translation
•     Comparison with Pin
•     Troubleshooting
                                           Clients
• The engine exports an API for
  building a client
• System details abstracted away:
  client focuses on manipulating
  the code stream
                                  Cross-Platform Clients
• DynamoRIO API presents a consistent interface that works
  across platforms
     – Windows versus Linux
     – 32-bit versus 64-bit
     – Thread-private versus thread-shared
• Same client source code generally works on all combinations
  of platforms
• Some exceptions, noted in the documentation
                                              Building a Client
•       Include DR API header file
        – #include "dr_api.h"
•       Set platform defines
        – WINDOWS or LINUX
        – X86_32 or X86_64
•       Export a dr_init function
        –      DR_EXPORT void dr_init (client_id_t client_id)
•       Build a shared library
                          Auto-Configure Using CMake
add_library(myclient SHARED myclient.c)
find_package(DynamoRIO)
if (NOT DynamoRIO_FOUND)
  message(FATAL_ERROR "DynamoRIO package required to build")
endif(NOT DynamoRIO_FOUND)
configure_DynamoRIO_client(myclient)
                                           CMake
• Build system converted to CMake when open-sourced
     – Switched from a frozen toolchain to supporting a range of tools
• CMake generates build files for native compiler of choice
     – Makefiles for UNIX, nmake, etc.
     – Visual Studio project files
• http://www.cmake.org/
                     Library Usage and Transparency
• For best transparency: completely self-contained client
     – Imports only from DynamoRIO API
     – -nodefaultlibs or /nodefaultlib
• On Windows:
     – String and utility routines provided by forwards to ntdll
     – cl.exe /MT: static copy of C/C++ libraries
     – Custom loader loads private copy of client dependences
• On Linux:
     –    Use ld -wrap to redirect malloc calls to DR’s heap
     –    Older distributions shipped suitable static C/C++ lib
     –    Newer distros: need to build yourself
     –    Coming soon: custom loader for private copy of libs
                                    DynamoRIO Extensions
•     DynamoRIO API is extended via libraries called Extensions
•     Both static and shared supported
•     Built and packaged with DynamoRIO
•     Easy for a client to use
        – use_DynamoRIO_extension(myclient drsyms)
• Current Extensions:
        – drsyms: symbol lookup (currently Windows-only)
        – drcontainers: hashtable
• Coming soon:
        – Umbra: shadow memory framework
        – Your utility library or framework contribution!
                               Application Configuration
• File-based scheme
• Per-user local files
     – $HOME/.dynamorio/ on Linux
     – $USERPROFILE/dynamorio/ on Windows
• Global files
     – /etc/dynamorio/ on Linux
     – Registry-specified directory on Windows
• Files are lists of var=value
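A minimal sketch of splitting one var=value line, to make the file format concrete. The slides do not give the variable names or the exact syntax rules, so `parse_config_line` and its handling of malformed lines are assumptions.

```c
#include <stdbool.h>
#include <string.h>

/* Splits one "var=value" line in place. On success, *var and *value
 * point into line and any trailing newline is removed. Returns false
 * if the line has no '=' or an empty variable name. */
static bool parse_config_line(char *line, char **var, char **value)
{
    char *eq = strchr(line, '=');
    if (eq == NULL || eq == line)
        return false;
    *eq = '\0';                                /* terminate the name */
    *var = line;
    *value = eq + 1;
    (*value)[strcspn(*value, "\r\n")] = '\0';  /* strip line ending */
    return true;
}
```

Only the first `=` separates name from value, so a value may itself contain `=`.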
                                           Deploying Clients
• One-step configure-and-run usage model:
     – drrun <client> <options> <app cmdline>
     – Uses an invisible temporary one-time configuration file
     – Overrides any regular config file
• Two-step usage model giving control over children:
     – drconfig -reg <appname> <client> <options>
     – drinject <app cmdline>
• Systemwide injection:
     – drconfig -syswide_on -reg <appname> <client> <options>
     – <run app normally>
                            Deploying Clients On Linux
• drrun and drinject scripts: LD_PRELOAD-based
     – Take over after statically-dependent shared libs but before exe
• Suid apps ignore LD_PRELOAD
     – Place libdrpreload.so's full path in /etc/ld.so.preload
     – Copy libdynamorio.so to /usr/lib
• In the future:
     – Attach
     – Early injection
                        Deploying Clients On Windows
• drinject and drrun injection
     – Currently after all shared libs are initialized
• From-parent injection
     – Early: before any shared libs are loaded
• Systemwide injection via -syswide_on
     – Requires administrative privileges
     – Launch app normally: no need to run via drinject/drrun
     – Moderately early: during user32.dll initialization
• In the future:
     – Earliest injection for drrun/drinject and from-parent
                             Non-Standard Deployment
• Standalone API
     – Use DynamoRIO as a library of IA-32/AMD64 manipulation
       routines
• Start/Stop API
     – Can annotate application source code to mark where DynamoRIO
       should control the application
                                           Runtime Options
• Pass options to drconfig/drrun
• A large number of options; the most relevant are:
     –    -code_api
     –    -client <client lib> <client ops> <client id>
     –    -thread_private
     –    -tracedump_text and -tracedump_binary
     –    -prof_pcs
                      Runtime Options For Debugging
• Notifications:
     – -stderr_mask 0xN
     – -msgbox_mask 0xN
• Windows:
     – -no_hide
• Debug-build-only:
     – -loglevel N
     – -ignore_assert_list '*'
                                             Client Events
[Figure: DynamoRIO flow diagram with client hooks at START, at the basic block builder, and at the trace selector. Dispatch and context switch connect to the BASIC BLOCK CACHE (non-control-flow instructions, indirect branch lookup) and the TRACE CACHE (non-control-flow instructions, "indirect branch stays on trace?" check).]
                            Client Events: Code Stream
• Client has opportunity to inspect and potentially modify every
  single application instruction, immediately before it executes
• Entire application code stream
     – Basic block creation event: can modify the block
     – For comprehensive instrumentation tools
• Or, focus on hot code only
     – Trace creation event: can modify the trace
     – Custom trace creation: can determine trace end condition
     – For optimization and profiling tools
                                  Simplifying Client View
• Several optimizations disabled
     –    Elision of unconditional branches
     –    Indirect call to direct call conversion
     –    Shared cache sizing
     –    Process-shared and persistent code caches
• Future release will give client control over optimizations
                                           Basic Block Event
static dr_emit_flags_t
event_basic_block(void *drcontext, void *tag,
                   instrlist_t *bb, bool for_trace,
                   bool translating) {
    instr_t *inst;
    for (inst = instrlist_first(bb);
         inst != NULL;
         inst = instr_get_next(inst)) {
        /* … */
    }
    return DR_EMIT_DEFAULT;
}
DR_EXPORT void dr_init(client_id_t id) {
    dr_register_bb_event(event_basic_block);
}
                                           Trace Event
static dr_emit_flags_t
event_trace(void *drcontext, void *tag,
            instrlist_t *trace, bool translating) {
    instr_t *inst;
    for (inst = instrlist_first(trace);
         inst != NULL;
         inst = instr_get_next(inst)) {
        /* … */
    }
    return DR_EMIT_DEFAULT;
}
DR_EXPORT void dr_init(client_id_t id) {
    dr_register_trace_event(event_trace);
}
                    Client Events: Application Actions
• Application thread creation and deletion
• Application library load and unload
• Application exception (Windows)
     – Client chooses whether to deliver or suppress
• Application signal (Linux)
     – Client chooses whether to deliver, suppress, bypass the app
       handler, or redirect control
             Client Events: Application System Calls
• Application pre- and post- system call
     – Platform-independent system call parameter access
     – Client can modify:
             • Return value in post-, or set value and skip syscall in pre-
             • Call number
             • Params
     – Client can invoke an additional system call as the app
                            Client Events: Bookkeeping
• Initialization and Exit
     – Entire process
     – Each thread
     – Child of fork (Linux-only)
• Basic block and trace deletion during cache management
• Nudge received
     – Used for communication into client
• Itimer fired (Linux-only)
                                           Multiple Clients
• It is each client's responsibility to ensure compatibility with
  other clients
     – Instruction stream modifications made by one client are visible to
       other clients
• At client registration each client is given a priority
     – dr_init() called in priority order (priority 0 called first and thus
       registers its callbacks first)
• Event callbacks called in reverse order of registration
     – Gives precedence to first registered callback, which is given the
       final opportunity to modify the instruction stream or influence
       DynamoRIO's operation
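The ordering rule can be modeled with a small callback list. This is a standalone sketch, not the DynamoRIO implementation: `event_list_t`, `event_register`, `event_dispatch`, and the two sample callbacks are hypothetical names.

```c
#include <stddef.h>

#define MAX_CALLBACKS 8

/* Hypothetical event callback: may rewrite the value it is given,
 * standing in for a modification to the instruction stream. */
typedef int (*event_cb_t)(int value);

typedef struct {
    event_cb_t cbs[MAX_CALLBACKS];
    size_t count;
} event_list_t;

/* Appends a callback; returns 0 on success, -1 if the list is full. */
static int event_register(event_list_t *ev, event_cb_t cb)
{
    if (ev->count == MAX_CALLBACKS)
        return -1;
    ev->cbs[ev->count++] = cb;
    return 0;
}

/* Invokes callbacks last-registered first, so the first-registered
 * callback runs last and sees every earlier modification. */
static int event_dispatch(const event_list_t *ev, int value)
{
    for (size_t i = ev->count; i > 0; i--)
        value = ev->cbs[i - 1](value);
    return value;
}

/* Two sample callbacks standing in for two clients. */
static int times_ten(int v) { return v * 10; }
static int add_one(int v)   { return v + 1; }
```

If the priority 0 client registers `times_ten` first and a second client registers `add_one`, dispatching the value 2 runs `add_one` first and `times_ten` last, giving 30: the first-registered client has the final say.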
                    DynamoRIO API: General Utilities
• Transparency support
     – Separate memory allocation and I/O
     – Alternate stack
• Thread support
     – Thread-local memory
     – Simple mutexes
     – Thread-private code caches, if requested
          DynamoRIO API: General Utilities, Cont’d
• Communication
     – Nudges: ping from external process
     – File creation, reading, and writing
• Sideline support
     – Create new client-only thread
     – Thread-private itimer (Linux-only)
          DynamoRIO API: General Utilities, Cont’d
• Application inspection
     –    Address space querying
     –    Module iterator
     –    Processor feature identification
     –    Symbol lookup (currently Windows-only)
• Third-party library support
     –    If transparency is maintained!
     –    -wrap support on Linux
     –    ntdll.dll link support on Windows
     –    Custom loader for private library copy on Windows
                                           DynamoRIO Heap
• Three flavors:
     –    Thread-private: no synchronization; thread lifetime
     –    Global: synchronized, process lifetime
     –    “Non-heap”: for generated code, etc.
     –    No header on allocated memory: low overhead but must pass
          size on free
• Leak checking
     – Debug build complains at exit if memory was not deallocated
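A standalone model of the "pass size on free" interface and the exit-time leak check. It wraps `malloc` purely for illustration (which does keep its own headers); `heap_t` and the `heap_*` functions are hypothetical and are not the DynamoRIO allocation routines.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical heap handle: no per-block size is recorded, so the
 * caller must pass the size back on free. Outstanding bytes are
 * tracked so a debug build could report leaks at exit. */
typedef struct {
    size_t bytes_outstanding;
} heap_t;

/* Allocates size bytes and records them as outstanding. */
static void *heap_alloc(heap_t *h, size_t size)
{
    void *p = malloc(size);
    if (p != NULL)
        h->bytes_outstanding += size;
    return p;
}

/* Frees a block; size must match the value passed to heap_alloc. */
static void heap_free(heap_t *h, void *p, size_t size)
{
    if (p == NULL)
        return;
    h->bytes_outstanding -= size;
    free(p);
}

/* Returns the number of leaked bytes; zero means all memory was freed. */
static size_t heap_leak_check(const heap_t *h)
{
    return h->bytes_outstanding;
}
```

Passing the wrong size on free corrupts the accounting, which is the cost of dropping the per-allocation header.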
          DynamoRIO API: Instruction Representation
•     Full IA-32/AMD64 instruction representation
•     Instruction creation with auto-implicit-operands
•     Operand iteration
•     Instruction lists with iteration, insertion, removal
•     Decoding at various levels of detail
•     Encoding
                             Instruction Representation
raw bytes           opcode   operands                  eflags
8d 34 01            lea      (%ecx,%eax,1) -> %esi     -
8b 46 0c            mov      0xc(%esi) -> %eax         -
2b 46 1c            sub      0x1c(%esi) %eax -> %eax   WCPAZSO
0f b7 4e 08         movzx    0x8(%esi) -> %ecx         -
c1 e1 07            shl      $0x07 %ecx -> %ecx        WCPAZSO
3b c1               cmp      %eax %ecx                 WCPAZSO
0f 8d a2 0a 00 00   jnl      $0x77f52269               RSO
                             Instruction Representation
raw bytes           opcode   operands                  eflags
                    lea      (%ecx,%eax,1) -> %edi     -
                    mov      0xc(%edi) -> %eax         -
                    sub      0x1c(%edi) %eax -> %eax   WCPAZSO
                    movzx    0x8(%edi) -> %ecx         -
c1 e1 07            shl      $0x07 %ecx -> %ecx        WCPAZSO
3b c1               cmp      %eax %ecx                 WCPAZSO
0f 8d a2 0a 00 00   jnl      $0x77f52269               RSO
                                         Instruction Creation
•       Method 1: use the INSTR_CREATE_opcode macros that fill
        in implicit operands automatically:
        instr_t *instr = INSTR_CREATE_dec(dcontext,
           opnd_create_reg(REG_EDX));
•       Method 2: specify opcode + all operands (including implicit
        operands):
        instr_t *instr = instr_create(dcontext);
        instr_set_opcode(instr, OP_dec);
        instr_set_num_opnds(dcontext, instr, 1, 1);
        instr_set_dst(instr, 0, opnd_create_reg(REG_EDX));
        instr_set_src(instr, 0, opnd_create_reg(REG_EDX));
                                      Linear Control Flow
• Both basic blocks and traces are
  linear
• Instruction sequences are all
  single-entrance, multiple-exit
• Greatly simplifies analysis
  algorithms
                                      64-Bit Versus 32-Bit
• 32-bit build of DynamoRIO only handles 32-bit code
• 64-bit build of DynamoRIO decodes/encodes both 32-bit and
  64-bit code
     – Current release does not support executing applications that mix
       the two
• IR is universal: covers both 32-bit and 64-bit
     – Abstracts away underlying mode
                 64-Bit Thread and Instruction Modes
• When going to or from the IR, the thread mode and instruction
  mode determine how instrs are interpreted
• When decoding, current thread’s mode is used
     – Default is 64-bit for 64-bit DynamoRIO
     – Can be changed with set_x86_mode()
• When encoding, that instruction’s mode is used
     – When created, set to mode of current thread
     – Can be changed with instr_set_x86_mode()
                                           64-Bit Clients
• Define X86_64 before including header files when building a
  64-bit client
• Convenience macros for printf formats, etc. are provided
     – E.g.:
             • printf("Pointer is "PFX"\n", p);
• Use “X” macros for cross-platform registers
     – REG_XAX is REG_EAX when compiled 32-bit, and REG_RAX
       when compiled 64-bit
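The idea behind such format macros can be sketched in portable C. `MY_PFX` and `format_pointer` below are hypothetical names, not DynamoRIO definitions, and the exact width and padding are assumptions: the macro expands to a format string whose field width follows the pointer size of the build, so the same `printf` call compiles unchanged as 32-bit or 64-bit.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for a pointer-format convenience macro: the
 * field width follows the pointer size of the build. */
#if UINTPTR_MAX > 0xffffffffu
#    define MY_PFX "0x%016" PRIxPTR /* 64-bit build: 16 hex digits */
#else
#    define MY_PFX "0x%08" PRIxPTR /* 32-bit build: 8 hex digits */
#endif

/* Format a pointer into buf. Returns the number of characters written,
 * excluding the terminator, as snprintf does. */
static int format_pointer(char *buf, size_t len, const void *p)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "Pointer is " MY_PFX "\n", (uintptr_t)p);
}
```

The register macros follow the same compile-time selection: one name in the client source, two expansions depending on the build.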
                DynamoRIO API: Code Manipulation
• Processor information
• State preservation
     – Eflags, arith flags, floating-point state, MMX/SSE state
     – Spill slots, TLS
• Clean calls to C code
• Dynamic instrumentation
     – Replace code in the code cache
• Branch instrumentation
     – Convenience routines
                                   Processor Information
• Processor type
     – proc_get_vendor(), proc_get_family(), proc_get_type(),
       proc_get_model(), proc_get_stepping(), proc_get_brand_string()
• Processor features
     – proc_has_feature(), proc_get_all_feature_bits()
• Cache information
     – proc_get_cache_line_size(), proc_is_cache_aligned(),
       proc_bump_to_end_of_cache_line(),
       proc_get_containing_page()
     – proc_get_L1_icache_size(), proc_get_L1_dcache_size(),
       proc_get_L2_cache_size(), proc_get_cache_size_str()
                                           State Preservation
• Spill slots for registers
     – 3 fast slots, plus 6 (32-bit) or 14 (64-bit) slower slots
     – dr_save_reg(), dr_restore_reg(), and dr_reg_spill_slot_opnd()
     – from C code: dr_read_saved_reg(), dr_write_saved_reg()
• Dedicated TLS field for thread-local data
     – dr_insert_read_tls_field(), dr_insert_write_tls_field()
     – from C code: dr_get_tls_field(), dr_set_tls_field()
• Arithmetic flag preservation
     – dr_save_arith_flags(), dr_restore_arith_flags()
• Floating-point/MMX/SSE state
     – dr_insert_save_fpstate(), dr_insert_restore_fpstate()
                           Thread-Local Storage (TLS)
• Absolute addressing
     – Thread-private only
• Application stack
     – Not reliable or transparent
• Stolen register
     – Performance hit
• Segment
     – Best solution for thread-shared
                                              Clean Calls
if (instr_is_mbr(instr)) {
  app_pc address = instr_get_app_pc(instr);
  uint opcode = instr_get_opcode(instr);
  instr_t *nxt = instr_get_next(instr);
  dr_insert_clean_call(drcontext, ilist, nxt, (void *) at_mbr,
                       false/*don't need to save fp state*/,
                       2 /* 2 parameters */,
                       /* opcode is 1st parameter */
                       OPND_CREATE_INT32(opcode),
                       /* address is 2nd parameter */
                       OPND_CREATE_INTPTR(address));
}
•     Saved interrupted application state can be accessed using
      dr_get_mcontext() and modified using dr_set_mcontext()
                               Dynamic Instrumentation
• Thread-shared: flush all code corresponding to application
  address and then re-instrument when re-executed
     – Can flush from clean call, and use dr_redirect_execution() since
       cannot return to potentially flushed cache fragment
• Thread-private: can also replace particular fragment (does not
  affect other potential copies of the source app code)
     – dr_replace_fragment()
                                      Flushing the Cache
• Immediately deleting or replacing individual code cache
  fragments is available for thread-private caches
     – Only removes from that thread’s cache
• Two basic types of thread-shared flush:
     – Non-precise: remove all entry points but let target cache code be
       invalidated and freed lazily
     – Precise/synchronous:
             • Suspend the world
             • Relocate threads inside the target cache code
             • Invalidate and free the target code immediately
                                      Flushing the Cache
• Thread-shared flush API routines:
     – dr_unlink_flush_region(): non-precise flush
     – dr_flush_region(): synchronous flush
     – dr_delay_flush_region():
             • No action until a thread exits code cache on its own
             • If a completion callback is provided, the flush is synchronous once triggered
             • Without a callback, the flush is non-precise
                          DynamoRIO API: Translation
• Translation refers to the mapping of a code cache machine
  state (program counter, registers, and memory) to its
  corresponding application state
     – The program counter always needs to be translated
     – Registers and memory may also need to be translated
       depending on the transformations applied when copying into the
       code cache
                               Translation Case 1: Fault
[Figure labels: user context, faulting instr. (each shown twice)]
• Exception and signal handlers are passed machine context of
  the faulting instruction.
• For transparency, that context must be translated from the
  code cache to the original code location
• Translated location should be where the application would
  have had the fault or where execution should be resumed
                        Translation Case 2: Relocation
• If one application thread suspends another, or DynamoRIO
  suspends all threads for a synchronous cache flush:
     – Need suspended target thread in a safe spot
     – Not always practical to wait for it to arrive at a safe spot (if in a
       system call, e.g.)
• DynamoRIO forcibly relocates the thread
     – Must translate its state to the proper application state at which to
       resume execution
                                 Translation Approaches
• Two approaches to program counter translation:
     – Store mappings generated during fragment building
             • High memory overhead (> 20% for some applications, because it
               prevents internal storage optimizations) even with highly optimized
               difference-based encoding. Costly for something rarely used.
     – Re-create mapping on-demand from original application code
             • Cache consistency guarantees mean the corresponding application
               code is unchanged
             • Requires idempotent code transformations
• DynamoRIO supports both approaches
     – The engine mostly uses the on-demand approach, but stored
       mappings are occasionally needed
                            Instruction Translation Field
• Each instruction contains a translation field
• Holds the application address that the instruction corresponds
  to
• Set via instr_set_translation()
                Context Translation Via Re-Creation
                                  A1: mov   %ebx, %ecx
                                  A2: add   %eax, (%ecx)
                                  A3: cmp   $4, (%eax)
                                  A4: jle   710349fb
  C1: mov             %ebx, %ecx               D1: (A1) mov   %ebx, %ecx
  C2: add             %eax, (%ecx)             D2: (A2) add   %eax, (%ecx)
  C3: cmp             $4, (%eax)               D3: (A3) cmp   $4, (%eax)
  C4: jle             <stub0>                  D4: (A4) jle   <stub0>
  C5: jmp             <stub1>                  D5: (A4) jmp   <stub1>
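The lookup implied by the listing above can be sketched in plain C. This is a hypothetical model, not DynamoRIO's implementation: `recreated_instr_t` and `translate_offset` are invented names, and the instruction lengths used with it are made up. The fragment is re-created from the application code (the D column), each instruction carries its translation (A1 to A4), and the walk sums encoded lengths until it reaches the code cache offset of interest.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one instruction in a re-created fragment. */
typedef struct {
    size_t length;         /* encoded length in the code cache, in bytes */
    uintptr_t translation; /* application address it corresponds to */
} recreated_instr_t;

/* Walk the re-created fragment from its start, summing encoded lengths,
 * until reaching the instruction that contains cache_offset. Returns that
 * instruction's application address, or 0 if the offset lies outside the
 * fragment. */
static uintptr_t
translate_offset(const recreated_instr_t *frag, size_t count, size_t cache_offset)
{
    size_t start = 0;
    assert(frag != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (cache_offset < start + frag[i].length)
            return frag[i].translation;
        start += frag[i].length;
    }
    return 0;
}
```

Note that two cache instructions may share one translation, as D4 and D5 both map to A4: the added jmp exists only in the code cache.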
                       Meta vs. Non-Meta Instructions
• Non-Meta instructions are treated as application instructions
     – They must have translations
     – Control flow changing instructions are modified to retain
       DynamoRIO control and result in cache populating
• Meta instructions are added instrumentation code
     – Not treated as part of the application (e.g., calls run natively)
     – Cannot fault, so translations not needed
• Meta-may-fault instructions
     – Can fault, but should not be “interpreted”: won’t modify app code
     – Fault typically deliberate and handled by client
• Xref instr_set_ok_to_mangle() and
   instr_set_meta_may_fault()
                              Client Translation Support
• Instruction lists passed to clients are annotated with
  translation information
     – Read via instr_get_translation()
     – Clients are free to delete instructions, change instructions and
       their translations, and add new meta and non-meta instructions
       (see dr_register_bb_event() for restrictions)
     – An idempotent client that restricts itself to deleting app
       instructions and adding non-faulting meta instructions can ignore
       translation concerns
     – DynamoRIO takes care of instructions added by API routines
       (dr_insert_clean_call(), etc.)
• Clients can choose between storing or regenerating
  translations on a fragment by fragment basis.
                      Client Regenerated Translations
• Client returns DR_EMIT_DEFAULT from its bb or trace event
  callback
• Client bb & trace event callbacks are re-called when
  translations are needed with translating==true
• Client must exactly duplicate transformations performed when
  the block was generated
• Client must set translation field for all added non-meta
  instructions and all meta-may-fault instructions
     – This is true even if translating==false since DynamoRIO may
       decide it needs to store translations anyway
                              Client Stored Translations
• Client returns DR_EMIT_STORE_TRANSLATIONS from its
  bb or trace event callback
• Client must set translation field for all added non-meta
  instructions and all meta-may-fault instructions
• Client bb or trace hook will not be re-called with
  translating==true
                              Register State Translation
• Translation may be needed at a point where some registers
  are spilled to memory
     – During indirect branch or RIP-relative mangling, e.g.
• DynamoRIO walks fragment up to translation point, tracking
  register spills and restores
     – Special handling for stack pointer around indirect calls and
       returns
• DynamoRIO tracks client spills and restores implicitly added
  by API routines
     – Clean calls, etc.
     – Explicit spill/restore (e.g., dr_save_reg()) client’s responsibility
                      Client Register State Translation
• If a client adds its own register spilling/restoring code or
  changes register mappings it must register for the restore
  state event to correct the context
• The same event can also be used to fix up the application’s
  view of memory
• DynamoRIO does not internally store this kind of translation
  information ahead of time when the fragment is built
     – The client must maintain its own data structures
                                 DynamoRIO versus Pin
• Basic interface is fundamentally different
• Pin = insert callout/trampoline only
     – Not so different from tools that modify the original code: Dyninst,
       Vulcan, Detours
     – Uses code cache only for transparency
• DynamoRIO = arbitrary code stream modifications
     – Only feasible with a code cache
     – Takes full advantage of power of code cache
     – General IA-32/AMD64 decode/encode/IR support
                                 DynamoRIO versus Pin
• Pin = insert callout/trampoline only
     – Pin tries to inline and optimize
     – Client has little control or guarantee over final performance
• DynamoRIO = arbitrary code stream modifications
     – Client has full control over all inserted instrumentation
     – Result can be significant performance difference
             • PiPA Memory Profiler + Cache Simulator:
               3.27x speedup w/ DynamoRIO vs 2.6x w/ Pin
         Base Performance Comparison (No Tool)
[Figure: bar chart; values shown: 171%, 121%]
                            Base Memory Comparison
[Figure: bar chart; values shown: 44MB, 15MB, 3.5MB, 2.6MB]
                                           BBCount Pin Tool
static int bbcount;
VOID PIN_FAST_ANALYSIS_CALL docount() { bbcount++; }
VOID Trace(TRACE trace, VOID *v) {
    for (BBL bbl = TRACE_BblHead(trace); BBL_Valid(bbl); bbl = BBL_Next(bbl)) {
        BBL_InsertCall(bbl, IPOINT_ANYWHERE, AFUNPTR(docount),
                       IARG_FAST_ANALYSIS_CALL, IARG_END);
    }
}
int main(int argc, CHAR *argv[]) {
    PIN_InitSymbols();
    PIN_Init(argc, argv);
    TRACE_AddInstrumentFunction(Trace, 0);
    PIN_StartProgram();
    return 0;
}
                             BBCount DynamoRIO Tool
static int global_count;
static dr_emit_flags_t
event_basic_block(void *drcontext, void *tag, instrlist_t *bb,
                   bool for_trace, bool translating) {
    instr_t *instr, *first = instrlist_first(bb);
    uint flags;
    /* Our inc can go anywhere, so find a spot where flags are dead.
     * Technically this can be unsafe if app reads flags on fault =>
     * stop at instr that can fault, or supply runtime op */
    for (instr = first; instr != NULL; instr = instr_get_next(instr)) {
        flags = instr_get_arith_flags(instr);
        /* OP_inc doesn't write CF but not worth distinguishing */
        if (TESTALL(EFLAGS_WRITE_6, flags) && !TESTANY(EFLAGS_READ_6, flags))
            break;
    }
    if (instr == NULL)
        dr_save_arith_flags(drcontext, bb, first, SPILL_SLOT_1);
    instrlist_meta_preinsert(bb, (instr == NULL) ? first : instr,
        INSTR_CREATE_inc(drcontext, OPND_CREATE_ABSMEM((byte *)&global_count, OPSZ_4)));
    if (instr == NULL)
        dr_restore_arith_flags(drcontext, bb, first, SPILL_SLOT_1);
    return DR_EMIT_DEFAULT;
}
DR_EXPORT void dr_init(client_id_t id) {
    dr_register_bb_event(event_basic_block);
}
                    BBCount: Pin Inlining Importance
[Figure: bar chart; values shown: 569%, 231%]
                 BBCount Performance Comparison
[Figure: bar chart; values shown: 233%, 226%, 185%]
                                           Obtaining Help
• Read the documentation
     – http://dynamorio.org/docs/
• Look at the sample clients
     – In the documentation
     – In the release package: samples/
• Ask on the DynamoRIO Users discussion forum/mailing list
     – http://groups.google.com/group/dynamorio-users
                                           Debugging Clients
• Use the DynamoRIO debug build for asserts
     – Often point out the problem
• Use logging
     – -loglevel N
     – stored in logs/ subdir of DR install dir
• Attach a debugger
     –    gdb or windbg
     –    -msgbox_mask 0xN
     –    -no_hide
     –    windbg: .reload myclient.dll=0xN
• More tips:
     – http://code.google.com/p/dynamorio/wiki/Debugging
                                           Reporting Bugs
• Search the Issue Tracker off http://dynamorio.org first
     – http://code.google.com/p/dynamorio/issues/list
• File a new Issue if not found
• Follow conventions on wiki
     – http://code.google.com/p/dynamorio/wiki/BugReporting
     – CRASH, APP CRASH, HANG, ASSERT
• Example titles:
     – CRASH (1.3.1 calc.exe)
       vm_area_add_fragment:vmareas.c(4466)
     – ASSERT (1.3.0 suite/tests/common/segfault)
       study_hashtable:fragment.c:1745 ASSERT_NOT_REACHED
                        Changes From Prior Releases
• Backward compatible with 1.0 (0.9.6) and above
     – Except configuration and deployment scheme and tools:
       switched to file-based scheme to support unprivileged and
       parallel execution on Windows
• Not backward compatible with 0.9.1-0.9.5
Examples: Part 2
                                           More Examples
• Dynamic Optimization
     – Strength Reduction
     – Software Prefetching
• Profiling
     – Memory Reference Trace
     – PiPA
• Shadow Memory
     – Umbra
• Dr. Memory
                 Dynamic Optimization Opportunities
• Traditional compiler optimizations
     – Compiler has limited view: application assembled at runtime
     – Some shipped products are built without optimizations
• Microarchitecture-specific optimizations
     – Feature set and relative performance of instructions vary
     – Combinatorial blowup if done statically
• Adaptive optimizations
     – Need runtime information: prior profiling runs not always
       representative
     – Execution phases change over the course of a run
               Dynamic Optimization in DynamoRIO
• Traces are natural unit for optimization
     – Focus only on hot code
     – Cross procedure, file and module boundaries
• Linear control flow
     – Single-entry, multi-exit simplifies analysis
• Support for adaptive optimization
     – Can replace traces dynamically
               DynamoRIO with Trace Optimization
[Figure: system diagram. Components: START, basic block builder, trace selector, dispatch, context switch, BASIC BLOCK CACHE, TRACE CACHE, indirect branch lookup, indirect branch stays on trace?]
                       Strength Reduction: inc to add
• Pentium 4
     – inc is slower than add 1
     – dec is slower than sub 1
• Pentium 3
     – inc is faster than add 1
     – dec is faster than sub 1
• Microarchitecture-specific optimization best performed
  dynamically
/* Forward declarations */
static dr_emit_flags_t event_trace(void *drcontext, void *tag, instrlist_t *trace,
                                   bool translating);
static bool replace_inc_with_add(void *drcontext, instr_t *instr, instrlist_t *trace);

DR_EXPORT void dr_init(client_id_t id) {
    /* Pentium 4? */
    if (proc_get_family() == FAMILY_PENTIUM_IV)
        dr_register_trace_event(event_trace);
}
static dr_emit_flags_t
event_trace(void *drcontext, void *tag, instrlist_t *trace, bool translating) {
    instr_t *instr, *next_instr;
    int opcode;
    /* Look for inc / dec */
    for (instr = instrlist_first(trace); instr != NULL; instr = next_instr) {
        /* Fetch the successor first: instr may be replaced and destroyed */
        next_instr = instr_get_next(instr);
        opcode = instr_get_opcode(instr);
        if (opcode == OP_inc || opcode == OP_dec)
            replace_inc_with_add(drcontext, instr, trace);
    }
    return DR_EMIT_DEFAULT;
}
static bool replace_inc_with_add(void *drcontext, instr_t *instr, instrlist_t *trace) {
    instr_t *in;
    uint eflags;
    int opcode = instr_get_opcode(instr);
    bool ok_to_replace = false;
    /* Ensure eflags change ok: add/sub write CF, inc/dec do not, so CF
     * must be overwritten before it is read and before any trace exit */
    for (in = instr; in != NULL; in = instr_get_next(in)) {
        eflags = instr_get_arith_flags(in);
        if ((eflags & EFLAGS_READ_CF) != 0) return false;
        if ((eflags & EFLAGS_WRITE_CF) != 0) {
            ok_to_replace = true;
            break;
        }
        if (instr_is_exit_cti(in)) return false;
    }
    if (!ok_to_replace) return false;
    /* Replace with add / sub */
    if (opcode == OP_inc)
        in = INSTR_CREATE_add(drcontext, instr_get_dst(instr, 0), OPND_CREATE_INT8(1));
    else
        in = INSTR_CREATE_sub(drcontext, instr_get_dst(instr, 0), OPND_CREATE_INT8(1));
    instr_set_prefixes(in, instr_get_prefixes(instr));
    instrlist_replace(trace, instr, in);
    instr_destroy(drcontext, instr);
    return true;
}
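The safety check in replace_inc_with_add() is a small liveness scan, which can be isolated in plain C. The sketch below is a hypothetical model: the `READS_CF`, `WRITES_CF`, and `EXIT_CTI` bits and the `carry_flag_is_dead` helper are invented for illustration, and a real client would query each instruction through the API as in the listing above. Because inc and dec leave the carry flag untouched while add and sub write it, the swap is only invisible to the application if CF is overwritten before anything reads it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-instruction summary bits. */
enum {
    READS_CF = 1 << 0,  /* instruction reads the carry flag */
    WRITES_CF = 1 << 1, /* instruction writes the carry flag */
    EXIT_CTI = 1 << 2   /* instruction can leave the trace */
};

/* Returns true if the carry flag is dead from index start onward: some
 * instruction overwrites CF before any instruction reads it and before
 * any trace exit. */
static bool
carry_flag_is_dead(const unsigned *summary, size_t count, size_t start)
{
    assert(summary != NULL && start <= count);
    for (size_t i = start; i < count; i++) {
        if (summary[i] & READS_CF)
            return false; /* a reader would observe the changed CF */
        if (summary[i] & WRITES_CF)
            return true; /* CF is overwritten first: the change is invisible */
        if (summary[i] & EXIT_CTI)
            return false; /* code beyond the exit is unknown */
    }
    return false; /* reached the end without seeing a CF write */
}
```

The read test comes first on purpose: an instruction such as adc both reads and writes CF, and it must count as a reader.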
                       Strength Reduction Results
[Figure: bar chart of normalized execution time (axis 0.0 to 1.2) for base and inc2add across the benchmarks ammp, applu, apsi, art, equake, mesa, mgrid, sixtrack, swim, wupwise, and har. mean. Annotation: 2% mean speedup.]
                                     Software Prefetching
• Ubiquitous Memory Introspection (CGO 2007)
     – Sampling to select hot traces
     – Instrument to collect memory references
     – Analyze to discover reference patterns
             • Stride
     – Insert software prefetching instruction
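
The "analyze to discover reference patterns" step can be illustrated with a small stride detector. This is a minimal sketch in plain C, not code from the tool itself: `detect_stride`, `prefetch_target`, and the `distance` parameter are hypothetical names chosen for illustration, and a real implementation would work on addresses collected by instrumentation in hot traces.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns the constant stride (in bytes) between consecutive addresses,
 * or 0 if there are fewer than two addresses or the stride varies. */
ptrdiff_t detect_stride(const uintptr_t *addrs, size_t n)
{
    assert(addrs != NULL);
    if (n < 2)
        return 0;
    ptrdiff_t stride = (ptrdiff_t)(addrs[1] - addrs[0]);
    for (size_t i = 2; i < n; i++) {
        if ((ptrdiff_t)(addrs[i] - addrs[i - 1]) != stride)
            return 0;
    }
    return stride;
}

/* Address a prefetch would target: `distance` strides past `cur`.
 * `distance` is a hypothetical tuning knob, not a value from the slides. */
uintptr_t prefetch_target(uintptr_t cur, ptrdiff_t stride, unsigned distance)
{
    return cur + (uintptr_t)(stride * (ptrdiff_t)distance);
}
```

With a detected stride, the optimizer would insert a prefetch of `prefetch_target(addr, stride, distance)` next to the original reference.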
                         Software Prefetching Results
                                           More Examples
• Dynamic Optimization
     – Strength Reduction
     – Software Prefetching
• Profiling
     – Memory Reference Trace
     – PiPA
• Shadow Memory
     – Umbra
• Dr. Memory
                               Memory Reference Trace
• Memtrace
     – Profile format
             • <r/w, addr>
     – Steps
             • Thread initialization
                    – Allocate buffer per thread
             • Instrumentation
                    – Fill the buffer
                    – Dump to file if buffer is full
             • Thread exit
                    – Delete the buffer
     – Optimization
             • Inline buffer filling code
             • Out-line the clean call invocation code
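
The Memtrace steps above can be pictured without any DynamoRIO calls. The following is a minimal sketch in plain C, assuming a deliberately tiny fixed-size buffer; `trace_buf_t`, `trace_init`, `trace_record`, and `trace_flush` are hypothetical names. In the real client the fast path is emitted as inlined instrumentation and the flush runs through a clean call, rather than through ordinary C function calls as here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define MAX_REFS 4  /* tiny on purpose so the flush path is easy to exercise */

typedef struct {
    bool write;       /* true for a write, false for a read */
    uintptr_t addr;   /* referenced address */
} mem_ref_t;

typedef struct {
    mem_ref_t buf[MAX_REFS];
    size_t count;     /* entries currently buffered */
    size_t num_refs;  /* entries flushed so far */
    FILE *log;        /* may be NULL to discard output */
} trace_buf_t;

/* "Thread initialization": set up an empty per-thread buffer. */
void trace_init(trace_buf_t *t, FILE *log)
{
    assert(t != NULL);
    t->count = 0;
    t->num_refs = 0;
    t->log = log;
}

/* Slow path: dump buffered <r/w, addr> entries and reset the buffer. */
void trace_flush(trace_buf_t *t)
{
    assert(t != NULL);
    for (size_t i = 0; i < t->count; i++) {
        if (t->log != NULL)
            fprintf(t->log, "%c:0x%llx\n", t->buf[i].write ? 'w' : 'r',
                    (unsigned long long)t->buf[i].addr);
    }
    t->num_refs += t->count;
    t->count = 0;
}

/* Fast path: append one entry, flushing only when the buffer fills. */
void trace_record(trace_buf_t *t, bool write, uintptr_t addr)
{
    assert(t != NULL && t->count < MAX_REFS);
    t->buf[t->count].write = write;
    t->buf[t->count].addr = addr;
    if (++t->count == MAX_REFS)
        trace_flush(t);
}
```

A final `trace_flush` at thread exit would write out any entries still buffered before the buffer is freed.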
                                           Register Events
DR_EXPORT void
dr_init(client_id_t id)
{
    …
    mutex = dr_mutex_create();
    dr_register_exit_event(event_exit);
    dr_register_thread_init_event(event_thread_init);
    dr_register_thread_exit_event(event_thread_exit);
    dr_register_bb_event(event_basic_block);
    /* out-lined clean call invocation code */
    code_cache_init();
    …
}
                         Own Code Cache Initialization
static void code_cache_init(void) {
    drcontext = dr_get_current_drcontext();
    code_cache = dr_nonheap_alloc(PAGE_SIZE, DR_MEMPROT_READ |
                                             DR_MEMPROT_WRITE |
                                             DR_MEMPROT_EXEC);
    ilist = instrlist_create(drcontext);
    /* the out-lined code ends by jumping back to the address held in XCX */
    where = INSTR_CREATE_jmp_ind(drcontext, opnd_create_reg(REG_XCX));
    instrlist_meta_append(ilist, where);
    /* insert the clean call ahead of the jump back */
    dr_insert_clean_call(drcontext, ilist, where, (void *)clean_call, false, 0);
    /* encode the instructions into the allocated page, then clean up */
    end = instrlist_encode(drcontext, ilist, code_cache, false);
    DR_ASSERT((end - code_cache) < PAGE_SIZE);
    instrlist_clear_and_destroy(drcontext, ilist);
    /* drop write permission now that the code has been emitted */
    dr_memory_protect(code_cache, PAGE_SIZE, DR_MEMPROT_READ |
                                             DR_MEMPROT_EXEC);
}
                                           Clean Call
static void clean_call(void) {
    …
    drcontext = dr_get_current_drcontext();
    data      = dr_get_tls_field(drcontext);
    mem_ref   = (mem_ref_t *)data->buf_base;
    num_refs  = (int)((mem_ref_t *)data->buf_ptr - mem_ref);
    for (i = 0; i < num_refs; i++) {
        dr_fprintf(data->log, "%c:"PFX"\n", mem_ref->write ? 'w' : 'r', mem_ref->addr);
        ++mem_ref;
    }
    data->num_refs += num_refs;
    …
}
                              Thread initialization & exit
void event_thread_init(void *drcontext) {
    …
    data = dr_thread_alloc(drcontext, sizeof(per_thread_t));
    dr_set_tls_field(drcontext, data);
    data->buf_base = dr_thread_alloc(drcontext, MAX_MEM_BUF_SIZE);
    …
    data->num_refs = 0;
}
void event_thread_exit(void *drcontext) {
    data = dr_get_tls_field(drcontext);
    dr_mutex_lock(mutex);
    num_refs += data->num_refs;
    dr_mutex_unlock(mutex);
    dr_thread_free(drcontext, data->buf_base, MAX_MEM_BUF_SIZE);
    dr_thread_free(drcontext, data, sizeof(per_thread_t));
}
                            Basic Block Instrumentation
static dr_emit_flags_t
event_basic_block(void *drcontext, void *tag, instrlist_t *bb,
                  bool for_trace, bool translating) {
    …
    for (instr = instrlist_first(bb); instr != NULL; instr = instr_get_next(instr)) {
        /* skip instructions that have no application address */
        if (instr_get_app_pc(instr) == NULL) continue;
        if (instr_reads_memory(instr))
            for (i = 0; i < instr_num_srcs(instr); i++)
                if (opnd_is_memory_reference(instr_get_src(instr, i)))
                    instrument_mem(drcontext, bb, instr, i, false);
        if (instr_writes_memory(instr))
            for (i = 0; i < instr_num_dsts(instr); i++)
                if (opnd_is_memory_reference(instr_get_dst(instr, i)))
                    instrument_mem(drcontext, bb, instr, i, true);
    }
    return DR_EMIT_DEFAULT;
}
                                     instrument_mem (I)
  /* get memory reference address into reg1 */
  opnd1 = opnd_create_reg(reg1);
  if (opnd_is_base_disp(ref)) {
          /* lea [ref] -> reg */
          opnd2 = ref;
          opnd_set_size(&opnd2, OPSZ_lea);
          instr = INSTR_CREATE_lea(drcontext, opnd1, opnd2);
  } else if (IF_X64(opnd_is_rel_addr(ref) ||) opnd_is_abs_addr(ref)) {
          /* mov addr -> reg */
          opnd2 = OPND_CREATE_INTPTR(opnd_get_addr(ref));
          instr = INSTR_CREATE_mov_imm(drcontext, opnd1, opnd2);
  } else {
          instr = NULL;
          DR_ASSERT_MSG(false, "Unhandled instructions");
  }
  instrlist_meta_preinsert(ilist, where, instr);
                                     instrument_mem (II)
  /* Move write/read to write field */
  opnd1 = OPND_CREATE_MEM32(reg2, offsetof(mem_ref_t, write));
  opnd2 = OPND_CREATE_INT32(write);
  instr = INSTR_CREATE_mov_imm(drcontext, opnd1, opnd2);
  instrlist_meta_preinsert(ilist, where, instr);
  /* Store address in memory ref */
  opnd1 = OPND_CREATE_MEMPTR(reg2, offsetof(mem_ref_t, addr));
  opnd2 = opnd_create_reg(reg1);
  instr = INSTR_CREATE_mov_st(drcontext, opnd1, opnd2);
  instrlist_meta_preinsert(ilist, where, instr);
  /* Advance the buffer pointer by sizeof(mem_ref_t) using lea */
  opnd1 = opnd_create_reg(reg2);
  opnd2 = opnd_create_base_disp(reg2, REG_NULL, 0, sizeof(mem_ref_t), OPSZ_lea);
  instr = INSTR_CREATE_lea(drcontext, opnd1, opnd2);
  instrlist_meta_preinsert(ilist, where, instr);
                                       instrument_mem (III)
/* jecxz call */
call = INSTR_CREATE_label(drcontext);
instrlist_meta_preinsert(ilist, where, INSTR_CREATE_jecxz(drcontext, opnd_create_instr(call)));
/* jump restore to skip clean call */
restore = INSTR_CREATE_label(drcontext);
instrlist_meta_preinsert(ilist, where,
                          INSTR_CREATE_jmp(drcontext, opnd_create_instr(restore)));
instrlist_meta_preinsert(ilist, where, call);
/* mov restore REG_XCX */
instr = INSTR_CREATE_mov_st(drcontext,opnd_create_reg(reg2),opnd_create_instr(restore));
instrlist_meta_preinsert(ilist, where, instr);
/* jmp code_cache */
opnd1 = opnd_create_pc(code_cache);
instrlist_meta_preinsert(ilist, where, INSTR_CREATE_jmp(drcontext, opnd1));
/* restore %reg */
instrlist_meta_preinsert(ilist, where, restore);
                                           PiPA
• Pipelined Profiling and Analysis (CGO 2008)
     – Stages (thread/process)
             • Profiling
             • Reconstruction/extraction
             • Analysis
     – Profiling
             • Runtime Execution Profile (REP)
     – Communication
             • Double buffer
     – Analysis
             • Parallel cache simulation
                                           More Examples
• Dynamic Optimization
     – Strength Reduction
• Profiling
     – Memory Reference Trace
• Shadow Memory
     – Umbra
• Dr. Memory
                                           Shadow Memory
• Application
     – Store meta-data to track properties of application memory
             •   Millions of software watchpoints
             •   Dynamic information flow tracking (taint propagation)
             •   Race detection
             •   Memory usage debugging tool (MemCheck/Dr. Memory)
• Issues
     –    Performance
     –    Multi-thread applications
     –    Flexibility
     –    Platform dependent
     –    Development challenges
                                         Umbra (CGO 2010)
•     Design
•     Implementation
•     Optimization
•     Download
        – http://people.csail.mit.edu/qin_zhao/umbra/
                                                 Design
• Address Space
     – A collection of fixed size units
             • 4G (64-bit)
             • Application, Shadow, Unused
• Translation Table
     – Translation from application memory unit to
       corresponding shadow memory unit
        addr_shd = addr_app × scale + offset

          App Mem                      Shd Mem                      Offset
          [0x00000000, 0x10000000)     [0x20000000, 0x30000000)     0x20000000
          [0x60000000, 0x70000000)     [0x40000000, 0x50000000)     -0x20000000
          [0x80000000, 0x90000000)     [0x50000000, 0x60000000)     -0x20000000

[Figure: address space layout, in order: App Mem 1, Unused, Shd Mem 1, Unused, Shd Mem 2, Shd Mem 3, App Mem 2, App Mem 3]
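
The translation formula can be exercised with a small table lookup. This is a minimal sketch in plain C, not Umbra's implementation: `xl8_entry_t` and `app_to_shadow` are hypothetical names, the unit size matches the 0x10000000-byte ranges in the example table, the scale is assumed to be 1, and only the first two rows of the table are used as sample data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define UNIT_SIZE 0x10000000u  /* unit size used by the example table */

typedef struct {
    uint32_t app_base;  /* start of the application memory unit */
    int64_t offset;     /* added to the scaled application address */
} xl8_entry_t;

/* Applies addr_shd = addr_app * scale + offset using the entry whose unit
 * contains app_addr. Returns 1 on success, 0 if no unit covers app_addr. */
int app_to_shadow(const xl8_entry_t *table, size_t n, uint32_t scale,
                  uint32_t app_addr, uint32_t *shd_addr)
{
    assert(table != NULL && shd_addr != NULL);
    for (size_t i = 0; i < n; i++) {
        uint32_t base = table[i].app_base;
        if (app_addr >= base && app_addr - base < UNIT_SIZE) {
            *shd_addr = (uint32_t)((int64_t)app_addr * scale + table[i].offset);
            return 1;
        }
    }
    return 0;
}
```

A linear scan is used here only for brevity; the caches described on the following slides exist precisely to avoid doing a table walk on every memory reference.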
                                           Implementation
• Memory Manager
     – Monitor and control application memory allocation
             • brk, mmap, munmap, mremap
             • dr_register_pre_syscall_event
             • dr_register_post_syscall_event
     – Allocate shadow memory
     – Maintain translation table
• Instrumenter
     – Instrument every memory reference
             •   Context save
             •   Address calculation
             •   Address translation
             •   Shadow memory update
             •   Context restore
                             Instrument Code Example
Context Save                              mov %ecx -> [ECX_SLOT]
                                          mov %edx -> [EDX_SLOT]
                                          mov %eax -> [EAX_SLOT]
                                          lahf -> %ah
                                          seto -> %al
Address Calculation                       lea [%ebx, 16] -> %ecx
Address Translation                       mov 0 -> %edx
                                          …                # table lookup code
                                          add %ecx, table[%edx].offset -> %ecx
Shadow Memory Update                      mov 1 -> [%ecx]
Context Restore                           add %al 0x7f
                                          sahf
                                          mov [ECX_SLOT] -> %ecx
                                          mov [EDX_SLOT] -> %edx
                                          mov [EAX_SLOT] -> %eax
Application memory reference              mov 0 -> [%ebx, 16]
                                           Optimization
• Translation Optimization
     – Thread Local Translation Table
     – Memoization Check
     – Reference Check
• Instrumentation Optimization
     – Context Switch Reduction
     – Reference Grouping
     – 3-stage Code Layout
                                Translation Optimization
• Caching
[Figure: memory units App 1, Shd 2, Shd 1, App 2 alongside the global translation table]
                                Translation Optimization
• Thread Local Translation Optimization
     – Local translation table per thread
     – Synchronize with global translation table when necessary
     – Avoid lock contention
[Figure: Thread 1 and Thread 2; lookup path: thread-local translation table, then global translation table]
                                Translation Optimization
• Memoization Cache
     – Software cache per thread
     – Stores frequently used translation entries
             • Stack
             • Units found in last table lookup
[Figure: Thread 1 and Thread 2; lookup path: memoization cache, thread-local translation table, then global translation table]
                                Translation Optimization
• Reference Cache
     – Software cache per static application memory reference
             • Last reference unit tag
             • Last translation offset
[Figure: Thread 1 and Thread 2; lookup path: reference cache, memoization cache, thread-local translation table, then global translation table]
                           Instrumentation Optimization
• Context Switch Reduction
     – Registers liveness analysis
• Reference Grouping
     – One translation lookup for multiple references
             • Stack local variables
             • Different members of the same object
• 3-stage Code Layout
     – Inline stub
             • Quick inline check code with minimal context switch
     – Lean procedure
             • Simple assembly procedure with partial context switch
     – Callout
             • C function with complete context switch
                                     3-stage Code Layout
• Inline stub
     – Reference cache check
     – Jump to lean procedure if miss
• Lean procedure
     – Memoization cache check
     – Local table lookup
     – Clean call to call out
• Callout
     – Global table synchronization
     – Local table lookup
                          Instrumentation Optimization
Inline Stub
# reference cache check
   lea [ref] -> %r1
   %r1 & 0xffffffff00000000 -> %r1
   cmp %r1, ref.tag
   je .update_shadow_memory
# jmp-and-link to lean procedure
   mov %r1 -> ref.tag
   mov .update_ref_cache -> [ret_pc]
   jmp lean_procedure
.update_ref_cache
   mov %r1 -> ref.offset
# shadow memory update
.update_shadow_memory
   lea [ref] -> %r1
   add %r1 + ref.offset -> %r1
   mov 1 -> [%r1]

Lean Procedure
# memoization check
   cmp %r1, cache1.tag
   jne .cache1_miss
   mov cache1.offset -> %r1
   jmp [ret_pc]
.cache1_miss
   cmp %r1, cache2.tag
   jne .cache2_miss
   mov cache2.offset -> %r1
   jmp [ret_pc]
.cache2_miss
# table lookup
   mov %r1 -> cache2.tag
   mov %r2 -> [R2_SLOT]
   …
   mov [R2_SLOT] -> %r2
   mov %r1 -> cache2.offset
   jmp [ret_pc]
                               Performance Evaluation
           Umbra Client: Shared Memory Detection
static void instrument_update(void *drcontext, umbra_info_t *umbra_info,
                              mem_ref_t *ref, instrlist_t *ilist, instr_t *where) {
   …
   /* test [%reg].tid_map, tid_map */
   opnd1 = OPND_CREATE_MEM32(umbra_info->reg, 0);
   opnd2 = OPND_CREATE_INT32(client_tls_data->tid_map);
   instrlist_meta_preinsert(ilist, where, INSTR_CREATE_test(drcontext, opnd1, opnd2));
   /* jnz where */
   opnd1 = opnd_create_instr(where);
   instrlist_meta_preinsert(ilist, where, INSTR_CREATE_jcc(drcontext, OP_jnz, opnd1));
   /* or */
   opnd1 = OPND_CREATE_MEM32(umbra_info->reg, 0);
   opnd2 = OPND_CREATE_INT32(client_tls_data->tid_map | 1);
   instr = INSTR_CREATE_or(drcontext, opnd1, opnd2);
   LOCK(instr);
   instrlist_meta_preinsert(ilist, label, instr);
}
                                           More Examples
• Dynamic Optimization
     – Strength Reduction
• Profiling
     – Memory Reference Trace
• Shadow Memory
     – Umbra
• Dr. Memory
                                           Dr. Memory
• Detects reads of uninitialized memory
• Detects heap errors
     –    Out-of-bounds accesses (underflow, overflow)
     –    Access to freed memory
     –    Invalid frees
     –    Memory leaks
• Detects other accesses to invalid memory
     – Stack tracking
     – Thread-local storage slot tracking
• Operates at runtime on unmodified Windows & Linux binaries
                            Dr. Memory Instrumentation
• Monitor all memory accesses, stack adjustments, and heap
  allocations
• Shadow each byte of app memory
• Each byte’s shadow stores one of 4 values:
     –    Unaddressable
     –    Uninitialized
     –    Defined at byte level
     –    Defined at bit level -> escape to extra per-bit shadow values
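
Four states fit in two bits, so one shadow byte can describe four application bytes. The following is a minimal sketch in plain C of such a packed shadow array. The helper names are hypothetical, and the numeric encoding is an assumption: defined=0 and uninitialized=1 follow the Shadowing Registers slide, while the values for the other two states are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum {
    SHADOW_DEFINED = 0,          /* defined at byte level */
    SHADOW_UNINITIALIZED = 1,
    SHADOW_UNADDRESSABLE = 2,    /* assumed value */
    SHADOW_DEFINED_BITLEVEL = 3  /* assumed value; real state lives elsewhere */
} shadow_val_t;

/* Reads the 2-bit state of application byte `app_byte`. */
shadow_val_t shadow_get(const uint8_t *shadow, size_t app_byte)
{
    assert(shadow != NULL);
    unsigned shift = (unsigned)(app_byte % 4) * 2;
    return (shadow_val_t)((shadow[app_byte / 4] >> shift) & 0x3u);
}

/* Writes the 2-bit state of application byte `app_byte`. */
void shadow_set(uint8_t *shadow, size_t app_byte, shadow_val_t v)
{
    assert(shadow != NULL);
    unsigned shift = (unsigned)(app_byte % 4) * 2;
    uint8_t *cell = &shadow[app_byte / 4];
    *cell = (uint8_t)((*cell & ~(0x3u << shift)) | ((unsigned)v << shift));
}
```

Here `app_byte` is an index relative to the start of the shadowed region; mapping a real address to that index is the job of the shadow translation scheme.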
                                             Dr. Memory
[Figure: shadow state by region. Shadow Stack (for Stack): defined, undefined, defined, invalid. Shadow Heap (for Heap): redzone = invalid; malloc = defined, undefined, defined; redzone = invalid; freed = invalid]
  Partial-Word Defines But Whole-Word Transfers
• Sub-dword variables are moved around as whole dwords
• Cannot raise error when a move reads uninitialized bits
• Must propagate on moves and thus must shadow registers
     – Propagate shadow values by mirroring app data flow
• Check system call reads and propagate system call writes
     – Else, false negatives (reads) or positives (writes)
• Raise errors instead of propagating at certain points
     – Report errors only on “significant” reads
                                    Shadowing Registers
• Use multiple TLS slots
     – dr_raw_tls_calloc()
     – Alternative: steal register
• Can read and write w/o spilling
• Bring into spilled register to combine w/ other args
     – Defined=0, uninitialized=1
     – Combine via bitwise or
                             Monitoring Stack Changes
• As stack is extended and contracts again, must update stack
  shadow as unaddressable vs uninitialized
• Push, pop, or any write to stack pointer
• Try to distinguish large alloc/dealloc from stack swap
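
The stack shadow update can be sketched as a handler that runs on every stack pointer change. This is a minimal model in plain C, not Dr. Memory's code: the stack is a small array indexed by address, the stack grows toward lower addresses, and `SWAP_THRESHOLD` is a hypothetical cutoff used to guess that a large jump is a stack swap rather than an allocation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { SH_UNADDRESSABLE, SH_UNINITIALIZED, SH_DEFINED };

#define STACK_BYTES 64
#define SWAP_THRESHOLD 32  /* hypothetical: larger jumps are treated as swaps */

typedef struct {
    uint8_t shadow[STACK_BYTES];  /* one state per stack byte */
    size_t sp;                    /* current stack pointer (array index) */
} stack_shadow_t;

/* Updates the shadow for a stack pointer change. Returns 1 if handled as
 * an ordinary adjustment, 0 if treated as a stack swap (shadow untouched). */
int on_sp_change(stack_shadow_t *s, size_t new_sp)
{
    assert(s != NULL && new_sp <= STACK_BYTES);
    size_t old_sp = s->sp;
    size_t delta = new_sp > old_sp ? new_sp - old_sp : old_sp - new_sp;
    s->sp = new_sp;
    if (delta > SWAP_THRESHOLD)
        return 0;
    if (new_sp < old_sp)  /* stack extended: new bytes are uninitialized */
        memset(&s->shadow[new_sp], SH_UNINITIALIZED, old_sp - new_sp);
    else                  /* stack contracted: popped bytes become unaddressable */
        memset(&s->shadow[old_sp], SH_UNADDRESSABLE, new_sp - old_sp);
    return 1;
}
```

A fixed threshold is only a heuristic; a very large local allocation would be misread as a swap, which is why the slide says "try to distinguish".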
                      Kernel-Mediated Stack Changes
• Kernel places data on the stack and removes it again
     – Windows: APC, callback, and exception
     – Linux: signals
• Linux signals as an example:
     – intercept sigaltstack changes
     – intercept handler registration to instrument handler code
     – use DR's signal event to record app xsp at interruption point
     – when see event followed by handler, check which stack and
       mark from either interrupted xsp or altstack base to cur xsp as
       defined (ignoring padding)
     – record cur xsp in handler, and use to undo on sigreturn
                               Types Of Instrumentation
• Clean call
     – Simplest, but expensive in both time and space
• Shared clean call
     – Saves space
• Lean procedure
     – Shared routine with smaller context switch than full clean call
• Inlined
     – Smallest context switch, but should limit to small sequences of
       instrumentation
                                 Non-Code-Cache Code
• Use dr_nonheap_alloc() to allocate space to store code
• Generate code using DR’s IR and emit to target space
• Mark read-only once emitted via dr_memory_protect()
                                           Jump-and-Link
• Rather than using call+return, avoid stack swap cost by using
  jump-and-link
     – Store return address in a register or TLS slot
     – Direct jump to target
     – Indirect jump back to source
     PRE(bb, inst, INSTR_CREATE_mov_st(drcontext,
         spill_slot_opnd(drcontext, SPILL_SLOT_2),
         opnd_create_instr(appinst)));
     PRE(bb, inst, INSTR_CREATE_jmp(drcontext,
         opnd_create_pc(shared_slowpath_region)));
     ...
     PRE(ilist, NULL, INSTR_CREATE_jmp_ind(drcontext,
         spill_slot_opnd(drcontext, SPILL_SLOT_2)));
                                Inter-Instruction Storage
• Spill slots provided by DR are only guaranteed to be live
  during a single app instr
     – In practice, live until next selfmod instr
• Allocate own TLS for spill slots
     – dr_raw_tls_calloc()
• Steal registers across whole bb
     – Restore before each app read
     – Update spill slot after each app write
     – Restore on fault
     Using Faults For Faster Common Case Code
• Instead of explicitly checking for rare cases, use faults to
  handle them and keep common case code path fast
• Signal and exception event and restore state extended event
  all provide pre- and post-translation contexts and containing
  fragment information
• Client can return failure for extended restore state event
     – When can support re-execution of faulting cache instr, but not
       re-start translation for relocation
                                Address Space Iteration
• Repeated calls to dr_query_memory_ex()
• Check dr_memory_is_in_client() and
  dr_memory_is_dr_internal()
• Heap walk
     – API on Windows
• Initial structures on Windows
     – TEB, TLS, etc.
     – PEB, ProcessParameters, etc.
                          Intercepting Library Routines
• Common task
• Dr. Memory monitors malloc, calloc, realloc, free,
  malloc_usable_size, etc.
     – Alternative is to replace w/ own copies
• Locating entry point
     – Module API
• Pre-hooks are easy
• Post-hooks are hard
     – Three techniques, each with its own limitations
         Intercepting Library Routines: Technique 1
• CFG analysis at init time
     – Statically analyze code from entry point and find return
       instruction(s)
     – Post-hook placed at each return instruction
• Complications:
     – Not always easy or even possible to statically analyze
             • Hot/cold and other layout optimizations
             • Switches or other indirection
             • Mixed code/data
     – Tailcall from hooked routine A to hooked routine B will skip a
       return in A
     – Longjmp or SEH unwind can skip any post-hook
         Intercepting Library Routines: Technique 2
• At call site identify target
     –    Direct calls/jmp: easy
     –    Indirect through PLT/IAT: easy
     –    Indirect through register/unknown memory: not always easy
     –    Post-hook is placed at post-call-site
     –    Flush post-call-site if it exists
• Complications:
     – Indirect targets that are not “statically” analyzable
     – Same call targets multiple hooked routines
     – Tailcall from hooked routine A to hooked routine B will skip the
       post-call site in A
     – Longjmp or SEH unwind can skip any post-hook
         Intercepting Library Routines: Technique 3
• Inside callee, obtain return address
     – Flush that address if it already exists
     – Post-hooks are placed at return address
• Complications:
     – Same call targets multiple hooked routines
             • Store which inside callee
     – Tailcall from hooked routine A to hooked routine B will skip
       return address of B
             • Identify tailcall and store target B; process B and then A at post-A
     – Longjmp or SEH unwind can skip any post-hook
             • Try to intercept
               Modifying Library Routine Parameters
• Use cases for Dr. Memory:
     – Add redzone to heap allocations
     – Delay frees
• Simply clobber the actual parameter
     – Need to know calling convention
     – Use dr_safe_write() for robustness
                             Replacing Library Routines
• Dr. Memory replaces libc routines containing optimized code
  that raises false positives
     – memcpy, strlen, strchr, etc.
• Simplification: arrange for routines to always be entered in a
  new bb
     – Do not request elision or indcall2direct from DR
• Want to interpret replaced routines
     – DR treats native execution differently: aborts on fault, etc.
• Replace entire bb with jump to replacement routine
                     State Across Windows Callbacks
• Per-thread state varies in whether it should be shared or private
  across callbacks
     – Pre-to-post syscall data must be callback-private
• Callback entry:
     – Ntdll!KiUserCallbackDispatcher
• Callback exit:
     – NtCallbackReturn system call
     – Int 0x2b
• Make these DynamoRIO Events? (Issue 241)
                            Delayed Fragment Deletion
• Due to non-precise flushing we can have a flushed bb made
  inaccessible but not actually freed for some time
• When keeping state per bb, if a duplicate bb is seen, replace
  the state and increment a counter ignore_next_delete
• On a deletion event, decrement and ignore unless below 0
• Can't tell apart from duplication due to thread-private copies:
  but this mechanism handles that if saved info is deterministic
  and identical for each copy
                                           Callstack Walking
• Use case: error reporting
• Technique:
     – Start with xbp as frame pointer (fp)
     – Look for <fp,retaddr> pairs where retaddr is inside a module
• Interesting issues:
     – When scanning for frame pointer (in frameless func, or at bottom
       of stack), querying whether in a module dominates performance
     – msvcr80!malloc pushes ebx and then ebp, requiring special
       handling
     – When displaying, use retaddr-1 for symbol lookup
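
The frame-pointer walk can be modeled on a fake stack. This is a minimal sketch in plain C under simplifying assumptions: the stack is an array of words whose indices stand in for addresses, each frame stores the saved fp at `stack[fp]` and the return address at `stack[fp + 1]`, and a single address range stands in for "inside a module". It follows the fp chain only and omits the scanning fallback for frameless functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Walks <fp, retaddr> pairs starting at fp and stores each return address
 * that falls inside [mod_start, mod_end). Returns the number stored. */
size_t walk_callstack(const uintptr_t *stack, size_t nwords, size_t fp,
                      uintptr_t mod_start, uintptr_t mod_end,
                      uintptr_t *out, size_t max_out)
{
    assert(stack != NULL && out != NULL);
    size_t n = 0;
    while (n < max_out && fp + 1 < nwords) {
        uintptr_t retaddr = stack[fp + 1];
        if (retaddr < mod_start || retaddr >= mod_end)
            break;  /* not a plausible return address */
        out[n++] = retaddr;
        size_t next = (size_t)stack[fp];
        if (next <= fp)
            break;  /* chain must move toward older frames */
        fp = next;
    }
    return n;
}
```

When printing the result, a symbol lookup on `retaddr - 1` lands inside the call instruction, as the last bullet above recommends.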
                    Client Files Closed By Application
• Some applications close all file descriptors
• Solution:
     – Keep table of file descriptors owned by client
     – Intercept close system call
     – Turn close into nop when called on client descriptors
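
The descriptor table and the close decision can be sketched as follows. This is a minimal model in plain C with hypothetical names (`client_fd_table_t`, `client_fd_add`, `client_owns_fd`, `allow_close`). In a DynamoRIO client the decision would be made in a callback registered with `dr_register_pre_syscall_event`, whose return value controls whether the system call executes; the fixed table size is an assumption made for brevity.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CLIENT_FDS 16  /* assumed capacity for the sketch */

typedef struct {
    int fds[MAX_CLIENT_FDS];
    size_t count;
} client_fd_table_t;

/* Records a descriptor opened by the client. Returns 0 if the table is full. */
int client_fd_add(client_fd_table_t *t, int fd)
{
    assert(t != NULL);
    if (t->count == MAX_CLIENT_FDS)
        return 0;
    t->fds[t->count++] = fd;
    return 1;
}

/* Returns 1 if fd belongs to the client, 0 otherwise. */
int client_owns_fd(const client_fd_table_t *t, int fd)
{
    assert(t != NULL);
    for (size_t i = 0; i < t->count; i++) {
        if (t->fds[i] == fd)
            return 1;
    }
    return 0;
}

/* Decision for an intercepted close(fd): 1 lets the system call run,
 * 0 skips it so the client's descriptor stays open. */
int allow_close(const client_fd_table_t *t, int fd)
{
    return !client_owns_fd(t, fd);
}
```

When the call is skipped, the application should still observe a successful result, so the hook would also need to set the system call's return value.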
                                  Suspending The World
• Use case: Dr. Memory leak check
     – GC-like memory scan
• Use dr_suspend_all_other_threads() and
  dr_resume_all_other_threads()
• Cannot hold locks while suspending
                                           Using Nudges
• Daemon apps do not exit
• Request results mid-run
• Cross-platform
     – Signal on Linux
     – Remote thread on Windows
                                           Tool Packaging
• DynamoRIO is redistributable, so can include a copy with your
  tool
• Front end to configure and launch app
     – On Linux use a script that execs drrun
     – On Windows use drinjectlib.dll
       Feedback
1:30-1:40   Welcome + DynamoRIO History
1:40-2:40   DynamoRIO Internals
2:40-3:00   Examples, Part 1
3:00-3:15   Break
3:15-4:15   DynamoRIO API
4:15-5:15   Examples, Part 2
5:15-5:30   Feedback
                                           Contributors
• Looking for contributors to DynamoRIO
     – Work on major features
             •   Windows 7 support
             •   Auto-inline callouts
             •   Attach to running process
             •   MacOS port
     –    Add new Extensions
     –    Run nightly test suites
     –    Maintain particular platforms
     –    Etc.
                                              Future Releases
• Better Linux library support (STL, etc.)
        – Custom loader
•     Auto-inline callouts
•     Persistent and process-shared caches
•     Attach to a process
•     Symbol table lookup support on Linux
•     64-bit client controlling 32-bit app
•     Your suggestion here
                                           Feedback
• Questions for you
     – Feedback on what you want to see in API
• Thank you!