The Python Interpreter is Fun and
 Not At All Terrifying: Opcodes
                              name: Alex Golec
                         twitter: @alexandergolec
                             not @alexgolec : (
  email: akg2136 (rhymes with cat) columbia dot (short for education)
                   this talk lives at: blog.alexgolec.com




Python is Bytecode-Interpreted


•   Your Python program is compiled down to bytecode

    •   Sort of like assembly for the Python virtual machine

•   The interpreter executes each of these bytecodes one by one
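
A minimal sketch of that compile step, assuming a modern CPython 3 rather than the 2.7.2 used in this talk (opcode names and encodings differ between versions). The built-in compile() turns source text into a code object, and its co_code attribute holds the raw bytecode. The source string and the "<example>" filename are made up for illustration.

```python
import dis

# Hypothetical one-line program, compiled the way the interpreter would compile a module.
source = "total = 1 + 2"
code = compile(source, "<example>", "exec")

# co_code is the raw bytecode: a plain bytes object.
raw = code.co_code
print(type(raw), len(raw))

# dis renders those bytes with human-readable mnemonics.
dis.dis(code)

# Executing the code object runs the bytecode and binds the name.
namespace = {}
exec(code, namespace)
print(namespace["total"])
```

The exact instructions printed depend on the interpreter version, so treat the output as something to explore rather than memorize.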




Before we Begin


•   This presentation was written using CPython 2.7.2, which ships with the
    Mac OS X Mountain Lion GM image

•   The more adventurous among you will find that minor details will differ
    on PyPy / IronPython / Jython




The Interpreter is Responsible For:

•   Issuing commands to objects and maintaining stack state

•   Flow Control

•   Managing namespaces

•   Turning code objects into functions and classes



Issuing Commands to Objects and
      Maintaining Stack State


The dis Module
>>> import dis
>>> def parabola(x):
...     return x*x + 4*x + 4
...
>>> dis.dis(parabola)
  2           0 LOAD_FAST                0 (x)
              3 LOAD_FAST                0 (x)
              6 BINARY_MULTIPLY
              7 LOAD_CONST               1 (4)
             10 LOAD_FAST                0 (x)
             13 BINARY_MULTIPLY
             14 BINARY_ADD
             15 LOAD_CONST               1 (4)
             18 BINARY_ADD
             19 RETURN_VALUE

•   Each instruction that takes an argument is exactly three bytes; instructions without an argument (such as BINARY_MULTIPLY) are a single byte, as the offsets show

•   Opcodes have friendly (ish) mnemonics
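
On Python 3.4 and later the same information is available programmatically through dis.get_instructions(). A short sketch, assuming a recent CPython 3: the mnemonics and offsets it prints will not match the 2.7.2 listing above (CPython 3.6 and later, for instance, use two-byte instruction units).

```python
import dis

def parabola(x):
    return x*x + 4*x + 4

# Each Instruction carries its byte offset, mnemonic and decoded argument.
instructions = list(dis.get_instructions(parabola))
for ins in instructions:
    print(ins.offset, ins.opname, ins.argrepr)

# Collect just the mnemonics for a quick overview.
opnames = [ins.opname for ins in instructions]
```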


Example: Arithmetic Operations
>>> def parabola(x):
...     return x*x + 4*x + 4
...
>>> dis.dis(parabola)
  2           0 LOAD_FAST                0 (x)
              3 LOAD_FAST                0 (x)
              6 BINARY_MULTIPLY
              7 LOAD_CONST               1 (4)
             10 LOAD_FAST                0 (x)
             13 BINARY_MULTIPLY
             14 BINARY_ADD
             15 LOAD_CONST               1 (4)
             18 BINARY_ADD
             19 RETURN_VALUE

•   We don’t know the type of x!

•   How does BINARY_MULTIPLY know how to perform multiplication?

•   What if I pass a string?

•   Note the lack of registers; the Python virtual machine is stack-based
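
To make "stack-based" concrete, here is a toy evaluator for the parabola listing. It is a sketch, not how CPython is implemented: the (mnemonic, argument) tuples, the PROGRAM list and the run() helper are all hypothetical, and only the handful of opcodes in the listing are supported.

```python
import operator

# Hypothetical instruction tuples mirroring the parabola listing: (mnemonic, argument).
PROGRAM = [
    ("LOAD_FAST", "x"),
    ("LOAD_FAST", "x"),
    ("BINARY_MULTIPLY", None),
    ("LOAD_CONST", 4),
    ("LOAD_FAST", "x"),
    ("BINARY_MULTIPLY", None),
    ("BINARY_ADD", None),
    ("LOAD_CONST", 4),
    ("BINARY_ADD", None),
    ("RETURN_VALUE", None),
]

BINARY_OPS = {"BINARY_MULTIPLY": operator.mul, "BINARY_ADD": operator.add}

def run(program, local_vars):
    """Execute the toy program with a single value stack and no registers."""
    stack = []
    for opname, arg in program:
        if opname == "LOAD_FAST":
            stack.append(local_vars[arg])      # push a local variable
        elif opname == "LOAD_CONST":
            stack.append(arg)                  # push a constant
        elif opname in BINARY_OPS:
            right = stack.pop()                # top of stack is the right operand
            left = stack.pop()
            stack.append(BINARY_OPS[opname](left, right))
        elif opname == "RETURN_VALUE":
            return stack.pop()
        else:
            raise ValueError("unsupported opcode: %s" % opname)
    raise RuntimeError("program ended without RETURN_VALUE")

print(run(PROGRAM, {"x": 3}))  # 3*3 + 4*3 + 4 = 25
```

Every operand travels through the stack: loads push, binary operations pop two values and push one result.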

Things the Interpreter Doesn’t Do:
     Typed Method Dispatch

•   The Python interpreter does not know anything about how to add
    two numbers (or objects, for that matter)

•   Instead, it simply maintains a stack of objects, and when it comes time
    to perform an operation, asks them to perform the operation

•   The result gets pushed onto the stack
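
A small sketch of that delegation. The Symbol class and the as_text() helper are hypothetical, invented only to show that the same parabola function (and therefore the same bytecode) works on any object that knows how to multiply and add itself, and fails on objects that don't.

```python
def parabola(x):
    return x*x + 4*x + 4

class Symbol(object):
    """Hypothetical type that records operations instead of computing numbers."""
    def __init__(self, text):
        self.text = text
    def __mul__(self, other):            # handles x * other
        return Symbol("(%s*%s)" % (self.text, as_text(other)))
    def __rmul__(self, other):           # handles other * x, e.g. 4 * x
        return Symbol("(%s*%s)" % (as_text(other), self.text))
    def __add__(self, other):            # handles x + other
        return Symbol("(%s+%s)" % (self.text, as_text(other)))

def as_text(value):
    return value.text if isinstance(value, Symbol) else repr(value)

print(parabola(3))                   # integers answer the request numerically
print(parabola(Symbol("x")).text)    # Symbol answers it by building a string

# Strings cannot multiply themselves by another string, so the object
# (not the interpreter) rejects the operation.
try:
    parabola("ab")
    string_result = "no error"
except TypeError as error:
    string_result = "TypeError"
print(string_result)
```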



Flow Control



Flow Control
>>> def abs(x):
...     if x < 0:
...         x = -x
...     return x
...
>>> dis.dis(abs)
  2           0 LOAD_FAST                0 (x)
              3 LOAD_CONST               1 (0)
              6 COMPARE_OP               0 (<)
              9 POP_JUMP_IF_FALSE       22

  3          12 LOAD_FAST                0 (x)
             15 UNARY_NEGATIVE
             16 STORE_FAST               0 (x)
             19 JUMP_FORWARD             0 (to 22)

  4     >>   22 LOAD_FAST                0 (x)
             25 RETURN_VALUE

•   Jumps can be relative or absolute

•   Relevant opcodes:

    •   JUMP_FORWARD

    •   POP_JUMP_IF_[TRUE/FALSE]

    •   JUMP_IF_[TRUE/FALSE]_OR_POP

    •   JUMP_ABSOLUTE

    •   SETUP_LOOP

    •   [BREAK/CONTINUE]_LOOP
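
The dis module also knows which opcodes jump and how their targets are encoded. A sketch assuming CPython 3.4 or later: dis.hasjrel and dis.hasjabs list the opcodes with relative and absolute jump arguments. The function is named abs_value here (a made-up name) to avoid shadowing the built-in, and the set of jump opcodes you see will vary by version.

```python
import dis

def abs_value(x):
    if x < 0:
        x = -x
    return x

# Classify every jump instruction by how its target is encoded.
jumps = []
for ins in dis.get_instructions(abs_value):
    if ins.opcode in dis.hasjrel:
        jumps.append((ins.offset, ins.opname, "relative"))
    elif ins.opcode in dis.hasjabs:
        jumps.append((ins.offset, ins.opname, "absolute"))

for offset, opname, kind in jumps:
    print(offset, opname, kind)
```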


Managing Namespaces



Simple Namespaces
>>> def example():
...     variable = 1
...     def function():
...         print 'function'
...     del variable
...     del function
...
>>> dis.dis(example)
  2           0 LOAD_CONST               1 (1)
              3 STORE_FAST               0 (variable)

  3           6 LOAD_CONST               2 (<code object function at 0x10c545930, file "<stdin>", line 3>)
              9 MAKE_FUNCTION            0
             12 STORE_FAST               1 (function)

  5          15 DELETE_FAST              0 (variable)

  6          18 DELETE_FAST              1 (function)
             21 LOAD_CONST               0 (None)
             24 RETURN_VALUE

•   Variables, functions, etc. are all treated identically

•   Once the name is assigned to the object, the interpreter completely forgets everything about it except the name
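
The "treated identically" point can be checked from the code object itself. A sketch, written for Python 3 (so the inner function returns a string instead of using the print statement). The use_after_delete() helper is hypothetical and exists only to show what happens once a local slot has been cleared.

```python
def example():
    variable = 1
    def function():
        return 'function'
    del variable
    del function

# Both names occupy slots in the same table of local variables.
print(example.__code__.co_varnames)

def use_after_delete():
    variable = 1
    del variable
    return variable  # the slot is empty again, so this lookup fails

outcome = None
try:
    use_after_delete()
except UnboundLocalError as error:
    outcome = "UnboundLocalError: %s" % error
print(outcome)
```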

Turning Code Objects into
  Functions and Classes


Functions First!
>>> def square(inputfunc):
...     def f(x):
...         return inputfunc(x) * inputfunc(x)
...     return f
...
>>> dis.dis(square)
  2           0 LOAD_CLOSURE             0 (inputfunc)
              3 BUILD_TUPLE              1
              6 LOAD_CONST               1 (<code object f at 0x10c545a30, file "<stdin>", line 2>)
              9 MAKE_CLOSURE             0
             12 STORE_FAST               1 (f)

  4          15 LOAD_FAST                1 (f)
             18 RETURN_VALUE

•   The compiler generates code objects and sticks them in memory
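
Those code objects can be inspected directly. A sketch for CPython 3: the nested code object for f sits in the constants of square, and the captured inputfunc shows up as a cell variable on one side and a free variable on the other. The names nested and plus_one_squared are made up for this example.

```python
import types

def square(inputfunc):
    def f(x):
        return inputfunc(x) * inputfunc(x)
    return f

# The compiled body of f is stored as a constant of the enclosing function.
nested = [c for c in square.__code__.co_consts if isinstance(c, types.CodeType)]
print(nested[0].co_name)

# inputfunc is a cell variable in square and a free variable in f.
print(square.__code__.co_cellvars, nested[0].co_freevars)

# Calling square wraps that same code object in a new function object.
plus_one_squared = square(lambda value: value + 1)
print(plus_one_squared(2))                       # (2 + 1) * (2 + 1)
print(plus_one_squared.__code__ is nested[0])
```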


Now Classes!
>>> def make_point(dimension, names):
...     class Point:
...         def __init__(self, *data):
...             pass
...         dimension = dimensions
...     return Point
...
>>> dis.dis(make_point)
  2           0 LOAD_CONST               1 ('Point')
              3 LOAD_CONST               3 (())
              6 LOAD_CONST               2 (<code object Point at 0x10c545c30, file "<stdin>", line 2>)
              9 MAKE_FUNCTION            0
             12 CALL_FUNCTION            0
             15 BUILD_CLASS
             16 STORE_FAST               2 (Point)

  6          19 LOAD_FAST                2 (Point)
             22 RETURN_VALUE

BUILD_CLASS()

    Creates a new class object. TOS is the methods dictionary, TOS1 the tuple of the names of the base classes, and TOS2 the class name.
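
Roughly the same construction can be spelled out by hand with the three-argument form of type(). This is only an analogy, with caveats: in Python 2 a class statement without bases creates an old-style class while type() always creates a new-style one, and Python 3 replaced BUILD_CLASS with a different mechanism. The init function and the dimension value of 2 are invented for illustration.

```python
def init(self, *data):
    self.data = data

# The three ingredients named above: class name, base classes, methods dictionary.
name = 'Point'
bases = ()
methods = {'__init__': init, 'dimension': 2}

Point = type(name, bases, methods)

p = Point(1.0, 2.0)
print(Point.__name__, Point.dimension, p.data)
```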



Other Things


•   Exceptions

•   Loops

    •   Technically flow control, but they’re a little more involved




Now, We Have Some Fun



What to Do With Our Newly
Acquired Knowledge of Dark
          Magic?


Write your own Python
     interpreter!


Static Code Analysis!



Understand How PyPy Does It!



Buy Me Alcohol!
Or at least provide me with pleasant conversation




Slideshare-only Bonus Slide:
    Exception Handling!


>>> def list_get(lst, pos):
...     try:
...         return lst[pos]
...     except IndexError:
...         return None
...     # there is an invisible “return None” here
...
>>> dis.dis(list_get)
  2           0 SETUP_EXCEPT            12 (to 15)

  3           3 LOAD_FAST                0 (lst)
              6 LOAD_FAST                1 (pos)
              9 BINARY_SUBSCR
             10 RETURN_VALUE
             11 POP_BLOCK
             12 JUMP_FORWARD            18 (to 33)

  4     >>   15 DUP_TOP
             16 LOAD_GLOBAL              0 (IndexError)
             19 COMPARE_OP              10 (exception match)
             22 POP_JUMP_IF_FALSE       32
             25 POP_TOP
             26 POP_TOP
             27 POP_TOP

  5          28 LOAD_CONST               0 (None)
             31 RETURN_VALUE
        >>   32 END_FINALLY
        >>   33 LOAD_CONST               0 (None)
             36 RETURN_VALUE

•   The exception context is pushed by SETUP_EXCEPT

•   If an exception is thrown, control jumps to the address of the top exception context, in this case the opcode at offset 15

•   If there is no top exception context, the interpreter halts and notifies you of the error

•   The yellow opcodes (the handler code starting at offset 15) check if the exception thrown matches the type of the one in the except statement, and execute the except block

•   At END_FINALLY, the interpreter is responsible for popping the exception context, and either re-raising the exception, in which case the next-topmost exception context will trigger, or returning from the function

•   Notice that the red opcodes will never be executed

    •   The first (offsets 11 and 12): between a return and a jump target

    •   The second (offsets 33 and 36): only reachable by jumping from dead code

•   CPython’s philosophy of architectural and implementation simplicity tolerates such minor inefficiencies
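
The "next-topmost exception context" behaviour is easy to observe from plain Python, whatever bytecode a given version emits. A sketch with a hypothetical lookup() function: the inner handler only matches IndexError, so a KeyError is re-raised past it and caught by the outer handler.

```python
def lookup(container, key):
    try:
        try:
            return container[key]
        except IndexError:
            # Matches: the inner context handles the exception.
            return "inner handler"
    except KeyError:
        # The inner match failed, so the exception reached the next context.
        return "outer handler"

print(lookup([1], 0))      # no exception at all
print(lookup([1], 5))      # IndexError, handled by the inner context
print(lookup({}, "k"))     # KeyError, passed on to the outer context
```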

Thanks!



