Pycon 2012: What Python can learn from Java
What Python can learn from Java
Jonathan Ellis / @spyced

Pycon 2012
(Not a web development perspective)
The power of tools
Remote debugging
✤   Java: -Xdebug
    -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=1044

✤   Python with Wing IDE, 11 steps: “(6) install Wing IDE on the machine on which
    you plan to run your debug program... (8) copy
    wingdbstub.py into the same directory as your source
    files and import it in your Python source ... (10) In
    wingdbstub.py on your debug host, set kWingHostPort ...
    try setting kLogFile variable in wingdbstub.py for log
    additional diagnostic information.”
JMX
Triton
GC in a nutshell
✤   Mark
✤   Sweep
✤   Compact
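The three phases can be sketched on a toy heap. This is only an illustration, assuming the heap is a dict that maps an integer address to the list of addresses that object references; `collect` and the heap layout are made up for this example and say nothing about how the JVM implements its collectors.

```python
def collect(heap, roots):
    """Toy mark-sweep-compact over a dict of address -> referenced addresses."""
    # Mark: walk the object graph from the roots.
    marked = set()
    stack = list(roots)
    while stack:
        addr = stack.pop()
        if addr in marked:
            continue
        marked.add(addr)
        stack.extend(heap[addr])

    # Sweep: keep only the objects that were reached.
    live = [addr for addr in sorted(heap) if addr in marked]

    # Compact: slide survivors down to addresses 0..n-1 and rewrite references.
    forwarding = {old: new for new, old in enumerate(live)}
    new_heap = {forwarding[old]: [forwarding[ref] for ref in heap[old]]
                for old in live}
    new_roots = [forwarding[root] for root in roots]
    return new_heap, new_roots


# Objects 1 and 2 reference each other but are unreachable from the root (5).
heap = {0: [3], 1: [2], 2: [1], 3: [], 5: [0]}
new_heap, new_roots = collect(heap, roots=[5])
```

After the call, the unreachable cycle is gone and the three survivors occupy a contiguous range of addresses, which is the property that keeps fragmentation down.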
Aside: reference counting
✤   Only good when allocation + assignment are relatively
    infrequent operations
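A small sketch of why assignment is not free under reference counting, assuming CPython (other implementations may not expose meaningful counts). `sys.getrefcount` is a real standard-library call; the helper name is made up for this example.

```python
import sys


def refcount_delta_for_one_store():
    obj = object()
    before = sys.getrefcount(obj)
    holder = [obj]                  # storing one more reference...
    after = sys.getrefcount(obj)    # ...had to update the object's count
    del holder
    return after - before


delta = refcount_delta_for_one_store()
```

Every store and every release touches the object's header, which is the per-assignment cost the slide refers to.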
In short
✤   Python has a terrible GC story
✤   Poor instrumentation makes it worse
Troubleshooting OOM
Runtime profiling
✤   https://github.com/foursquare/heapaudit
✤   https://github.com/mariusaeriksen/heapster
✤   Commercial: AppDynamics, DynaTrace, others
Building blocks
✤   -javaagent
✤   ASM: http://asm.ow2.org/
heapy http://guppy-pe.sourceforge.net/
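The slides do not show heapy's API, so the following is a rough standard-library stand-in rather than heapy itself: it counts the objects tracked by CPython's cycle collector, grouped by type name. `count_by_type` and the `Session` class are hypothetical names used only for the illustration.

```python
import gc
from collections import Counter


def count_by_type():
    """Count GC-tracked objects by type name (CPython; containers only)."""
    return Counter(type(obj).__name__ for obj in gc.get_objects())


class Session:
    """Hypothetical object that is being leaked."""


leaked = [Session() for _ in range(1000)]
counts = count_by_type()
top_five = counts.most_common(5)
```

Comparing two such snapshots taken some time apart is a crude way to spot a type whose instance count only ever grows.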
Collections
✤   Less one-size-fits-all the more you care about performance
    ✤   HashMap
    ✤   ImmutableMap
    ✤   ImmutableSortedMap
    ✤   TreeMap
    ✤   ConcurrentSkipListMap
    ✤   ConcurrentLinkedHashMap
    ✤   SnapTreeMap
    ✤   NonBlockingHashMap
✤   Not to mention BiMap, Multimap, ...
Lists?
✤   ArrayList
✤   CopyOnWriteArrayList
✤   ArrayBlockingQueue
✤   LinkedBlockingDeque
✤   PriorityQueue, PriorityBlockingQueue
✤   SynchronousQueue
✤   TransferQueue (Java 7)
Where are these in Python?
✤   Python developers want to write Python, not C
✤   Similar question: “Why not Cassandra in Python?”
✤   PyPy to the rescue?
    ✤   RPython extension methods
Growing a language (1998)
✤   Guy Steele: “I should not design a small language, and I
    should not design a large one. I need to design a language
    that can grow.”
    ✤   http://www.cs.virginia.edu/~evans/cs655/readings/steele.pdf

✤   Currently, Python is not a growable language
Concurrency
✤   ExecutorService, ThreadPoolExecutor
    ✤   FutureTask
    ✤   Also: ScheduledThreadPoolExecutor
✤   ForkJoinPool
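For comparison, the closest standard-library analogue in Python is `concurrent.futures` (available since Python 3.2). A minimal sketch, with `word_count` and the sample documents invented for the example:

```python
from concurrent.futures import ThreadPoolExecutor


def word_count(text):
    return len(text.split())


documents = ["a b c", "d e", "f"]

# submit() returns a Future, roughly what ExecutorService.submit() does in Java.
with ThreadPoolExecutor(max_workers=2) as pool:
    futures = [pool.submit(word_count, doc) for doc in documents]
    totals = [future.result() for future in futures]
```

Because of the GIL, a thread pool like this helps mainly with I/O-bound work in CPython; for CPU-bound work the same interface is offered by `ProcessPoolExecutor`, at the price of copying arguments and results between processes.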
Concurrency
✤   For CPU-bound applications, copies are the enemy
    ✤   Useful but dangerous:
        ✤   List.subList, NavigableMap.subMap
✤   Corollary: you need to support threads + shared state
    ✤   Twisted
    ✤   Actor model
    ✤   Multi-process + SysV IPC
✤   Local and remote computation: one size does not fit all
Copies are bad
✤   Copies of large things are especially bad
    ✤   Remember fragmentation?
✤   Iterators (generators) are good
    ✤   Java: ByteBuffer
    ✤   Python: memoryview
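A short sketch of the difference between copying and viewing in Python. The buffer contents are made up; the behaviour shown (slicing a `bytearray` copies, slicing a `memoryview` does not) is standard.

```python
payload = bytearray(b"header:" + b"x" * 16)

copied = payload[7:]               # slicing a bytearray allocates a new bytearray
shared = memoryview(payload)[7:]   # slicing a memoryview shares the same buffer

payload[7] = ord("y")              # mutate the original buffer

first_of_copy = copied[0]          # unaffected: the copy has its own storage
first_of_view = shared[0]          # sees the change: no bytes were copied
```

For a 23-byte buffer the copy is harmless; for a multi-megabyte one, the view avoids both the allocation and the `memcpy`.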
The GIL
✤   The GIL offers negligible help writing concurrent code
✤   Consider this trivial race condition:
       if d[k] == v1:
         d[k] = v2
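The race can be made deterministic for demonstration purposes. In this sketch a `threading.Barrier` is used only to force the bad interleaving (both threads pass the check before either writes); the names `racy_replace` and `passed_check` are made up for the example.

```python
import threading

d = {"k": "v1"}
barrier = threading.Barrier(2)     # used only to force the bad interleaving
log_lock = threading.Lock()        # protects the bookkeeping list, not d
passed_check = []


def racy_replace(new_value):
    if d["k"] == "v1":             # check
        barrier.wait(timeout=5)    # both threads are now past the check
        d["k"] = new_value         # act: one write silently overwrites the other
        with log_lock:
            passed_check.append(new_value)


threads = [threading.Thread(target=racy_replace, args=(v,)) for v in ("v2", "v3")]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Both threads believe they replaced `v1`, yet only one value survives. The GIL makes each individual dict operation atomic in CPython, but it does nothing for the check-then-act sequence as a whole.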
More subtle problems
✤   Which of these is threadsafe?
    ✤   L.append(x)
    ✤   x = x + 1
    ✤   x += 1
✤   “The thread safety of python operations depends on the
    compilation of python statements into byte-codes, which
    is an implementation detail and should not be relied
    upon.”
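The bytecode can be inspected with the standard `dis` module. A sketch, assuming CPython (the exact opcode names vary between versions); the `increment` function is made up for the example.

```python
import dis

counter = 0


def increment():
    global counter
    counter += 1


# Collect the opcode names the statement compiles to.
opnames = [ins.opname for ins in dis.get_instructions(increment)]
load_at = opnames.index("LOAD_GLOBAL")
store_at = opnames.index("STORE_GLOBAL")
```

The load and the store are separate instructions with the addition in between, so a thread switch can land between them and lose an update. In current CPython `L.append(x)` happens to behave atomically, while both increment forms do not, but as the quotation says, that is an implementation detail.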
Are we stuck with explicit locks?
✤   ConcurrentMap
    ✤   boolean replace(key, oldValue, newValue)

    ✤   NonBlockingHashMap
✤   BlockingQueue
✤   CopyOnWriteArrayList
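A minimal sketch of what a `replace(key, oldValue, newValue)` style operation could look like in Python. `LockedDict` is a hypothetical class, not a standard-library type, and unlike NonBlockingHashMap it still takes a lock internally; it only moves the locking out of the caller's code.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class LockedDict:
    """Hypothetical dict wrapper with an atomic compare-and-replace."""

    def __init__(self, initial=None):
        self._data = dict(initial or {})
        self._lock = threading.Lock()

    def replace(self, key, old_value, new_value):
        """Set key to new_value only if it currently maps to old_value."""
        with self._lock:
            if key in self._data and self._data[key] == old_value:
                self._data[key] = new_value
                return True
            return False

    def get(self, key, default=None):
        with self._lock:
            return self._data.get(key, default)


shared = LockedDict({"k": "v1"})
with ThreadPoolExecutor(max_workers=2) as pool:
    outcomes = list(pool.map(lambda v: shared.replace("k", "v1", v), ["v2", "v3"]))
```

Exactly one of the two callers gets `True`, and the loser learns that its update did not apply instead of silently overwriting the winner.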
Shared state good, Mutable state bad

✤   final: most under-appreciated language feature?
✤   guava: Immutable collections
    ✤   http://code.google.com/p/guava-libraries/
✤   Persistent collections
    ✤   (More accurately, “What Python can learn from Haskell”)
    ✤   http://code.google.com/p/pcollections/
    ✤   SnapTreeMap
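The idea behind persistent collections can be sketched in a few lines of Python: an immutable linked stack where every "modification" returns a new version that shares structure with the old one. `Cons` and `push` are names invented for this example, not an existing library.

```python
from typing import NamedTuple, Optional


class Cons(NamedTuple):
    """One immutable cell of a persistent stack."""
    head: object
    tail: Optional["Cons"]


def push(stack, value):
    """Return a new stack; the old one is untouched and shared as the tail."""
    return Cons(value, stack)


base = push(push(None, 1), 2)
left = push(base, 3)     # two new versions...
right = push(base, 4)    # ...that both reuse base without copying it

try:
    base.head = 99       # NamedTuple fields cannot be reassigned
    mutated = True
except AttributeError:
    mutated = False
```

Since no version can change after it is built, any number of threads can read `base`, `left`, and `right` without locks and without defensive copies.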
Final thoughts
✤   JRuby is the most advanced Ruby implementation
✤   Jython is less popular, but I’d rather write Java than C
✤   Once you can write Python libraries that rival native ones
    for speed, things will get much more interesting
