Clojure
A Dynamic Programming Language for the JVM

An Introduction for Lisp Programmers
                 Rich Hickey
Clojure Objectives
    • A Lisp
    • Functional
     • emphasis on immutability
    • Supporting Concurrency
     • language-level coordination of state
    • Designed for the JVM
     • exposes and embraces platform
    • Open Source
2
Why Lisp?
    • Dynamic
    • Small core
     • Clojure is a solo effort
    • Elegant syntax
    • Core advantage still code-as-data and
      syntactic abstraction
    • Saw opportunities to reduce parens-overload
3
What about Common Lisp and Scheme?
    • Why yet another Lisp?
    • Limits to change post standardization
    • Core data structures mutable, not extensible
    • No concurrency in specs
    • Good implementations already exist for JVM
      (ABCL, Kawa, SISC et al)
    • Standard Lisps are their own platforms
4
Why the JVM?
    •   VMs, not OSes, are the target platforms of future
        languages, providing:
        •   Type system
            •   Dynamic enforcement and safety
        •   Libraries
            •   Huge set of facilities
        •   Memory and other resource management
            •   GC is platform, not language, facility
        •   Bytecode + JIT compilation
5
Language as platform vs. Language + platform
    • Old way - each language defines its own
      runtime
     • GC, bytecode, type system, libraries etc
    • New way (JVM, .Net)
     • Common runtime independent of language
    • Platforms are dictated by clients
     • Huge investments in performance,
        scalability, security, libraries etc.
6
Java/JVM is language + platform
    • Not the original story, but other languages for
        JVM always existed, now embraced by Sun
    •    JVM has established track record and trust level
        • Now open source
    • Interop with other code always required
     • C linkage insufficient these days
     • Ability to call/consume Java is critical
    • Clojure is the language, JVM the platform
7
Why Functional?
    • Easier to reason about
    • Easier to test
    • Essential for concurrency
    • Few dynamic functional languages
     • Most focus on static type systems
    • Functional by convention is not good
      enough

8
Why Focus on Concurrency?
    • Multi-core is here to stay
    • Multithreading a real challenge in Java et al
     • Locking is too hard to get right
    • FP/Immutability helps
     • Share freely between threads
    • But ‘changing’ state a reality for simulations and
      working models
    • Automatic/enforced language support needed
9
Why not OO?
     • Encourages mutable State
      • Mutable stateful objects are the new
         spaghetti code
      • Encapsulation != concurrency semantics
     • Common Lisp’s generic functions proved
       utility of methods outside of classes
     • Polymorphism shouldn’t be based (only) on
       types

     • Many more...
10
What makes Clojure a Different Lisp, and Why?
     • More first-class data structures
     • Defined in terms of abstractions
     • Functional
     • Host-embracing
     • Thread aware
     • Not constrained by backwards-compatibility
11
Agenda
     • Clojure is a Lisp
     • First-class data structures
     • Abstraction
     • Immutability
     • Clojure’s approach to various Lisp features
     • Concurrency
     • JVM
12
Clojure is a Lisp
     • Dynamic
     • Code as data
     • Reader
     • Small core
     • REPL
     • Sequences
     • Syntactic abstraction
13
Lisp Details
     • Lexically-scoped, Lisp-1
     • CL-style macros and dynamic vars
     • Case-sensitive, Namespaces
     • Dynamically compiled to JVM bytecode
     • No tail call optimization
     • Many names are different
      • fn if def let loop recur do new .
        throw try set! quote var
14
Atomic Data Types
     • Arbitrary precision integers - 12345678987654
     • Doubles - 1.234, BigDecimals - 1.234M
     • Ratios - 22/7
     • Strings - "fred", Characters - \a \b \c
     • Symbols - fred ethel, Keywords - :fred :ethel
     • Booleans - true false, Null - nil
     • Regex patterns - #"a*b"


15
First-class data structures
      • Lists inadequate for most programs
      • Lisps have others, but second class
       • No read/print support
       • Not structurally recursive
         • cons is non-destructive but vector-push
            is not
      • Lack of first-class associative data
        structures a disadvantage vs. all new
        dynamic languages
16
Clojure Data Structures
     • Lists
     • Vectors
     • Maps - hashed and sorted
     • Sets - hashed and sorted
     • All with read/print support
     • All structurally recursive
      • Uniform ‘add’ - conj
17
Data Structures
     • Lists - singly linked, grow at front
       •   (1 2 3 4 5), (fred ethel lucy), (list 1 2 3)


     • Vectors - indexed access, grow at end
       •   [1 2 3 4 5], [fred ethel lucy]


     • Maps - key/value associations
       •   {:a 1, :b 2, :c 3}, {1 "ethel" 2 "fred"}


     • Sets  #{fred ethel lucy}


     • Everything Nests
18
Data Structure Literals
     (let [avec [1 2 3 4]
           amap {:fred "ethel"}
           alist (list 2 3 4 5)]
       [(conj avec 5)
        (conj amap [:ricky "lucy"])
        (conj alist 1)

        ;the originals are intact
        avec amap alist])

     [[1 2 3 4 5] {:ricky "lucy", :fred "ethel"} (1 2 3 4 5)
      [1 2 3 4] {:fred "ethel"} (2 3 4 5)]




19
Abstractions
     • Standard Lisps define too many fundamental
       things in terms of concrete types
     • Limits generality
     • Limits extensibility
     • Marries implementation details
     • No excuse today
      • Many efficient abstraction mechanisms
20
Abstracting away the Cons cell - Seqs
     • Cons cell details - 2 slot data structure, car/cdr,
       arbitrary contents, mutable
     • Seq abstraction interface: first and rest
     • Vast majority of functions defined for cons
       cells can instead be defined in terms of
       first/rest

     • An object implementing first/rest can be
       created for vast majority of data structures
       using (seq x)
21
Benefits of Abstraction
     • "It is better to have 100 functions operate
         on one data structure than to have 10
         functions operate on 10 data structures." -
         Alan J. Perlis
     • Better still - 100 functions per abstraction
     •   seq implemented for all Clojure
         collections, all Java collections, Strings,
         regex matches, files etc.
     •   first/rest   can be lazily computed
22
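A minimal sketch of the seq abstraction described above, assuming a current Clojure release: the same `seq`, `first` and `rest` calls work on a vector, a string, a map and a Java collection. The var names (`from-vector` and so on) are hypothetical, chosen only for illustration.

```clojure
;; The same seq functions applied to different kinds of collections.
(def from-vector (seq [1 2 3]))                            ; (1 2 3)
(def from-string (seq "abc"))                              ; (\a \b \c)
(def from-map    (seq {:a 1}))                             ; ([:a 1])
(def from-java   (seq (java.util.ArrayList. [10 20 30])))  ; (10 20 30)

(first from-string)         ; \a
(rest from-vector)          ; (2 3)
(seq [])                    ; nil, an empty collection has no seq
(take 3 (map inc (range)))  ; (1 2 3), computed lazily from an infinite seq
```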
Clojure’s Abstractions
     • Sequences, Associative Maps, Indexed Vectors,
       Sets
     • Functions (call-ability)
      • Maps/vectors/sets are functions
     • Many implementations
      • Extensible from Java and Clojure
     • Large library
     • Functions + implementations -> F*I
23
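The "maps/vectors/sets are functions" point can be sketched as follows. The names `ages`, `names` and `members` are made up for the example; the behavior shown is that of a current Clojure release.

```clojure
(def ages    {:fred 42 :ethel 39})
(def names   ["fred" "ethel" "lucy"])
(def members #{:fred :ethel})

(ages :fred)             ; 42, a map looks up its argument as a key
(ages :ricky :unknown)   ; :unknown, optional not-found value
(names 1)                ; "ethel", a vector looks up its argument as an index
(members :lucy)          ; nil, a set returns the element or nil

;; Because collections are callable, they can be passed wherever a function is expected.
(map ages [:ethel :fred])               ; (39 42)
(filter members [:fred :lucy :ethel])   ; (:fred :ethel)
```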
Lazy Seqs
     • first and rest not produced until requested
     • Define your own lazy seq-producing
       functions using the lazy-cons macro
     • Seqs can be used like iterators or
       generators of other languages
     • Lazy and concrete seqs interoperate - no
       separate streams library
      ;the library function take
      (defn take [n coll]
        (when (and (pos? n) (seq coll))
          (lazy-cons (first coll)
                     (take (dec n) (rest coll)))))
24
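Later Clojure releases replaced the `lazy-cons` macro with `lazy-seq`. As a hedged sketch, the same function might be written today as below; `my-take` is a hypothetical name used to avoid shadowing the library's own `take`, and this is not claimed to be the library's actual definition.

```clojure
(defn my-take [n coll]
  (lazy-seq                        ; body is not evaluated until the seq is consumed
    (when (pos? n)
      (when-let [s (seq coll)]     ; nil when coll is exhausted
        (cons (first s) (my-take (dec n) (rest s)))))))

(my-take 3 (iterate inc 0))  ; (0 1 2), works on an infinite seq
(my-take 5 [1 2])            ; (1 2)
```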
Clojure is (Primarily) Functional
     • Core data structures immutable
     • Core library functions have no side effects
     • let-bound locals are immutable
     • loop/recur functional looping construct

25
Sequences
     (drop 2 [1 2 3 4 5]) -> (3 4 5)

     (take 9 (cycle [1 2 3 4]))
     -> (1 2 3 4 1 2 3 4 1)

     (interleave [:a :b :c :d :e] [1 2 3 4 5])
     -> (:a 1 :b 2 :c 3 :d 4 :e 5)

     (partition 3 [1 2 3 4 5 6 7 8 9])
     -> ((1 2 3) (4 5 6) (7 8 9))

     (map vector [:a :b :c :d :e] [1 2 3 4 5])
     -> ([:a 1] [:b 2] [:c 3] [:d 4] [:e 5])

     (apply str (interpose \, "asdf"))
     -> "a,s,d,f"

     (reduce + (range 100)) -> 4950
26
Maps and Sets
     (def m {:a 1 :b 2 :c 3})

     (m :b) -> 2 ;also (:b m)

     (keys m) -> (:a :b :c)

     (assoc m :d 4 :c 42) -> {:d 4, :a 1, :b 2, :c 42}

     (merge-with + m {:a 2 :b 3}) -> {:a 3, :b 5, :c 3}

     (union #{:a :b :c} #{:c :d :e}) -> #{:d :a :b :c :e}

     (join #{{:a 1 :b 2 :c 3} {:a 1 :b 21 :c 42}}
           #{{:a 1 :b 2 :e 5} {:a 1 :b 21 :d 4}})

     -> #{{:d 4, :a 1, :b 21, :c 42}
          {:a 1, :b 2, :c 3, :e 5}}
27
# Norvig’s Spelling Corrector in Python
# http://norvig.com/spell-correct.html

import re, collections

def words(text): return re.findall('[a-z]+', text.lower())

def train(features):
    model = collections.defaultdict(lambda: 1)
    for f in features:
        model[f] += 1
    return model

NWORDS = train(words(open('big.txt').read()))
alphabet = 'abcdefghijklmnopqrstuvwxyz'

def edits1(word):
    n = len(word)
    return set([word[0:i]+word[i+1:] for i in range(n)] +
               [word[0:i]+word[i+1]+word[i]+word[i+2:] for i in range(n-1)] +
               [word[0:i]+c+word[i+1:] for i in range(n) for c in alphabet] +
               [word[0:i]+c+word[i:] for i in range(n+1) for c in alphabet])

def known_edits2(word):
    return set(e2 for e1 in edits1(word) for e2 in edits1(e1) if e2 in NWORDS)

def known(words): return set(w for w in words if w in NWORDS)

def correct(word):
    candidates = known([word]) or known(edits1(word)) or known_edits2(word) or [word]
    return max(candidates, key=lambda w: NWORDS[w])
28
     ; Norvig’s Spelling Corrector in Clojure
     ; http://en.wikibooks.org/wiki/Clojure_Programming#Examples

     (defn words [text] (re-seq #"[a-z]+" (.toLowerCase text)))

     (defn train [features]
       (reduce (fn [model f] (assoc model f (inc (get model f 1))))
               {} features))

     (def *nwords* (train (words (slurp "big.txt"))))

     (defn edits1 [word]
       (let [alphabet "abcdefghijklmnopqrstuvwxyz", n (count word)]
         (distinct (concat
           (for [i (range n)] (str (subs word 0 i) (subs word (inc i))))
           (for [i (range (dec n))]
             (str (subs word 0 i) (nth word (inc i)) (nth word i) (subs word (+ 2 i))))
           (for [i (range n) c alphabet] (str (subs word 0 i) c (subs word (inc i))))
           (for [i (range (inc n)) c alphabet] (str (subs word 0 i) c (subs word i)))))))

     (defn known [words nwords] (for [w words :when (nwords w)]    w))

     (defn known-edits2 [word nwords]
       (for [e1 (edits1 word) e2 (edits1 e1) :when (nwords e2)]    e2))

     (defn correct [word nwords]
       (let [candidates (or (known [word] nwords) (known (edits1 word) nwords)
                            (known-edits2 word nwords) [word])]
         (apply max-key #(get nwords % 1) candidates)))
29
Persistent Data Structures
     • Immutable, + old version of the collection is
       still available after 'changes'
     • Collection maintains its performance
       guarantees for most operations
       • Therefore new versions are not full copies
     • All Clojure data structures persistent
       • Allows for Lisp-y recursive use of vectors
         and maps
       • Use real maps for things like environments,
         rather than a-lists
30
Recursive Loops
     •   No mutable locals in Clojure
     •   No tail recursion optimization in the JVM
     •   recur op does constant-space recursive looping
     •   Rebinds and jumps to nearest loop or function frame
         (defn zipm [keys vals]
           (loop [map {}
                  [k & ks :as keys] keys
                  [v & vs :as vals] vals]
             (if (and keys vals)
               (recur (assoc map k v) ks vs)
               map)))

         (zipm [:a :b :c] [1 2 3]) -> {:a 1, :b 2, :c 3}

         (apply hash-map (interleave [:a :b :c] [1 2 3]))
         (into {} (map vector [:a :b :c] [1 2 3]))
31
Equality
     • = is value equality
     • identical? is reference equality
      • rarely used in Clojure
     • = as per Henry Baker’s egal
      • Immutable parts compare by value
      • Mutable references by identity
32
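A small sketch of the distinction, assuming a current Clojure release. The names `a`, `b`, `r1` and `r2` are illustrative only.

```clojure
;; Two separately constructed vectors with the same contents.
(def a (vector 1 2 {:k "v"}))
(def b (vector 1 2 {:k "v"}))

(= a b)                ; true, immutable values compare by value
(identical? a b)       ; false, they are distinct objects
(identical? a a)       ; true
(= [1 2 3] '(1 2 3))   ; true, sequential collections compare element by element

;; Mutable references compare by identity, even when they hold equal values.
(def r1 (ref 1))
(def r2 (ref 1))
(= r1 r2)              ; false
(= r1 r1)              ; true
```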
nil/false/eos/'()
                                  Clojure                       CL           Scheme     Java
     nil                          nil                           nil/'()      -          null
     true/false                   true/false                    -            #t/#f      true/false
     Conditional                  nil or false/everything else  nil/non-nil  #f/non-#f  true/false
     singleton empty list?        No                            '()          '()        No
     end-of-seq                   nil                           nil          '()        FALSE
     Host null/true/false         nil/true/false                N/A          N/A        N/A
     Library uses concrete types  No                            cons/vector  pair       No
33
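The Clojure column of the table can be checked with a short sketch. `truthy?` is a hypothetical helper, not a library function; results are those of a current Clojure release.

```clojure
;; Reports which branch a conditional takes for a given value.
(defn truthy? [x] (if x true false))

(truthy? nil)     ; false
(truthy? false)   ; false
(truthy? ())      ; true, the empty list is not nil
(truthy? 0)       ; true
(truthy? "")      ; true

;; seq returns nil for an empty collection, so it serves as the emptiness test.
(seq ())          ; nil
(seq [])          ; nil
```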
Functions and Arity Overloading
     • Multiple distinct bodies in single function object
      • Supports fast 0/1/2...N dispatch with no
         conditional
     • Variable arity with &
       (defn drop-last
         "Return a lazy seq of all but the last n
         (default 1) items in coll"
         ([s] (drop-last 1 s))
         ([n s] (map (fn [x _] x) (seq s) (drop n s))))
34
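The drop-last example shows fixed arities only. As a sketch of combining them with a variable-arity body using `&`, consider the hypothetical function `greet` below (not part of any library).

```clojure
(defn greet
  "Greets one name, or a name plus a count of any further names."
  ([] (greet "world"))                       ; 0 args delegates to the 1-arg body
  ([who] (str "Hello, " who "!"))
  ([who & others]                            ; remaining args are collected into a seq
   (str "Hello, " who " and " (count others) " more!")))

(greet)                         ; "Hello, world!"
(greet "fred")                  ; "Hello, fred!"
(greet "fred" "ethel" "lucy")   ; "Hello, fred and 2 more!"
```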
Vars
     • Similar to CL’s special vars
      • dynamic scope, stack discipline
     • Shared root binding established by def
      • root can be unbound
     • Can be set! but only if first bound using binding
       (not let)
         • Thread-local semantics
     • Functions stored in vars, so they too can be
       dynamically rebound

      • context/aspect-like idioms
35
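A minimal sketch of root binding, `binding` and `set!`. It assumes a current Clojure release, where a var must be marked `^:dynamic` before it can be rebound with `binding`. The names `*depth*` and `report` are hypothetical.

```clojure
(def ^:dynamic *depth* 0)       ; shared root binding established by def

(defn report [] *depth*)        ; reads whatever binding is current

(report)                        ; 0, the root binding

(binding [*depth* 1]            ; thread-local binding with dynamic scope
  (set! *depth* (inc *depth*))  ; allowed because *depth* is bound via binding
  (report))                     ; 2

(report)                        ; 0, the root binding is untouched
```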
Lisp-1 and defmacro
     • Symbols are simple values
      • Not identities/vars - no value cell
      • But still have constant-time =
     • Reader has no side effects
      • reads plain symbols
     • Symbols can have namespace part
      • my-namespace/foo
      • ` resolves and qualifies symbols
36
Syntax-quote
     'foo -> foo
     `foo -> user/foo
     (= 'foo `foo) -> false

     foo -> Exception: Unable to resolve symbol: foo

     (def foo 5) -> #'user/foo
     foo -> 5

     (= foo user/foo) -> true

     (resolve 'foo) -> #'user/foo
     (resolve 'list) -> #'clojure/list

     (defmacro dummy [x] `(list foo ~x))

     (macroexpand '(dummy foo))
     -> (clojure/list user/foo foo)
37
Pervasive Destructuring
     • Abstract structural binding
     • In let/loop binding lists, fn parameter lists, and any
       macro that expands into a let or fn
     • Vector binding forms destructure sequential things
      • vectors, lists, seqs, strings, arrays, and anything
          that supports nth
     • Map binding forms destructure associative things
      • maps, vectors, strings and arrays (the latter
          three have integer keys)
38
Example: Destructuring
     (let [[a b c & d :as e] [1 2 3 4 5 6 7]]
       [a b c d e])
     -> [1 2 3 (4 5 6 7) [1 2 3 4 5 6 7]]

     (let [[[x1 y1][x2 y2]] [[1 2] [3 4]]]
       [x1 y1 x2 y2])
     -> [1 2 3 4]

     (let [{a :a, b :b, c :c, :as m :or {a 2 b 3}} {:a 5 :c 6}]
       [a b c m])
     -> [5 3 6 {:c 6, :a 5}]

     (let [{i :i, j :j, [r s & t :as v] :ivec}
           {:j 15 :k 16 :ivec [22 23 24 25]}]
       [i j r s t v])
     -> [nil 15 22 23 (24 25) [22 23 24 25]]
39
Polymorphism via Multimethods
      • Full generalization of indirect dispatch
        • Not tied to OO or types
      • Fixed dispatch function which is an arbitrary
          function of the arguments
      •   Open set of methods associated with different
          values of the dispatch function
      •   Call sequence:
          •   Call dispatch function on args to get dispatch
              value
          •   Find method associated with dispatch value
              •   else call default method if present, else error
40
Example: Multimethods
     (defmulti encounter (fn [x y] [(:Species x) (:Species y)]))

     (defmethod     encounter      [:Bunny :Lion] [b l] :run-away)
     (defmethod     encounter      [:Lion :Bunny] [l b] :eat)
     (defmethod     encounter      [:Lion :Lion] [l1 l2] :fight)
     (defmethod     encounter      [:Bunny :Bunny] [b1 b2] :mate)

     (def   b1   {:Species    :Bunny :other :stuff})
     (def   b2   {:Species    :Bunny :other :stuff})
     (def   l1   {:Species    :Lion :other :stuff})
     (def   l2   {:Species    :Lion :other :stuff})

     (encounter    b1   b2)   ->   :mate
     (encounter    b1   l1)   ->   :run-away
     (encounter    l1   b1)   ->   :eat
     (encounter    l1   l2)   ->   :fight
41
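The example above does not exercise the default-method step of the call sequence. A hedged sketch, assuming a current Clojure release where the default dispatch value is `:default`; the `:Turtle` species and the `:ignore` result are made up for illustration.

```clojure
(defmulti encounter (fn [x y] [(:Species x) (:Species y)]))

(defmethod encounter [:Bunny :Lion] [b l] :run-away)
;; Called when no method matches the dispatch value.
(defmethod encounter :default [x y] :ignore)

(encounter {:Species :Bunny} {:Species :Lion})     ; :run-away
(encounter {:Species :Bunny} {:Species :Turtle})   ; :ignore
```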
Metadata
     • Orthogonal to the logical value of the data
     • Symbols and collections support a
       metadata map
     • Does not impact equality semantics, nor
       seen in operations on the value
     • Support for literal metadata in reader
     (def v [1 2 3])
     (def trusted-v (with-meta v {:source :trusted}))

     (:source ^trusted-v) -> :trusted
     (:source ^v) -> nil

     (= v trusted-v) -> true
42
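In current Clojure releases the `^` reader macro attaches metadata instead of reading it, and metadata is read with the `meta` function. Under that assumption, the same example can be sketched as:

```clojure
(def v [1 2 3])
(def trusted-v (with-meta v {:source :trusted}))

(:source (meta trusted-v))   ; :trusted
(:source (meta v))           ; nil
(= v trusted-v)              ; true, metadata does not affect equality
```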
Concurrency
     • Interleaved/simultaneous execution
     • Must avoid seeing/yielding inconsistent data
     • The more components there are to the data,
       the more difficult to keep consistent
     • The more steps in a logical change, the more
       difficult to keep consistent
     • Opportunities for automatic parallelism
      • Emphasis here on coordination
43
State - You’re doing it wrong
     • Mutable objects are the new spaghetti code
      • Hard to understand, test, reason about
      • Concurrency disaster
      • Terrible as a default architecture
        • (Java/C#/Python/Ruby/Groovy/CLOS...)
     • Doing the right thing is very difficult
      • Languages matter!
44
Concurrency Mechanisms
     •   Conventional way:
         •   Direct references to mutable objects
         •   Lock and worry (manual/convention)
     •   Clojure way:
         •   Indirect references to immutable persistent
             data structures (inspired by SML’s ref)

         •   Concurrency semantics for references
             •   Automatic/enforced

             •   No locks in user code!
45
Typical OO - Direct references to Mutable Objects
                                    foo

                               :a          ?
                               :b          ?
                               :c         42
                               :d          ?
                               :e          6




      •   Unifies identity and value
      •   Anything can change at any time
      •   Consistency is a user problem

46
Clojure - Indirect references to Immutable Objects
                   foo                      :a    "fred"
                                            :b   "ethel"
                            @foo            :c      42
                                            :d      17
                                            :e       6




     • Separates identity and value
      • Obtaining value requires explicit
        dereference
     • Values can never change
      • Never an inconsistent value
47
Persistent ‘Edit’
                      foo               :a          "fred"
                                        :b         "ethel"
                               @foo     :c            42
                                        :d            17
                                        :e             6

                                             Structural sharing
                                        :a          "lucy"
                                        :b         "ethel"
                                        :c            42
                                        :d            17
                                        :e             6

     •   New value is function of old
     •   Shares immutable structure
     •   Doesn’t impede readers
     •   Not impeded by readers
48
Atomic Update
                   foo                   :a          "fred"
                                         :b         "ethel"
                                         :c            42
                                         :d            17
                                         :e             6

                            @foo
                                              Structural sharing
                                         :a          "lucy"
                                         :b         "ethel"
                                         :c            42
                                         :d            17
                                         :e             6

     • Always coordinated
      • Multiple semantics
     • Next dereference sees new value
     • Consumers of values unaffected
49
Clojure References
     • The only things that mutate are references
       themselves, in a controlled way
     • 3 types of mutable references, with different
       semantics:
       • Refs - Share synchronous coordinated
         changes between threads
       • Agents - Share asynchronous autonomous
         changes between threads
       • Vars - Isolate changes within threads
50
Refs and Transactions
     • Software transactional memory system (STM)
     • Refs can only be changed within a transaction
     • All changes are Atomic, Consistent and Isolated
      • Every change to Refs made within a
         transaction occurs or none do
       • No transaction sees the effects of any other
         transaction while it is running
     • Transactions are speculative
      • Will be retried automatically if conflict
      • User must avoid side-effects!
51
The Clojure STM
     •   Surround code with (dosync ...)

     •   Uses Multiversion Concurrency Control (MVCC)

     •   All reads of Refs will see a consistent snapshot of
         the 'Ref world' as of the starting point of the
         transaction, + any changes it has made.

     •   All changes made to Refs during a transaction
         will appear to occur at a single point in the
         timeline.

     •   Readers never impede writers/readers, writers
         never impede readers, supports commute
52
Refs in action
     (def foo (ref {:a "fred" :b "ethel" :c 42 :d 17 :e 6}))

     @foo -> {:d 17, :a "fred", :b "ethel", :c 42, :e 6}

     (assoc @foo :a "lucy")
     -> {:d 17, :a "lucy", :b "ethel", :c 42, :e 6}

     @foo -> {:d 17, :a "fred", :b "ethel", :c 42, :e 6}

     (commute foo assoc :a "lucy")
     -> IllegalStateException: No transaction running

     (dosync (commute foo assoc :a "lucy"))
     @foo -> {:d 17, :a "lucy", :b "ethel", :c 42, :e 6}


53
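The example above changes a single Ref. A sketch of a coordinated change to two Refs in one transaction follows, using `alter`; the account names and the `transfer!` function are hypothetical.

```clojure
(def checking (ref 100))
(def savings  (ref 50))

(defn transfer!
  "Moves amount from one ref to another; both changes commit together or not at all."
  [from to amount]
  (dosync
    (alter from - amount)
    (alter to + amount)))

(transfer! checking savings 30)
[@checking @savings]   ; [70 80]
```

No other transaction can observe a state in which the amount has left one account but not yet arrived in the other.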
Agents
     • Manage independent state
     • State changes through actions, which are
       ordinary functions (state=>new-state)
     • Actions are dispatched using send or send-off,
       which return immediately
     • Actions occur asynchronously on thread-pool
       threads
     • Only one action per agent happens at a time
54
Agents
     • Agent state always accessible, via deref/@, but
       may not reflect all actions
     • Can coordinate with actions using await
     • Any dispatches made during an action are held
       until after the state of the agent has changed
     • Agents coordinate with transactions - any
       dispatches made during a transaction are held
       until it commits
     • Agents are not Actors (Erlang/Scala)
55
Agents in Action
     (def foo (agent {:a "fred" :b "ethel" :c 42 :d 17 :e 6}))

     @foo -> {:d 17, :a "fred", :b "ethel", :c 42, :e 6}

     (send foo assoc :a "lucy")

     @foo -> {:d 17, :a "fred", :b "ethel", :c 42, :e 6}

     (await foo)

     @foo -> {:d 17, :a "lucy", :b "ethel", :c 42, :e 6}




56
Clojure is Hosted
     • JVM, not OS, is target platform
     • Java, not C, is interface language
     • Shares GC, memory model, stack, type
       system, exception handling with Java/JVM
       • JVM is the runtime
       • Quite excellent JIT, GC, threading, security
     • Integrated, wrapper-free, ad-hoc Java interop
     • Libraries available for everything
57
Java Integration
     • Clojure strings are Java Strings, numbers are
       Numbers, collections implement Collection,
       fns implement Callable and Runnable etc.
     • Core abstractions, like seq, are Java interfaces
     • Clojure seq library works on Java Iterables,
       Strings and arrays.
     • Implement and extend Java interfaces and
       classes
     • New primitive arithmetic support equals
       Java’s speed.
58
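Several of these claims can be checked directly at the REPL. A minimal sketch, assuming a current Clojure release on a standard JVM; `f` is a throwaway name.

```clojure
(def f (fn [] 42))

(instance? java.util.concurrent.Callable f)   ; true
(instance? Runnable f)                        ; true
(.call ^java.util.concurrent.Callable f)      ; 42, invoked through the Java interface

(instance? String "fred")                     ; true
(instance? Number 22/7)                       ; true
(instance? java.util.Collection [1 2 3])      ; true

;; The seq library works directly on Java collections, Strings and arrays.
(map inc (java.util.ArrayList. [1 2 3]))      ; (2 3 4)
(first "fred")                                ; \f
(reduce + (int-array [1 2 3]))                ; 6
```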
Java Interop
     Math/PI
     -> 3.141592653589793

     (.. System getProperties (get "java.version"))
     -> "1.5.0_13"

     (new java.util.Date)
     -> Thu Jun 05 12:37:32 EDT 2008

     (doto (JFrame.) (add (JLabel. "Hello World")) pack show)

     (into {} (filter #(re-find #"java" (key %))
                      (System/getProperties)))

     -> {"java.specification.version" "1.5",...}

59
Swing Example
     (import '(javax.swing JFrame JLabel JTextField JButton)
             '(java.awt.event ActionListener) '(java.awt GridLayout))

     (defn celsius []
       (let [frame (JFrame. "Celsius Converter")
             temp-text (JTextField.)
             celsius-label (JLabel. "Celsius")
             convert-button (JButton. "Convert")
             fahrenheit-label (JLabel. "Fahrenheit")]
         (.addActionListener convert-button
            (proxy [ActionListener] []
              (actionPerformed [evt]
                 (let [c (. Double parseDouble (.getText temp-text))]
                   (.setText fahrenheit-label
                      (str (+ 32 (* 1.8 c)) " Fahrenheit"))))))
         (doto frame
           (setLayout (GridLayout. 2 2 3 3))
           (add temp-text) (add celsius-label)
           (add convert-button) (add fahrenheit-label)
           (setSize 300 80) (setVisible true))))


     (celsius)
60
Experiences on the JVM
     • Main complaint is no tail call optimization
     • HotSpot covers the last mile of compilation
      • Runtime optimizing compilation
      • Clojure can get ~1 gFlop without even
         generating JVM arithmetic primitives
     • Ephemeral garbage is extremely cheap
     • Great performance, many facilities
      • Verifier, security, dynamic code loading
61
Benefits of the JVM
     • Focus on language vs code generation or
       mundane libraries
     • Sharing GC and type system with
       implementation/FFI language is huge benefit
     • Tools - e.g. breakpoint/step debugging, profilers
       etc.
     • Libraries! Users can do UI, database, web, XML,
       graphics, etc right away
     • Great MT infrastructure - java.util.concurrent
62
There’s much more...
     • List (Seq) comprehensions
     • Ad hoc hierarchies
     • Type hints
     • Relational set algebra
     • Parallel computation
     • Namespaces
     • Zippers
     • XML ...
63
Thanks for listening!



      http://clojure.org

       Questions?

Clojure - An Introduction for Lisp Programmers

  • 1.
    Clojure A Dynamic ProgrammingLanguage for the JVM An Introduction for Lisp Programmers Rich Hickey
  • 2.
    Clojure Objectives • A Lisp • Functional • emphasis on immutability • Supporting Concurrency • language-level coordination of state • Designed for the JVM • exposes and embraces platform 2 • Open Source
  • 3.
    Why Lisp? • Dynamic • Small core • Clojure is a solo effort • Elegant syntax • Core advantage still code-as-data and syntactic abstraction • Saw opportunities to reduce parens- overload 3
  • 4.
    What about CommonLisp and Scheme? • Why yet another Lisp? • Limits to change post standardization • Core data structures mutable, not extensible • No concurrency in specs • Good implementations already exist for JVM (ABCL, Kawa, SISC et al) • Standard Lisps are their own platforms 4
  • 5.
    Why the JVM? • VMs, not OSes, are the target platforms of future languages, providing: • Type system • Dynamic enforcement and safety • Libraries • Huge set of facilities • Memory and other resource management • GC is platform, not language, facility • Bytecode + JIT compilation 5
  • 6.
    Language as platformvs. Language + platform • Old way - each language defines its own runtime • GC, bytecode, type system, libraries etc • New way (JVM, .Net) • Common runtime independent of language • Platforms are dictated by clients • Huge investments in performance, scalability, security, libraries etc. 6
  • 7.
    Java/JVM is language+ platform • Not the original story, but other languages for JVM always existed, now embraced by Sun • JVM has established track record and trust level • Now open source • Interop with other code always required • C linkage insufficient these days • Ability to call/consume Java is critical • Clojure is the language, JVM the platform 7
  • 8.
    Why Functional? • Easier to reason about • Easier to test • Essential for concurrency • Few dynamic functional languages • Most focus on static type systems • Functional by convention is not good enough 8
  • 9.
    Why Focus onConcurrency? • Multi-core is here to stay • Multithreading a real challenge in Java et al • Locking is too hard to get right • FP/Immutability helps • Share freely between threads • But ‘changing’ state a reality for simulations and working models • Automatic/enforced language support needed 9
  • 10.
    Why not OO? • Encourages mutable State • Mutable stateful objects are the new spaghetti code • Encapsulation != concurrency semantics • Common Lisp’s generic functions proved utility of methods outside of classes • Polymorphism shouldn’t be based (only) on types 10 • Many more...
  • 11.
    What makes Clojurea Different Lisp, and Why? • More first-class data structures • Defined in terms of abstractions • Functional • Host-embracing • Thread aware • Not constrained by backwards- compatibility 11
  • 12.
    Agenda • Clojure is a Lisp • First-class data structures • Abstraction • Immutability • Clojure’s approach to various Lisp features • Concurrency • JVM 12
  • 13.
    Clojure is aLisp • Dynamic • Code as data • Reader • Small core • REPL • Sequences • Syntactic abstraction 13
  • 14.
    Lisp Details • Lexically-scoped, Lisp-1 • CL-style macros and dynamic vars • Case-sensitive, Namespaces • Dynamically compiled to JVM bytecode • No tail call optimization • Many names are different • fn if def let loop recur do new . throw try set! quote var 14
  • 15.
    Atomic Data Types • Arbitrary precision integers -12345678987654 • Doubles , BigDecimals 1.234 1.234M • Ratios -22/7 • Strings -“fred”, Characters -a b c • Symbols - fred ethel , Keywords - :fred :ethel • Booleans - true false , Null - nil • Regex patterns #“a*b” 15
  • 16.
    First-class data structures • Lists inadequate for most programs • Lisps have others, but second class • No read/print support • Not structurally recursive • cons is non-destructive but vector-push is not • Lack of first-class associative data structures a disadvantage vs. all new dynamic languages 16
  • 17.
    Clojure Data Structures • Lists • Vectors • Maps - hashed and sorted • Sets - hashed and sorted • All with read/print support • All structurally recursive • Uniform ‘add’ - conj 17
  • 18.
    Data Structures • Lists - singly linked, grow at front • (1 2 3 4 5), (fred ethel lucy), (list 1 2 3) • Vectors - indexed access, grow at end • [1 2 3 4 5], [fred ethel lucy] • Maps - key/value associations • {:a 1, :b 2, :c 3}, {1 “ethel” 2 “fred”} • Sets #{fred ethel lucy} • Everything Nests 18
  • 19.
    Data Structure Literals (let [avec [1 2 3 4] amap {:fred "ethel"} alist (list 2 3 4 5)] [(conj avec 5) (conj amap [:ricky "lucy"]) (conj alist 1) ;the originals are intact avec amap alist]) [[1 2 3 4 5] {:ricky "lucy", :fred "ethel"} (1 2 3 4 5) [1 2 3 4] {:fred "ethel"} (2 3 4 5)] 19
  • 20.
    Abstractions • Standard Lisps define too many fundamental things in terms of concrete types • Limits generality • Limits extensibility • Marries implementation details • No excuse today • Many efficient abstraction mechanisms 20
  • 21.
    Abstracting away the Cons cell - Seqs • Cons cell details - 2 slot data structure, car/cdr, arbitrary contents, mutable • Seq abstraction interface: first and rest • Vast majority of functions defined for cons cells can instead be defined in terms of first/rest • An object implementing first/rest can be created for vast majority of data structures using (seq x)
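A minimal sketch of what "defined in terms of first/rest" means in practice, assuming a current Clojure 1.x REPL. `last-item` is a hypothetical helper written only for this illustration (the core library already provides `last`); it touches its argument through nothing but `seq`, `first` and `rest`.

```clojure
;; Hypothetical helper: walks any seq-able thing using only seq, first and rest.
(defn last-item [coll]
  (loop [s (seq coll)]
    (if (seq (rest s))
      (recur (rest s))   ; more items remain, keep walking
      (first s))))       ; s holds the final item (or is nil)

(last-item [1 2 3])        ; vector
(last-item '(:a :b :c))    ; list
(last-item "fred")         ; String, viewed as a seq of characters
(last-item {:a 1})         ; map, viewed as a seq of key/value entries
(last-item nil)            ; nil behaves as an empty seq
```

The same function body serves every collection type because nothing in it mentions a concrete type.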
  • 22.
    Benefits of Abstraction • "It is better to have 100 functions operate on one data structure than to have 10 functions operate on 10 data structures." - Alan J. Perlis • Better still - 100 functions per abstraction • seq implemented for all Clojure collections, all Java collections, Strings, regex matches, files etc. • first/rest can be lazily computed
  • 23.
    Clojure’s Abstractions • Sequences, Associative Maps, Indexed Vectors, Sets • Functions (call-ability) • Maps/vectors/sets are functions • Many implementations • Extensible from Java and Clojure • Large library • Functions + implementations -> F*I
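The "maps/vectors/sets are functions" point can be sketched as follows, assuming a current Clojure 1.x REPL. The names `m`, `v` and `s` are illustrative only.

```clojure
(def m {:a 1 :b 2})
(def v [10 20 30])
(def s #{:x :y})

(m :a)    ; a map called with a key returns the value      -> 1
(v 1)     ; a vector called with an index returns the item -> 20
(s :x)    ; a set called with a member returns the member  -> :x
(s :z)    ; ... and nil for a non-member                   -> nil

;; Being functions, collections can be handed to higher-order functions directly.
(map m [:a :b])         ; -> (1 2)
(filter s [:x :q :y])   ; -> (:x :y)
```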
  • 24.
    Lazy Seqs • first and rest not produced until requested • Define your own lazy seq-producing functions using the lazy-cons macro • Seqs can be used like iterators or generators of other languages • Lazy and concrete seqs interoperate - no separate streams library
      ;the library function take
      (defn take [n coll]
        (when (and (pos? n) (seq coll))
          (lazy-cons (first coll)
                     (take (dec n) (rest coll)))))
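The slide uses the lazy-cons macro from the Clojure of its day. In current Clojure releases the equivalent tool is `lazy-seq`, so a present-day reader might write the same idea as in the sketch below. `my-take` is a hypothetical name, chosen so the example does not shadow `clojure.core/take`; it is not the library's actual implementation.

```clojure
;; Hypothetical re-creation of take on top of lazy-seq.
(defn my-take [n coll]
  (lazy-seq                      ; body runs only when the seq is first consumed
    (when (pos? n)
      (when-let [s (seq coll)]   ; nil when coll is exhausted
        (cons (first s) (my-take (dec n) (rest s)))))))

(my-take 3 (iterate inc 0))   ; safe on an infinite seq -> (0 1 2)
(my-take 5 [1 2])             ; stops when the input ends -> (1 2)
```

Because each step is wrapped in `lazy-seq`, only as many items are realized as the consumer asks for.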
  • 25.
    Clojure is (Primarily) Functional • Core data structures immutable • Core library functions have no side effects • let-bound locals are immutable • loop/recur functional looping construct
  • 26.
    Sequences
      (drop 2 [1 2 3 4 5]) -> (3 4 5)
      (take 9 (cycle [1 2 3 4])) -> (1 2 3 4 1 2 3 4 1)
      (interleave [:a :b :c :d :e] [1 2 3 4 5]) -> (:a 1 :b 2 :c 3 :d 4 :e 5)
      (partition 3 [1 2 3 4 5 6 7 8 9]) -> ((1 2 3) (4 5 6) (7 8 9))
      (map vector [:a :b :c :d :e] [1 2 3 4 5]) -> ([:a 1] [:b 2] [:c 3] [:d 4] [:e 5])
      (apply str (interpose \, "asdf")) -> "a,s,d,f"
      (reduce + (range 100)) -> 4950
  • 27.
    Maps and Sets
      (def m {:a 1 :b 2 :c 3})
      (m :b) -> 2 ;also (:b m)
      (keys m) -> (:a :b :c)
      (assoc m :d 4 :c 42) -> {:d 4, :a 1, :b 2, :c 42}
      (merge-with + m {:a 2 :b 3}) -> {:a 3, :b 5, :c 3}
      (union #{:a :b :c} #{:c :d :e}) -> #{:d :a :b :c :e}
      (join #{{:a 1 :b 2 :c 3} {:a 1 :b 21 :c 42}}
            #{{:a 1 :b 2 :e 5} {:a 1 :b 21 :d 4}})
      -> #{{:d 4, :a 1, :b 21, :c 42} {:a 1, :b 2, :c 3, :e 5}}
  • 28.
    # Norvig’s Spelling Corrector in Python
    # http://norvig.com/spell-correct.html
    import re, collections

    def words(text): return re.findall('[a-z]+', text.lower())

    def train(features):
        model = collections.defaultdict(lambda: 1)
        for f in features:
            model[f] += 1
        return model

    NWORDS = train(words(open('big.txt').read()))
    alphabet = 'abcdefghijklmnopqrstuvwxyz'

    def edits1(word):
        n = len(word)
        return set([word[0:i]+word[i+1:] for i in range(n)] +                     # deletion
                   [word[0:i]+word[i+1]+word[i]+word[i+2:] for i in range(n-1)] + # transposition
                   [word[0:i]+c+word[i+1:] for i in range(n) for c in alphabet] + # alteration
                   [word[0:i]+c+word[i:] for i in range(n+1) for c in alphabet])  # insertion

    def known_edits2(word):
        return set(e2 for e1 in edits1(word) for e2 in edits1(e1) if e2 in NWORDS)

    def known(words): return set(w for w in words if w in NWORDS)

    def correct(word):
        candidates = known([word]) or known(edits1(word)) or known_edits2(word) or [word]
        return max(candidates, key=lambda w: NWORDS[w])
  • 29.
    ; Norvig’s Spelling Corrector in Clojure
    ; http://en.wikibooks.org/wiki/Clojure_Programming#Examples
    (defn words [text] (re-seq #"[a-z]+" (.toLowerCase text)))

    (defn train [features]
      (reduce (fn [model f] (assoc model f (inc (get model f 1)))) {} features))

    (def *nwords* (train (words (slurp "big.txt"))))

    (defn edits1 [word]
      (let [alphabet "abcdefghijklmnopqrstuvwxyz", n (count word)]
        (distinct (concat
          (for [i (range n)] (str (subs word 0 i) (subs word (inc i))))
          (for [i (range (dec n))]
            (str (subs word 0 i) (nth word (inc i)) (nth word i) (subs word (+ 2 i))))
          (for [i (range n) c alphabet] (str (subs word 0 i) c (subs word (inc i))))
          (for [i (range (inc n)) c alphabet] (str (subs word 0 i) c (subs word i)))))))

    (defn known [words nwords] (for [w words :when (nwords w)] w))

    (defn known-edits2 [word nwords]
      (for [e1 (edits1 word) e2 (edits1 e1) :when (nwords e2)] e2))

    (defn correct [word nwords]
      (let [candidates (or (known [word] nwords) (known (edits1 word) nwords)
                           (known-edits2 word nwords) [word])]
        (apply max-key #(get nwords % 1) candidates)))
  • 30.
    Persistent Data Structures • Immutable, + old version of the collection is still available after 'changes' • Collection maintains its performance guarantees for most operations • Therefore new versions are not full copies • All Clojure data structures persistent • Allows for Lisp-y recursive use of vectors and maps • Use real maps for things like environments, rather than a-lists
  • 31.
    Recursive Loops • No mutable locals in Clojure • No tail recursion optimization in the JVM • recur op does constant-space recursive looping • Rebinds and jumps to nearest loop or function frame
      (defn zipm [keys vals]
        (loop [map {}
               [k & ks :as keys] keys
               [v & vs :as vals] vals]
          (if (and keys vals)
            (recur (assoc map k v) ks vs)
            map)))
      (zipm [:a :b :c] [1 2 3]) -> {:a 1, :b 2, :c 3}
      (apply hash-map (interleave [:a :b :c] [1 2 3]))
      (into {} (map vector [:a :b :c] [1 2 3]))
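A variant of the loop above with the termination test spelled out through `seq` and `next`, offered as a sketch for a current Clojure 1.x release rather than as the author's code. `zipm*` is a hypothetical name; the point is that `recur` reuses the loop frame, so even long inputs do not grow the stack.

```clojure
;; Hypothetical variant of zipm: explicit seq/next instead of destructuring.
(defn zipm* [ks vs]
  (loop [m {}
         ks (seq ks)     ; seq returns nil for an empty collection
         vs (seq vs)]
    (if (and ks vs)
      (recur (assoc m (first ks) (first vs)) (next ks) (next vs))
      m)))

(zipm* [:a :b :c] [1 2 3])                       ; -> {:a 1, :b 2, :c 3}
(zipm* [] [])                                    ; -> {}
(count (zipm* (range 100000) (range 100000)))    ; -> 100000, constant stack depth
```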
  • 32.
    Equality • = is value equality • identical? is reference equality • rarely used in Clojure • = as per Henry Baker’s egal • Immutable parts compare by value • Mutable references by identity
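The distinction can be checked at a REPL. The sketch below assumes a current Clojure 1.x release; the names `a`, `b`, `r1` and `r2` are illustrative only.

```clojure
;; Two vectors built separately at runtime: equal values, distinct objects.
(def a (vector 1 2 3))
(def b (vector 1 2 3))

(= a b)            ; value equality      -> true
(identical? a b)   ; reference equality  -> false
(identical? a a)   ;                     -> true

;; Mutable references compare by identity, not by what they currently hold.
(def r1 (ref 0))
(def r2 (ref 0))

(= r1 r2)          ; -> false, two different identities
(= @r1 @r2)        ; -> true, their current values are equal
(= r1 r1)          ; -> true
```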
  • 33.
    nil/false/eos/'()
                                   Clojure                         CL           Scheme     Java
      nil                          nil                             nil/'()      -          null
      true/false                   true/false                      -            #t/#f      true/false
      Conditional                  nil or false / everything else  nil/non-nil  #f/non-#f  true/false
      singleton empty list?        No                              '()          '()        No
      end-of-seq                   nil                             nil          '()        FALSE
      Host null/true/false         nil/true/false                  N/A          N/A        N/A
      Library uses concrete types  No                              cons/vector  pair       No
  • 34.
    Functions and Arity Overloading • Multiple distinct bodies in single function object • Supports fast 0/1/2...N dispatch with no conditional • Variable arity with &
      (defn drop-last
        "Return a lazy seq of all but the last n (default 1) items in coll"
        ([s] (drop-last 1 s))
        ([n s] (map (fn [x _] x) (seq s) (drop n s))))
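The drop-last example shows fixed arities only. A sketch that also exercises the "variable arity with &" bullet follows, assuming a current Clojure 1.x REPL; `greet` is a hypothetical function invented for the illustration.

```clojure
;; Hypothetical function with 0-arg, 1-arg and variadic bodies.
(defn greet
  ([] (greet "world"))                        ; delegates to the 1-arg body
  ([who] (str "Hello, " who "!"))
  ([greeting who & more]                      ; more collects any extra args
   (str greeting ", "
        (apply str (interpose " and " (cons who more)))
        "!")))

(greet)                       ; -> "Hello, world!"
(greet "Ethel")               ; -> "Hello, Ethel!"
(greet "Hi" "Fred")           ; -> "Hi, Fred!"
(greet "Hi" "Fred" "Lucy")    ; -> "Hi, Fred and Lucy!"
```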
  • 35.
    Vars • Similar to CL’s special vars • dynamic scope, stack discipline • Shared root binding established by def • root can be unbound • Can be set! but only if first bound using binding (not let) • Thread-local semantics • Functions stored in vars, so they too can be dynamically rebound • context/aspect-like idioms
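A short sketch of root bindings, `binding` and `set!`, assuming a current Clojure 1.x release, where a var must be declared `^:dynamic` before it can be rebound with `binding` (that marker is a later addition and does not appear on the slide). `*depth*` and `report` are hypothetical names.

```clojure
;; Hypothetical dynamic var with a root binding of 0.
(def ^:dynamic *depth* 0)

(defn report [] *depth*)   ; reads whatever binding is current when called

(report)                          ; -> 0, the shared root binding

(binding [*depth* 1]              ; thread-local binding, stack discipline
  (set! *depth* (inc *depth*))    ; allowed because *depth* is bound by binding
  (report))                       ; -> 2

(report)                          ; -> 0, the binding was popped on exit

;; Outside binding there is no thread-local binding to assign to.
(try
  (set! *depth* 5)
  (catch IllegalStateException e
    :cannot-set-root))            ; -> :cannot-set-root
```

Note that `report` sees the rebound value even though it was defined outside the `binding` form, which is the dynamic-scope behavior the slide describes.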
  • 36.
    Lisp-1 and defmacro • Symbols are simple values • Not identities/vars - no value cell • But still have constant-time = • Reader has no side effects • reads plain symbols • Symbols can have namespace part • my-namespace/foo • ` resolves and qualifies symbols
  • 37.
    Syntax-quote
      'foo -> foo
      `foo -> user/foo
      (= 'foo `foo) -> false
      foo -> Exception: Unable to resolve symbol: foo
      (def foo 5) -> #'user/foo
      foo -> 5
      (= foo user/foo) -> true
      (resolve 'foo) -> #'user/foo
      (resolve 'list) -> #'clojure/list
      (defmacro dummy [x] `(list foo ~x))
      (macroexpand '(dummy foo)) -> (clojure/list user/foo foo)
  • 38.
    Pervasive Destructuring • Abstract structural binding • In let/loop binding lists, fn parameter lists, and any macro that expands into a let or fn • Vector binding forms destructure sequential things • vectors, lists, seqs, strings, arrays, and anything that supports nth • Map binding forms destructure associative things • maps, vectors, strings and arrays (the latter three have integer keys)
  • 39.
    Example: Destructuring
      (let [[a b c & d :as e] [1 2 3 4 5 6 7]]
        [a b c d e])
      -> [1 2 3 (4 5 6 7) [1 2 3 4 5 6 7]]
      (let [[[x1 y1][x2 y2]] [[1 2] [3 4]]]
        [x1 y1 x2 y2])
      -> [1 2 3 4]
      (let [{a :a, b :b, c :c, :as m :or {a 2 b 3}} {:a 5 :c 6}]
        [a b c m])
      -> [5 3 6 {:c 6, :a 5}]
      (let [{i :i, j :j, [r s & t :as v] :ivec}
            {:j 15 :k 16 :ivec [22 23 24 25]}]
        [i j r s t v])
      -> [nil 15 22 23 (24 25) [22 23 24 25]]
  • 40.
    Polymorphism via Multimethods • Full generalization of indirect dispatch • Not tied to OO or types • Fixed dispatch function which is an arbitrary function of the arguments • Open set of methods associated with different values of the dispatch function • Call sequence: • Call dispatch function on args to get dispatch value • Find method associated with dispatch value • else call default method if present, else error
  • 41.
    Example: Multimethods
      (defmulti encounter (fn [x y] [(:Species x) (:Species y)]))
      (defmethod encounter [:Bunny :Lion] [b l] :run-away)
      (defmethod encounter [:Lion :Bunny] [l b] :eat)
      (defmethod encounter [:Lion :Lion] [l1 l2] :fight)
      (defmethod encounter [:Bunny :Bunny] [b1 b2] :mate)
      (def b1 {:Species :Bunny :other :stuff})
      (def b2 {:Species :Bunny :other :stuff})
      (def l1 {:Species :Lion :other :stuff})
      (def l2 {:Species :Lion :other :stuff})
      (encounter b1 b2) -> :mate
      (encounter b1 l1) -> :run-away
      (encounter l1 b1) -> :eat
      (encounter l1 l2) -> :fight
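The encounter example covers dispatch on a computed value but not the last step of the call sequence, the default method, nor the open method set. A separate sketch of those two points follows, assuming a current Clojure 1.x release; `area` and the `:shape` key are hypothetical names, not part of the talk.

```clojure
;; Hypothetical multimethod: the dispatch function is the keyword :shape.
(defmulti area :shape)

(defmethod area :circle [{:keys [r]}] (* Math/PI r r))
(defmethod area :rect   [{:keys [w h]}] (* w h))

;; Chosen when no method matches the dispatch value.
(defmethod area :default [_] :unknown-shape)

(area {:shape :rect :w 2 :h 3})      ; -> 6
(area {:shape :hexagon})             ; -> :unknown-shape

;; The set of methods is open: a new dispatch value can be added later,
;; without touching the defmulti or the existing methods.
(defmethod area :square [{:keys [side]}] (* side side))
(area {:shape :square :side 4})      ; -> 16
```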
  • 42.
    Metadata • Orthogonal to the logical value of the data • Symbols and collections support a metadata map • Does not impact equality semantics, nor seen in operations on the value • Support for literal metadata in reader
      (def v [1 2 3])
      (def trusted-v (with-meta v {:source :trusted}))
      (:source ^trusted-v) -> :trusted
      (:source ^v) -> nil
      (= v trusted-v) -> true
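The `^x` form on the slide reads metadata in the Clojure of that era. In current Clojure releases `^` attaches metadata in the reader instead, and metadata is read with the `meta` function, so a present-day version of the same session might look like the sketch below.

```clojure
(def v [1 2 3])
(def trusted-v (with-meta v {:source :trusted}))

(:source (meta trusted-v))   ; -> :trusted
(:source (meta v))           ; -> nil, v itself carries no metadata
(= v trusted-v)              ; -> true, metadata is not part of the value

;; Literal metadata via the reader.
(meta ^{:source :trusted} [1 2 3])   ; -> {:source :trusted}
```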
  • 43.
    Concurrency • Interleaved/simultaneous execution • Must avoid seeing/yielding inconsistent data • The more components there are to the data, the more difficult to keep consistent • The more steps in a logical change, the more difficult to keep consistent • Opportunities for automatic parallelism • Emphasis here on coordination
  • 44.
    State - You’re doing it wrong • Mutable objects are the new spaghetti code • Hard to understand, test, reason about • Concurrency disaster • Terrible as a default architecture • (Java/C#/Python/Ruby/Groovy/CLOS...) • Doing the right thing is very difficult • Languages matter!
  • 45.
    Concurrency Mechanisms • Conventional way: • Direct references to mutable objects • Lock and worry (manual/convention) • Clojure way: • Indirect references to immutable persistent data structures (inspired by SML’s ref) • Concurrency semantics for references • Automatic/enforced • No locks in user code!
  • 46.
    Typical OO - Direct references to Mutable Objects [Diagram: foo refers directly to an object with fields :a ?, :b ?, :c 42, :d ?, :e 6] • Unifies identity and value • Anything can change at any time • Consistency is a user problem
  • 47.
    Clojure - Indirect references to Immutable Objects [Diagram: foo is a reference; @foo yields the immutable value :a "fred", :b "ethel", :c 42, :d 17, :e 6] • Separates identity and value • Obtaining value requires explicit dereference • Values can never change • Never an inconsistent value
  • 48.
    Persistent ‘Edit’ [Diagram: @foo still yields :a "fred", :b "ethel", :c 42, :d 17, :e 6; a new value :a "lucy", :b "ethel", :c 42, :d 17, :e 6 is built alongside it with structural sharing] • New value is function of old • Shares immutable structure • Doesn’t impede readers • Not impeded by readers
  • 49.
    Atomic Update [Diagram: the old value :a "fred", :b "ethel", :c 42, :d 17, :e 6 remains; @foo now yields the new value :a "lucy", :b "ethel", :c 42, :d 17, :e 6, with structural sharing between the two] • Always coordinated • Multiple semantics • Next dereference sees new value • Consumers of values unaffected
  • 50.
    Clojure References • The only things that mutate are references themselves, in a controlled way • 3 types of mutable references, with different semantics: • Refs - Share synchronous coordinated changes between threads • Agents - Share asynchronous autonomous changes between threads • Vars - Isolate changes within threads
  • 51.
    Refs and Transactions • Software transactional memory system (STM) • Refs can only be changed within a transaction • All changes are Atomic, Consistent and Isolated • Every change to Refs made within a transaction occurs or none do • No transaction sees the effects of any other transaction while it is running • Transactions are speculative • Will be retried automatically if conflict • User must avoid side-effects!
  • 52.
    The Clojure STM • Surround code with (dosync ...) • Uses Multiversion Concurrency Control (MVCC) • All reads of Refs will see a consistent snapshot of the 'Ref world' as of the starting point of the transaction, + any changes it has made. • All changes made to Refs during a transaction will appear to occur at a single point in the timeline. • Readers never impede writers/readers, writers never impede readers, supports commute
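To make the "single point in the timeline" claim concrete, here is a sketch of a coordinated change to two Refs, assuming a current Clojure 1.x release. The account refs and `transfer!` are hypothetical names; `alter` is the core function that sets a Ref to the result of applying a function to its in-transaction value.

```clojure
;; Hypothetical accounts held in two Refs.
(def checking (ref 100))
(def savings  (ref 0))

(defn transfer! [from to amount]
  (dosync                     ; both alters commit together, or neither does
    (alter from - amount)
    (alter to + amount)))

(transfer! checking savings 30)
[@checking @savings]          ; -> [70 30]

;; Ten threads transfer concurrently; conflicting transactions are retried
;; automatically, so no update is lost and the total is preserved.
(let [threads (doall (repeatedly 10 (fn [] (Thread. (fn [] (transfer! checking savings 1))))))]
  (doseq [^Thread t threads] (.start t))
  (doseq [^Thread t threads] (.join t)))

[@checking @savings]          ; -> [60 40]
```

The transaction body contains no I/O, in line with the advice to avoid side effects, since a retried transaction would repeat them.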
  • 53.
    Refs in action
      (def foo (ref {:a "fred" :b "ethel" :c 42 :d 17 :e 6}))
      @foo -> {:d 17, :a "fred", :b "ethel", :c 42, :e 6}
      (assoc @foo :a "lucy") -> {:d 17, :a "lucy", :b "ethel", :c 42, :e 6}
      @foo -> {:d 17, :a "fred", :b "ethel", :c 42, :e 6}
      (commute foo assoc :a "lucy") -> IllegalStateException: No transaction running
      (dosync (commute foo assoc :a "lucy"))
      @foo -> {:d 17, :a "lucy", :b "ethel", :c 42, :e 6}
  • 54.
    Agents • Manage independent state • State changes through actions, which are ordinary functions (state=>new-state) • Actions are dispatched using send or send-off, which return immediately • Actions occur asynchronously on thread-pool threads • Only one action per agent happens at a time
  • 55.
    Agents • Agent state always accessible, via deref/@, but may not reflect all actions • Can coordinate with actions using await • Any dispatches made during an action are held until after the state of the agent has changed • Agents coordinate with transactions - any dispatches made during a transaction are held until it commits • Agents are not Actors (Erlang/Scala)
  • 56.
    Agents in Action
      (def foo (agent {:a "fred" :b "ethel" :c 42 :d 17 :e 6}))
      @foo -> {:d 17, :a "fred", :b "ethel", :c 42, :e 6}
      (send foo assoc :a "lucy")
      @foo -> {:d 17, :a "fred", :b "ethel", :c 42, :e 6}
      (await foo)
      @foo -> {:d 17, :a "lucy", :b "ethel", :c 42, :e 6}
  • 57.
    Clojure is Hosted • JVM, not OS, is target platform • Java, not C, is interface language • Shares GC, memory model, stack, type system, exception handling with Java/JVM • JVM is the runtime • Quite excellent JIT, GC, threading, security • Integrated, wrapper-free, ad-hoc Java interop • Libraries available for everything
  • 58.
    Java Integration • Clojure strings are Java Strings, numbers are Numbers, collections implement Collection, fns implement Callable and Runnable etc. • Core abstractions, like seq, are Java interfaces • Clojure seq library works on Java Iterables, Strings and arrays. • Implement and extend Java interfaces and classes • New primitive arithmetic support equals Java’s speed.
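Several of these claims can be checked directly at a REPL. The sketch below assumes a current Clojure 1.x release on a standard JVM and uses only JDK classes.

```clojure
;; Clojure values are instances of the corresponding Java types.
(instance? String "fred")                        ; -> true
(instance? Number 42)                            ; -> true
(instance? java.util.Collection [1 2 3])         ; -> true
(instance? java.util.Map {:a 1})                 ; -> true
(instance? java.util.concurrent.Callable inc)    ; -> true
(instance? Runnable inc)                         ; -> true

;; The seq library applied to plain Java objects.
(map inc (java.util.ArrayList. [1 2 3]))         ; Java Iterable -> (2 3 4)
(first "ethel")                                  ; Java String   -> \e
(reduce + (int-array [1 2 3]))                   ; Java array    -> 6

;; A Clojure fn passed where Java expects a Callable.
(let [^java.util.concurrent.Callable f (fn [] :done)]
  (.call f))                                     ; -> :done
```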
  • 59.
    Java Interop
      Math/PI -> 3.141592653589793
      (.. System getProperties (get "java.version")) -> "1.5.0_13"
      (new java.util.Date) -> Thu Jun 05 12:37:32 EDT 2008
      (doto (JFrame.) (add (JLabel. "Hello World")) pack show)
      (into {} (filter #(re-find #"java" (key %)) (System/getProperties)))
      -> {"java.specification.version" "1.5",...}
  • 60.
    Swing Example
      (import '(javax.swing JFrame JLabel JTextField JButton)
              '(java.awt.event ActionListener)
              '(java.awt GridLayout))
      (defn celsius []
        (let [frame (JFrame. "Celsius Converter")
              temp-text (JTextField.)
              celsius-label (JLabel. "Celsius")
              convert-button (JButton. "Convert")
              fahrenheit-label (JLabel. "Fahrenheit")]
          (.addActionListener convert-button
            (proxy [ActionListener] []
              (actionPerformed [evt]
                (let [c (. Double parseDouble (.getText temp-text))]
                  (.setText fahrenheit-label
                    (str (+ 32 (* 1.8 c)) " Fahrenheit"))))))
          (doto frame
            (setLayout (GridLayout. 2 2 3 3))
            (add temp-text) (add celsius-label)
            (add convert-button) (add fahrenheit-label)
            (setSize 300 80) (setVisible true))))
      (celsius)
  • 61.
    Experiences on the JVM • Main complaint is no tail call optimization • HotSpot covers the last mile of compilation • Runtime optimizing compilation • Clojure can get ~1 gFlop without even generating JVM arithmetic primitives • Ephemeral garbage is extremely cheap • Great performance, many facilities • Verifier, security, dynamic code loading
  • 62.
    Benefits of the JVM • Focus on language vs code generation or mundane libraries • Sharing GC and type system with implementation/FFI language is huge benefit • Tools - e.g. breakpoint/step debugging, profilers etc. • Libraries! Users can do UI, database, web, XML, graphics, etc right away • Great MT infrastructure - java.util.concurrent
  • 63.
    There’s much more... • List (Seq) comprehensions • Ad hoc hierarchies • Type hints • Relational set algebra • Parallel computation • Namespaces • Zippers • XML ...
  • 64.
    Thanks for listening! http://clojure.org Questions?