Introduction to Algorithms
6.046J/18.401J
Lecture 8
Prof. Piotr Indyk
Data structures
• Previous lecture: hash tables
– Insert, Delete, Search in (expected)
constant time
– Works for integers from {0…m^r-1}
• This lecture: Binary Search Trees
– Insert, Delete, Search (Successor)
– Works in comparison model
Binary Search Tree
• Each node x has:
  – key[x]
  – Pointers:
    • left[x]
    • right[x]
    • p[x]
[Figure: example BST: root 9 with children 5 and 12; 5 has children 1 and 6; 6 has right child 7; 7 has right child 8]
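A concrete sketch of such a node in Python (my illustration, not the lecture's code); the attribute names mirror key[x], left[x], right[x], p[x]:

    class Node:
        """Minimal BST node: a key plus left/right/parent pointers."""
        def __init__(self, key):
            self.key = key        # key[x]
            self.left = None      # left[x]
            self.right = None     # right[x]
            self.parent = None    # p[x]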
Binary Search Tree (BST)
• Property: for any node x:
  – For all nodes y in the left subtree of x: key[y] ≤ key[x]
  – For all nodes y in the right subtree of x: key[y] ≥ key[x]
• Given a set of keys, is the BST for those keys unique?
[Figure: the example BST with keys 1, 5, 6, 7, 8, 9, 12]
No uniqueness
[Figure: two different BSTs over the same keys: one rooted at 7 (children 5 and 9, grandchildren 1, 6, 8, 12) and the running example rooted at 9]
What can we do given a BST?
• Sort!
• Inorder-Walk(x):
  If x ≠ NIL then
    – Inorder-Walk( left[x] )
    – print key[x]
    – Inorder-Walk( right[x] )
• Output: 1 5 6 7 8 9 12
[Figure: the example BST]
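A direct Python transcription of Inorder-Walk, assuming the Node class sketched earlier (NIL becomes None):

    def inorder_walk(x):
        # Left subtree, then x, then right subtree: keys print in sorted order.
        if x is not None:
            inorder_walk(x.left)
            print(x.key)
            inorder_walk(x.right)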
Sorting, ctd.
• What is the running time of Inorder-Walk?
• It is O(n)
• Because:
  – Each link is traversed twice
  – There are O(n) links
[Figure: the example BST]
Sorting, ctd.
• Does it mean that we can sort n keys in O(n) time?
• No
• It just means that building a BST takes Ω(n log n) time (in the comparison model)
[Figure: the example BST]
BST as a data structure
• Operations:
– Insert(x)
– Delete(x)
– Search(k)
Search
Search(x, k):
• If x ≠ NIL then
  – If key[x] = k then return x
  – If k < key[x] then return Search( left[x], k )
  – If k > key[x] then return Search( right[x], k )
• Else return NIL
Examples: Search(8) finds the node with key 8; Search(8.5) returns NIL
[Figure: the example BST]
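A Python sketch of Search over the Node objects from earlier; note that the key k is passed along with the node:

    def search(x, k):
        # Return the node with key k in the subtree rooted at x, or None.
        if x is None:
            return None
        if k == x.key:
            return x
        if k < x.key:
            return search(x.left, k)
        return search(x.right, k)

On the example tree, search(root, 8) returns the node holding 8, while search(root, 8.5) returns None.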
Predecessor/Successor
• Can modify Search (into Search’; see the Python sketch after this slide) so that, if k is not stored in the BST, we get x such that:
  – Either it has the largest key[x] < k, or
  – It has the smallest key[x] > k
• Useful when k is prone to errors
• What if we always want the successor of k?
  – x = Search’(k)
  – If key[x] < k, then return Successor(x)
  – Else return x
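The lecture only describes Search’; one hypothetical way to get its behavior in Python (the name search_last is mine) is to remember the last node visited before the search falls off the tree. When k is absent, that node holds either the largest key < k or the smallest key > k:

    def search_last(x, k):
        # Like search, but if k is absent return the last node visited.
        last = None
        while x is not None:
            last = x
            if k == x.key:
                return x
            x = x.left if k < x.key else x.right
        return last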
Successor
Successor(x):
• If right[x] ≠ NIL then
  return Minimum( right[x] )
• Otherwise
  – y ← p[x]
  – While y ≠ NIL and x = right[y] do
    • x ← y
    • y ← p[y]
  – Return y
[Figure: the example BST, annotated with the x and y pointers as they move up the tree]
Minimum
Minimum( x ):
• While left[x] ≠ NIL do
  – x ← left[x]
• Return x
[Figure: the example BST]
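Both routines translate directly to Python on the Node objects from earlier:

    def minimum(x):
        # The leftmost node of the subtree rooted at x holds its smallest key.
        while x.left is not None:
            x = x.left
        return x

    def successor(x):
        # Next key in sorted order after key[x], or None if x holds the maximum.
        if x.right is not None:
            return minimum(x.right)
        y = x.parent
        while y is not None and x is y.right:
            x = y
            y = y.parent
        return y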
Nearest Neighbor
• Assuming keys are numbers
• For a key k, can we find x such that |k-key[x]| is
minimal ?
• Yes:
– the key[x] minimizing |k-key[x]| must be either the predecessor or the successor of k
– y=Search’(k) //y is either succ or pred of k
– y’ =Successor(y)
– y’’=Predecessor(y)
– Report the closest of key[y], key[y’], key[y’’]
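A sketch of this recipe in Python, reusing search_last and successor from the earlier sketches; Predecessor is not written out in the lecture, so the mirror-image version below is an assumption of mine:

    def predecessor(x):
        # Mirror image of successor: previous key in sorted order, or None.
        if x.left is not None:
            y = x.left
            while y.right is not None:
                y = y.right
            return y
        y = x.parent
        while y is not None and x is y.left:
            x = y
            y = y.parent
        return y

    def nearest_neighbor(root, k):
        # Compare the node where the search for k lands with its successor
        # and predecessor, and report whichever key is closest to k.
        y = search_last(root, k)
        if y is None:
            return None                  # empty tree
        candidates = [y, successor(y), predecessor(y)]
        keys = [c.key for c in candidates if c is not None]
        return min(keys, key=lambda v: abs(v - k))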
Analysis
• How much time does all of this take?
• Worst case: O(height)
• Height really important
• Tree better be balanced
[Figure: the example BST]
Constructing BST
Insert(z):
• y ← NIL
• x ← root
• While x ≠ NIL do
  – y ← x
  – If key[z] < key[x]
    then x ← left[x]
    else x ← right[x]
• p[z] ← y
• If key[z] < key[y]
  then left[y] ← z
  else right[y] ← z
[Figure: the example BST after Insert(8.5) and Insert(5.5): 8.5 becomes the right child of 8, and 5.5 the left child of 6]
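An iterative Python version of Insert over the earlier Node class; unlike the slide's pseudocode it also handles an empty tree, so it returns the (possibly new) root:

    def insert(root, z):
        # Walk down from the root to find z's parent, then attach z.
        y = None
        x = root
        while x is not None:
            y = x
            x = x.left if z.key < x.key else x.right
        z.parent = y
        if y is None:
            return z                     # tree was empty: z becomes the root
        if z.key < y.key:
            y.left = z
        else:
            y.right = z
        return root

For example, root = insert(root, Node(8.5)) attaches 8.5 as the right child of 8, and root = insert(root, Node(5.5)) attaches 5.5 as the left child of 6, as in the figure.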
Analysis
• After we insert n elements, what is the worst possible BST height?
• Pretty bad: n-1
[Figure: a degenerate BST: the keys 1, 2, 3, 4, 5, 6 form a single path]
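A quick check (mine, not the lecture's) using the Node and insert sketches above: inserting already-sorted keys produces a single path of height n-1:

    def height(x):
        # Height of the subtree rooted at x; an empty tree has height -1.
        if x is None:
            return -1
        return 1 + max(height(x.left), height(x.right))

    root = None
    for k in range(1, 7):                # insert 1, 2, ..., 6 in sorted order
        root = insert(root, Node(k))
    print(height(root))                  # prints 5, i.e. n-1 for n = 6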
Average case analysis
• Consider keys 1,2,…,n, in a random order
• Each permutation equally likely
• For each key perform Insert
• What is the likely height of the tree ?
• It is O(log n)
Creating a random BST
• n = 9
[Figure: the keys 1..9 being repeatedly split: each newly inserted key partitions the remaining keys into a smaller group and a larger group]
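An experiment (again mine, not the lecture's) that builds a BST from a random permutation of 1..n, reusing Node, insert and height from the earlier sketches; the observed heights stay around a small constant times lg n, matching the O(log n) claim:

    import random

    def random_bst_height(n):
        # Insert a uniformly random permutation of 1..n and return the height.
        keys = list(range(1, n + 1))
        random.shuffle(keys)
        root = None
        for k in keys:
            root = insert(root, Node(k))
        return height(root)

    print(random_bst_height(1000))       # typically around 20 for n = 1000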
Observations
• Each edge corresponds to a random
partition
• Element x has height h ⇒ x participated in
h partitions
• Let h_x be a random variable denoting the height of x
• What is Pr[h_x > t], where t = c lg n ?
Partitions
• A partition is lucky if the ratio is at least 1:3, i.e., each side has size ≥ 25%
• Probability of a lucky partition is ½
• After log_{4/3} n lucky partitions the element becomes a leaf
• h_x > t ⇒ in t = c·log_{4/3} n partitions we had < log_{4/3} n lucky ones
• Toss t = c·log_{4/3} n coins; what is the probability you get < k = log_{4/3} n heads?
Concentration inequalities
• CLRS, p. 1118: the probability of at most k heads in t trials is at most (t choose k) / 2^(t-k)

  Pr[h_x > t] ≤ (t choose k) / 2^(t-k)
             ≤ (et/k)^k / 2^(t-k)
             = (ce)^(log_{4/3} n) / 2^((c-1)·log_{4/3} n)
             = 2^(lg(ce)·log_{4/3} n) / 2^((c-1)·log_{4/3} n)
             = 2^([lg(ce) - (c-1)]·(lg n)/lg(4/3))
             ≤ 2^(-1.1 lg n) = 1/n^1.1, for sufficiently large c
Final Analysis
• We know that for each x, Pr[h_x > t] ≤ 1/n^1.1
• We want Pr[h_1 > t or h_2 > t or … or h_n > t]
• This is at most
  Pr[h_1 > t] + Pr[h_2 > t] + … + Pr[h_n > t]
  ≤ n · 1/n^1.1
  = 1/n^0.1
• As n grows, the probability of height > c lg n becomes arbitrarily small
Summing up
• We have seen BSTs
• They support Search, Successor, Nearest Neighbor, etc., as well as Insert
• Worst case: O(n)
• But O(log n) on average
• Next week: O(log n) worst case