Heap and Priority Queue
Li Yin1
                           February 6, 2019
1
    www.liyinscience.com
ii
                                              Heap and Priority Queue
In this chapter, we introduce heap data structures which is essentially an
array object but it can be viewed as a nearly complete binary tree. The
concept of the data structures in this chapter is between liner and non-
linear, that is using linear data structures to mimic the non-linear data
structures and its behavior for higher efficiency under certain context.
0.1    Heap
Heap is a tree based data structures that satisfies heap property but im-
plemented as an array data structure. There are two kinds of heaps: max-
heaps and min-heaps. In both kinds, the values in the nodes satisfy a
heap property. For max-heap, the property states as for each subtree rooted
at x, items on all of its children subtrees of x are smaller than x. Normally,
heap is based on binary tree, which makes it a binary heap. Fig. 1 show a
binary max-heap and how it looks like in a binary tree data structure. In the
following content, we default our heap is a binary heap. Thus, the largest
element in a max-heap is stored at the root. For a heap of n elements the
height is log n).
Figure 1: Max-heap be visualized with binary tree structure on the left, and
implemnted with Array on the right.
                                      iii
    iv                                                     HEAP AND PRIORITY QUEUE
        As we can see we can implement heap as an array due to the fact that
    the tree is complete. A complete binary tree is one in which each level must
    be fully filled before starting to fill the next level. Array-based heap is more
    space efficient compared with tree based due to the non-existence of the child
    pointers for each node. To make the math easy, we iterate node in the tree
    starting from root in the order of level by level and from left to right with
    beginning index as 1 (shown in Fig. 1). According to such assigning rule,
    the node in the tree is mapped and saved in the array by the assigned index
    (shown in Fig. 1). In heap, we can traverse the imaginary binary tree in two
    directions: root-to-leaf and leaf-to-root. Given a parent node with p as
    index, the left child of can be found in position 2p in the array. Similarly,
    the right child of the parent is at position 2p + 1 in the list. To find the
    parent of any node in the tree, we can simply use bp/2c. In Python3, use
    integer division n//2. Note: we can start index with 0 as used in heapq
    library introduced later in this section. Given a node x, the left and right
    child will be 2∗x+1, 2∗x+2, and the parent node will have index (x−1)//2.
        The common application of heap data structure include:
         • Implementing a priority-queue data structure which will be detailed in
           the next section so that insertion and deletion can be implemented in
           O(log n); Priority Queue is an important component in algorithms like
           Kruskal’s for minimum spanning tree (MST) problem and Dijkstra’s
           for single-source shortest paths (SSSP) problem.
         • Implementing heapsort algorithm,
        Normally, there is usually no notion of ’search’ in heap, but only insertion
    and deletion, which can be done by traversing a O(log n) leaf-to-root or root-
    to-leaf path.
    0.1.1      Basic Implementation
    The basic method that a heap supports are insert and pop. Additionally,
    given an array, to convert it to a heap, we need operation called heapify.
        Let’s implement a heap class using list. Because the first element of the
    heap is actually empty, we define our class as follows:
1   c l a s s Heap :
2           d e f __init__ ( s e l f ) :
3                 s e l f . heap = [ None ]
4                 self . size = 0
5           d e f __str__ ( s e l f ) :
6                 out = ’ ’
7                 f o r i in range (1 , s e l f . s i z e + 1) :
8                         out += s t r ( s e l f . heap [ i ] ) + ’ ’
9                 r e t u r n out
     0.1. HEAP                                                                                                       v
     Insert with Floating When we insert an item, we put it at the end of
     the heap (array) first and increase the size by one. After this, we need to
     traverse from the last node leaf-to-root, and compare each leaf and parent
     pair to decide is a swap operation is needed to force the heap property. For
     example, in the min-heap. The time complexity is the same as the height of
     the complete tree, which is O(log n).
 1        d e f _ f l o a t ( s e l f , i n d e x ) : # e n f o r c e min−heap , l e a f −to−r o o t
 2               w h i l e i n d e x // 2 : # w h i l e p a r e n t e x i s t
 3                        p_index = i n d e x // 2
 4                        i f s e l f . heap [ i n d e x ] < s e l f . heap [ p_index ] :
 5                               # swap
 6                                s e l f . heap [ i n d e x ] , s e l f . heap [ p_index ] = s e l f . heap
           [ p_index ] , s e l f . heap [ i n d e x ]
 7                        i n d e x = p_index # move up t h e node
 8          def i n s e r t ( s e l f , val ) :
 9               s e l f . heap . append ( v a l )
10               s e l f . s i z e += 1
11               s e l f . _float ( index = s e l f . s i z e )
     Pop with Sinking When we pop an item, we first save the root node’s
     value in order to get either the maximum or minimum item in the heap.
     Then we simply use the last item to fill in the root node. After this, we
     need to traverse from the root node root-to-leaf, and compare each parent
     with left and right child. In a min-heap, if the parent is larger than its
     smaller child node, we swap the parent with the smaller child. This process
     is called sinking. Same as the insert, O(log n).
 1          d e f _sink ( s e l f , i n d e x ) : # e n f o r c e min−heap , r o o t −to−l e a f
 2                  while 2 ∗ index < s e l f . s i z e :
 3                          l i = 2 ∗ index
 4                          ri = li + 1
 5                          mi = l i i f s e l f . heap [ l i ] < s e l f . heap [ r i ] e l s e r i
 6                          i f s e l f . heap [ i n d e x ] > s e l f . heap [ mi ] :
 7                                 # swap i n d e x with mi
 8                                  s e l f . heap [ i n d e x ] , s e l f . heap [ mi ] = s e l f . heap [ mi ] ,
             s e l f . heap [ i n d e x ]
 9                          i n d e x = mi
10          d e f pop ( s e l f ) :
11                  v a l = s e l f . heap [ 1 ]
12                  s e l f . heap [ 1 ] = s e l f . heap . pop ( )
13                  s e l f . s i z e −= 1
14                  s e l f . _sink ( i n d e x = 1 )
15                  return val
     Now, let us run an example:
 1   h = Heap ( )
 2   l s t = [ 2 1 , 1 , 45 , 78 , 3 , 5 ]
 3   for v in l s t :
 4           h. insert (v)
 5   p r i n t ( ’ h e a p i f y with i n s e r t i o n :   ’ , h)
    vi                                                         HEAP AND PRIORITY QUEUE
6   h . pop ( )
7   p r i n t ( ’ a f t e r pop ( ) :   ’ , h)
    The output is listed as:
1   h e a p i f y with i n s e r t i o n : 1 3 5 78 21 45
2   a f t e r pop ( ) : 3 21 5 78 45
    Heapify with Bottom-up Floating Heapify is a procedure that convert
    a list to a heap data structure. Similar to the operation insertion, it uses
    floating. Differently, if we iterating through the list and use insertion opera-
    tion, each time we try to float, the previous list is already a heap. However,
    given an unordered array, we can only get the minimum/maximum node
    floating starting from the bottom. Therefore, we call floating process for
    the elements in the list in reverse order as a bottom-up manner. The up-
    per bound time complexity is O(n log n) because we have n call of the float
    which has an upper bound O(log n). However, we can prove it actually has
    a tighter bound by observing that the running time depends on the height
    of the tree. The proving process is shown on geeksforgeeks 1 .
1             def heapify ( s e l f , l s t ) :
2                 s e l f . heap = [ None ] + l s t
3                 s e l f . size = len ( l s t )
4                 f o r i i n r a n g e ( s e l f . s i z e , 0 , −1) :
5                         s e l f . _float ( i )
             Now, run the following code:
1   h = Heap ( )
2   h . heapify ( l s t )
3   p r i n t ( ’ h e a p i f y with h e a p i f y : ’ , h )
    Out put is:
1   h e a p i f y with h e a p i f y : 1 5 21 78 3 45
             Which way is more efficient building a heap from a list?
              Using insertion or heapify? What is the efficiency of each method?
              The experimental result can be seen in the code.
    When we are solving a problem, unless specifically required for implemen-
    tation, we can always use an existent Python module/package. Here, we
    introduce one Python module: heapq that implements heap data structure
    for us.
         1
             https://www.geeksforgeeks.org/time-complexity-of-building-a-heap/
    0.1. HEAP                                                                            vii
    0.1.2     Python Built-in Library: heapq
    heapq: heapq is a built-in library in Python that implements relevant
    functions to carry out various operations on heap data structure. These
    functions are listed and described in Table 1. To note that heapq is not
    a data type like queue.Queue() or collections.deque(), it is a library (or
    class) that can do operations like it is on a heap. heapq has some other
                                  Table 1: Methods of heapq
     Method                      Description
     heappush(h, x)              Push the value item onto the heap, maintaining the heap in-
                                 variant.
     heappop(h)                  Pop and return the smallest item from the heap, maintaining
                                 the heap invariant. If the heap is empty, IndexError is raised.
     heappushpop(h, x)           Push item on the heap, then pop and return the smallest item
                                 from the heap. The combined action runs more efficiently than
                                 heappush() followed by a separate call to heappop().
     heapify(x)                  Transform list x into a heap, in-place, in linear time.
     heapreplace(h, x)           Pop and return the smallest item from the heap, and also push
                                 the new item. The heap size doesn’t change. If the heap is
                                 empty, IndexError is raised. This is more efficient than heap-
                                 pop() followed by heappush(), and can be more appropriate
                                 when using a fixed-size heap.
     nlargest(k, iterable,       This function is used to return the k largest elements from the
     key = fun)                  iterable specified and satisfying the key if mentioned.
     nsmallest(k,     iter-      This function is used to return the k smallest elements from
     able, key = fun)            the iterable specified and satisfying the key if mentioned.
    functions like merge(), nlargest(), nsmallest() that we can use. Check out
    https://docs.python.org/3.0/library/heapq.html for more detials.
        Now, let us try to heapify the same examplary list as used in the last
    section, [21, 1, 45, 78, 3, 5], we use need to call the function heapify(). The
    time complexity of heapify is O(n)
1    ’ ’ ’ implementing with heapq ’ ’ ’
2   from heapq import heappush , heappop , h e a p i f y
3   h = [ 2 1 , 1 , 45 , 78 , 3 , 5 ]
4   heapify (h) # inplace
5   p r i n t ( ’ h e a p i f y with heapq : ’ , h )
    The print out is:
1   h e a p i f y with heapq :     [ 1 , 3 , 5 , 78 , 21 , 45]
        As we can see the default heap implemented in the heapq library is
    forcing the heap property of the min-heap. What is we want a max-heap
    instead? In heapq library, it does offer us function, but it is intentionally
    hided from users. It can be accessed like: heapq._[function]_max(). Now,
    let us implement a max-heap instead.
    viii                                                HEAP AND PRIORITY QUEUE
1   # implement a max−heap
2   h = [ 2 1 , 1 , 45 , 78 , 3 , 5 ]
3   heapq . _heapify_max ( h ) # i n p l a c e
4   p r i n t ( ’ h e a p i f y max−heap with heapq :   ’ , h)
    The print out is:
1   h e a p i f y max−heap with heapq :        [ 7 8 , 21 , 45 , 1 , 3 , 5 ]
       Also, in practise, a simple hack for the max-heap is to save data as
    negative. Also, in the priority queue.
    0.2      Priority Queue
    A priority queue is an extension of queue with properties: (1) additionally
    each item has a priority associated with it. (2) In a priority queue, an item
    with high priority is served (dequeued) before an item with low priority. (3)
    If two items have the same priority, they are served according to their order
    in the queue.
        Heap is generally preferred for priority queue implementation because
    of its better performance compared with arrays or linked list. Also, in
    Python queue module, we have PriorityQueue() class that provided us the
    implementation. Beside, we can use heapq library too. These contents will
    be covered in the next two subsection.
        Applications of Priority Queue:
       1. CPU Scheduling
       2. Graph algorithms like Dijkstra’s shortest path algorithm, Prim’s Min-
          imum Spanning Tree, etc
       3. All queue applications where priority is involved.
    Implement with heapq Library The core function is the ones used to
    implement the heap: heapify(), push(), and pop(). Here we demonstrate
    how to use function nlargest() and nsmallest() if getting the first n largest
    or smallest is what we need, we do not need to heapify() the list as we needed
    in the heap and pop out the smallest. The step of heapify is built in these
    two functions.
1   ’ ’ ’ u s e heapq t o g e t n l a r g e s t and n s m a l l e s t ’ ’ ’
2   l i 1 = [ 2 1 , 1 , 45 , 78 , 3 , 5 ]
3   # u s i n g n l a r g e s t t o p r i n t 3 l a r g e s t numbers
4   p r i n t ( " The 3 l a r g e s t numbers i n l i s t a r e : " , end=" " )
5   p r i n t ( heapq . n l a r g e s t ( 3 , l i 1 ) )
6
7   # u s i n g n s m a l l e s t t o p r i n t 3 s m a l l e s t numbers
8   p r i n t ( " The 3 s m a l l e s t numbers i n l i s t a r e : " , end=" " )
9   p r i n t ( heapq . n s m a l l e s t ( 3 , l i 1 ) )
     0.2. PRIORITY QUEUE                                                                               ix
     The print out is:
 1   The 3 l a r g e s t numbers i n l i s t a r e : [ 7 8 , 4 5 , 2 1 ]
 2   The 3 s m a l l e s t numbers i n l i s t a r e : [ 1 , 3 , 5 ]
     Implement with PriorityQueue class Class PriorityQueue() is the
     same as Queue(), LifoQueue(), they have same member functions as shown
     in Table ??. Therefore, we skip the semantic introduction. PriorityQueue()
     normally thinks that the smaller the value is the higher the priority is, such
     as the following example:
 1   import queue #Python3 n e e d s t o u s e queue , o t h e r s Queue
 2   q = queue . P r i o r i t y Q u e u e ( )
 3   q . put ( 3 )
 4   q . put ( 1 0 )
 5   q . put ( 1 )
 6
 7   w h i l e not q . empty ( ) :
 8           next_job = q . g e t ( )
 9           p r i n t ( ’ P r o c e s s i n g j o b : ’ , next_job )
     The output is:
 1   Processing job : 1
 2   Processing job : 3
 3   P r o c e s s i n g j o b : 10
         We can also pass by any data type that supports ’<’ comparison oepra-
     tor, such as tuple, it will use the first item in the tuple as the key.
         If we want to give the number with larger value as higher priority, a
     simple hack is to pass by negative value. Another more professional way is
     to pass by a customized object and rewrite the comparison operator: < and
     == in the class with __lt__() and __eq__(). In the following code, we
     show how to use higher value as higher priority.
 1   c l a s s Job ( o b j e c t ) :
 2           d e f __init__ ( s e l f , p r i o r i t y , d e s c r i p t i o n ) :
 3                    self . priority = priority
 4                    self . description = description
 5                    p r i n t ( ’New j o b : ’ , d e s c r i p t i o n )
 6                    return
 7          # d e f __cmp__( s e l f , o t h e r ) :
 8          #             r e t u r n cmp( s e l f . p r i o r i t y , o t h e r . p r i o r i t y )
 9            ’ ’ ’ c u s t o m i z e t h e comparison o p e r a t o r s ’ ’ ’
10           d e f __lt__( s e l f , o t h e r ) : # <
11                    try :
12                            return s e l f . p r i o r i t y > other . p r i o r i t y
13                    except AttributeError :
14                            r e t u r n NotImplemented
15           d e f __eq__( s e l f , o t h e r ) : # ==
16                    try :
17                            r e t u r n s e l f . p r i o r i t y == o t h e r . p r i o r i t y
18                    except AttributeError :
     x                                                             HEAP AND PRIORITY QUEUE
19                        r e t u r n NotImplemented
20
21   q = Queue . P r i o r i t y Q u e u e ( )
22
23   q . put ( Job ( 3 , ’ Mid−l e v e l j o b ’ ) )
24   q . put ( Job ( 1 0 , ’ Low−l e v e l j o b ’ ) )
25   q . put ( Job ( 1 , ’ Important j o b ’ ) )
26
27   w h i l e not q . empty ( ) :
28           next_job = q . g e t ( )
29           p r i n t ( ’ P r o c e s s i n g j o b : ’ , next_job . p r i o r i t y )
     The print out is:
 1   P r o c e s s i n g j o b : 10
 2   Processing job : 3
 3   Processing job : 1
           In single thread programming, is heapq or PriorityQueue
           more efficient?
           In fact, the PriorityQueue implementation uses heapq under the hood
           to do all prioritisation work, with the base Queue class providing the
           locking to make it thread-safe. While heapq module offers no locking,
           and operates on standard list objects. This makes the heapq module
           faster; there is no locking overhead. In addition, you are free to use
           the various heapq functions in different, noval ways, while the Priori-
           tyQueue only offers the straight-up queueing functionality.
     0.3        Bonus
     Let us take these knowledge into practice with a LeetCode Problem:
         0.1 347. Top K Frequent Elements (medium). Given a non-empty
             array of integers, return the k most frequent elements.
             Example 1 :
             Input : nums = [ 1 , 1 , 1 , 2 , 2 , 3 ] , k = 2
             Output : [ 1 , 2 ]
             Example 2 :
             Input : nums = [ 1 ] , k = 1
             Output : [ 1 ]
             Analysis: to solve this problem, we need to first using a hashmap to
             get information as: item and its freqency. Then, we need to obtain
             the top frequent elements. The second step can be down with sorting,
             or using heap we learned.
0.4. LEETCODE PROBLEMS                                                                             xi
       Solution 1: Use Counter(). Counter() has a function most_common(k)
       that will return the top k most frequent items. However, its complexity
       will be O(n log n).
   1   from c o l l e c t i o n s import Counter
   2   d e f topKFrequent ( s e l f , nums , k ) :
   3         r e t u r n [ x f o r x , _ i n Counter ( nums ) . most_common ( k ) ]
       Solution 2: Use dict and heapq.nlargest(). The complexity
       should be better than O(n log n).
   1   from c o l l e c t i o n s import Counter
   2   import heapq
   3   d e f topKFrequent ( s e l f , nums , k ) :
   4         count = c o l l e c t i o n s . Counter ( nums )
   5         r e t u r n heapq . n l a r g e s t ( k , count . k e y s ( ) , key=count . g e t )
       We can also use PriorityQueue().
   1   from queue import P r i o r i t y Q u e u e
   2   class Solution :
   3   d e f topKFrequent ( s e l f , nums , k ) :
   4         h = PriorityQueue ()
   5
   6        # b u i l d a hashmap ( element , f r e q u e n c y )
   7        temp = {}
   8         f o r n i n nums :
   9               i f n not i n temp :
  10                     temp [ n ] = 1
  11               else :
  12                     temp [ n ] += 1
  13        # put them a s (− f r e q u e n c y , e l e m e n t ) i n t h e queue o r
            heap
  14         f o r key , item i n temp . i t e m s ( ) :
  15               h . put (( − item , key ) )
  16
  17        # g e t t h e top k f r e q u e n t o n e s
  18        ans = [ None ] ∗ k
  19        f o r i in range ( k ) :
  20                _, ans [ i ] = h . g e t ( )
  21        r e t u r n ans
Fibonacci heap With fibonacc heap, insert() and getHighestPriority()
can be implemented in O(1) amortized time and deleteHighestPriority()
can be implemented in O(Logn) amortized time.
0.4      LeetCode Problems
selection with key word: kth. These problems can be solved by
sorting, using heap, or use quickselect
  1. 703. Kth Largest Element in a Stream (easy)
xii                                        HEAP AND PRIORITY QUEUE
      2. 215. Kth Largest Element in an Array (medium)
      3. 347. Top K Frequent Elements (medium)
      4. 373. Find K Pairs with Smallest Sums (Medium
      5. 378. Kth Smallest Element in a Sorted Matrix (medium)
priority queue or quicksort, quickselect
      1. 23. Merge k Sorted Lists (hard)
      2. 253. Meeting Rooms II (medium)
      3. 621. Task Scheduler (medium)