Chapter 7 – Trees
Algorithms and dynamic data structures
A. Chikhaoui                             2022 - 2023
Introduction
   For a dictionary with n (key, value) pairs:
                                                 find       insert    delete
                     Unsorted linked-list        O(n)        O(1)      O(n)
                        Unsorted array           O(n)        O(1)      O(n)
                       Sorted linked list        O(n)        O(n)      O(n)
                         Sorted array         O(log(n))      O(n)      O(n)
                        Balanced tree         O(log(n)) O(log(n)) O(log(n))
                  Is there a data structure?     O(1)        O(1)      O(1)
   It is interesting to have a data structure with constant time (complexity= o(1)) for
   find,insert and delete operations → Hash tables
   hashing is used in:
       ▶   Web searches
       ▶   Databases
       ▶   Compilers
       ▶   passwords
       ▶   , etc.
   1                                                                                      12
Definition
   Hashing is a technique used for storing and retrieving information as quickly as possible.
   It is used to perform optimal searches
                                                        9
                                                        8
                                                        7
           x1     x4                                    6
                                 hash function
           x2                     index=h(xi )          5
    x3                                                  4
                                                        3
                                                        2
                                                        1
                                                        0
   h is called: hash function such that h: S → [0...N − 1]. It maps a set of objects
   S = {x1 , x2 , ...xm } into a table of size N.
   It is a good idea to pick a prime as the table size to have a better distribution of values.
   h(x) represents the primary address of x (it is an index of T)
   Example of hash function is MOD function.
            2                                                                                     12
Collisions
   Example: Let suppose that we have a hash table T and its size is 100. The hash function
   is h(x) = x%100. If x1 = 22 and x2 = 122.
   x1 and x2 will be mapped to the same location based our hash function.
   This is called Collision
   Definition: Collision is the condition where two items or more are stored in the same
   location.
   When collisions occur, we need to “handle” them using Collision Resolution
   Techniques (see later).
   Collisions can be reduced with a selection of a good hash function.
                   3                                                                         12
Characteristics of Good hash functions
   A good hash function should have the following characteristics:
    ▶   Minimize collision
    ▶   Be easy and quick to compute
    ▶   Distribute key values evenly in the hash table
    ▶   Use all the information provided in the key
    ▶   Have a high load factor for a given set of keys
             Load factor: The load factor of a hash table is the number of items stored in the table divided
             by the size of the table. This is the decision parameter used when we want to rehash or expand
             the existing hash table entries.
                             4                                                                                 12
Collision Resolution Techniques
   The process of finding an alternate location is called collision resolution.
   the most popular collision resolution tchniques are :
     ▶ Direct Chaining: An array of linked lists
             Separate chaining
     ▶ Open Addressing: Array-based implementation
             Linear probing (linear search)
             Quadratic probing (nonlinear search)
             Double hashing (use two hash functions)
                                     5                                            12
Separate Chaining
   Collision resolution by chaining combines linked representation with hash table.
   All keys that map to the same table location are kept in a singly-linked list.
  Operations:
   ▶ f ind: The search for x begins in the cell
     h(x). If x is not in that cell, we search in
     the corresponding linked list.
   ▶ insert: the insertion begins with the
     search for the key x. If x does not exist,
     we add it in the cell h(x) if it is empty
     otherwise we add it in the linked list
     associated with this cell.
   ▶ delete: if x exists, delete it and
     rearrange the table.
   Complexity: In the worse case O(n), in the average case O(1).
                                                6                                     12
Linear Probing
   In linear probing, we search the hash table sequentially, starting from the original hash
   location.
   If a location is occupied, we check the next locations (h(x)-1, h(x)-2,...) Until finding an
   empty location.
   ( We wrap around from the last table location to the first table location if necessary, it
   means the next location of 0 is N-1: circular array).
   There must always be at least one empty cell in the table (one position is sacrificed)
   Example: insert the following items into an array of size=10, h(x)=x%size:
   55, 21, 63, 103, 10, 82, 18.
                                                    7                                             12
Linear Probing
                 8   12
Linear Probing
                 8   12
Linear Probing
                 8   12
Linear Probing
                 8   12
Linear Probing
                 9   12
Linear Probing
                 9   12
Linear Probing
                 9   12
Linear Probing
                 9   12
Linear Probing
   Operations:
    ▶ f ind: The search for x begins in the cell h(x), if the cell doesn’t match, look elsewhere. If if
      x does not exist in the table, the find operation stops in the empty cell.
    ▶ insert: compute h(x), if the cell is full, find another by sequentially searching for the next
      available cell: h(x)-1, h(x)-2, ...
    ▶ delete: Search x. Test all the data above (circularly) until you find an empty cell and
      possibly move some values. Clear a cell.
   Complexity: In the worse case O(n), in the average case O(1).
                                                                                  10                      12
Quadratic Probing
   Same idea as linear probing except that collisions are resolved by examining cells:
   h(x) − 12 , h(x) − 22 , h(x) − 32 , ...
                                                                                 11      12
Double hashing
   Like the Linear probing except that the cyclic sequence of cells to be tested uses a step
   calculated by a 2nd hash function h′ (x).
   Let h(x) be the function used to calculate the primary address and h′ (x) the function
   which calculates the step of the sequence: h(x), h(x) − h′ (x) , h(x) − 2h′ (x) ,
   h(x) − 3h′ (x) , ...
   For the sequence to be circular, the subtractions are done modulo the size of the table.
   As in linear probing, one cell is sacrificed.
                                                                                          12 / 12