KEMBAR78
DAA Module 3 | PDF | Algorithms And Data Structures | Algorithms
0% found this document useful (0 votes)
10 views58 pages

DAA Module 3

The document discusses balanced search trees, focusing on AVL trees and 2-3 trees, which are designed to maintain efficient dictionary operations while avoiding degeneracy. AVL trees are self-balancing binary search trees where the balance factor of each node is constrained, and rotations are used to maintain balance during insertions. 2-3 trees allow nodes to contain multiple keys and ensure all leaves are at the same level, providing logarithmic efficiency for search, insertion, and deletion operations.

Uploaded by

Adwaith Shine
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
10 views58 pages

DAA Module 3

The document discusses balanced search trees, focusing on AVL trees and 2-3 trees, which are designed to maintain efficient dictionary operations while avoiding degeneracy. AVL trees are self-balancing binary search trees where the balance factor of each node is constrained, and rotations are used to maintain balance during insertions. 2-3 trees allow nodes to contain multiple keys and ensure all leaves are at the same level, providing logarithmic efficiency for search, insertion, and deletion operations.

Uploaded by

Adwaith Shine
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 58

Module-3

Balanced Search Trees

DESIGN AND ANALYSIS OF ALGORITHMS Module-3


Balanced Search Trees

▪ Computer scientists have expended a lot of effort in trying to find a structure that
preserves the good properties of the classical binary search tree—principally, the
logarithmic efficiency of the dictionary operations and having the set’s elements
sorted—but avoids its worst-case degeneracy.
▪ They have come up with two approaches.

▪ The first approach is of the instance-simplification variety: an unbalanced binary


search tree is transformed into a balanced one. Because of this, such trees are
called self-balancing.
✔ AVL tree
✔ red-black tree
✔ splay trees

▪ The second approach is of the representation-change variety: allow more than


one element in a node of a search tree.
✔ 2-3 trees
✔ 2-3-4 trees
✔ B-trees

DESIGN AND ANALYSIS OF ALGORITHMS Module-3


AVL trees

AVL trees were invented in 1962 by two Russian scientists, G. M. Adelson-Velsky and E.
M. Landis [Ade62], after whom this data structure is named.

DEFINITION:

▪ An AVL tree is a binary search tree in which the Balance Factor of every node, which
is defined as the difference between the heights of the node’s left and right
subtrees, is either 0 or +1 or −1.
▪ The height of the empty tree is defined as −1.
▪ Of course, the balance factor can also be computed as the difference between the
numbers of levels rather than the height difference of the node’s left and right
subtrees.

Balance Factor = Height of Left Subtree − Height of Right Subtree

▪ For example, the binary search tree in Figure 6.2 a is an AVL tree but the one in
Figure 6.2 b is not.

DESIGN AND ANALYSIS OF ALGORITHMS Module-3


DESIGN AND ANALYSIS OF ALGORITHMS Module-3
▪ If an insertion of a new node makes an AVL tree unbalanced, we transform the
tree by a rotation.
▪ A rotation in an AVL tree is a local transformation of its subtree rooted at a node
whose balance has become either +2 or −2.
▪ If there are several such nodes, we rotate the tree rooted at the unbalanced node
that is the closest to the newly inserted leaf.
▪ There are only four types of rotations; in fact, two of them are mirror images of
the other two.
▪ In their simplest form, the four rotations are shown in Figure 6.3.

DESIGN AND ANALYSIS OF ALGORITHMS Module-3


First rotation type

▪ The first rotation type is called the single right rotation, or R-rotation. (Imagine
rotating the edge connecting the root and its left child in the binary tree in
Figure 6.3a to the right.)

▪ The symmetric single left rotation, or L-rotation, is the mirror image of the single
R-rotation.
▪ It is performed after a new key is inserted into the right subtree of the right child
of a tree whose root had the balance of −1 before the insertion.

DESIGN AND ANALYSIS OF ALGORITHMS Module-3


FIGURE 6.3 Four rotation types for AVL trees with three nodes.
(a) Single R-rotation. (b) Single L-rotation.

DESIGN AND ANALYSIS OF ALGORITHMS Module-3


▪ Figure 6.4 presents the single R-rotation in its most general form.
▪ Note that this rotation is performed after a new key is inserted into the left subtree
of the left child of a tree whose root had the balance of +1 before the insertion.

DESIGN AND ANALYSIS OF ALGORITHMS Module-3


Second rotation type

▪ The second rotation type is called the double left-right rotation (LR rotation).
▪ It is, in fact, a combination of two rotations: we perform the L-rotation of the left
subtree of root r followed by the R-rotation of the new tree rooted at r (Figure 6.5).
▪ It is performed after a new key is inserted into the right subtree of the left child of
a tree whose root had the balance of +1 before the insertion.

▪ The double right-left rotation (RL-rotation) is the mirror image of the double
LR-rotation and is left for the exercises.

DESIGN AND ANALYSIS OF ALGORITHMS Module-3


FIGURE 6.3 Four rotation types for AVL trees with three nodes.
(c) Double LR-rotation. (d) Double RL-rotation.

DESIGN AND ANALYSIS OF ALGORITHMS Module-3


Example:

DESIGN AND ANALYSIS OF ALGORITHMS Module-3


DESIGN AND ANALYSIS OF ALGORITHMS Module-3
▪ An example of Constructing an AVL tree for a given list of numbers is shown in
Figure 6.6.
▪ As you trace the algorithm’s operations, keep in mind that if there are several
nodes with the ±2 balance, the rotation is done for the tree rooted at the
unbalanced node that is the closest to the newly inserted leaf.

Fig Conti…

FIGURE 6.6: Construction of an AVL tree for the list 5, 6, 8, 3, 2, 4, 7 by successive insertions. The
parenthesized number of a rotation’s abbreviation indicates the root of the tree being reorganized.

DESIGN AND ANALYSIS OF ALGORITHMS Module-3


FIGURE 6.6: Construction of an AVL tree for the list 5, 6, 8, 3, 2, 4, 7 by successive insertions. The
parenthesized number of a rotation’s abbreviation indicates the root of the tree being reorganized.

DESIGN AND ANALYSIS OF ALGORITHMS Module-3


How efficient are AVL trees?

▪ As with any search tree, the critical characteristic is the tree’s height.
▪ It turns out that it is bounded both above and below by logarithmic functions.
▪ Specifically, the height h of any AVL tree with n nodes satisfies the inequalities

DESIGN AND ANALYSIS OF ALGORITHMS Module-3


2-3 Trees
▪ 2-3 trees, introduced by the U.S. computer scientist John Hopcroft in 1970.
▪ A 2-3 tree is a tree that can have nodes of two kinds: 2-nodes and 3-nodes.

2-node:

▪ A 2-node contains a single key K and has two children: the left child serves as the
root of a subtree whose keys are less than K, and the right child serves as the root
of a subtree whose keys are greater than K.
▪ In other words, a 2-node is the same kind of node we have in the classical binary
search tree (Figure).

DESIGN AND ANALYSIS OF ALGORITHMS Module-3


3-node:

▪ A 3-node contains two ordered keys K1 and K2 (K1<K2) and has three children.
▪ The leftmost child serves as the root of a subtree with keys less than K1,
▪ the middle child serves as the root of a subtree with keys between K1 and K2, and
▪ the rightmost child serves as the root of a subtree with keys greater than K2
(Figure).

▪ The last requirement of the 2-3 tree is that all its leaves must be on the same
level.
▪ In other words, a 2-3 tree is always perfectly height-balanced: the length of a path
from the root to a leaf is the same for every leaf.

DESIGN AND ANALYSIS OF ALGORITHMS Module-3


Searching in a 2-3 tree:

▪ Searching for a given key K in a 2-3 tree is quite straightforward.


▪ We start at the root.
▪ If the root is a 2-node, we act as if it were a binary search tree:
✔ we either stop if K is equal to the root’s key or
✔ continue the search in the left subtree if K is smaller than the root’s key or
✔ continue the search in the right subtree if K is larger than the root’s key.

▪ If the root is a 3 node, we know after no more than two key comparisons whether
the search can be stopped (if K is equal to one of the root’s keys) or in which of
the root’s three subtrees it needs to be continued.

DESIGN AND ANALYSIS OF ALGORITHMS Module-3


Inserting in a 2-3 tree:

▪ Inserting a new key in a 2-3 tree is done as follows.


▪ First of all, we always insert a new key K in a leaf, except for the empty tree.
▪ The appropriate leaf is found by performing a search for K.

▪ If the leaf in question is a 2-node,


▪ we insert K there as either the first or the second key, depending on whether K is
smaller or larger than the node’s old key.

▪ If the leaf is a 3-node,


▪ we split the leaf in two:
✔ the smallest of the three keys (two old ones and the new key) is put in the
first leaf,
✔ the largest key is put in the second leaf, and
✔ the middle key is promoted to the old leaf’s parent.
▪ If the leaf happens to be the tree’s root, a new root is created to accept the middle
key.
▪ Note that promotion of a middle key to its parent can cause the parent’s overflow
(if it was a 3-node) and hence can lead to several node splits along the chain of the
leaf’s ancestors.

DESIGN AND ANALYSIS OF ALGORITHMS Module-3


▪ An example of a 2-3 tree construction is given in Figure 6.8.

DESIGN AND ANALYSIS OF ALGORITHMS Module-3


▪ As for any search tree, the efficiency of the dictionary operations depends on the
tree’s height.
▪ So let us first find an upper bound for it.

▪ A 2-3 tree of height h with the smallest number of keys is a full tree of 2-nodes
(such as the final tree in Figure 6.8 for h = 2).
▪ Therefore, for any 2-3 tree of height h with n nodes, we get the inequality.

▪ and hence

▪ On the other hand, a 2-3 tree of height h with the largest number of keys is a full
tree of 3-nodes, each with two keys and three children.
▪ Therefore, for any 2-3 tree with n nodes,

▪ and hence

DESIGN AND ANALYSIS OF ALGORITHMS Module-3


▪ These lower and upper bounds on height h,

▪ imply that the time efficiencies of searching, insertion, and deletion are all in Ɵ
(log n) in both the worst and average case.
▪ We consider a very important generalization of 2-3 trees, called B-trees,

DESIGN AND ANALYSIS OF ALGORITHMS Module-3


Transform and Conquer Approach

DESIGN AND ANALYSIS OF ALGORITHMS Module-3


Transform and Conquer Approach

Heaps

▪ Heap is a partially ordered data structure that is especially suitable for implementing
priority queues.
▪ Priority queue is a multiset of items with an orderable characteristic called an item’s
priority, with the following operations:
✔ finding an item with the highest (i.e., largest) priority
✔ deleting an item with the highest priority
✔ adding a new item to the multiset

Notion of the Heap Definition:

A heap can be defined as a binary tree with keys assigned to its nodes, one key per node,
provided the following two conditions are met:
1. The shape property—the binary tree is essentially complete (or simply
complete), i.e., all its levels are full except possibly the last level, where only
some rightmost leaves may be missing.
2. The parental dominance or heap property—the key in each node is greater than
or equal to the keys in its children.

DESIGN AND ANALYSIS OF ALGORITHMS Module-3


Illustration:

▪ The illustration of the definition of heap is shown bellow:


▪ Only the Fig. a is heap.
▪ The Fig. b is not a heap, because the tree’s shape property is violated. The left child of
last sub-tree cannot be empty.
▪ And the Fig. c is not a heap, because the parental dominance fails for the node with
key 5.

Fig. a Fig. b Fig. c

DESIGN AND ANALYSIS OF ALGORITHMS Module-3


Properties of Heap

1. There exists exactly one essentially complete binary tree with n nodes. Its height is
equal to [log2n].
2. The root of a heap always contains its largest element.
3. A node of a heap considered with all its descendants is also a heap.
4. A heap can be implemented as an array by recording its elements in the top down,
left-to-right fashion. It is convenient to store the heap’s elements in positions
1 through n of such an array, leaving H[0] either unused or putting there a sentinel
whose value is greater than every element in the heap. In such a representation,
a) the parental node keys will be in the first [n/2] positions of the array, while the
leaf keys will occupy the last [n/2] positions;
b) the children of a key in the array’s parental position i (1≤ i ≤ [n/2]) will be in
positions 2i and 2i + 1, and, correspondingly, the parent of a key in position i (2
≤ i ≤ n) will be in position [n/2].

DESIGN AND ANALYSIS OF ALGORITHMS Module-3


Fig. Heap and its array representation

Thus, we could also define a heap as an array H[1..n] in which every element in position
i in the first half of the array is greater than or equal to the elements in positions 2i and
2i + 1,
i.e.,
H[i] ≥ max {H [2i], H [2i + 1]} for i = 1. . . [n/2]

DESIGN AND ANALYSIS OF ALGORITHMS Module-3


Constructions of Heap

There are two principal alternatives for constructing Heap.


Bottom-up heap construction
Top-down heap construction

1. Bottom-up Heap Construction

▪ The bottom-up heap construction algorithm is illustrated below.


▪ It initializes the essentially complete binary tree with n nodes by placing keys in
the order given and then “heapifies” the tree as follows:
✔ Starting with the last parental node, the algorithm checks whether the
parental dominance holds for the key in this node. If it does not, the
algorithm exchanges the node’s key K with the larger key of its children
and checks whether the parental dominance holds for K in its new
position. This process continues until the parental dominance for K is
satisfied. (Eventually, it has to because it holds automatically for any key in
a leaf)
✔ After completing the “heapification” of the subtree rooted at the current
parental node, the algorithm proceeds to do the same for the node’s
immediate predecessor.
✔ The algorithm stops after this is done for the root of the tree.

DESIGN AND ANALYSIS OF ALGORITHMS Module-3


n=6
i=3
k=3
v=H[k]=H[3]=7

heap=false
j=2*k=6

H[k]🡨H[j]
H[3]🡨H[6]🡨8
k=j=6
H[k]🡨H[6]🡨v🡨7

DESIGN AND ANALYSIS OF ALGORITHMS Module-3


Illustration

Bottom-up construction of a heap for the list 2, 9, 7, 6, 5, 8. The double headed arrows
show key comparisons verifying the parental dominance.

DESIGN AND ANALYSIS OF ALGORITHMS Module-3


Analysis of efficiency - bottom up heap construction algorithm

▪ Assume, for simplicity, that n = 2k − 1 so that a heap’s tree is full, i.e., the largest
possible number of nodes occurs on each level. Let h be the height of the tree.
▪ According to the first property of heaps in the list at the beginning of the section,
h=[log2n] or just [log2(n + 1)] = k − 1 for the specific values of n we are considering.
▪ Each key on level i of the tree will travel to the leaf level h in the worst case of the
heap construction algorithm. Since moving to the next level down requires two
comparisons—one to find the larger child and the other to determine whether the
exchange is required—the total number of key comparisons involving a key on level i
will be 2(h − i).
▪ Therefore, the total number of key comparisons in the worst case will be

▪ where the validity of the last equality can be proved either by using the closed-form
formula for the sum or by mathematical induction on h.

▪ Thus, with this bottom-up algorithm, a heap of size n can be constructed with fewer
than 2n comparisons.

DESIGN AND ANALYSIS OF ALGORITHMS Module-3


2. Top-down heap construction algorithm

▪ It constructs a heap by successive insertions of a new key into a previously


constructed heap.
1. First, attach a new node with key K in it after the last leaf of the existing heap.
2. Then shift K up to its appropriate place in the new heap as follows.
a. Compare K with its parent’s key: if the latter is greater than or equal to K,
stop (the structure is a heap); otherwise, swap these two keys and
compare K with its new parent.
b. This swapping continues until K is not greater than its last parent or it
reaches root.
▪ Obviously, this insertion operation cannot require more key comparisons than the
heap’s height.
▪ Since the height of a heap with n nodes is about log2 n, the time efficiency of
insertion is in O (log n).

DESIGN AND ANALYSIS OF ALGORITHMS Module-3


Illustration of inserting a new key:
Inserting a new key (10) into the heap is constructed bellow. The new key is shifted up
via a swap with its parents until it is not larger than its parents (or is in the root)

DESIGN AND ANALYSIS OF ALGORITHMS Module-3


Delete an item from a Heap:
Deleting the root’s key from a heap can be done with the following algorithm:

Maximum Key Deletion from a heap


1. Exchange the root’s key with the last key K of the heap.
2. Decrease the heap’s size by 1
3. “Heapify” the smaller tree by shifting K down the tree exactly in the same way
we did it in the bottom-up heap construction algorithm. That is, verify the
parental dominance for K: if it holds, we are done; if not, swap K with the larger
of its children and repeat this operation until the parental dominance condition
holds for K in its new position.

DESIGN AND ANALYSIS OF ALGORITHMS Module-3


Illustration:

The efficiency of deletion is determined by the number of key comparisons needed to


“heapify” the tree after the swap has been made and the size of the tree is decreased by
1.
Since this cannot require more key comparisons than twice the heap’s height, the time
efficiency of deletion is in O (log n) as well.

DESIGN AND ANALYSIS OF ALGORITHMS Module-3


Heap Sort

▪ Heapsort an interesting sorting algorithm is discovered by J. W. J. Williams.


▪ This is a two- stage algorithm that works as follows.

Stage 1 (heap construction): Construct a heap for a given array.


Stage 2 (maximum deletions): Apply the root-deletion operation n−1 times to the
remaining heap.

▪ As a result, the array elements are eliminated in decreasing order.


▪ But since under the array implementation of heaps an element being deleted is
placed last, the resulting array will be exactly the original array sorted in increasing
order.

DESIGN AND ANALYSIS OF ALGORITHMS Module-3


Heap sort is traced on a specific input is shown below:

DESIGN AND ANALYSIS OF ALGORITHMS Module-3


Analysis of efficiency:

▪ Since we already know that the heap construction stage of the algorithm is in O(n),
we have to investigate just the time efficiency of the second stage.
▪ For the number of key comparisons, C(n), needed for eliminating the root keys from
the heaps of diminishing sizes from n to 2, we get the following inequality:

▪ This means that C(n) ∈ O(n log n) for the second stage of heap sort.
▪ For both stages, we get O(n) + O(n log n) = O(n log n).
▪ A more detailed analysis shows that the time efficiency of heap sort is, in fact, in Θ(n
log n) in both the worst and average cases.
▪ Thus, heap sort’s time efficiency falls in the same class as that of merge sort.
▪ Unlike the latter, heap sort is in-place, i.e., it does not require any extra storage.
▪ Timing experiments on random files show that heap sort runs more slowly than quick
sort but can be competitive with merge sort.

DESIGN AND ANALYSIS OF ALGORITHMS Module-3


Space and Time Trade-Offs
▪ Space and time trade-offs in algorithm design are a well-known
issue for both theoreticians and practitioners of computing.

▪ In somewhat more general terms, the idea is to preprocess the


problem’s input, in whole or in part, and store the additional
information obtained to accelerate solving the problem afterward.
▪ We call this approach input enhancement.
▪ Following algorithms based on it:
Counting methods for sorting
Boyer-Moore algorithm for string matching and its simplified
version suggested by Horspool

DESIGN AND ANALYSIS OF ALGORITHMS Module-3


▪ The other type of technique that exploits space-for-time trade-offs
simply uses extra space to facilitate faster and/or more flexible
access to the data. We call this approach prestructuring.
▪ This name highlights two facets of this variation of the
space-for-time trade-off: some processing is done before a problem
in question is actually solved but, unlike the input-enhancement
variety, it deals with access structuring.
▪ This approach is illustrate by:
Hashing
Indexing with B-trees

▪ There is one more algorithm design technique related to the


space-for-time trade-off idea: dynamic programming.
▪ This strategy is based on recording solutions to overlapping
subproblems of a given problem in a table from which a solution to
the problem in question is then obtained.

DESIGN AND ANALYSIS OF ALGORITHMS Module-3


Sorting by Counting

▪ For each element of a list to be sorted, the total number of


elements smaller than this element and record the results in a
table.
▪ These numbers will indicate the positions of the elements in the
sorted list: e.g., if the count is 10 for some element, it should be in
the 11th position (with index 10, if we start counting with 0) in the
sorted array.
▪ Thus, we will be able to sort the list by simply copying its elements
to their appropriate positions in a new, sorted list.
▪ This algorithm is called comparison-counting sort.

DESIGN AND ANALYSIS OF ALGORITHMS Module-3


Example: Sorting by Comparison Counting

DESIGN AND ANALYSIS OF ALGORITHMS Module-3


Time efficiency of this algorithm:

DESIGN AND ANALYSIS OF ALGORITHMS Module-3


Distribution Counting
EXAMPLE: Consider sorting the array

▪ whose values are known to come from the set {11, 12, 13} and should not
be overwritten in the process of sorting.
▪ The frequency and distribution arrays are as follows:

Frequencies: Number of times same element repeated in array.

Distribution Values: Frequency of current element + Distribution Values of


previous element. (for first element DV is 0)

DESIGN AND ANALYSIS OF ALGORITHMS Module-3


▪ Note that the distribution values indicate the proper positions for the
last occurrences of their elements in the final sorted array.
▪ If we index array positions from 0 to n − 1,the distribution values must
be reduced by 1 to get corresponding element positions.
▪ It is more convenient to process the input array right to left.
▪ For the example, the last element is 12, and, since its distribution value
is 4, we place this 12 in position 4 − 1 = 3 of the array S that will hold
the sorted list.
▪ Then we decrease the 12’s distribution value by 1 and proceed to the
next (from the right) element in the given array.

Example of sorting by distribution counting

DESIGN AND ANALYSIS OF ALGORITHMS Module-3


DESIGN AND ANALYSIS OF ALGORITHMS Module-3
Input Enhancement in String Matching

▪ The problem of string matching requires finding an occurrence of a


given string of m characters called the pattern in a longer string of n
characters called the text.
▪ Brute-force algorithm: it simply matches corresponding pairs of
characters in the pattern and the text left to right and, if a mismatch
occurs, shifts the pattern one position to the right for the next trial.
▪ Since the maximum number of such trials is n − m + 1 and, in the
worst case, m comparisons need to be made on each of them, the
worst-case efficiency of the brute-force algorithm is in the O(nm)
class.
▪ On average, however, we should expect just a few comparisons
before a pattern’s shift, and for random natural-language texts, the
average-case efficiency indeed turns out to be in O(n + m).

DESIGN AND ANALYSIS OF ALGORITHMS Module-3


Horspool’s Algorithm

Example: Searching for the pattern BARBER in some text:

▪ Starting with the last R of the pattern and moving right to left, we
compare the corresponding pairs of characters in the pattern and the text.
▪ If all the pattern’s characters match successfully, a matching substring is
found.
▪ Then the search can be either stopped altogether or continued if another
occurrence of the same pattern is desired.
▪ If a mismatch occurs, we need to shift the pattern to the right.
▪ Clearly, we would like to make as large a shift as possible without risking
the possibility of missing a matching substring in the text.
▪ Horspool’s algorithm determines the size of such a shift by looking at the
character c of the text that is aligned against the last character of the
pattern.
▪ This is the case even if character c itself matches its counterpart in the
pattern.
DESIGN AND ANALYSIS OF ALGORITHMS Module-3
In general, the following four possibilities can occur.

Case 1: If there are no c’s in the pattern—e.g., c is letter S in our example—


we can safely shift the pattern by its entire length (if we shift less, some
character of the pattern would be aligned against the text’s character c that is
known not to be in the pattern):

Case 2: If there are occurrences of character c in the pattern but it is not the
last one there—e.g., c is letter B in our example—the shift should align the
rightmost occurrence of c in the pattern with the c in the text:

DESIGN AND ANALYSIS OF ALGORITHMS Module-3


Case 3: If c happens to be the last character in the pattern but there are no
c’s among its other m − 1 characters—e.g., c is letter R in our example—the
situation is similar to that of Case 1 and the pattern should be shifted by the
entire pattern’s length m:

Case 4: Finally, if c happens to be the last character in the pattern and there
are other c’s among its first m − 1 characters—e.g., c is letter R in our
example— the situation is similar to that of Case 2 and the rightmost
occurrence of c among the first m − 1 characters in the pattern should be
aligned with the text’s c:

DESIGN AND ANALYSIS OF ALGORITHMS Module-3


▪ We can precompute shift sizes and store them in a table.
▪ The table will be indexed by all possible characters that can be
encountered in a text, including, for natural language texts, the space,
punctuation symbols, and other special characters.
▪ The table’s entries will indicate the shift sizes computed by the formula

▪ For example, for the pattern BARBER, all the table’s entries will be equal
to 6, except for the entries for E, B, R, and A, which will be 1, 2, 3, and 4,
respectively.

DESIGN AND ANALYSIS OF ALGORITHMS Module-3


Algorithm for computing the shift table entries:

▪ Initialize all the entries to the pattern’s length m and scan the pattern
left to right repeating the following step m − 1 times: for the jth
character of the pattern (0 ≤ j ≤ m − 2), overwrite its entry in the table
with m − 1 − j , which is the character’s distance to the last character of
the pattern.

DESIGN AND ANALYSIS OF ALGORITHMS Module-3


Horspool’s algorithm

Step 1: For a given pattern of length m and the alphabet used in both
the pattern and text, construct the shift table as described
above.
Step 2: Align the pattern against the beginning of the text.
Step 3: Repeat the following until either a matching substring is found or
the pattern reaches beyond the last character of the text.
Starting with the last character in the pattern, compare the
corresponding characters in the pattern and text until either all
m characters are matched (then stop) or a mismatching pair is
encountered. In the latter case, retrieve the entry t (c) from the
c’s column of the shift table where c is the text’s character
currently aligned against the last character of the pattern, and
shift the pattern by t (c) characters to the right along the text.

DESIGN AND ANALYSIS OF ALGORITHMS Module-3


Pseudocode of Horspool’s algorithm:

DESIGN AND ANALYSIS OF ALGORITHMS Module-3


Example:

▪ Consider searching for the pattern BARBER in a text that comprises


English letters and spaces (denoted by underscores).
▪ The shift table, as we mentioned, is filled as follows:

DESIGN AND ANALYSIS OF ALGORITHMS Module-3


Analysis:

▪ Worst-case efficiency of Horspool’s algorithm is in O(nm).


▪ But for random texts, it is in Ɵ(n)

DESIGN AND ANALYSIS OF ALGORITHMS Module-3


DESIGN AND ANALYSIS OF ALGORITHMS Module-3

You might also like