Data Structures
Software development
Data Structures
• A data structure is any data representation and its associated
operations. Even an integer or floating point number stored on the
computer can be viewed as a simple data structure. More typically, a
data structure is meant to be an organization or structuring for a
collection of data items. A sorted list of integers stored in an array is
an example of such a structuring.
• Given sufficient space to store a collection of data items, it is always
possible to search for specified items within the collection, print or
otherwise process the data items in any desired order, or modify the
value of any particular data item.
Data Structures
• Thus, it is possible to perform all necessary operations on any data
structure. However, using the proper data structure can make the
difference between a program running in a few seconds and one
requiring many days.
• A solution is said to be efficient if it solves the problem within the
required resource constraints. Examples of resource constraints
include the total space available to store the data — possibly divided
into separate main memory and disk space constraints — and the
time allowed to perform each subtask.
Data Structures
• A solution is sometimes said to be efficient if it requires fewer
resources than known alternatives, regardless of whether it meets
any particular requirements.
• The cost of a solution is the amount of resources that the solution
consumes.
Data Structures
• When selecting a data structure to solve a problem, you should follow
these steps.
• 1. Analyse your problem to determine the basic operations that must
be supported. Examples of basic operations include inserting a data
item into the data structure, deleting a data item from the data
structure, and finding a specified data item.
• 2. Quantify the resource constraints for each operation.
• 3. Select the data structure that best meets these requirements
Test for equality
• The most important basic operators are comparison and assignment,
i.e., the test for equality (and for order in the case of ordered types),
and the command to enforce equality. The fundamental difference
between these two operations is emphasized by the clear distinction
in their denotation throughout this text.
• Test for equality: x = y (an expression with value TRUE or FALSE)
Assignment to x: x := y (a statement making x equal to y)
The concept of data types
• A new, primitive type is definable by enumerating the distinct values
belonging to it. Such a type is called an enumeration type. Its
definition has the form
• TYPE T = (c1, c2, ... , cn)
• T is the new type identifier, and the ci are the new constant
identifiers.
• Examples
• TYPE shape = (rectangle, square, ellipse, circle)
• TYPE colour = (red, yellow, green)
• TYPE sex = (male, female)
• TYPE weekday = (Monday, Tuesday, Wednesday, Thursday, Friday,
Saturday, Sunday)
• TYPE currency = (franc, mark, pound, dollar, shilling, lira, yen)
• TYPE destination = (hell, purgatory, heaven)
• TYPE vehicle = (train, bus, automobile, boat, airplane)
• The definition of such types introduces not only a new type identifier, but at the
same time the set of identifiers denoting the values of the new type. These
identifiers may then be used as constants throughout the program, and they
enhance its understandability considerably. If, as an example, we introduce
variables s, d, r, and b.
• VAR s: sex
• VAR d: weekday
• VAR r: rank
then the following assignment statements are possible:
s: = male
d: = Sunday
r: = major
b: = TRUE
List, Stacks and Queues
• If your program needs to store a few things — numbers, payroll
records, or job descriptions for example — the simplest and most
effective approach might be to put them in a list.
• We define a list to be a finite, ordered sequence of data items known
as elements. “Ordered” in this definition means that each element
has a position in the list. (We will not use “ordered” in this context to
mean that the list is sorted.) Each list element has a data type. In the
simple list implementations discussed in this chapter, all elements of
the list have the same data type, although there is no conceptual
objection to lists whose elements have differing data types if the
application requires it.
Lists
• A list is said to be empty when it contains no elements. The number
of elements currently stored is called the length of the list. The
beginning of the list is called the head, the end of the list is called the
tail. There might or might not be some relationship between the
value of an element and its position in the list. For example, sorted
lists have their elements positioned in ascending order of value, while
unsorted lists have no particular relationship between element values
and positions.
Arrays
• The array is probably the most widely used data structure; in some
languages it is even the only one available. An array consists of
components which are all of the same type, called its base type; it is
therefore called a homogeneous structure.
• The array is a random-access structure, because all components can
be selected at random and are equally quickly accessible. In order to
denote an individual component, the name of the entire structure is
augmented by the index selecting the component. This index is to be
an integer between 0 and n-1, where n is the number of elements,
the size, of the array
Array-based list
• There are two standard approaches to implementing lists,
1. the array-based list
2. the linked list.
Stacks
• The stack is a list-like structure in which elements may be inserted or
removed from only one end. While this restriction makes stacks less
flexible than lists, it also makes stacks both efficient (for those
operations they can do) and easy to implement. Many applications
require only the limited form of insert and remove operations that
stacks provide. In such cases, it is more efficient to use the simpler
stack data structure rather than the generic list
Stacks
• Despite their restrictions, stacks have many uses. Thus, a special
vocabulary for stacks has developed. Accountants used stacks long before
the invention of the computer. They called the stack a “LIFO” list, which
stands for “Last-In, FirstOut.” Note that one implication of the LIFO policy is
that stacks remove elements in reverse order of their arrival.
• It is traditional to call the accessible element of the stack the top element.
Elements are not said to be inserted; instead they are pushed onto the
stack. When removed, an element is said to be popped from the stack.
• As with lists, there are many variations on stack implementation. The two
approaches presented here are array-based and linked stacks, which are
analogous to array-based and linked lists, respectively.
Queues
• Like the stack, the queue is a list-like structure that provides restricted
access to its elements. Queue elements may only be inserted at the
back (called an enqueuer operation) and removed from the front
(called adequeue operation). Queues operate like standing in line at a
movie theatre ticket counter. If nobody cheats, then newcomers go
to the back of the line. The person at the front of the line is the next
to be served. Thus, queues release their elements in order of arrival.
Accountants have used queues since long before the existence of
computers. They call a queue a “FIFO” list, which stands for “First-In,
First-Out.”
Queues
• There are two implementations for queues: the array-based queue
and the linked queue.
Binary Trees
• Tree structures permit both efficient access and update to large
collections of data. Binary trees in particular are widely used and
relatively easy to implement. But binary trees are useful for many
things besides searching. Just a few examples of applications that
trees can speed up include prioritizing jobs, describing mathematical
expressions and the syntactic elements of computer programs, or
organizing the information needed to drive data compression
algorithms.
Binary Tree
• A binary tree is made up of a finite set of elements called nodes. This
set either is empty or consists of a node called the root together with
two binary trees, called the left and right subtrees, which are disjoint
from each other and from the root. (Disjoint means that they have no
nodes in common.) The roots of these subtrees are children of the
root. There is an edge from a node to each of its children, and a node
is said to be the parent of its children.
• A leaf node is any node that has two empty children. An internal node
is any node that has at least one non-empty child.
Binary Tree
Example:
An example binary tree. Node A is the root. Nodes B and C are A’s children. Nodes B
and D together form a subtree. Node B has two children: Its left child is the empty
tree and its right child is D. Nodes A, C, and E are ancestors of G. Nodes D, E, and F
make up level 2 of the tree; node A is at level 0. The edges from A to C to E to G
form a path of length 3. Nodes D, G, H, and I are leaves. Nodes A,B,C,E,and F are
internal nodes. The depth of I is3. The height of this tree is 4.
Binary Trees
• Two restricted forms of binary tree are sufficiently important to
warrant special names. Each node in a full binary tree is either
• (1) an internal node with exactly two non-empty children or
• (2) a leaf. A complete binary tree has a restricted shape obtained by
starting at the root and filling the tree by levels from left to right.
Binary Search Trees
• A BST is a binary tree that conforms to the following condition, known
as the Binary Search Tree Property: All nodes stored in the left
subtree of a node whose key value is K have key values less than K. All
nodes stored in the right subtree of a node whose key value is K have
key values greater than or equal to K.
Examples
• Two Binary Search Trees for a collection of values. Tree(a)results if
values are inserted in the order 37, 24, 42, 7, 2, 40, 42, 32, 120. Tree
(b) results if the same values are inserted in the order 120, 42, 42, 7,
2, 32, 37, 24, 40.
Example
• Consider searching for the node with key value 32 in the tree of the
previous Figure.
• Because 32 is less than the root value of 37, the search proceeds to
the left subtree. Because 32 is greater than 24,we search in 24’s right
subtree. At this point the node containing 32 is found. If the search
value were 35, the same path would be followed to the node
containing 32. Because this node has no children, we know that 35 is
not in the BST.
Sorting
• Sorting is generally understood to be the process of rearranging a
given set of objects in a specific order. The purpose of sorting is to
facilitate the later search for members of the sorted set.
• Given a set of records r1, r2, ..., rn with key values k1, k2, ..., kn, the
Sorting Problem is to arrange the records into any order s such that
records rs1, rs2,..., rsn have keys obeying the property ks1 ≤ ks2 ≤ ... ≤
ksn. In other words, the sorting problem is to arrange a set of records
so that the values of their key fields are in non-decreasing order. As
defined, the Sorting Problem allows input with two or more records
that have the same key value. Certain applications require that input
not contain duplicate key values.
Insertion sort
• Imagine that you have a stack of phone bills from the past two years
and that you wish to organize them by date. A fairly natural way to do
this might be to look at the first two bills and put them in order. Then
take the third bill and put it into the right order with respect to the
first two, and soon. As you take each bill, you would add it to the
sorted pile that you have already made. This naturally intuitive
process is the inspiration for our first sorting algorithm, called
Insertion Sort. Insertion Sort iterates through a list of records. Each
record is inserted in turn at the correct position within a sorted list
composed of those records already processed.
Bubble Sort
• It is a relatively slow sort, it is no easier to understand than Insertion
Sort, it does not correspond to any intuitive counterpart in
“everyday” use, and it has a poor best-case running time.
• Bubble Sort consists of a simple double for loop. The first iteration of
the inner for loop moves through the record array from bottom to
top, comparing adjacent keys. If the lower-indexed key’s value is
greater than its higher-indexed neighbour, then the two values are
swapped. Once the smallest value is encountered, this process will
cause it to “bubble” up to the top of the array.
Bubble sort
• The second pass through the array repeats this process. However,
because we know that the smallest value reached the top of the array
on the first pass, there is no need to compare the top two elements
on the second pass. Likewise, each succeeding pass through the array
compares adjacent elements, looking at one less value than the
preceding pass
Selection Sort
• Consider again the problem of sorting a pile of phone bills for the past
year. Another intuitive approach might be to look through the pile
until you find the bill for January, and pull that out. Then look through
the remaining pile until you find the bill for February, and add that
behind January. Proceed through the ever-shrinking pile of bills to
select the next one in order until you are done. This is the inspiration
for our last Θ(n2) sort, called Selection Sort. The ith pass of Selection
Sort “selects” the ith smallest key in the array, placing that record into
position i.
Selection sort
• In other words, Selection Sort first finds the smallest key in an
unsorted list, then the second smallest, and so on. Its unique feature
is that there are few record swaps. To find the next smallest key value
requires searching through the entire unsorted portion of the array,
but only one swap is required to put the record in place.
Searching
• Search can be viewed abstractly as a process to determine if an
element with a particular value is a member of a particular set. The
more common view of searching is an attempt to find the record
within a collection of records that has a particular key value, or those
records in a collection whose key values meet some criterion such as
falling within a range of values.
Searching
• A successful search is one in which a record with key kj = K is found.
An unsuccessful search is one in which no record with kj = K is found
(and no such record exists). An exact-match query is a search for the
record whose key value matches a specified key value. A range query
is a search for all records whose key value falls within a specified
range of key values.
Hashing
• The process of finding a record using some computation to map its key
value to a position in the table is called hashing. Most hashing schemes
place records in the table in whatever order satisfies the needs of the
address calculation, thus the records are not ordered by value or
frequency. The function that maps key values to positions is called a hash
function and is usually denoted by h. The array that holds the records is
called the hash table and will be denoted by HT. A position in the hash
table is also known as a slot. The number of slots in hash table HT will be
denoted by the variable M, with slots numbered from 0 to M −1. The goal
for a hashing system is to arrange things such that, for any key value K and
some hash function h, i = h(K) is a slot in the table such that 0 ≤ h(K) < M,
and we have the key of the record stored at HT[i] equal to K.
Hashing
• Hashing only works to store sets. That is, hashing can not be used for
applications where multiple records with the same key value are
permitted. Hashing is not a good method for answering range
searches. In other words, we cannot easily find all records (if any)
whose key values fall within a certain range. Nor can we easily find
the record with the minimum or maximum key value, or visit the
records in key order. Hashing is most appropriate for answering the
question, “What record, if any, has key value K?” For applications
where access involves only exact-match queries, hashing is usually
the search method of choice because it is extremely efficient when
implemented correctly
Hashing
• Finding a record with key value K in a database organized by hashing
follows a two-step procedure:
• 1. Compute the table location h(K).
• 2. Starting with slot h(K), locate the record containing key K using (if
necessary) a collision resolution policy