Data
Structures
Using C

Samir Kumar Bandyopadhyay
Kashi Nath Dey

ALWAYS LEARNING PEARSON


Data Structures
Using C
Other related Pearson Education titles available in LOW PRICE EDITION
Data Structures Using C
Aaron M Tenenbaum

Datastructures & Program Design in C, 2/e


Robert Kruse, Leung, C.L. Tondo

Data Structures Using C and C++, 2/e


Yedidyah Langsam, Moshe J. Augenstein, Aaron M Tenenbaum

Data Structures and Algorithm Analysis in C, 2/e


Mark Allen Weiss

Data Structures and Algorithm Analysis in C++, 2/e


Mark Allen Weiss

Data Structures & Algorithms


Alfred V. Aho, John E. Hopcroft, Jeffrey D. Ullman

Data Structures Using Java


Yedidyah Langsam, Moshe J. Augenstein, Aaron M. Tenenbaum

Data Abstraction & Problem Solving with Java


Frank M. Carrano, Janet J. Prichard

Data Structures in JAVA


Thomas A. Standish

Data Structures and Software Development in an Object Oriented Domain, Java Edition
Jean-Paul Tremblay, Grant A. Cheston

For more details log on to www.pearsoned.co.in


Data Structures
Using C

Samir Kumar Bandyopadhyay


Registrar
West Bengal University of Technology
Reader, Department of Computer Science and Engineering
University of Calcutta

Kashi Nath Dey


Senior Faculty
Department of Computer Science and Engineering
University of Calcutta

PEARSON
Copyright © 2009 Dorling Kindersley (India) Pvt. Ltd.
Licensees of Pearson Education in South Asia

No part of this eBook may be used or reproduced in any manner whatsoever without the publisher’s
prior written consent.

This eBook may or may not include all assets that were part of the print version. The publisher
reserves the right to remove any material in this eBook at any time.

ISBN 9788131722381
eISBN 9789332501362

Head Office: A-8(A), Sector 62, Knowledge Boulevard, 7th Floor, NOIDA 201 309, India
Registered Office: 11 Local Shopping Centre, Panchsheel Park, New Delhi 110 017, India
PREFACE

This book aims to cater to beginners who look to learning C and data structure under the same umbrella.
While teaching C and data structure, we felt the need for a balanced book on the subject. In fact, this is the
main impetus for writing such a book.
The book is designed for a one-semester course or a one-year course. It is suitable for courses based
on algorithms and data structures. The prerequisite for using this text is elementary to middle level
knowledge of C programming.
Algorithms in the book are presented in a way that readers can easily understand the method of
solving problems. Concepts are illustrated through examples. All programs in the text are tested. Each
chapter ends with exercises containing questions of varied difficulty levels.
Chapter 1 deals with basic data representation techniques. Chapter 2 concentrates on abstract data
types and structures together with the concepts of implementing a data structure. Chapter 3 covers array
data structures, the simplest and one of the best-known linear data structures, and their implementation
details. As applications of array data structures, Chapter 4 deals with string processing and data matching
techniques. Chapter 5 introduces the concept of pointers in C. Why pointers play a key role in algorithm
implementation, and how and when to use them, are discussed in detail. Stacks and queues are covered in
Chapter 6. Though they are special types of lists, this chapter deals only with their array implementation.
Expression evaluation is discussed as an application of stacks. A rudimentary program for skill testing in
multiplication is presented as an application of queues.
Chapter 7 covers recursion, a problem-solving technique. Lists are defined in Chapter 8. This chapter
elaborates on the concepts of linked lists and their implementation techniques using both arrays and pointers.
Linked list manipulation and list searching are also covered in the chapter. A word indexing program is
explained and presented as an application through both array and linked versions. Chapter 9 brings forth
different variants of linked lists. Linked implementations of stacks and queues are presented in this chapter.
It also focuses on major application areas of linked lists. Chapter 10 discusses details of internal sorting
as well as some external sorting algorithms. Given a large number of internal sorting techniques, one
must choose the best alternative for a particular problem.
Chapter 11 deals with various searching methodologies. The concept of trees is introduced in
Chapter 12. It starts with the general tree and then concentrates on binary trees, tree traversal techniques,
binary search trees, AVL trees, and B-trees. Chapter 13 describes graphs. The coverage of graph algorithms
completes the basic understanding of data structure. The chapter discusses only fundamental graph
algorithms.
It is possible that C language will be replaced by a better language in the near future. However, we
think our techniques will remain with our readers.
We acknowledge the help of Dr. S. SenSarma, Reader of the Department of Computer Science and
Engineering, University of Calcutta, for taking an active interest in different ways. We also thank our
colleagues, students, and other members of the Department of Computer Science and Engineering,
University of Calcutta, for providing the right environment during the preparation of the manuscript.
We thank the members of our family, without whose help this book could not have been written.

Samir Kumar Bandyopadhyay


Kashi Nath Dey
CONTENTS

Preface

1. FUNDAMENTALS OF DATA REPRESENTATION
1.1 Basic Concepts of Data Representation
1.2 Data Type
1.3 Data Abstraction and Abstract Data Types
1.4 System-defined Data Type
1.5 Primitive Data Structures and Their Representation
Exercises

2. FUNDAMENTALS OF DATA STRUCTURES - BASIC CONCEPTS
2.1 Introduction to Data Structure
2.2 Algorithm for Data Structure
2.3 Notation for Algorithm
2.4 Modularisation to Algorithm Design
2.5 Analysis of Algorithms
2.6 Structured Programming
Exercises

3. ARRAYS
3.1 Linear Arrays
3.2 Arrays in C
3.3 Initializing Arrays
3.4 Insertion and Deletion
3.5 Multidimensional Arrays
3.6 Row-major and Column-major Order
Exercises

4. STRING PROCESSING AND PATTERN MATCHING
4.1 Introduction to String Processing
4.2 String Representation
4.3 String Manipulation
4.4 Pattern Matching
4.5 The Brute-Force Algorithm
4.6 Knuth-Morris-Pratt Algorithm
4.7 Boyer-Moore Algorithm
Exercises

5. POINTERS
5.1 Introduction
5.2 Fundamentals and Defining Pointers
5.3 Type Specifiers and Scalars for Pointers
5.4 Operations Using Pointers
5.5 Passing Pointers to Functions
5.6 Pointers and Arrays, Pointer Arithmetic
5.7 Pointers and Two-dimensional Arrays
5.8 Array of Pointers
5.9 Pointers to Pointers
5.10 Pointers to Functions
5.11 Command Line Arguments
Exercises

6. STACKS AND QUEUES
6.1 Introduction to Stack
6.2 Array Implementation of Stacks
6.3 Application of Stack
6.4 Introduction to Queue
6.5 Queue Implementation Using Arrays
Exercises

7. RECURSION
7.1 Basic Concepts of Recursion
7.2 Recursion Implementation
7.3 The Tower of Hanoi
7.4 Time and Space Requirements
7.5 Recursion vs Iteration
7.6 Examples
7.7 Cost of Recursion
Exercises

8. LISTS
8.1 Sequential Lists
8.2 Linked Lists
8.3 List Implementations
8.4 Application of Linked List (Array Based Implementation)
8.5 Pointer Based Implementation of Linked Lists
8.6 Application of Linked List (Pointer Based Implementation)
Exercises

9. LINKED LISTS - VARIANTS
9.1 Linked Stacks
9.2 Linked Queues
9.3 Variants of Linked Lists
9.4 Applications of Linked Lists
Exercises

10. SORTING
10.1 Introduction
10.2 Sorting Techniques
10.3 Sorting on Multiple Keys
Exercises

11. SEARCHING
11.1 Introduction
11.2 Sequential Search
11.3 Binary Search
11.4 Indexed Sequential Search
11.5 Hashing Schemes
Exercises

12. TREES
12.1 Fundamental Terminologies
12.2 Binary Trees
12.3 Traversals of Binary Tree
12.4 Threaded Binary Tree
12.5 Binary Search Trees
12.6 AVL Trees
12.7 B-Trees
Exercises

13. GRAPHS
13.1 Introduction
13.2 Graph Fundamentals
13.3 Graph Representation
13.4 Graph Traversal
Exercises

INDEX
FUNDAMENTALS OF DATA REPRESENTATION

Data structure is the study of concrete implementations of frequently occurring abstract data
types. An abstract data type is a set, together with a collection of operations on the elements of
the set. There are several terms we need to define carefully before we proceed to different types
of data structures such as arrays, stacks, linked lists, and so on.
The meaning of data representation is introduced in Section 1.1. In Section 1.2 there are
definitions of data types, data object, and data structure. The notion of abstraction is very im-
portant in computing. We are particularly interested in its application to data stored in a digital
computer. In Section 1.3 we will introduce the concept of data abstraction and abstract data
types.
A data type is an abstract concept defined by a set of logical properties. Once such an
abstract data type is defined, it is important to know how to implement it in a machine. Section
1.4 describes the system-defined data types. Section 1.5 will highlight the concepts of primitive
data structure. C language is used in this book since it is used globally and continues to grow in
popularity.

1.1 BASIC CONCEPTS OF DATA REPRESENTATION


The study of any aspect of computer science involves the processing of information. Data is
defined as raw facts, whereas information is processed data. A data value is a piece of data
that we can consider as a single entity. We might consider the integer value 123 as a single
value. If a data value can be decomposed into component parts, we call each part a component
element. An atomic data value is a piece of data that we choose to consider as a single,
non-decomposable entity. For example, the integer 45923 may be considered as a single atomic
value, although, if we wish to decompose it into the digits 4, 5, 9, 2, and 3, we may do so.
A natural level at which to stop the decomposition of data values stored in a digital
storage medium is the bit. Logically, we may think of a bit as a data element that must have at
any time one of the two values, and we will assign it the numeric values 0 and 1. Of course, we
may decompose these if we wish. If the value is stored on a magnetic disc, for example, it is
represented by an electromagnetic signal which is recorded on or in the disc surface. Taking the
abstract point of view, we will ignore how the values are physically stored. We might think of
this point as one boundary between hardware and software.
In computers the most widely used method for storing integers is the binary number system.
The base of this system is 2. Each bit position represents a power of 2, with $2^0$ at the LSB (least
significant bit), $2^1$ next to the LSB, and so on. For example, 10010 represents the integer
$2^0 \times 0 + 2^1 \times 1 + 2^2 \times 0 + 2^3 \times 0 + 2^4 \times 1 = 18$. In this representation a string of n bits represents the integers
between 0 and $2^n - 1$. Negative binary numbers are stored in two's complement
form. Given n bits, the range of numbers that can be represented is $-2^{n-1}$ to $2^{n-1} - 1$.
Real numbers, in computers, are stored in a floating-point notation. In this representation,
a real number is expressed in two parts, mantissa and exponent. The base of the exponent
is usually fixed, and the mantissa and exponent vary to represent different real numbers. For
example, the decimal number 125.55 could be represented as $12555 \times 10^{-2}$. The mantissa is 12555
and the exponent is -2. The advantage of this representation is that it can be used to represent
numbers with extremely large or extremely small absolute values. Usually in a 32-bit word
length, 24 bits are reserved for the mantissa and 8 bits for the exponent. The size of the mantissa and
exponent depends on the machine configuration.
Data is not always interpreted numerically but is often stored in a non-numeric form. The
number of bits necessary to represent a character in a particular computer is called the byte size
and a group of bits of that number is called a byte. For character representation two types of
code are normally used, American Standard Code for Information Interchange (ASCII) and
Extended Binary Coded Decimal Interchange Code (EBCDIC). Both use a byte to represent a
character. So 256 possible characters can be represented using these codes with a size of 1 byte.
For example, in ASCII the capital letter 'A' is represented by the decimal number 65.
In computers, the internal representation of an integer, a real number, or a character is a bit
pattern. For example, the bit string 01100110 can be interpreted as the number 66 (in binary
coded decimal), which in turn represents the character 'B' in ASCII. A method of interpreting a bit pattern is
often called a data type. We use several data types such as binary, real, and so on, in the context
of their representation in the computer. In the next section, we will describe the basic concept of
data types related to abstraction of data.

1.2 DATA TYPE
A data type is a collection of values along with a set of operations defined on those values. The
essence of a type is that it attempts to identify qualities common to a group of individuals or
objects that distinguish it as an identifiable class or kind. In a programming language, the data
type of a variable is the set of values that the variable may assume. The basic data types vary
from language to language.
Let us look at two classes of data types. A simple, or basic, data type is made up of values
that cannot be decomposed. In C, they are int (for integers), float (for reals), char (for
characters), and so on. A composite data type, also called a data structure, is one in which the
elements of the data type can be decomposed into either simple data types or other composite data
types. Examples of composite types include the familiar array and structure in C language. In
data structure the values of data types are decomposable, and we must therefore be aware of
their internal construction. There are two essential ingredients to any object that can be decom-
posed—it must have component elements and it must have structure, the rules for relating or
fitting the elements together.
The operations of a structured data type might not only act on the values of the data type,
they might also act on component elements of the data structure. We now present the formal
definitions of some terms that must be known to the readers.
In a programming language, a data type is a term that refers to the nature of the data that
variables hold. In C the data types are int, float, char, short, unsigned, double,
long, and so on. These are built-in data types, and typedef in C can be used to name new
data types.
Data object refers to a set of elements, say F. For example, the data object 'float' refers to
F = {0, ±.5, ±.6, ...}.
Similarly, the data type 'int' in C language refers to the data object integers, that is, a variable
of int data type can hold only integer-type data objects.
We are not only interested in the content of data objects but we also need to know the
way they are related. A data structure is a data type whose values are composed of component
elements that are related by some structure. Since a data structure is a data type, it must have a
set of operations on its values. Further, there may be operations that act on its component
elements. We can write programs using the operations defined on the data and its structure. We
imagine an abstract data type in our program and we can do so, being concerned with neither
how the data will be represented in the computer nor the details of the code that implements the
operations.
In the next section, we will present data abstraction and abstract data types.

1.3 DATA ABSTRACTION AND ABSTRACT DATA TYPES


One of the most powerful ideas in programming and problem solving is the concept of abstrac-
tion—the ability to view something as a high-level object while temporarily ignoring the enor-
mous amount of underlying detail associated with that object. Another way to represent ab-
straction is viewing something only in terms of its external appearance without regard for its
internal implementation. It is difficult to manage a complex system without abstraction.
We can define an abstraction more formally as an idea that concentrates on the essential
properties of something rather than on concrete realization or actual cases.
In computer science the process of abstraction is to simplify by separating the essential
qualities of data, their structure and operations, from the inessential details of their representa-
tion and implementation. One of the basic problems in computer science is the amount of com-
plexity to be reduced in the software that we might wish to build. Our approach is therefore to
begin the study of each data structure by considering only the specification of its abstract data
type, independent of its representation and implementation. This simplifies the study of the
data structure. Thus we attempt to bring the power of abstraction to bear on the study of data
structure.
The abstract data type approach lends itself in a natural way for separating the specifica-
tion of a data type from its implementation. The implementation can then assure that the integ-
rity of the data structure is protected.
We can think of an abstract data type (ADT) as a mathematical model with a collection of
operations defined on that model. We can define the abstract data type as follows.
An abstract data type indicates a data type that exists as a product of our imagination
and concentrates on the essential properties of the data type, ignoring implementation con-
straints and details.
There are several important advantages associated with the study of data structure from
the point of view of abstract data types. Here we will discuss some of them.
As defined earlier, an abstraction is an idea that concentrates on the essential properties
rather than on concrete realizations or actual cases. The objective is thus to simplify by isolating
the essential qualities of data, their structure and operations, from inessential details of their
representation and implementation. This has the effect of simplifying the study of the data
structure. Thus we attempt to bring the power of abstraction to bear on the study of data struc-
tures. In order to do that, we provide a template to view and discuss each data structure. This
template is called an abstract data structure and consists of three basic components: (i) specifi-
cation, (ii) representation, and (iii) implementation.
Our approach throughout the book is to implement data structures using modules. These
modules here act as black boxes. The user has no direct access to the data structure since the
data structure and the algorithms are encapsulated within the module. The integrity of the data
structure is protected because the user gets control over it through operations that are sepa-
rately specified and implemented. The implementation is done very carefully and in such a way
as to assure preservation of the integrity of the data structure. This is an advantage to users for
designing a software system.
Another advantage is maintainability. Implementation independence frees the user from
non-functional details. The implementation may be changed with no effect on the way in which
the program executes. It may, however, affect the performance, that is, time, space, and main-
tainability. For example, if we change the basic technique used to perform an operation—with-
out changing the operation performed—then the user may see a change in the performance for
the module containing that operation but will not see any change in the results produced. The
user is protected from changes in the way in which operations are implemented.
If an abstract data structure is to be of more than mere theoretical interest, it must be
implemented. The user still deals with the abstract conception of the structure; indeed,
the purpose of abstract data structuring is to guarantee that the user need deal with no more than
the abstraction. The implementor, however, must face the problems of representation and implementation.
As noted earlier, the abstraction can be treated as the functional specification of a black box.
The implementor must design the box in such a way that memory space is not wasted and the
operations are performed simply and efficiently. The implementor must be familiar with the
physical data type and the virtual data type. We often implement our data types using some
high-level language. For example, in C we might define the variables A, B, and C as an integer,
a real, and an array of characters:
int A;
float B;
char C[100];
We call the above virtual data types. Eventually any structure is stored in a physical
memory to be operated on a physical machine, that is, computer. The actual physical operations
that the machine can perform are limited to those in its machine language. We will call a data
type at this level a physical data type. Thus abstract data types are implemented with virtual
data types. Virtual data types are translated into physical data types.
In summary, we have understood the basic idea of an abstract data type. Many different
modules can be written that implement the same abstract type. Advantages of abstract data
types are highlighted.

1.4 SYSTEM-DEFINED DATA TYPE
In defining an abstract data type as a mathematical concept, we do not consider the implemen-
tation issue. Often no implementation, hardware or software, can model a mathematical con-
cept completely. For example, an arbitrarily large integer cannot be represented due to the finite
size of the machine's memory. Thus, it is not the data type 'integer' that is represented by the
hardware but rather the data type 'integer between a and b', where a and b are the minimum
and maximum integers representable by that machine. Once a representation has been chosen
for objects of a particular data type and routines have been written to operate on those
representations, the programmer is free to use that data type to solve a problem. The programmer need
not worry about how the computer is designed and what circuitry is used to execute each
instruction. The programmer needs to know only what instructions are available and how those
instructions can be used. The programmer must know about the data types which are available
in the system.
A data type is an abstract concept defined by a set of logical properties. We can define a limitless
number of data types but system considerations are necessary before implementing the data
types. In hardware implementation, proper circuitry is necessary to perform the requisite
operations, and software implementation includes specifications of how such data types are to be
manipulated.
Every computer system has a set of 'native' data types. It is the programmer's responsi-
bility to know what data types are available in the system and how they are stored in memory.
C has the 'usual' simple data types: characters, integers, and numbers with fractional
components. In addition, C allows variants of some of these types. Simple types include char,
int, float, and double. These types differ in the sort of information they contain and in
the amount of storage space allocated to them on different systems, ranging from 1 to 8 bytes.
C's character data type is known as char. C allocates 1 byte for storing a character, so a char
can hold at most 256 distinct values. The integer data type, int, is used to represent whole numbers
within a specified range of values. The C standard requires int to hold at least 16 bits. On 16-bit
implementations, variables of type int are stored in 2 bytes, ranging from -32,768 to 32,767; on most
current systems int occupies 4 bytes. Internally, a small whole number can be treated either as a character
or as an integer.
The real numbers are represented by the data type float. C typically allocates 4 bytes to
represent variables of type float. The double type is for double-precision floating-point numbers.
For doubles, C typically allocates 8 bytes of storage space.
In addition to simple types, C provides other types, which are variations of char and
int. These variants essentially request a different amount of memory for storing a
value. Signed types, where numbers can be positive or negative, are standard in most languages.
C also allows us to declare integers as unsigned, so that the sign bit is used as part of the number
rather than as a sign indicator. short and long are requests for versions of the int type for
which different amounts of storage may be allocated. The amount of space allocated for these
variants depends on the implementation. For example, short typically reserves 2 bytes while long requires
at least 4 bytes of storage space. Finally, to represent a number as a hexadecimal constant, 0x or 0X is
placed before the hexadecimal representation, and octal integers start with the digit 0.

1.5 PRIMITIVE DATA STRUCTURES AND THEIR REPRESENTATION


In this section we discuss the primitive data structures which are commonly used to solve prob-
lems with a computer system. Primitive data structure is defined as a structure which can be
operated by machine-level instructions. These structures are the basis of all other types of data
structures.
We begin the discussion of primitive data structures by examining integer and real
numbers. A quantity that is discrete in nature can be represented by an
integer. The integer is also used to represent whole numbers. For example, the total number of students
in a class, the number of passengers in a train, and so on, are all information items expressible as
integers. Integers are represented in signed magnitude form, signed 1's complement form, or
signed 2's complement form. Although the first form is probably the simplest in concept, other
methods such as two's complement representation are used in modern computer systems to
simplify the design of computer circuitry. The data type integer provided by C can be viewed as
an abstract data type whose specification is as follows.
(a) It reserves 2 bytes for int, unsigned int, and short int on 16-bit implementations; the actual sizes are implementation dependent.
(b) The elements are the whole numbers from - maxinteger to maxinteger. The value of
maxinteger is implementation dependent.
(c) Integers are both ordered and linear.
(d) The set of operations is implementation dependent.
Real numbers can be represented by either fixed-point representation, such as 15.75, or
floating-point representation, such as $0.1575 \times 10^2$. Floating-point representation is the most
common storage structure used for real data. In this representation, the real number is expressed by
a mantissa and an exponent. For example, $0.1575 \times 10^2$ is expressed with a mantissa of 0.1575 and an
exponent of 2, with a radix of 10. The radix and the number of digit positions represented with a
floating-point format vary from one computer to another. Floating-point reals in C can be viewed as
an abstract data type with the following properties.
(a) The values are a finite subset of the real numbers. The actual subset is imple-
mentation dependent.
(b) The structure is ordered.
(c) Typical operations are assignment, arithmetic, relational, and so on.
(d) Four to eight bytes are required to store real numbers.
Usually, the sign is the first bit, that is, the MSB (most significant bit) in a floating-point
representation, and by convention 0 denotes a positive number and 1 denotes a negative
number (this is also true in the case of integer-number representation). The biased exponent is an
expression of the exponent in a form of notation called excess notation. In a seven-bit field, we
can express non-negative integers in the range 0 to 127, which in excess-64 notation stand for the
exponents -64 to +63, the same range that 2's complement representation provides. A
floating-point number with an exponent of -25 would have a characteristic of -25 + 64 = 39 in
excess-64 notation. Further, the mantissa part of a floating-point number is expressed in
normalized form, that is, there is no significant digit before the decimal point. For example,
0.4272 is a normalized mantissa whereas 4.272 is not in normalized form.
In addition to a floating-point representation of real numbers, a fixed-point storage
representation is also possible. In it, real numbers are stored in a structure similar to that used for integer
numbers. In C, a real value can be displayed in either fixed-point or exponential notation by using
different format specifiers (such as %f and %e).
A wide variety of character sets or alphabets are handled by most computers. The two
most widely used codes are the ASCII and the EBCDIC. Characters are used as a primitive data
structure since they are useful in expressing much of the non-numeric information which can be
processed by a computer. Each character is stored as a fixed number of bits in the computer's
memory. A common technique for storing characters in a computer's memory is to store each
character in 1 byte. One byte is a sequence of 8 bits. In many digital computers it is the smallest
unit of information that can be addressed directly. Such machines operate efficiently on indi-
vidual characters.
C allocates 1 byte for storing a character. This means that we can have at the most
256 character values. On many systems, the byte used to store a character variable is interpreted
as having values ranging from -128 to 127, although whether a plain char is signed or unsigned
depends on the implementation. Depending on the character encoding scheme used, positive values
of a character variable correspond to particular characters. For example, a character value of
100 corresponds to the character 'd' in the ASCII character set used on most systems. Negative
values generally do not have a 'useful' interpretation. Also, integer numbers and characters can
be used interchangeably with the format specifiers %d and %c.
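The correspondence between a character and its integer code can be sketched as follows. The function names are hypothetical, and the value 100 for 'd' assumes the ASCII character set mentioned above.

```c
/* Return the integer code stored for a character
   (the value that %d would print). */
int char_code(char c)
{
    return (int)c;
}

/* Return the character whose code is the given integer
   (the character that %c would print). */
char code_to_char(int code)
{
    return (char)code;
}

/* Report whether plain char is a signed type on this implementation:
   (char)-1 stays negative only when char is signed. */
int char_is_signed(void)
{
    return (char)-1 < 0;
}
```

On an ASCII system, char_code('d') gives 100 and code_to_char(100) gives 'd'; the result of char_is_signed() varies between compilers and machines.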
A logical data item is a primitive data structure that can assume the value of either 'true'
or 'false'. C has three classes of operators: arithmetic, relational and logical, and bitwise
operators. The key to the concept of relational and logical operators is the idea of true and
false. In C, true is any value other than 0 and false is 0. Expressions that use relational or
logical operators return 1 for true and 0 for false. The three most common logical operators are
'AND' (&&), 'OR' (||), and 'NOT' (!). If A and B are logical variables, then A && B is true if A
and B both have the value true; otherwise, the result is false. The result of A || B is false if
A and B both have the value false; otherwise the result is true. !A is false if A is true, and
true if A is false. Logical variables are often used to represent complex logical expressions
and also as terminating conditions in loops. Relational and logical operators always produce a
result that is either 0 or 1, whereas bitwise operators are used to manipulate the individual
bits of a value, not to evaluate true or false conditions.
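The difference between the logical and bitwise operators can be seen in a short sketch. The wrapper function names below are illustrative only.

```c
/* Logical AND: the result is always 0 or 1, whatever non-zero
   values are supplied. */
int logical_and(int a, int b) { return a && b; }

/* Logical OR: 0 only when both operands are 0. */
int logical_or(int a, int b) { return a || b; }

/* Logical NOT: 1 when the operand is 0, otherwise 0. */
int logical_not(int a) { return !a; }

/* Bitwise AND, for comparison: combines corresponding bits. */
int bitwise_and(int a, int b) { return a & b; }
```

For example, logical_and(5, 2) is 1 because both operands are non-zero, whereas bitwise_and(5, 2) is 0 because the bit patterns 101 and 010 have no set bits in common.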
The storage representation of logical values is dependent upon the compiler and the
machine for which the compiler is designed. One bit is sufficient to represent true or false but
because of the difficulty most computers have in isolating a single bit, it is common to find an
entire byte. Most computers cannot address a single bit in their memory but must address at
least a byte. Therefore, 8 bits at a time are fetched into the registers. The single bit
representing the boolean quantity would then have to be isolated by masking out all of the
others. Most designers have chosen to sacrifice some memory space to avoid this complexity and
use the whole byte to represent boolean quantities.
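The masking step mentioned above can be sketched as follows, under the assumption that eight boolean flags are packed into one unsigned byte. The type and function names are hypothetical.

```c
/* Eight boolean flags packed into one byte; bit 0 is the least
   significant bit. */
typedef unsigned char flag_byte;

/* Return 1 if the flag at position bit (0 to 7) is set, otherwise 0. */
int get_flag(flag_byte flags, unsigned bit)
{
    return (int)((flags >> bit) & 1u);
}

/* Return a copy of flags with the given bit set to value (0 or 1). */
flag_byte set_flag(flag_byte flags, unsigned bit, int value)
{
    flag_byte mask = (flag_byte)(1u << bit);
    if (value)
        return (flag_byte)(flags | mask);   /* switch the bit on */
    return (flag_byte)(flags & ~mask);      /* mask the bit out  */
}
```

Every access needs a shift and a mask, which is the extra work that storing each boolean in a whole byte avoids.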
A pointer is a reference to a data structure. It is a word or portion of a word in memory,
which, instead of containing data, contains the address of another word or byte. A pointer is a
single fixed-size data item and it provides a homogeneous method of referencing any data
structure. Pointers permit faster addition and deletion of elements to and from a data structure.
In terms of storage representation, addresses are generally assigned a word or half a word of
storage in most computers. Thus, the larger the number of addresses in the computer, the larger
the amount of storage needed to represent an address. In the next chapter we will describe the
fundamentals of data structures.
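To illustrate how pointers make addition and deletion cheap, the sketch below links nodes together through their addresses. The struct and function names are hypothetical, and a singly linked list is chosen only as a convenient example.

```c
#include <stddef.h>

/* A list node: the next field holds the address of the following
   node, or NULL at the end of the list. */
struct node {
    int value;
    struct node *next;
};

/* Add n at the front of the list. Only two pointer assignments are
   needed, however long the list is. */
void push_front(struct node **head, struct node *n)
{
    n->next = *head;
    *head = n;
}

/* Remove and return the first node, or NULL if the list is empty. */
struct node *pop_front(struct node **head)
{
    struct node *first = *head;
    if (first != NULL)
        *head = first->next;
    return first;
}
```

No elements are moved in memory by either operation; only addresses are rewritten.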

__________________________ EXERCISES __________________________

1. What are the two basic data types? How are they defined in C language?
2. Define the term 'data object'. How is it different from abstract data type?
3. How many components are there in an abstract data structure? Explain the term
'maintainability'.
4. Why is a system-defined data type different from a primitive data type? Explain.
5. Describe the specifications of integer data type.
6. What are the different data types that are available in C language?
7. Are the following system-defined data types? Give reasons for your answers.
(a) Files
(b) Pointers
(c) Enumerated data types
8. Explain why it is not possible to support opaque types in C.
9. Describe one or two situations in your everyday life where you use the idea of abstraction
to simplify large tasks that you need to perform.
10. Describe the nature of pointer data type in C.
2

FUNDAMENTALS OF DATA STRUCTURES


— BASIC CONCEPTS

Computer science is primarily concerned with the study of data structures and their
transformation by some techniques. The modern digital computer was invented and intended as a
system that should facilitate and speed up complicated and time-consuming computations. In
the majority of applications its capability to store and retrieve large amounts of information
plays a dominant role in processing information. The information which is available to the
computer consists of a selected set of data relating to a real-world problem, and it is believed
that the desired results can be derived from that set of data. So it is desirable to understand
the logical relationships between the data items in the problem. The possible ways in which the
data items or atoms are logically related define data structures.

2.1 INTRODUCTION TO DATA STRUCTURE


A data structure is a data type whose values are composed of component elements that are
related by some structure. Since a data structure is a data type, it has a set of operations on its
values. In addition, there may be operations that act on its component elements. A number of
operations can be performed on a data structure, such as operations for inserting elements into
and deleting elements from a data structure, and operations to access an element from a data
structure. These operations vary functionally for different data structures. The operations
associated with a given data structure depend on how the data structure is represented in memory
and how they are being manipulated by a particular language. The representation of a particu-
lar data structure in the memory of a computer is called a storage structure. For example, there
are a number of possible storage structures for a data structure of an array. It is thus clear that
data structures, their associated storage structures, and the operations on data structure are all
integrally related to the particular problem. In this book we will examine different types of data
structure and their implementation through C language.
A number of applications, using various kinds of data structures, will be discussed in a
comprehensive manner throughout the book. The choice of an algorithm description notation
is crucial, since algorithms play a vital role in implementing applications. In the next section,
the algorithm, a fundamental notion, will be discussed.

2.2 ALGORITHM FOR DATA STRUCTURE


An algorithm represents, at an abstract level, the steps that a computer takes to do a job. We
require only that the steps of an algorithm be well understood, and stipulate that the
expression of an algorithm may vary with the level of understanding of its audience.
An algorithm is a formal step-by-step method for solving problems. An algorithm should
satisfy the following properties.
• An algorithm consists of a sequence of instructions.
• Each instruction should be unambiguous.
• Each instruction should be carried out in a finite number of basic operations.
• The algorithm should terminate after a finite number of steps.
• The algorithm may have zero or more inputs, but it should produce at least one output.
It is now necessary to introduce the concept of the mathematical tools needed to analyze
the algorithms and data structures that will be discussed in the rest of the chapters. The analysis
of algorithms is a critically important issue in computer science. The data structures that we
will discuss (e.g. arrays, stacks, queues, linked lists, etc.) are not only mathematically interesting
but we also claim that they play a vital role in developing efficient algorithms for such tasks as
insertion, deletion, searching, sorting, and pattern matching. How do we show that this claim is
valid? We should demonstrate the algorithms for a given application without depending on
informal arguments, without considering special cases, and without being influenced by the
efficiency of the programming language used to encode the algorithm or the hardware used to
run it. We introduce a technique which is the fundamental tool for evaluating the efficiency
properties of algorithms. The notion introduced here is used throughout the study of data
structures and algorithms. It includes complexity measures, order notation, detailed timing
analysis, and space complexity analysis.
The mathematical notation used in this book has been selected from notation commonly
used in data structures. This notation tends to differ only slightly from that in general use in the
mathematical literature. For example, log x may be written as lg x in some places. Other nota-
tions are defined, with explanation, at the point of first use. Algorithm notations, if any are
used, will be discussed at the time of presentation.
In describing algorithms, we emphasize certain points. First, algorithms should be
concise and compact to facilitate verification of their correctness. Verification involves
observing the performance of the algorithm with a carefully selected set of test cases. These
test cases should attempt to cover all the exceptional cases likely to be encountered by the
algorithm. Second, an algorithm should be efficient. It should not use memory locations
unnecessarily, nor should it require an excessive number of logical operations.
In the next section, we give a description of the algorithm notation.

2.3 NOTATION FOR ALGORITHM


Once we have an appropriate mathematical model for a problem, we can formulate an algo-
rithm in terms of that model. The notation used to present algorithms is widely used with minor
variation in the literature on data structures. Normally, we follow the order (as given below)
throughout the book for describing a method.
(i) Basic concept of the method.
(ii) Illustration of the method with suitable data.
(iii) Algorithm for the method.
(iv) C program for the method.
Let us write a simple algorithm for finding maximum from a set of n positive numbers.
We assume that numbers are stored in an array X. We hope that these instructions are suffi-
ciently clear so that the reader grasps our intention.
Algorithm 2.1: Searching a maximum from an array X
Input: An array, X, with n elements.
Output: Finding the largest element, MAX, from the array X.
Step 1: Set MAX=0 /* Initial value of MAX */
Step 2: For j=1 to n do
Step 3:     If (X[j]>MAX) then MAX=X[j]
        end for
Step 4: Stop
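A C version of Algorithm 2.1 might look like the sketch below. It departs from the algorithm in two labelled ways: indices run from 0 rather than 1, and the running maximum starts at the first element instead of 0, so the function also works when the array contains zero or negative values. The name find_max is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Return the largest element of x[0..n-1]. Assumes n >= 1. */
int find_max(const int x[], size_t n)
{
    assert(n > 0);
    int max = x[0];                 /* initial value of max */
    for (size_t j = 1; j < n; j++) {
        if (x[j] > max)
            max = x[j];
    }
    return max;
}
```

For the array {3, 9, 2, 7, 5} with n = 5 the function returns 9.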
Each algorithm in this book is given a number and a title. The title immediately follows
the algorithm number on the same line. Inputs and outputs are described next. The body of the
algorithm consists of a set of numbered steps (each number is preceded by the word 'Step').
Comments (similar to C comments) may appear in steps of an algorithm to help the reader in
understanding the details. For example, the remark /* Initial value of MAX */ appears at Step 1.
Different constructs such as for-do-end, if-then, while-do, and so on are used in a manner very
similar to a pseudolanguage. It is important to emphasize that data structures are language
independent.
Pseudocode is a general tool that allows notation similar to any high-level language.
An algorithm can be described in many ways. As described earlier, pseudocode can be used to
represent an algorithm. Another way to express an algorithm is through a graphical form of
notation such as flowcharts. In the case of complex decisions, it is difficult to understand
the decisions either in flowcharts or through pseudocode. A decision table is an alternative
analysis tool for indicating complex relationships and solutions. We start our discussion with
the basic concepts of flowcharting for expressing an algorithm.

2.3.1 Flowcharts
A flowchart is a pictorial representation of an algorithm. It serves as a means of recording,
analyzing, and communicating problem information. Programmers often use a flowchart be-
fore writing a program. It is not always mandatory to draw a flowchart. In practice, sometimes,
drawing of the flowchart and writing of code in a high-level language go side by side.
Two kinds of flowcharts are used—program flowchart and system flowchart. A program
flowchart (also called a flowchart) shows the detailed processing steps within one computer
program and the sequence in which those steps must be executed. Different symbols are used in
a flowchart to denote the different operations that take place in a program. The terminal symbol
(a rounded box) shows clearly the beginning and ending of the program. The input/output symbol
(a parallelogram) denotes an input/output operation. Any manipulation or processing of data
within the computer is expressed by the processing symbol (a rectangle). In a flowchart the
decision symbol (a diamond) is used to specify a conditional branch or decision-making step.
Connector symbols (small circles) are used in a flowchart to denote exit to or entry from
another part of the flowchart.
A system flowchart shows the procedures involved in converting data on input media to
data in output form. Emphasis is placed on the data-flow into or out of a computer program,
the forms of input and the forms of output. A system flowchart makes no attempt to depict the
function-oriented processing steps within a program. A system flowchart may be constructed
by the systems analyst as part of the problem definition. Algorithms in data structures,
however, are generally expressed in the form of program flowcharts.

(a) System flow chart (b) Program flow chart

Fig. 2.1 System and program flowcharts for monthly billing

Terminal: Beginning, end, or point of interruption in a program
Connector: Entry from, or exit to, another part of the flowchart
Input/output: Any function involving an input/output device
Process: A group of one or more instructions that perform a processing function
Punched card: All varieties of punched cards
Document: Paper documents and reports of all kinds
Decision: A point in the program where a branch to an alternative path is possible
Flow line: Direction of processing or data flow
Preparation: A group of one or more instructions that sets the stage for subsequent processing
Annotation: Descriptive comments or explanatory notes provided for clarification

Fig. 2.2 System and program flowchart symbols


A system flowchart for monthly billing is shown in Fig. 2.1(a). To emphasize the distinction
between a system flowchart and a program flowchart, a flowchart showing the detailed processing
steps in the monthly billing program is given in Fig. 2.1(b).
In drawing flowcharts we direct our attention to the standard flowcharting symbols and
techniques recommended by the American National Standards Institute (ANSI) and its
international counterpart, the International Organization for Standardization (ISO). These
symbols are used throughout the book and are summarized in Fig. 2.2.
The program flowchart in Fig. 2.1 has one serious drawback; it shows how to compute
the monthly statement for only one customer. Generally, a computer program is written to
perform a particular operation or sequence of operations many times. To provide for this, a
program flowchart can be made to curve back on itself, that is, a sequence of processing steps
can be executed repeatedly on a different set of data. In effect, a program loop is formed. We
now present the modified flowcharts in Fig. 2.3.

Fig. 2.3 Program loop through unconditional jump


2.3.2 Pseudocode
Pseudocode is also referred to as a pseudolanguage or an informal design language. Its primary
function is to enable the programmer to express his/her ideas about program logic in a very
natural English-like form. He/she is free to concentrate on the solution algorithm rather than
on the form and constraints within which it must be stated. The intended result is an
unambiguous solution to the problem.
Pseudocode allows a programmer to express his/her thoughts in regular English phrases,
with each phrase representing a programming process that must be accomplished in a specific
program module. The phrases almost appear to be programming language statements, thus the
name 'pseudocode'. However, unlike programming language statements, pseudocode has no
rigid rules; only a few optional keywords for major processing functions are recommended.
Therefore, programmers can express their thoughts in an easy, natural, straightforward manner,
but at a level of detail which allows pseudocode to be directly convertible into
programming-language code. Fig. 2.4 provides pseudocode for a salesperson payroll program.

begin
    read a salesperson payroll record
    do while there is more data
        multiply sales by commission rate
        if sales is greater than quota
            then add 10% bonus to commission
        endif
        add commission to salary
        write a report line
        read a salesperson payroll record
    enddo
end

Fig. 2.4 Pseudocode for salesperson payroll report processing
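The body of the loop in Fig. 2.4 converts almost line for line into C. The sketch below is one possible rendering; the struct layout and field names are assumptions, the 10% bonus is taken to mean 10% of the commission, and the reading and report-writing steps are left out.

```c
/* One salesperson payroll record; the field names are assumptions. */
struct sales_record {
    double salary;            /* base salary                 */
    double sales;             /* sales made in the period    */
    double commission_rate;   /* for example 0.05 for 5%     */
    double quota;             /* sales target for the bonus  */
};

/* Compute the total pay for one record, following the loop body of
   Fig. 2.4. Assumption: the bonus is 10% of the commission. */
double compute_pay(const struct sales_record *r)
{
    double commission = r->sales * r->commission_rate;
    if (r->sales > r->quota)
        commission += 0.10 * commission;
    return r->salary + commission;
}
```

A complete program would call compute_pay once per record inside the do-while loop and write a report line with the result.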

Certain words in pseudocode are significant. 'Input' or 'read' a record means that data
is made available to the computer for processing. The input data is generally in the form of a
record, in which several fields of data pertaining to a person or thing are given as one-line
input or one item. If the data pertained to employee records, a record might contain the
employee identification number, the department to which that employee is assigned, the number
of hours worked, the rate per hour, and the tax deduction.
The word 'set' or 'assign' is often used in pseudocode to initialize variables to a desired
value. The word 'if' indicates a comparison between two items. Sometimes the words add,
subtract, multiply, or divide appear in pseudocode, but this is often the choice of the
programmer. Another word used in pseudocode is 'print' or 'write'. It indicates that data is to
be prepared as output on the printer. Other words, such as do-enddo and dowhile-endwhile, are
also used. We will now illustrate pseudocode with some examples.
Example 2.1
begin
    do
        read a record of three numbers
        print elements in record
        compute sum of elements
        print sum
    enddo
end
Example 2.2
begin
    read a record: hours worked, rate, tax
    multiply rate by hours worked and set it to gross pay
    compute net pay = gross pay - tax
    write hours worked, rate, tax, gross pay, net pay
end

We now present pseudocode for insertion sort as a procedure, Insertion Sort. It takes as a
parameter an array A[1] ... A[n] containing a sequence of length n which is to be sorted.
Insertion sort works the way many people sort cards. We start with an empty left hand and
the cards face down on the table. We then remove one card at a time from the table and insert
it into the correct position in the left hand. To find the correct position for a card, we
compare it with each of the cards already in the hand, from right to left.

/* Pseudocode for insertion sort */

Insertion Sort (A)
begin
    for J <- 2 to length (A)    /* length (A) means length of A */
        key <- A[J]
        /* Insert A[J] into the sorted sequence A[1] ... A[J-1] */
        i <- J-1
        while i > 0 and A[i] > key
            A[i+1] <- A[i]
            i <- i-1
        endwhile
        A[i+1] <- key
    endfor
end
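The insertion sort pseudocode can be rendered in C as in the sketch below. The pseudocode indexes the array from 1; this version uses C's zero-based indexing, so the loop bounds are shifted by one. The function name is an assumption.

```c
#include <stddef.h>

/* Sort a[0..n-1] into ascending order by insertion sort. */
void insertion_sort(int a[], size_t n)
{
    for (size_t j = 1; j < n; j++) {
        int key = a[j];
        size_t i = j;
        /* Shift the elements of the sorted part a[0..j-1] that are
           greater than key one place to the right. */
        while (i > 0 && a[i - 1] > key) {
            a[i] = a[i - 1];
            i--;
        }
        a[i] = key;    /* drop key into the gap */
    }
}
```

Calling insertion_sort on {5, 2, 4, 6, 1, 3} with n = 6 leaves the array as {1, 2, 3, 4, 5, 6}.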

Fig. 2.5 Flowchart for insertion-sort procedure


The above insertion sort procedure can be graphically described by the flowchart shown
in Fig. 2.5.

2.3.3 Decision Tables


Tables are a familiar, widely used mode of representing and communicating information. When
we buy groceries and other items at the local shop, the shopowner/cashier at the counter refers
to a table that shows a list of items and the price for each. Airlines use tables that show
destinations and distances, or destinations and fares. There are other applications too where
tables play a vital role. What is the advantage of using such tables? They are an easy-to-follow
means of communication. They are also a concise way of representing information. They represent
a way of giving a number of complex decisions in a tabular form; for much the same reasons,
tables can be an effective tool in program development. Sometimes a programmer on a team,
assigned to a particular part of a project, may set up one or more tables when planning a
solution to a problem. Each table serves as a guide during program coding. A table used in this
way is known as a decision table. The purpose of this section is to explain how to use decision
tables and how to construct them. We look at different types of decision tables that can be
built and the situations for which they are useful.
A decision table is used for defining complex program logic. It is preferred by some
programmers because decision tables can be constructed quickly. One advantage of using a
decision table is that it is easier to construct than a flowchart since no symbols are involved.
Phrases are also easier to fit into a decision table, which is not as limited in space as a
flowchart.
A decision table would simply show the possible numbers of dependents as a series of
conditions. The processing steps needed to compute the appropriate deductions would be listed
as actions. As shown in Fig. 2.6, a decision table is divided into four sections. The upper left
portion is called the condition stub. It shows the conditions that are to be considered in
reaching a decision. The lower left portion is called the action stub and it shows the actions
to be taken when any of the given sets of conditions is present. The condition entries in the
upper-right portion of the decision table and the action entries in the lower-right portion
illustrate clearly each if-then-else rule to be followed. To understand a rule, we read
vertically down a column, which consists of a set of condition entries and the corresponding
action entries. The number of the rule appears at the top of the column. More than one action
may be required when the applicable conditions exist. In other words, both multiple condition
entries and multiple action entries for a particular rule are possible in some cases.

                      Rules
Condition stub    |   Condition entries
------------------+--------------------
Action stub       |   Action entries

Fig. 2.6 Decision table format

Let us consider some examples to illustrate the use of a decision table.


Example: If an employee's attendance sheet indicates that he or she worked less than ten
hours during the past week, or that he or she was late more than once, or that he or she was
absent on the day before a holiday, then the employee's name should be deleted from the pay-
roll list, otherwise that employee's payroll cheque should be printed.
In the above example, three conditions are identified but any one of the three conditions
will produce a certain action. If an employee worked less than 10 hours, his or her name should
be deleted from the payroll list, irrespective of whether or not he or she was late more than once
or absent on the day before a holiday, and so on. The decision table is shown in Fig. 2.7. For each
of the three possible conditions, the condition entry in a column may be Y for 'yes' or N for 'no'.
The action entry contains an 'X' for the satisfying conditions.

                                       Rules
Condition Stub                         1  2  3  4  5  6  7  8
1. He or she worked less than          Y  Y  Y  Y  N  N  N  N
   10 hours
2. He or she was late more             Y  Y  N  N  Y  Y  N  N
   than once
3. He or she was absent before         Y  N  Y  N  Y  N  Y  N
   a holiday
Action Stub                            Action Entries
1. Delete name from payroll list       X  X  X  X  X  X  X
2. Print payroll cheque                                     X

Fig. 2.7 Decision table for example
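A decision table of this kind can be coded directly as a lookup table. The sketch below assumes that the rules are numbered 1 to 8 from left to right, with rule 1 being Y, Y, Y and rule 8 being N, N, N; all identifiers are hypothetical.

```c
/* Actions from the action stub of the payroll decision table. */
enum payroll_action { DELETE_FROM_PAYROLL, PRINT_CHEQUE };

/* Action entries for rules 1 to 8, in column order. */
static const enum payroll_action action_entries[8] = {
    DELETE_FROM_PAYROLL, DELETE_FROM_PAYROLL, DELETE_FROM_PAYROLL,
    DELETE_FROM_PAYROLL, DELETE_FROM_PAYROLL, DELETE_FROM_PAYROLL,
    DELETE_FROM_PAYROLL, PRINT_CHEQUE
};

/* Map the three condition entries (non-zero for Y, 0 for N) to the
   rule number, 1 to 8, of the matching column. */
int rule_number(int under_10_hours, int late_more_than_once,
                int absent_before_holiday)
{
    return 1 + (under_10_hours ? 0 : 4)
             + (late_more_than_once ? 0 : 2)
             + (absent_before_holiday ? 0 : 1);
}

/* Look up the action for the given condition entries. */
enum payroll_action payroll_decision(int under_10_hours,
                                     int late_more_than_once,
                                     int absent_before_holiday)
{
    int rule = rule_number(under_10_hours, late_more_than_once,
                           absent_before_holiday);
    return action_entries[rule - 1];
}
```

Keeping the action entries in an array means that a change to the payroll policy alters only the table, not the control flow.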


Another very useful alternative representation of a decision table is a decision tree. We now
present the decision tree and the corresponding flowchart for the six-coin problem in Figs. 2.8
and 2.9. Given coins a, b, c, d, e, and f, of which exactly one is heavier and the rest are of
equal weight, we want to find the heavier coin.

Fig. 2.8 Flowchart for the heavier coin among coins a, b, c, d, e and f

2.4 MODULARISATION TO ALGORITHM DESIGN


Medium and large-sized software projects in the real world might be decomposed into as many
as one hundred to one thousand separate pieces. A key step in the programming process involves
taking the problem described in the problem specification document and decomposing it into a
collection of interrelated subproblems, each of which is much smaller and much simpler than
the original task. This decomposition of the problem is an example of the design technique
called divide and conquer, which is based on the principle that it is easier to solve many small
problems than to solve one extremely large one.
Modularisation is the formal term for the divide and conquer strategy. It allows us to
compartmentalise a large problem into collections of independent program units that address
simpler and more coherent subproblems. Each independent program unit is called a module.
Modules are the functional parts that make the processing work. Ideally, each module works
independently of the other modules, although this is sometimes impossible. Relate the concept
of modules to the task of baking a cake: the modules or steps that could be listed are the
purchasing of the necessary ingredients, the preparation of the ingredients, and the baking of
the cake. Under each of the three modules, the details would be listed. Modules are ranked in a
hierarchy according to importance. The lower the module on the structural organisation plan,
the more detail is given as to the programming steps involved. The top module works as a
control module, which gives the overall view of the structure for the program. The program is
designed so that at each lower level of modules, more detail is given.
Each module is coded and tested and is then added to the other modules. This procedure
makes program integration easier since there is a single entry and one exit point per module.
This makes any further modification of a program easy since we know that there is only one
way to enter the module and one way to exit.
Modularisation brings with it a number of distinct benefits. First, it greatly simplifies the
task of modifying or adapting a program unit. Since the only part of a module visible to the rest
of the modules is its external interface, a change to the internal structure of a module should
have no effect on any other module. Modularisation allows us to hide the underlying imple-
mentation details of one program unit from all other program units and protect it from
unauthorised change.
Second, modularisation helps us with a way to create an implementation plan. If we can
create a chart that shows the relationships among these units, we have the means to create such
an implementation plan.
Another advantage of modularisation is that we can verify each individual program unit
for correctness as it is developed, instead of having to wait until the entire program is
completely coded. As program units get larger, debugging time increases drastically, eventually
becoming the dominant step in the entire programming project. It is definitely to our advantage
to debug and test a program as a collection of small units rather than one large one.
The important point is that having modularised the problem and developed it in a top-
down fashion, we have a choice of ways to approach the implementation. As we work on one
unit, we know what unit we should work on next and which ones can be effectively postponed.
Without this plan of action, we might write code in some less logical order and thus not have
pieces that work together or that can be tested as a unit. The top-down design method, which
develops as a hierarchical set of tasks, defines a natural set of small sub-units that can be indi-
vidually tested, verified, and integrated into the overall solution. In the next subsection we
describe the top-down design method.

2.4.1 Top-down Design Approach


Top-down design involves starting from the broadest and most general description of what
needs to be done, that is, the problem specification document, and then sub-dividing the origi-
nal problem into collections of modules and abstract data types. Each of these lower-level units
is smaller and simpler than the original task and is more involved with the details of how to
solve the problem rather than what needs to be done. We proceed from high-level goals to
detailed low-level solution methods.

We consider a program X that can be subdivided into three independent submodules, X1,
X2, and X3. These submodules, X1, X2, and X3, are further broken down into independent
submodules, and so on. This process is repeated until we obtain modules which are small enough
to be understood and coded quite easily. We can represent it pictorially as shown in Fig. 2.10.
Top-down design is not the one-step process shown in Fig. 2.10(a). The decomposition
and simplification of the task is performed over and over, first on the original problem and
then on successive sub-units, until finally we are left with a task that is so elementary that
it need not be simplified any further. The repeated decomposition and simplification of a task
into a collection of simpler sub-tasks is called stepwise refinement. Each of the tasks X in
Fig. 2.10(b) represents a separate program unit needed to solve the original problem.
The data structures needed for the solution are also developed in a top-down fashion as
we proceed from general descriptions of abstract data types to operations performed on these
data types to their internal implementation. Next we will decide on the internal structure for
our abstract data types and write the internal module that implements the operations. When we
have finished refining our program units and abstract data types, we would have created a
large number of procedures and data structures, defined in terms of the information they con-
tain and the operations that can be performed on them. Descriptions of these modules and data
structures constitute a major component of the program design document.
Abstraction refers to dealing with an operation from a high-level viewpoint, disregarding
its detailed structure. This helps to hide the design details of the lower-level modules from
the higher-level ones. Only the data and control are specified for communication between the
higher- and lower-level modules.
Procedural abstraction and data abstraction are the fundamental tools for managing the
implementation of large programs; both are used in the top-down design method. With
procedural abstraction we initially think only about the highest-level functions and procedures needed
to solve the problem. The specification of low-level routines is postponed until later. Each suc-
cessive refinement of the design adds additional detail to the developing solution. So the com-
plexity level increases as we proceed through the design process.
With data abstraction, we initially view a data structure in terms of the external interface
it displays to the user, that is, the operations that can be performed on that structure. Only later
do we begin to concern ourselves with the underlying details of the implementation of that data
type in a given programming language.
Modern program design techniques focus on stepwise refinement. Stepwise refinement
produces software design in a top-down manner and is an iterative process. At each step, the
problem to be solved is decomposed into subproblems that are solved separately. Thus, if P is
the statement of the original problem, p1, p2, ..., pn are the statements of the subproblems
to be solved iteratively. The following is a description of the sort-by-straight-selection
algorithm by stepwise refinement.
Example 2.3
Step 1
Let n be the length of the array A to be sorted
i=1
while i<n do
    Place the smallest element at position i
    i=i+1
endwhile
Step 2
Let n be the length of the array A to be sorted
i=1
while i<n do
    j=n
    while j>i do
        if (A[i]>A[j])
            interchange the elements at positions j and i
        endif
        j=j-1
    endwhile
    i=i+1
endwhile
Step 3
Let n be the length of the array A to be sorted
i=1
while i<n do
    j=n
    while j>i do
        if (A[i]>A[j])
            x=A[i], A[i]=A[j], A[j]=x
        endif
        j=j-1
    endwhile
    i=i+1
endwhile

2.4.2 Bottom-up Approach


In developing high-level application-oriented software, we will almost certainly need to make
frequent references to simple, low-level procedures that do everyday housekeeping tasks. Therefore,
implementation may not only be top-down but bottom-up as well. The bottom-up design
moves in the opposite direction to the top-down design method. In this design, the programmer
identifies a set of essential low-level routines that should be available as early as possible.
Next, the higher-level modules are built upon the lower-level modules already designed.
If each low-level module is coded as soon as it is designed, the technique is
called bottom-up programming. In this technique, low-level procedures called utility routines
are implemented in parallel with, or even preceding, the development of the high-level
application unit. When tested and complete, these utilities are put into a program development
library and are available as off-the-shelf programming tools to all the professional staff for the
rest of the implementation. When the most critical low-level primitives are finished, the designer
may identify additional utility routines that would be desirable to have in the library.
Then implementation usually proceeds in both directions: top-down for the development of
high-level application-oriented routines, and bottom-up for the development of utilities and
programming tools to support application development.
In top-down design it is not possible to detect an incompatible or unrealisable specification
at an early stage, since we postpone all details to the lower stages. This is a serious disadvantage
of top-down design; in bottom-up design such problems can be detected during the early stages
of the design process. On the other hand, in bottom-up design the designer cannot form an exact
idea of the entire system until the top level is reached.

According to a bottom-up strategy, the design process consists of defining modules that
can be iteratively combined together to form sub-systems. This is typical in the case where we
are reusing modules from a library to build a new system, instead of building such a system
from scratch.
Information hiding proceeds mainly bottom-up. It suggests that we should first recognise
what we wish to encapsulate within a module and then provide an abstract interface to define
the module boundaries. The decision of what to hide inside a module may depend on the result
of some top-down design activity.
Therefore, the programmer might choose to solve disjoint parts of the problem directly in a
particular programming language and then combine these parts into a complete program. In
contrast to bottom-up design, a top-down design technique decomposes a problem into logical
subtasks, and each subtask is further decomposed until all the tasks are expressed in a
programming language. The advantages that accrue from a top-down design strategy include
not only the management of complexity but also an improved ability to test, validate, and
maintain the software that is ultimately produced.

2.5 ANALYSIS OF ALGORITHMS
The analysis of algorithms is a critically important issue in computer science. The data structures
that we will be discussing in this book (e.g. array, stack, queue) are introduced
not just because they are mathematically interesting but because we claim that they allow us to
develop more efficient algorithms for such tasks as insertion, deletion, searching, sorting, pattern
matching, and others.
Analysing an algorithm has come to mean predicting the resources that the algorithm
requires. Occasionally we are concerned with resources such as memory, communication bandwidth,
or logic gates, but most often it is computational time that we want to measure. Analysing even
a simple algorithm can be a challenge. Suppose we want to execute the statement x = x + 1 in a
C program. We must consider how much time will be required to execute the statement once and
how many times it will be executed. The product of these two numbers will be the total time
taken by the statement. The second number is called the frequency count, and it may vary from one
data set to another. It is impossible to determine the execution time of a statement exactly
unless we have knowledge about the machine structure, the machine cycle time, and the speed of
the translator. In this book we will therefore concentrate on developing only the frequency
count for the statements. In our analysis we want to find the order of magnitude of an algorithm.
This means that we determine only those statements which may have the greatest frequency count.
In the case of a sorting method, we look at both the best case, in which the input array is
already sorted, and the worst case, in which the input array is reverse sorted. The worst-case
running time of an algorithm is an upper bound on the running time for any input. The average
case is often roughly as bad as the worst case. Suppose we randomly choose n numbers and apply
insertion sort. If we work out the resulting average-case running time, it turns out to be a
quadratic function (for example, $an^2 + bn + c$, for constants a, b, and c) of the input size n,
just like the worst-case running time.
When we look at input sizes large enough to make only the order of growth of the running
time relevant, we are studying the asymptotic analysis of algorithms. That is, we are concerned
with how the running time of an algorithm increases with the size of the input in the limit, as the
size of the input increases without bound. In the next subsection we discuss the asymptotic
analysis of an algorithm.

2.5.1 Asymptotic Analysis


The most obvious way to measure the efficiency of an algorithm would be to run it using a
specific set of data and measure how much CPU time and memory space are needed to produce
a correct solution. However, this approach gives results for only one particular set of data
and is unable to predict how the algorithm would perform using a different set of data.
We therefore need a formal technique, called asymptotic analysis, that provides a guideline
allowing us to state that, for any arbitrary data set, one particular
method will probably be better than another.
Let us consider three parameters n, t, and s to represent the size of the problem, process-
ing time needed to get the solution, and total memory space required by the solution. For ex-
ample, in searching and sorting a list, n would most likely be the number of items in the list. The
relationship between n, t, and s can be given as
t = f(n),
s = g(n)
The function f(n) is called the time complexity or order of the algorithm and g(n) is called
the space complexity of the algorithm.
In a practical situation, such formulas are rarely used to analyse algorithms. First, they
are difficult to obtain because they rely on machine-dependent parameters that may not be
applicable to all cases. Second, we do not want to use f(n) and g(n) to compute the exact time or
space requirements for specific data cases. Instead we want to frame a guideline for comparing
and selecting algorithms for data sets of arbitrary size. We can obtain this type of information
by using asymptotic analysis, which is expressed using big-O notation.
The notation $f(n) = O[g(n)]$ (read as f of n equals big-O of g of n) has a precise mathematical
definition.
Definition: $f(n) = O[g(n)]$ implies that there exist constants $k$ and $n_0$ such that
$|f(n)| \le k\,g(n)$ for all $n \ge n_0$. $f(n)$ will normally represent the computation time of some
algorithm. The statement 'the computing time of an algorithm is $O[g(n)]$' means that the algorithm
takes no more than a constant times $g(n)$, where n is a parameter that characterises the inputs
and/or outputs.
The notation $O(1)$ means a computing time which is a constant. $O(n)$ is called linear,
$O(n^2)$ is called quadratic, $O(n^3)$ is called cubic, $O(2^n)$ is called exponential, and
$O(\log_2 n)$ is called order of $\log_2 n$.
Table 2.1 shows how the computing times grow with a constant equal to 1.

Table 2.1 Computing Functions


n      log2 n    n log2 n    n^2     n^3       2^n
1      0         0           1       1         2
2      1         2           4       8         4
4      2         8           16      64        16
8      3         24          64      512       256
16     4         64          256     4,096     65,536
32     5         160         1,024   32,768    4,294,967,296

Note that $O(n \log n)$ is better than $O(n^2)$ but not as good as $O(n)$. Similarly, if an algorithm
takes time $O(\log n)$ it is faster, for sufficiently large n, than if it had taken $O(n)$.

For reasonably large problems we always want to select algorithms of the lowest order
possible. If algorithm A is $O[f(n)]$ and B is $O[g(n)]$, then algorithm A is of lower order than B if
$f(n) < g(n)$ for all n greater than some constant K. For example, $O(n^2)$ is of lower order than $O(n^3)$
because $n^2 < n^3$ for all $n > 1$. Similarly, $O(n^3)$ is of lower order than $O(2^n)$ since $n^3 < 2^n$ for all $n > 9$.
Thus we would definitely want to select an $O(n^2)$ algorithm to solve a problem, instead of
an $O(n^3)$ or $O(2^n)$ algorithm, if such an $O(n^2)$ algorithm existed.
Fig. 2.11 shows the behaviour for algorithms of order $O(2^n)$, $O(n^3/2)$, $O(5n^2)$, and $O(100n)$.

This type of analysis provides the general guidelines we need and thus O notation is the
fundamental technique for describing the efficiency properties of algorithms.
One important system of description used throughout the book is the use of notation of
the form $O[f(n)]$ to describe the time or space requirements of running algorithms. If n is a
parameter that characterises the size of the input to a given algorithm, and if we say the algorithm
runs to completion in $O[f(n)]$ steps, we mean that the actual number of steps executed is no
more than a constant times $f(n)$ for sufficiently large n.

2.5.2 Space Complexity


In space complexity, we need to develop a formula that relates n, the problem size, to s, the
amount of memory space needed to solve the problem. For example, if an algorithm requires a
table large enough to hold all n items, the space complexity is of the order $O(n)$. The same kind
of comparison applies to running time: for inputs of size $n < 20$, the program with running time
$5n^3$ will be faster than the one with running time $100n^2$. So if the program is to be run mainly
on inputs of small size, we would prefer the program whose running time is $O(n^3)$. When n is
large, that is, when the size of the input increases, the $O(n^3)$ program will take significantly
more time than the $O(n^2)$ program. Both the sequential and the binary search (to be discussed in
the sorting and searching chapter) methods require a table
large enough to hold all n items and are therefore $O(n)$ in terms of space complexity.
Memory space and computing time can often be traded against each other. We can frequently
reduce space requirements by increasing processing time or, conversely, reduce processing time by
making more memory available. This situation is referred to as the space-time trade-off. For
example, when we store the elements of an n x n array we need $n^2$ memory locations. If most of the
array elements are zero, then a reduction of space is possible in exchange for the additional time
needed to store and retrieve elements in a reduced representation. It is always a challenge for
computer scientists to find algorithms that solve given types of problems with the lowest possible
complexity class for time and space. So in this book, the discussion and comparison of algorithms
for data structures and of the space consumed by data representations often comments on the
complexity class $O[f(n)]$ characterizing the time and space resources required.

2.6 STRUCTURED PROGRAMMING


The logic of any program can be developed entirely in terms of three types of logic structures:
sequence, selection, and iteration. These logic structures are called the elementary building blocks
and have one key feature in common, namely, a single entry point and a single exit point. A
structured program is one consisting entirely of these elementary building blocks. These struc-
tures are sufficient to express any desired logic.
The theoretical framework for structured programming was initially presented at an
International Colloquium in Israel in 1964. The authors, Bohm and Jacopini, presented their
work in Italian and were essentially ignored in the United States. The English translation of their
paper, published in 1966 in Communications of the ACM, did not gain a great deal of attention
because of its theoretical nature. In 1968, a letter was sent to the editor by Edsger W. Dijkstra of
the Netherlands. In this letter, he wrote that 'the quality of programmers is a decreasing function
of the density of GOTO statements in the programs they produce.' He further suggested that 'the GOTO
statement should be abolished from all high level programming languages.... it is an invitation
to make a mess of one's program.'
Bohm and Jacopini had proved theoretically that a program can be written without
GOTO statements. Harlan Mills and F. Terry Baker, two Americans, demonstrated the practical
aspects of the then revolutionary technique.
Structured programming techniques began to appear with greater frequency in the middle
and late 1970s. The development was spurred considerably by the widespread introduction of
Edward Yourdon's seminars on the subject. Today it is safe to say that virtually all practitioners
at least acknowledge the merits of the discipline, and most practice it exclusively.
Three elementary building blocks are depicted in Fig. 2.13. The sequence structure for-
mally specifies that program statements are executed sequentially, in the order in which they
appear. The two blocks, x, and y, may denote anything from single statements to complete
programs.
Selection, also known as IF THEN ELSE, is the choice between two actions. If the condi-
tion is satisfied, block x is executed. If it is not satisfied, block y is executed. The condition is a
single entry point to the structures, and both paths meet in a single exit point.
Iteration (also known as DO WHILE) calls for repeated execution of code while a condi-
tion is true. The condition is tested. If it is true, block x is executed; if it is false, the next sequence
of statements will be executed.
Let us look again at the significance of the building block concept of Bohm and Jacopini.
A block can consist of only one statement, or it can consist of a sequence of single statements, or
it can consist of other blocks that in turn contain other blocks as a part of their structures. Such
a structure is called a nested control structure. A nested structure is shown in Fig. 2.12. A program
or module can be built up in this way. The program itself can then be viewed as a single structure.
A program that has only one entry point and one exit point, and in which every structure lies
on a path from the entry to the exit, is called a proper program.

Fig. 2.12 Nested control structures

Structure relinquishes control to the next sequential statement. Again, there is one entry
point and one exit point from the structure.

Fig. 2.13 The building blocks of structural programming

Consider again Fig. 2.12. The entry point to Fig. 2.12 is a selection structure to evaluate
the condition. If the condition is false, an iteration structure is entered. If the condition is true, a
sequence structure is executed. Both the iteration and sequence structure meet at a single point,
which in turn becomes the exit point for the initial selection structure.
We have acquired some familiarity with the basic patterns, which are sufficient for any
program. Certain other combinations have proven to be especially useful. One is the combination
of simple sequence and do-while, known as the do-until control structure. Even though the
nested if-then-else structure can handle any number of conditions, it is difficult to
draw such a flowchart or write such pseudocode. The case structure is a generalization of the nested
if-then-else pattern to a large number of possible conditions. These two structures are shown in
Fig. 2.14.

Fig. 2.14 The logic of case structure and logic of do-until

In this section, we have used ANSI standard flowcharting symbols as a means
of visualizing the program logic set up within the structured programming control structures.
Some feel that structured programming is self-explanatory and need not be accompanied by
flowcharts. These persons usually point out that a flowchart often fails to represent the current
status of a program. To many, the use of pseudocode is a viable alternative to flowcharting. It
permits the programmer to express required program logic unencumbered by programming language
rules and constraints.
After a structured program has been designed, it must be expressed in a programming
language which supports the structured constructs. Statements directly analogous to the
control structures we have described must therefore be available in the chosen language. The C
language supports these constructs. In the next chapter, the array as a data structure will be
introduced.

EXERCISES
1. The function $e^x$ can be approximated by using the following formula:
$e^x = 1 + x + x^2/2! + x^3/3! + \cdots + x^k/k!$
Write an algorithm for finding $e^x$, where x is given as input.
2. What is the smallest value of n such that an algorithm whose running time is $100n^2$ runs
faster than an algorithm whose running time is $2^n$ on the same machine?
3. The most common computing times for algorithms are:
$O(1)$
$O(\log_2 n)$
$O(n)$
$O(n \log_2 n)$
$O(n^2)$
$O(n^3)$
$O(2^n)$
Draw a graph for their time complexities in the range 1< n <128 and compare their rate of
growth.
4. Compare the two functions $n^2$ and $2^n/4$ for various values of n. Determine when the second
becomes larger than the first.
5. Given n, a positive integer, determine if n is the sum of all of its divisors; that is, if n is the
sum of all t such that $1 \le t < n$ and t divides n. Draw a flowchart for this problem.
6. Write pseudocode for the insertion sort to sort into nonincreasing instead of nondecreasing
order.
7. What are the main advantages of a decision table over a flowchart?
8. Discuss the merits and demerits of top-down and bottom-up approaches to algorithm de-
sign.
9. Imagine that we have an algorithm p whose time complexity we have analyzed and found
to be O [log2(log2n)]. What would be the position of that complexity function in the ordered
list of functions given in Exercise 3?
10. Can we write an $O(1)$ algorithm to determine n! for 1< n <15?
3
ARRAYS

In this chapter we will be concerned with non-primitive data structures that are linear. A num-
ber of possible storage representations for these linear structures are available. We will concen-
trate here on array structures only. Others will be discussed in the succeeding chapters.

3.1 LINEAR ARRAYS


An array is an ordered set that consists of a fixed number of objects of the same type. No deletion
or addition operations are performed on arrays. At best, elements can be changed to a
value that represents an element to be ignored. For example, setting an element of an array to
zero may be taken to mean that it has been deleted. The storage representation of an array
structure is based on sequential allocation.
An array can be considered as the computer's set of pigeon-holes. Each hole, or element, has the
same attributes (i.e. it can hold the same amount of like information, and has the same name).
An array can be used to hold a table of information — for example, a set of parameters — or a
series of similar results.
The simplest data structure that makes use of computed addresses to locate its elements
is the one-dimensional array, which we have called a vector. Normally, a number of (contigu-
ous) memory locations are sequentially allocated to the vector. Assuming that each element
requires one word of memory, an n-element vector will occupy n consecutive words in memory.
A vector size is fixed and, therefore, requires fixed number of storage locations.
Arrays occur in our daily life, where they are called tables. For example,
(a) A table of train departure times is an array. If we call the array DEP, we can denote the
departure time of the i-th train by $DEP_i$ (this terminology is borrowed from mathematics).
(b) A table of the number of days in each month is an array. If we call this array NDAYS,
we can denote by $NDAYS_{May}$ the number of days in May.

3.2 ARRAYS IN C


Arrays are a data type that is used to represent a large number of homogeneous values.
C language allows both single-dimensional arrays as well as multidimensional arrays. It is an
important property of an array that every element has the same type. Arrays may be of storage
class automatic, external, or static, but not register. The general format for a single-dimensional
array is
type_specifier variable_name[size];

For example, to create an array called 'A' with ten integer elements, we can declare it as follows.
int A [10];
The indexing of array elements always starts at 0. Thus the above array reserves memory
locations that are referred to by A[0],A[1] ,...,A[9] . This is one of the characteristics of the
C language.
A typical array declaration allocates memory starting from a base address: to store the
elements of the array, the compiler assigns an appropriate amount of memory starting from that
address. The array name is in effect a pointer constant to this base address. The elements
of an array are accessed using a subscript, also called an index. We can write A[i] to access an
element of the array. More generally, we may write A[expr], where expr is an integral expression,
to access an element of the array. The value of the subscript must lie in the range 0 to size-1
if the declaration of the array is of the form A[size]. C does not check subscripts at run time:
a subscript value outside this range results in undefined behaviour, which may corrupt other data
or cause the program to fail. This is a common and serious programming error which must
always be avoided.
Arrays of all types are possible, including arrays of arrays. Strings are just arrays of characters,
but they are sufficiently important to be treated separately. We may have an array of any
type: int, char, float, double, another array, a structure, a union, or a pointer to any
type. The size of the array must be a constant expression. The following example will illustrate the idea:
#define SIZE 25
int A[SIZE + 1];
To illustrate the concept of an array, let us write a program that fills an array, prints out
values, and sums the elements of the array.
Example 3.1
#include <stdio.h>
#define N 5                      /* Size of array */
int main(void)
{
    int A[N];                    /* Space for A[0], ..., A[4] is allocated */
    int i, sum = 0;
    for (i = 0; i < N; i++)      /* Initialize the array */
        A[i] = i * i;
    for (i = 0; i < N; i++)      /* Display array elements */
        printf(" A[%d]=%d", i, A[i]);
    for (i = 0; i < N; ++i)      /* Find total */
        sum += A[i];
    printf("\n sum=%d\n\n", sum);
    return 0;
}
Output: The output of this program is
 A[0]=0 A[1]=1 A[2]=4 A[3]=9 A[4]=16
 sum=30

Consider an election contested by four candidates. Let us write a program to read the
ballots and count the votes cast for each candidate. Assume that the candidates are numbered
1 to 4 and each ballot is presented as a line of input containing one of these numbers. Assume
that there are N ballot papers.
Example 3.2

#include <stdio.h>
#define N 50                     /* Number of ballot papers */
int main(void)
{
    int ballot;
    int count0 = 0, count1 = 0;
    int count2 = 0, count3 = 0;
    int i, invalid = 0;
    for (i = 1; i <= N; i++)
    {
        if (scanf("%d", &ballot) != 1)   /* Stop at end of input */
            break;
        switch (ballot)
        {
            case 1: count0++;
                    break;
            case 2: count1++;
                    break;
            case 3: count2++;
                    break;
            case 4: count3++;
                    break;
            default: invalid++;
        }
    }
    printf("Candidate-1 %d\n", count0);
    printf("Candidate-2 %d\n", count1);
    printf("Candidate-3 %d\n", count2);
    printf("Candidate-4 %d\n", count3);
    printf("Invalid votes %d\n", invalid);
    return 0;
}
You might feel that this solution is rather clumsy. Imagine how much worse it would be
if the program were modified to handle ten or fifty candidates. We observe, however, that the
variables count0, count1, count2, count3, and so on are all used in exactly the same way,
which gives us a clue to a better solution. Let us rename the variable count0 as count[0], count1
as count[1], and so on. The variables now form an array with the single name count. Thus
we can rewrite the program as in Example 3.3.
Example 3.3: The modified program is listed below.

#include <stdio.h>
#define N 50                     /* Number of ballot papers */
#define NCANDIDATES 4
int main(void)
{
    int i, count[NCANDIDATES], ballot, invalid = 0;
    for (i = 0; i < NCANDIDATES; i++)    /* Initialize all vote counts */
        count[i] = 0;
    for (i = 1; i <= N; i++)
    {
        if (scanf("%d", &ballot) != 1)   /* Read a ballot; stop at end of input */
            break;
        /* Add 1 to the chosen candidate's vote count unless
           ballot is invalid */
        if (ballot > 0 && ballot <= NCANDIDATES)
            count[ballot - 1]++;
        else
            invalid++;
    }
    for (i = 1; i <= NCANDIDATES; i++)
        printf("Candidate %d obtained %d\n", i, count[i - 1]);
    printf("Invalid vote count: %d\n", invalid);
    return 0;
}
Note that the refinement of 'Initialize all vote counts to 0' is a loop that assigns 0 to each
element of count in turn. The use of NCANDIDATES makes the program more flexible. If the
program had to be modified to handle more contesting candidates, then only the #define statement
needs to be altered. This is also true for N if the number of ballot papers needs to be
modified. The above program is much more elegant and, moreover, requires no other modification if
the number of candidates is changed. It also highlights the utility of arrays instead of using a
number of variables.

3.3 INITIALIZING ARRAYS


A definition of an array reserves one or more cells in internal memory and associates a name
with the cell or cells that the programmer can use to access the cells. An array, like a variable,
can be initialized at definition time. A static or extern array can only be initialized in the definition.
The initial values are enclosed in braces and separated by commas. For example, the following definition
static int marks[4]={55,67,92,45};
creates and initializes the array marks as shown in Fig. 3.1 below.

marks 55 67 92 45

[0] [1] [2] [3]


Fig. 3.1 Array marks with values

Note that we can ignore the integer within the brackets if we define and initialize the
static or extern arrays. The compiler will allocate exactly as many cells as are needed to store the
initial values. Thus the above example can also be written as
static int marks[ ]={55,67,92,45};
If, in a definition, we specify fewer initial values than the number of cells, the cells begin-
ning with the first will be initialized with the given values. The rest of the cells will be initialized
to zero. It is an error to supply more initial values than there are cells. For example, the
following statement will create and initialize values as shown in Fig. 3.2.

static float cost[5] = {4.50, 7.90, 3.25};

cost 4.50 7.90 3.25 0 0

[0] [1] [2] [3] [4]


Fig. 3.2 An array cost with float values

A set of ten readings is stored in an array Number. Write a program to check whether
each reading is positive or negative. Add all positive readings and store the result in the
variable Sum.
Example 3.4

#include <stdio.h>
int main(void)
{
    static int Number[10] = {50, -30, 20, 70, -25, 19, 18, 78,
                             -225, 719};
    int Sum = 0, i, j = 0;       /* j counts the positive readings */
    for (i = 0; i < 10; i++)
        if (Number[i] > 0)
        {
            Sum += Number[i];
            j++;
        }
    printf("Total positive numbers: %d \t Sum: %d",
           j, Sum);
    return 0;
}

The symbolic name 'Number' here denotes the name of the array. The array is initialized
with ten values. The 'for' loop contains an 'if' statement to test for the positive values in the
array. Finally the 'Sum' will hold result and j will store the total number of positive values in the
given array.

3.4 INSERTION AND DELETION


These two operations are often required in the applications of array processing. Insertion refers
to the operation of adding another element to a linear array, and deletion, on the other hand,
performs the operation of removing one of the elements from the array. This section deals with
C programs for insertion and deletion operations.
Inserting an element at the end of the linear array can be easily performed, provided the
memory space is available to accommodate the additional element. If we want to insert an
element in the middle of the array, then about half of the elements must be moved
rightwards to new locations to accommodate the new element and keep the order of the other
elements. Similarly, if we want to delete the middle element of the linear array, then each
subsequent element must be moved one location to the left of its previous position. The
deletion operation is simpler if we want to delete the last element of the array, since nothing
has to be moved. Insertion and deletion are most expensive if we want to add or delete the first
element of the array, since every other element must be moved. Thus we conclude that if many
deletions or insertions are to be made in a collection
of data elements, then a linear array may not be the most efficient way of storing the data.
We now present two example programs in Example 3.5 and Example 3.6 for inserting
and deleting an item 'Item' in an array 'Linear'.
Example 3.5

#include <stdio.h>
/* Inserting an item at the Kth position of an array, where K<=N */
#define N 100                    /* Initial number of elements */
#define P (N*2)                  /* Capacity of the array */
int main(void)
{
    int Linear[P], K, Item;
    int n = N;                   /* Current number of elements */
    int i;                       /* Loop index */
    /* Read elements in an array */
    for (i = 0; i < n; i++)
        scanf("%d", &Linear[i]);
    printf("\n Enter the position and item \n\n");
    scanf("%d %d", &K, &Item);
    if (K < 0 || K > n)          /* Reject a position outside the array */
        return 1;
    /* Insert an element in the array */
    i = n - 1;
    while (i >= K)
    {
        Linear[i + 1] = Linear[i];   /* Shift one place to the right */
        i--;
    }
    Linear[K] = Item;
    n++;
    /* Print the array */
    for (i = 0; i < n; i++)
        printf("\n Linear[%d] = %d", i, Linear[i]);
    return 0;
}

Example 3.6

/* Example program for deleting an element from an array */
#include <stdio.h>
#define N 100                    /* Initial number of elements */
#define P (2*N)                  /* Capacity of the array */
int main(void)
{
    int Linear[P], Item, i, K;
    int n = N;                   /* Current number of elements */
    /* Read elements in an array */
    for (i = 0; i < n; i++)
        scanf("%d", &Linear[i]);
    printf("\n Enter the position \n\n");
    scanf("%d", &K);
    if (K < 0 || K >= n)         /* Reject a position outside the array */
        return 1;
    /* Delete an element from the array */
    Item = Linear[K];            /* Item keeps the deleted value */
    for (i = K; i < n - 1; i++)
        Linear[i] = Linear[i + 1];   /* Shift one place to the left */
    n--;
    printf("\n Deleted item = %d", Item);
    for (i = 0; i < n; i++)
        printf("\n Linear[%d] = %d", i, Linear[i]);
    return 0;
}

Example 3.7: Suppose an array of 10 integer values is given. Find the content of the array on
execution of the following program.

#include <stdio.h>
int main(void)
{
    int A[10], i;
    for (i = 0; i < 10; i++)
        scanf("%d", &A[i]);
    for (i = 0; i < 9; i++)
        A[i + 1] = A[i];
    for (i = 0; i < 10; i++)
        printf("\n A[%d] = %d", i, A[i]);
    return 0;
}

The arrays discussed so far are called one-dimensional arrays, since each element in the array is
referenced by a single subscript. A vector in mathematics can be represented by a one-dimensional
array, and a matrix by a two-dimensional array. Most programming languages allow
two-dimensional and three-dimensional arrays, whose elements are referenced by two and three
subscripts respectively. In C we can also use arrays with a higher number of dimensions. The
general form of a two-dimensional array declaration in C is as follows:
type_specifier variable_name[Rows][Columns];
For example, the following is a two-dimensional array A of integer type with m rows and
n columns.
int A[m][n];
It is assumed here that m and n have already been defined with the #define compiler directive.
Each element is specified by a pair of integers (such as j, k) called subscripts, with the
property that
0 <= j <= m - 1 and 0 <= k <= n - 1
There is a standard way of representing a m x n two-dimensional array A where the
elements of A form a rectangular array with m number of rows and n number of columns.

The element A[j] [k] appears in row j and column k. Fig. 3.3 shows the array A which has
3 rows and 4 columns.

                       Columns

              0         1         2         3
Rows    0   A[0][0]   A[0][1]   A[0][2]   A[0][3]
        1   A[1][0]   A[1][1]   A[1][2]   A[1][3]
        2   A[2][0]   A[2][1]   A[2][2]   A[2][3]

Fig. 3.3 Two-dimensional array



We illustrate the procedure for reading and writing a two-dimensional array with the
following example program given in Example 3.8.
Example 3.8

#include <stdio.h>
#define ROW 5
#define COLUMN 6

int main(void)
{
    int A[ROW][COLUMN], i, j;
    /* Read the array A */
    for (i = 0; i < ROW; i++)
        for (j = 0; j < COLUMN; j++)
            scanf("%d", &A[i][j]);
    /* Print the array elements */
    for (i = 0; i < ROW; i++)
    {
        for (j = 0; j < COLUMN; j++)
            printf(" A[%d][%d] = %d", i, j, A[i][j]);
        printf("\n");
    }
    return 0;
}
As mentioned earlier, C allows arrays with more than two dimensions. For
example, a three-dimensional integer array is defined as
int x[3][2][5];
Because subscripts start from 0, the array element x[1][0][3] lies in the second plane, the
first row, and the fourth column. For example, an array of marks of a student in a particular
class might be indexed as
int A[class_no][roll_no][marks];
If we want to refer to an element of the array A, three subscripts are required. The number
of elements in an array is the product of the ranges of all its dimensions. The array x
contains 3 x 2 x 5 = 30 elements. In general, if we multiply together the numbers in the brackets
of the definition of an array, we obtain the total number of memory locations necessary
for storing all the elements.
In general, an n-dimensional array A is denoted by A[S1][S2]...[Sn] and its subscript limits by
0 <= S1 <= U1, 0 <= S2 <= U2, ..., 0 <= Sn <= Un, where U1, U2, and so on indicate the upper
bounds on its subscripts. The array will be stored in memory in a sequence of memory locations.
Specifically, a programming language will store the array A either in row-major order or in
column-major order. In row-major order, the last subscript changes first, the next-to-last
subscript second, and so on. In column-major order, the first subscript varies most rapidly, then
the second subscript, and so on. Suppose that B is an n-dimensional array declared by:
int B[I1][I2] ... [In];
where I1, I2, and so on are declared as the ranges of the first dimension, second dimension, and so
on. The base(B) is the address of the first element of the B array, that is, B[0][0]...[0]. Assume
that the array B is stored in row-major order and each element of B reserves m bytes. Thus the
address of B[i1][i2]...[in] is written as follows:
base(B) + m * [ i1*I2*I3*...*In + i2*I3*...*In + ... + i(n-1)*In + in ]
The number of elements in the array B is the product I1 x I2 x ... x In. For example, an array X
with seven subscripts is declared as
int X[7][15][3][5][8][2][2];
The number of elements array X contains is 7x15x3x5x8x2x2 = 50,400. If an integer reserves
two bytes (the size of an integer depends on the compiler), the X array occupies
50,400 x 2 = 100,800 bytes.
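The row-major address calculation can be sketched as a short function. This is an illustration only: the name row_major_offset and the use of two parallel arrays for the ranges and the subscripts are assumptions made here, not part of the text. The function returns the offset counted in elements; multiplying it by the element size m and adding base(B) gives the address.

```c
#include <stddef.h>

/* Offset, in elements, of B[idx[0]][idx[1]]...[idx[n-1]] from
   B[0][0]...[0] for an array whose dimensions have the ranges
   dim[0], dim[1], ..., dim[n-1], stored in row-major order.
   Horner's rule evaluates
   idx[0]*dim[1]*...*dim[n-1] + ... + idx[n-2]*dim[n-1] + idx[n-1]
   without forming the long products explicitly. */
size_t row_major_offset(const size_t dim[], const size_t idx[], size_t n)
{
    size_t k, offset = 0;
    for (k = 0; k < n; k++)
        offset = offset * dim[k] + idx[k];
    return offset;
}
```

For an array declared as int X[3][2][5], the element X[1][0][3] has offset 1*2*5 + 0*5 + 3 = 13, so it lies 13 * sizeof(int) bytes after X[0][0][0].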

3.6 ROW-MAJOR AND COLUMN-MAJOR ORDER


Memory storage locations are not arranged as a rectangular array of elements with rows and
columns. Instead, they are arranged in a linear sequence beginning with location 1, and then
2, 3, ... . For this reason, there must be some manipulation behind the scenes when a program
requests the entry in the 3rd row and 4th column of a two-dimensional array. Essentially, the
desired coordinates must be transformed into an address in this linear sequence of memory
locations. The nature of the transformation depends upon how the programming language
stores the two-dimensional array. It will store the array either (i) column by column or
(ii) row by row. The first is called column-major order, whereas the second is called
row-major order. Let A be a two-dimensional array having 3 rows and 4 columns. Then the 12
entries of the array are stored as indicated in Fig. 3.4.

         1st row                    2nd row                    3rd row

A   (1,1) (1,2) (1,3) (1,4)   (2,1) (2,2) (2,3) (2,4)   (3,1) (3,2) (3,3) (3,4)
      1     2     3     4       5     6     7     8       9    10    11    12
Fig. 3.4 Linear storage of data in row-major order

According to this arrangement, the first row takes up the first four locations in the
list allocated for the array, the second row the second four locations, and so on. This
arrangement is called row-major order. The element in the 3rd row and 2nd column will, in fact, be
located in the 10th position within the list. In this two-dimensional array, the entry in the ith
row and jth column is offset by 4 * (i-1) + (j-1) locations from the first entry and therefore
occupies the
[4 * (i-1) + j]th
position; for the 3rd row and 2nd column this gives 4 x 2 + 2 = 10. (C counts subscripts from 0,
so in C the same entry is written A[i-1][j-1].) In general, if N is the number of columns in the
array, then the entry in the ith row and jth column is offset by N * (i-1) + (j-1) locations from
the first entry and occupies the
[N * (i-1) + j]th
position.
Similarly, Fig. 3.5 indicates the approach for storing elements in the column-major order.

1st column 2nd column 3rd column 4th column

A (1,1) (2,1) (3,1) (1,2) (2,2) (3,2) (1,3) (2,3) (3,3) (1,4) (2,4) (3,4)

1 2 3 4 5 6 7 8 9 10 11 12
Fig. 3.5 Linear storage of data in column-major order

To access the entry in the ith row and jth column of a two-dimensional array stored in
column-major order, the transformation N1 x (j-1) + i is required, where N1 represents the
number of rows in the array. In Fig. 3.5, for example, the entry (1,2) is in position
3 x (2-1) + 1 = 4.
Let us consider a problem. An array A has 25 rows and 4 columns. Suppose the
two-dimensional array is stored in column-major order.
Then the position of the entry in the 3rd row and 2nd column may be computed as
25 x (2-1) + 3 = 25 + 3 = 28
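The two transformations can be checked against the positions shown in Fig. 3.4 and Fig. 3.5 with a small sketch. The function names below are assumptions introduced for illustration; rows, columns, and positions are all counted from 1, as in the figures.

```c
/* Position (counted from 1) of the entry in row i and column j of a
   two-dimensional array, with rows and columns also counted from 1. */

/* Row-major order: each row holds ncols entries. */
int row_major_position(int i, int j, int ncols)
{
    return ncols * (i - 1) + j;
}

/* Column-major order: each column holds nrows entries. */
int column_major_position(int i, int j, int nrows)
{
    return nrows * (j - 1) + i;
}
```

Subtracting 1 from either result gives the offset from the first entry, which is the quantity needed when the base address and element size are used to compute a memory address.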
Note that C stores two-dimensional arrays in row-major order. An n-dimensional
(P1 x P2 x ... x Pn) array C is a collection of P1 x P2 x ... x Pn data elements in which each
element is specified by a list of n integers, say k1, k2, ..., kn, called subscripts, with the
following properties:
0 <= k1 < P1, 0 <= k2 < P2, ..., 0 <= kn < Pn
The element of C with subscripts k1, k2, ..., kn will be denoted by
C_k1k2...kn or C[k1][k2]...[kn]
The array will be stored in memory in consecutive locations. The programming language
will store the array C either in row-major order or in column-major order. In row-major order
the elements are listed in such a way that the last subscript varies first, the next-to-last subscript
varies second, and so on. In the column-major order, the elements are listed so that the first
subscript varies first, the second subscript second, and so on. Suppose C is a three-dimensional
2x4x3 array. Then C array contains 2x4x3=24 elements. Figs. 3.6(a) and Fig. 3.6(b) indicate the
arrangements in column-major order and row-major order, respectively.

C Subscripts              C Subscripts

  (1,1,1)                   (1,1,1)
  (2,1,1)                   (1,1,2)
  (1,2,1)                   (1,1,3)
  (2,2,1)                   (1,2,1)
     .                         .
     .                         .
     .                         .
  (1,4,3)                   (2,4,2)
  (2,4,3)                   (2,4,3)

(a) Column-major order    (b) Row-major order


Fig. 3.6 Arrangement of C array
Consider an array A to be an n x n square matrix. The value n is defined as a constant. We
need to write a program which will do the following:

(a) Find the number of non-zero elements in A.


(b) Find the sum of the elements above the diagonal.
(c) Find the product of the diagonal elements.
A program code in C is given in Example 3.9.
Example 3.9

#include <stdio.h>

#define n 3

int main(void)
{
    int A[n][n], i, j;
    int Number = 0, Sum = 0, Product = 1;
    /* Read elements of the array A */
    for (i = 0; i < n; i++)
        for (j = 0; j < n; j++)
            scanf("%d", &A[i][j]);
    /* Write the array A */
    for (i = 0; i < n; i++)
    {
        for (j = 0; j < n; j++)
            printf(" A[%d][%d] = %d", i, j, A[i][j]);
        printf("\n");
    }
    /* Search for non-zero elements */
    for (i = 0; i < n; i++)
        for (j = 0; j < n; j++)
            if (A[i][j])
                Number++;
    /* Sum of elements above diagonal */
    for (j = 1; j < n; j++)
        for (i = 0; i < j; i++)
            Sum += A[i][j];
    /* Product of diagonal elements */
    for (i = 0; i < n; i++)
        Product *= A[i][i];
    printf("\n\n Number = %d \n\n Sum = %d \n\n Product = %d \n",
           Number, Sum, Product);
    return 0;
}

Sample Input:
1 2 3
A[3] [3]= 4 0 6
7 8 9
Output:
Number = 8
Sum = 11
Product = 0
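The three calculations can also be separated into functions, which makes the sample result above easy to verify. This is a sketch only; the function names and the constant ORDER are assumptions introduced here.

```c
#define ORDER 3   /* Order of the square matrix (assumed for this sketch) */

/* Number of non-zero elements in a */
int count_nonzero(int a[ORDER][ORDER])
{
    int i, j, number = 0;
    for (i = 0; i < ORDER; i++)
        for (j = 0; j < ORDER; j++)
            if (a[i][j] != 0)
                number++;
    return number;
}

/* Sum of the elements above the main diagonal (column > row) */
int sum_above_diagonal(int a[ORDER][ORDER])
{
    int i, j, sum = 0;
    for (i = 0; i < ORDER; i++)
        for (j = i + 1; j < ORDER; j++)
            sum += a[i][j];
    return sum;
}

/* Product of the elements on the main diagonal */
int diagonal_product(int a[ORDER][ORDER])
{
    int i, product = 1;
    for (i = 0; i < ORDER; i++)
        product *= a[i][i];
    return product;
}
```

For the sample matrix, the elements above the diagonal are 2, 3, and 6, which add up to 11, and the diagonal elements 1, 0, and 9 have the product 0.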
The program in Example 3.10 prints the elements of a two-dimensional array in row-
major and column-major order.
Example 3.10

#include <stdio.h>

int main(void)
{
    int Temp[5][4], i, j;
    /* Read the array Temp */
    for (i = 0; i < 5; i++)
        for (j = 0; j < 4; j++)
            scanf("%d", &Temp[i][j]);
    /* Print the array Temp in row-major order */
    for (i = 0; i < 5; i++)
        for (j = 0; j < 4; j++)
            printf("%d \n", Temp[i][j]);
    /* Print the array Temp in column-major order */
    for (j = 0; j < 4; j++)
        for (i = 0; i < 5; i++)
            printf("%d \n", Temp[i][j]);
    return 0;
}
A program to illustrate the idea of a lower-triangular matrix and a tridiagonal matrix is given
in Example 3.11. A square matrix is lower-triangular if all of its non-zero elements lie on or
below the main diagonal, that is, if every element above the diagonal is zero. In a tridiagonal
matrix, all elements of the square matrix other than those on the main diagonal and on the
diagonals immediately above and below it are zero. For example, an n-square tridiagonal array
B has n elements on the diagonal, n-1 elements above, and n-1 elements below the diagonal.
Thus B contains at most 3n-2 non-zero elements.

Example 3.11

/* This program tests the input matrix to find whether it is a
   lower triangular matrix or a tridiagonal matrix */
#include <stdio.h>

int main(void)
{
    int i, j, n, p, mat[25];   /* Matrix stored row by row; order at most 5 */
    int lower = 1, tri = 1;    /* Both properties hold until disproved */

    printf("\n\n Give the order of the matrix: ");
    scanf("%d", &n);
    if (n < 1 || n > 5)
    {
        printf("\n The order must lie between 1 and 5 \n");
        return 1;
    }
    printf("\n Enter the elements of the matrix: \n");
    p = n * n;
    for (i = 0; i < p; i++)
        scanf("%d", &mat[i]);
    /* The element in row i and column j is stored in mat[i * n + j] */
    for (i = 0; i < n; i++)
        for (j = 0; j < n; j++)
        {
            if (mat[i * n + j] == 0)
                continue;
            if (j > i)                    /* Non-zero above the diagonal */
                lower = 0;
            if (j > i + 1 || i > j + 1)   /* Non-zero outside the three diagonals */
                tri = 0;
        }
    if (lower)
        printf("\n It is a lower triangular matrix \n");
    else
        printf("\n\n It is not a lower triangular matrix \n");
    if (tri)
        printf("\n It is a tridiagonal matrix \n");
    else
        printf("\n\n It is not a tridiagonal matrix \n");
    return 0;
}
We now present more examples on arrays.
Example 3.12: Given a decimal number, find its binary equivalent and the number of 1's in
the binary number.

/* Decimal to binary conversion and */
/* counting of 1's in the binary number */
#include <stdio.h>
#define SIZE 20

int main(void)
{
    int result[SIZE], num, temp, count, tag = SIZE - 1;
    printf("\n\n Enter the decimal number:");
    scanf("%d", &num);
    /* Find the binary number */
    for (count = 0; count < SIZE; ++count)
    {
        temp = num / 2;
        result[count] = num % 2;
        num = temp;
        if (temp == 0)
        {
            tag = count;   /* Index of the most significant digit */
            break;
        }
    }
    /* Print the result */
    printf("\n The binary number is :");
    temp = 0;   /* temp now counts the 1's */
    for (count = tag; count >= 0; --count)
    {
        printf("%d", result[count]);
        if (result[count] == 1)
            temp += 1;
    }
    printf("\n\n There are %d 1's in the binary number ", temp);
    return 0;
}
We now explain the above program through the following tables. Assume that the input
number is 9.

Table 3.1

num    count    temp = num/2    result[count] = num%2    tag

 9       0           4                    1
 4       1           2                    0
 2       2           1                    0
 1       3           0                    1                3

Table 3.2

count    result[count]    result[count] == 1    temp = temp + 1

  3            1                 Yes              0 + 1 = 1
  2            0                 No               1
  1            0                 No               1
  0            1                 Yes              1 + 1 = 2
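The steps traced in the two tables can be sketched as a pair of functions. The names to_binary and count_ones, and the convention of storing the least significant digit first, are assumptions made for this illustration.

```c
/* Store the binary digits of num (num >= 0) in result[], least
   significant digit first, and return how many digits were stored
   (at most size). */
int to_binary(int num, int result[], int size)
{
    int count = 0;
    do {
        result[count++] = num % 2;   /* Remainder is the next digit */
        num = num / 2;               /* Quotient is processed next */
    } while (num != 0 && count < size);
    return count;
}

/* Count the 1's among the first n digits of result[] */
int count_ones(const int result[], int n)
{
    int i, ones = 0;
    for (i = 0; i < n; i++)
        if (result[i] == 1)
            ones++;
    return ones;
}
```

For the input 9 the digits are stored as 1, 0, 0, 1, which read from the last stored digit to the first give 1001, and the count of 1's is 2, in agreement with the tables.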

Example 3.13: Write a program to generate pascal triangle for a given number of rows. The
output is given in the following form:
1
1 1
1 2 1
1 3 3 1
1 4 6 4 1

/* Generate Pascal triangle */

#include <stdio.h>

int main(void)
{
    int h, t[40][40] = {0}, i, j;   /* Unused entries stay 0 */
    printf("Enter the height of the Pascal triangle:");
    scanf("%d", &h);
    t[0][0] = 1;
    printf("%6d", t[0][0]);
    printf("\n\n");
    for (i = 1; i < h; i++)
    {
        t[i][0] = 1;
        printf("%6d", t[i][0]);
        for (j = 1; j <= i; j++)
        {
            t[i][j] = t[i-1][j] + t[i-1][j-1];
            printf("%6d", t[i][j]);
        }
        printf("\n\n");
    }
    return 0;
}
This program generates a Pascal triangle of height h, given as input. The triangle here is
a right-angled triangle. The first column consists of number 1 and the rightmost column also
has a number 1. The intermediate rows and columns consist of numbers whose values are de-
termined by the following formula:
t [i] [j] = t [i-1] [j] + t [i-1] [j-1]
The first row has only one number, that is, 1, so it is printed separately. The other rows
are printed in a nested for loop. The variables i and j start from 1 since 'for loop' prints output
from second row and other rows are generated using the formula.
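The formula t[i][j] = t[i-1][j] + t[i-1][j-1] can be exercised separately from the printing. The sketch below is an illustration under stated assumptions: the function name pascal and the limit MAXH are introduced here, and MAXH is kept at 30 so that every entry fits in an int of at least 32 bits.

```c
#define MAXH 30   /* Largest height handled (assumed limit for this sketch) */

/* Fill rows 0..h-1 of t[][] with Pascal's triangle, for 1 <= h <= MAXH.
   Entries to the right of the diagonal are set to 0 so that t[i-1][j]
   is defined even when j equals i. */
void pascal(int t[MAXH][MAXH], int h)
{
    int i, j;
    for (i = 0; i < h; i++)
        for (j = 0; j < MAXH; j++)
            t[i][j] = 0;
    for (i = 0; i < h; i++)
    {
        t[i][0] = 1;   /* The first column consists of 1's */
        for (j = 1; j <= i; j++)
            t[i][j] = t[i - 1][j] + t[i - 1][j - 1];
    }
}
```

The rightmost 1 of each row needs no special case: for j equal to i the formula adds t[i-1][i], which is 0, to t[i-1][i-1], which is 1.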
Example 3.14: Write a program to obtain the secondary numbers by progressively cancelling
numbers from the list of integers 1, 2, 3, ..., n: the numbers whose positions in the list are
multiples of 2 in the first pass, those whose positions in the remaining list are multiples
of 3 in the second pass, and so on. The process continues until there are fewer numbers in the
list than the step of the current pass.

/* Generate secondary numbers from a list */

#include <stdio.h>

int main(void)
{
    int n, a[202] = {0}, i, j, z, count;
    printf(" Enter the number:");
    scanf("%d", &n);
    if (n < 1 || n > 200)
        return 1;
    printf("\n Write the numbers from 1 to n:");
    for (i = 1; i <= n; i++)
        printf(" %3d", a[i] = i);
    printf("\n");
    a[i] = 0;      /* The last position is used as */
                   /* a sentinel value */
    count = n;     /* Numbers still in the list */
    for (j = 2; j <= count; j++)
    {
        i = j;
        while (i <= count)
        {
            /* Cancel a[i] by shifting the rest one place left */
            z = i;
            while (a[z] != 0)
            {
                a[z] = a[z+1];
                z = z + 1;
            }
            count--;
            i = i + j - 1;
        }
    }
    printf("\n\n");
    printf(" The secondary numbers are \n");
    i = 1;
    while (a[i] != 0)
    {
        printf(" %3d ", a[i]);
        i++;
    }
    printf("\n");
    return 0;
}
Input: Enter the number:10
Output: Write the numbers from 1 to n
1 2 3 4 5 6 7 8 9 10
The secondary numbers are
1 3 7
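The sample output can be reproduced with a more compact formulation. Note that this sketch swaps in a different technique from the program above: instead of shifting the list after every cancellation, it makes one compacting scan per pass. The function name secondary_numbers and the zero-based array out[] are assumptions introduced here.

```c
/* Fill out[0..] with the numbers that survive the cancelling process
   applied to 1, 2, ..., n and return how many survive. out[] must have
   room for n values. The pass with step j (j = 2, 3, ...) cancels every
   number whose position in the current list is a multiple of j; the
   process stops when fewer than j numbers remain. */
int secondary_numbers(int n, int out[])
{
    int count = n, i, j, kept;
    for (i = 0; i < n; i++)
        out[i] = i + 1;
    for (j = 2; j <= count; j++)
    {
        kept = 0;
        for (i = 0; i < count; i++)
            if ((i + 1) % j != 0)        /* Position i+1 is not cancelled */
                out[kept++] = out[i];
        count = kept;
    }
    return count;
}
```

For n = 10 the list shrinks to 1 3 5 7 9 after the first pass, to 1 3 7 9 after the second, and to 1 3 7 after the third, at which point only three numbers remain and the pass with step 5 is not carried out.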

Example 3.15: A pair of positive numbers is said to be AMICABLE if the sum of the divisors of
the first number is equal to the second number and the sum of the divisors of the second
number is equal to the first number. The divisors include 1 but exclude the number itself. Write
a program to find whether a pair of numbers num[0] and num[1], given as input, are AMICABLE
or not AMICABLE.

#include <stdio.h>

/* AMICABLE NUMBERS */
int main(void)
{
    int num[2], sum[2], i, j;
    printf("\n\n Enter the two numbers to be tested");
    scanf("%d %d", &num[0], &num[1]);
    sum[0] = 1; sum[1] = 1;
    for (i = 0; i <= 1; i++)
    {
        /* Search for factors */
        for (j = 2; j <= num[i] / 2; j++)
            if (num[i] % j == 0)
                sum[i] += num[i] / j;
    }
    if (num[0] == sum[1] && num[1] == sum[0])
        printf("\n The numbers are AMICABLE : %d , %d", num[0], num[1]);
    else
        printf("\n\n The numbers are not AMICABLE");
    return 0;
}

At the beginning, the program prompts for two inputs. Next, the factors of each number
are found using a 'for' loop. The number whose factors are to be calculated is divided in turn
by 2, 3, 4, ..., up to num[i]/2, where num[i] stores the two numbers (i = 0 and i = 1). Whenever
there is no remainder, the quotient, which is itself a factor, is added to the variable sum[i] in
the second 'for' loop. The first 'for' loop is used to find sum[i] twice, for i = 0 and i = 1, since
two numbers have been taken as input. Lastly, we test for the amicability of the two numbers.
It should be noted that the sum of factors of a number should exclude the number itself. Also,
since 1 is a factor of every number, sum[i] is assigned the value 1 initially for i = 0 and i = 1.
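The test can be checked on a known amicable pair such as 220 and 284. The sketch below is one possible arrangement and differs slightly from the program: it adds each divisor j directly instead of the quotient, and it starts from j = 1 rather than initializing the sum to 1. The function names are assumptions introduced here.

```c
/* Sum of the divisors of num, including 1 but excluding num itself */
int divisor_sum(int num)
{
    int j, sum = 0;
    for (j = 1; j <= num / 2; j++)
        if (num % j == 0)
            sum += j;
    return sum;
}

/* Returns 1 if a and b form an amicable pair, 0 otherwise */
int is_amicable(int a, int b)
{
    return divisor_sum(a) == b && divisor_sum(b) == a;
}
```

No divisor of num other than num itself can exceed num / 2, which is why the loop stops there.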
Example 3.16: Write a program to sort the list of 100 vouchers in their increasing order of voucher
numbers. Also trap and list out the in-between missing voucher numbers.

#include <stdio.h>

/* Sorting of 100 vouchers in ascending */
/* order of sequence */
int main(void)
{
    int i, j, Vno[100], temp;
    for (i = 0; i < 100; i++)
        scanf("%d", &Vno[i]);
    for (i = 0; i < 99; i++)
        for (j = i + 1; j < 100; j++)
            if (Vno[i] >= Vno[j])
            {
                temp = Vno[i];
                Vno[i] = Vno[j];
                Vno[j] = temp;
            }
    /* List the voucher numbers missing between consecutive vouchers */
    temp = Vno[0];
    for (i = 1; i < 100; i++)
    {
        while (temp + 1 < Vno[i])
        {
            printf(" %d \n", temp + 1);
            temp++;
        }
        temp = Vno[i];
    }
    return 0;
}
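The search for missing voucher numbers can be isolated in a function that works on an already sorted array. This is a sketch under stated assumptions: the name find_missing, the output array missing[], and the limit max_out are introduced here for illustration.

```c
/* Given v[0..n-1] sorted in increasing order, store in missing[] every
   number that lies strictly between two consecutive entries and return
   how many were stored. At most max_out values are written. */
int find_missing(const int v[], int n, int missing[], int max_out)
{
    int i, x, count = 0;
    for (i = 1; i < n; i++)
        for (x = v[i - 1] + 1; x < v[i]; x++)
            if (count < max_out)
                missing[count++] = x;
    return count;
}
```

A number is reported only if it is larger than one voucher number and smaller than the next, so two consecutive vouchers such as 3 and 4 produce no output.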
Example 3.17: The three numbers 192,384, and 576 are called triad numbers since they have the
following properties:

(a) Each is a three-digit number.


(b) Each of the digits 1,2,..., 9 occurs only once in all the three numbers.
(c) The second number is twice the first and the third number is three times the first.
Write a program to find all triad numbers below 1000.

#include <stdio.h>

#define q 9   /* Number of digits 1, 2, ..., 9 to be accounted for */

/* Triad numbers */
int main(void)
{
    int a[q], p, i, r, j;
    int H, T, U, x, flag;
    printf("\n\n Triad numbers are \n\n");
    for (i = 123; i <= 329; i++)
    {
        flag = 0;
        for (p = 0; p < q; p++)   /* Initialize with zeroes */
            a[p] = 0;
        for (j = 1; j <= 3; j++)
        {
            x = j * i;
            H = x / 100;
            r = x % 100;
            T = r / 10;
            U = r % 10;
            if (H != 0 && T != 0 && U != 0)
            {
                a[H-1] = H;
                a[T-1] = T;
                a[U-1] = U;
            }
        }
        for (p = 0; p < q; p++)
            if (a[p] == 0)
                flag = 1;
        if (!flag)
            printf(" i = %d, i*2 = %d, i*3 = %d\n", i, i * 2, i * 3);
    }
    return 0;
}
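The test applied to each candidate can be written as a single function. The sketch below is an illustration only; the name is_triad and the seen[] array are assumptions introduced here. It rejects a candidate as soon as a zero digit or a repeated digit appears.

```c
/* Returns 1 if i, 2*i and 3*i are three-digit numbers that together
   use each of the digits 1 to 9 exactly once, and 0 otherwise. */
int is_triad(int i)
{
    int seen[10] = {0};   /* seen[d] becomes 1 once digit d has occurred */
    int j, x, d;
    for (j = 1; j <= 3; j++)
    {
        x = j * i;
        if (x < 100 || x > 999)
            return 0;          /* Not a three-digit number */
        while (x > 0)
        {
            d = x % 10;
            if (d == 0 || seen[d])
                return 0;      /* Zero digit or repeated digit */
            seen[d] = 1;
            x /= 10;
        }
    }
    return 1;   /* Nine distinct non-zero digits were seen */
}
```

Since the three numbers contain nine digits in all, nine distinct non-zero digits means that each of 1, 2, ..., 9 occurs exactly once. For 192 the function examines 192, 384, and 576 and returns 1.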

This chapter described the representation of a linear data structure by one of the
methods of sequential storage allocation. Other methods will be discussed later. Although this
method of allocation is suitable for certain applications, there are many other applications where
the sequential allocation method is unacceptable. The chapter first illustrated the representation
of arrays in C. Next, the different operations on arrays were given, and the process for
initializing the elements of an array was explained. Finally, multidimensional arrays and
the concepts of row-major order and column-major order were introduced. Examples were given
to illustrate the different methods.

EXERCISES


1. Why is an array preferred instead of using a number of variables?
2. Write a program to initialize an array letter [26] with twenty-six uppercase characters.
3. What is the error in the definition statement?
static int s [3]={1,2,3,4,5};
4. Write a program to find the kth smallest element in an array number [n], where n and
k are given as input.
5. A company has four salesmen A,B,C, and D. They returned sales order at the end of each
day. There is a possibility that some of the orders may not be returned by the salesmen.
Write a C program to find the missing orders at the end of a particular day. Assume that
there are n number of sales orders to be processed during a day.
6. In an examination there are four subjects — Subject 1, Subject 2, Subject 3, and Subject 4. In
a particular year, 1000 students sat for the examination. Write a program to read marks for
each student and calculate total marks obtained by each student.
7. Assume that an array A has n numeric values. Write a C function to calculate the arithmetic
mean or average x of the values x1, x2, ..., xn defined by
x = (x1 + x2 + ... + xn) / n
8. An automobile company uses an array, items, to store the number of automobiles sold each
year starting from 1983 to 1997. Write a C program for each of the following tasks:
(a) To find total sale for all the items.
(b) To print the years in which maximum items were sold.
(c) To print the years in which minimum items were sold.
9. Professor Guess lives in a hostel where the rooms are sequentially numbered from 1. It is
interesting to note that the sum of the room numbers before the professor's room number is
equal to the sum of the room numbers after his room number. Write a C program to print all
the room numbers starting from 1 to the last room number of the Hostel and store these
numbers into an array, called Hostel.
10. A magic square is an N x N matrix with each element being an integer from 1 to N^2 and
the sum of each row, column, and the two main diagonals being the same. Here is a magic
square of N = 3:
6 1 8
7 5 3
2 9 4

Write a C program to generate and print a magic square for a given N, which should be
odd.
11. The following table shows the average rainfall of a city month-wise during a year.

Month                      Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec

Rainfall in centimetres      5    3    2    3    4    8   12   12   11    9    3    4

Determine the following:


(a) Which months have an average rainfall less than 4 centimeters?
(b) Which months have the highest and which has the lowest average rainfall?
(c) What is the average rainfall during the months June to October?
4
STRING PROCESSING AND PATTERN MATCHING

In the previous chapter, we studied the representation of simple data using an array. This
representation has the property of storing data of a homogeneous type. One of the primary
interests of today's computer processing is string processing, broadly called text
processing. Such processing usually involves some type of pattern matching. Pattern matching
is the process of finding a pattern within a string of character text. The answer may be (1) whether
a match exists or not, (2) the place of (the first) match, (3) the total number of matches, or (4) the
total number of matches and where they occur. We discuss in this chapter the fundamentals of
string representation, string manipulation, string functions, and pattern-matching algorithms.
Three string-matching algorithms (straightforward, Knuth-Morris-Pratt, and Boyer-Moore)
are examined and their time complexity is discussed. Algorithms and C programs for
different string processing methods are presented.

4.1 INTRODUCTION TO STRING PROCESSING


A string is a special type of linear one-dimensional array. A book can be thought of as a string of
characters. A program can also be looked upon as a string of characters. The string is an important
data structure, widely used in computer science for storing and manipulating textual data. It is
therefore important to study techniques for representing and manipulating strings.
A number of programming languages provide strings as a virtual data type of the lan-
guage. Some languages have a special library for handling string operations. String manipula-
tions can also be performed using arrays and lists. A string differs from the one-dimensional
array in three basic ways: (i) a string is a one-dimensional array of characters that are usually
restricted to the set of printable characters and a few control characters that are useful in text
based applications, (ii) a string is a dynamic data structure which grows in size as new charac-
ters are added and contracts as characters are removed, in contrast to the static fixed-length of
the array structure, and (iii) the length of the string can be obtained through a function.
In some applications, it is necessary to process strings whose length cannot be predicted
in advance of the running of a program. It may arise in text-processing systems or information-
retrieval applications where the size of data to be read is not foreknown. A basic idea for repre-
senting strings of varying size is to use a linked list. A pointer can specify the location of the
block containing next character in a string, allowing the characters to be stored in noncontigu-
ous blocks of memory. Compared to the array representation of strings, the use of pointer makes
string operations considerably easier. Linked representation of text will be discussed in the
subsequent chapter.
We now have several questions. What type of data structures can be useful in storing
character strings? Which operations on characters are to be considered? What are the
performance considerations relating operations to string data structures? How are pattern-matching
operations performed? In this chapter we will find the answers. This chapter also explores a
number of possible representation choices and presents some well-known algorithms for
implementing string-manipulation tasks.

4.2 STRING REPRESENTATION


Strings are an important part of any program. We can use them to write messages to the users,
to read text files, and generally to make interaction with users smooth. Computers are
frequently used for processing or manipulating non-numeric data, called character data. String
manipulation is often a large part of any programming task. Such processing usually involves
some type of pattern matching, that is, finding out whether a particular word S appears in a
given text T. It is therefore necessary to study string-manipulation routines. In this section we
will look at how strings are represented and manipulated in C.
A finite sequence S of zero or more characters is called a string. The number of characters
in a string is called its length. The string with zero characters is called the empty string or the
null string. Normally specific strings will be denoted by enclosing their characters in double
quotation marks. These marks also act as string delimiters. For example,
"Computer" "It is a programme"
are strings with length 8 and 17 (including blank spaces, if any, since the blank space is also a
character).
Before examining the string-manipulation functions, we first review the internal repre-
sentation of character strings in C.
Character strings in C are stored in contiguous memory locations to hold successive char-
acters. To create a string variable, we must allocate enough space for the number of characters
in the string and the characters are terminated by the null character (\0). For example, if we
want to create a string variable to store the word "hello", we would need an array of at least six
characters. The characters are stored as follows:
 0     1     2     3     4     5

'h'   'e'   'l'   'l'   'o'   '\0'

The string variable must be declared in the following way:


char String[6]; /* String is the name */
/* of the string variable */
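The storage requirement of the word "hello" can be confirmed with the sizeof operator and the standard library function strlen. The function name hello_layout_ok below is an assumption introduced for this sketch.

```c
#include <string.h>

/* Returns 1 if a six-element array holds "hello" followed by '\0' */
int hello_layout_ok(void)
{
    char String[6] = "hello";   /* Five letters plus the terminating '\0' */
    return sizeof(String) == 6      /* Six bytes are reserved */
        && strlen(String) == 5      /* The length excludes the '\0' */
        && String[5] == '\0'        /* The last element is the terminator */
        && sizeof("hello") == 6;    /* The constant also includes '\0' */
}
```

The length reported by strlen is always one less than the space the string occupies, because the null character is stored but not counted.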
We can also define a string constant for a string variable. The technique for initializing a
string is different from the array initialization. For example, if we define the constant Vowels,
#define Vowels "aeiouAEIOU"
the constant is stored in memory as
'a' 'e' 'i' 'o' 'u' 'A' 'E' 'I' 'O' 'U' '\0'
Another way in which we can define the vowels of the English language through an array is
char Vowels[10] = {'a', 'e', 'i', 'o', 'u', 'A', 'E', 'I', 'O', 'U'};
The C compiler appends the null character only to a string constant such as "aeiouAEIOU", so
it is not required there. The array initialized character by character holds exactly the ten
listed characters and no null character; to use it as a string, the array needs an eleventh
element holding '\0'.

When creating character strings character by character, however, we must append the null
character at the end of the string. It helps to find the end of the string. The following 'for'
statements illustrate the process for finding the end of the string.
for(i=0; string[i] != '\0'; i++)
or
for(i=0; string[i] != 0; i++)
The character '\0' has the value 0, so either form may be used. The macro NULL, on the other
hand, denotes a null pointer and should not be used in place of '\0'.
As we will see later on, the majority of string-manipulating functions search for the end of
the string in the above manner. These routines can also be implemented with pointers instead of
character arrays.
The implementation of arrays in C is strongly tied to pointers. In many ways, array
indexing is a convenient notation for pointer arithmetic. In C, the name of an array is converted,
in most expressions, to a pointer to the first element of the array. A pointer references a location
in memory. A pointer variable is declared by placing an asterisk (*) before its name, and the
address of a variable is obtained with the ampersand (&). To access the value contained
in the location referenced by the pointer, the asterisk is used again.
Pointers are convenient for string functions. One advantage of using pointers to character
strings is that we need not worry about the dimensions of the arrays that contain the characters.
When character strings are passed to functions, only a pointer to the first character in the string
is passed, just as for arrays of characters. Once the pointer is assigned the location of the first
character in the string, we can move through the characters contained in the string by
incrementing this pointer.
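Moving through a string by incrementing a pointer can be sketched as follows. The function name len_ptr is an assumption introduced here; it computes the same result as an index-based search for the null character.

```c
#include <stddef.h>

/* Length of the string s, found by advancing a pointer to the '\0' */
size_t len_ptr(const char *s)
{
    const char *p = s;        /* p starts at the first character */
    while (*p != '\0')
        p++;                  /* Move to the next character */
    return (size_t)(p - s);   /* Distance travelled = number of characters */
}
```

The function receives only the address of the first character, so it works for a character array of any dimension as well as for a string constant.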

4.3 STRING MANIPULATION


A string is viewed simply as a sequence or linear array of characters. Various string operations
have been developed which are not normally used with other kinds of array. The importance of
these string-manipulation routines will become readily apparent when they are used
to implement complex programs, whose development is greatly simplified by the string
functions. This section discusses these string functions, though some of them are available in
the standard C library. The next section shows how these operations are used in pattern matching.
The number of characters in a string is called its length. We write a function len(String)
for the length of a given string. Thus
len("Program") = 7, len("This is") = 7
len("") = 0
The function len(String) is given below:
Example 4.1

/* Returns a count of the number of characters */
/* in the variable String */
int len(char String[])
{
    int i;
    for (i = 0; String[i] != '\0'; i++)
        ;   /* Empty body: the loop only advances i */
    return (i);   /* Number of characters in the string */
}
The substring operation returns a run of consecutive characters of any length from a string.
A substring function requires four parameters.
(i) The name of the string or the string itself.
(ii) The position of the first character of the substring in the given string.
(iii) The length of the substring or the position of the last character in the substring.
(iv) The string into which the substring is copied.
We now write the Substr(S1, K, L, S2) function below:
Example 4.2

/* This function copies the substring of a string */
/* S1 beginning in a position K and having a length */
/* L into another string S2. Positions are counted */
/* from 1, as in Table 4.1 */
void Substr(char S1[], int K, int L, char S2[])
{
    int i, L1, j;
    L1 = len(S1);
    S2[0] = '\0';
    if (K < 1 || L < 0 || K - 1 + L > L1)
    {
        printf("Original string is too short");
        return;
    }
    for (j = 0, i = K - 1; j < L; i++, j++)
        S2[j] = S1[i];
    S2[j] = '\0';
}
Let us illustrate the Substr(S1, K, L, S2) function with Table 4.1.

Table 4.1 Example of Substring Function

Original String (S1)      Substr(S1, K, L, S2)                      Content of (S2)

'Good Morning'            Substr('Good Morning', 6, 3, S2)          'Mor'
'To be or not to be'      Substr('To be or not to be', 4, 7, S2)    'be or n'
'The end'                 Substr('The end', 4, 4, S2)               ' end'
The first character of ' end' is a space character.
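The rows of Table 4.1 can be reproduced with a compact variant built on the standard library. This is a sketch under stated assumptions: the name substr_from and the return codes are introduced here, and positions are counted from 1 as in the table.

```c
#include <string.h>

/* Copy into out[] the substring of s that starts at position k (counted
   from 1) and has length l. Returns 0 on success and -1 if the request
   does not fit inside s. out[] must have room for l + 1 characters. */
int substr_from(const char *s, int k, int l, char out[])
{
    int n = (int)strlen(s);
    out[0] = '\0';
    if (k < 1 || l < 0 || k - 1 + l > n)
        return -1;
    memcpy(out, s + (k - 1), (size_t)l);   /* Copy l characters */
    out[l] = '\0';                         /* Terminate the result */
    return 0;
}
```

Returning an error code instead of printing a message leaves the decision about what to do with an invalid request to the caller.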

We now present some more useful functions such as Insert ( ), Index ( ), and remove ( ) .
Example 4.3

/* Insert the contents of the substring into */
/* another string at the character position specified */
#include <string.h>   /* For strcpy */

int Insert(char S1[], char Sub[], int position)
{
    int i, j, k;
    char temp[80];   /* The combined string must fit in 80 characters */
    /* To check whether the position is valid */
    if (position < 0 || position >= len(S1))
        return (-1);
    /* Copy characters into temporary string */
    /* prior to the position */
    for (i = 0, j = 0; i < position; i++, j++)
        temp[j] = S1[i];
    /* Append the substring to current contents */
    /* of the temporary string */
    for (k = 0; Sub[k] != '\0'; ++k, j++)
        temp[j] = Sub[k];
    /* Now append the remainder of the string */
    /* to the temporary string */
    while ((temp[j++] = S1[i++]) != '\0')
        ;
    /* Copy contents of the temporary string back */
    /* into the original string */
    strcpy(S1, temp);   /* Use of C-Library function */
                        /* to copy the string */
                        /* from temp to S1 */
    return (0);
}
Example 4.4

/* This function returns the position of the first */
/* occurrence of the substring S2 in the string S1 */
int Index(char S1[], char S2[])
{
    int i, j, k;

    for (i = 0; S1[i] != '\0'; i++)
        for (j = i, k = 0; S2[k] == S1[j]; k++, j++)
            if (S2[k+1] == '\0')
                return(i);    /* Substring is found */
    /* Substring is not found */
    return(-1);
}
We now present two more functions, namely, remove(S) and Remove_all_blanks(S).
The first function removes all trailing blanks and the second function removes all blanks
from the given string.
Example 4.5

/* Removes trailing blanks from the string S */
/* (rename the function if <stdio.h>, which declares */
/* its own remove(), is included) */
int remove(char S[])
{
    int i = 0;

    /* Find newline character */
    while (S[i] != '\n')
        i++;
    --i;                     /* Back off from the newline */
    while (i >= 0 && S[i] == ' ')
        i--;
    if (i >= 0)
    {
        ++i;
        S[i] = '\n';         /* Insert new line character */
                             /* again */
        ++i;
        S[i] = '\0';         /* Insert null character */
    }
    return(i);
}
We now consider another string function called append(S1, S2). It appends the contents
of S1 to the contents of S2. For example, if S1 contains 'C' and S2 contains 'program in ',
then the content of S2 will be 'program in C' after append(S1, S2) is called.
Example 4.6

/* Appends the contents of S1 to the contents */
/* of S2 */
void append(char S1[], char S2[])
{
    int i, j;

    /* Find the end of S2 */
    for (i = 0; S2[i] != '\0'; i++)
        ;
    /* Append S1 to S2 */
    for (j = 0; (S2[i] = S1[j]) != '\0'; ++i, ++j)
        ;
}
The above function appends the contents of S1 to the contents of S2. We can also utilize
the function to append the contents of S2 to the contents of S1 by interchanging the arguments,
that is, append(S2, S1) instead of append(S1, S2).
The function Remove_all_blanks() deletes all spaces from a string. Some more examples
are also given.
Example 4.7

/* Remove all of the blanks in the character string S */
void Remove_all_blanks(char S[])
{
    int i, j;
    char temp[80];

    /* Remove all blanks in the string */
    for (i = 0, j = 0; S[i] != '\0'; i++)
        if (S[i] != ' ')
            temp[j++] = S[i];
    temp[j] = '\0';
    strcpy(S, temp);
}
Example 4.8

/* Appends blank spaces prior to the first character */
/* in string S */
/* S - String and N - No. of blank spaces to be inserted */
void Pad(char S[], int N)
{
    int Count;
    char temp[80];

    for (Count = 0; Count < N; Count++)
        temp[Count] = ' ';
    temp[Count] = '\0';
    /* Append the contents of the string S to the */
    /* blanks held in temp */
    append(S, temp);
    strcpy(S, temp);
}
Example 4.9

/* The program extracts substring from text */
#include <stdio.h>

char temp[80];
void substr(char text[], int i, int m);

int main()
{
    char text[80] = "";
    int i = 0, m = 0;    /* i - start of substring, m - end of substring */

    fgets(text, sizeof(text), stdin);
    scanf("%d%d", &i, &m);
    substr(text, i, m);
    printf("\n Substring from i=%d to m=%d is: %s", i, m, temp);
    return 0;
}

void substr(char text[], int i, int m)
{
    int j, k = 0;

    for (j = i; j <= m && text[j] != '\0'; j++)
        temp[k++] = text[j];
    temp[k] = '\0';
}
Input:
abcdefgh
2 4
Output:
Substring from i=2 to m=4 is: cde
Let us now write an algorithm for finding the location where the pattern P first occurs as a
substring of string S.
Algorithm
Input: S and P are two strings with lengths n and m, respectively, and are stored in arrays. It
is assumed that the value of m is less than or equal to n.
Output: If the leftmost occurrence of the pattern P in string S starts at the kth character of S,
this algorithm returns k, otherwise it returns 0.
Step 1: Set k <- 1, j <- 1 and x <- n-m+1
Step 2: If k > x then return 0 and exit
Step 3: If P[j] = S[k+j-1] and j = m then return k and exit
Step 4: Repeat Step 5 while P[j] = S[k+j-1] and j < m
Step 5: Set j <- j+1
End of loop.
Step 6: If P[j] ≠ S[k+j-1] then Set k <- k+1 and Set j <- 1
Step 7: Go to Step 2
Step 8: Exit
The implementation is left to the reader as an exercise.
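As a starting point for the exercise, one possible reading of Steps 1 to 8 is sketched below. The function name find_leftmost is an assumption of this sketch. Positions are counted from 1 as in the algorithm, so the character written S[k+j-1] in the steps is stored in s[k + j - 2] in C.

```c
#include <assert.h>
#include <string.h>

/* Returns the position k (counted from 1) at which p first occurs */
/* in s, or 0 if p does not occur in s.                            */
int find_leftmost(const char *s, const char *p)
{
    int n = (int)strlen(s);
    int m = (int)strlen(p);
    int k = 1, j = 1;
    int x = n - m + 1;              /* Last possible starting position */

    assert(m >= 1);
    while (k <= x) {                /* Step 2 */
        while (j < m && p[j - 1] == s[k + j - 2])
            j++;                    /* Steps 4 and 5 */
        if (p[j - 1] == s[k + j - 2])
            return k;               /* Step 3: here j equals m */
        k++;                        /* Step 6 */
        j = 1;
    }
    return 0;
}
```

For the input string 'computer science' and the pattern 'science', the sketch returns 10.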

Pattern matching is an important problem that occurs in different areas of computer science
and information processing. The basic objective of the problem is to detect the occurrence of a
particular string of characters (the pattern) as a substring in a sequence of characters (the input
string). For example, the input string 'computer science' contains the pattern 'science' as a
substring. Many text editors and programming languages (such as C, PASCAL, etc.) have facili-
ties for matching strings. Pattern matching is one of the central and most widely studied prob-
lems in theoretical computer science.
There are three basic approaches for implementing pattern-matching algorithms. Each of
them is conceptually simple, but the third is in many cases the fastest known
pattern-matching algorithm in both theory and practice.
The first approach, called the brute-force algorithm, is the one that first comes to mind
whenever the pattern-matching problem is addressed. The pattern is placed over the input
string at its extreme left. The pattern and input characters are then scanned to the right for a
mismatch. If a mismatch is found, the pattern is shifted one position to the right and the scan is
started again at the new position of the pattern. Whenever a partial match fails, backtracking
over the pattern is necessary.
In the second approach, initial pattern and input string placement are as in the brute-force
algorithm and scanning is done to the right. When a mismatch occurs, however, the pattern
is shifted to the right in such a way that the scan can be restarted at the point where the
mismatch occurred in the input string. In this case no backtracking is therefore required.
In both approaches the pattern is scanned from left to right and each input character is
checked at least once. The third approach scans the pattern from right to left and achieves
great speed by skipping over portions of the input string that cannot possibly contribute to a match.
In the next subsections we will present the three approaches to the pattern-matching problem.
For convenience, we will assume P = p_1, p_2, ..., p_m of length m and S = s_1, s_2, ..., s_n of length n,
where p_i represents the ith character of the pattern and s_j the jth character of the input string.
Note that when we implement the algorithms as C programs, we will use the pattern
as p_0, p_1, ..., p_{m-1} and the input string as s_0, s_1, ..., s_{n-1}, since the index of an array starts from 0 in C.
Further, it is straightforward to generalize these algorithms to locate all occurrences of
the pattern in the input string.

4.5 THE BRUTE-FORCE ALGORITHM


The initial algorithm is the obvious one. First, the pattern and the input string are stored in two
arrays. Next it looks for a match by comparing the pattern p_1, p_2, ..., p_m with the m-character
substring s_k, s_{k+1}, ..., s_{k+m-1} for each successive value of k from 1 to n-m+1. It compares the
pattern with the substring of the input from left to right until it either matches all of the pattern
or finds that p_i ≠ s_j for some 1 ≤ i ≤ m and k ≤ j ≤ k+m-1. This is shown in Fig. 4.1.

             p_1 ........ p_i ........ p_m
s_1 ........ s_k ........ s_j ........ s_{k+m-1} ........ s_n
                          ^
                          j

Fig. 4.1 Illustration of pattern matching

When a mismatch occurs, it slides the pattern one character to the right and starts looking for a
match by comparing p_1 with s_{k+1}. The following algorithm illustrates the brute-force approach.

4.5.1 Algorithm: Brute-Force Pattern Matching


Input: P and S, arrays of pattern and input string characters. m>0 and n>0 are the number of
characters in P and S, respectively.
Output: If the leftmost occurrence of P in S begins at the kth character of S, the algorithm
returns the location, k, of the pattern in the input string. If P is not a substring of S, the
algorithm returns a message 'pattern not found'.
Step 1: Set i <- 1      /* Pointer into the pattern */
Step 2: Set j <- 1      /* Pointer into the input string */
Step 3: While (i <= m and j <= n)
Step 4:     If (P[i] = S[j]) then
                i <- i+1 and j <- j+1
            else
                j <- j-i+2 and then i <- 1
Step 5: Wend            /* End of while loop */
Step 6: If i > m then
            return ('Pattern is found at', j-m)
        else
            return ('Pattern is not found')
Step 7: Stop
The worst-case execution time occurs when P = a^{m-1}b and S = a^n, that is, for every possible
starting position of the pattern in the input string, all but the last character of the pattern matches
the corresponding character in the input string. In this case O(mn) comparisons are needed to
determine that the pattern does not occur in the text. In practical situations, the expected
performance of the brute-force algorithm is usually O(m+n), but a precise characterization depends on
the statistical properties of the pattern and input string. The algorithm requires a buffer of size
O(m) to hold the pattern and the m-character substring of the input.
Besides its O(mn) worst-case execution time, the brute-force algorithm has another property
that makes it undesirable in certain applications. It requires backtracking on the input string.
This leads to inefficiencies if the entire input string is not available in memory and buffering
operations are necessary.
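The worst case can be made concrete by counting comparisons. The sketch below is an illustration only; the function name brute_force_comparisons is an assumption. It runs the brute-force scan, including the backtracking on the input string, and returns the number of character comparisons performed.

```c
#include <assert.h>
#include <string.h>

/* Counts the character comparisons made by a brute-force search */
/* for the pattern p in the input string s.                      */
long brute_force_comparisons(const char *s, const char *p)
{
    int n = (int)strlen(s);
    int m = (int)strlen(p);
    int i = 0, j = 0;
    long count = 0;

    assert(m > 0);
    while (i < m && j < n) {
        count++;
        if (p[i] == s[j]) {
            i++;
            j++;
        } else {
            j = j - i + 1;    /* Backtrack on the input string */
            i = 0;
        }
    }
    return count;
}
```

With P = 'aab' (m = 3) and S = 'aaaaaa' (n = 6), each of the n-m+1 = 4 full alignments costs m = 3 comparisons before the mismatch on b, so the count is at least m(n-m+1) = 12 and never exceeds mn = 18.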
We now present the C program for the Brute-Force algorithm.
Example 4.10

/* Brute-Force algorithm */
#include <stdio.h>
#include <string.h>
#define MAXPATLEN 80
#define MAXTEXLEN 80

void Brute_Force(char S[], char P[], int m, int n);

int main()
{
    char P[MAXPATLEN] = "", S[MAXTEXLEN] = "";
    int m, n;    /* m - Length of P, n - Length of S */

    printf("\n Enter the text:");
    fgets(S, MAXTEXLEN, stdin);
    S[strcspn(S, "\n")] = '\0';    /* Drop the newline kept by fgets() */
    printf("\n Enter the pattern to be matched:");
    fgets(P, MAXPATLEN, stdin);
    P[strcspn(P, "\n")] = '\0';
    m = strlen(P);    /* Length of pattern using strlen() */
    n = strlen(S);    /* Length of text using strlen() */
                      /* strlen() is a C-Library function */
    Brute_Force(S, P, m, n);    /* Function call */
    return 0;
}

/* Brute-Force function */
void Brute_Force(char S[], char P[], int m, int n)
{
    int i, j;

    i = 0;    /* Pointer into the pattern */
    j = 0;    /* Pointer into the input string */
    while (i < m && j < n)
    {
        if (P[i] == S[j])
        {
            i++;
            j++;
        }
        else
        {
            j = j - i + 1;    /* Backtrack to the next starting position */
            i = 0;
        }
    }
    if (i == m)
        printf("Pattern is found at %d", j - m + 1);
    else
        printf("Pattern is not found");
    return;
}

4.6 KNUTH-MORRIS-PRATT ALGORITHM


The second approach to pattern matching is described by Knuth, Morris, and Pratt (1977). We
refer to it as the Knuth, Morris, and Pratt algorithm. It is conceptually a simple modification of
the previous algorithm. This algorithm preprocesses the pattern and constructs a table showing
how far to move the pattern after each possible miss. The essence of the algorithm is to shift the
pattern to the right based on the table. To illustrate the algorithm, we will assign specific
meaning to the terms match, hit, and miss. A match occurs when a substring of text, S, is the same as the
pattern, P, itself. A hit happens when a single character of pattern P is the same as the
corresponding character of S. A miss occurs when a character of P is different from the corresponding
character of S. It is possible to recognize whether P is a substring of S in time O(|P| + |S|),
where |P| and |S| represent the length of P and S, respectively.
Consider the string (S) and pattern shown in Fig. 4.2. This example is taken from Knuth et
al. 1977.

S = 'BABCBABCABCAABCABCABCACABC'

P = 'ABCABCACAB'
Fig. 4.2 String and pattern

We will assume that the search begins at the left end of the string S to be searched.
Suppose S = s_1, s_2, ..., s_n and P = p_1, p_2, ..., p_m, and assume that we are currently determining
whether or not there is a match beginning at s_i. If s_i ≠ p_1 then we proceed by comparing s_{i+1} and p_1.
Similarly, if s_i = p_1 and s_{i+1} ≠ p_2 then we may proceed by comparing s_{i+1} and p_1. In general, suppose
that after starting at s_i it is found that p_1, p_2, ..., p_j matches s_i, s_{i+1}, ..., s_{i+j-1} and s_{i+j} ≠ p_{j+1}.
The best possible place to recover from this failure would be to slide P to the right so
that as many of its initial characters p_1, p_2, ..., p_k as possible match the final characters of
s_i, s_{i+1}, ..., s_{i+j-1}, provided we can continue matching s_{i+j} with p_{k+1} from this position. Thus, we want to
find the longest head p_1, p_2, ..., p_k of P that is equal to a tail of s_i, s_{i+1}, ..., s_{i+j-1} and for which
s_{i+j} = p_{k+1}. We then find that the first k+1 symbols of P have been matched. We continue further
pattern matching from this position. To see how the method works, let us consider the following example.
Example: Consider the pattern P = abaababaabaab. Algorithm A produces Table 4.2.

Table 4.2 Pattern and Next Function

Index    Pattern    Next Function
  i        p_i          h_i
  1         a            0
  2         b            1
  3         a            0
  4         a            2
  5         b            1
  6         a            0
  7         b            4
  8         a            0
  9         a            2
 10         b            1
 11         a            0
 12         a            7
 13         b            1
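The entries of Table 4.2 can be checked mechanically. The sketch below follows the steps of Algorithm A (Section 4.6.1); the function name next_function is an assumption of this sketch. Positions are counted from 1 as in the table, so p_i is stored in p[i - 1] and h[0] is not used.

```c
#include <assert.h>
#include <string.h>

/* Fills h[1..m] with the next function of the pattern p, where */
/* m is the length of p. The array h needs m + 1 elements.      */
void next_function(const char *p, int h[])
{
    int m = (int)strlen(p);
    int i = 1, j = 0;

    assert(m > 0);
    h[1] = 0;
    while (i < m) {
        while (j > 0 && p[i - 1] != p[j - 1])
            j = h[j];
        i++;
        j++;
        if (p[i - 1] == p[j - 1])
            h[i] = h[j];
        else
            h[i] = j;
    }
}
```

For the pattern abaababaabaab the array h[1..13] comes out as 0 1 0 2 1 0 4 0 2 1 0 7 1, which agrees with the last column of the table.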

Let us now apply the Knuth-Morris-Pratt algorithm (Algorithm B) to look for the pattern P in
the input string abaababaabacabaababaabaab. Initially, the first 11 characters of P and S (the
input string) align successfully. Let i be the pointer into P and j be the pointer into S. When i=j=12,
the algorithm finds a mismatch at the input character c. The first iteration of the inner loop of
Algorithm B sets i = h_12 = 7 (from h_i of Table 4.2). This has the effect of shifting the pattern five
characters to the right, so position 7 of the pattern is now aligned above position 12 of the input
string. At this point p_7 still does not match s_12, so i is set to h_7 = 4. Mismatches continue for
i = 4, 2, and 1, after which i becomes 0. At this point the inner loop is exhausted without finding a
match and the outer loop then sets i to 1 and j to 13. It is shown in Fig. 4.3.

Pattern (pointer i):         abaababaabaab
Input string (pointer j):    abaababaabacabaababaabaab

Fig. 4.3 A successful match
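The sequence of values taken by i in this walk-through can be reproduced with a small sketch of the matching loop of Algorithm B. The next function is copied from Table 4.2; the names kmp_trace, trace, and ntrace are assumptions made for this illustration, and the function works only for the pattern of the example.

```c
#include <assert.h>
#include <string.h>

/* Pattern of the example and its next function from Table 4.2; */
/* h[0] is not used.                                            */
static const char P[] = "abaababaabaab";
static const int h[14] = {0, 0, 1, 0, 2, 1, 0, 4, 0, 2, 1, 0, 7, 1};

/* Searches s for P. Every value assigned to i by the inner loop */
/* is stored in trace[] (at most max_trace values) and *ntrace   */
/* receives the number of values stored. Returns the position of */
/* the match, counted from 1, or 0 if there is no match.         */
int kmp_trace(const char *s, int trace[], int max_trace, int *ntrace)
{
    int m = (int)strlen(P);
    int n = (int)strlen(s);
    int i = 1, j = 1;

    assert(ntrace != NULL);
    *ntrace = 0;
    while (i <= m && j <= n) {
        while (i > 0 && P[i - 1] != s[j - 1]) {
            i = h[i];
            if (*ntrace < max_trace)
                trace[(*ntrace)++] = i;
        }
        i++;
        j++;
    }
    return (i > m) ? j - m : 0;
}
```

For the input string abaababaabacabaababaabaab the recorded values are 7, 4, 2, 1, 0, and the function returns 13, the position at which the scan restarts and the pattern is finally found.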

We now present the algorithm for the Knuth-Morris-Pratt pattern matching technique.

4.6.1 Algorithm A: Knuth-Morris-Pratt Pattern Matching


This algorithm needs a table h, called the next function, for locating a match between pattern and
input string. So we describe two algorithms, A and B, for the next function and for pattern matching.
Algorithm B utilizes Algorithm A.
Algorithm A: Compute the next function h for pattern P = p_1, p_2, ..., p_m.
Input: P, a string of characters; m>0, the length of P.
Output: h, an array representing the next function for P.
Step 1: Set i <- 1, j <- 0 and h[1] <- 0
Step 2: While (i < m)
Step 3:     While (j > 0 and p[i] ≠ p[j])
Step 4:         Set j <- h[j]
            Wend    /* End of inner while loop */
Step 5:     Set i <- i+1 and j <- j+1
Step 6:     If (p[i] = p[j]) then h[i] <- h[j]
            else
                h[i] <- j
        Wend        /* End of outer while loop */
Step 7: Stop

4.6.2 Algorithm B: Knuth-Morris-Pratt Pattern Matching


Input: P and S, arrays of pattern and text characters; m>0 and n>0, the number of characters in
P and S, respectively; h, the next function computed by Algorithm A.
Output: Success or failure indicator and, if successful, the location of the pattern in the input
string.
Step 1: Set j <- 1 and i <- 1
Step 2: While (i <= m and j <= n)
Step 3:     While (i > 0 and p[i] ≠ S[j])
Step 4:         Set i <- h[i]
            Wend    /* End of inner while */
Step 5:     Set i <- i+1 and j <- j+1
        Wend        /* End of outer while */
Step 6: If (i > m) then
            return ('pattern found at', j-m)
        else
            return ('pattern is not found')
Step 7: Stop
We now present the C program for the Knuth-Morris-Pratt algorithm.
Example 4.11

/* Knuth-Morris-Pratt */
#include <stdio.h>
#include <string.h>

int h[80], m, n;
void f(char pat[]);

int main()
{
    char pat[80] = "", s[80] = "";
    int i, j;

    fgets(s, sizeof(s), stdin);
    s[strcspn(s, "\n")] = '\0';    /* Drop the newline kept by fgets() */
    puts(s);
    fgets(pat, sizeof(pat), stdin);
    pat[strcspn(pat, "\n")] = '\0';
    puts(pat);
    m = strlen(pat);
    n = strlen(s);
    i = 1;
    j = 1;
    f(pat);
    while (i <= m && j <= n)
    {
        while (i > 0 && pat[i-1] != s[j-1])
            i = h[i-1];
        i++;
        j++;
    }
    if (i > m)
        printf("yes\n");
    else
        printf("no");
    return 0;
}

/* Computes the next function; h[i-1] holds the value for position i */
void f(char pat[])
{
    int i, j;

    i = 1; j = 0; h[0] = 0;
    while (i < m)
    {
        while (j > 0 && pat[i-1] != pat[j-1])
            j = h[j-1];
        i++;
        j++;
        if (pat[i-1] == pat[j-1])
            h[i-1] = h[j-1];
        else
            h[i-1] = j;
    }
    for (i = 0; i < m; i++)
        printf(" %3d", h[i]);
    printf("\n");
}

4.7 BOYER-MOORE ALGORITHM


Boyer and Moore came up with another string-matching algorithm that finds whether a pattern
P is a substring of S. It is based on the ingenious observation that it is more efficient to compare
characters starting at the right end of the pattern and working to the left. Since this algorithm
starts on the right of the pattern, it has the ability to pass over more characters than it examines.
Knuth has proved that the Boyer-Moore algorithm has linear behaviour in the worst case. This
algorithm may be much faster than the previous two algorithms if the pattern length is
sufficiently large. The Boyer-Moore algorithm preprocesses the pattern and prepares two tables.
The first table stores the distance from the right end of the pattern to the first occurrence
(counting from the right) of each letter of the alphabet. The second one contains a value for
each position in the pattern.
The essential features of the Boyer-Moore algorithm can be summarized as follows.
(A) In this method character comparisons are made starting at the right end of the pattern
and moving towards the left. So some characters are never considered.
(B) This method preprocesses the pattern for creating two tables (as stated above). When
a miss occurs, the decision about how far to advance the pattern, is based on two possibilities.
(1) The value of the text character at which the miss occurs and the distance to the left at
which that value first occurs in the pattern.
(2) The value of all text characters that have been hit, together with the value that is
known not to occur because of a miss.
(C) The pattern is moved the distance given by the maximum of (1) and (2) above.
We now explain the essential features of the Boyer-Moore algorithm in terms of S and P.
Suppose that we are checking the characters of P against those of S in right-to-left order.
If all characters of P match those of S beneath, we have found a substring of S to match P.
Initially, we compare p_m (P = p_1, p_2, ..., p_m) with s_m (S = s_1, s_2, ..., s_m, s_{m+1}, ..., s_n). If s_m occurs
nowhere in the pattern, then there cannot be a match for the pattern beginning at any of the first
m characters of the input string. We can safely slide the pattern m characters to the right
and try to match p_m with s_{2m}. So we avoid m-1 unnecessary character comparisons.
Suppose we have just shifted the pattern to the right and are about to compare p_m with s_k.
This is shown in Fig. 4.4.

                            i
                            v
           p_1 ............ p_m
s_1 ...... s_{k-m+1} ...... s_k ...... s_n
                            ^
                            j
Fig. 4.4 Shift and compare p_m with s_k

We now present cases corresponding to (1) and (2) above.


Case 1: We found that p_m and s_k do not match. If the rightmost occurrence of s_k in the pattern is
p_{m-g}, we can shift the pattern g positions to the right to align p_{m-g} and s_k and then resume
matching p_m with s_{k+g}. The situation is shown in Fig. 4.5.

                                         i
                                         v
           p_1 .............. p_{m-g} .. p_m
s_1 ...... s_{k-m+1+g} ...... s_k ...... s_{k+g} ...... s_n
                                         ^
                                         j
Fig. 4.5 Compare p_m with s_{k+g}

If s_k did not occur in the pattern, we would shift the pattern m positions to the right and
start comparing p_m with s_{k+m}.
Case 2: Suppose that the last m-i characters of the pattern match with the last m-i characters of
the input string ending at position k, that is, p_{i+1}, p_{i+2}, ..., p_m = s_{k-m+i+1}, s_{k-m+i+2}, ..., s_k.
If i=0, we have found a match; otherwise we consider the case i>0 and p_i ≠ s_{k-m+i}.
If the rightmost occurrence of the character s_{k-m+i} in the pattern is p_{i-g}, then we can simply shift
the pattern g positions to the right, so that p_{i-g} and s_{k-m+i} align. The pattern matching starts again
by comparing p_m with s_{k+g}, as shown in Fig. 4.6.

                                               i
                                               v
           p_1 .............. p_{i-g} ........ p_m
s_1 ...... s_{k-m+1+g} ...... s_{k-m+i} ...... s_{k+g} ...... s_n
                                               ^
                                               j
Fig. 4.6 Compare s_{k+g} with p_m

If p_{i-g} is to the right of p_i (g<0), then we would instead shift the pattern one position to the
right and resume matching by comparing p_m with s_{k+1}. The first table covers Case 1 and Case 2 in the
discussion above.
The first table, called d_1, determines how far to slide the pattern to the right when p_i ≠ s_j. It
is indexed by characters. The table d_1 is a function of the text character C for which the
mismatch occurred. For every character C, d_1[C] = m-i, where i is the largest index such that C = p_i,
or d_1[C] = m if the character C does not occur in the pattern.
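To make the role of the first table concrete, the sketch below builds d_1 and searches with it alone. This is a simplification and not the complete Boyer-Moore algorithm: the second table d_2 is left out, and whenever d_1 would not move the pattern forward the pattern is shifted one position to the right instead. The names build_d1 and bm_d1_search are assumptions of this sketch, as is the definition d_1[C] = m-i for the rightmost occurrence p_i of C.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

/* d1[c] = m - i, where i is the largest position with p_i = c, */
/* or m if the character c does not occur in the pattern.       */
static void build_d1(const char *p, int m, int d1[UCHAR_MAX + 1])
{
    int c, i;

    for (c = 0; c <= UCHAR_MAX; c++)
        d1[c] = m;
    for (i = 1; i <= m; i++)
        d1[(unsigned char)p[i - 1]] = m - i;
}

/* Returns the position (counted from 1) of the leftmost */
/* occurrence of p in s, or 0 if there is none.          */
int bm_d1_search(const char *s, const char *p)
{
    int n = (int)strlen(s);
    int m = (int)strlen(p);
    int d1[UCHAR_MAX + 1];
    int i, j, shift;

    assert(m > 0);
    build_d1(p, m, d1);
    j = m;                              /* Text position under p_m */
    while (j <= n) {
        i = m;
        while (i > 0 && p[i - 1] == s[j - 1]) {
            i--;                        /* Compare from right to left */
            j--;
        }
        if (i == 0)
            return j + 1;               /* All m characters matched */
        shift = d1[(unsigned char)s[j - 1]];
        if (shift < m - i + 1)
            shift = m - i + 1;          /* Move the pattern at least one place */
        j += shift;
    }
    return 0;
}
```

For the string and pattern of Fig. 4.2, bm_d1_search("babcbabcabcaabcabcabcacabc", "abcabcacab") returns 16.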
Case 3: The second table, called d_2, is a function of the position in the pattern at which the
mismatch occurred. Suppose the suffix p_{i+1}, p_{i+2}, ..., p_m reoccurs as the substring p_{i+1-g}, ..., p_{m-g} in
the pattern and p_{i-g} ≠ p_i. If there is more than one such reoccurrence, we take the rightmost one.
In this case a longer shift than Case 2 may be possible by aligning p_{i+1-g}, ..., p_{m-g} above s_{k-m+i+1}, ..., s_k
and restarting the scan by comparing p_m with s_{k+g} (Fig. 4.7).

                                                            i
                                                            v
           p_1 .............. p_{i+1-g} ........ p_{m-g} .. p_m
s_1 ...... s_{k-m+g+1} ...... s_{k-m+i+1} ...... s_k ...... s_{k+g} ...... s_n
                                                            ^
                                                            j
Fig. 4.7 Illustration of table d_2

The table d_2 is indexed by positions in the pattern. For every 1 ≤ i ≤ m (m is the length of
pattern P), d_2[i] gives the minimum shift g such that when we align p_m over s_{k+g} the substring
p_{i+1-g}, ..., p_{m-g} of the pattern matches with substring s_{k-m+i+1}, ..., s_k of the input string, assuming p_i did
not match s_{k-m+i}. The complete algorithm is left to the reader as an exercise. Also, write a C program
for the algorithm.
In this chapter, we have discussed the string-processing and pattern-matching problems. Algo-
rithms are given for solving string-matching problems that have proven useful for text-editing
and text-processing applications. In the next chapter, pointers in C will be discussed.

EXERCISES
1. Write a C program to count all occurrences of a particular word in a given text.
2. Write a program that prints the user-specified last n lines of its input. If n is more than the
number of lines in the input, all the lines should be printed. If n is not specified, a default
number of p lines should be printed.
3. Write a function that replaces the first occurrence of a given substring in a source string by
the specified substitution string.
4. Write a C program that converts a given string to its floating-point equivalent.
5. Write a function check(str, c) to print 'c' if c is in the string pointed to by str. The
parameter str is a pointer to a char, and the parameter c is a char. If c is not in str,
return 0.
6. Write a C program to print the word 'Computer' in the following way:
Computer
Compute
Comput
Compu
Comp
Com
Co
C
7. Write a C program that copies its input to output, except that it removes leading blanks,
and trailing blanks and tabs from the end of lines, and prints only one line from each group of
adjacent identical lines.
8. Write a function to extract a portion of a character string and print the extracted string.
Assume that p characters are extracted, starting with the nth character.
9. Write a function that appends a string to a given string. In the resultant string, any uppercase
character in the original string needs to be converted into lowercase, and vice versa.
10. Briefly describe the merits and demerits of all the pattern-matching algorithms that were
discussed in this chapter.
5
POINTERS

The use of pointers is one of the most powerful features in C. Pointers are simply variables that
point to another variable. The relation between these two variables is established by the fact that
the value of the pointer variable is the address of the variable it points to. In this chapter we
discuss the basics of pointers and their use.

5.2 FUNDAMENTALS AND DEFINING POINTERS


A pointer is a variable that holds the address of some other variable. Depending on the data
type of a variable, a certain number of bytes of memory is reserved for it. In the IBM range of
machines a character variable requires 1 byte of memory, an integer variable requires 2 bytes, and a
floating-point variable requires 4 bytes. Consider the following declarations:
char c;
int i;
float f;

Fig. 5.1 Relation between pointer variable and ordinary variable

To be precise, a pointer to c means that the pointer variable that points to c holds the
address of variable c, that is, it holds &c. Similarly, a pointer to i holds the address of i (&i) and a
pointer to f holds the address of f (&f).
Thus, to get the address of a variable we make use of the & operator, which is a unary
operator. C provides the facility to store such an address in another variable, known as a pointer
(because these variables hold the address of another variable). Consider three pointer
variables: pc, pi, and pf. This means all of them can hold addresses of other variables. Since they
are variables, we must define them before their use. We will see the method of defining such
variables in a moment. As pc, pi, and pf are pointer variables we can easily make the
following assignments:
pc=&c;
pi=&i;
pf=&f;
Assume the following three variable definitions:
char cc;
int ii;
float ff;
C provides another unary operator, *, to get the content of an address. More precisely, *pc
gives you the contents of address pc and we know that it is a character. So we can easily write
the statements of the form
cc=*pc;
Similarly, we can write
ii=*pi;
ff=*pf;
This highlights the fact that though the variables pc, pi, and pf are pointers, the data type
of their contents differs. More precisely, pc is a pointer that points to a character, pi is a pointer
pointing to an integer, and pointer pf points to a floating-point value. This distinction of point-
ers should be reflected at the time of defining these variables and we define these variables as
char *pc;
int *pi;
float *pf;
Clearly, the sequence of statements,
pi=&i;
ii=*pi;
is equivalent to the statement
ii=i;
The statements
*pi=*pi+1;
*pi+=1;
and
++*pi;
are equivalent: each increments (by 1) what pi points to. We should note that pointers never
point to anything useful until they are initialized.
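The three forms can be tried out together. In the sketch below the function name add_three_via_pointer is an assumption; it initializes a pointer with the address of a local variable and then applies each form once.

```c
#include <assert.h>

/* Adds 1 to an integer three times through a pointer, using the */
/* three equivalent forms, and returns the final value.          */
int add_three_via_pointer(int start)
{
    int i = start;
    int *pi = &i;       /* pi now holds the address of i */

    assert(*pi == i);   /* *pi is the content of that address */
    *pi = *pi + 1;
    *pi += 1;
    ++*pi;
    return i;           /* i itself has been changed */
}
```

A call such as add_three_via_pointer(5) yields 8, since each statement changes i through pi.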

5.3 TYPE SPECIFIERS AND SCALARS FOR POINTERS


We have seen that an integer pointer is defined by the declaration
int *pi;
Here the name of the pointer variable is pi and its type specifier is int. This means that
the pointer pi will be used with an int type value. The scalar for the pointer variable pi does
not refer to itself, but refers to the element it points to. The scalar size for a pointer is defined by
its type specifier, which is int in this case. We may use the sizeof() operator to determine
the scalar size of a pointer.
By this we mean that for the pointer definitions
float *pf;
double *pd;
int *pi;
the scalar sizes of pf, pd, and pi are
sizeof(float) = 4
sizeof(double) = 8
and sizeof(int) = 2
respectively, on the machines mentioned above. The compiler requires this scalar size of a pointer
to perform the pointer operations properly.

Sometimes we may need to test the pointer variables. Relational operators such as >=, <=, >,
<, ==, and != may be applied to pointers only when both operands are pointers, for example,
if (pntr1 >= pntr2)
{

}
is acceptable, but the following example,
if (pntr > 50)
{

}
is invalid. The reason behind this is that the numeric constant 50 is not of pointer type. Note
that the equality (==) and inequality (!=) test operators may also be applied if one of the
operands is a null pointer (i.e., NULL or '\0'). This discussion is not yet complete, because one may
ask what happens when the pointers being compared point to different data types. In
fact, we should not use these tests when the pointers point to different data types.
Unfortunately, some compilers do not detect this sort of error and it may be very difficult to track the
bugs that crop up due to such pointer comparisons.
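A typical use of a relational test between two pointers of the same type is sketched below. The function name reverse_ints is an assumption; both pointers point into the same array, so the comparison lo < hi is meaningful.

```c
#include <assert.h>

/* Reverses the n integers stored in a[] in place */
void reverse_ints(int a[], int n)
{
    int *lo, *hi, t;

    assert(n >= 0);
    if (n == 0)
        return;
    lo = a;                 /* Points to the first element */
    hi = a + n - 1;         /* Points to the last element */
    while (lo < hi) {       /* Both operands are pointers to int */
        t = *lo;
        *lo = *hi;
        *hi = t;
        lo++;
        hi--;
    }
}
```

The loop stops as soon as the two pointers meet or cross, so each pair of elements is exchanged exactly once.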

5.5 PASSING POINTERS TO FUNCTIONS


We have already seen that C does not provide any direct way for the called function to change
the value of a variable in the calling function. The reason is that, in C, the argument passing
between functions uses the 'call by value' method. So the following function interchange(),
written to interchange two values, cannot affect the arguments in the calling function.
interchange (int a, int b)


{
int t;

t = a ;
a = b ;
b = t ;
return;
}
This is because the formal parameters a and b hold the private copy of the values of
corresponding actual parameters. To achieve the goal we may pass the addresses of the vari-
ables to the function. For example, to interchange two integers x and y, the function reference
may be of the form
interchange(&x, &y);
and the function definition may be written as the following:
interchange (int *pa, int *pb)
{
int t;

t = *pa;
*pa = *pb;
*pb = t ;
}
As earlier, this function does not have the provision to change the arguments, but as the
arguments are pointers, it can very well alter the contents of the locations they point to. So by
passing pointers as function arguments we can bypass the problem of the 'call by value' technique.

5.6 POINTERS AND ARRAYS, POINTER ARITHMETIC


One of the striking features of C pointers is their relationship with arrays. In fact, an array
reference is converted to a pointer expression in C. Recall that an array name is a constant, which
holds the address of the first element of the array. To get a clear view, let us consider the
following definitions:
int *px;
int x[5];
This implies that px is a pointer variable that can hold the address of an integer, and x is
an array of five integer values. The elements of the array are
x[0], x[1], x[2], x[3], x[4]
and x is the array name, which is a constant and holds the address of x[0]. Pictorially, it may be
represented as Fig. 5.2.

X —» x[0] x[l] x[2] x[3] x[4]

Fig. 5.2 x is the pointer constant pointing to x[0]


So the assignments
px=&x[0];
and
px=x;
are equivalent.
Now, if we write the statement

px = px+1; or px++;

it will advance px by 1, meaning px will now point to the next element of the array, that is, x[1].
More generally, if px points to x[0], then (px+i) will point to the element that is i elements after
px and this is true regardless of the size of the elements of the array. This means that *(px+i) is
identical to x[i]. In fact, *(x+i) is the same as x[i], since x is holding the address of x[0].
To visualize this fact we need to understand pointer arithmetic a little more clearly. When an
integer is added to or subtracted from a pointer, the scalar size of the pointer comes into use. In
fact, this integer is scaled by the scalar size. That is, in terms of byte addresses, the compiler
treats the expression

*(px+i)
as
*(px + (i * scalar size of px))
= *(px + i * sizeof(int))
= x[i]
This indicates that (px+i) points to the element that is i elements after the first if px points to
the first element of the array, irrespective of the type of the array elements.
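The scaling can be observed directly by measuring the distance in bytes between px and px+i. The sketch below is an illustration; the names byte_offset and sum_ints are assumptions. The pointers are converted to char pointers only to measure the distance in bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Number of bytes between px and px + i */
size_t byte_offset(const int *px, int i)
{
    assert(i >= 0);
    return (size_t)((const char *)(px + i) - (const char *)px);
}

/* Adds up n integers, reaching each one as *(px + i) */
int sum_ints(const int *px, int n)
{
    int i, total = 0;

    for (i = 0; i < n; i++)
        total += *(px + i);    /* Same element as px[i] */
    return total;
}
```

For an int array x, byte_offset(x, 3) equals 3 * sizeof(int) on any machine, whatever the value of sizeof(int) happens to be.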
We should note that a statement like

x = px;

is an illegal statement as x is a constant. Like addition, an integer may be subtracted from a
pointer to make it point to an element before it. We should also note that two pointers pointing to
elements of the same array may be subtracted to yield the distance between the elements they point
to. These features of pointers are known as address arithmetic in C.
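Subtraction of two pointers can be illustrated with a small search function. The name index_of is an assumption of this sketch; the difference p - x is measured in elements, not in bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the index of the first element of x[0..n-1] that is */
/* equal to value, or -1 if there is no such element.          */
ptrdiff_t index_of(const int *x, int n, int value)
{
    const int *p;

    assert(n >= 0);
    for (p = x; p < x + n; p++)
        if (*p == value)
            return p - x;    /* Distance between the two elements */
    return -1;
}
```

Both p and x point into the same array here, which is the situation in which pointer subtraction is meaningful.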
As mentioned in the earlier section, when an array is passed as a parameter
to a function, what is passed is the address of the first element of the array. Example 5.1,
a program for arranging an integer array by using a function, is presented to illustrate this fact.
In this example we call the bubsort() function to perform the array sorting. It receives
the array to be sorted together with the array size and sorts the array within the function. Notice
that the array elements may be changed inside the function. Another version of the same
problem is presented in Example 5.2, which uses pointers instead of an array.
Example 5.1: Arranging an integer array in ascending order of sequence.

/* The following program uses Bubble Sorting technique */

#include <stdio.h>
#define SIZE 10

void bubsort(int x[], int limit);

int main(void)
{
    int i, a[SIZE];

    for (i = 0; i < SIZE; i++)
        scanf("%d", &a[i]);
    bubsort(a, SIZE);
    for (i = 0; i < SIZE; i++)
        printf("%d%c", a[i], (i+1 < SIZE) ? '\t' : '\n');
    return 0;
}

void bubsort(int x[], int limit)
{
    int bound, i, temp, flag;

    bound = limit-1;
    do
    {
        flag = 0;                 /* index of the last exchange in this pass */
        for (i = 0; i < bound; i++)
            if (x[i] > x[i+1])
            {
                temp = x[i];
                x[i] = x[i+1];
                x[i+1] = temp;
                flag = i;
            }
        bound = flag;             /* elements beyond flag are already in order */
    } while (bound);
    return;
}
The version of the program given in Example 5.2 needs a little discussion. The program
receives n (the number of elements to sort) from the user.

Example 5.2: Arranging a set of integers in ascending order of sequence.

/* The following program uses Bubble Sorting technique */
/* The program uses pointers instead of arrays */

#include <stdio.h>
#include <stdlib.h>              /* declares calloc() and free() */

void bubsort(int *x, int limit);

int main(void)
{
    int i, n, *a;

    printf("Enter number of elements (integers) :");
    scanf("%d", &n);
    a = (int *) calloc(n, sizeof(int));
    if (a == NULL)               /* the space could not be reserved */
        return 1;
    for (i = 0; i < n; i++)
        scanf("%d", a+i);
    bubsort(a, n);
    for (i = 0; i < n; i++)
        printf("%d%c", a[i], (i+1 < n) ? '\t' : '\n');
    free(a);
    return 0;
}

void bubsort(int *x, int limit)
{
    int bound, i, temp, flag;

    bound = limit-1;
    do {
        flag = 0;
        for (i = 0; i < bound; i++)
            if (*(x+i) > *(x+i+1)) {
                temp = *(x+i);
                *(x+i) = *(x+i+1);
                *(x+i+1) = temp;
                flag = i;
            }
        bound = flag;
    } while (bound);
    return;
}
The statement
a = (int *) calloc(n, sizeof(int));
is new to us. The call to the function calloc(a, b), which is a library function, reserves a
memory area for storing a elements, where the size of each element is b bytes, and returns a
pointer to this area. The reserved area is initialized to zero.
This pointer is of type pointer to void. To convert it to a pointer to integer we write
(int *) calloc(a, b);
which is known as casting. This function is used to allocate storage space dynamically (at
runtime). Another function that may be used to allocate storage space dynamically is the malloc()
function. This also returns a pointer to void. A call to this function looks like
malloc(a)
It reserves a bytes in memory and returns a pointer to this memory area. Both functions return
a null pointer if the requested space cannot be reserved.
Check the first 'for' statement in the example, which reads n integer values from the standard
input device and stores them in the locations pointed to by
a, a+1, a+2, ..., a+n-1
The function bubsort() receives the address of the first element and the number of elements
to sort. The rest of the program is self-explanatory, except for the function call free(a). This
function is used to deallocate the storage space pointed to by a, which was reserved earlier by
calloc or malloc.
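The allocate, use and release cycle described above may be sketched as follows. The function name make_sequence is hypothetical and is used only for illustration; the library calls are the standard calloc() and free() declared in stdlib.h.

```c
#include <stdlib.h>

/* Allocate n integers and fill them with 1, 2, ..., n.
   Returns NULL if n is not positive or the allocation fails. */
int *make_sequence(int n)
{
    int *a, i;

    if (n <= 0)
        return NULL;
    a = (int *) calloc((size_t) n, sizeof(int));   /* zero-filled block */
    if (a == NULL)
        return NULL;
    for (i = 0; i < n; i++)
        *(a + i) = i + 1;        /* same as a[i] = i + 1 */
    return a;                    /* the caller must release it with free() */
}
```

A caller would test the returned pointer against NULL before using it and call free() on it exactly once when the array is no longer needed.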

5.7 POINTERS AND TWO-DIMENSIONAL ARRAYS


In the last chapter we discussed two-dimensional arrays and saw their usefulness, especially
when working with tabular data. If we want to define a two-dimensional array for storing the
names of six different programming languages, we would define it as
char lang[6][8];
The above definition states that there are six elements in the array, each of which may
hold up to eight characters. The initialization of a two-dimensional array was discussed
in the last chapter, and to do this initialization we would write
static char lang[6][8] = {
    "FORTRAN", "BASIC", "COBOL", "PASCAL", "C", "Ada"
};
The first dimension 6 of this two-dimensional array says that there are six elements (rows) in
the array and the second dimension 8 says that each element may have at most eight elements
(columns). We have selected 8 because the longest name, FORTRAN, has seven characters, plus
one more for the NULL character which is the terminator of any string. At the time of
initialization we have declared the array as of static storage class, which may not be required
by some compilers (ANSI C allows auto storage class arrays to be initialized). To establish how
pointers are related to two-dimensional arrays we first make use of the program in Example 5.3,
which is simple enough and uses no pointers.

Example 5.3: Program to print each of the programming languages together with their memory
address.

#include <stdio.h>

#define MAX 6
#define LEN 10

int main(void)
{
    static char lang[MAX][LEN] = {
        "FORTRAN", "BASIC", "COBOL", "PASCAL", "C", "Ada"
    };
    int i, j;

    for (i = 0; i < MAX; i++)
    {
        printf("\n%p ", (void *) &lang[i][0]);   /* %p expects a pointer to void */
        for (j = 0; lang[i][j]; j++)
            printf("%c", lang[i][j]);
    }
    return 0;
}
Output: The output of this example program on our system is

00A8 FORTRAN
00B2 BASIC
00BC COBOL
00C6 PASCAL
00D0 C
00DA Ada

Note that the above program uses two 'for' loops to display the characters in the array. In
the first pass through the outer loop, the first printf() displays the address of lang[0][0];
control then falls into the j loop, which prints "FORTRAN". The outer loop is then executed
again, and this continues until all the language names are displayed.
Let us now modify the above program a little by removing the inner loop, which is controlled
by j, and adding a %s (string) conversion character to the printf() function. After this
modification the 'for' statement looks like
for (i = 0; i < MAX; i++)
    printf("\n%p %s", (void *) &lang[i][0], (char *) (lang+i));
In this case, the %p in printf() prints the addresses as before, but %s in printf()
prints a string from the given address lang+i. As we know, pointer arithmetic is always scaled
by the scalar size of the data item being pointed to, and lang holds the first address of a
two-dimensional array, whose elements are its rows. Note that lang+i will be scaled as
lang + i*(scalar size of the two-dimensional array)
The scalar size for a two-dimensional array is the size of each element multiplied by the
second dimension in the array definition. So, for the definition char lang[6][8], the address
lang+i is computed as
(address held in lang) + i*(sizeof(char)*8)
= (address held in lang) + 8i
In Example 5.3 the second dimension is LEN, that is 10, so the address lang+i in the program
increases by 10 each time i is incremented by 1. In general, for a multidimensional array
declaration like
type_specifier array_name[d1][d2] ... [dn];
the scalar size is given by
scalar_size = sizeof(type_specifier) * d2 * d3 * ... * dn
To illustrate this fact, let us consider the declaration
float a[5][8][10][12];
The corresponding scalar size, assuming that a float occupies 4 bytes, is computed as
scalar_size = sizeof(float)*8*10*12
= 4*8*10*12
= 3840
It is to be noted that the first dimension is not used to determine the scalar size.
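The scaling rule may be verified with a small sketch. The names ROWS, COLS, row_stride and row_start are assumptions made for this illustration; the array is the lang[6][8] array defined earlier in this section.

```c
#include <stddef.h>

#define ROWS 6
#define COLS 8

static char lang[ROWS][COLS] = {
    "FORTRAN", "BASIC", "COBOL", "PASCAL", "C", "Ada"
};

/* Number of bytes between row i and row i+1 of lang. */
ptrdiff_t row_stride(int i)
{
    return (char *) (lang + i + 1) - (char *) (lang + i);
}

/* Address of the first character of row i, reached through lang + i. */
char *row_start(int i)
{
    return *(lang + i);          /* same as lang[i] */
}
```

Whatever the value of i, row_stride(i) equals sizeof(char) * COLS, which shows that only the second dimension enters the scalar size.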

5.8 ARRAY OF POINTERS


Before going straight to the array of pointers we must understand the difference between a
character array and a character pointer. Consider the following definitions:
char myarray[] = "array";
and
char *yourptr = "pointer";
In the former definition myarray is a (one-dimensional) array of characters. The size
of the array is set automatically to the number of characters in the initializing string plus 1,
and may be depicted as in Fig. 5.3.

myarray (an array name):   | a | r | r | a | y | \0 |

Fig. 5.3 Memory map of myarray

The ith element may be referred to as myarray[i]. The name myarray holds the address of
the character 'a', the first element of the array. In the latter definition the meaning is
completely different. Here the string constant "pointer" is pointed to by the pointer variable
yourptr. Pictorially, it may be shown as in Fig. 5.4.

yourptr (a pointer variable) ---> | p | o | i | n | t | e | r | \0 |

Fig. 5.4 Memory map of yourptr



It should be appreciated that the assignment
yourptr = myarray;
will just make yourptr point to the first element of the array myarray, and is not a string copy.
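The difference between the two definitions, and the effect of the assignment, may be observed with the sizeof operator. The following is only a sketch; the helper names array_bytes, pointer_bytes and repoint are ours.

```c
#include <stddef.h>

char myarray[] = "array";        /* six bytes: 'a','r','r','a','y','\0' */
char *yourptr = "pointer";       /* a pointer to a string constant */

/* Size of the array object itself: the characters plus the terminator. */
size_t array_bytes(void)
{
    return sizeof(myarray);
}

/* Size of the pointer variable, which does not depend on the string. */
size_t pointer_bytes(void)
{
    return sizeof(yourptr);
}

/* Make yourptr point into myarray; no characters are copied. */
void repoint(void)
{
    yourptr = myarray;
}
```

After repoint() both names refer to the same six bytes, so a change made through myarray[0] is visible through yourptr[0] as well.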
At this point we can start our discussion on arrays of pointers. Clearly, an array of pointers is
nothing but an ordinary array each of whose elements is a pointer. We may define an array aop
of pointers as
char *aop[5];
In this case, each element of the array is a pointer to a character. The array elements are
aop[0], aop[1], aop[2], aop[3] and aop[4]
and, as we know, the array name aop holds the address of aop[0], the first element
of the array.
An array of pointers may also be initialized at the time of its definition, as below.
char *aop[5] = {
    "BASIC",
    "COBOL",
    "FORTRAN",
    "PASCAL",
    "Ada"
};
In the case of such a definition, the element aop[0] will hold the address of the string constant
"BASIC", aop[1] will hold the address of "COBOL", and so on. Pictorially, it may be viewed as
shown in Fig. 5.5.

aop[0] ---> | B | A | S | I | C | \0 |
aop[1] ---> | C | O | B | O | L | \0 |
aop[2] ---> | F | O | R | T | R | A | N | \0 |
aop[3] ---> | P | A | S | C | A | L | \0 |
aop[4] ---> | A | d | a | \0 |

Fig. 5.5 Memory map of aop

An example program is given in Example 5.4 to illustrate the working principles of an
array of pointers. This example program displays the strings that are pointed to by the array
elements. In the definition of the array the size is not mentioned and is set automatically
(since the array is initialized). To get this size we can make use of the sizeof operator
discussed earlier.

Example 5.4: Program to display the strings pointed by the array elements of an array of pointers.

#include <stdio.h>

int main(void)
{
    char *direction[] = {
        "North",
        "East",
        "West",
        "South"
    };
    int i;
#define SIZE ((int) (sizeof(direction) / sizeof(direction[0])))

    for (i = 0; i < SIZE; i++)
        printf("%s%c", direction[i], (i < SIZE-1) ? ' ' : '\n');
    return 0;
}
In the above program note that the printf () statement prints a blank character after
printing North, East, and West but a newline ('\n') after printing South.
Another interesting program is presented in Example 5.5 which prints only the first character of
each of the strings.
Example 5.5: Program to display the first character of the strings pointed by the array elements
of an array of pointers.

#include <stdio.h>

int main(void)
{
    char *direction[] = {
        "North",
        "East",
        "West",
        "South"
    };
    int i;
#define SIZE ((int) (sizeof(direction) / sizeof(direction[0])))

    for (i = 0; i < SIZE; i++)
        printf("%c", direction[i][0]);
    return 0;
}

From Fig. 5.5 it is obvious that *aop gives the pointer to the string "BASIC". Since aop is
an array, (aop+i) is the address of aop[i], and hence *(aop+2) will give the pointer to "FORTRAN".
So (*(aop+2))[0] will give the character F, the first character of "FORTRAN". The parentheses
around *(aop+2) are needed because [] binds more tightly than *. The program in Example 5.6 is
written using this concept of pointers and is a combined form of the last two programs.
Example 5.6: A pointer version of the programs in Example 5.4 and Example 5.5.

#include <stdio.h>

void disp_array(int n, char *ptrarr[]);
void disp_1st_char(int n, char *ptrarr[]);

int main(void)
{
    char *direction[] = {
        "North",
        "East",
        "West",
        "South"
    };
#define SIZE ((int) (sizeof(direction) / sizeof(direction[0])))

    disp_array(SIZE, direction);
    printf("\n =======> ");
    disp_1st_char(SIZE, direction);
    return 0;
}

void disp_array(int n, char *ptrarr[])
{
    while (n-- > 0)
        printf("%s%c", *ptrarr++, (n > 0) ? ' ' : '\n');
    return;
}

void disp_1st_char(int n, char *ptrarr[])
{
    while (n-- > 0)
        printf("%c", (*ptrarr++)[0]);
    printf("\n");
    return;
}

5.9 POINTERS TO POINTERS


Consider again Fig. 5.5, where aop is an array name and holds the address of the first element of
an array, each element of which is itself a pointer to a character. Now since aop is a constant we
may want to store it in a variable for some purpose. To do so, how should the variable be defined?
The answer is simple. It is to be stored in a pointer variable which points to a pointer to a
character. This suggests the definition of a variable pop as below.
char **pop;
With such a definition we can safely write an assignment statement like
pop = aop;
Note that pop is nothing but a pointer to a pointer.
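A minimal sketch of how such a pointer to a pointer may be used is given below. The array aop is the one of Fig. 5.5; the function names nth_string and nth_char are hypothetical and serve only this illustration.

```c
char *aop[5] = { "BASIC", "COBOL", "FORTRAN", "PASCAL", "Ada" };

/* Return the i-th string by stepping a pointer to pointer. */
char *nth_string(char **pop, int i)
{
    return *(pop + i);           /* pop + i is the address of the i-th element */
}

/* Return the j-th character of the i-th string. */
char nth_char(char **pop, int i, int j)
{
    return *(*(pop + i) + j);    /* same as pop[i][j] */
}
```

Unlike the array name aop, the variable pop may itself be changed; after pop = aop, the statement pop++ makes pop point to aop[1].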

5.10 POINTERS TO FUNCTIONS


As we know, the value returned by a function is obtained by writing a call with the name of the
function. Actually, a function name by itself refers to a memory location, the address at which
the code of the function begins. So in C, we may have the concept of a pointer to a function also.
In fact, such a pointer may be passed as an argument to another function.
To define a variable (say ptrtofn) as a pointer to a function which returns a value of type
type_name we write
type_name (*ptrtofn)();
The parentheses around *ptrtofn are necessary; without them the declaration would describe a
function returning a pointer to type_name. To illustrate the use of such pointers we present a
program in Example 5.7. The program reads two integers from the standard input device and the
operation to be performed on these operands, performs the operation by calling a function that
takes a pointer to a function as an argument, and finally prints the result to the standard
output device. The program is self-explanatory.
Example 5.7: Program to simulate a rudimentary calculator by using pointers to functions.

#include <stdio.h>
#include <stdlib.h>              /* declares exit() */

double operate(double (*pf)(int, int), int a, int b);
double add(int p, int q);
double subtract(int p, int q);
double mul(int p, int q);
double divide(int p, int q);

int main(void)
{
    int x, y, operation;
    double result;

    printf("Enter two integers :");
    scanf("%d %d", &x, &y);
    operation = getchar();
    /* Throw away character in keyboard buffer */
    printf("Choose an operation (+,-,*,/) ");
    operation = getchar();
    switch (operation)
    {
    case '+' : result = operate(add, x, y);
               break;
    case '-' : result = operate(subtract, x, y);
               break;
    case '*' : result = operate(mul, x, y);
               break;
    case '/' : result = operate(divide, x, y);
               break;
    default  : printf("You entered a bad operator \n");
               exit(1);
    }
    printf("The result is = %g\n", result);
    return 0;
}

double operate(double (*pf)(int, int), int a, int b)
{
    double value;
    value = (*pf)(a, b);         /* Basically a function call */
    return value;
}
double add(int p, int q)
{
    return (double) (p+q);
}
double subtract(int p, int q)
{
    return (double) (p-q);
}
double mul(int p, int q)
{
    return (double) (p*q);
}
double divide(int p, int q)
{
    /* integer division: the quotient is truncated, and q must not be 0 */
    return (double) (p/q);
}
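As a variation on the same idea, the switch statement may be replaced by a table of function pointers. The sketch below is only one possible arrangement; the type name binop and the functions lookup and apply are our own assumptions, and division here is carried out in floating point rather than on integers.

```c
#include <stddef.h>

typedef double (*binop)(int, int);   /* pointer to a function of two ints */

static double add_op(int p, int q) { return (double) p + q; }
static double sub_op(int p, int q) { return (double) p - q; }
static double mul_op(int p, int q) { return (double) p * q; }
static double div_op(int p, int q) { return (double) p / q; }

/* Map an operator character to its function, or NULL if it is unknown. */
binop lookup(int op)
{
    static const struct { int symbol; binop fn; } table[] = {
        { '+', add_op }, { '-', sub_op }, { '*', mul_op }, { '/', div_op }
    };
    size_t i;

    for (i = 0; i < sizeof(table) / sizeof(table[0]); i++)
        if (table[i].symbol == op)
            return table[i].fn;
    return NULL;
}

/* Apply the operator; the caller is expected to rule out b == 0 for '/'. */
double apply(binop pf, int a, int b)
{
    return (*pf)(a, b);
}
```

With this arrangement a new operation is added by writing one function and one table entry, while the code that calls apply() remains unchanged.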

5.11 COMMAND LINE ARGUMENTS


So far we have dealt with many C programs, all of which have no arguments in the main() function.
As a matter of fact, the function main() may have two arguments, traditionally written as
argc and argv. These two arguments of function main() are useful when we want to pass the
arguments supplied in the command line to the program. The argument argc is an integer parameter
while the parameter argv is an array of pointers. Each element of this array points to the first
character of a string. The values of these arguments are set automatically at the time of
execution. Consider that we have a program whose executable module is named flush, which displays
the arguments that appear in the command line (other than the program name).
That is, if we issue the command
flush Dear I Always Remember You
the program flush will display
Dear I Always Remember You
On execution of the program flush, the function main() will get the value of argc as 6, and an
array argv of argc (in this case 6) pointers will be created automatically. The pictorial
view of the argv array will look like that in Fig. 5.6.

argv[0] ---> | f | l | u | s | h | \0 |
argv[1] ---> | D | e | a | r | \0 |
argv[2] ---> | I | \0 |
argv[3] ---> | A | l | w | a | y | s | \0 |
argv[4] ---> | R | e | m | e | m | b | e | r | \0 |
argv[5] ---> | Y | o | u | \0 |

Fig. 5.6 Pictorial view of argv array

A program to achieve this is listed below in Example 5.8. As an additional task it
also displays the first characters of all these arguments. It treats the array elements as pointers.
To be precise, it is a pointer version of the program and is essentially the same as the program
listed in Example 5.6, except for a few changes.
Example 5.8: The C code for the program flush.

#include <stdio.h>

void flush_array(int n, char *ptrarr[]);
void flush_1_array(int n, char *ptrarr[]);

int main(int argc, char *argv[])
{
    /* skip argv[0], which is the program name */
    flush_array(argc-1, argv+1);
    printf("\n=====>");
    flush_1_array(argc-1, argv+1);
    return 0;
}

void flush_array(int n, char *ptrarr[])
{
    while (n-- > 0)
        printf("%s%c", *ptrarr++, (n > 0) ? ' ' : '\n');
    return;
}

void flush_1_array(int n, char *ptrarr[])
{
    while (n-- > 0)
        printf("%c", (*ptrarr++)[0]);
    printf("\n");
    return;
}

Another program is presented in Example 5.9, which receives a date in the format dd-mm-yyyy
from the command line and checks whether it is a valid date or not. This program is not
only an example of command line arguments, but also covers many aspects relating to pointers
in C. Note that the function convert receives a parameter ptrarr which is a pointer to a pointer
to character. This function highlights the way of changing the values of variables by passing
pointers to those variables.
Example 5.9: The C code to check the validity of date given in command line.

#include <stdio.h>

void convert(char **ptrarr, int *pd, int *pm, int *py);
int valid(int d, int m, int y, int leap);

int main(int argc, char *argv[])
{
    int d, m, y;
    int leap;

    if (--argc > 0)
    {
        convert(++argv, &d, &m, &y);
        leap = (y % 4 == 0 && y % 100 != 0) || y % 400 == 0;
        printf("The date %s is %s \n", *argv,
               (valid(d, m, y, leap)) ? "valid" : "not valid");
    }
    else
        printf("Usage :: VALIDATE <dd-mm-yyyy>\n");
    return 0;
}

void convert(char **ptrarr, int *pd, int *pm, int *py)
{
    char *curptr, c;
    int n;

    curptr = *ptrarr;            /* Points to date string */
    for (n = 0; ((c = *curptr) >= '0' && c <= '9'); curptr++)
        n = 10*n + (c-'0');
    *pd = n;
    if (*curptr)                 /* skip the separator, but never the terminator */
        ++curptr;
    for (n = 0; ((c = *curptr) >= '0' && c <= '9'); curptr++)
        n = 10*n + (c-'0');
    *pm = n;
    if (*curptr)
        ++curptr;
    for (n = 0; ((c = *curptr) >= '0' && c <= '9'); curptr++)
        n = 10*n + (c-'0');
    *py = n;
    return;
}

int valid(int d, int m, int y, int leap)
{
    if (d <= 0 || d > 31 || m <= 0 || m > 12 || y <= 0 || y > 3000)
        return 0;
    if ((m == 4 || m == 6 || m == 9 || m == 11) && d > 30)
        return 0;
    if (m == 2 && d > 29)
        return 0;
    if (m == 2 && leap == 0 && d > 28)
        return 0;
    return 1;
}
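The conversion of the date string may also be sketched with the library function sscanf() in place of the hand-written digit loops. This is an alternative of our own, not the method of Example 5.9; the function names parse_date and is_leap are hypothetical.

```c
#include <stdio.h>

/* Parse a date written as dd-mm-yyyy.
   Returns 1 and fills *pd, *pm, *py on success, 0 otherwise. */
int parse_date(const char *text, int *pd, int *pm, int *py)
{
    int d, m, y;
    char extra;

    /* exactly three integers separated by '-' and nothing after them */
    if (sscanf(text, "%d-%d-%d%c", &d, &m, &y, &extra) != 3)
        return 0;
    *pd = d;
    *pm = m;
    *py = y;
    return 1;
}

/* The leap-year rule: divisible by 4 but not by 100, or divisible by 400. */
int is_leap(int y)
{
    return (y % 4 == 0 && y % 100 != 0) || y % 400 == 0;
}
```

As in the function convert, the results are delivered through the pointers pd, pm and py; the range checks on day, month and year still have to be made separately.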

In this chapter we have not discussed how pointers are related to structures. Lastly,
we make some final observations as follows:
(i) As a single fixed-size data item, a pointer provides a homogeneous method of referencing
any data structure regardless of the structure's type or complexity.
(ii) In some instances, pointers permit faster insertion and deletion of elements to and
from a data structure.

EXERCISES


1. Write a function to determine the length of a string of characters that is entered by the user
from a standard input device.
2. Consider the following pointer definitions:
(i) int (*ptr) [10] ;
(ii) int *ptr [10 ] ;
How do the definitions differ?

3. What is the difference between an array name and a variable defined as a pointer?
4. Write a program to read a group of input lines, each containing one word. The program
should print each word that appears in input and the number of times it appeared.
5. Write a function strlast(str1, str2) which returns 1 if the string str2 occurs at the
end of the string str1, otherwise the function returns 0.
6. Write a function strsearch which receives two character pointers as arguments to it and
returns a character pointer. The function searches the first string to see whether the second
string appears in it. If it is so, it returns a pointer to where the second string is in the first
string, otherwise, it returns a null pointer.
7. Write a program to read an integer (of at most nine digits) and print it in words.
STACKS AND QUEUES

In the earlier chapters we have seen the primitive data structures that are available in C. We
have also looked through arrays and strings. The array data structure is implemented with
storage structure as memory, while string data structure is implemented with arrays as their
storage structures. There are many other simple but important data structures. These are simple
because they can be implemented using arrays as their storage structures. Other implementa-
tions of these data structures are also possible. In this chapter, we have considered two such
important data structures, stacks and queues.

6.1 INTRODUCTION TO STACK


Consider the problem of reading 50 integers and printing them in reverse order. This problem
inherently requires the numbers read to be pushed into an array one by one. Then the numbers are
to be retrieved and displayed one by one starting from the end of the array. This means the
number read last is displayed first and the number read first is displayed last. This type of
last-in-first-out or last-come-first-serve processing is inherent to a wide variety of
applications. Consequently, an abstract data structure that embodies this idea is of great
importance. This last-in-first-out (LIFO) or last-come-first-serve (LCFS) data structure is
called a stack.
For illustration, we consider a slightly more serious problem. We know that the binary
representation of a data item in memory plays an important role while storing data. Specifically,
a positive integer is stored in memory in its base-two (binary) representation. So while storing
a positive integer, the integer in decimal has to be converted to binary. A simple algorithm to
convert from decimal to binary divides the integer by 2 repeatedly until the quotient becomes
0 (zero), and then takes the remainders generated at each step in reverse order. For
example, for the integer 46 in decimal, the binary equivalent is 101110 (see Fig. 6.1).

Division       Quotient   Remainder

46 / 2         23         0
23 / 2         11         1
11 / 2          5         1
 5 / 2          2         1
 2 / 2          1         0
 1 / 2          0         1

Binary equivalent, taking the remainders in reverse order: 101110

Fig. 6.1 Binary equivalent of integer 46


[Figure: trace of Algorithm 6.1 for the integer 46, with the columns Computation, Remainder, Stack and Output]

Fig. 6.2 A trace of Algorithm 6.1

This example clearly shows that to convert a decimal positive integer to its binary equiva-
lent, the remainder (after dividing by 2) is to be taken on last-generate-first-take basis which is
nothing but like a stack. Now we define a stack formally. A stack is a list or sequence of data
items in which all insertions and deletions take place at one end, called the top of the stack. With
this background we can now write an algorithm that converts a decimal positive integer to its
binary equivalent and display it. One such algorithm is given below.
Algorithm 6.1: Decimal to binary conversion algorithm
/* Algorithm to convert a decimal positive integer to its equivalent binary form and display it */
Step 1: Create a stack of remainders.
Step 2: While (Number is not zero) perform the following:
    (i) Compute Remainder when Number is divided by 2.
    (ii) Push Remainder on top of the stack of remainders.
    (iii) Replace Number by the quotient of the division (Number/2).
Step 3: While (the stack of remainders is not empty) perform the following:
    (i) Remove the remainder at stack top.
    (ii) Display remainder.
A trace of the above algorithm is given in Fig. 6.2. The arrows in the figure indicate the
stack top.
Operations on stack: The decimal to binary conversion algorithm given above suggests that
we need to have four operations on stack which are the four basic operations that can be per-
formed. These are given below.
(1) create: creates an empty stack.
(2) push: inserts an item onto the top of the stack.
(3) pop: removes the item from stack top.
(4) empty: determines whether a stack is empty.
Each of these operations is implemented by a C function. Consider that a stack of remainders
named stack is defined. Then we may have the following functions in C that implement the
above basic operations.
(i) create(&stack)        /* Creates an empty stack called stack */
(ii) push(&stack, item)   /* Pushes item into stack at the top */
(iii) pop(&stack)         /* Removes and returns the item at stack top */
(iv) empty(stack)         /* Returns non-zero value if stack is empty, otherwise returns zero */
If we assume that the above functions are available, the C code for Algorithm 6.1 may
simply be written as

create(&stack);
while (number)
{
    remainder = number % 2;
    push(&stack, remainder);
    number /= 2;
}
printf("Equivalent binary representation is ");
while (!empty(stack))
{
    remainder = pop(&stack);
    printf("%d", remainder);
}
putchar('\n');
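For readers who wish to try the algorithm before the stack functions are written, the following self-contained sketch keeps the stack of remainders in a local array. The names to_binary and MAXBITS are assumptions made for this illustration, and the result is written into a character buffer instead of being displayed.

```c
#define MAXBITS 64               /* assumed upper bound on the number of digits */

/* Write the binary form of number into out (at least MAXBITS+1 bytes).
   Returns the number of digits written. */
int to_binary(unsigned long number, char *out)
{
    int stack[MAXBITS];          /* the stack of remainders */
    int top = -1;                /* -1 means the stack is empty */
    int n = 0;

    if (number == 0)
        stack[++top] = 0;        /* Algorithm 6.1 itself would print nothing for 0 */
    while (number != 0 && top < MAXBITS - 1) {
        stack[++top] = (int) (number % 2);   /* push the remainder */
        number /= 2;
    }
    while (top >= 0)
        out[n++] = (char) ('0' + stack[top--]);   /* pop and emit */
    out[n] = '\0';
    return n;
}
```

Here the push is the statement stack[++top] = ... and the pop is the expression stack[top--]; the integer 46 yields the string 101110.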

6.2 ARRAY IMPLEMENTATION OF STACKS


As we have already seen, the primary step in implementing a data structure is to choose a proper
storage structure. In our case, since a stack is nothing but a sequence of data elements, we can
safely choose an array as its storage structure. Each element of the stack will occupy one array
element. Let us now try to visualize how a stack looks after storing the remainders of our decimal
to binary conversion algorithm after, say, four steps for the integer 46. We consider our stack to
be the array stack[] with its top at position zero (0). After pushing four remainders the stack
will look as in Fig. 6.3.

stack[0]            1   <--- stack top
stack[1]            1
stack[2]            1
stack[3]            0
   ...
stack[stacklimit]

Fig. 6.3 Stack configuration after pushing four remainders

The next remainder that will be generated by our algorithm for the integer 46 is 0.
To push this remainder onto the stack top we must first shift the values in stack[0] through
stack[3] down to stack[1] through stack[4], and then put the remainder 0 in stack[0]. Clearly,
as the stack grows, this shifting becomes a real overhead. On the other hand, when we
pop an element from the stack top we must shift the remaining stack elements up by one position,
so that the next element to be popped is again at the stack top. This also creates an overhead.
So now the question is how to get rid of this overhead. One simple solution is to use a variable
top to keep track of the array index where the top element of the stack is stored. In our case,
this implementation works as follows. Whenever a remainder is pushed onto the stack, first the
variable top is increased by 1 and then the remainder is stored in stack[top]. Fig. 6.4(a) shows
the stack after storing the first four remainders.

          (a)                             (b)
stack[0]   0                    stack[0]   0
stack[1]   1                    stack[1]   1
stack[2]   1                    stack[2]   1
stack[3]   1   <--- top=3       stack[3]   1
                                stack[4]   0   <--- top=4
   ...                             ...
stack[stacklimit-1]             stack[stacklimit-1]

Fig. 6.4 Revised stack

When the next remainder 0 is pushed, the variable top is increased from 3 to 4 and then
the remainder is stored in stack[4]. This is shown in Fig. 6.4(b). So in this implementation the
storage structure for the stack is an array that holds the stack elements, together with a
variable top that holds the array index of the stack top element. This structure suggests
the following declarations and definitions in C.
#define STACKLIMIT .....     /* Maximum size of stack */
typedef int elemtype;        /* Type of item in the stack */

struct stacktype {
    int top;                 /* Stack top index */
    elemtype item[STACKLIMIT];
};
struct stacktype stack;
Let us revisit the above implementation with the idea that we want to store either an
integer or a floating-point number or a string. In such a situation the typedef given above will
not be sufficient, because a stack element may be of int, float or pointer to character
type. In this case we need to take the help of the union feature of C. Moreover, we must
keep the information as to what type of element is stored at a particular array index of the
stack. This discussion suggests that we revise the above declarations and definitions as follows.
#define STACKLIMIT .........  /* Maximum size of stack */
#define INT 1
#define FLOAT 2
#define CHAR 3
struct stackitem {
    int itemtype;
    union {
        int i;
        float f;
        char *pc;
    } element;
};
struct stacktype {
    int top;
    struct stackitem item[STACKLIMIT];
};
struct stacktype stack;
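The role of the itemtype member may be illustrated by a short sketch that builds items of different types and then inspects the tag before reading the union. The helper names make_int, make_string and describe are hypothetical; the structure is the stackitem declared above.

```c
#include <stdio.h>

#define INT   1
#define FLOAT 2
#define CHAR  3

struct stackitem {
    int itemtype;                /* tells which union member is in use */
    union {
        int   i;
        float f;
        char *pc;
    } element;
};

/* Build an item holding an integer. */
struct stackitem make_int(int v)
{
    struct stackitem it;
    it.itemtype = INT;
    it.element.i = v;
    return it;
}

/* Build an item holding a pointer to a string. */
struct stackitem make_string(char *s)
{
    struct stackitem it;
    it.itemtype = CHAR;
    it.element.pc = s;
    return it;
}

/* Format an item into buf according to its tag; returns buf. */
char *describe(const struct stackitem *it, char *buf)
{
    switch (it->itemtype) {
    case INT:   sprintf(buf, "%d", it->element.i);  break;
    case FLOAT: sprintf(buf, "%g", it->element.f);  break;
    case CHAR:  sprintf(buf, "%s", it->element.pc); break;
    default:    sprintf(buf, "?");                  break;
    }
    return buf;
}
```

Only the member named by itemtype holds a meaningful value, so every reader of the union should test the tag first, as describe() does.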
With this background we can now write the C code to implement the four basic stack
operations easily. Here we should notice that when a stack is just created it is empty, and
at that time the value of the variable top should be -1 (minus one). This is because when we
push an element onto the stack we first increase top and then store the element.
To create an empty stack we must set the top member of the variable stack to -1. In
fact, this is the only thing that we need to do within the function. So if a pointer to the stack,
ptrstk, is passed as the argument to the function create, then the only statement within the
function should be
ptrstk->top = -1;
The function empty must check whether the top member of the stack passed as its argument is -1.
If that is so, it returns 1, otherwise it returns 0. Hence, the statement within empty is of the
form
return ((stack.top == -1) ? 1 : 0);
The push function must receive two arguments. One of them is a pointer to the stack
onto which an element is to be pushed and the other one is the element itself. Within the
function we must check whether the stack is already full. In such a situation
an error message must be given. Otherwise, we push the element onto the stack top. We achieve
this by first increasing the top member of the stack by 1, and then copying the element to the
top of the stack. The C code to do so is of the form
ptrstk->top++;
ptrstk->item[ptrstk->top] = element;
The pop function is just the opposite of the push function. It also requires two arguments,
one of which is a pointer to the stack from which an element is to be popped. The second
argument is a pointer (say ptrelement) to the variable which will hold the popped
element. In this case the element is first copied out and then the top member of the
stack is decreased by 1. The function must check whether the stack is empty before
popping, because popping from an empty stack is an error. The program code will look like
if (empty(*ptrstk))
    printf("Attempt to pop from an empty stack\n");
else {
    *ptrelement = ptrstk->item[ptrstk->top];
    ptrstk->top--;
}
A complete program for decimal to binary conversion using these stack functions is given
in Example 6.1.
Example 6.1

/* C program listing for converting a decimal positive integer to
   its equivalent binary form */

#include <stdio.h>
#include <stdlib.h>              /* declares exit() */
#include <ctype.h>               /* declares tolower() */

#define STACKLIMIT 100
#define INT 1
#define FLOAT 2
#define CHAR 3

struct stackitem {
    int itemtype;
    union {
        int i;
        float f;
        char *pc;
    } element;
};

struct stacktype {
    int top;
    struct stackitem item[STACKLIMIT];
};

void create(struct stacktype *ptrstk);
int empty(struct stacktype *ptrstk);
void pop(struct stacktype *ptrstk, struct stackitem *ptrinfo);
void push(struct stacktype *ptrstk, struct stackitem x);

int main(void)
{
    int number, c, dum;
    struct stackitem info;
    struct stacktype stack;

    do {
        printf("Enter the positive integer to convert :");
        scanf("%d", &number);
        create(&stack);
        while (number)
        {
            info.itemtype = INT;
            info.element.i = number % 2;
            push(&stack, info);
            number >>= 1;
        }
        printf("The equivalent binary representation is :");
        while (!empty(&stack))
        {
            pop(&stack, &info);
            printf("%d", info.element.i);
        }

        printf("\n\n Once more (Y/N) ?");
        c = getchar();           /* discard the newline left by scanf */
        if ((c = getchar()) == 'Y' || c == 'y')
            dum = getchar();     /* discard the newline after the answer */
    } while (tolower(c) == 'y');
    return 0;
}

void create(struct stacktype *ptrstk)
{
    ptrstk->top = -1;
    return;
}

int empty(struct stacktype *ptrstk)
{
    return ((ptrstk->top == -1) ? 1 : 0);
}

void pop(struct stacktype *ptrstk, struct stackitem *ptrinfo)
{
    if (empty(ptrstk))
    {
        printf("Convert : illegal attempt to pop from empty stack \n");
        exit(1);
    }
    *ptrinfo = ptrstk->item[(ptrstk->top)--];
    return;
}

void push(struct stacktype *ptrstk, struct stackitem x)
{
    if (ptrstk->top == STACKLIMIT-1)
    {
        printf("Convert : illegal attempt to push to full stack\n");
        exit(1);
    }
    ptrstk->item[++(ptrstk->top)] = x;
    return;
}
By definition, a list does not have any upper limit on the number of its members.
Hence a stack should also have no upper limit on the number of its members. So as an
abstract data structure there should not be any condition as to whether a stack is full. But as an
array is of fixed size and here we chose it as the storage structure of the stack, we have to
impose an upper limit on the number of members in the stack. Thus it is clear that this
implementation of the stack data structure is not a completely faithful representation. Later, in
Chapter 9, we will see an alternative representation of a stack using a linked list that puts no
such upper limit and is more flexible.

6.3 APPLICATION OF STACK


In this section we look at a major application of stacks that illustrates various types of
stacks and the operations on them. A programming language ordinarily allows arithmetic
expressions to be written in the form
x*y+z
The above expression is written in infix notation in the sense that the operators (binary)
are placed in between the operands. But there are other ways of writing an expression. They are
prefix and postfix notation expressions. In prefix the operator is placed before the operands and
in postfix the operator is placed after the operands.
For example, the infix expression
x*y
may be written in other forms as follows:


* x y (prefix)
x y * (postfix)
To evaluate an infix expression, many compilers first convert it into its equivalent postfix
form and then generate the code for evaluating the transformed postfix expression. If we examine
an infix expression a little carefully, we see that parentheses must be used to override the
normal priorities of the operators involved in the expression. For example, in the infix
expression 4 * (5 - 3), we gave the minus (-) operator a higher priority than the multiplication
(*) operator by introducing parentheses. If we remove the parentheses, the expression becomes
4 * 5 - 3, which is a completely different expression.
The Polish logician Jan Łukasiewicz showed in the 1920s that such parentheses are not necessary
when the operator is written before its operands (prefix, or Polish, notation). The same holds
for postfix notation, which is therefore also called Reverse Polish Notation (RPN). Because of
this fact, evaluating an expression in RPN mechanically is, in general, much easier than
evaluating an infix expression. Moreover, the conversion from infix notation to RPN is
straightforward. To illustrate this, let us choose the infix expression
4 * (5 - 3)
The infix expression is scanned in left-to-right order. When an operand is found, it is sent
to the output. For the above input, 4 is encountered first and is sent to the output immediately.
Next, the * operator is found. At this point, another operand is expected after * to which it
must be applied. So the operator must be stored, and hence * is pushed onto a stack of operators.
Note that before pushing this operator the stack was empty. In general, when an operator is
encountered it must be checked against the top stack element. The operator is pushed onto the
stack if either the stack is empty or the operator has a higher priority than the stack's top
element. In our case * is pushed onto the stack. Next an open parenthesis '(' is encountered and
is pushed onto the stack. Then the operand 5 is found and it is sent to the output. At this stage
the output and stack look like the following (the top of the stack is the rightmost symbol).
output: 4 5          stack: * (
Now the operator - is encountered. It is pushed onto the stack since the stack top symbol
'(' is assumed to have lower priority than any operator. The operand 3 is found next and
it is sent to the output directly. Now the output and stack take the following form.
output: 4 5 3        stack: * ( -
Finally, the right parenthesis ')' is encountered. When a ')' is encountered, the symbols
are popped from the stack and sent to the output until a '(' is found on the stack top. This '('
is popped from the stack but not sent to the output. After doing this the output and stack become
output: 4 5 3 -      stack: *
There is no other symbol left in the input now. At this stage, the operators are popped
from the stack and sent to the output until the stack is empty. Hence the output will look like
output: 4 5 3 - *
and the stack is empty. The output now shows the RPN expression for the given infix expression.
Notice that though there is a set of parentheses in the infix expression, it is not present in
our RPN expression. An algorithm to transform an infix notation arithmetic expression to its
equivalent RPN expression is presented below, in its general form. This algorithm may be
extended to convert logical infix expressions by incorporating the logical operators.
Algorithm 6.2: Infix to RPN conversion algorithm
/* Algorithm to transform an arithmetic infix notation expression
   to RPN expression */

Step 1: Create an empty stack of operators

Step 2: While not (any error) and not (end of infix expression) do
        the following:

        Get next token in infix expression;
        /* A token may be a constant, variable, '(', ')' or an
           arithmetic operator. */
        if (token is '(')
            push it onto stack
        else if (token is ')')
            pop stack top element and send it to output
            until a '(' is encountered. Pop this '(' but
            do not send it to output.
            (If no '(' is found and stack becomes empty
            it is an error).
        else if (token is an operand)
            send it to output
        else /* it is an operator */
            if (stack is empty or token has higher priority
                than the top stack element)
                push token onto stack
            else
                repeat
                    pop and output top stack element;
                until (stack is empty or the top stack element
                       is of lower priority than the token)
                push token onto stack;
            /* operators have higher priority than a '('
               in the stack */
Step 3: if (end of infix expression)
            pop stack elements and send to output until the
            stack is empty.
            (If a '(' is popped here it is an error.)
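
The steps of Algorithm 6.2 can be sketched compactly in C. The following is only a minimal
sketch under simplifying assumptions that are ours, not the book's: every operand is a single
letter or digit, only the operators + - * / are accepted, and the names to_rpn, prio and
OPSTACK_MAX are hypothetical. The caller is assumed to supply an output buffer of at least
twice the length of the input plus one.

```c
#include <ctype.h>
#include <stddef.h>

#define OPSTACK_MAX 128     /* hypothetical limit on pending operators */

/* Priority of a stacked symbol; '(' is lowest so no operator pops it. */
static int prio(char op)
{
    switch (op) {
    case '+': case '-': return 1;
    case '*': case '/': return 2;
    default:            return 0;   /* '(' */
    }
}

/* Convert infix (single-character operands) to blank-separated RPN.
   Returns 0 on success, -1 on a bad token, mismatched parentheses
   or operator stack overflow. */
int to_rpn(const char *infix, char *out)
{
    char stack[OPSTACK_MAX];
    int top = -1;               /* index of the stack top, -1 when empty */
    size_t n = 0;               /* number of characters written to out */

    for (; *infix != '\0'; infix++) {
        char t = *infix;

        if (isspace((unsigned char)t))
            continue;
        if (isalnum((unsigned char)t)) {            /* operand: output it */
            out[n++] = t; out[n++] = ' ';
        } else if (t == '(') {
            if (top == OPSTACK_MAX - 1) return -1;
            stack[++top] = t;
        } else if (t == ')') {                      /* unwind to the '(' */
            while (top >= 0 && stack[top] != '(') {
                out[n++] = stack[top--]; out[n++] = ' ';
            }
            if (top < 0) return -1;                 /* no matching '(' */
            top--;                                  /* discard the '(' */
        } else if (t == '+' || t == '-' || t == '*' || t == '/') {
            /* pop operators of higher or equal priority, then push t */
            while (top >= 0 && prio(stack[top]) >= prio(t)) {
                out[n++] = stack[top--]; out[n++] = ' ';
            }
            if (top == OPSTACK_MAX - 1) return -1;
            stack[++top] = t;
        } else
            return -1;                              /* unknown symbol */
    }
    while (top >= 0) {                              /* Step 3: flush stack */
        if (stack[top] == '(') return -1;           /* unmatched '(' */
        out[n++] = stack[top--]; out[n++] = ' ';
    }
    if (n > 0) n--;                                 /* drop trailing blank */
    out[n] = '\0';
    return 0;
}
```

Called as to_rpn("4*(5-3)", buf), the sketch leaves the string 4 5 3 - * in buf, which agrees
with the hand conversion worked out earlier in this section.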
To illustrate the above algorithm let us choose the infix expression

8 + (((7 - 5) * (9 - 4) + 6) / 4)
Fig. 6.5 shows the execution of Algorithm 6.2 on the above infix expression. Each row shows
the part of the infix expression still to be read (its leftmost symbol is the current token),
the output produced so far, the operator stack with its top at the right, and the action taken.

remaining infix          output                      stack          remarks

8+(((7-5)*(9-4)+6)/4)    (empty)                     (empty)        start
+(((7-5)*(9-4)+6)/4)     8                           (empty)        output 8
(((7-5)*(9-4)+6)/4)      8                           +              push +
((7-5)*(9-4)+6)/4)       8                           + (            push (
(7-5)*(9-4)+6)/4)        8                           + ( (          push (
7-5)*(9-4)+6)/4)         8                           + ( ( (        push (
-5)*(9-4)+6)/4)          8 7                         + ( ( (        output 7
5)*(9-4)+6)/4)           8 7                         + ( ( ( -      push -
)*(9-4)+6)/4)            8 7 5                       + ( ( ( -      output 5
*(9-4)+6)/4)             8 7 5 -                     + ( (          pop and output -, pop (
(9-4)+6)/4)              8 7 5 -                     + ( ( *        push *
9-4)+6)/4)               8 7 5 -                     + ( ( * (      push (
-4)+6)/4)                8 7 5 - 9                   + ( ( * (      output 9
4)+6)/4)                 8 7 5 - 9                   + ( ( * ( -    push -
)+6)/4)                  8 7 5 - 9 4                 + ( ( * ( -    output 4
+6)/4)                   8 7 5 - 9 4 -               + ( ( *        pop and output -, pop (
+6)/4)                   8 7 5 - 9 4 - *             + ( (          pop and output *
6)/4)                    8 7 5 - 9 4 - *             + ( ( +        push +
)/4)                     8 7 5 - 9 4 - * 6           + ( ( +        output 6
/4)                      8 7 5 - 9 4 - * 6 +         + (            pop and output +, pop (
4)                       8 7 5 - 9 4 - * 6 +         + ( /          push /
)                        8 7 5 - 9 4 - * 6 + 4       + ( /          output 4
end of infix             8 7 5 - 9 4 - * 6 + 4 /     +              pop and output /, pop (
end of infix             8 7 5 - 9 4 - * 6 + 4 / +   (empty)        pop and output +

Fig. 6.5 Execution of Algorithm 6.2

Fig. 6.5 shows that for the given infix expression the RPN expression takes the form
8 7 5 - 9 4 - * 6 + 4 / +
Let us take this RPN expression to illustrate the process of evaluating an RPN expression.
To do this, the RPN expression is scanned from left to right. When an operator is found,
the operator and the two operands immediately preceding it are removed from the expression, the
operator is applied to them, and the resulting value is put back into the RPN expression at the
point from which the operands and the operator were removed. Scanning is then resumed.
Finally there will be only one operand remaining in the RPN expression and there will be no
operators. This operand's value is the value of the expression. The process is carried out below
on the above expression; each line shows the expression after the leftmost remaining operator
has been applied.
8 7 5 - 9 4 - * 6 + 4 / +
8 2 9 4 - * 6 + 4 / +
8 2 5 * 6 + 4 / +
8 10 6 + 4 / +
8 16 4 / +
8 4 +
12
After evaluating the RPN expression we arrive at the value 12, which is the value of the
expression. This method suggests that in the left-to-right scan the operands must be stored until
an operator is encountered. At that point the last two operands must be retrieved so that the
operator can be applied to them. This means that the operands are retrieved in last-in-first-out
order and hence a stack should be used to store the operands. Thus each time an operand is
found it should be pushed onto a stack. When an operator is found, two operands are popped
from the stack, the operator is applied to these operands and the resulting value is pushed back
onto the stack. This process is continued and finally there will be one element in the stack,
which gives the value of the expression.
In the following a formal algorithm is presented that evaluates an RPN expression.
Algorithm 6.3: Evaluation of an RPN expression
/* Algorithm to evaluate RPN expression */
Step 1: Create an empty stack
        /* stack elements must be of the type of the operands */
Step 2: Repeat the following
            i.  Get next token from RPN expression;
                /* Token may be a constant, variable or
                   arithmetic operator */
            ii. If (token is an operand)
                    push it onto stack
                else do the following:
                    a. pop two elements from stack;
                       (If not available there is an error
                       due to malformed RPN expression)
                    b. apply the operator on these two
                       elements;
                    c. push the resulting value onto stack;
        until (end of RPN expression);
Step 3: Pop the only value on stack top and send it to
        output as the value of the expression. (For a well-formed
        expression there will be exactly one value in the stack.)
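
Algorithm 6.3 can likewise be sketched in a few lines of C. This is a minimal sketch rather
than the book's listing: it assumes that the operands are non-negative integer constants, that
an int array serves as the operand stack, and that errors are reported through the return value
instead of terminating the program. The names eval_rpn and VALSTACK_MAX are hypothetical.

```c
#include <ctype.h>

#define VALSTACK_MAX 128    /* hypothetical limit on pending operands */

/* Evaluate a blank-separated RPN string of non-negative integers and
   the operators + - * /.  On success store the value in *result and
   return 0; return -1 if the expression is malformed. */
int eval_rpn(const char *rpn, int *result)
{
    int stack[VALSTACK_MAX];
    int top = -1;                       /* -1 means the stack is empty */

    while (*rpn != '\0') {
        if (isspace((unsigned char)*rpn)) {
            rpn++;
            continue;
        }
        if (isdigit((unsigned char)*rpn)) {         /* operand: push it */
            int v = 0;
            while (isdigit((unsigned char)*rpn))
                v = v * 10 + (*rpn++ - '0');
            if (top == VALSTACK_MAX - 1) return -1;
            stack[++top] = v;
        } else {                                    /* operator */
            int a, b;
            if (top < 1) return -1;     /* two operands are not available */
            b = stack[top--];           /* right operand was pushed last */
            a = stack[top--];
            switch (*rpn++) {
            case '+': a += b; break;
            case '-': a -= b; break;
            case '*': a *= b; break;
            case '/': if (b == 0) return -1; a /= b; break;
            default:  return -1;        /* unknown symbol */
            }
            stack[++top] = a;           /* push the result back */
        }
    }
    if (top != 0) return -1;            /* exactly one value must remain */
    *result = stack[0];
    return 0;
}
```

For the expression 8 7 5 - 9 4 - * 6 + 4 / + the sketch stores 12 in *result, the same value
obtained by hand above. Note that the operand popped first is the right-hand operand, which
matters for subtraction and division.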
An execution of the above algorithm is given in Fig. 6.6 for illustration. Each row shows the
part of the RPN expression still to be read (its leftmost symbol is the current token), the
operand stack with its top at the right, and the action just taken. The same RPN expression
is considered for a good understanding of the algorithm.

remaining RPN expression      stack        remarks

8 7 5 - 9 4 - * 6 + 4 / +     (empty)      Stack is empty
7 5 - 9 4 - * 6 + 4 / +       8            Push 8
5 - 9 4 - * 6 + 4 / +         8 7          Push 7
- 9 4 - * 6 + 4 / +           8 7 5        Push 5
9 4 - * 6 + 4 / +             8 2          Pop 5 and 7, subtract and push result
4 - * 6 + 4 / +               8 2 9        Push 9
- * 6 + 4 / +                 8 2 9 4      Push 4
* 6 + 4 / +                   8 2 5        Pop 4 and 9, subtract and push result
6 + 4 / +                     8 10         Pop 5 and 2, multiply and push result
+ 4 / +                       8 10 6       Push 6
4 / +                         8 16         Pop 6 and 10, add and push result
/ +                           8 16 4       Push 4
+                             8 4          Pop 4 and 16, divide and push result
end of RPN                    12           Pop 4 and 8, add and push result
end of RPN                    12           Value of expression on stack top

Fig. 6.6 Execution of Algorithm 6.3


Example 6.2

/* C listing to convert an infix arithmetic expression
   to its equivalent RPN expression and evaluate it. */
#include <stdio.h>
#include <stdlib.h>     /* exit(), atoi() */
#include <string.h>

#define STACKLIMIT 100
#define SIZE 100
#define FALSE 0
#define TRUE 1
#define INT 1
#define FLOAT 2
#define CHAR 3

struct stackitem {
    int itemtype;
    union {
        int i;
        float f;
        char c;
    } element;
};

struct stacktype {
    int top;
    struct stackitem item[STACKLIMIT];
};

void create(struct stacktype *ptrstk);
int empty(struct stacktype s);
void pop(struct stacktype *ptrstk, struct stackitem *ptrinfo);
void push(struct stacktype *ptrstk, struct stackitem x);
int priority(char op);
void infix_to_rpn(char infix[], char rpn[]);
int evaluate(char rpn[]);
int gettoken(char s[], int *ptri, char *ptrop, char str[]);

int main(void)
{
    char rpn[2 * SIZE], infix[SIZE];    /* RPN adds a blank after each token */
    int value;

    printf("Enter the infix expression : ");
    if (fgets(infix, SIZE, stdin) == NULL)  /* read infix expression */
        return 1;
    infix[strcspn(infix, "\n")] = '\0';     /* strip the newline */
    rpn[0] = '\0';
    infix_to_rpn(infix, rpn);
    printf("\n The RPN expression is %s\n", rpn);
    value = evaluate(rpn);
    printf("\nThe expression value is = %d\n", value);
    return 0;
}

void create(struct stacktype *ptrstk)
{
    ptrstk->top = -1;
}

int empty(struct stacktype s)
{
    return (s.top == -1) ? 1 : 0;
}

void pop(struct stacktype *ptrstk, struct stackitem *ptrinfo)
{
    if (empty(*ptrstk)) {
        printf("convert: illegal attempt to pop from empty stack\n");
        exit(1);
    }
    *ptrinfo = ptrstk->item[(ptrstk->top)--];
}

void push(struct stacktype *ptrstk, struct stackitem x)
{
    if (ptrstk->top == STACKLIMIT - 1) {
        printf("convert: illegal attempt to push to full stack\n");
        exit(1);
    }
    ptrstk->item[++(ptrstk->top)] = x;
}

/* Priority of an operator; '(' has the lowest priority. */
int priority(char op)
{
    int p = 0;

    switch (op) {
    case '(': p = 0; break;
    case '+':
    case '-': p = 1; break;
    case '*':
    case '/': p = 2; break;
    }
    return p;
}

/* Translate infix[] to RPN using Algorithm 6.2; the result is
   appended to rpn[], which must hold an empty string on entry. */
void infix_to_rpn(char infix[], char rpn[])
{
    int index = 0;
    struct stacktype opstack;
    struct stackitem info;
    char tokenop, tokenstr[10];
    int token, errflag, overflag;

    errflag = overflag = FALSE;
    create(&opstack);
    token = gettoken(infix, &index, &tokenop, tokenstr);
    while (!errflag && token != -1) {
        switch (token) {
        case 1:                 /* operand: send to output */
            strcat(rpn, tokenstr);
            break;
        case 2:                 /* '(': push */
            info.itemtype = CHAR;
            info.element.c = tokenop;
            push(&opstack, info);
            break;
        case 3:                 /* ')': pop and output up to the '(' */
            overflag = FALSE;
            do {
                if (empty(opstack)) {
                    printf("Parentheses mismatch in infix expression\n");
                    errflag = TRUE;
                    exit(1);
                }
                pop(&opstack, &info);
                if (info.element.c != '(') {
                    tokenstr[0] = info.element.c;
                    tokenstr[1] = ' ';
                    tokenstr[2] = '\0';
                    strcat(rpn, tokenstr);
                } else
                    overflag = TRUE;
            } while (!overflag && !errflag);
            break;
        case 4:                 /* operator */
            overflag = FALSE;
            while (!empty(opstack) && !overflag) {
                pop(&opstack, &info);
                if (priority(tokenop) <= priority(info.element.c)) {
                    tokenstr[0] = info.element.c;
                    tokenstr[1] = ' ';
                    tokenstr[2] = '\0';
                    strcat(rpn, tokenstr);
                } else {
                    push(&opstack, info);   /* lower priority: put it back */
                    overflag = TRUE;
                }
            }
            info.itemtype = CHAR;
            info.element.c = tokenop;
            push(&opstack, info);
            break;
        }
        token = gettoken(infix, &index, &tokenop, tokenstr);
    }
    while (!empty(opstack)) {   /* end of infix: flush the stack */
        pop(&opstack, &info);
        if (info.element.c == '(') {
            printf("Parentheses mismatch in infix expression\n");
            exit(1);
        }
        tokenstr[0] = info.element.c;
        tokenstr[1] = ' ';
        tokenstr[2] = '\0';
        strcat(rpn, tokenstr);
    }
}

/* Evaluate the RPN expression in rpn[] using Algorithm 6.3. */
int evaluate(char rpn[])
{
    int index = 0;
    struct stacktype opstack;
    struct stackitem info;
    char tokenop, tokenstr[10];
    int token, val1 = 0, val2 = 0, errflag = FALSE;

    create(&opstack);
    token = gettoken(rpn, &index, &tokenop, tokenstr);
    while (token != -1) {
        if (token == 1) {
            info.itemtype = INT;
            info.element.i = atoi(tokenstr);
            push(&opstack, info);
        } else {                /* token is an operator */
            if (!empty(opstack)) {
                pop(&opstack, &info);
                val2 = info.element.i;
            } else
                errflag = TRUE;
            if (!empty(opstack)) {
                pop(&opstack, &info);
                val1 = info.element.i;
            } else
                errflag = TRUE;
            if (errflag) {
                printf("Error due to malformed RPN expression\n");
                exit(1);
            }
            switch (tokenop) {
            case '+': val1 += val2; break;
            case '-': val1 -= val2; break;
            case '*': val1 *= val2; break;
            case '/':
                if (val2 == 0) {
                    printf("Error: division by zero\n");
                    exit(1);
                }
                val1 /= val2;
                break;
            }
            info.itemtype = INT;
            info.element.i = val1;
            push(&opstack, info);
        }
        token = gettoken(rpn, &index, &tokenop, tokenstr);
    }
    if (opstack.top != 0) {
        /* if stack does not contain one element only */
        printf("Error in evaluation or malformed RPN expression\n");
        exit(1);
    }
    pop(&opstack, &info);
    return info.element.i;
}

/* Get the next token of s[] starting at position *ptri.
   Returns 1 for an operand (digits copied to str[], followed by a blank),
   2 for '(', 3 for ')', 4 for an operator (stored in *ptrop), and
   -1 at the end of the string or on an unrecognised character. */
int gettoken(char s[], int *ptri, char *ptrop, char str[])
{
    int i, j, token;
    char c;

    /* skip blanks, tabs and newlines */
    for (i = *ptri; (c = s[i]) == ' ' || c == '\t' || c == '\n'; i++)
        ;
    switch (c) {
    case '(':
        token = 2;
        *ptrop = c;
        *ptri = ++i;
        break;
    case ')':
        token = 3;
        *ptrop = c;
        *ptri = ++i;
        break;
    case '+':
    case '-':
    case '*':
    case '/':
        token = 4;
        *ptrop = c;
        *ptri = ++i;
        break;
    default:
        if (c >= '0' && c <= '9') {
            /* str[] has room for 8 digits, a blank and '\0' */
            for (j = 0; s[i] >= '0' && s[i] <= '9' && j < 8; i++)
                str[j++] = s[i];
            str[j++] = ' ';
            str[j] = '\0';
            *ptri = i;
            token = 1;
        } else
            token = -1;
        break;
    }
    return token;
}
A complete C program that receives an infix expression and then evaluates it after converting
the expression into an RPN expression is given in Example 6.2. The main program calls
two functions. The first function is infix_to_rpn, which translates an infix expression to its
equivalent RPN expression using Algorithm 6.2 described above. The function evaluate takes
the generated RPN expression and evaluates it using Algorithm 6.3. The operands of the infix
expression are assumed to be non-negative integer constants only. The program may be extended
to handle logical expressions also.

6.4 INTRODUCTION TO QUEUE


A queue is basically a waiting line, for example, a line of customers in front of a counter in a
bank. Notice that a customer joins the line at one end and the customer at the front of the
line gets the service first. After getting the service, the customer at the front leaves the line
(or bank). This indicates that customers get service on a first-come-first-served (FCFS), or
first-in-first-out (FIFO), basis. This is unlike the LIFO structure of a stack. So if we think of
a waiting line of customers as an array of customers, then one end of that array, called the rear
or tail of the queue, must be used for inserting (joining) a customer to the line, and the other
end, called the front or head of the queue, must be the point from which a customer gets the
service.
As a data structure, a queue is an ordered list of elements from which an element may be
removed at one end, called the front of the queue, and into which an element may be inserted at
the other end, called the rear of the queue.
Pictorially, a queue containing three integer values 70, 90, and 50 may look as in
Fig. 6.7(a). 70 is the element at the front of the queue. The rear of the queue is the cell just
after 50, where a new integer may be inserted. Fig. 6.7(b) shows the queue after removing two
elements from the queue. Since an element is always removed from the front, the integers
70 and 90 are removed and the integer 50 is now at the front of the queue. Fig. 6.7(c) shows the
status of the queue after inserting two new integers 60 and 80 into the queue.
Fig. 6.7 The status of queue at various stages

6.5 QUEUE IMPLEMENTATION USING ARRAYS


The above discussion suggests that we can use an array to implement a queue. Continuing our
earlier discussion, let the array size be defined by a constant MAX, and let the value of MAX
be 6. This means that at any point of time there cannot be more than six elements in our queue.
Let us also assume that we have two variables, front and rear, to indicate the front and rear of
the queue. If we now insert a new integer 85 into our previous queue, the queue will take the
form shown in Fig. 6.8. At this point rear holds the value 6 and only four integers are there in
the queue. Now if we try to insert another new integer it will not be possible to do so unless
we shift the queue elements to the left. Clearly this shifting is a slow process. A better
solution is to bring rear back to the beginning of the array instead of setting its value to MAX
in this situation. So we must treat our array as a circular one as in Fig. 6.9.

Fig. 6.8 Status of the queue after inserting 85

[Figure: the six array cells 0 to 5 drawn as a ring; cells 2 to 5 hold 50, 60, 80 and 85;
front points to cell 2 and rear wraps around to cell 0]

Fig. 6.9 Circular view of the queue in Fig. 6.8

Let us now look at the whole process from the beginning, keeping in mind that our array
is circular. The initial queue is shown in Fig. 6.10(a), where the front and rear have the same
value 0. After inserting 70, 90, and 50 the queue looks like in Fig. 6.10(b). Here front holds the
index 0 and rear holds the index 3. After removing two elements from the queue (from front of
the queue) the queue takes the form of Fig. 6.10(c). Now if we insert five more integers to the
rear of the queue, say the values 60, 80, 85, 95, 99, then the queue takes the form of Fig. 6.10(d).
In the last situation it can easily be noticed that front and rear hold the same value 2. This
creates a new difficulty: with this implementation we cannot differentiate between an empty
queue and a full queue, because in both situations front equals rear. To see this, note that if
all six integers are then removed from the front of the queue, it will take the form given in
Fig. 6.10(e), where front and rear are again equal.

Note that when front or rear holds the value (MAX - 1), the very next value that it gets is 0.
This means that after removing an element, front is changed using the assignment
front = (front + 1) % MAX;
Similarly, rear changes after inserting an element into the queue by the assignment
rear = (rear + 1) % MAX;
To avoid the anomaly between an empty queue and a full queue, we can introduce a
restriction that a queue implemented using an array of size MAX must not have more than
(MAX - 1 ) elements. Then the status of a queue is full when the condition
((rear + 1) % MAX == front)
is fulfilled. Obviously, the status of the queue is empty if
(rear == front)
is fulfilled.
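
The wrap-around assignments and the full and empty tests above can be collected into a small
sketch. It is an illustration only, assuming a queue of plain int values with MAX set to 6 as in
the discussion; the names intqueue, q_init, q_empty, q_full, q_add and q_remove are
hypothetical, and the functions return a success flag instead of terminating the program.

```c
#define MAX 6   /* array size; the queue holds at most MAX - 1 elements */

struct intqueue {
    int front;          /* index of the first element */
    int rear;           /* index just after the last element */
    int element[MAX];
};

void q_init(struct intqueue *q)
{
    q->front = q->rear = 0;
}

int q_empty(const struct intqueue *q)
{
    return q->rear == q->front;
}

int q_full(const struct intqueue *q)
{
    return (q->rear + 1) % MAX == q->front;
}

/* Insert x at the rear; returns 1 on success, 0 if the queue is full. */
int q_add(struct intqueue *q, int x)
{
    if (q_full(q))
        return 0;
    q->element[q->rear] = x;
    q->rear = (q->rear + 1) % MAX;      /* wrap around after MAX - 1 */
    return 1;
}

/* Remove the front element into *x; returns 1 on success, 0 if empty. */
int q_remove(struct intqueue *q, int *x)
{
    if (q_empty(q))
        return 0;
    *x = q->element[q->front];
    q->front = (q->front + 1) % MAX;    /* wrap around after MAX - 1 */
    return 1;
}
```

With this restriction one array cell is always left unused, which is the price paid for being
able to tell a full queue from an empty one without keeping a separate element count.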
Summing up the discussions above, to implement a queue we may use as the storage structure
a C structure containing a circular array that stores the queue elements, together with front
and rear, which hold the position of the first element and the position following the last
element of the queue, respectively.
#define MAX ....                /* Maximum size of the queue array */
typedef ......... Q_type;       /* the type of element stored in queue */
struct q_type {
    int front,
        rear;
    Q_type element[MAX];
};
struct q_type queue;
A C program that implements all basic queue operations is given in Example 6.3. The
program is a multiplication skill test program that gives you some multiplication problems.
Wrongly answered problems are queued and asked again at the end of the session.
Example 6.3
/* C listing to show all basic queue operations. This is a
   multiplication skill test program */

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define MAX 25
#define NUMBER 100

struct problem {
    int n1;
    int n2;
};

struct q_type {
    int front;
    int rear;
    struct problem element[MAX];
};

void createQ(struct q_type *ptrq);
int emptyQ(struct q_type queue);
void addQ(struct q_type *ptrq, struct problem item);
void removeQ(struct q_type *ptrq, struct problem *ptr_item);
int query(struct problem prob, int n);

int main(void)
{
    struct q_type wrong_queue;  /* queue of problems answered wrong */
    struct problem question;
    int num, wrong = 0, count = 0, wrong1 = 0, score = 0;

    printf("This will test your multiplication skill\n");
    printf("For a multiplication if you can answer correctly in the first chance\n");
    printf("you score 2 points. You will be given a second chance. For a correct\n");
    printf("answer in this chance you score 1 point.\n");
    printf("\n\n");
    printf("Enter number of questions (1 to %d) : ", MAX - 1);
    if (scanf("%d", &num) != 1 || num < 1 || num > MAX - 1) {
        printf("The number of questions must be between 1 and %d\n", MAX - 1);
        exit(1);
    }
    printf("\nNow the problems follow:\n");
    createQ(&wrong_queue);
    srand((unsigned) time(NULL));   /* standard replacement for randomize() */
    do {
        ++count;
        question.n1 = rand() % NUMBER;
        question.n2 = rand() % NUMBER;
        if (!query(question, 1)) {
            wrong++;
            addQ(&wrong_queue, question);
        } else
            score += 2;
    } while (count < num);
    if (wrong) {
        printf("You now get one more chance to answer the problems\n");
        printf("which were incorrect in the first chance\n\n");
        count = 0;
        do {
            ++count;
            removeQ(&wrong_queue, &question);
            if (!query(question, 2))
                wrong1++;
            else
                score++;
        } while (count < wrong);
    }
    printf("You made %d correct out of %d\n", num - wrong1, num);
    printf("You scored %d points.\n", score);
    return 0;
}

void createQ(struct q_type *ptrq)
{
    ptrq->front = ptrq->rear = 0;
}

int emptyQ(struct q_type queue)
{
    return queue.front == queue.rear;
}

void addQ(struct q_type *ptrq, struct problem item)
{
    int gen_rear;

    gen_rear = (ptrq->rear + 1) % MAX;
    if (gen_rear == ptrq->front) {
        printf("QUEUE : Attempt to insert in a full queue\n");
        exit(1);
    } else {
        ptrq->element[ptrq->rear] = item;
        ptrq->rear = gen_rear;
    }
}

void removeQ(struct q_type *ptrq, struct problem *ptr_item)
{
    if (emptyQ(*ptrq)) {
        printf("QUEUE : Attempt to delete from an empty queue\n");
        exit(1);
    } else {
        *ptr_item = ptrq->element[ptrq->front];
        ptrq->front = (ptrq->front + 1) % MAX;
    }
}

/* Ask one problem; n is the chance number (1 or 2).
   Returns 1 for a correct answer and 0 otherwise. */
int query(struct problem prob, int n)
{
    int response, answer;

    printf("%d * %d = ", prob.n1, prob.n2);
    if (scanf("%d", &response) != 1) {
        printf("QUEUE : could not read an answer\n");
        exit(1);
    }
    answer = prob.n1 * prob.n2;
    if (answer == response) {
        if (n == 1)
            printf("Correct ...in first chance\n");
        else
            printf("Correct ...in second chance\n");
        return 1;
    } else {
        if (n == 1)
            printf("Wrong ...in first chance\n");
        else {
            printf("Wrong ...in second chance\n");
            printf("Correct answer is %d\n", answer);
        }
        return 0;
    }
}
The program above is self-explanatory and involves all the standard queue operations.
On execution of the program, one gets the information on how to use it.

EXERCISES
1. Convert each of the following infix expressions to postfix.
(a) X - Y + Z
(b) X + Y / Z + W
(c) (X + Y) / Z + W
(d) X - (Y - (Z - (W - U)))
2. Convert each of the following postfix expressions to infix.
(a) X Y + Z -
(b) X Y + Z - W U * /
(c) X Y Z W - + *
(d) X Y / Z W / /
3. Write C functions to convert
(a) A prefix string to infix notation
(b) A postfix string to prefix notation
4. Convert the following boolean expressions to Reverse Polish Notation.
(a) X && (Y || !Z)
(b) (X || (Y && !Z) && (W || U)
(c) ((X < 7) && (X > 9)) || !(X > 0)
(d) (X != Y) && (Z != W)
5. Modify the program in Example 6.2 so that it can accept the binary operator % (mod) and
the unary operator - (minus).
6. Write an expression evaluator in C that accepts an infix expression involving logical
operators (&&, ||, !) and relational operators (<, >, <=, >=, ==, !=), converts it to RPN, and
then evaluates it to 1 or 0 depending on whether it is true or false, respectively.
7. A stack is to be implemented so that each stack member holds a list of integers. Implement
such a stack. Also write the push and pop routines that can be used on such a stack.
8. Instead of a stack, implement a queue so that each element of the queue holds a list of
integers. Write the functions addQ and removeQ for such a queue.
RECURSION

Recursive algorithms are especially useful in manipulating data structures that are themselves
defined recursively. Whenever a data object is defined recursively, it is often easy to describe
algorithms that work on these objects recursively. Although some older languages, such as early
versions of BASIC and COBOL, do not provide a recursive facility, most newer programming
languages support recursion. If our programming language does not allow recursion, that need
not matter because we can always translate a recursive program into a nonrecursive version.
The C language supports recursion directly. We will show the readers how recursion can be
implemented in C. So a discussion on recursion will be helpful to our readers for proper
understanding of the use of recursion in data structures.
In this chapter we introduce recursion, a problem-solving technique often used in computer
science. The basic objective is to provide a variety of examples along with explanations to
help the reader design and understand recursive solutions. We will confine ourselves to the
basic concept of recursive algorithms and the way they can be implemented in C.

7.1 BASIC CONCEPTS OF RECURSION


Recursion is a function invoking an instance of itself, either directly or indirectly. In C, any
function may be called recursively. Recursion is a programming technique, and some programming
tasks are naturally solved with it; it can be considered an advanced form of flow of control.
Recursion is an alternative to iteration. Recursion defines a problem in terms of itself. A
recursive solution repeatedly divides a problem into smaller subproblems until a directly
solvable subproblem is reached. Once we obtain the solution of the solvable subproblem, we feed
it back into the next larger subproblem for its solution. This process continues until we solve
the original problem. Let us clarify the basic idea of recursion.
Let the original problem be P. P is redefined in terms of subproblem P1, and P1 is again
redefined in terms of P2, and so on. Let the last subproblem be Pn. If the subproblem Pn can be
solved without further subdivision, then the solution of Pn can be used to solve the subproblem
Pn-1. This solution is in turn fed back into Pn-2, ..., P2, P1 until we finally have solved our
original problem P.
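
As a small illustration of this chain of subproblems, consider the decimal to binary conversion
that Example 6.1 performs with a stack. The binary form of n (problem P) is the binary form of
n / 2 (subproblem P1) followed by the digit n % 2, and a number smaller than 2 is directly
solvable. The sketch below is ours, not part of the book's listings; the names to_binary and
put_bits are hypothetical, and the caller is assumed to supply a buffer large enough for all
the digits and the terminating '\0'.

```c
#include <stddef.h>

/* Write the binary digits of n into buf starting at position pos and
   return the position just after the last digit written. */
static size_t put_bits(unsigned n, char *buf, size_t pos)
{
    if (n >= 2)                             /* general case: solve the  */
        pos = put_bits(n / 2, buf, pos);    /* smaller subproblem first */
    buf[pos] = (char)('0' + n % 2);         /* then append our own digit */
    return pos + 1;
}

/* Store the binary representation of n in buf as a C string. */
void to_binary(unsigned n, char *buf)
{
    size_t end = put_bits(n, buf, 0);
    buf[end] = '\0';
}
```

Here the pending calls play the role that the explicit stack plays in Example 6.1: the digit
computed first (the least significant one) is written last.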
As an example, we will now introduce some fundamental concepts of recursive programming.
The most common example is the factorial, in which
n! = n * (n-1) * (n-2) * ... * 2 * 1 = n * (n-1)! is calculated.
Here we obtain the value of n! by taking the number n and multiplying it by (n-1)!. To
obtain the value of (n-1)!, we use the same formula, substituting (n-1) for n, that is,
(n-1)! = (n-1) * (n-2)!.
Suppose we want to compute 3!. From the above discussion, we obtain the following
result
3! = 3 * (3-1)!
   = 3 * 2!
   = 3 * 2 * (2-1)!
   = 3 * 2 * 1!
   = 3 * 2 * 1 * (1-1)!
   = 3 * 2 * 1 * 0!
   = 3 * 2 * 1 * 0 * (0-1)!
   = 3 * 2 * 1 * 0 * (-1)!

Note that the asterisk symbol (*) represents multiplication. The above calculation of the
factorial reveals two major problems: (i) no termination of the recursion was provided and
(ii) because the factor 0 appears, n! would always evaluate to zero for n > 0.
To solve these problems, we need to divide the recursive definition into two cases: a base
case and a general case.
0! = 1                      /* The base case */
n! = n * (n-1)!, n > 0      /* The general case */
The base case is the non-recursive part of the definition that terminates the recursion. On the
other hand, the general case is the recursive part of the definition.
We now implement the recursive version of the factorial program in C. For ease of
understanding, we first illustrate the non-recursive version of the factorial program.
Example 7.1

/* Non-recursive version of Factorial program */

#include <stdio.h>

int Factorial(int Number);

int main(void)
{
    int Number;

    printf("\n Give the number :");
    scanf("%d", &Number);
    printf("\n Factorial of Number %d is %d \n", Number,
           Factorial(Number));
    return 0;
}

int Factorial(int Number)
{
    int Fact = 1;
    int i;

    for (i = 1; i <= Number; i++)
        Fact *= i;
    return Fact;
}
Example 7.2

/* Recursive version of Factorial N */

#include <stdio.h>

/* Recursive function */
int Factorial_Recursion(int N)
{
    if (N < 0) {
        printf("\n Negative number is entered \n");
        return -1;      /* error indicator: N! is undefined for N < 0 */
    }
    if (N == 0)
        return 1;                               /* Base case */
    else
        return N * Factorial_Recursion(N - 1);  /* General case */
}
To understand how recursion works, let us take the value of N as 3. We have labelled the two
cases by two comment statements. We regard the first call to the function
Factorial_Recursion() as being at level 1. The next recursive call will be at level 2 and so
on. To calculate the factorial of 3, we proceed as follows.

Level 1: Factorial_Recursion(3)
         return(3 * Factorial_Recursion(2))
         [from General case]

Level 2: Factorial_Recursion(2)
         return(2 * Factorial_Recursion(1))
         [from General case]

Level 3: Factorial_Recursion(1)
         return(1 * Factorial_Recursion(0))
         [from General case]

Level 4: Factorial_Recursion(0)
         return(1)
         [from Base case]
Since we have reached the base case, we can now compute the factorial of 3. We move through
the levels in reverse order (i.e., opposite to the order in which they were generated). Level 4
returns 1, level 3 produces 1 * 1 = 1, level 2 computes 2 * 1 = 2, and finally level 1
calculates 3 * 2 = 6.
We now take some more examples to illustrate the idea of recursion.
Example 7.3

/* Recursive version to calculate x^n, where n is a */
/* positive integer (n > 0) */
int POWER(int x, int n)
{
    /* x is the number and n is the power */
    if (n == 1)
        return x;
    return x * POWER(x, n - 1);
}
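
Example 7.3 relies on the caller to pass n > 0; with n equal to 0 or negative, the test n == 1
is never satisfied and the recursion does not terminate. A hedged variant, under the assumption
that the power is passed as an unsigned value and that x^0 is taken as 1, moves the base case
down to n == 0. The name power and the use of long are our choices for illustration only.

```c
/* Recursive x^n for n >= 0, with x^0 == 1 as the base case. */
long power(long x, unsigned n)
{
    if (n == 0)
        return 1;                   /* base case */
    return x * power(x, n - 1);     /* general case */
}
```

With this base case power(2, 10) yields 1024 and power(5, 0) yields 1. As with the factorial
examples, the result overflows quickly for large arguments.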
Let us look at another example. This program counts to the desired number, but also
displays information that helps us to see what happens in recursive calls.
Example 7.4

/* Program to count recursively and also to write */
/* information about current level of recursion */
#include <stdio.h>

int level;

void count(int val)
{
    printf("\n\n Starting count at level %2d : val = %2d\n", ++level, val);
    if (val > 1)
        count(val - 1);
    printf("Displaying val : %2d\n", val);
    printf("Leaving count at level %2d : val = %2d\n", level--, val);
}

/* Main program */
int main(void)
{
    int val;

    printf("Count to what value?");
    scanf("%d", &val);
    level = 0;
    count(val);
    return 0;
}
Let us see how the recursion process works.
The program begins executing in the main function. Suppose this function calls count( ) with an
argument of 4 (i.e., the value of val is 4). The function count(4) begins its execution while the
main function is on hold. It displays information about the level of recursion.
Since the value of val, that is, the argument of count( ) (now 4), is greater than 1, the function calls
count( ) with an argument of 3. count(3) displays information about the level of recursion.
Since the parameter has a value greater than 1, the function calls count( ) with an argument of 2.
At this point, main( ), count(4), and count(3) are on hold, and count(2) begins
its execution. Just as the two previous calls to count( ) did, count(2) displays information
about the level of recursion. The function now checks whether its argument value is greater
than 1. Since the condition is true, it calls count( ) with an argument of 1. Now main( ), count(4),
count(3), and count(2) are on hold, and count(1) starts its execution. count(1) displays its
information about the recursion level. Now the test condition of the if statement fails. Control
thus transfers to the next statement in the same function, which turns out to be the call to
printf( ) for displaying the value of val. Note that the last version of the count( ) function
called was the first version to reach this statement. One feature of recursive function calls is
that the last version called is the first one to complete its work and return. We can also see
this in the output given below, which lists only the Starting and Leaving lines.
Count to what value? 4
Starting count at level 1: val =4
Starting count at level 2: val =3
Starting count at level 3: val =2
Starting count at level 4: val =1
Leaving count at level 4: val =1
Leaving count at level 3: val =2
Leaving count at level 2: val =3
Leaving count at level 1: val =4
A recursive function must always test whether it can stop before calling another version
of itself. If a recursive call is made, the parameters in some calls should eventually have values
that will make further recursive calls unnecessary.
Suppose we want to write a recursive algorithm to implement the routine power(x,y),
which raises the value contained in x to the yth power. For example, if we need to
calculate the value of 5 raised to the power of 4, power(5,4), we are really solving for
5 * power(5,3). Likewise, power(5,3) is equivalent to 5 * power(5,2). When the power to which
the value is raised becomes equal to 0, the ending value, 1, is returned to the calling function.
If the power( ) function is called as power(5,2), the following processing is performed:
power(5,2) returns 5 * power(5,2-1)
    power(5,1) returns 5 * power(5,1-1)
        power(5,0) returns 1 by definition
    5  ------------- result of power(5,1)
25 ------------- result of power(5,2)
We will now present the recursive routine for the power function.
Example 7.5

/* Power function */
/* Returns the result of x raised to the power */
/* contained in y, or -1 if the value in y is negative */

float power(float x, float y)
{
    if (y < 0)
        return(-1.0);
    else
    if (y == 0) /* Any value raised to zero is one */
        return(1.0);
    else
        return(x * power(x, y-1));
}

7.2 RECURSION IMPLEMENTATION


Recursive algorithms are frequently quite short and elegant, but they may not always be the
most economical in terms of execution time and memory space. For example, a factorial can be
calculated more efficiently in an iterative way than by the recursive method. In this section we
will see how recursion is implemented and why it is sometimes inefficient.
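To make the comparison concrete, here is a minimal sketch of an iterative factorial. It is not part of the original text: the name Factorial_Iterative and the choice of unsigned long for the result are assumptions made for illustration, and the result is only reliable while N! fits in that type (N up to 12 is safe on any conforming compiler).

```c
/* Iterative factorial: a sketch, assuming N >= 0 and that N! fits */
/* in an unsigned long (N <= 12 is always safe).                   */
unsigned long Factorial_Iterative(int N)
{
    unsigned long result = 1;
    int i;
    for (i = 2; i <= N; i++)
        result *= (unsigned long) i;   /* One multiplication per step, no extra calls */
    return result;                     /* Returns 1 for N = 0 and N = 1 */
}
```

The loop uses a fixed amount of memory regardless of N, whereas the recursive version keeps one pending call on the stack for every level.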
Whenever a function is called, the run-time system dynamically allocates memory space
for storing the parameters of that function and the other variables and constants that are
declared within it. The return address must also be stored. Since the number of recursive calls,
and hence the memory they will need, cannot in general be known in advance, the necessary
information is stored on a stack. The stack holds the information of every function that has
been invoked but has not yet completed. Let us consider a simple program to illustrate the use
of the stack.
Example 7.6

/* Recursive call */
#include <stdio.h>
int main( )
{
    static int x = 0;
    x++;
    x < 7 ? main( ) : x++;
    printf("%2d", x);
    return 0;
}
Output: The output of the program will be: 8 8 8 8 8 8 8
In the above program main( ) calls itself, although this is not common practice.
The integer variable x is declared with the storage class static. Thus, it is initialized only once
and retains its changed value from one call to the next. So the value of x is zero the first time,
and at the time of each subsequent call it holds the incremented value of x. Each time main( )
calls itself, the printf( ) statement of the calling invocation is still pending. The information
needed to resume it is stored on the stack and remains there until the called function is
completed. The condition x<7 ensures that main( ) calls itself six times, so there are seven
invocations of main( ) in all, each with a printf( ) statement still to execute. Since functions
are always exited in the reverse of the order in which they were invoked, this information is
kept in a stack, whose last-in-first-out nature matches that order. Finally, when the condition
x<7 fails, the value of x is incremented again (i.e., x=7+1=8). Now the information is popped
off the stack and each pending printf( ) statement is executed with the latest value of x, which
is why 8 is printed seven times. The process is shown in Fig. 7.1.

[Figure: contents of the run-time stack, with one pending printf( ) entry added per call.
(a) Stack (b) After first call, x=1 (c) After second call, x=2 (d) After final call, x=7]

Fig. 7.1 Run-time stack structure

It is always possible to exhaust the space allocated to the run-time stack and get a stack
overflow error. Besides this, there is also the time required to store this information on the
stack when a function is invoked and to release the space when it is completed. Function
invocation involves a good deal of overhead, and it can be a time-consuming operation.
Therefore, recursive algorithms that involve a great deal of function linkage can sometimes
run more slowly than iterative programs that solve the same problem.
We will now present another recursive function which is often used in string manipulation.
The following routine counts the number of characters contained in a string via a recursive
process.

int String_length(char *String)
{
    if (*String == '\0') /* End of the string */
        return(0);
    else
        return(1 + String_length(++String)); /* Recursive call */
}
The routine examines the character referenced by *String to see if it is the null character '\0'
(the ending condition). If *String is not the null character, the value returned is
1 + String_length(++String). If *String is the null character, the routine returns zero. If the
routine is invoked with the string "Hello", the following processing will take place (the
comments give the number of characters counted so far):

1st time: String_length(String); /* Count so far: 1 */
*String points to 'H'
return(1 + String_length(++String));

2nd time: String_length(String); /* Count so far: 1+1 = 2 */
*String points to 'e'
return(1 + String_length(++String));

The 3rd and 4th times proceed in the same way for the two 'l' characters.

5th time: String_length(String); /* Count so far: 4+1 = 5 */
*String points to 'o'
return(1 + String_length(++String));

6th time: if (*String == '\0') is true,
so it returns 0, and the final result is 5+0 = 5
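For comparison, the same count can be obtained with a loop instead of a chain of six calls. The following is only a sketch: the name String_length_iter, the const qualifier, and the size_t return type are assumptions made for illustration and do not appear in the original routine.

```c
#include <stddef.h>

/* Iterative counterpart of String_length( ): a sketch for comparison. */
size_t String_length_iter(const char *String)
{
    size_t count = 0;
    while (*String != '\0')   /* Stop at the terminating null character */
    {
        count++;              /* One more character seen */
        String++;             /* Advance to the next character */
    }
    return count;
}
```

For the string "Hello" the loop body runs five times and no return addresses need to be stacked.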

7.3 THE TOWER OF HANOI


The traditional game Tower of Hanoi is also an example of recursion. The game involves three
pegs: peg1, peg2, and peg3. Suppose there are N discs on peg1. It is required to move the N discs
from peg1 (source) to peg3 (destination). The relative ordering of the discs on peg1 has to be
maintained as they are moved to peg3. The discs are to be stacked from the largest to the
smallest, beginning from the bottom.
While moving the discs from peg1 to peg3, the following rules are to be observed.
(i) Only one disc can be moved at a time.
(ii) No larger disc can be placed on top of a smaller disc.
(iii) An auxiliary peg, that is, peg2, can be used as an intermediate to store one or more
discs while they are being moved from their source peg (peg1) to their destination
peg (peg3).
Let us illustrate the changeover process considering the value of N as 1, 2, and 3. Finally,
we will state the solution in general. Suppose there is only one disc, that is, N=1. The solution
is very simple, that is, merely move the disc from peg1 to peg3, as shown in Figs. 7.2(a)
and 7.2(b).
If there are two discs on peg1, move the top disc from peg1 to peg2 and then move the
second disc from peg1 to peg3. Finally, move the top (first) disc from peg2 to peg3. The transfer
procedure is shown in Figs. 7.3(a) - 7.3(d).
The problem of transferring discs from peg1 to peg3 is somewhat more complex when
the value of N is 3. In this case, move the first two discs from peg1 to peg2 using peg3 as an
intermediate. Then move the third disc from peg1 to peg3. Next, move the two discs from peg2 to
peg3 using peg1 as an intermediate. The entire procedure is shown in Fig. 7.4. Fig. 7.5 illustrates
the same problem when N is 4.
For general N, use the technique already established for transferring three discs. First,
move (N-1) discs from peg1 to peg2 using peg3 as an intermediate. Then move the remaining
(largest) disc from peg1 to peg3. Then again apply the same technique to move (N-1) discs from
peg2 to peg3 using peg1 as an intermediate.
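The three steps of the general technique also tell us how many single-disc moves are needed: the moves for N-1 discs, one move for the largest disc, and the moves for N-1 discs again. The following sketch counts them. The function name Hanoi_moves is an assumption made for illustration and is not part of the original text.

```c
/* Number of single-disc moves needed to transfer N discs, following */
/* the three steps above: move N-1 discs, move 1 disc, move N-1      */
/* discs. A sketch, assuming N is small enough for the result to fit */
/* in an unsigned long (N <= 31 is always safe).                     */
unsigned long Hanoi_moves(int N)
{
    if (N < 1)
        return 0;                          /* Nothing to move */
    return 2 * Hanoi_moves(N - 1) + 1;     /* Works out to 2^N - 1 */
}
```

The values for N = 3 and N = 4 are 7 and 15, which agree with the number of moves listed in Figs. 7.4 and 7.5.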

[Figure: pegs A, B, and C, panels (a) and (b)]
Fig. 7.2 (a) Initial configuration when N=1 (b) Final configuration when N=1
[Figure 7.3: pegs A, B, and C during the transfer of two discs]

[Figure: successive contents of pegs A, B, and C]
(a) Initial placement of discs when N=3
(b) Move top disc from A to C
(c) Move second disc from A to B
(d) Move disc from C to B
(e) Move disc from A to C
(f) Move top disc from B to A
(g) Move disc from B to C
(h) Move disc 1 from A to C

Fig. 7.4 (a) - (h) Initial to final configuration when number of discs is 3

Next we show the contents of the pegs when the number of discs is 4.


[Figure: successive contents of pegs A, B, and C]
(a) Initial configuration of discs when N=4
(b) Move disc from A to B
(c) Move disc from A to C
(d) Move disc from B to C
(e) Move disc from A to B
(f) Move disc from C to A
(g) Move disc from C to B
(h) Move disc from A to B
(i) Move disc from A to C
(j) Move disc from B to C
(k) Move disc from B to A
(l) Move disc from C to A
(m) Move disc from B to C
(n) Move disc from A to B
(o) Move disc from A to C
(p) Move disc from B to C (Final configuration)

Fig. 7.5 (a) - (p) Initial to final configuration when number of discs is 4

Recursion is a powerful tool when used properly, but there are trade-offs. Recursion can
simplify difficult programming tasks, such as the Tower of Hanoi. It is doubtful whether a
programmer could have developed the non-recursive solution to the Tower of Hanoi problem
directly from the problem statement. A non-recursive solution involving stacks is more difficult
to develop and more prone to error. Moreover, when a stack cannot be eliminated from the
non-recursive version of a program, its recursive counterpart can be as fast as or faster than the
non-recursive version under a standard compiler.
In general, a non-recursive version of a program will execute more efficiently in terms of
time and space than a recursive version. This is because the overhead involved in entering and
exiting a function is avoided in the non-recursive routine, where the stack activity can often be
eliminated. It is often possible to convert a recursive function to a non-recursive function and
vice versa, although the effort involved in this conversion depends largely on the knowledge of
the programmer. Finally, even though the C language supports recursion, a recursive solution
to a problem is often more expensive than a non-recursive solution, both in terms of time and
space. Frequently this expense is a small price to pay for the simplicity and self-documentation
of the recursive solution. We now present the program for the Tower of Hanoi problem.
Example 7.7

#include <stdio.h>
/* Recursive routine for Tower of Hanoi */
/* It moves N discs from the source peg (peg1) */
/* to the destination peg (peg3) using the */
/* auxiliary peg (peg2); N is given as input */
void Tower_of_Hanoi(int N, char peg1, char peg2, char peg3)
{
    if (N<1)
        printf("\n No disc is present on peg1\n");
    else if (N == 1)
        printf("Move disc from %c to %c\n", peg1, peg3);
    else
    {
        Tower_of_Hanoi(N-1, peg1, peg3, peg2);
        printf("Move disc from %c to %c\n", peg1, peg3);
        Tower_of_Hanoi(N-1, peg2, peg1, peg3);
    }
}
/* Main program */
int main( )
{
    int N; /* Number of discs to be moved */
    char peg1, peg2, peg3;
    peg1 = 'A';
    peg2 = 'B';
    peg3 = 'C';
    printf("\n Enter the number of discs to be moved:");
    scanf("%d", &N);
    Tower_of_Hanoi(N, peg1, peg2, peg3);
    return 0;
}
Output:
Sample run is given below:
Enter the number of discs to be moved: 3
Move disc from A to C
Move disc from A to B
Move disc from C to B
Move disc from A to C
Move disc from B to A
Move disc from B to C
Move disc from A to C
In the above program, we call the Tower_of_Hanoi function from within the function
Tower_of_Hanoi itself. This is a recursive call. Although such a function is easy to write,
it is somewhat difficult to understand thoroughly. When we describe the process of the
recursive call, it will be evident that the use of a stack is essential for implementing it.

7.5 RECURSION VS ITERATION


Recursion is an extremely powerful problem-solving technique. For certain classes of problems
(especially those involving recursively defined data structures), it allows us to create highly
elegant and compact solutions to complex problems. Broadly, there are two types of
recursion. The first type concerns recursively defined functions, such as the factorial function
(primitive recursion), whereas the second type is the recursive use of a procedure
(non-primitive recursion). A typical example of the second type is the Tower of Hanoi problem.
It is always possible to transform mechanically any primitive recursive function into an
equivalent iterative process. However, this is not the case for non-primitive recursive functions.
Although there exists an iterative solution for the Tower of Hanoi problem, in general there are
many problems of that form for which iterative solutions are not easily found. Certain
inherently recursive processes can nevertheless be solved in programming languages that do
not provide recursion. Conversely, a problem such as the factorial can be solved iteratively more
easily than by the recursive method. It is also true that even when there is no inherent recursive
structure, the recursive solution may be much simpler.
By setting up an external stack, we can transform any recursive program into its iterative
form. However, the result is often complicated and harder to understand. Therefore we should
always evaluate whether an iterative solution is really superior to recursion in a particular case.
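As an illustration of the external-stack idea, the sketch below solves the Tower of Hanoi problem with a loop and an explicit stack of pending tasks. It is one possible arrangement rather than a method given in the text: the names Hanoi_iterative, struct frame, push_frame, and MAXSTACK, and the limit of 30 discs, are all assumptions made for this example.

```c
#include <stdio.h>

#define MAXSTACK 64   /* Assumed bound; 2N-1 entries are needed, so N <= 30 fits */

/* One pending task: move n discs from src to dst using aux */
struct frame {
    int n;
    char src, aux, dst;
};

/* Push a task on the stack and return the new top index */
static int push_frame(struct frame *stack, int top,
                      int n, char src, char aux, char dst)
{
    stack[top].n = n;
    stack[top].src = src;
    stack[top].aux = aux;
    stack[top].dst = dst;
    return top + 1;
}

/* Iterative Tower of Hanoi using an explicit stack.          */
/* Prints the moves and returns how many were made; returns 0 */
/* when N is outside the range 1..30 handled by this sketch.  */
int Hanoi_iterative(int N, char src, char aux, char dst)
{
    struct frame stack[MAXSTACK];
    struct frame f;
    int top = 0, moves = 0;

    if (N < 1 || N > 30)
        return 0;
    top = push_frame(stack, top, N, src, aux, dst);
    while (top > 0)
    {
        f = stack[--top];                  /* Take the most recent task */
        if (f.n == 1)
        {
            printf("Move disc from %c to %c\n", f.src, f.dst);
            moves++;
        }
        else
        {
            /* Push the three sub-tasks in reverse order so that */
            /* they are popped in the order first, second, third */
            top = push_frame(stack, top, f.n - 1, f.aux, f.src, f.dst); /* third  */
            top = push_frame(stack, top, 1, f.src, f.aux, f.dst);       /* second */
            top = push_frame(stack, top, f.n - 1, f.src, f.dst, f.aux); /* first  */
        }
    }
    return moves;
}
```

A call such as Hanoi_iterative(3, 'A', 'B', 'C') prints the same seven moves, in the same order, as the recursive program of Example 7.7. The extra bookkeeping shows why the iterative form is harder to read than the recursive one.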
The key to understanding a recursive function is to be aware of what happens when the
recursive call is made. The difficulty in handling recursion lies in the fact that a function is
called from within that same function. Eventually, a return will be made to the last calling
function. Since the call also involves changing the values of the function's parameters, the old
values would be destroyed unless they were stored on the stack along with the return address.
Thus, a recursive call involves a stack to store not only the return address but also the values
of all parameters essential to the current state of the function. To illustrate this, we will trace
the actions taken when a call of the form Tower_of_Hanoi(3, 'A', 'B', 'C') is initiated.

Tower_of_Hanoi(3, 'A', 'B', 'C')
    Tower_of_Hanoi(2, 'A', 'C', 'B')
        Tower_of_Hanoi(1, 'A', 'B', 'C')    : A -> C
        printf( )                           : A -> B
        Tower_of_Hanoi(1, 'C', 'A', 'B')    : C -> B
    printf( )                               : A -> C
    Tower_of_Hanoi(2, 'B', 'A', 'C')
        Tower_of_Hanoi(1, 'B', 'C', 'A')    : B -> A
        printf( )                           : B -> C
        Tower_of_Hanoi(1, 'A', 'B', 'C')    : A -> C

7.6 EXAMPLES
In this section we present more examples to illustrate the design, development, and
implementation of recursive functions. In the subsequent chapters, we will come across examples
that are more easily solved by recursion. We will present here only programs and recursive
routines without describing the construction of programs. The reader will understand the actual
process through the output which will be given after each program.
Example 7.1: The Fibonacci sequence is the sequence of integers
0, 1, 1, 2, 3, 5, 8, 13, 21, ...
Each element in the sequence is the sum of the two preceding elements. We can define the
sequence by the following recursive expressions:
fibo(n) = n                        if n = 0 or 1
fibo(n) = fibo(n-2) + fibo(n-1)    if n >= 2
We will now present the recursive version of the above sequence.
Example 7.8

/* main */
#include <stdio.h>
int fibo(int n);
int main( )
{
    int n; /* Index of the term to be computed */
    printf("\n Enter the value of n:");
    scanf("%d", &n);
    printf("%d\n", fibo(n));
    return 0;
}
/* Recursive function for Fibonacci sequence */
int fibo(int n)
{
    if ((n == 0) || (n == 1))
        return(n);
    else
        return(fibo(n-2) + fibo(n-1));
}
The next example is to write a recursive function convert (number, base) to convert a
given positive integer(number) to its equivalent number in another base and return it as a string.
Example 7.9

#include <stdio.h>
#include <string.h>
char A[33]; /* Digits of the result: up to 32 binary digits plus '\0' */
int i = 0;  /* Next free position in A */
void Convert(int number, int base);
int main( )
{
    int number, base, l, p, q;
    char t;
    printf("\n Enter number and base:");
    scanf("%d, %d", &number, &base);
    Convert(number, base); /* Digits are stored least significant first */
    l = strlen(A);         /* Length is known only after the conversion */
    for (p=0, q=l-1; p<q; p++, q--)
    {
        t = A[p];
        A[p] = A[q]; /* Swap A[p] and A[q] */
        A[q] = t;
    }
    printf("\n%s", A);
    return 0;
}

/* Recursive function Convert( ) */
void Convert(int number, int base)
{
    if (number == 0)
        return;
    else
    {
        if (number % base < 10)
            A[i++] = number % base + '0';
        else
            A[i++] = number % base - 10 + 'A'; /* For hexadecimal digits 'A' to 'F' */
        Convert(number/base, base); /* Recursive call */
    }
}
We are now presenting a set of inputs and the corresponding outputs.
(i) Input:
Enter number and base: 10,2
Output:
1010
(ii) Input:
Enter number and base: 255,16
Output:
FF
Example 7.10

#include <stdio.h>
int bitcount(unsigned int word)
{
    /* This function returns the number of bits in a word */
    /* which are 1 */
    if (word == 0)
        return(0);
    else
        /* word & (word-1) clears the lowest bit that is 1 */
        return(1 + bitcount(word & (word-1)));
}
/* Main program */
int main( )
{
    unsigned int word;
    printf("\n Enter an unsigned word:");
    scanf("%u", &word);
    /* Call the bit count function */
    printf("\n Number of 1's in the word = %d", bitcount(word));
    return 0;
}
Input:
Enter an unsigned word: 15
Output:
Number of 1's in the word = 4

Example 7.11: The following example illustrates a function that converts an integer value to its
ASCII representation.

#include <string.h>
int Absolute(int value);
void Reverse(char S[ ], int start, int end);

/* This function converts an integer value */
/* to its ASCII representation */
void To_ascii(int value, char S[ ], int index)
{
    int sign = value;
    S[index++] = Absolute(value) % 10 + '0';
    value = value/10;
    if (Absolute(value) > 0)
        To_ascii(value, S, index);
    else
    {
        if (sign < 0)
            S[index++] = '-';
        S[index] = '\0';
        Reverse(S, 0, index-1); /* Digits were stored in reverse order */
    }
}

/* Returns the absolute value of the value */
/* contained in the variable value */
int Absolute(int value)
{
    if (value > 0)
        return(value);
    else
        return(-value);
}

/* This function reverses the characters in a string */
/* or substring contained in the string */
void Reverse(char S[ ], int start, int end)
{
    int length;
    char temp;
    length = strlen(S); /* String length function */
    if (start >= length)
        return;
    if (end >= length)
        end = length - 1;
    if (start >= end)
        return;
    else
    {
        temp = S[start];
        S[start] = S[end];
        S[end] = temp;
        Reverse(S, ++start, --end);
    }
}
It is also possible to write a reverse function in different ways. Example 7.12 shows another
way of doing this.

Example 7.12

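#include <stdio.h>
#include <string.h>
/* Function to print the contents of the character string */
/* in reverse order, e.g., "This" will be printed as "sihT" */
void print_reverse(char *S)
{
    if (*S != '\0')
    {
        print_reverse(S+1); /* First print the rest of the string */
        putchar(*S);        /* Then print the current character */
    }
}
/* Main program */
int main( )
{
    char S[80] = "";
    printf("\n Enter a string:\n");
    fgets(S, sizeof S, stdin);  /* Read a string */
    S[strcspn(S, "\n")] = '\0'; /* Remove the trailing newline, if any */
    print_reverse(S);
    return 0;
}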
Input:
Enter a string:
This is a String
Output:
gnirtS a si sihT

7.7 COST OF RECURSION


Recursive functions can make code much more compact and easy to read, but not necessarily
always more efficient. In fact, sometimes recursive versions of a function can be very inefficient.
In this section, we will look briefly at an example of a recursive function that becomes more
inefficient as the values used increase. The main point here is to understand the idea of how the
number of recursive calls can grow. This example also shows that the recursive function of the
Fibonacci numbers is far more expensive than the iterative function.
Fibonacci numbers are elements in a mathematical series that occur in many situations.
The definition of a Fibonacci number is simple but recursive. Here the first two Fibonacci
numbers, fibo(1) and fibo(2), are defined as 1. Subsequent numbers are formed by adding the
preceding two Fibonacci numbers.
To understand the cost of recursion, we again rewrite the Fibonacci program (with a
minor modification) to compute the desired Fibonacci number and to count the calls made. The
program becomes noticeably inefficient beyond about the twentieth Fibonacci number.
Example 7.13

/* Program to compute Fibonacci numbers */
#include <stdio.h>
long No_of_calls = 0;
long fibo(int n)
{
    No_of_calls++;
    if (n > 2)
        return(fibo(n-1) + fibo(n-2));
    else
        return(1);
}
int main( )
{
    long fibo_number;
    int seed;
    printf("\n Number?");
    scanf("%d", &seed);
    fibo_number = fibo(seed);
    printf("fibo(%2d) = %ld", seed, fibo_number);
    printf("%7ld calls to fibo( )\n", No_of_calls);
    return 0;
}
At the beginning of each recursive call, the function tests whether it can stop. If so,
it returns the specific value 1 and control returns to the calling function. The
Fibonacci number fibo(4) is the sum of fibo(3) and fibo(2), and fibo(3) is in turn the sum of
fibo(2) and fibo(1). Let us look at the following outputs for selected Fibonacci numbers.
Number? 10
fibo(10) = 55    109 calls to fibo( )
Number? 20
fibo(20) = 6765  13529 calls to fibo( )
Number? 30
fibo(30) = 832040 1664079 calls to fibo( )
In the above it is seen that the number of calls is unacceptably high even for a relatively
small value of seed. It is much simpler and more straightforward to use the iterative method for
evaluation of the Fibonacci function, that is, fibo( ).
It is always possible to accomplish a task in a non-recursive way. The trade-off is between
efficiency of coding and efficiency of execution. For example, the following program computes
Fibonacci numbers non-recursively with a higher efficiency than its recursive counterpart.
/* Non-recursive version of Fibonacci function */
long fibo(int x)
{
    long f1 = 1,      /* fibo(n-1) */
         f2 = 1,      /* fibo(n-2) */
         curr_fibo;   /* Current value of fibo( ) */
    int i;
    if (x > 2)
    {
        /* Case 1 and case 2 are handled by */
        /* the else part */
        for (i = 3; i <= x; i++)
        {
            curr_fibo = f1 + f2;
            f2 = f1;
            f1 = curr_fibo;
        }
        return(curr_fibo);
    }
    else
        return(1);
}
In this section, we illustrated how the cost of recursion can increase exponentially as the
argument, and with it the depth of recursion, increases. The cost is proportional to the number
of times the function is called recursively in the process of evaluating a given argument or
arguments. We will see more examples in subsequent chapters where the recursive method is
more effective than the iterative method. This chapter has introduced the basic concept of
recursion. The central point to remember is the fundamental difference between iteration and
recursion. We described the merits and demerits of recursion in comparison with iteration.
Examples were given to illustrate the idea of recursion.

EXERCISES
1. Write a non-recursive routine to calculate x*y by using addition only. Assume that both x
and y are non-negative integers.
2. Write a recursive function to evaluate
s(n) = 2+4+6+...+n
3. The greatest common divisor (GCD) of two integers A and B is defined as
gcd(A, B) = gcd(B, A)      if A < B
          = A              if B = 0
          = gcd(B, A%B)    if A >= B
The expression A%B yields the remainder of A upon division by B. Write a recursive
routine to compute gcd(A, B). Find also a non-recursive method for computing this function.
4. Ackermann's function is defined as follows
ack(m, n) = n+1                      if m = 0
          = ack(m-1, 1)              if m != 0 and n = 0
          = ack(m-1, ack(m, n-1))    if m != 0 and n != 0
Write an iterative routine to compute Ackermann's function.
5. Write a recursive function to compute the number of sequences of n binary digits that do
not contain two 1s in a row.
6. We can recursively define sorting an array A as finding the biggest element in A, placing it
at the beginning of A, and concatenating to it the result of sorting the remaining n-1 items
in the array. Write a recursive routine to sort the elements of an array A.
7. Write a recursive function that counts the number of times a particular key occurs in a
linked list L.
8. Write a recursive C program to compute a+b using the postincrement (x++) operation.
Assume that a and b are non-negative integers.
9. Write an iterative routine to implement the Tower of Hanoi problem.
10. Develop a recursive function to compute the number of different ways in which an integer
X can be written as a sum, each of whose operands is less than n.
LISTS

In Chapter 6, stacks and queues were considered. As data structures they are sequences of
data elements on which only two types of operations can be performed: insertion and deletion.
Moreover, these operations are restricted to the ends of stacks and queues, which are nothing
but special types of lists. But in the case of a general list, there is no such restriction. An element
can be inserted into or deleted from any position of the existing list. In fact, a general list may
require some more operations on it. Lists, in general, together with different types of list
implementations are discussed in this chapter in detail.

In general, we use the term 'list' to signify a series of elements. It is to be noted here that for our
purpose the series of elements in a list also maintains an order, that is, first element of the list,
second element of the list, and so on.
As a data structure, a list is a finite sequence of elements. In fact, a list may be an empty
list also. There are several types of operations involved in a list and they greatly depend on the
application. The common operations on list include
(i) create an empty list.
(ii) check if a list is empty.
(iii) traverse the list or a portion of the list.
(iv) insert an element into the list.
(v) delete an element from the list.
By definition, a list is a finite sequence of elements, and hence an array may seem to be the
most natural storage structure for a sequential implementation of a list. In this implementation
the list elements are stored as consecutive array elements. This implies that the ith list
element is stored as the ith array element.
It is to be noted at this point that an array has a fixed length. Once the array is defined, we
are restricting the size (number of elements) of the list. For the time being let us live
with this fact by defining an array of sufficiently large size so that all the list elements can
be accommodated. Let MAXLEN be the constant that defines the maximum length of the list
(array). Also suppose that we have inserted n elements into this list (array) sequentially, that is,
the ith list element is in the ith element (with index (i-1)) of the array, and n < MAXLEN.
Obviously, some of the array elements at its tail will then be empty. So we should have an
auxiliary variable to maintain the current length of the list.

With this background we may assume the following definitions and declarations.
#define MAXLEN ... /* Maximum length of the array (list) */

typedef ... itemtype; /* Type of list elements */

struct listtype {
    itemtype item[MAXLEN];
    int len;
};
struct listtype list;
itemtype element;
With the above definitions and declarations the first three list operations mentioned above
can be implemented very easily, but the last two will be somewhat inefficient.
To create an empty list we can simply set the length of list to zero by using a statement
like
list.len=0;
Now it is trivial to return 1 if the list is empty by using a statement like
return(list.len==0);
which might be used to check if a list is empty.
We can traverse or look through the list elements one by one from the beginning of the
list by passing through all the array elements from the beginning. This may be of the form
for(i=0; i<list.len; i++)
{
process list.item[i];
}
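The three simple operations can be put together as the following sketch. Since the text leaves MAXLEN and itemtype open, the value 100 and the element type int used here are assumptions made only so that the example compiles. The function names create_list, is_empty, and traverse are likewise illustrative.

```c
#include <stdio.h>

#define MAXLEN 100          /* Assumed value; the text leaves it open */
typedef int itemtype;       /* Assumed element type for illustration */

struct listtype {
    itemtype item[MAXLEN];
    int len;
};

/* Create an empty list by setting its length to zero */
void create_list(struct listtype *list)
{
    list->len = 0;
}

/* Return 1 if the list is empty, 0 otherwise */
int is_empty(const struct listtype *list)
{
    return (list->len == 0);
}

/* Traverse the list from the beginning, printing each element; */
/* returns the number of elements visited                       */
int traverse(const struct listtype *list)
{
    int i;
    for (i = 0; i < list->len; i++)
        printf("%d\n", list->item[i]);   /* "Process" the element */
    return i;
}
```

The functions take a pointer to the list so that the same code can serve several lists, whereas the fragments in the text operate on the single variable list.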
Now, to insert an item into the list we must first decide at what point it is to be inserted.
Generally speaking, we need to know after which member of the list the item is to be inserted.
As we are considering the sequential implementation, we know that the ith member of the list is
stored at array index (i-1). Suppose that our item should become the mth member of the list. So it
should be stored at index (m-1) of the list array. Therefore, we must first make room
at index (m-1) of the list array by shifting the mth and all subsequent members of the list one
step to the right, and then put the item at this empty place. For shifting the members of the list
array one step to the right we may use the code
for(i=list.len-1; i>=m-1; i--)
    list.item[i+1]=list.item[i];
and finally put the element at its proper place by setting
list.item[m-1]=element;
and increasing len by
list.len++;
Note that we can safely assume that len>=m-1. One more thing must be kept in
mind: before shifting the list members we must be sure that the list array is not full.
Clearly, the number of list members that must be shifted plays a very important role in
measuring the efficiency of this insertion process. If there are n members in the list, then both the
worst case and average case complexities of this process are O(n).
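The insertion steps, including the check that the array is not full, can be collected into one function as sketched below. The function name insert_item, its return convention, and the concrete choices MAXLEN = 100 and itemtype = int are assumptions made for illustration.

```c
#define MAXLEN 100          /* Assumed value; the text leaves it open */
typedef int itemtype;       /* Assumed element type for illustration */

struct listtype {
    itemtype item[MAXLEN];
    int len;
};

/* Insert element as the m-th member of the list (1 <= m <= len+1). */
/* Returns 1 on success, 0 if the list is full or m is out of range. */
int insert_item(struct listtype *list, int m, itemtype element)
{
    int i;
    if (list->len >= MAXLEN || m < 1 || m > list->len + 1)
        return 0;
    /* Shift members m..len one step to the right, last one first */
    for (i = list->len - 1; i >= m - 1; i--)
        list->item[i + 1] = list->item[i];
    list->item[m - 1] = element;   /* Put the element in the freed slot */
    list->len++;
    return 1;
}
```

Inserting at m = len+1 shifts nothing, while inserting at m = 1 shifts all len members, which is the worst case mentioned above.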
The reverse process is deletion. Here, if we want to remove the mth member we need to
shift the list members that follow it one step to the left, and may use the code
for(i=m-1; i<list.len-1; i++)
    list.item[i]=list.item[i+1];
list.len--;
Here also the measure of efficiency depends on the number of list members to be shifted,
and again both the worst case and average case complexities are O(n), where n is the total
number of members in the list.
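Deletion can be sketched in the same way. As before, the function name delete_item, the range check, and the concrete MAXLEN and itemtype are assumptions made for illustration, not part of the original text.

```c
#define MAXLEN 100          /* Assumed value; the text leaves it open */
typedef int itemtype;       /* Assumed element type for illustration */

struct listtype {
    itemtype item[MAXLEN];
    int len;
};

/* Delete the m-th member of the list (1 <= m <= len).      */
/* Returns 1 on success, 0 if there is no such list member. */
int delete_item(struct listtype *list, int m)
{
    int i;
    if (m < 1 || m > list->len)
        return 0;
    /* Shift members m+1..len one step to the left, first one first */
    for (i = m - 1; i < list->len - 1; i++)
        list->item[i] = list->item[i + 1];
    list->len--;
    return 1;
}
```

Deleting the last member shifts nothing, while deleting the first member shifts the remaining len-1 members.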
From the above discussion we see that if we need to maintain a huge list where the
frequency of insertion and deletion is very high, this sequential implementation will be
inefficient, since most of the time will be consumed in shifting list members. A better
implementation is discussed in the next section.

In the sequential implementation we have seen that an array may be thought of as a list whose
ordering of elements is implicit. This is obvious because the ith member of the list is stored in
the (i-1)th location of the array. Actually, this implicit ordering is responsible for shifting the
elements of the list to the right or left when we need to insert or delete an element to or from the
list, respectively.
In the linked implementation of lists the ordering of list members (elements) is kept explicitly.
This is possible only when we can
(a) get the first element of the list, and
(b) for any given list element, determine its successor element.
This suggests a simple linked-list implementation which requires us to maintain the following.
(a) A collection of nodes, each of which stores two items of information:
(i) The list member or element, and
(ii) A pointer that explicitly gives the location of the successor node.
(b) A pointer that holds the location of the first node of the list.
The pointers basically establish a link between the current node and its successor
node. That is why these pointers are termed links in some books. We use the terms
'link' and 'pointer' synonymously.
As an example, a list of the following names of places Bangalore, Jamshedpur, Pune might
pictorially be represented as

In the above diagram arrows indicate links. So in the above list head is a pointer that
points to the first node, whose information (data) part contains Bangalore. This is actually the
first list member. This node has another part called the link part, which keeps the information
about the location of its successor (next) node. This link part is basically a pointer to the
successor node. In the successor node of the first node the information part contains Jamshedpur
and the link part points to its successor. Through this link we reach the node where Pune is in
the information part and its link part shows an electrical ground symbol indicating that there
are no more nodes, that is, this node is the terminal node. The pointer value at this node is
NULL. For illustration purposes, in future we will denote the information part of a node pointed
to by ptr as ptr->info and the link part as ptr->link.
Now we are in a position to look at how the five basic list operations given in Section
8.1 can be implemented. The first list operation is to create an empty list, which is simply to
assign a NULL pointer to the pointer variable head, indicating that head points to no node.
The second operation is to check if a list is empty. This can
trivially be implemented by testing whether head holds the NULL pointer or not. The next
list operation is traversing a linked list. A list traversal is essential when we need to process
each list element exactly once. Consider, for instance, that processing a list element consists of
only printing the element. We do this by initializing an additional pointer variable psntptr (say)
to point to the starting node. This may pictorially be represented as

[Figure: psntptr pointing to the first node of the list]

This can be achieved by assigning psntptr equal to head.


Before going further into the algorithm for traversing a list, note the following notation
used in the illustration. We have already seen that a linked list is a sequence of nodes. Let
ptr be a pointer which points to a node. A node has an information part and a link part. We use
the notation ptr->info to indicate the information part of the node pointed to by the pointer ptr.
Similarly, ptr->link is used to indicate the link part of the same node. With this notation we resume
our illustration of the list traversal process.
As the pointer variable psntptr presently points to the first node we can process (print)
the information part of it by printing it, that is,
Print psntptr->info
This will print the list element 'Bangalore'. Then psntptr is to be shifted to point to the next
node, and the information part of that node is printed. This can be done by
assign psntptr equal to psntptr->link
Print psntptr->info
The above assignment is represented pictorially as
[Figure: psntptr pointing to the second node of the list]
and the print statement then prints 'Jamshedpur'. Now again psntptr is shifted to point to the next node by
assign psntptr equal to psntptr->link
which may be depicted as

[Figure: psntptr pointing to the third (last) node of the list]

In a similar fashion 'Pune' will be printed by


Print psntptr->info
Now notice that there are no more nodes in the list. So the value of psntptr->link is a NULL pointer.
Hence the assignment
assign psntptr equal to psntptr->link
makes psntptr hold a NULL pointer. At this point all the list elements have been processed (printed)
and we can terminate. The above discussion now suggests the following algorithm for a linked
list traversal. It is also valid for an empty list.

assign psntptr equal to head;
while (psntptr != NULL)
{
    print psntptr->info;
    assign psntptr equal to psntptr->link;
}
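The traversal algorithm above might look as follows in C. This is only a sketch that anticipates the pointer-based implementation: the structure name node, the 20-character info field and the function name traverse are assumptions, and the returned node count is added here purely so that the behaviour can be checked.

```c
#include <stdio.h>

struct node {
    char info[20];          /* information part */
    struct node *link;      /* link part: successor node, or NULL */
};

/* Prints each list member exactly once, starting from head.
   Returns the number of nodes visited (0 for an empty list). */
int traverse(const struct node *head)
{
    const struct node *psntptr = head;
    int count = 0;

    while (psntptr != NULL) {
        printf("%s\n", psntptr->info);
        psntptr = psntptr->link;
        count++;
    }
    return count;
}
```

Because the loop tests psntptr before using it, a NULL head simply skips the body, which is why the algorithm is also valid for an empty list.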
We are now left with the last two basic operations, insertion and deletion.
It is quite obvious that to insert an element into a list we should have the list member to
insert and the point of the list at which the member should be inserted. It is also clear that the
member cannot be inserted directly. First we should have a free node whose information part is
to be filled with the value of the list member and then this node is to be inserted at the right
point of the list. This raises an obvious question: from where do we get a free node? To
answer this question let us assume that we have a storage pool of free nodes from which we can
get such a node and that we have a function called fetchnode() that returns a pointer to a free
node, which we can use to serve our purpose. For the time being let us not turn our attention to
how the function fetchnode() returns the pointer to a node. To insert a node into a given
linked list (that is, the head of the list is given) we need to consider the following two cases.
(i) The node is to be inserted at the beginning of the linked list.
(ii) The node is to be inserted after some given node within the list.
Clearly, in the first case the head of the list will change but in the second case the head
will remain unaltered.
To illustrate the first case let the city name 'Ahmedabad' be inserted at the beginning of
the list. Firstly, we call the function fetchnode() to get a pointer ptr (say) to a free node. Set
the information part of this node to the value 'Ahmedabad'. Now to connect this node to the
beginning of the list whose first element is pointed to by head, we set the link part of the node
pointed to by ptr to the value of head and then change head to the value of ptr.
This may be written as


assign ptr equal to fetchnode();
assign ptr->info equal to 'Ahmedabad';
assign ptr->link equal to head;
assign head equal to ptr;
A stepwise pictorial representation may look like
[Figure: the new node pointed to by ptr is first linked to the old first node; head is then set to ptr]
Here head points to the first node of the list whose list members are the four city names
'Ahmedabad', 'Bangalore', 'Jamshedpur', and 'Pune'.
Let us now consider the second case, which seeks to insert a node that will contain the city
name 'Chennai' after the node with the city name 'Bangalore'. Consider also that we have in
hand a pointer prevptr (say) that points to the node with the city name 'Bangalore'. Precisely
speaking, we seek to insert a node with the city name 'Chennai' after the node pointed to by
prevptr. As before, we first do the following:
assign ptr equal to fetchnode();
assign ptr->info equal to 'Chennai'
The status of the node and list may be depicted as

Now to insert the node pointed to by ptr within the list after the node pointed to by prevptr,
we do the following in sequence. Clearly the link part of the node pointed to by prevptr
points to the node that currently follows it, and that node should follow
the node pointed to by ptr after insertion. So the link part of the node pointed to by prevptr is
copied to the link part of the node pointed by ptr, that is,
assign ptr->link equal to prevptr->link
This status is represented as

After this, the node pointed by ptr should be the next node to the node pointed by
prevptr and we can do it by assigning the link part of the node pointed by prevptr as ptr,
which may be expressed as
assign prevptr->link equal to ptr
This completes the insertion operation as shown below.
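Both insertion cases can be sketched in C as below. This is an illustration under stated assumptions rather than the implementation developed later in the chapter: here fetchnode() simply takes memory from the heap with malloc() as a stand-in for the storage pool, and the names node, newnode, insert_front and insert_after are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

struct node {
    char info[20];
    struct node *link;
};

/* Stand-in for fetchnode(): the heap plays the role of the storage pool. */
struct node *fetchnode(void)
{
    return malloc(sizeof(struct node));
}

/* Get a free node and fill its information part (NULL if none is left). */
static struct node *newnode(const char *element)
{
    struct node *ptr = fetchnode();

    if (ptr != NULL) {
        strncpy(ptr->info, element, sizeof ptr->info - 1);
        ptr->info[sizeof ptr->info - 1] = '\0';
        ptr->link = NULL;
    }
    return ptr;
}

/* Case (i): insert at the beginning. Returns the new head
   (the old head if no free node could be obtained). */
struct node *insert_front(struct node *head, const char *element)
{
    struct node *ptr = newnode(element);

    if (ptr == NULL)
        return head;
    ptr->link = head;               /* assign ptr->link equal to head */
    return ptr;                     /* caller assigns head equal to ptr */
}

/* Case (ii): insert after the node pointed to by prevptr.
   Returns the inserted node, or NULL on failure. */
struct node *insert_after(struct node *prevptr, const char *element)
{
    struct node *ptr;

    if (prevptr == NULL || (ptr = newnode(element)) == NULL)
        return NULL;
    ptr->link = prevptr->link;      /* new node takes over the old successor */
    prevptr->link = ptr;            /* prevptr now leads to the new node */
    return ptr;
}
```

The order of the two assignments in insert_after matters: if prevptr->link were overwritten first, the rest of the list would no longer be reachable.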

We are now left with the deletion operation on a linked list. Here also we need to consider
two different cases as follows.
(i) The node is to be deleted from the beginning of the list, that is, the first node of the linked
list is to be deleted.
(ii) The node to be deleted is situated after a node which is pointed to by a given pointer, that is,
the successor of a given node is to be deleted.
Note that here also the head (pointer to the first node of the list) will change in the first
case after deletion and it will remain unchanged in the second case. Consider that presently we
are starting with a list of five elements, depicted below, whose first element is pointed to
by head.

One point should be noted here: after deleting a node it will no longer appear in the list. So
the space used by the deleted node will not be useful anymore unless we make it free, that is, we
should treat the deleted node as an empty node so that we can reuse it whenever needed. For
our purpose, let us assume that we have a function freenode(ptr) which marks the node
pointed to by the pointer ptr as a reusable node.
To delete the first node from the above list we take the help of an additional pointer
variable ptr (say). Firstly, we save the starting pointer head in the variable ptr. Then we
simply set head such that it points to the very next node. This can be done by storing the value
of ptr->link in head. Finally, we call freenode(ptr) to send back the first node (which is
now pointed to by ptr) to the storage pool so that this space can be reused.
More precisely, we perform the following steps in sequence.
assign ptr equal to head
assign head equal to ptr->link
Call the function freenode(ptr)
The step-by-step status of the list is depicted as follows.

In the final list we see that there are four nodes, each containing a city name, and the first
node is pointed to by the pointer variable head. To illustrate the second case of the deletion process,
let us assume that we have another pointer variable prevptr which points to the node with
city name 'Chennai' (the node preceding the node to be deleted) and we are to delete the node
which comes immediately after this node in the linked list, that is, the node with city name
'Jamshedpur'. So the current picture of the list looks like

Here also it is better to take the help of an additional pointer variable ptr (say) which will
point to the node to be deleted. This can be achieved by executing the step
assign ptr equal to prevptr->link
and our linked list takes the form

From the above figure it is clear that to delete the node pointed to by ptr we should change
the link part of the node pointed to by prevptr so that it points to the node that follows the node
pointed to by ptr. Clearly, this is achieved by executing
assign prevptr->link equal to ptr->link
On execution of the above step the linked list takes the form

The only thing we need now is to return the deleted node to the storage pool, which
requires us to execute
Call function freenode(ptr)
Finally, we are left with the following list.
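The two deletion cases described above can be sketched in C as follows. As before this is only an illustration: freenode() here hands the node back to the heap with free(), which assumes the nodes were obtained with malloc(), and the names node, delete_front and delete_after are hypothetical.

```c
#include <stdlib.h>

struct node {
    char info[20];
    struct node *link;
};

/* Stand-in for freenode(): return the node to the heap. */
void freenode(struct node *ptr)
{
    free(ptr);
}

/* Case (i): delete the first node. Returns the new head. */
struct node *delete_front(struct node *head)
{
    struct node *ptr = head;        /* assign ptr equal to head */

    if (ptr == NULL)                /* empty list: nothing to delete */
        return NULL;
    head = ptr->link;               /* assign head equal to ptr->link */
    freenode(ptr);
    return head;
}

/* Case (ii): delete the successor of the node pointed to by prevptr. */
void delete_after(struct node *prevptr)
{
    struct node *ptr;

    if (prevptr == NULL || prevptr->link == NULL)
        return;                     /* no successor to delete */
    ptr = prevptr->link;            /* node to be deleted */
    prevptr->link = ptr->link;      /* bypass it */
    freenode(ptr);
}
```

In both functions the link of the doomed node is read before freenode() is called; reading it afterwards would touch memory that has already been given back.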

In summary, as we have discussed earlier, it has been found that in a linked list there is
no need of shifting the elements in case of insertion and deletion, as in the case of an array. But
we have not yet seen how a linked list may be implemented. We have just discussed how differ-
ent operations on a linked list work in an abstract representation. The next section deals with
the different implementations of a linked list.

8.3 LIST IMPLEMENTATIONS


Lists and the operations that are usually performed on them were described
in detail in the previous section. That discussion shows that when we need to store a sequence
of elements which may change frequently through the insertion or removal of
elements, a linked list is a suitable data structure, because no shifting of elements is
necessary to achieve these tasks. Here we present how such a linked list may be implemented.
We present two different implementations, namely, (a) array-based implementation and
(b) pointer-based implementation.
Though the pointer-based implementation is good enough for our purpose, for a first-time
C user it might not be very handy. So for a better visualization of the linked structure we choose
to go through the array-based implementation first. Readers may skip this if they find it too
trivial for their purpose.

8.3.1 Array-based Linked-list Implementation


In the following an array is used as the storage structure to implement the linked-list data structure.
We have already seen that a linked list is a sequence of nodes where a node has two
parts, namely, (a) an information part and (b) a link part.
The link part of a node is used to point to the next (successor) node in the linked list. If
a node has no successor, that is, it is the terminal node of the linked list, its link part
points to no node and holds a NULL pointer. There is one more pointer, head
(say), which points to the starting node of the linked list. This suggests that a node
may be represented by a structure in C and the linked list as an array of such structures. A node
is simply an array element whose information part is used to store a list member and whose link
part stores the array index of its successor node in the linked list. The link part of the terminal
node may be set to -1 (because this cannot be an array index) to indicate that there are no more
nodes in the list. The above discussion leads us to declare the following to implement an array-based
linked list.
#define SIZE .... /* Maximum number of list members possible */
struct nodetype {
    char info[20]; /* Information part */
    int link; /* index of successor node */
};
struct nodetype nodearray[SIZE];
In the above declaration we have considered that a list member (the information part of a
node) may be a character array (or string) of at most 20 characters, including the terminating
null character. Initially our list has no members. So the pointer head that points to the first node
of the list should have the value -1 to indicate that it is not pointing to any node. When we need
to insert a node into the list we will need a free node, which may be an array element of nodearray.
This is because nodearray is initially a collection of empty nodes and hence we can think of
nodearray as the storage pool from which we take nodes whenever necessary. We will also
organize this storage pool of available free nodes as a linked list whose first node is pointed to by
the pointer (index), say avail. So we need to define two pointers (indices) head and
avail as
int head, avail;
We must also set head = -1 initially to create an empty list. To organize the storage pool
nodearray of available nodes as a linked list, the nodes of the array must be linked together
one after the other. The simplest way to do this is to link the first node to the second, the second
node to the third, and so on. The pointer avail should have the value 0 (zero) to point to the first
node and the link part of the last node (at the index SIZE-1) should have the value -1 to
indicate that there are no more nodes in the list. A pictorial representation of the above discussion
is shown in Fig. 8.1. We take the value of SIZE as 10 for a better understanding.
A C function that initializes the storage pool may be of the following form which returns
the index of the first node of the pool.
int initpool(void) /* Function to initialize the storage pool */
{
    int i, avail;

    avail = 0;
    for (i = 0; i < SIZE-1; i++)
        nodearray[i].link = i+1;   /* link each node to the next one */
    nodearray[SIZE-1].link = -1;   /* the last node has no successor */
    return avail;
}
Having done this we can now write the function fetchnode() that returns the index of
an available node from the storage pool, that is, from the available list. The returned index may
be used to store a list member. The fetchnode() function simply returns the index of the first node
of the available list if it is non-empty and updates the pointer (index) avail to point to the next node.
The function may be written as given below.
int fetchnode(void) /* Function to return the index of a free node if
                       available, otherwise returns -1 */
{
    int ptr;

    ptr = avail; /* Note that avail is an external variable */
    if (avail == -1)
        printf("Available list is empty: can't return node\n");
    else
        avail = nodearray[avail].link;
    return ptr;
}
Consider that the first city name that is to be inserted into the empty list pointed to by head is
'Pune'. At the beginning we ask for a free node by calling the fetchnode() function, which will
return an index. Let this index be stored in an auxiliary variable ptr (say). The information part
of the node at the index ptr is then set to the value 'Pune'. Actually, this node insertion into the
empty list pointed to by head is simply adding a node at the front of a list, which may be coded in
C as
int ptr; /*Definition of the auxiliary variable */
ptr = fetchnode();
strcpy(nodearray[ptr].info,"Pune");
nodearray[ptr].link = head;
head = ptr;
After inserting 'Pune' into the list the configuration of nodearray may be depicted as
          Index  info       link
head  ->  0      Pune       -1
avail ->  1                 2
          2                 3
          3                 4
          4                 5
          5                 6
          6                 7
          7                 8
          8                 9
          9                 -1
Now if we want to insert 'Bangalore' at the front of the list we need to execute the same
code, except that the second statement changes to strcpy(nodearray[ptr].info, "Bangalore");
and the configuration of nodearray takes the form
          Index  info       link
          0      Pune       -1
head  ->  1      Bangalore  0
avail ->  2                 3
          3                 4
          4                 5
          5                 6
          6                 7
          7                 8
          8                 9
          9                 -1
In a similar way, if we insert now the city name 'Ahmedabad' at the front of the list the
second statement changes to
strcpy(nodearray[ptr].info,"Ahmedabad");
And now the nodearray looks like
          Index  info       link
          0      Pune       -1
          1      Bangalore  0
head  ->  2      Ahmedabad  1
avail ->  3                 4
          4                 5
          5                 6
          6                 7
          7                 8
          8                 9
          9                 -1

We can easily see that in this implementation the storage structure of our linked list is an
array, namely nodearray. Moreover, this nodearray holds two different lists:
(a) The list of city names whose first node is pointed to by head.
(b) The list of available nodes which are used whenever necessary. The starting node of
this list is pointed to by avail.
Now if we want to insert the city name 'Surat' at the end of the list, that is, insert it after the node
pointed to by prevptr, which holds the index 0 (say), we take a node from the available list and store
'Surat' in the information part of the node. Clearly, the successor of this node should be the
current successor of prevptr. Moreover, the successor of prevptr should be set to the node
we are inserting. C code to implement this is presented below:
int ptr;
ptr = fetchnode();
strcpy(nodearray[ptr].info, "Surat");
nodearray[ptr].link=nodearray[prevptr].link;
nodearray[prevptr].link=ptr;
On execution of this sequence of instructions nodearray takes the form


          Index  info       link
          0      Pune       3
          1      Bangalore  0
head  ->  2      Ahmedabad  1
          3      Surat      -1
avail ->  4                 5
          5                 6
          6                 7
          7                 8
          8                 9
          9                 -1

On insertion of the city name 'Chennai' after the node at index 1 (the value of prevptr)
using the code given above, the nodearray becomes
          Index  info       link
          0      Pune       3
          1      Bangalore  4
head  ->  2      Ahmedabad  1
          3      Surat      -1
          4      Chennai    0
avail ->  5                 6
          6                 7
          7                 8
          8                 9
          9                 -1

To get this form the same code as above was executed, except for the second statement,
which should read
strcpy(nodearray[ptr].info, "Chennai");
To insert 'Mumbai' after the node at index 4, which should be the value of prevptr,
the second statement should be changed to
strcpy(nodearray[ptr].info, "Mumbai");
and the configuration of nodearray becomes as in Fig. 8.2.
          Index  info       link
          0      Pune       3
          1      Bangalore  4
head  ->  2      Ahmedabad  1
          3      Surat      -1
          4      Chennai    5
          5      Mumbai     0
avail ->  6                 7
          7                 8
          8                 9
          9                 -1

Fig. 8.2 Final configuration of nodearray after inserting six city names

At this point our linked list is identified by head and it contains 6 nodes, each holding a
different city name. The list is formed by inserting 6 nodes in succession into an empty list.
Some were inserted at the front of the existing list while some were inserted after a specific
given node in the list. The process of insertion is shown in the form of the function insertlist().
Clearly, this function requires three arguments:
(a) the pointer (index) to the first node of the existing list,
(b) the pointer (index) to the node after which the new node is to be inserted, and
(c) the value (a string in this case) that should go to the information part of the inserted node.
The function insertlist() for our purpose may be written as follows; it returns
the head, that is, the pointer to the first node of the list.
int insertlist(int start, int prevptr, char element[])
/* start is the pointer (index) to the first node of the list */
/* prevptr is the pointer (index) to the node after which the node is
   to be inserted. If prevptr is -1, the node is to be
   inserted at the beginning of the list */
/* element holds the value of the node to insert */
{
    int ptr;

    ptr = fetchnode();
    if (ptr == -1)             /* no free node: leave the list unchanged */
        return start;
    strcpy(nodearray[ptr].info, element);
    if (prevptr == -1)         /* Insert at the front of the list */
    {
        nodearray[ptr].link = start;
        start = ptr;
    } else {                   /* insert after prevptr */
        nodearray[ptr].link = nodearray[prevptr].link;
        nodearray[prevptr].link = ptr;
    }
    return start;
}
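To see the pieces work together, the following sketch replays the six insertions of this section and arrives at the configuration of Fig. 8.2. It is an adaptation for illustration, not the book's listing: SIZE is fixed at 10, initpool() sets the external variable avail directly instead of returning it, and the helper build_cities() is hypothetical.

```c
#include <string.h>

#define SIZE 10

struct nodetype {
    char info[20];
    int link;
};

struct nodetype nodearray[SIZE];
int avail = -1;                     /* first node of the available list */

void initpool(void)
{
    int i;

    for (i = 0; i < SIZE - 1; i++)
        nodearray[i].link = i + 1;
    nodearray[SIZE - 1].link = -1;
    avail = 0;
}

int fetchnode(void)
{
    int ptr = avail;

    if (avail != -1)
        avail = nodearray[avail].link;
    return ptr;                     /* -1 when the pool is exhausted */
}

int insertlist(int start, int prevptr, const char *element)
{
    int ptr = fetchnode();

    if (ptr == -1)
        return start;
    strcpy(nodearray[ptr].info, element);
    if (prevptr == -1) {            /* insert at the front */
        nodearray[ptr].link = start;
        start = ptr;
    } else {                        /* insert after prevptr */
        nodearray[ptr].link = nodearray[prevptr].link;
        nodearray[prevptr].link = ptr;
    }
    return start;
}

/* Replays the insertions of this section; returns the head of the list. */
int build_cities(void)
{
    int head = -1;

    initpool();
    head = insertlist(head, -1, "Pune");       /* goes to index 0 */
    head = insertlist(head, -1, "Bangalore");  /* goes to index 1 */
    head = insertlist(head, -1, "Ahmedabad");  /* goes to index 2 */
    head = insertlist(head, 0, "Surat");       /* after Pune */
    head = insertlist(head, 1, "Chennai");     /* after Bangalore */
    head = insertlist(head, 4, "Mumbai");      /* after Chennai */
    return head;
}
```

After build_cities() the list reads Ahmedabad, Bangalore, Chennai, Mumbai, Pune, Surat when followed through the link fields, even though the names sit in the array in order of arrival.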
If we now try to print each of the list members one by one starting from the beginning of
the list we need to traverse each of the nodes of the list. Given the pointer to the start
node of the list, the list traversal can simply be achieved by performing the following steps.
Step i: Move to the first node of the list. This is the current node now.
Step ii: Process the current node.
Step iii: Move to the next node using the link part of the current node.
Step iv: Repeat Step (ii) and Step (iii) until the current pointer becomes -1 (NULL).
Strictly speaking, the above steps do not really give the algorithm for list traversal; instead
they are a rough sketch of the algorithm. A proper C function that prints the list members (city
names) of the list whose first node is pointed to by head is presented in the function traverse_list().
This function requires a single argument, which is the pointer to the first node of the list.

void traverse_list(int head)
{
    int current;

    current = head;
    while (current != -1)
    {
        printf("%s \n", nodearray[current].info);
        current = nodearray[current].link;
    }
    return;
}
Now we turn to the deletion operation. To delete a node from the list we
need to know the pointer to its previous node. That is, we can delete a node from a list when we
know the pointer to the node which precedes the node to be deleted. Considering the present
configuration of nodearray as in Fig. 8.2, let us try to delete the node after the node which is
at index 2, i.e., the value of the pointer prevptr (say) is 2. This means that we want
to delete the node with city name 'Bangalore' from the list pointed to by head. C code to
delete the node may be written as follows:
int ptr; /* An auxiliary pointer variable */
ptr = nodearray[prevptr].link;
nodearray[prevptr].link = nodearray[ptr].link;
The first instruction of the above code simply stores the link part of the node at prevptr in the
variable ptr. The second then sets the link part of this node to the link part of the node pointed to by
ptr, that is, to the link part of the node to be deleted. Now nodearray changes to
          Index  info       link
          0      Pune       3
          1      Bangalore  4
head  ->  2      Ahmedabad  4
          3      Surat      -1
          4      Chennai    5
          5      Mumbai     0
avail ->  6                 7
          7                 8
          8                 9
          9                 -1
This configuration of nodearray shows that the first node of the list pointed to by head
holds the city name 'Ahmedabad', and its next node is at 4, which holds 'Chennai'. Next to it is the
node at 5 holding the city name 'Mumbai', then the node at 0 which contains 'Pune' and then the
final node at 3 (it is the final node because its link part shows -1), which holds 'Surat'. Clearly, the
node with city name 'Bangalore' is deleted from the list pointed to by head. The area of the deleted
node is not usable now. It is possible to reuse the area of the deleted node if and only if this
node is returned to the storage pool of available nodes. This can simply be done by inserting the
deleted node at the front of the list of available nodes. Let this task be achieved by calling a
function freenode() that receives the pointer to the first node of the available list and the pointer
to the (deleted) node which is to be freed, and returns the new start of the available list. So the
call to the function as
avail = freenode(avail, ptr);
is to be done after executing the above two instructions to complete the deletion task properly.
Now our nodearray takes the form
          Index  info       link
          0      Pune       3
avail ->  1      Bangalore  6
head  ->  2      Ahmedabad  4
          3      Surat      -1
          4      Chennai    5
          5      Mumbai     0
          6                 7
          7                 8
          8                 9
          9                 -1
This configuration of nodearray shows that the next call to the function fetchnode() will
return the node at index 1, which is a deleted node, and hence the deleted node is reusable now.
Note that though physically the node at index 1 still holds the city name 'Bangalore', this
is treated as a junk value and is not relevant.
Let us now delete the node which is at the front of the list pointed to by head. Clearly,
no node precedes the node to be deleted. Such a situation can be handled very
simply by executing the following C code. The code below also returns the deleted node to the
storage pool of available nodes.
int ptr; /* Auxiliary pointer */
ptr = head; /* Store the current head to ptr */
head = nodearray[ptr].link; /* Change head to point to
                               the next node */
avail = freenode(avail, ptr); /* Send back deleted node to storage pool of
                                 available nodes */
On execution of the above code our nodearray will take the form

          Index  info       link
          0      Pune       3
          1      Bangalore  6
avail ->  2      Ahmedabad  1
          3      Surat      -1
head  ->  4      Chennai    5
          5      Mumbai     0
          6                 7
          7                 8
          8                 9
          9                 -1

The function deletelist() is presented below. It returns the pointer to the starting node
of the list from which a node is deleted. The function requires two arguments, namely, (a) the pointer
(index) to the first node of the list from which the node is to be deleted and (b) the pointer to the node
that precedes the node to be deleted.
Note that if no node precedes the node to be deleted, that is, if the first node is to be
deleted, then this is indicated to the function by passing the second argument as -1. The function
will appear as given below:

int deletelist(int start, int prevptr)
/* start is the pointer (index) to the start node of the list */
/* prevptr is the pointer (index) to the node preceding the
   node to delete; it is -1 when the first node is to be deleted */
{
    int ptr;

    if (prevptr == -1)
    {
        ptr = start;
        start = nodearray[ptr].link;
    } else {
        ptr = nodearray[prevptr].link;
        nodearray[prevptr].link = nodearray[ptr].link;
    }
    avail = freenode(avail, ptr); /* avail is an external variable */
    return start;
}
The function freenode() also requires two arguments. They are
(a) the pointer to the first node of the list of available nodes; and
(b) the pointer to the free node that is to be attached to the front of the list of available
nodes.
The function freenode() returns the pointer to the beginning of the list of available
nodes and is shown below.

int freenode(int avail, int ptr)
/* avail is the pointer (index) to the first node of the list of
   available nodes */
/* ptr is the pointer (index) to the node being freed */
{
    nodearray[ptr].link = avail;
    avail = ptr;
    return avail;
}
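The two deletions walked through in this section can be replayed with the sketch below. It starts from the configuration of Fig. 8.2, written out here as a static initializer; SIZE = 10, the parameter name availptr and the extra guards against deleting from an empty list or after the last node are assumptions added for the illustration.

```c
#define SIZE 10

struct nodetype {
    char info[20];
    int link;
};

/* Configuration of Fig. 8.2: the list starts at index 2,
   the available list starts at index 6. */
struct nodetype nodearray[SIZE] = {
    {"Pune", 3}, {"Bangalore", 4}, {"Ahmedabad", 1}, {"Surat", -1},
    {"Chennai", 5}, {"Mumbai", 0}, {"", 7}, {"", 8}, {"", 9}, {"", -1}
};
int head = 2;
int avail = 6;

/* Attach node ptr to the front of the available list that starts at
   availptr; returns the new start of the available list. */
int freenode(int availptr, int ptr)
{
    nodearray[ptr].link = availptr;
    return ptr;
}

/* Delete the first node (prevptr == -1) or the successor of prevptr.
   Returns the (possibly changed) start of the list. */
int deletelist(int start, int prevptr)
{
    int ptr;

    if (prevptr == -1) {
        if (start == -1)            /* empty list: nothing to delete */
            return start;
        ptr = start;
        start = nodearray[ptr].link;
    } else {
        ptr = nodearray[prevptr].link;
        if (ptr == -1)              /* prevptr is the last node */
            return start;
        nodearray[prevptr].link = nodearray[ptr].link;
    }
    avail = freenode(avail, ptr);
    return start;
}
```

Calling head = deletelist(head, 2) removes 'Bangalore' and leaves avail at index 1; a following head = deletelist(head, -1) removes 'Ahmedabad', moving head to index 4 and avail to index 2, in line with the tables above.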

8.4 APPLICATION OF LINKED LIST (ARRAY BASED IMPLEMENTATION)


As an application of linked lists we consider the problem of creating an index of a given document
stored in an ASCII file. This index is an ordered list of all distinct words within the document.
It may so happen that the document contains the same word at different places but its
index will have the word only once.
One simple solution is to create a linked list of words as described in the
following. First create an empty list. Read the first word from the document. Insert the word
into the list. Read the next word. If this is the same word simply ignore it and read the next word.
If this word is smaller (according to alphabetical order) than the previous one, insert the word
at the front of the list. Otherwise, if it is larger (according to alphabetical order) than the previous
one, insert the word after it in the list. Read the next word from the document. Search the
existing list from the beginning of the list to find whether the word is already in the list. If so,
read the next word and repeat the search; otherwise, find the node after which this word is to be
inserted within the list and then insert it at its proper place. Continue reading and searching till
the end of the document and finally print the list. Clearly, the above method is not efficient
because each time we read a word we need to search the list from the beginning and on
average the number of comparisons is half the current length of the list. Evidently the list will
become lengthy within a short span of time because we are continuously adding nodes to the
list whenever we find a new word.
In order to reduce the search time we can split the list into a number of smaller lists.
For this particular problem, it is quite natural to use twenty-six (26) lists, one for each letter of
the alphabet, that is, we may construct one list for all words starting with each different letter. In
other words, there will be a list for all words starting with 'A', a list for all words starting with
the letter 'B', and so on. A program that constructs an index from a stored document (ASCII
file) is given in Example 8.1. The data structure used is a linear linked list while the storage structure
is array-based.
Example 8.1

/* Index construction of a document (Array based) */

#include <stdio.h>
#include <string.h>   /* strcmp, strcpy */
#include <ctype.h>    /* isalpha, toupper */

#define MAXWORD 15
#define POOLSIZE 1000
#define Null -1
#define TRUE 1
#define FALSE 0

typedef struct nodetag {
    char info[MAXWORD];
    int link;
} nodetype;

int avail;                /* index of the first free node in the pool */
nodetype node[POOLSIZE];

void initpool(void);
int getnode(void);
void createlist(int *ptrlist);
int emptylist(int list);
void traverselist(int list);
int search(int list, char item[], int *predptr);
void insertlist(int *ptrlist, char item[], int pred);
int getword(char s[]);

int main(void)
{
    int i, predptr, list[26];
    char word[MAXWORD];

    initpool();
    for (i = 0; i < 26; i++)
        createlist(&list[i]);
    while (getword(word))   /* returns the length of the word read */
        if (!search(list[word[0]-'A'], word, &predptr))
            insertlist(&list[word[0]-'A'], word, predptr);
    for (i = 0; i < 26; i++)
        if (!emptylist(list[i]))
            traverselist(list[i]);
    return 0;
}

void initpool(void)
{
    int i;

    for (i = 0; i < POOLSIZE-1; i++)
        node[i].link = i+1;
    node[POOLSIZE-1].link = Null;
    avail = 0;
}

int getnode(void)
{
    int ptr;

    ptr = avail;
    if (avail != Null)
        avail = node[avail].link;
    else {
        printf("No available nodes\n");
        printf("Storage pool is empty\n");
        ptr = Null;
    }
    return ptr;
}

void createlist(int *ptrlist)
{
    *ptrlist = Null;
}

int emptylist(int list)
{
    return (list == Null);
}

void traverselist(int list)
{
    int curptr;

    curptr = list;
    while (curptr != Null)
    {
        printf("%s\t", node[curptr].info);
        curptr = node[curptr].link;
    }
    putchar('\n');
}

/* Looks for item in the ordered list; *predptr receives the index of the
   node after which item belongs (Null if it belongs at the front) */
int search(int list, char item[], int *predptr)
{
    int curptr, found;

    curptr = list;
    *predptr = Null;
    found = -1;           /* nonzero: no match yet (covers the empty list) */
    while (curptr != Null)
        if ((found = strcmp(node[curptr].info, item)) >= 0)
            break;
        else {
            *predptr = curptr;
            curptr = node[curptr].link;
        }
    return ((found == 0) ? TRUE : FALSE);
}

void insertlist(int *ptrlist, char item[], int pred)
{
    int temp;

    temp = getnode();
    if (temp == Null)     /* pool exhausted: leave the list unchanged */
        return;
    strcpy(node[temp].info, item);
    if (pred == Null)
    {
        node[temp].link = *ptrlist;
        *ptrlist = temp;
    } else {
        node[temp].link = node[pred].link;
        node[pred].link = temp;
    }
}

/* Reads the next word into s; returns its length, or 0 at end of input */
int getword(char s[])
{
    int i = 0;
    int c;                /* int, so that EOF can be told apart from characters */

    while (!isalpha(c = getchar()))
        if (c == EOF)
            return i;
    do {
        if (i < MAXWORD-1)        /* letters beyond the buffer size are dropped */
            s[i++] = toupper(c);  /* upper case keeps word[0]-'A' within 0..25 */
        c = getchar();
    } while (isalpha(c));
    s[i] = '\0';
    return i;
}
For the document
TWINKLE TWINKLE LITTLE STAR
HOW I WONDER WHAT YOU ARE
UP ABOVE THE WORLD SO HIGH
LIKE A DIAMOND IN THE SKY
WHEN THE BLAZING SUN IS GONE
WHEN THERE NOTHING SHINES UPON
THEN YOU SHOW YOUR LITTLE LIGHT
TWINKLE TWINKLE ALL THE NIGHT
the above program creates an array of twenty-six linked lists, which may be pictured as in
Fig. 8.3.

list[0]  -> A -> ABOVE -> ALL -> ARE
list[1]  -> BLAZING
list[2]     (empty)
list[3]  -> DIAMOND
list[4]     (empty)
list[5]     (empty)
list[6]  -> GONE
list[7]  -> HIGH -> HOW
list[8]  -> I -> IN -> IS
list[9]     (empty)
list[10]    (empty)
list[11] -> LIGHT -> LIKE -> LITTLE
list[12]    (empty)
list[13] -> NIGHT -> NOTHING
list[14]    (empty)
list[15]    (empty)
list[16]    (empty)
list[17]    (empty)
list[18] -> SHINES -> SHOW -> SKY -> SO -> STAR -> SUN
list[19] -> THE -> THEN -> THERE -> TWINKLE
list[20] -> UP -> UPON
list[21]    (empty)
list[22] -> WHAT -> WHEN -> WONDER -> WORLD
list[23]    (empty)
list[24] -> YOU -> YOUR
list[25]    (empty)

Fig. 8.3 Linked lists created by the program in Example 8.1



The output of the program for the given document is the following list of words.
A ABOVE ALL ARE
BLAZING
DIAMOND
GONE
HIGH HOW
I IN IS
LIGHT LIKE LITTLE
NIGHT NOTHING
SHINES SHOW SKY SO STAR SUN
THE THEN THERE TWINKLE
UP UPON
WHAT WHEN WONDER WORLD
YOU YOUR

8.5 POINTER BASED IMPLEMENTATION OF LINKED LISTS


As an abstract data structure, the definition of a list does not impose any limit on the number of
elements in a list. The size of an array, however, is fixed at compile time and it is not possible
to change this size during program execution. So an implementation of a list that uses an array as
the fundamental storage structure is not foolproof, because it imposes restrictions on the number
of elements in a list. A faithful implementation requires the capability of allocating and
deallocating storage for the nodes of the list dynamically at the time of program execution.
The C library provides functions like malloc() and calloc() to allocate a memory area
dynamically. The function free() can be used to deallocate a memory area which was allocated
dynamically. In the following we will see a pointer based implementation of lists that uses
pointers in conjunction with these dynamic memory allocation/deallocation functions. We know
that a linked list is a sequence of nodes where each node has two parts. The first part
is the information part, while the second part is a link part which points to the
successor node of the list if it exists; otherwise the link part points to no node, indicating
that it is the terminal node of the list.
This suggests that a node in C may be implemented by using a self-referential structure that has
a member that points to a structure of the same type. So to implement a node we may declare
typedef struct list_node{
char info[20]; /* information part */
struct list_node *link; /* link part */
} nodetype;
Now if head is the pointer to the starting node of the list we may define
nodetype *head;
to indicate that head is a pointer to a node of type nodetype. As in Section 8.3 we consider a list
member (information part of a node) as an array of characters of a maximum of 20 characters.
Initially our list does not have any member in it.

So to create an empty list we just set


head=NULL;
To insert a node with the city name 'Pune' into this empty list pointed to by head, we require a
storage area which may be used as the node. To obtain it we call the function malloc(), which is
provided in the C library and declared in <stdlib.h>. When called, malloc() allocates a
specified number of bytes at execution time. At this stage we also need the help of the
sizeof operator in C.
For example, if we need to allocate the storage for an integer value and set a pointer to
this address we do the following:
int *pi;
pi=(int *)malloc(sizeof(int));
sizeof(int) gives the number of bytes required to store an integer value, and this is the
argument of the function malloc(). The call to malloc() returns a pointer (the memory
address) to an area where we can store an integer, or NULL if the storage cannot be allocated.
This returned pointer is cast to an integer pointer by using (int *).
So for our purpose, if we want to allocate the storage for a node of our linked list we need
to do the following:
nodetype *ptr;
ptr=(nodetype *)malloc(sizeof(nodetype));
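Since malloc() returns NULL when no storage is available, a cautious program tests the result before using it. The following is a minimal sketch of such a check; the helper name newnode() is an assumption made for this illustration and is not part of the text, and truncating an over-long name to 19 characters is likewise only one possible policy.

```c
#include <stdlib.h>
#include <string.h>

typedef struct list_node {
    char info[20];              /* information part */
    struct list_node *link;     /* link part */
} nodetype;

/* newnode() is a hypothetical helper: it allocates one node, copies at
   most 19 characters of item into it, and returns NULL if malloc() fails. */
nodetype *newnode(const char item[])
{
    nodetype *ptr = (nodetype *) malloc(sizeof(nodetype));

    if (ptr == NULL)
        return NULL;            /* no storage available */
    strncpy(ptr->info, item, sizeof(ptr->info) - 1);
    ptr->info[sizeof(ptr->info) - 1] = '\0';
    ptr->link = NULL;           /* not yet linked into any list */
    return ptr;
}
```

A caller would test the returned pointer against NULL before linking the node into a list, and release it with free() when the node is no longer needed.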
Clearly, inserting a node into an empty list is simply adding a node at the front of the list.
Consider an auxiliary array 'city' which holds the city name 'Pune' that is to be
inserted, that is, we have an array 'city' which is defined as
char city[20];
and it holds the string 'Pune'.
So to insert a node with city name 'Pune' we may use the following C code.
nodetype *ptr;
ptr=(nodetype *) malloc(sizeof(nodetype));
strcpy(ptr->info,city);
ptr->link = head;
head = ptr;
If, after executing this code, the city array holds the string 'Bangalore' and we re-execute the
above code, another node with city name 'Bangalore' will be inserted at the front of the list.
Similarly, putting the city name 'Ahmedabad' in the array city and re-executing the same code we
get a list like the following pictorial representation.

head -> Ahmedabad -> Bangalore -> Pune

The above code is used to insert a node at the front of a given list (a list is given when its
head is given). But we may also want to insert a node within a given list after some given node.
For example, we may want to insert a node with the city name 'Surat' after the node with city name
'Pune', which is pointed to by a pointer, say prevptr. That is, we are given the values of the
pointers head and prevptr. We also have an array city filled with the string 'Surat'.

Our objective is to insert a node, with information part stored in the city array, into the
list pointed by head after the node pointed by prevptr. To achieve this a C code may be
written as
nodetype *ptr;
ptr=(nodetype *)malloc(sizeof(nodetype));
strcpy(ptr->info,city);
ptr->link=prevptr->link;
prevptr->link=ptr;
On execution of this code the list takes the form

head -> Ahmedabad -> Bangalore -> Pune -> Surat

If prevptr points to the node with city name 'Bangalore' and the city array holds the
string 'Chennai', the execution of the above code will change the linked list to

head -> Ahmedabad -> Bangalore -> Chennai -> Pune -> Surat

Thus we have seen that we can either insert a node at the beginning of a given list or
insert a node after a given node within a list. This insertion process is shown in the form of a
C function insertnode ().
Obviously, this function requires three arguments, namely
(a) a pointer to the starting node of the existing list, say start;
(b) a pointer to the node after which the new node is to be inserted, say prevptr. To insert
the new node at the front of the list, prevptr is set to the NULL pointer before calling
the function;
(c) the information part of the new node.
The function should return the head, the pointer to the first node of the list, because
insertion of a node at the front of a list changes the head of the list.
The function insertnode () is presented below:

nodetype *insertnode(nodetype *start, nodetype *prevptr, char element[])
/* start is the pointer to head of the list */
/* prevptr is the pointer to the predecessor of the node to insert */
/* element array holds the information part of the new node */
{
    nodetype *ptr;

    ptr = (nodetype *) malloc(sizeof(nodetype));
    strcpy(ptr->info, element);
    if (prevptr == NULL)
    {   /* insert at the front of the list */
        ptr->link = start;
        start = ptr;
    } else {
        /* insert after node pointed by prevptr */
        ptr->link = prevptr->link;
        prevptr->link = ptr;
    }
    return start;   /* head of the list; changed only by a front insertion */
}
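As an illustration, the city list built step by step in this section can be reproduced with insertnode(). The sketch below assumes two helper functions that do not appear in the text, findnode() and buildcities(), and adds a test of the malloc() result; it is one possible way of calling the function rather than a fixed recipe.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct list_node {
    char info[20];
    struct list_node *link;
} nodetype;

/* Same logic as insertnode() in the text: insert at the front when
   prevptr is NULL, otherwise after the node prevptr points to. */
nodetype *insertnode(nodetype *start, nodetype *prevptr, const char element[])
{
    nodetype *ptr = (nodetype *) malloc(sizeof(nodetype));

    if (ptr == NULL) {
        fprintf(stderr, "Out of memory\n");
        exit(1);
    }
    strcpy(ptr->info, element);
    if (prevptr == NULL) {
        ptr->link = start;
        start = ptr;
    } else {
        ptr->link = prevptr->link;
        prevptr->link = ptr;
    }
    return start;
}

/* findnode() is a hypothetical helper: it returns the first node whose
   information part equals name, or NULL if there is no such node. */
nodetype *findnode(nodetype *start, const char name[])
{
    while (start != NULL && strcmp(start->info, name) != 0)
        start = start->link;
    return start;
}

/* buildcities() (also hypothetical) repeats the insertions of this section. */
nodetype *buildcities(void)
{
    nodetype *head = NULL;

    head = insertnode(head, NULL, "Pune");          /* three front insertions */
    head = insertnode(head, NULL, "Bangalore");
    head = insertnode(head, NULL, "Ahmedabad");
    head = insertnode(head, findnode(head, "Pune"), "Surat");
    head = insertnode(head, findnode(head, "Bangalore"), "Chennai");
    return head;
}
```

Note that the caller always assigns the returned pointer back to head, because a front insertion changes the head of the list.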
Now to print all the list members of a list we need to traverse through all the nodes of the
list from the first node to the last node of the list. A simple list traversal algorithm may be
presented as the following pseudo code.

Set current node = head of the list;
while (current node != NULL)
{
    process current node;
    move to the next node through the link part of the current node
    and call it the current node;
}

A list traversal function traverse_list () is presented below which prints all our list
members. The function receives the list head as its argument and returns nothing.

void traverse_list(nodetype *head)
{
    nodetype *current;

    current = head;
    while (current != NULL)
    {
        printf("%s \n", current->info);
        current = current->link;
    }
    return;
}

Like the insert operation the deletion of a node from a list is either of the following two
types:
(i) We may delete a node from the beginning of a given list, that is, the head of the list is to
be deleted. After deletion the next to head will be the new head of the existing list.
(ii) We may delete a node which is situated after a given node.

As in the array implementation, the pointer implementation to delete a node from the
front of list pointed by head may be written as
nodetype *ptr; /* Auxiliary pointer variable */
ptr = head; /* Save current head to ptr */
head = ptr->link; /* Change head to point to next node */
free(ptr); /* Deallocate the allocated memory area pointed by
ptr */
Consider now that we want to delete a node from a list, whose front node is pointed by
head, which is situated after the node pointed by prevptr in the list as in the figure given
below.
head -> 1 -> 2 -> 3 -> 4 -> 5        (prevptr points to node 3)

In the above figure there are five nodes in the list and they are numbered. The pointer
prevptr points to the node number 3. We want to delete the node numbered 4 which appears
just after the node pointed by prevptr. The following C code segment may be used to achieve
the above task.
nodetype *ptr; /* Auxiliary pointer variable */
ptr=prevptr->link; /* Save the pointer to the node to delete in ptr
*/
prevptr->link = ptr->link ; /* Set the next node to prevptr as the
next node to the node to delete */
free(ptr); /* Deallocate the allocated memory area pointed by ptr */
Combining the above two C segments, the function deletenode() to delete a node from
a list may be written as presented below. The function receives two arguments, one of which is
the list head and the other a pointer to the node preceding the node to be deleted. This
pointer is NULL if the node is to be removed from the beginning of the list. The function
returns a pointer to the head of the list.
nodetype *deletenode(nodetype *start, nodetype *prevptr)
/* start is the pointer to the list head */
/* prevptr is the pointer to the preceding node to the node to delete */
{
nodetype *ptr;
if (prevptr == NULL)
{
/* delete from front of the list */
ptr = start;
start = ptr->link;
} else { /*delete the successor of the node pointed by prevptr */
ptr = prevptr->link;
prevptr->link = ptr->link;
}
free(ptr);
return start;
}
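The function above assumes that the node to be deleted exists. A slightly more defensive variant, together with a routine that releases a whole list by repeated front deletion, might look as follows. The guards against an empty list and against a prevptr that points to the last node, as well as the helpers freelist(), pushfront() and listlength(), are assumptions added for this sketch.

```c
#include <stdlib.h>
#include <string.h>

typedef struct list_node {
    char info[20];
    struct list_node *link;
} nodetype;

/* deletenode() as in the text, with two added guards. */
nodetype *deletenode(nodetype *start, nodetype *prevptr)
{
    nodetype *ptr;

    if (prevptr == NULL) {
        if (start == NULL)
            return NULL;            /* empty list: nothing to delete */
        ptr = start;
        start = ptr->link;
    } else {
        ptr = prevptr->link;
        if (ptr == NULL)
            return start;           /* prevptr is the last node */
        prevptr->link = ptr->link;
    }
    free(ptr);
    return start;
}

/* freelist() (hypothetical) deletes the front node until none is left. */
nodetype *freelist(nodetype *start)
{
    while (start != NULL)
        start = deletenode(start, NULL);
    return start;                   /* always NULL */
}

/* pushfront() and listlength() are small hypothetical helpers. */
nodetype *pushfront(nodetype *start, const char item[])
{
    nodetype *ptr = (nodetype *) malloc(sizeof(nodetype));

    if (ptr == NULL)
        exit(1);
    strcpy(ptr->info, item);
    ptr->link = start;
    return ptr;
}

int listlength(const nodetype *start)
{
    int n = 0;

    for (; start != NULL; start = start->link)
        n++;
    return n;
}
```

Releasing every node before the head pointer is discarded avoids leaving allocated storage unreachable.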

8.6 APPLICATION OF LINKED LIST (POINTER BASED IMPLEMENTATION)


In Section 8.4 we considered the problem of creating the word index of a stored document
(ASCII file). The data structure that was used to store the word index was an array of 26 linked
lists, one for each letter of the English alphabet. As the implementation of linked lists was
array-based, the storage pool was implemented as an array of structures which was maintained
by the programmer. Moreover, for this array implementation the size of the storage pool was
fixed. Clearly, this puts a restriction on the size of the linked lists, which should not exist.
In the pointer based implementation of linked lists, the storage pool is created by the
system and its size is limited only by the available memory. The discussion in Section 8.5
suggests that the dynamic memory allocation/deallocation functions may be used to handle this
storage pool efficiently instead of the user defined functions fetchnode() and freenode().
Some other user defined functions like initpool() will also not be of use. A pointer based
program that constructs a word index from a stored document (ASCII file) is presented in
Example 8.2. The input and output pattern of this program is identical to that of the program
given in Example 8.1.
Example 8.2
#include <stdio.h>
#include <stdlib.h>     /* malloc(), exit() */
#include <string.h>     /* strcmp(), strcpy() */
#include <ctype.h>      /* isalpha(), toupper() */

#define MAXWORD 15
#define TRUE    1
#define FALSE   0

typedef struct node {
    char info[MAXWORD];
    struct node *link;
} nodetype;
typedef nodetype *nodeptr;

void createlist(nodeptr *ptrlist);
int  emptylist(nodeptr list);
void traverselist(nodeptr list);
int  search(nodeptr head, char item[], nodeptr *prevptr);
void insertnode(nodeptr *headptr, char item[], nodeptr prevptr);
int  getword(char s[]);

int main(void)
{
    nodeptr list[26], prevptr;
    char word[MAXWORD];
    int i;

    for (i = 0; i < 26; i++)
        createlist(list + i);
    while (getword(word))       /* returns length of word, 0 at end of file */
        if (!search(list[word[0]-'A'], word, &prevptr))
            insertnode(&list[word[0]-'A'], word, prevptr);
    for (i = 0; i < 26; i++)
        if (!emptylist(list[i]))
            traverselist(list[i]);
    return 0;
}

void createlist(nodeptr *ptrlist)
{
    *ptrlist = NULL;
    return;
}

int emptylist(nodeptr list)
{
    return (list == NULL);
}

void traverselist(nodeptr list)
{
    nodeptr curptr;

    curptr = list;
    while (curptr != NULL)
    {
        printf("%s\t", curptr->info);
        curptr = curptr->link;
    }
    putchar('\n');
    return;
}

/* Look for item in a sorted list; *prevptr is set to the node after
   which item belongs (NULL if it belongs at the front) */
int search(nodeptr head, char item[], nodeptr *prevptr)
{
    nodeptr curptr;
    int found = -1;             /* nonzero: item not found so far */

    curptr = head;
    *prevptr = NULL;
    while (curptr != NULL)
        if ((found = strcmp(curptr->info, item)) >= 0)
            break;
        else {
            *prevptr = curptr;
            curptr = curptr->link;
        }
    return ((found == 0) ? TRUE : FALSE);
}

void insertnode(nodeptr *headptr, char item[], nodeptr prevptr)
{
    nodeptr temp;

    temp = (nodeptr) malloc(sizeof(nodetype));
    if (temp == NULL)
    {
        printf("No memory available\n");
        exit(1);
    }
    strcpy(temp->info, item);
    if (prevptr == NULL)
    {
        temp->link = *headptr;
        *headptr = temp;
    } else {
        temp->link = prevptr->link;
        prevptr->link = temp;
    }
    return;
}

/* Read the next word in upper case into s; returns its length */
int getword(char s[])
{
    int c, i = 0;

    while (!isalpha(c = getchar()))
        if (c == EOF)
            return i;
    do {
        if (i < MAXWORD - 1)    /* letters beyond the word limit are ignored */
            s[i++] = toupper(c);
        c = getchar();
    } while (isalpha(c));
    s[i] = '\0';
    return i;
}

EXERCISES
1. What are the main advantages of linked lists over arrays in representing a group of items?
2. Write a program to reverse a linked list so that the last element becomes the first one, the
last but one becomes the second element and so on.
3. Write a program to find the sum of integers in a singly linked list.
4. Develop a line-oriented text editor that assigns a number to each line of text and maintains
the lines in a linked list by line number order.
5. Write a program that takes as input a polynomial as (coefficient, exponent) pairs, in any
order. The polynomial is stored in a linked list of (coefficient, exponent) pairs. Write a
program to print the polynomials in descending order.
6. Write a function binary_search that accepts two parameters, an array of pointers to a group
of stored numbers, and a single number. The function should use binary search to return a
pointer to the single number if it is in the group. If the number is not present in the group,
return the value NULL.
7. Write a linked list program that cyclically permutes the elements of a given sequence. For
example, if the response is
3 4 2 1 5
the program prints
5 3 4 2 1
8. Write a function that removes the first node in a linked list with a given value.
9. Write a function that takes a pointer to a linked list and reverse the order of the nodes by
simply altering the pointers.
If the original list were
5 3 7 1
the function will return
1 7 3 5

10. Write a function multiply(p, q) to multiply two long positive integers represented by singly
linked lists.
11. Write a C program to split a linked list into two lists, in such a manner that the first linked
list contains the odd numbered nodes and second linked list contains the even numbered.
12. A palindrome is a word or line that reads the same forwards and backwards.
Given a linked list of words, write a C function to create a palindrome list from it by
concatenating its reverse list to the given list.
9
LINKED LISTS—VARIANTS

In the previous chapter we considered linear linked lists. They are linear in their structure in the
sense that list elements must be processed in a sequential manner from first to last. This is
because the linked lists that we have seen have a head node which is directly accessible, and
each node contains an information part together with a link part that allows us to move to its
successor node, if it exists.
But in some cases, it may be convenient to use different kinds of accesses. In fact, other
kinds of links between the nodes may also be helpful. In this chapter we consider some of these
variants of linked lists such as linked stacks, linked queues, lists with fixed head, circular linked
list, and so on.

9.1 LINKED STACKS


An array-based implementation of stacks that we have seen in chapter 6 loses its generality
because the fixed size of the array puts a limitation on the size or length of the stack list. A
pointer-based implementation of stacks will not impose such limitation on the stack size.
We know that a stack is nothing but a list whose elements can be accessed only at
one end, which is known as the stack top; that is, insertion (push) and deletion (pop) operations
are done at the top of the stack. We have also seen that a linked list is accessible only through
the head (start node) of the list. Clearly, if we create and maintain a linked list in such a way
that all elements are inserted at the front of the list and removal is also from the
beginning of the list, then it will satisfy the conditions of a stack. Hence, to implement a stack
using pointers we may declare as below.
typedef struct stack_node{
char info[20]; /* Information part */
struct stack_node *link; /* Link part*/
} stacktype;
stacktype *stack;
To define the abstract data type of stack we also need to specify the operations on a stack,
which were already discussed at the time of the array implementation.
Creating an empty stack may simply be written as

void create(stacktype **stackptr)
{
    *stackptr = (stacktype *) NULL;
    return;
}

The function to check whether a stack is empty can now be written as

int empty(stacktype *stack)
{
    return(stack==NULL);
}
The push operation for a stack is simply the insertion operation restricted to the
beginning of the linked list. So the function push() is a simplified form of the function
insertnode() of Section 8.5. The code for the function push() is presented below.
stacktype *push(stacktype *stack, char element[])
{
stacktype *ptr;

ptr=(stacktype *)malloc(sizeof(stacktype));
strcpy(ptr->info, element);
ptr->link = stack;
stack=ptr;
return stack;
}
It should be noted that this push function does not check whether the stack is full, which
was checked in the push function of the array-based stack implementation. This is because the
pointer implementation of stack does not impose any limit on the size of the stack and hence can
never be full. Of course, the stack size is limited to the available memory in this case. But the
pop function must check for the empty-stack condition in this implementation also, because the
stack may have no element at a point of time and if we try to pop a stack element at that time it
is an error condition. The code of the pop() function is given below.

stacktype *pop(stacktype *stack, char element[])
{
    stacktype *ptr;

    if (empty(stack))
    {
        printf("Stack underflow !\n");
        exit(1);    /* Exit with error condition */
    } else {
        strcpy(element, stack->info);
        ptr = stack;
        stack = stack->link;
        free(ptr);
    }
    return stack;   /* the new top of the stack */
}
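A short illustration of how push() and pop() cooperate: pushing a sequence of words and then popping them returns the words in reverse order. The sketch below restates the stack functions so that it is self-contained; the helper reversewords() is an assumption made for this example and is not part of the text.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct stack_node {
    char info[20];
    struct stack_node *link;
} stacktype;

int empty(stacktype *stack)
{
    return stack == NULL;
}

stacktype *push(stacktype *stack, const char element[])
{
    stacktype *ptr = (stacktype *) malloc(sizeof(stacktype));

    if (ptr == NULL) {
        fprintf(stderr, "Out of memory\n");
        exit(1);
    }
    strcpy(ptr->info, element);
    ptr->link = stack;
    return ptr;                 /* the new node is the new top */
}

stacktype *pop(stacktype *stack, char element[])
{
    stacktype *ptr;

    if (empty(stack)) {
        printf("Stack underflow !\n");
        exit(1);
    }
    strcpy(element, stack->info);
    ptr = stack;
    stack = stack->link;
    free(ptr);
    return stack;               /* the new top of the stack */
}

/* reversewords() (hypothetical) reverses an array of n words in place
   by pushing all of them and popping them back. */
void reversewords(char words[][20], int n)
{
    stacktype *stack = NULL;
    int i;

    for (i = 0; i < n; i++)
        stack = push(stack, words[i]);
    for (i = 0; i < n; i++)
        stack = pop(stack, words[i]);
}
```

Both functions return the new stack top, so the caller must store the result back in its stack pointer after every call.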
9.2 LINKED QUEUES

The linked implementation of a queue is a simple modification of that of a stack. We recall that
a queue is nothing but a list in which elements are removed only at one end, called the front or
head of the queue, and elements are inserted only at the other end, called the rear or tail of the
queue. From this definition it follows naturally that a linked list becomes a queue if every
deletion occurs at the front of the list and every insertion is done at the end of the list. The
deletion operation will then be very similar to the pop operation of the linked stack, but the
insertion operation will require a list traversal to get the pointer to the last node (element)
of the list, after which the insertion will take place. It is possible to avoid this list
traversal if we keep the pointer to the last node of the list in a variable. So the
implementation of a linked queue needs a declaration of the form
typedef struct queue_node{
char info[20]; /* Information part*/
struct queue_node *link; /*Link part*/
} nodetype;
typedef struct {
nodetype *qfront, *qrear;
} queuetype;
queuetype queue;
We have already seen that the operations commonly used on a queue include the following.
(i) Create an empty queue.
(ii) Check whether a queue is empty.
(iii) Insert an element at the end (i.e. at the rear) of the queue.
(iv) Delete an element from the front of the queue.
The function create () to create an empty queue may be simply written as
create(queuetype *qptr)
{
qptr->qfront = qptr->qrear = (nodetype *)NULL;
return;
}
The function to check if the queue is empty or not may be implemented by using the
function
emptyQ(queuetype queue)
{
return(queue.qfront==NULL);
}
The operation to insert an element at the end of the queue is basically inserting a node in
the linked list (linked queue) after the node pointed to by qrear of the queue. But this needs
special attention, because if the queue is empty the insertion should take place at the front of
the queue. The deletion operation can be implemented just like the pop() function of a linked
stack discussed in the last section. The function insertQ() to add a node to the rear of the
queue may be written as
void insertQ(queuetype *Qptr, char element[])
{
    nodetype *ptr;

    ptr = (nodetype *) malloc(sizeof(nodetype));
    strcpy(ptr->info, element);
    ptr->link = NULL;           /* the new node becomes the last node */
    if (emptyQ(*Qptr))
    {
        Qptr->qfront = ptr;
        Qptr->qrear = ptr;
    }
    else
    {
        Qptr->qrear->link = ptr;
        Qptr->qrear = ptr;
    }
    return;
}
Similarly, the function deleteQ() to delete a node from the front of the queue may be written as
void deleteQ(queuetype *Qptr)
{
    nodetype *ptr;

    if (emptyQ(*Qptr))
    {
        printf("Attempt to delete node from empty queue\n");
        exit(1);
    } else {
        ptr = Qptr->qfront;
        Qptr->qfront = ptr->link;
        if (Qptr->qfront == NULL)   /* the queue has become empty */
            Qptr->qrear = NULL;
        free(ptr);                  /* the node is released, so nothing is returned */
    }
    return;
}
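In practice the caller usually wants the deleted element as well. A minimal sketch of the queue operations with such a variant is given below; the function removeQ(), which copies the front element into a caller-supplied array and reports an empty queue through its return value instead of calling exit(), is an assumption made for this illustration.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct queue_node {
    char info[20];
    struct queue_node *link;
} nodetype;

typedef struct {
    nodetype *qfront, *qrear;
} queuetype;

void create(queuetype *qptr)
{
    qptr->qfront = qptr->qrear = NULL;
}

int emptyQ(queuetype queue)
{
    return queue.qfront == NULL;
}

void insertQ(queuetype *Qptr, const char element[])
{
    nodetype *ptr = (nodetype *) malloc(sizeof(nodetype));

    if (ptr == NULL) {
        fprintf(stderr, "Out of memory\n");
        exit(1);
    }
    strcpy(ptr->info, element);
    ptr->link = NULL;
    if (emptyQ(*Qptr))
        Qptr->qfront = ptr;         /* first node is both front and rear */
    else
        Qptr->qrear->link = ptr;
    Qptr->qrear = ptr;
}

/* removeQ() is a hypothetical variant of deleteQ(): it copies the front
   element into element and returns 1, or returns 0 if the queue is empty. */
int removeQ(queuetype *Qptr, char element[])
{
    nodetype *ptr;

    if (emptyQ(*Qptr))
        return 0;
    ptr = Qptr->qfront;
    strcpy(element, ptr->info);
    Qptr->qfront = ptr->link;
    if (Qptr->qfront == NULL)
        Qptr->qrear = NULL;
    free(ptr);
    return 1;
}
```

Elements leave the queue in the order in which they were inserted, and after the last removal both qfront and qrear are NULL again.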

9.3 VARIANTS OF LINKED LISTS


There are other varieties of linked lists which are useful for certain applications. These variants
are modifications of the standard linked list which simplify the basic list operations
for typical applications.

9.3.1 Lists with Fixed Head Nodes


Each of the two basic list operations, insert and delete, considers two different cases.
(i) Insert or delete at the front of the list in which the head of the list changes.
(ii) Insert or delete after a given node.
The reason for considering these two cases is that the first node in a standard linked list
does not have any predecessor but all other nodes have a predecessor. Clearly, if we can make
a predecessor of the first node then characteristically it will not differ from other nodes of the
linked list. Obviously, then the algorithms for insertion and deletion will be simple. We can
achieve this by introducing a dummy head node at the beginning of a linked list which will
actually store no list element in its information part. The link part of this dummy head node will
point to the actual first node of the list and hence this dummy node serves the purpose of the
predecessor of the actual first node of the linked list. Since this dummy node is fixed, insertion
and deletion at the front of the list will occur at the node just after this dummy node, keeping the
head of the list unchanged. Obviously, every linked list in this implementation should have a head
node, and so an empty list must also have such a node. We have already seen that a declaration
of the following form is used to implement a linked list.
typedef struct list_node {
char info[20];
struct list_node *link;
} nodetype;
nodetype *head;
In this implementation the same declaration will do. But the algorithm for the basic list
operations will be changed a little. To create an empty list, in this implementation instead of just
setting head to NULL we may use a function create () as below.

create(nodetype **head)
{
*head = (nodetype *)malloc(sizeof(nodetype));
(*head)->link = NULL;
return;
}
The call create (&head) where head is declared as a pointer to nodetype will create an
empty list as below.

head -> [dummy head node, link = NULL]

A function empty () to check if a list is empty may be written as


empty(nodetype *head)
{
return(head->link==NULL);
}
A linked list with fixed head node that holds the city names 'Bangalore', 'Pune', and
'Surat' will look like

head -> [dummy] -> Bangalore -> Pune -> Surat

As the dummy head node has an undefined information part, this area may be used to
store the information regarding the type of the list members. For example, the above list may be
presented as

head -> city_of_India -> Bangalore -> Pune -> Surat

In such an implementation every node has a predecessor node, and so the insertnode()
function requires only two arguments.
(i) A pointer to the previous node (say prevptr) after which the node has to be inserted.
(ii) The element, that is, the information part (say the element array) of the node to be inserted.
Now the function insertnode () for this implementation may be written as
insertnode(nodetype *prevptr, char element[])
{
nodetype *ptr;

ptr = (nodetype *) malloc(sizeof(nodetype));


strcpy (ptr->info, element);
ptr->link = prevptr->link;
prevptr->link = ptr;
return;
}
The above function need not return pointer to head because head is fixed in this case with
pointer to the dummy node having an undefined information part. In a similar way the function
to delete a node from a linked list with fixed head node may be written as
deletenode (nodetype *prevptr )
{
nodetype *ptr;

ptr = prevptr->link;
prevptr->link = ptr->link;

free(ptr);
return;
}
The function traverse_list () to print all the list members will change a little, and is
presented below.
traverse_list(nodetype *head)
{
nodetype *current;

current = head->link;
while(current != NULL)
{
printf("%s\n", current->info);
current=current->link;
}
return;
}
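The benefit of the fixed head node shows clearly when a list is kept in sorted order: the search for the insertion point can start at the dummy node, and no separate case is needed for insertion at the front. The following sketch assumes a helper insertsorted() and a create function named createlist() that returns the head pointer; both are illustrative choices rather than functions from the text.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct list_node {
    char info[20];
    struct list_node *link;
} nodetype;

/* createlist() (hypothetical name): an empty list is a lone dummy head. */
nodetype *createlist(void)
{
    nodetype *head = (nodetype *) malloc(sizeof(nodetype));

    if (head == NULL) {
        fprintf(stderr, "Out of memory\n");
        exit(1);
    }
    head->info[0] = '\0';       /* the dummy node stores no list element */
    head->link = NULL;
    return head;
}

/* insertnode() as in the text: insert after the node prevptr points to. */
void insertnode(nodetype *prevptr, const char element[])
{
    nodetype *ptr = (nodetype *) malloc(sizeof(nodetype));

    if (ptr == NULL) {
        fprintf(stderr, "Out of memory\n");
        exit(1);
    }
    strcpy(ptr->info, element);
    ptr->link = prevptr->link;
    prevptr->link = ptr;
}

/* insertsorted() (hypothetical): keep the list in alphabetical order.
   The predecessor always exists, because the search starts at the dummy. */
void insertsorted(nodetype *head, const char element[])
{
    nodetype *prev = head;

    while (prev->link != NULL && strcmp(prev->link->info, element) < 0)
        prev = prev->link;
    insertnode(prev, element);
}
```

Compare this with search() and insertlist() of Example 8.1, where a Null predecessor had to be treated as a special case.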

9.3.2 Circular Linked List


Another useful data structure is a linked list in which the link part of the last node, instead of
having NULL pointer value, points back to the head node of the list. These linked lists are
known as circular linked lists and may be depicted as in the following where cllist is the
pointer to the head node.

cllist -> Bangalore -> Pune -> Surat -> (back to Bangalore)

It is clear that every node in a circular linked list has a predecessor and a successor.
So, as in the case of linked lists with a fixed head node, no special considerations are
required here for insertion and deletion of nodes, as long as the list is not empty. An empty
circular linked list may be created by simply setting cllist to NULL, and to check if a circular
linked list is empty we need to check if cllist is NULL.
A function insertcllist() for inserting an element into a circular linked list whose
head is pointed to by cllist, after the node pointed to by prevptr, is shown below.

nodetype *insertcllist(nodetype *cllist, nodetype *prevptr,
                       char element[])
{
    nodetype *ptr;

    ptr = (nodetype *) malloc(sizeof(nodetype));
    strcpy(ptr->info, element);
    if (cllist == NULL)         /* empty list: prevptr is not used */
    {
        ptr->link = ptr;
        cllist = ptr;
    }
    else
    {
        ptr->link = prevptr->link;
        prevptr->link = ptr;
    }
    return cllist;
}
A single-node circular linked list points to itself, as shown in the figure below. So we
need to give special attention when the insertion takes place in a circular list with no elements.

cllist -> Bangalore -> (back to Bangalore)

Similarly, the function deletecllist() may be written as

nodetype *deletecllist(nodetype *cllist, nodetype *prevptr)
{
    nodetype *ptr;

    if (cllist == NULL)
    {
        printf("Attempt to delete node from empty list \n");
        exit(1);
    }
    else
    {
        ptr = prevptr->link;
        if (ptr == prevptr)     /* Single node circular list */
            cllist = NULL;
        else
        {
            prevptr->link = ptr->link;
            if (ptr == cllist)  /* the head node itself is deleted */
                cllist = ptr->link;
        }
        free(ptr);
    }
    return cllist;
}

Obviously, we need to give special attention to the case where we want to delete from a
single-node circular list. The function traverse_cllist() to traverse a circular list is a simple
modification of the function to traverse a singly linked list and is of the following form.

void traverse_cllist(nodetype *cllist)
{
    nodetype *current;

    if (cllist != NULL)
    {
        current = cllist;
        do
        {
            printf("%s\n", current->info);
            current = current->link;
        } while (current != cllist);
    }
    return;
}
Note that in this case a do-while loop is used instead of a while loop, because the traversal
both starts and ends at cllist. If we implement a queue as a circular linked list, it may be
advantageous to maintain one pointer cllist to the tail node instead of the head node. Then to
maintain a queue we need only one pointer cllist (the rear) to access both the front
(cllist->link) and the rear (cllist) of the queue. Some other applications may use a circular
linked list with a fixed head node to simplify the algorithms of the application.
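The queue mentioned above, held as a circular linked list with a single pointer to its rear node, can be sketched as follows. The function names cq_insert() and cq_delete() are assumptions made for this illustration; each returns the new rear pointer, which the caller must store.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct list_node {
    char info[20];
    struct list_node *link;
} nodetype;

/* cq_insert() (hypothetical): add element behind the current rear node.
   rear is NULL for an empty queue; the new node becomes the new rear. */
nodetype *cq_insert(nodetype *rear, const char element[])
{
    nodetype *ptr = (nodetype *) malloc(sizeof(nodetype));

    if (ptr == NULL) {
        fprintf(stderr, "Out of memory\n");
        exit(1);
    }
    strcpy(ptr->info, element);
    if (rear == NULL)
        ptr->link = ptr;            /* a single node points to itself */
    else {
        ptr->link = rear->link;     /* new rear points to the front */
        rear->link = ptr;
    }
    return ptr;
}

/* cq_delete() (hypothetical): remove the front node, rear->link, and
   copy its element out. An empty queue is returned unchanged (NULL). */
nodetype *cq_delete(nodetype *rear, char element[])
{
    nodetype *front;

    if (rear == NULL)
        return NULL;
    front = rear->link;
    strcpy(element, front->info);
    if (front == rear)
        rear = NULL;                /* the only node is removed */
    else
        rear->link = front->link;
    free(front);
    return rear;
}
```

Neither operation needs a traversal, since the front node is always reachable in one step from the rear.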

9.3.3 Doubly Linked Lists


So far, we were restricted to only singly linked lists in the sense that a node may have only one
link part in it which points to the successor of the node. Some applications might also need to
get the predecessor of a node frequently. In case of singly linked lists, accessing the predecessor
of a node needs a search from the beginning and hence is inefficient. Such applications may
employ another data structure which is a doubly linked list in which each node contains two
links, one to the successor node in the list and other to the predecessor node in the list as shown
below.

dllist -> Bangalore <-> Pune <-> Surat

These lists are also known as symmetrically linked lists. It is very clear that with such lists
it is possible to move in either direction through the list by keeping only one pointer. Though
traversal in both directions is possible, it is achieved at the cost of extra space for the
predecessor links. As a variant of this, as in the singly linked list, we can introduce a fixed
head node for the doubly linked list also. This will simplify the basic operations on lists.
Furthermore, for easy access to either end of the list we can also make this doubly linked list
with dummy head node a circular one. With such modifications our doubly linked list will take
the following form.

dllist -> [dummy head] <-> Bangalore <-> Pune <-> Surat <-> (back to the dummy head)

The pointer implementation of doubly linked list requires the following declarations.
typedef struct listnode{
    char info[20];                      /* Information part */
    struct listnode *plink, *slink;
    /* Predecessor & successor links */
} nodetype;
nodetype *dllist;
An empty doubly linked list may be created by using the function
void createD(nodetype **nodeptr)
{
    *nodeptr = (nodetype *) malloc(sizeof(nodetype));
    /* in the circular form both links of the dummy head point to itself */
    (*nodeptr)->plink = (*nodeptr)->slink = *nodeptr;
    return;
}
A call to createD(&dllist) will create an empty doubly linked list, consisting of the dummy
head node alone, which may be depicted as

dllist -> [dummy head] (plink and slink both point back to the dummy head)

The function emptyD() to check if a doubly linked list is empty may be written as follows.

int emptyD(nodetype *nodeptr)
{
    return (nodeptr->plink == nodeptr);
}
In fact, we may return the value of the expression nodeptr->slink == nodeptr also.
Inserting a node into a doubly linked list after the node pointed to by prevptr needs a little
care. Consider the portion of such a doubly linked list, depicted below, which contains the city
names 'Bangalore' and 'Surat'. Let prevptr point to the node with city name 'Bangalore'; we
need to insert the node with city name 'Chennai' after it.
Let ptr point to the node to be inserted. For insertion, the order of the link adjustments is
as follows. First, we need to set the predecessor and successor links of the node to be inserted.
That is, (i) ptr->plink = prevptr; and (ii) ptr->slink = prevptr->slink;

[Figure: the nodes 'Bangalore' and 'Surat' with the new node 'Chennai', pointed to by ptr,
between them; dashed lines numbered (i) to (iv) mark the link adjustments]

The above two instructions may be executed in either order. They set the
link fields of the node we are inserting. Now the existing links are to be reset so that the node
pointed to by ptr is inserted at its proper place, and this may be written as (iii) prevptr->slink =
ptr; and (iv) ptr->slink->plink = ptr;
These reset instructions may also come in either order, but we must be careful that the set
instructions are executed first and the reset instructions afterwards when inserting a node in a
doubly linked list. The order of these instructions is marked in the figure with dashed lines.
The above discussion leads to the following function insertdllist() to insert a node into a
doubly linked list.

void insertdllist(nodetype *prevptr, char element[])
{
    /* prevptr points to the node after which the insertion takes place */
    nodetype *ptr;

    ptr = (nodetype *) malloc(sizeof(nodetype));
    strcpy(ptr->element, element);   /* needs <string.h> */
    ptr->plink = prevptr;
    ptr->slink = prevptr->slink;
    prevptr->slink = ptr;
    ptr->slink->plink = ptr;
    return;
}
The deletion operation is simpler and only requires resetting the slink part of the
predecessor of the node to be deleted and the plink part of its successor. The function
deletedllist() may be written as
void deletedllist(nodetype *nodeptr)
{
    /* nodeptr points to the node to delete */
    nodeptr->plink->slink = nodeptr->slink;
    nodeptr->slink->plink = nodeptr->plink;
    free(nodeptr);
    return;
}
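The operations above can be gathered into one small, self-contained sketch. It assumes the circular variant with a fixed head node, so an empty list is a head whose links point to itself. The names dl_create, dl_empty, dl_insert_after, dl_delete and dl_join are illustrative and do not appear in the text; dl_join exists only to make the traversal order visible.

```c
#include <stdlib.h>
#include <string.h>

typedef struct listnode {
    char element[20];                 /* information part */
    struct listnode *plink, *slink;   /* predecessor and successor links */
} nodetype;

/* Create an empty circular list: the head node points to itself both ways. */
nodetype *dl_create(void)
{
    nodetype *head = malloc(sizeof *head);
    if (head != NULL) {
        head->element[0] = '\0';
        head->plink = head->slink = head;
    }
    return head;
}

/* The list is empty when the head is its own successor. */
int dl_empty(const nodetype *head)
{
    return head->slink == head;
}

/* Insert a copy of element after prevptr; returns the new node or NULL. */
nodetype *dl_insert_after(nodetype *prevptr, const char *element)
{
    nodetype *ptr = malloc(sizeof *ptr);
    if (ptr == NULL)
        return NULL;
    strncpy(ptr->element, element, sizeof ptr->element - 1);
    ptr->element[sizeof ptr->element - 1] = '\0';
    ptr->plink = prevptr;            /* (i)   set links of the new node */
    ptr->slink = prevptr->slink;     /* (ii)  */
    prevptr->slink = ptr;            /* (iii) reset links of the neighbours */
    ptr->slink->plink = ptr;         /* (iv)  */
    return ptr;
}

/* Unlink and free a node (never call this on the head node). */
void dl_delete(nodetype *nodeptr)
{
    nodeptr->plink->slink = nodeptr->slink;
    nodeptr->slink->plink = nodeptr->plink;
    free(nodeptr);
}

/* Write the elements into buf, separated by blanks, walking forward
   (forward != 0) or backward (forward == 0) from the head. size must be >= 1. */
void dl_join(const nodetype *head, int forward, char *buf, size_t size)
{
    const nodetype *p = forward ? head->slink : head->plink;
    buf[0] = '\0';
    while (p != head) {
        if (buf[0] != '\0')
            strncat(buf, " ", size - strlen(buf) - 1);
        strncat(buf, p->element, size - strlen(buf) - 1);
        p = forward ? p->slink : p->plink;
    }
}
```

Inserting 'Bangalore' after the head, 'Surat' after 'Bangalore' and then 'Chennai' after 'Bangalore' gives the forward order Bangalore, Chennai, Surat, and the same names reversed when walking through the plink fields.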

9.4 APPLICATIONS OF LINKED LISTS


There are several applications of linked lists. Of these, we choose to present two, which
should be sufficient to illustrate the linked-list functions. First we show a pointer
implementation of sparse polynomials, which may be considered an application of the singly
linked list with fixed head node. Next we present an application of the circular doubly
linked list with fixed head: large integer arithmetic.

9.4.1 Sparse Polynomial Manipulation


A polynomial in x (a single variable) is expressed as
p(x) = a_0 + a_1 x + a_2 x^2 + ... + a_n x^n
where a_0, a_1, a_2, ..., a_n are constant coefficients in the polynomial. The degree of the
polynomial p(x) is n, where n is the largest power of x in p(x) with nonzero a_n. A constant is
a polynomial of degree zero.
Clearly, the polynomial can be represented as the list of constant coefficients (a_0, a_1, ..., a_n).
This list may be stored in an array. But this array representation of polynomials is
worthwhile only if the value of n is not too large and most of the coefficients a_i are non-zero.
In fact, for a sparse polynomial, where most of the coefficients a_i are zero, this array
representation is highly inefficient. For example, consider the sparse polynomial
u(x) = 7 - 7x^51 + 5x^99
For storing u(x) the array implementation as shown below needs to reserve an array of 100
elements, of which only three elements will be non-zero. The other elements have no use, so
the representation is highly costly with respect to storage requirements.
u(x) <=> (7, 0, ..., -7, 0, ..., 5)
corresponding exponent positions -> (0, 1, ..., 51, 52, ..., 99)
A little thought suggests another representation of a polynomial, which uses a list
of (coefficient, exponent) pairs instead of a list of coefficients only. With this representation,
our polynomial u(x) may be identified by the following list of pairs
u(x) <=> ((7,0),(-7,51),(5,99))
The above discussion suggests that a linked implementation is more appropriate to rep-
resent a polynomial where each node in the list has two members in the information part and a
single member for the link to its successor, as depicted in the following.

coefficient exponent link

Obviously, the coefficient member of a node is always non-zero.


For such a linked implementation of polynomials, we may use declarations of the
following form.
typedef struct list_node{
    float coeff;
    int expo;
    struct list_node *link;
} nodetype;
nodetype *uptr, *vptr;
Here, as above, uptr and vptr may be used as pointers to the head nodes of two different lists,
each of them representing a different polynomial. To simplify the list algorithms we choose to
represent a polynomial by a linked list with fixed head node. Thus the polynomials
u(x) = 7 - 7x^51 + 5x^99
and v(x) = 2 + 9x^33 + 7x^51 - 11x^77
may be represented as the following linked lists pointed by uptr and vptr, respectively.

uptr -> [head] -> (7, 0) -> (-7, 51) -> (5, 99) -> NULL

vptr -> [head] -> (2, 0) -> (9, 33) -> (7, 51) -> (-11, 77) -> NULL

For an illustration of how to process such linked representations of polynomials we consider
the polynomial addition operation. We begin by creating an empty linked list with fixed
head node. Let this empty list be pointed by fptr. This linked list will hold the linked list
representation of the polynomial which we get after adding u(x) and v(x). The empty list
consists of the head node alone, pointed by fptr, with a NULL link.

To create such an empty list we may execute the following statements
fptr = (nodetype *)malloc(sizeof(nodetype));
fptr->link = NULL;
which leave the coefficient and exponent parts of the (fixed head) node undefined.
As we have considered linked lists with fixed head nodes, the pointers uptr, vptr, and
fptr are fixed. So we will need three auxiliary pointers which will run through the three lists.
Let u, v, and f be the three auxiliary pointers that will run through the linked lists pointed by
uptr, vptr, and fptr, respectively. At first we initialise these three auxiliary pointers as below
u = uptr->link;
v = vptr->link;
and f = fptr;
The above initialization sets u and v to point to the current nodes to be processed in the lists
pointed by uptr and vptr, respectively, and f to point to the last node of the list pointed by
fptr. We compare the exponents of the nodes pointed by u and v at every step. If the exponents
are equal, then the coefficient parts of these nodes are added. If this sum is nonzero, a node is
created with this sum as its coefficient part and the common exponent of the nodes pointed by u
and v as its exponent part. The created node is then attached at the end of the list pointed by
fptr, that is, after the node pointed by f, the last node, and f is advanced to it. If the sum is
zero, no node is added to fptr. In either case u and v are advanced to point to their successors.
If the exponents in the nodes pointed by u and v are different, a node is created which is a
copy of the node containing the smaller exponent, with its link part set to NULL, and inserted
at the end of the list pointed by fptr. Then the pointer (u or v) which points to the node with
the smaller exponent and f are advanced to the successor nodes of their lists.
The above process is continued till u or v becomes NULL. The changes at each step in the
lists pointed by uptr, vptr, and fptr are shown below.
[Figure: Steps 1 to 4, showing the positions of u, v, and f in the lists pointed by uptr, vptr,
and fptr as the addition proceeds]

When the end of one of the lists (uptr or vptr) is reached, the remaining nodes of the other
list are attached at the end of fptr, to get the linked representation of the sum polynomial
(pointed by fptr).

[Figure: the final state of the lists; fptr represents the sum 9 + 9x^33 - 11x^77 + 5x^99]

Actual C functions for such polynomial operations are not written here and are left to the
reader. We can build algorithms for other polynomial operations such as multiplication of two
polynomials and evaluation of a polynomial for a given value of x with little care.
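One possible sketch of the addition procedure described above is given here for readers who want something to compare their own solution against. It assumes lists with a fixed head node whose terms are kept in increasing order of exponent. The helper names poly_create, poly_append and poly_add are illustrative and are not taken from the text.

```c
#include <stdlib.h>

typedef struct list_node {
    float coeff;
    int expo;
    struct list_node *link;
} nodetype;

/* Create an empty polynomial: a head node whose link is NULL. */
nodetype *poly_create(void)
{
    nodetype *head = malloc(sizeof *head);
    if (head != NULL)
        head->link = NULL;
    return head;
}

/* Attach a new term after the node `last`; returns the new last node
   (or `last` unchanged if no memory is available). */
nodetype *poly_append(nodetype *last, float coeff, int expo)
{
    nodetype *p = malloc(sizeof *p);
    if (p == NULL)
        return last;
    p->coeff = coeff;
    p->expo = expo;
    p->link = NULL;
    last->link = p;
    return p;
}

/* Return a new list representing u(x) + v(x); uptr and vptr are unchanged. */
nodetype *poly_add(const nodetype *uptr, const nodetype *vptr)
{
    nodetype *fptr = poly_create();
    nodetype *f = fptr;                     /* last node of the result */
    const nodetype *u = uptr->link;         /* current node of u(x) */
    const nodetype *v = vptr->link;         /* current node of v(x) */

    if (fptr == NULL)
        return NULL;
    while (u != NULL && v != NULL) {
        if (u->expo == v->expo) {
            float sum = u->coeff + v->coeff;
            if (sum != 0.0f)                /* zero terms are not stored */
                f = poly_append(f, sum, u->expo);
            u = u->link;
            v = v->link;
        } else if (u->expo < v->expo) {     /* copy the smaller exponent */
            f = poly_append(f, u->coeff, u->expo);
            u = u->link;
        } else {
            f = poly_append(f, v->coeff, v->expo);
            v = v->link;
        }
    }
    /* one list is exhausted: copy what remains of the other */
    for (; u != NULL; u = u->link)
        f = poly_append(f, u->coeff, u->expo);
    for (; v != NULL; v = v->link)
        f = poly_append(f, v->coeff, v->expo);
    return fptr;
}
```

For u(x) = 7 - 7x^51 + 5x^99 and v(x) = 2 + 9x^33 + 7x^51 - 11x^77 the terms in x^51 cancel, so the result list holds the four pairs (9,0), (9,33), (-11,77) and (5,99). Unlike the description in the text, this sketch copies the remaining nodes instead of sharing them, so that the operand lists stay independent of the result.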

9.4.2 Large Integer Arithmetic


We know that the size of an integer that can be stored in computer memory is limited by the
word size of the system being used. In fact, the largest positive integer that can be stored in
an n-bit word is 2^(n-1) - 1 in 2's complement representation. Clearly, positive integers larger
than this cannot be stored and manipulated in a straightforward manner. Rather, we first need
to choose a data structure that can be used to represent a large integer.
It seems very natural to choose a linked list as the data structure to represent a large
integer, because there is no upper bound on the number of digits in the integer. For simplicity,
we consider large positive integers only. The processing of such a linked list representation of
large positive integers will require frequent back and forth movement through the linked list.
So a circular doubly linked list representation would be a better choice in this case. To make the
list operations simpler we choose the circular doubly linked list with fixed head node to
represent a large positive integer. Each node in this linked list stores a three-digit integer
(say), which corresponds to a group of three consecutive digits in the large positive integer.
The first node may contain an integer with fewer than three digits. For example, the positive
integer 72,165,834,982 is represented as the following circular linked list with fixed head
pointed by operand.

To create this doubly linked list for the above integer the input should be given in groups of
three digits separated by blanks as shown below.
72 165 834 982
We need a declaration of the form given below to create such lists.
typedef struct list_node{
    int info;
    struct list_node *plink, *slink;
} nodetype;
nodetype *nodeptr;
To create a circular doubly linked list with fixed head pointed by nodeptr we first create
a node of nodetype pointed by nodeptr as follows.
nodeptr = (nodetype *) malloc(sizeof(nodetype));
nodeptr->plink = nodeptr->slink = nodeptr;
This may be depicted as

The first part of the large integer (in this case 72) is read and stored in a variable num
(say). A new node is then created whose info member holds the value of num.
This may be done by
ptr = (nodetype *)malloc(sizeof(nodetype));
ptr->info = num;
Here ptr is the pointer to the created node. Now this node is inserted as the last node of the
circular doubly linked list pointed by nodeptr. This may be achieved by executing the following code:
ptr->plink = nodeptr->plink;
ptr->slink = nodeptr;
nodeptr->plink->slink = ptr;
nodeptr->plink = ptr;
The following figure shows the state of the list.

nodeptr

The same process is repeated by reading the next part of the large integer and is contin-
ued until we finish with all the parts of the integer that has come in the input. A function to read
such a long integer forming a circular doubly linked list with fixed head is given below. This
function readint () returns a pointer to nodetype which is the pointer to the head of the linked
list representing the large integer.

nodetype *readint()
{
nodetype *nodeptr,*ptr;
int num;
nodeptr = (nodetype *)malloc(sizeof(nodetype));
nodeptr->plink = nodeptr->slink = nodeptr;
while(scanf("%d", &num) != EOF)
{
ptr = (nodetype *)malloc(sizeof(nodetype));
ptr->info = num;
ptr->plink = nodeptr->plink;
ptr->slink = nodeptr;
nodeptr->plink->slink = ptr;
nodeptr->plink = ptr;
}
return nodeptr;
}
Consider the following circular doubly linked lists with head nodes pointed by operand1
and operand2 representing two large integers 2,583,647 and 72,165,834,982.

To add the integers represented by the above doubly linked lists we traverse the lists
from right to left (starting from the end of each list), add the two three-digit integers in the
corresponding nodes and the carry digit drawn from the previous pair of nodes processed, to get
the sum and the new carry digit. We then create a node to store this sum and insert it at the
front of another circular doubly linked list with fixed head that will represent the total of the
two given large integers.
For a better understanding we present a C program in the following Example 9.1 that
adds and prints two large integers given as input. The integers are represented as doubly
linked lists.
Example 9.1

#include <stdio.h>
#include <stdlib.h>   /* malloc() */
#include <ctype.h>    /* isdigit() */

#define SIZE 1000     /* every node holds a value smaller than SIZE */
#define LEN 3         /* number of digits in a group */

typedef struct list_node{
    int info;
    struct list_node *plink, *slink;
} nodetype;

nodetype *operand1, *operand2, *total;
nodetype *readint(void);
nodetype *addnodeint(nodetype *ptr1, nodetype *ptr2);
int tokenise(int *pi);
void print_dlist(nodetype *nodeptr);
void attach(nodetype *ptr, int element);

int main(void)
{
    printf("Enter two integers in groups of %d digits\n", LEN);
    printf("\nseparating each group by space :\n");
    printf("First integer :");
    operand1 = readint();
    print_dlist(operand1);
    printf("Second integer :");
    operand2 = readint();
    print_dlist(operand2);
    total = addnodeint(operand1, operand2);
    printf("Sum of these two integers = ");
    print_dlist(total);
    return 0;
}

nodetype *readint(void)
{
    /* function to read from stdin a big integer in groups of LEN
       number of digits. A circular doubly linked list with fixed
       head node is formed with these integer values. The pointer
       to the head node is returned */
    nodetype *nodeptr, *ptr;
    int num;

    nodeptr = (nodetype *)malloc(sizeof(nodetype));
    nodeptr->plink = nodeptr->slink = nodeptr;
    while (tokenise(&num) != EOF)
    {
        /* insert the group as the last node of the list */
        ptr = (nodetype *)malloc(sizeof(nodetype));
        ptr->info = num;
        ptr->plink = nodeptr->plink;
        ptr->slink = nodeptr;
        nodeptr->plink->slink = ptr;
        nodeptr->plink = ptr;
    }
    return nodeptr;
}

int tokenise(int *pi)
{
    /* function to deliver, through pi, the next group of digits of the
       current input line; EOF is returned when the line is exhausted */
    static char ibuf[50];
    static int i = 0, cc = 0;
    int c, n = 0;

    if (cc == 0)
    {
        /* read one input line into the buffer */
        while ((c = getchar()) != '\n' && c != EOF && cc < 49)
            ibuf[cc++] = c;
        ibuf[cc] = '\n';
    }
    /* skip the separators in front of the next group */
    while (i < cc && !isdigit((unsigned char)ibuf[i]))
        i++;
    if (i == cc)
    {
        i = cc = 0;
        return EOF;
    }
    do {
        n = n*10 + ibuf[i] - '0';
    } while (isdigit((unsigned char)ibuf[++i]));
    *pi = n;
    return 0;
}

void print_dlist(nodetype *nodeptr)
{
    /* function to traverse the circular doubly linked list with fixed
       head node pointed by nodeptr and print the information contents of
       all the nodes */
    nodetype *ptr;

    ptr = nodeptr->slink;
    while (ptr != nodeptr)
    {
        printf("%d,", ptr->info);
        ptr = ptr->slink;
    }
    putchar('\n');
    return;
}

nodetype *addnodeint(nodetype *ptr1, nodetype *ptr2)
{
    /* function to add two integers represented in circular doubly
       linked lists with fixed head nodes pointed by ptr1 & ptr2 and store
       the resulting integer into another identical type of list pointed
       by the returned value of the function */
    nodetype *temp1, *temp2, *total, *ptr, *headptr;
    int sum, carry = 0;

    temp1 = ptr1->plink;   /* pointer to the last node of 1st list */
    temp2 = ptr2->plink;   /* pointer to the last node of 2nd list */
    total = (nodetype *)malloc(sizeof(nodetype));
    total->plink = total->slink = total;   /* Set initial list */
    while (temp1 != ptr1 && temp2 != ptr2)
    {
        /* add integers in nodes pointed by temp1 & temp2 */
        sum = temp1->info + temp2->info + carry;
        carry = sum / SIZE;
        attach(total, sum % SIZE);
        temp1 = temp1->plink;
        temp2 = temp2->plink;
    }
    /* select the list to continue with */
    ptr = (temp1 == ptr1) ? temp2 : temp1;
    headptr = (ptr == temp1) ? ptr1 : ptr2;
    while (ptr != headptr)
    {
        sum = ptr->info + carry;
        carry = sum / SIZE;
        attach(total, sum % SIZE);
        ptr = ptr->plink;
    }
    if (carry)
        attach(total, carry);
    return total;
}

void attach(nodetype *ptr, int element)
{
    /* function to insert a node with element as information part at
       the front of the circular doubly linked list with fixed head node
       pointed by ptr */
    nodetype *temp;

    temp = (nodetype *)malloc(sizeof(nodetype));
    temp->info = element;
    temp->slink = ptr->slink;
    temp->plink = ptr;
    ptr->slink->plink = temp;
    ptr->slink = temp;
    return;
}
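One detail worth checking when printing such a list is that every group except the first stands for exactly three digits, so a group value such as 42 has to appear as 042. The following sketch shows one way to format a list with that padding. The functions big_create, big_append and big_format are illustrative names, not part of Example 9.1, and the sketch relies on snprintf(), which assumes a C99 library.

```c
#include <stdio.h>
#include <stdlib.h>

typedef struct list_node {
    int info;
    struct list_node *plink, *slink;
} nodetype;

/* Create an empty circular doubly linked list with a fixed head node. */
nodetype *big_create(void)
{
    nodetype *head = malloc(sizeof *head);
    if (head != NULL)
        head->plink = head->slink = head;
    return head;
}

/* Insert a three-digit group as the last node of the list. */
void big_append(nodetype *head, int group)
{
    nodetype *ptr = malloc(sizeof *ptr);
    if (ptr == NULL)
        return;
    ptr->info = group;
    ptr->plink = head->plink;
    ptr->slink = head;
    head->plink->slink = ptr;
    head->plink = ptr;
}

/* Write the integer into buf as comma-separated groups. The first group
   is written as it is; every later group is padded to three digits. */
void big_format(const nodetype *head, char *buf, size_t size)
{
    const nodetype *p;
    size_t used = 0;
    int w;

    if (size == 0)
        return;
    buf[0] = '\0';
    for (p = head->slink; p != head && used < size; p = p->slink) {
        if (p == head->slink)
            w = snprintf(buf + used, size - used, "%d", p->info);
        else
            w = snprintf(buf + used, size - used, ",%03d", p->info);
        if (w < 0)
            break;
        used += (size_t)w;
    }
}
```

With the groups 5, 0 and 42 appended in that order, big_format() produces 5,000,042, whereas printing each group with a plain %d would give 5,0,42.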
EXERCISES
1. A doubly linked list is a list in which each element contains a pointer to the previous ele-
ment as well as to the next element in the list. There is also a pointer head to the leftmost
element in the list, and a pointer tail to the rightmost element. Both head->prev and
tail->next are set to NULL. Write a C program that creates, destroys and prints such a list.
2. Given a doubly linked list, write a function that removes a node from the list and inserts it
in the front.
3. Write a program that converts a linear singly linked list into a circular linked list.
4. Write functions to perform each of the following operations for circular lists.
(a) Append an element to the end of the list.
(b) Delete the last element from the list.
5. Write algorithms and C routines to perform each of the following operations for doubly
linked circular lists.
(a) Concatenate two lists.
(b) Delete the nth element from a list.
(c) Delete the last element from a list.
(d) Make a second copy of the list.
6. Write a C function mult(x, y) to multiply two long integers represented by doubly linked
circular lists.
7. Write a routine to merge two circular lists A and B to produce a resultant list C. You need
not create a new list; the nodes of the old lists should now appear in the concatenated list.
8. Write a function for a doubly linked circular list which reverses the direction of the links.
9. Using the doubly linked list structure, write a routine back(n), which moves you backward
by n nodes in the list.
10. How can a polynomial in three variables (x, y & z) be represented by a circular list? Write
functions to do the following:
(a) Add two such polynomials.
(b) Multiply two such polynomials.
SORTING

10.1 INTRODUCTION
Sorting is a fundamental operation in computer science. A good deal of effort in computing is
related to making data available to users in some particular order. The concept of an ordered set
of data is one that has considerable impact on our daily lives. For example, lists of names are
frequently printed in alphabetical order and mailing labels are often printed in pin-code order.
Sorting refers to arranging data in some given order, such as increasing or decreasing, with
numeric data or alphabetically, with character data.
In this chapter we are concerned with rearranging the data so that it is in sorted order.
There are two important and largely disjoint categories related to sorting data—internal sorting
and external sorting. Internal sorting takes place in the main memory of a computer, where we
can use the random access capability of the main memory to take advantage in various ways.
External sorting is necessary when the data to be sorted is too large to fit in the main memory.
Many different sorting algorithms have been invented, and we will describe some of the
common sorting techniques and the advantages and disadvantages of one technique over the
other.

10.2 SORTING TECHNIQUES
10.2.1 Insertion Sort
The sorting method that we shall consider first is called 'insertion sort'. Insertion sort works the
way we might put a hand of cards in order. The hand is scanned for the first card that is lower
than the one to its left. When such a card is found, it is picked out and moved to, or inserted
at, the correct location. Fig. 10.1 illustrates the process for five elements, each of
which is an integer. The input data, an array of five integers, is shown in Fig. 10.1(a).

Fig. 10.1 Insertion sort. The integers that are known to be sorted at the beginning of each step are
underlined

In the insertion sort, the first two numbers are compared. If the one on the right is smaller
than the one on the left, the second number is inserted in front of the first number.
The first number slides over and takes the place of the second number. The process goes
on by scanning to the right until a smaller number is found. It is inserted at the correct location,
and the rest of the numbers slide one position to the right. This process is repeated until the end
of the list is reached. The list is then sorted. The basic operation is thus the insertion of a single
element into a sequence of sorted elements so that the resulting sequence is still sorted.
The same idea may equally be applied from the other end of the list, as in Fig. 10.1, where the
sorted segment grows from the last element. Consider the figure containing an array of five
integers. When the fifth element, 235, is considered by itself, it is a sorted list of length one.
The transition from Fig. 10.1(a) to Fig. 10.1(b) consists of inserting 46 in the list of elements
that is already sorted. Since 46 is less than 235, 46 is inserted at the top of the sorted segment,
which now has a length of two. Next, since 162 lies between 46 and 235, it is inserted between
them by moving 46 up to make room. The sorted segment of the list has now grown to a length of
three. This is shown in Fig. 10.1(c).
Fig. 10.1(d) is obtained by inserting 205 into the list of elements that is already sorted.
This is accomplished by moving 46 and 162 up to make room. Finally, Fig. 10.1(e) is obtained by
inserting 390 into the list of elements (of length four) that is already sorted.
Algorithm 10.1 implements insertion sort as described for Fig. 10.1.
Algorithm 10.1: Insertion sort
Input: An array A with n elements, A[1], ..., A[n]; the extra location A[n+1] is used as a sentinel
Output: The array A sorted into ascending order
Step 1: for k = n-1 to 1 by -1 do
Step 2:     j = k+1
Step 3:     S = A[k]
Step 4:     A[n+1] = S
Step 5:     while (S > A[j]) do
Step 6:         A[j-1] = A[j]
Step 7:         j = j+1
            end
Step 8:     A[j-1] = S
        end
Step 9: Stop
The program for insertion sort is as follows.
Example 10.1

/* Program for insertion sort */
#include <stdio.h>
#define n 6
int x[] = {42, 34, 56, 23, 78, 90};
int main(void)
{
    int i, t, j;
    printf("\n\n Input data: \n \n");
    for (i = 0; i < n; i++)
        printf("%3d", x[i]);
    printf("\n");
    for (i = 1; i < n; i++)
        /* move x[i] to the left until it meets a smaller or equal element */
        for (j = i; j > 0 && x[j] < x[j-1]; j--)
        {
            t = x[j];
            x[j] = x[j-1];
            x[j-1] = t;
        }
    printf("\n\n Output data: \n \n");
    for (i = 0; i < n; i++)
        printf("%3d", x[i]);
    printf("\n");
    return 0;
}
Input data:
42 34 56 23 78 90
Output data:
23 34 42 56 78 90
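The program above inserts from the left by repeated exchanges, whereas Algorithm 10.1 grows the sorted segment from the right and uses the location A[n+1] as a sentinel. A direct transcription of Algorithm 10.1 might look like the sketch below. It assumes that the caller keeps the data in a[1] to a[n] and leaves a[n+1] free; the function name insertion_sort_sentinel is illustrative.

```c
/* Sort a[1..n] into ascending order, following Algorithm 10.1.
   The array must have room for a[n+1], which is used as a sentinel,
   so it needs at least n+2 locations (a[0] is not used). */
void insertion_sort_sentinel(int a[], int n)
{
    int k, j, s;

    for (k = n - 1; k >= 1; k--) {
        j = k + 1;
        s = a[k];             /* element to insert into a[k+1..n] */
        a[n + 1] = s;         /* sentinel: stops the scan at the end */
        while (s > a[j]) {
            a[j - 1] = a[j];  /* slide the smaller element one place left */
            j = j + 1;
        }
        a[j - 1] = s;
    }
}
```

Because the sentinel equals the element being inserted, the test s > a[j] always fails at j = n+1 at the latest, so the loop needs no separate check on j.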

10.2.2 Selection Sort


The idea behind selection sort is the selection of the smallest (or largest) element from a se-
quence of elements. Suppose an array A contains five elements as follows:
390, 205,182, 45, 235
The selection-sort algorithm for sorting A works as follows.
First, find the smallest element in the list and put it in the first position. Then find the
second smallest element in the list and put it in the second position and so on. We now illustrate
the process in Fig. 10.2 for a sequence of five elements.

390    45    45    45    45
205   205   182   182   182
182   182   205   205   205
 45   390   390   390   235
235   235   235   235   390

Fig. 10.2 Selection sort (smallest element after each step is underlined)

The smallest element, 45, is selected and exchanged with the first element, 390. This
produces a sorted list of length one. Next, the smallest element among the second through the
last element is selected and then exchanged with the second integer in the list. This produces a
sorted list of length two. In the next step the transition is accomplished by selecting the smallest
integer among the third through the last element in the array. This element is then exchanged
with the third element in the sequence. The sorted list is now of length three. Finally, the smallest
integer is selected among the fourth and fifth elements and that element is exchanged with the
fourth element. The entire list is now sorted.
The following algorithm implements the selection sort as we just described.
Algorithm 10.2: Selection sort
Input: An array A with n elements, A[1], ..., A[n]
Output: The array A sorted into ascending order
Step 1: if (n > 1) then do
Step 2:     for k = 1 to n-1 do
Step 3:         small = k
Step 4:         for j = k+1 to n do
Step 5:             if (A[j] < A[small]) then do
Step 6:                 small = j
                    end
                end
Step 7:         X = A[k]
Step 8:         A[k] = A[small]
Step 9:         A[small] = X
            end
        end
Step 10: stop
The program for selection sort is as follows.

Example 10.2

/* Program for selection sort */
#include <stdio.h>
#define n 6
int X[] = {6, 5, 4, 3, 2, 1};
int main(void)
{
    int i, t, j, small;
    printf("\n\n Input data:\n\n");
    for (i = 0; i < n; i++)
        printf("%3d", X[i]);
    printf("\n");
    for (i = 0; i < n; i++)
    {
        /* find the smallest element of X[i..n-1] and exchange it with X[i] */
        small = i;
        for (j = i+1; j < n; j++)
            if (X[j] < X[small])
                small = j;
        t = X[i];
        X[i] = X[small];
        X[small] = t;
    }
    printf("\n\n Output data:\n\n");
    for (i = 0; i < n; i++)
        printf("%3d", X[i]);
    printf("\n");
    return 0;
}
Input data:
6 5 4 3 2 1
Output data:
1 2 3 4 5 6

10.2.3 Bubble Sort


The basic operation in a bubble sort is the exchange of an adjacent pair of key elements. The
overall sorting process consists of a number of passes over the keys. Bubble sort is an exchange
sort. The basic idea behind bubble sort is to imagine that the records to be sorted are kept in an
array vertically. The records with low key values are 'light' and bubble up to the top. We make
repeated passes over the array, from bottom to top. As we move, if two adjacent elements are
out of order, that is, if the lighter one is below, we reverse them. Consider Fig. 10.3 for viewing
the process of bubble sort in ascending order.

390   206   206   206   206
206   390   162   162   162
162   162   390    46    46
 46    46    46   390   235
235   235   235   235   390
(a)   (b)   (c)   (d)   (e)
Fig. 10.3 The first pass of bubble sort in ascending order

In Fig. 10.3, the first pass is made over the data given in Fig. 10.3(a). In Fig. 10.3(a) 390 is
compared with 206. They are interchanged since 206 is smaller. The result is shown in Fig. 10.3(b).
Next, 390 is compared with 162, and they are exchanged. This is shown in Fig. 10.3(c).
The process is then repeated and finally the result is seen in Fig. 10.3(e).
The first pass moves the largest element to the nth position, forming a sorted list of length
one. The second pass moves the second largest element to the (n-1)th position. In general, after
the ith pass, the i largest elements will be in the last i positions, so pass i+1 need only consider
the first n-i elements. In the meantime the small elements are moving slowly, or bubbling,
towards the top. So this sorting method is referred to as bubble sort.
If no exchange is made during one pass over the data, the data are sorted and the process
terminates. The following algorithm shows how an exchange or bubble sort can be implemented.
Algorithm 10.3: Bubble sort
Input: An array A with n elements, A[1], ..., A[n]
Output: The array A sorted into ascending order
Step 1: K = n
Step 2: while (K > 1) do
Step 3:     for j = 1 to K-1 do
Step 4:         if (A[j] > A[j+1]) then do
Step 5:             p = A[j]
Step 6:             A[j] = A[j+1]
Step 7:             A[j+1] = p
                end
            end
Step 8:     K = K-1
        end
Step 9: stop
The program for bubble sort is given below.
Example 10.3
/* Bubble sort program */
#include <stdio.h>
#define n 10
int X[] = {12, 15, 5, 9, 4, 7, 3, 19, 4, 20};
int main(void)
{
    int i, j, k, temp, min;
    /* Unsorted list */
    printf("\n\t Unsorted List \n\n\n");
    for (i = 0; i < n; i++)
        printf("%4d", X[i]);
    printf("\n\n\n");
    for (i = 0; i < n-1; i++)
    {
        /* exchange X[i] with the smallest element of X[i..n-1] */
        min = i;
        for (j = i+1; j < n; j++)
            if (X[j] < X[min])
                min = j;
        temp = X[i];
        X[i] = X[min];
        X[min] = temp;
        printf("\n pass = %4d \n\n", i+1);
        for (k = 0; k < n; k++)
            printf("%3d", X[k]);
        printf("\n");
    }
    /* Sorted list */
    printf("\n\n\n\t Sorted List \n\n");
    for (i = 0; i < n; i++)
        printf("%3d", X[i]);
    printf("\n");
    return 0;
}
Unsorted List
12 15 5 9 4 7 3 19 4 20
pass = 1
3 15 5 9 4 7 12 19 4 20
pass = 2
3 4 5 9 15 7 12 19 4 20
pass = 3
3 4 4 9 15 7 12 19 5 20
pass = 4
3 4 4 5 15 7 12 19 9 20
pass = 5
3 4 4 5 7 15 12 19 9 20
pass = 6
3 4 4 5 7 9 12 19 15 20
pass = 7
3 4 4 5 7 9 12 19 15 20
pass = 8
3 4 4 5 7 9 12 15 19 20
pass = 9
3 4 4 5 7 9 12 15 19 20
Sorted List
3 4 4 5 7 9 12 15 19 20
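Note that the listing of Example 10.3 exchanges X[i] with the smallest remaining element in every pass, which is closer to the selection strategy of Section 10.2.2 than to Algorithm 10.3. A sketch that follows Algorithm 10.3 more literally, exchanging adjacent pairs and stopping as soon as a pass makes no exchange, could look as follows. The function name bubble_sort and the choice to return the number of comparisons are assumptions made for illustration.

```c
/* Sort a[0..n-1] into ascending order by exchanging adjacent pairs.
   Returns the number of key comparisons that were made. */
long bubble_sort(int a[], int n)
{
    long comparisons = 0;
    int k = n;            /* a[k..n-1] is already in its final place */
    int exchanged = 1;    /* did the previous pass exchange anything? */
    int j, p;

    while (k > 1 && exchanged) {
        exchanged = 0;
        for (j = 0; j < k - 1; j++) {
            comparisons++;
            if (a[j] > a[j + 1]) {
                p = a[j];
                a[j] = a[j + 1];
                a[j + 1] = p;
                exchanged = 1;
            }
        }
        k = k - 1;        /* the largest element of the pass has sunk to a[k-1] */
    }
    return comparisons;
}
```

For data that are already sorted the function makes a single pass of n-1 comparisons, and for data in reverse order it makes n(n-1)/2 comparisons.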

10.2.4 Complexity Analysis


Complexity of bubble sort Traditionally, the time complexity of a sorting algorithm is
measured in terms of the number of comparisons. The number of comparisons, f(n), in the bubble
sort can be easily computed. The bubble sort is implemented with two loops, an inner loop and
an outer loop. The inner loop is executed the following number of times:
(n - 1) + (n - 2) + ... + (n - k) = (1/2)(2n - k - 1)k
where k indicates the number of executions of the outer loop up to and including the first pass
during which there are no exchanges. The inner loop contains one comparison and sometimes an
exchange. In other words, there are (n - 1) comparisons during the first pass, which places the
largest element in the last position; there are (n - 2) comparisons needed to place the second
largest in the next-to-last position, and so on. We now consider two cases, the best case and the
worst case, depending on the value of k.
The best case occurs when the value of k is 1. It indicates that no exchange is required
since the original data were sorted. So the value of f(n) is
f(n) = n - 1 = O(n) (order of n)
The worst case occurs when k is n - 1. The inner loop is executed n(n - 1)/2 times. Thus
f(n) = (n-1) + (n-2) + ... + (n-(n-1))
= (n-1) + (n-2) + ... + 1
= n(n-1)/2
= O(n^2)
In other words, the time required to execute the bubble-sort algorithm is proportional to
n^2, where n is the number of input items.
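As a check on the closed form used above, the sum can be worked out directly. The short derivation below uses only the symbols n and k already defined in the text and recovers both the best-case and the worst-case counts.

```latex
\begin{align*}
f(n) &= \sum_{i=1}^{k} (n - i) = kn - \frac{k(k+1)}{2} = \frac{k\,(2n - k - 1)}{2}, \\
k = 1 &: \quad f(n) = \frac{2n - 2}{2} = n - 1, \\
k = n - 1 &: \quad f(n) = \frac{(n-1)\bigl(2n - (n-1) - 1\bigr)}{2} = \frac{n(n-1)}{2}.
\end{align*}
```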
The bubble-sort method has two serious drawbacks. First, its inner loop contains an ex-
change that requires three moves. Second, when an element is moved, it is always moved to an
adjacent position. So it is expensive in most cases.
Complexity of insertion sort The number f(n) of comparisons in the insertion-sort algorithm
can be easily calculated. The outer loop of the insertion-sort algorithm is always executed n-1
times, where n is the number of items to be sorted. In the worst case the inner loop is executed
the following number of times
1 + 2 + ... + (n-1) = n(n-1)/2
so f(n) is expressed as
f(n) = 1 + 2 + ... + (n-1) = n(n-1)/2 = O(n^2)
Given all possible combinations of initial ordering of data, on the average, the inner loop
is executed half as many times as in the worst case.
Accordingly, for the average case
f(n) = 1/2 + 2/2 + ... + (n-1)/2 = n(n-1)/4 = O(n^2)
This is because the inner loop of the insertion-sort algorithm scans the already sorted part of
the list, looking for the element that is larger than the one being inserted. On the average, this
requires probing half of the sorted part. Thus an insertion sort is a good choice when the
elements are small or when the sort items are time-consuming to compare.

Complexity of selection sort The selection sort algorithm consists of two loops, an outer loop
and an inner loop. In the algorithm, each of these is implemented using a 'for' statement. There
are n-1 comparisons during pass 1 to find the smallest element, n-2 comparisons during pass 2
to find the second smallest element, and so on. Thus, f(n) is calculated in the worst case as
f(n) = (n-1) + (n-2) + ... + 2 + 1 = n(n-1)/2 = O(n^2)
The number of interchanges and assignments is independent of the data being sorted.
The selection sort always requires O(n^2) comparisons no matter how the data are initially
ordered, but it never requires more than O(n) moves. So selection sort is a good choice for
large elements with small sort items: it requires fewer moves than an insertion sort, and in
that situation comparing sort items is not the dominant activity.
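The trade-off between comparisons and moves can be observed by instrumenting the two algorithms. The sketch below counts both quantities. The type sort_cost and the functions selection_sort_cost and insertion_sort_cost are illustrative names, an exchange is counted as three moves, and, unlike Algorithm 10.2, the selection sort skips the exchange when the smallest element is already in place.

```c
typedef struct {
    long comparisons;   /* key comparisons made */
    long moves;         /* element assignments made */
} sort_cost;

/* Selection sort on a[0..n-1]; always makes n(n-1)/2 comparisons. */
sort_cost selection_sort_cost(int a[], int n)
{
    sort_cost c = {0, 0};
    int i, j, small, t;

    for (i = 0; i < n - 1; i++) {
        small = i;
        for (j = i + 1; j < n; j++) {
            c.comparisons++;
            if (a[j] < a[small])
                small = j;
        }
        if (small != i) {            /* at most one exchange per pass */
            t = a[i];
            a[i] = a[small];
            a[small] = t;
            c.moves += 3;
        }
    }
    return c;
}

/* Insertion sort on a[0..n-1]; comparisons depend on the initial order. */
sort_cost insertion_sort_cost(int a[], int n)
{
    sort_cost c = {0, 0};
    int i, j, s;

    for (i = 1; i < n; i++) {
        s = a[i];
        c.moves++;
        for (j = i; j > 0; j--) {
            c.comparisons++;
            if (a[j - 1] <= s)
                break;               /* place for s found */
            a[j] = a[j - 1];         /* slide the larger element right */
            c.moves++;
        }
        a[j] = s;
        c.moves++;
    }
    return c;
}
```

On six items in reverse order both functions make 15 comparisons, but on six items already in order the insertion sort needs only 5 comparisons while the selection sort still needs 15. The selection sort never makes more than 3(n-1) moves.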
The following table compares the three algorithms discussed so far.

Table 10.1
Method            Worst Case            Average Case
Bubble sort       n(n-1)/2 = O(n^2)     n(n-1)/2 = O(n^2)
Insertion sort    n(n-1)/2 = O(n^2)     n(n-1)/4 = O(n^2)
Selection sort    n(n-1)/2 = O(n^2)     n(n-1)/2 = O(n^2)

10.2.5 Shell Sort


In any sorting algorithm that moves items only one position at a time, the average running time
will be, at best, proportional to N^2, where N is the number of records. To make
substantial improvements over straight insertion, we need a technique that moves records
over a wide range instead of in short steps. Such a method was proposed in 1959 by Donald L. Shell.
It is called Shell sort or diminishing increment sort. The method is based on the
divide-and-conquer technique.
Consider an array of integers to be sorted using the shell sort. This method divides the
original list into sublists by taking every kth element beginning with the first one. The value of
k miss called an increment. Suppose we had a 12-item list
4 7 2 9 10 5 7 3
which was broken into three sublists using an increment of 4.
Listl 4 9 7 5
List2 7 10 3 10
List3 2 5 9 3
Each sublist is then sorted and a smaller increment is selected.
Listl 4 5 7 9
List2 3 7 10 10
List3 2 3 5 9
The entire list after one increment has been used.
4 3 2 5 7 3 7 10 5 9 10 9
Next, a new increment of 2 (i.e. k=2) produces two new sublist
Listl 4 2 7 7 5 10
List2 3 5 3 10 9 9
These two lists are sorted individually.
List1 2 4 5 7 7 10
List2 3 3 5 9 9 10
The list is now
2 3 4 3 5 5 7 9 7 9 10 10
Finally, an increment of 1 is selected to get the sorted sequence
2 3 3 4 5 5 7 7 9 9 10 10
Each intermediate sorting step involves either a comparatively short list or a list that is already comparatively well ordered. So insertion sort can be applied efficiently at each step, and the list converges quickly to its final order.
The sequence of increments 3, 2, 1 is not mandatory; any other sequence can be used so long as the last increment equals 1. For example, the above list can be sorted with increments 4, 2, and 1 instead of 3, 2, and 1, as follows.
The original list is given below:
4 7 2 9 10 5 7 3 9 5 10 3
With k = 4, every fourth element is taken, which gives four sublists:
List1 4 10 9
List2 7 5 5
List3 2 7 10
List4 9 3 3
Each sublist is sorted as follows:
List1 4 9 10
List2 5 5 7
List3 2 7 10
List4 3 3 9
The entire list after the first increment is shown below:
4 5 2 3 9 5 7 3 10 7 10 9
A new increment, k = 2, is now selected.
List1 4 2 9 7 10 10
List2 5 3 5 3 7 9
These two lists are sorted individually.
List1 2 4 7 9 10 10
List2 3 3 5 5 7 9
The new list is obtained as follows:
2 3 4 3 7 5 9 5 10 7 10 9
Finally, 1 is selected as the increment to get the following sorted list:
2 3 3 4 5 5 7 7 9 9 10 10
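A single pass for one increment can be sketched as an insertion sort over elements that lie a fixed distance apart, which sorts all the sublists of that increment at once without copying them out. This is a minimal sketch; the function name gap_pass is an assumption for illustration and is not one of the book's routines.

```c
#include <assert.h>

/* One Shell-sort pass over a[0..n-1]: after the call, every sublist formed by
   taking each gap-th element is in ascending order (hypothetical helper). */
void gap_pass(int a[], int n, int gap)
{
    int i, j, v;

    assert(gap > 0);
    for (i = gap; i < n; i++) {
        v = a[i];                       /* element to insert into its sublist */
        for (j = i; j >= gap && a[j - gap] > v; j -= gap)
            a[j] = a[j - gap];          /* shift larger sublist elements up */
        a[j] = v;
    }
}
```

Calling gap_pass with successively smaller gaps, the last of which is 1, leaves the whole array sorted.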
The algorithm for shell sort is given below.
Algorithm 10.4: Shell sort
Input: An unsorted array A of size N, A[1], ..., A[N]
Output: The array A sorted in ascending order
Step 1: I = N/2
Step 2: while (I > 0) do
Step 3:     J = I
Step 4:     while (J < N) do
Step 5:         J = J + 1
Step 6:         K = J - I
Step 7:         while (K > 0) do
Step 8:             if (A[K] > A[K+I]) then do
Step 9:                 S = A[K]
Step 10:                A[K] = A[K+I]
Step 11:                A[K+I] = S
Step 12:                K = K - I
                    else
Step 13:                K = 0
                    end
                end
            end
Step 14:    I = I/2
        end
Step 15: Stop
Example 10.4

/* Shell sort program */

#include <stdio.h>
#define N 13
int x[N] = {9, 12, 45, 7, 53, 12, 78, 90, 10, 65, 3, 19, 47};
int main(void)
{
    int i, j, k, temp, gap;
    printf("\n\n\t Unsorted Data \n\n");
    for (i = 0; i < N; i++)
        printf("%4d", x[i]);
    printf("\n\n\n");
    /* Find the first term of 1, 4, 13, 40, ... that exceeds N */
    gap = 1;
    do {
        gap = 3 * gap + 1;
    } while (gap <= N);
    gap = gap / 3;
    for (; gap > 0; gap = gap / 2)
    {
        printf("\n\n\t\t Span Size = %4d \n\n", gap);
        /* Compare and exchange elements that are gap positions apart */
        for (i = gap; i < N; i++)
            for (j = i - gap; j >= 0; j = j - gap)
                if (x[j] > x[j + gap])
                {
                    temp = x[j];
                    x[j] = x[j + gap];
                    x[j + gap] = temp;
                }
        for (k = 0; k < N; k++)
            printf("%4d", x[k]);
    }
    printf("\n\n\t\t Sorted data \n\n");
    for (i = 0; i < N; i++)
        printf("%4d", x[i]);
    return 0;
}
Unsorted Data
9 12 45 7 53 12 78 90 10 65 3 19 47
Span Size = 13
9 12 45 7 53 12 78 90 10 65 3 19 47
Span Size = 6
9 12 10 7 3 12 47 90 45 65 53 19 78
Span Size = 3
7 3 10 9 12 12 47 53 19 65 90 45 78
Span Size = 1
3 7 9 10 12 12 19 45 47 53 65 78 90
Sorted Data
3 7 9 10 12 12 19 45 47 53 65 78 90

Complexity of shell sort The main reason why straight insertion is relatively slow is that the items are inserted sequentially, moving only one position at a time. Thus its average running time is $O(n^2)$. Shell proposed an efficient variant in which the insertion process is done in several passes of successive refinement. For a given input size n, the passes are determined by an 'increment sequence' $h_t, h_{t-1}, \dots, h_1$, where $h_1 = 1$. In the early passes (when the increments are typically large), elements can be displaced far from their previous positions with only a few comparisons. The later passes "fine tune" the placement of elements. The last pass, when h = 1, consists of a single straight insertion sort of the entire array. The resulting expected running time is $O(n^{5/3})$. By contrast, the average running time of a method that moves items only one position at a time is at least proportional to $n^2$.
The shell sort is also called the diminishing increment sort because the increment sequence continually decreases. The method is most efficient if the successive values of h are kept relatively prime to each other. Knuth has mathematically estimated that, with relatively prime values of h, the shell sort will execute in an average time proportional to $n(\log_2 n)^2$.
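One increment sequence that ends in 1 and whose consecutive terms share no common factor is 1, 4, 13, 40, ..., generated by h = 3h + 1; Example 10.4 uses the same rule to pick its first span. The sketch below builds that sequence for a given n. The function name increments_3h_plus_1 and its parameters are assumptions for illustration, and this is only one of many possible sequences.

```c
#include <assert.h>

/* Stores in h[] the terms of 1, 4, 13, 40, ... that are below n, largest
   first, always including the final increment 1. Returns the number of
   increments stored, which is at most cap (hypothetical helper). */
int increments_3h_plus_1(int n, int h[], int cap)
{
    int gap = 1, count = 0;

    assert(cap > 0);
    while (3 * gap + 1 < n)             /* climb to the largest term below n */
        gap = 3 * gap + 1;
    for (; gap >= 1 && count < cap; gap /= 3)
        h[count++] = gap;               /* (3h + 1) / 3 gives back h */
    return count;
}
```

The increments are then applied from h[0] down to the last stored value, one insertion pass per increment.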

10.2.6 Quick Sort


The next sort we consider is the partition-exchange sort or quick sort. The method consists of a series of steps, each of which takes a list of elements to be sorted as input. The output from each step is a rearrangement of the elements so that one element is in its sorted position and two sublists remain to be sorted. If the elements to be sorted are
X[1], X[2], ..., X[n]
then quick sort could rearrange the elements to produce the order shown in Fig. 10.4.
X[1], X[2], ..., X[j-1]     X[j]     X[j+1], ..., X[n]
          (a)                (b)            (c)
Fig. 10.4 (a) Each of these elements is smaller than X[j]. (b) Element in its sorted position. (c) Each of these elements is larger than X[j]

Each step of quick sort partitions a given list into three disjoint sublists. One of these sublists is a single element that is in its sorted position. The sublist to the left of the sorted element contains elements that are less than the sorted element, whereas the sublist to the right of the sorted element contains elements that are greater than the sorted element. The sorting problem is thus partitioned into two smaller sorting problems, and the same process is applied to each sublist in turn.
Let us illustrate the quick sort with an example. If the initial array is
26 57 48 37 12 92 86 33
and we want to place the first element, 26, in its proper position, the resulting array is
12 26 57 48 37 92 86 33
Note that each element to the left of 26 (here only 12) is less than or equal to 26 and each element to the right of 26 is greater than or equal to 26.
Since 26 is in its final position, the original problem has been decomposed into the problem of sorting the two subarrays
(12) and (57, 48, 37, 92, 86, 33)
The entire array is viewed as
12 26 (57, 48, 37, 92, 86, 33)
where parentheses enclose the subarrays that are yet to be sorted. Repeating the process on the subarray to the right of 26 yields
12 26 (48, 37, 33) 57 (92, 86)
and further repetitions yield
12 26 (37, 33) 48 57 (92, 86)
12 26 (33) 37 48 57 (92, 86)
12 26 33 37 48 57 (92, 86)
12 26 33 37 48 57 (86) 92
12 26 33 37 48 57 86 92
Note that the final array is now sorted. The algorithm for quick sort is presented below.
Algorithm 10.5: Quick sort
Input: An array A of size n containing n distinct numbers, A[1], ..., A[n].
Output: The elements of array A rearranged into ascending order.
Step 1: left = 1
Step 2: right = n
Step 3: Call function qs(A, left, right)
Step 4: Stop
Function: qs(A, left, right)
Step 1: if (left < right) then do
Step 2:     j = left
Step 3:     K = right + 1
Step 4:     repeat
Step 5:         repeat
Step 6:             j = j + 1
                until (A[j] >= A[left]) or (j = right)
Step 7:         repeat
Step 8:             K = K - 1
                until (A[K] <= A[left])
Step 9:         if (j < K) then do
Step 10:            s = A[j]
Step 11:            A[j] = A[K]
Step 12:            A[K] = s
                end
            until (j >= K)
Step 13:    s = A[left]
Step 14:    A[left] = A[K]
Step 15:    A[K] = s
Step 16:    qs(A, left, K-1) /* Recursive call */
Step 17:    qs(A, K+1, right) /* Recursive call */
        end
Step 18: Return
The program for quick sort is given below. It prints the pivot and the whole array after every exchange step, so that the progress of each partition can be followed.
Example 10.5
/* Quick sort program */
#include <stdio.h>
int X[] = {5, 2, 4, 7, 1, 4, 3, 8, 9, 4, 6, 7, 10, 12, 2, 15};
void quick(int x[], int l, int r);
int main(void)
{
    int i;
    printf("\n\t\t Unsorted data \n");
    for (i = 0; i < 16; i++)
        printf("%3d", X[i]);
    printf("\n");
    quick(X, 0, 15);
    printf("\n\t\t Sorted data \n");
    for (i = 0; i < 16; i++)
        printf("%3d", X[i]);
    printf("\n");
    return 0;
}
/* Sorts x[l..r], using x[l] as the pivot */
void quick(int x[], int l, int r)
{
    int i, j, pivot, t, k;
    if (r > l)
    {
        pivot = x[l];
        i = l + 1;
        j = r;
        do
        {
            /* Scan from the left for an element larger than the pivot */
            while (x[i] <= pivot && i < r) i++;
            /* Scan from the right for an element not larger than the pivot */
            while (x[j] > pivot && j > l) j--;
            if (i < j)
            {
                t = x[i];
                x[i] = x[j];
                x[j] = t;
            }
            printf("\n pivot = %d \n", pivot);
            for (k = 0; k < 16; k++)
                printf("%3d", x[k]);
        } while (i < j);
        /* Put the pivot into its final position j */
        t = x[l];
        x[l] = x[j];
        x[j] = t;
        quick(x, l, j - 1);
        quick(x, j + 1, r);
    }
}
Unsorted Data
5 2 4 7 1 4 3 8 9 4 6 7 10 12 2 15
pivot = 5
5 2 4 2 1 4 3 8 9 4 6 7 10 12 7 15
pivot = 5
5 2 4 2 1 4 3 4 9 8 6 7 10 12 7 15
pivot = 5
5 2 4 2 1 4 3 4 9 8 6 7 10 12 7 15
pivot = 4
4 2 4 2 1 4 3 5 9 8 6 7 10 12 7 15
pivot = 3
3 2 1 2 4 4 4 5 9 8 6 7 10 12 7 15
pivot = 3
3 2 1 2 4 4 4 5 9 8 6 7 10 12 7 15
pivot = 2
2 2 1 3 4 4 4 5 9 8 6 7 10 12 7 15
pivot = 1
1 2 2 3 4 4 4 5 9 8 6 7 10 12 7 15
pivot = 4
1 2 2 3 4 4 4 5 9 8 6 7 10 12 7 15
pivot = 9
1 2 2 3 4 4 4 5 9 8 6 7 7 12 10 15
pivot = 9
1 2 2 3 4 4 4 5 9 8 6 7 7 12 10 15
pivot = 7
1 2 2 3 4 4 4 5 7 7 6 8 9 12 10 15
pivot = 7
1 2 2 3 4 4 4 5 7 7 6 8 9 12 10 15
pivot = 6
1 2 2 3 4 4 4 5 6 7 7 8 9 12 10 15
pivot = 12
1 2 2 3 4 4 4 5 6 7 7 8 9 12 10 15
Sorted Data
1 2 2 3 4 4 4 5 6 7 7 8 9 10 12 15
Complexity of quick sort Both the expected and the minimum number of comparisons required by quick sort are $O(n \log_2 n)$. The worst-case behaviour of this algorithm is of the order $O(n^2)$. However, if we are lucky, then each time an item is correctly positioned, the sublist to its left will be of the same size as that to its right, so that two sublists of size about n/2 remain to be sorted. The time required to position an item in a list of size n is $O(n)$. In this case the list sizes halve at every level, there are about $\log_2 n$ levels, and the total work at each level is $O(n)$, which gives $O(n \log_2 n)$ overall.
Quick sort is an extremely good general-purpose routine. A big drawback of the method is its worst-case performance. When the first element is used as the partitioning element, it requires $O(n^2)$ time to sort a list that is already sorted or nearly sorted. A good way to guard against this bad behaviour is to choose the partitioning element to be a random element of the current sublist, or the median of a small sample (say, the first, middle, and last elements). This also has the effect of reducing the average number of comparisons.
If the smaller of the two sublists is always processed first after each partition, then the required stack contains at most $\log_2 n$ entries. We can simulate the stack with only a constant amount of space at a slight increase in computing time.
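The two safeguards just described can be sketched together as follows. This is an illustrative variant, not the program of Example 10.5: the names swap_int, median_to_front, partition and quick_sort_m3 are assumptions, the pivot is the median of the first, middle and last elements, and the routine recurses only on the smaller sublist while looping on the larger one, which keeps the recursion depth near $\log_2 n$.

```c
/* Exchange two integers. */
static void swap_int(int *a, int *b) { int t = *a; *a = *b; *b = t; }

/* Move the median of a[lo], a[mid], a[hi] to a[lo] so that it becomes the pivot. */
static void median_to_front(int a[], int lo, int hi)
{
    int mid = lo + (hi - lo) / 2;
    if (a[mid] < a[lo]) swap_int(&a[mid], &a[lo]);
    if (a[hi] < a[lo]) swap_int(&a[hi], &a[lo]);
    if (a[hi] < a[mid]) swap_int(&a[hi], &a[mid]);
    swap_int(&a[lo], &a[mid]);          /* a[mid] now holds the median */
}

/* Partition a[lo..hi] around the pivot a[lo]; return the pivot's final index. */
static int partition(int a[], int lo, int hi)
{
    int pivot = a[lo], i = lo, j = hi + 1;
    for (;;) {
        do { i++; } while (i <= hi && a[i] < pivot);
        do { j--; } while (a[j] > pivot);
        if (i >= j) break;
        swap_int(&a[i], &a[j]);
    }
    swap_int(&a[lo], &a[j]);
    return j;
}

/* Sort a[lo..hi] in ascending order. */
void quick_sort_m3(int a[], int lo, int hi)
{
    while (lo < hi) {
        int p;
        median_to_front(a, lo, hi);
        p = partition(a, lo, hi);
        if (p - lo < hi - p) {          /* left sublist is the smaller one */
            quick_sort_m3(a, lo, p - 1);
            lo = p + 1;                 /* continue with the larger right part */
        } else {
            quick_sort_m3(a, p + 1, hi);
            hi = p - 1;                 /* continue with the larger left part */
        }
    }
}
```

With this choice of pivot an already sorted list is split near the middle at every step, so it no longer triggers the worst case.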

10.2.7 Merge Sort


One of the most common external sorts is the merge sort. When the data to be sorted do not fit in memory, an external sort is employed. External sort methods bring only a portion of the data into the computer's memory during comparison and swap operations.
Merging is the process of combining two or more sorted sublists into a third sorted list. For example, merging accepts two sorted sublists x and y of n1 and n2 elements, respectively, and merges them into a third list z containing n3 = n1 + n2 elements. It begins by comparing a pair of elements, one from each sublist. The smaller element is appended to the sorted list and is replaced by the next element from its sublist. This process continues until there are no more elements in one of the sublists. The remaining elements in the other sublist are then appended to the sorted list, and the merge is complete. For example, we can merge the two sublists (503, 703, 765) and (087, 512, 677) to obtain (087, 503, 512, 677, 703, 765). At each step we compare the smallest remaining item of each sublist and output the smaller of the two. The method is illustrated in Fig. 10.5.
Sublist1: (503 703 765)
Sublist2: (087 512 677)
Pass 1: 087 (smallest element); remaining (503 703 765) and (512 677)
Pass 2: 087 503; remaining (703 765) and (512 677)
Pass 3: 087 503 512; remaining (703 765) and (677)
Pass 4: 087 503 512 677; remaining (703 765)
Sorted list: 087 503 512 677 703 765
Fig. 10.5 Successive passes for merge sort

Note that some care is necessary when one of the two sublists becomes exhausted.
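The merging step, including the handling of an exhausted sublist, can be sketched iteratively as below. The function name merge_sorted and its parameter order are assumptions for illustration; the caller must supply an output array z with room for n1 + n2 elements.

```c
#include <assert.h>

/* Merges the sorted arrays x[0..n1-1] and y[0..n2-1] into z[0..n1+n2-1].
   Returns the number of elements written (hypothetical helper). */
int merge_sorted(const int x[], int n1, const int y[], int n2, int z[])
{
    int i = 0, j = 0, k = 0;

    assert(n1 >= 0 && n2 >= 0);
    while (i < n1 && j < n2)            /* both sublists still have elements */
        z[k++] = (x[i] <= y[j]) ? x[i++] : y[j++];
    while (i < n1)                      /* y is exhausted: copy the rest of x */
        z[k++] = x[i++];
    while (j < n2)                      /* x is exhausted: copy the rest of y */
        z[k++] = y[j++];
    return k;
}
```

Taking from x when the two elements are equal keeps equal keys in their original relative order.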
Algorithm 10.6: Merge sort
Input: Merge sort requires two arrays r and t. Array r holds the n data items to be sorted and array t is used for the merging operation.
Output: Array r holds the sorted list in ascending order.
/* Sorting function: L is the length of the sorted runs merged in a pass */
Merge sort(n) /* Array r of size n */
begin
    L = 1
    if (n >= 3)
    then repeat
            Merge(L, n, r, t)
            L = 2 * L
            Merge(L, n, t, r)
            L = 2 * L
         until (2 * L >= n)
    if (L < n)
    then begin
            Merge(L, n, r, t)
            for K = 1 to n do
                r[K] = t[K]
         end
end
/* Merge function: merges adjacent runs of length L from r into t */
Merge(L, n, r, t)
begin
    k1 = 1
    k2 = L + 1
    q = 1
    repeat
        end1 = k1 + L
        if (end1 > n)
        then end1 = n + 1
        else
            begin
                end2 = k2 + L
                if (end2 > n)
                then end2 = n + 1
                repeat
                    if (r[k1] <= r[k2])
                    then begin
                            t[q] = r[k1]
                            q = q + 1
                            k1 = k1 + 1
                         end
                    else
                         begin
                            t[q] = r[k2]
                            q = q + 1
                            k2 = k2 + 1
                         end
                until (k1 = end1) or (k2 = end2)
            end
        if (k1 < end1)
        then repeat
                t[q] = r[k1]
                q = q + 1
                k1 = k1 + 1
             until k1 = end1
        else repeat
                t[q] = r[k2]
                q = q + 1
                k2 = k2 + 1
             until k2 = end2
        k1 = k2
        k2 = k2 + L
    until (k1 > n)
end
Two programs that merge two sorted lists recursively are given below. The first stores the merged list in a third array; the second prints the merged elements directly.
Example 10.6

#include <stdio.h>

void merging(int x[], int y[], int z[], int m1, int m2, int m3);
int main(void)
{
    int n1, n2, n, a[10], b[10], c[20], i;
    printf("Enter number of elements in the first sorted array: ===>");
    scanf("%d", &n1);
    printf("\n Enter the elements of the first sorted array: \n");
    for (i = 0; i < n1; i++)
        scanf("%d", &a[i]);
    printf("\n Enter number of elements in the second sorted array: ===>");
    scanf("%d", &n2);
    printf("\n Enter the elements of the second sorted array: \n");
    for (i = 0; i < n2; i++)
        scanf("%d", &b[i]);
    printf("\n The elements of the first sorted array: \n");
    for (i = 0; i < n1; i++) printf("%d ", a[i]);
    printf("\n The elements of the second sorted array: \n");
    for (i = 0; i < n2; i++) printf("%d ", b[i]);
    n = n1 + n2;
    merging(a, b, c, n1, n2, n);
    return 0;
}

/* Recursively merges x[0..m1-1] and y[0..m2-1] into z[0..m3-1]. */
/* The static indices keep their values across the recursive calls, */
/* so the function can be used only once in a program run. */
void merging(int x[], int y[], int z[], int m1, int m2, int m3)
{
    static int xlower = 0, ylower = 0, zlower = 0;
    int l;
    if ((xlower < m1) && (ylower < m2))
    {
        if (x[xlower] < y[ylower])
            z[zlower++] = x[xlower++];
        else
            z[zlower++] = y[ylower++];
        merging(x, y, z, m1, m2, m3);
    }
    else
    {
        /* One array is exhausted: copy the rest of the other one */
        while (xlower < m1) z[zlower++] = x[xlower++];
        while (ylower < m2) z[zlower++] = y[ylower++];
        printf("\n\n");
        printf("Elements of the sorted list after combining ");
        printf("first and second array: \n");
        for (l = 0; l < m3; l++)
            printf("%d ", z[l]);
    }
}
Enter number of elements in the first sorted array: ===> 6
Enter the elements of the first sorted array:
12 16 22 34 45 50
Enter number of elements in the second sorted array: ===> 5
Enter the elements of the second sorted array:
34 39 42 51 59
The elements of the first sorted array:
12 16 22 34 45 50
The elements of the second sorted array:
34 39 42 51 59
Elements of the sorted list after combining first and second array:
12 16 22 34 34 39 42 45 50 51 59
Example 10.7
/**************************************************************************/
/* MERGING OF TWO SORTED FILES (Using recursion) */
/**************************************************************************/
#include <stdio.h>
void merge(int x[], int y[], int x1, int x2, int c1, int c2, int i);
int main(void)
{
    int i, n1, n2;
    int a[100], b[100];
    printf("Enter the total number of elements in 1st array \n");
    scanf("%d", &n1);
    printf("\n Enter the elements of 1st file in ascending order \n");
    for (i = 0; i < n1; i++)
    {
        printf("a[%d] = ", i + 1);
        scanf("%d", &a[i]);
    }
    printf("\n");
    printf("Enter the total number of elements in 2nd array \n");
    scanf("%d", &n2);
    printf("\n Enter the elements of 2nd file in ascending order \n");
    for (i = 0; i < n2; i++)
    {
        printf("b[%d] = ", i + 1);
        scanf("%d", &b[i]);
    }
    printf("\n The list in ascending order \n");
    printf("<--------------------------------------> \n");
    merge(a, b, n1, n2, 0, 0, 0);
    return 0;
}
/* x1 and x2 are the sizes of x and y; c1 and c2 count the elements of x */
/* and y already printed; i counts the elements of the merged list. */
void merge(int x[], int y[], int x1, int x2, int c1, int c2, int i)
{
    if (c1 + c2 == x1 + x2)
        return;
    if ((c2 == x2) || ((c1 < x1) && (x[c1] <= y[c2])))
    {
        /* y is exhausted, or the next element of x is not larger */
        printf("c[%d] = %d \n", i + 1, x[c1]);
        merge(x, y, x1, x2, c1 + 1, c2, i + 1);
    }
    else
    {
        /* x is exhausted, or the next element of y is smaller */
        printf("c[%d] = %d \n", i + 1, y[c2]);
        merge(x, y, x1, x2, c1, c2 + 1, i + 1);
    }
}
Enter the total number of elements in 1st array
5
Enter the elements of 1st file in ascending order
a[1] = 2
a[2] = 4
a[3] = 8
a[4] = 11
a[5] = 12
Enter the total number of elements in 2nd array
3
Enter the elements of 2nd file in ascending order
b[1] = 1
b[2] = 3
b[3] = 6
The list in ascending order
<-------------------------------------->
c [1] = 1
c [2] = 2
c [3] = 3
c [4] = 4
c [5] = 6
c [6] = 8
c [7] = 11
c [8] = 12
Complexity of merge sort The merge procedure makes a pass over the entire array each time it is called. This occurs approximately $\log_2 n$ times in sorting a list of length n. The procedure merge moves all n of the elements during each pass. So merge sort requires $n \log_2 n$ moves, no matter how the data are initially ordered. It will perform better for a linked list than for an array. The number of comparisons during a pass depends on the order of the data. If sublists of length one are being merged, then there will be n/2 comparisons. If two sublists of length about n/2 are being merged, then as many as (n-1) comparisons may be required. The merge sort therefore requires approximately $n \log_2 n$ moves and somewhere between $(n/2) \log_2 n$ and $n \log_2 n$ comparisons. It has a very consistent performance, since the effort required depends little on the initial order of the data. Its drawback when implemented using arrays is that it requires twice as much memory as any of the other algorithms.
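The extra memory mentioned above is the second array used for merging. The following minimal sketch of a recursive merge sort makes that cost visible: it allocates one scratch array as large as the input. It is not the book's Algorithm 10.6, and the names merge_runs, sort_range and merge_sort are assumptions for illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Merge the sorted runs a[lo..mid-1] and a[mid..hi-1] through tmp. */
static void merge_runs(int a[], int tmp[], int lo, int mid, int hi)
{
    int i = lo, j = mid, k = lo;
    while (i < mid && j < hi)
        tmp[k++] = (a[i] <= a[j]) ? a[i++] : a[j++];
    while (i < mid) tmp[k++] = a[i++];
    while (j < hi) tmp[k++] = a[j++];
    memcpy(a + lo, tmp + lo, (size_t)(hi - lo) * sizeof a[0]);
}

/* Sort a[lo..hi-1] by sorting each half and merging the two halves. */
static void sort_range(int a[], int tmp[], int lo, int hi)
{
    int mid;
    if (hi - lo < 2) return;            /* zero or one element: already sorted */
    mid = lo + (hi - lo) / 2;
    sort_range(a, tmp, lo, mid);
    sort_range(a, tmp, mid, hi);
    merge_runs(a, tmp, lo, mid, hi);
}

/* Sort a[0..n-1]; returns 0 on success, -1 if the scratch array
   of n elements cannot be allocated. */
int merge_sort(int a[], int n)
{
    int *tmp;
    if (n < 2) return 0;
    tmp = malloc((size_t)n * sizeof *tmp);
    if (tmp == NULL) return -1;
    sort_range(a, tmp, 0, n);
    free(tmp);
    return 0;
}
```

Each level of the recursion moves all n elements once, and there are about $\log_2 n$ levels, in line with the move count given above.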

10.2.8 Heap Sort


The heap-sort algorithm was invented by J. W. J. Williams in 1964. Like merge sort, but unlike insertion sort, heap sort's running time is $O(n \log_2 n)$. Heap sort sorts in place: only a constant number of array elements are stored outside the input array at any time. The heap data structure is an array object that can be viewed as a complete binary tree.
Fig. 10.6(a) and 10.6(b) show a heap as a binary tree and an array.

Fig. 10.6 A heap view as a binary tree (a) and an array (b)

In the above figure the number within the circle at each node in the tree is the value stored at that node. The number next to a node is the corresponding index in the array. The tree is completely filled on all levels except possibly the lowest, which is filled from the left up to a point.

A heap is a binary tree which must satisfy the following two conditions:
(a) The data value stored in any node is less than or equal to the value in any of that node's children. The value stored in the root of a heap is therefore always the smallest value in the heap. This condition is called the ordering property of heaps.
(b) A heap must be a complete binary tree. This property is known as the structuring property of heaps.
Suppose that A[1], A[2], ..., A[n] is a sequence of elements. This sequence is a heap if it satisfies the following two conditions
$A[i] \le A[2i]$ ... (i)
$A[i] \le A[2i+1]$ ... (ii)
for all applicable values of i.
The two conditions will be referred to as the heap conditions. Fig. 10.7 shows three arrangements of a sequence of elements that form a heap. Each element is an integer and each sequence contains the same set of integers.

1 2 3 4 5 6 7 8 9
A 10 20 25 30 40 42 50 52 55

(a)

1 2 3 4 5 6 7 8 9
A 10 30 20 40 50 25 55 52 42

(b)

1 2 3 4 5 6 7 8 9
A 10 40 20 50 42 30 25 52 55

(c)

Fig. 10.7 Three distinct heaps from same set of integers

The data in array A in Fig. 10.7(a) are sorted, and sorted data always form a heap. Many other arrangements of the same data also form a heap, as shown in Fig. 10.7(b) and Fig. 10.7(c). Fig. 10.8 shows binary trees constructed from the heaps.
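The heap conditions (i) and (ii) translate directly into a check over the array. The sketch below uses the same 1-based indexing as the text, so element a[0] is simply left unused; the function name is_min_heap is an assumption for illustration.

```c
#include <assert.h>

/* Returns 1 if a[1..n] satisfies a[i] <= a[2i] and a[i] <= a[2i+1] for all
   applicable i, and 0 otherwise. a[0] is not used (hypothetical helper). */
int is_min_heap(const int a[], int n)
{
    int i;

    assert(n >= 0);
    for (i = 1; 2 * i <= n; i++) {      /* only nodes with at least one child */
        if (a[i] > a[2 * i])
            return 0;                   /* condition (i) fails */
        if (2 * i + 1 <= n && a[i] > a[2 * i + 1])
            return 0;                   /* condition (ii) fails */
    }
    return 1;
}
```

Applied to the three arrangements of Fig. 10.7 (with a dummy value in position 0), the check succeeds for each of them.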
The heap sort proceeds in two phases. First, the data are put into a heap and, second, the data are extracted from the heap in sorted order. A heap sort is similar to the selection sort because both algorithms select and then swap into sorted order successively larger elements. A heap sort uses a more efficient data structure than selection sort, but a selection sort will be faster than heap sort for a small set of elements.
Initially in a heap the smallest element is on top. Thus, if the elements forming the heap are stored in the array elements A[1], A[2], ..., A[n], then the element with the smallest key is stored in A[1]. The next task is to find the element with the second smallest key.
Fig. 10.8 Binary trees constructed from the heaps

So we will proceed in the following way.
The first pass exchanges A[1] and A[n] and then sifts down A[1] in the sequence A[1], ..., A[n-1] so that A[1], ..., A[n-1] forms a heap. The result is that A[n] is a sorted list of length one and the second smallest element is A[1]. The second pass exchanges A[1] with A[n-1] and then sifts down A[1] in the sequence A[1], ..., A[n-2], creating a heap with n-2 elements. Now, A[n-1] and A[n] form a sorted list of length two, and A[1] is the third smallest element. The ith pass exchanges A[1] with A[n-(i-1)] and sifts down A[1] in the sequence A[1], ..., A[n-i], creating a heap with n-i elements. The sequence A[n-(i-1)], ..., A[n] is sorted, and A[1] is the (i+1)th smallest. We now present the heap-sort algorithm. Note that Algorithm 10.7 keeps the largest element at the root (each node is greater than or equal to its children), the mirror image of the ordering property above, so that the array ends up in ascending order.
Algorithm 10.7: Heap-sort
Input: An array A consisting of n elements.
Output: The array A sorted in ascending order.
Step 1: j = [n/2] + 1 /* [.] denotes the integer part */
Step 2: while (j > 1) do
Step 3:     j = j - 1
Step 4:     call function siftdown(j, n)
        end
Step 5: K = n
Step 6: while (K > 1) do
Step 7:     T = A[1]
Step 8:     A[1] = A[K]
Step 9:     A[K] = T
Step 10:    K = K - 1
Step 11:    call function siftdown(1, K)
        end
Step 12: Stop
Function: siftdown(X, Y)
Step 1: i = X
Step 2: J = 2 * i
Step 3: P = A[i]
Step 4: while (J <= Y) do
Step 5:     if (J < Y) and (A[J] < A[J+1]) then do
Step 6:         J = J + 1
            end
Step 7:     if (P >= A[J]) then go to Step 11
Step 8:     A[i] = A[J]
Step 9:     i = J
Step 10:    J = 2 * i
        end
Step 11: A[i] = P
Step 12: Return
We now present the C source code for the heap-sort algorithm.
Example 10.8
#include <stdio.h>
#define SIZE 100
int A[SIZE + 1];    /* A[1..no] holds the data; A[0] is not used */
void siftdown(int l, int r);
int main(void)
{
    int no, i;
    int l, r, x;
    printf("\n Enter number of elements:");
    scanf("%d", &no);
    if (no < 1 || no > SIZE)
        return 1;
    printf("\n Enter the elements to be sorted:");
    for (i = 1; i <= no; i++)
        scanf("%d", &A[i]);
    printf("\n The elements in the array A before sorting \n");
    for (i = 1; i <= no; i++)
        printf("%d \t", A[i]);
    printf("\n");
    /* Phase 1: build the heap */
    l = (no / 2) + 1;
    r = no;
    while (l > 1)
    {
        l--;
        siftdown(l, r);
    }
    /* Phase 2: move the largest element to the end, then restore the heap */
    while (r > 1)
    {
        x = A[1];
        A[1] = A[r];
        A[r] = x;
        r--;
        siftdown(1, r);
    }
    printf("\n The sorted list is as follows : \n");
    for (i = 1; i <= no; i++)
        printf("%d \t", A[i]);
    printf("\n");
    return 0;
}
/* Function siftdown: sifts A[l] down within A[l..r] */
void siftdown(int l, int r)
{
    int i, j, x;
    i = l;
    j = 2 * i;
    x = A[i];
    while (j <= r)
    {
        /* Pick the larger of the two children */
        if (j < r && A[j] < A[j + 1])
            j++;
        if (x >= A[j]) break;
        A[i] = A[j];
        i = j;
        j = 2 * i;
    }
    A[i] = x;
}
Complexity of heap sort In any sorting method we have a sequence of n integer values, that is, $A_1, A_2, \dots, A_n$, and it is required to sort them in either ascending or descending order. Heap sort proceeds in two phases: the create-heap phase and the phase that maintains the heap property. The heap-sort procedure takes time $O(n \log_2 n)$, since creating the heap takes time $O(n)$ and each of the n-1 calls that maintain the heap property takes time $O(\log_2 n)$. The overall time complexity of heap sort is therefore $O(n \log_2 n)$ in both the average case and the worst case.

10.3 SORTING ON MULTIPLE KEYS


There is practically no data processing activity that does not require the data to be in some order. Computer sorting techniques are important in many applications. So far we have discussed various sorting methods such as insertion sort, bubble sort, quick sort, shell sort, heap sort, and merge sort. Sorting algorithms are broadly classified into two categories: internal sorting and external sorting. When the data to be sorted do not fit in memory, an external sort is employed. One of the most common external sorting methods is the merge sort. The idea behind merge sort can also be used for merging two files.
Sorting is also done upon some key data items. The use of keys complicates the sorting process. Sorting is done in either ascending or descending order of the key. So far we have assumed that the key field is an integer. This is not always true. For example, consider a payroll problem where the necessary information includes employee number, employee name, employee address, department number, basic pay, allowances, deductions, and so on. If the sorting in ascending order is done on employee name, then an employee whose name is greater (in the collating sequence) than that of another employee will appear later in the sorted list. It is also possible to sort on multiple keys. If we want to sort this information according to department number and, within each department, according to employee number, then two keys are involved: department number and employee number. The first is called the primary key and the second the secondary key. The two keys need not be sorted in the same direction. Suppose we want to sort first on department number in ascending sequence and then, within each department, in descending sequence of basic pay; that is, all employee records having the same department number are to be arranged from the highest to the lowest value of basic pay.
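A two-key ordering of this kind can be expressed as a single comparison function that consults the secondary key only when the primary keys are equal. The sketch below uses the standard library routine qsort; the record layout struct emp_rec and the names by_dept_then_pay_desc and sort_employees are assumptions made for illustration, not part of the payroll example.

```c
#include <stdlib.h>

struct emp_rec {                        /* hypothetical record layout */
    int dept_no;                        /* primary key, ascending */
    int emp_no;
    double basic_pay;                   /* secondary key, descending */
};

/* Comparison function for qsort: department number ascending and,
   within a department, basic pay from highest to lowest. */
static int by_dept_then_pay_desc(const void *pa, const void *pb)
{
    const struct emp_rec *a = pa;
    const struct emp_rec *b = pb;

    if (a->dept_no != b->dept_no)
        return (a->dept_no < b->dept_no) ? -1 : 1;
    if (a->basic_pay != b->basic_pay)
        return (a->basic_pay > b->basic_pay) ? -1 : 1;
    return 0;
}

/* Sort e[0..n-1] on the two keys. */
void sort_employees(struct emp_rec e[], size_t n)
{
    qsort(e, n, sizeof e[0], by_dept_then_pay_desc);
}
```

Reversing the sense of one test is all that is needed to change the direction of that key.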
So far, we have illustrated a wide range of sort routines. All of them sort an array of integers in ascending order. However, programmers are frequently faced with the need to sort strings. If the strings are themselves elements of an array, we can proceed in a manner similar to the sort routines described in this chapter. The difference is in how we compare array elements and in how array elements are interchanged.
In C, strings are compared with the library function strcmp. This function compares two strings character by character until a difference is found. If the two strings are the same, it returns 0; otherwise it returns a positive or a negative number depending on whether the first string is lexicographically greater or less than the second string. Besides comparing strings, the sort must also copy them, which is done with the library function strcpy. The following program illustrates how the quick-sort program can be modified to sort strings.
Example 10.9
#include <stdio.h>
#include <string.h>
/* Sorting an array of strings */
#define N 10
#define LEN 20
void quick_sort(char x[][LEN], int left, int right);
int main(void)
{
    char names[N][LEN];
    int i, left, right;
    left = 0; right = N - 1;
    /* Read an array of names */
    for (i = 0; i < N; i++)
        scanf("%19s", names[i]);
    printf("\n\n Unsorted names are given below : \n\n");
    for (i = 0; i < N; i++)
        printf(" %s ", names[i]);
    /* Call quick sort routine */
    quick_sort(names, left, right);
    /* Printing of sorted names */
    printf("\n\n Sorted names are given below \n\n");
    for (i = 0; i < N; i++)
        printf("\n %s ", names[i]);
    return 0;
}

/* Quick sort function */
void quick_sort(char x[][LEN], int left, int right)
{
    int i, j;
    char pivot[LEN], t[LEN];
    if (right > left)
    {
        strcpy(pivot, x[left]);
        i = left + 1;
        j = right;
        do {
            while (strcmp(x[i], pivot) <= 0 && i < right)
                i++;
            while (strcmp(x[j], pivot) > 0 && j > left)
                j--;
            if (i < j)
            {
                strcpy(t, x[i]);
                strcpy(x[i], x[j]);
                strcpy(x[j], t);
            }
        } while (i < j);
        if (j != left)    /* strcpy must not copy a string onto itself */
        {
            strcpy(t, x[left]);
            strcpy(x[left], x[j]);
            strcpy(x[j], t);
        }
        quick_sort(x, left, j - 1);
        quick_sort(x, j + 1, right);
    }
}
We now present two more programs for sorting employee records. The first program sorts the records with the employee name as the primary key and the employee identification number as the secondary key. The second program sorts the employee records on employee names only.
Example 10.10
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(void)
{
    struct employee
    {
        int id;
        char *name;
        float salary;
    };
    struct employee e[101], arr;    /* e[1..n] is used */
    int i, j, k, n;
    char rname[30];
    printf("\n Enter the number of employees : ");
    scanf("%d", &n);
    if (n < 1 || n > 100)
        return 1;
    for (i = 1; i <= n; i++)
    {
        printf("\n Enter the id number, name, salary: ");
        scanf("%d %29s %f", &e[i].id, rname, &e[i].salary);
        /* Allocate room for the name and its terminating '\0' */
        e[i].name = (char *) malloc(strlen(rname) + 1);
        if (e[i].name == NULL)
            return 1;
        strcpy(e[i].name, rname);
    }
    /* Starts the sorting process */
    for (i = 1; i <= n; i++)
    {
        for (j = 1; j <= (n - 1); j++)
        {
            k = strcmp(e[j].name, e[j + 1].name);
            /* Swap if the names are out of order, or if the two */
            /* names are equal and the id numbers are out of order */
            if (k > 0 || (k == 0 && e[j].id > e[j + 1].id))
            {
                arr = e[j + 1];
                e[j + 1] = e[j];
                e[j] = arr;
            }
        }
    }

    /* Prints the sorted output */
    printf("\n \n SORTED OUTPUT : - ");
    printf("\n \n ID NO : NAME SALARY ");
    printf("\n");
    for (i = 1; i <= n; i++)
        printf("\n%d %s Rs. %.2f", e[i].id, e[i].name, e[i].salary);
    return 0;
}
Enter the number of employees: 6
Enter the id number, name, salary: 1 parama 3456.99
Enter the id number, name, salary: 2 parama 8976.78
Enter the id number, name, salary: 3 depali 5676.77
Enter the id number, name, salary: 4 mahuya 2341.98
Enter the id number, name, salary: 5 depali 5678.98
Enter the id number, name, salary: 6 parama 7896.65
Sorted Output:
ID NO NAME SALARY
3 depali Rs. 5676.77
5 depali Rs. 5678.98
4 mahuya Rs. 2341.98
1 parama Rs. 3456.99
2 parama Rs. 8976.78
6 parama Rs. 7896.65
Example 10.11
/* A PROGRAM TO SORT EMPLOYEE RECORDS ON EMPLOYEE NAMES */
#include <stdio.h>
#include <string.h>
int main(void)
{
    struct employee {
        int eno;
        char ename[30];
        float esalary;
        int egrade;
    };
    struct employee E[40];
    int i, n, no, j, k, gr1;
    float sal1;
    char str[30];
    printf("\n Enter how many employees ");
    scanf("%d", &n);
    if (n < 1 || n > 40)
        return 1;
    for (i = 0; i < n; i++)
    {
        printf("\n Enter emp-no ");
        scanf("%d", &E[i].eno);
        printf("\n Enter emp-name ");
        scanf(" %29[^\n]", E[i].ename);    /* the name may contain spaces */
        printf("\n Enter employee salary ");
        scanf("%f", &E[i].esalary);
        printf("\n Enter the grade ");
        scanf("%d", &E[i].egrade);
    }
    for (i = 0; i < n - 1; i++)
    {
        for (j = 0; j < n - 1; j++)
        {
            k = strcmp(E[j].ename, E[j + 1].ename);
            if (k > 0)
            {   /* The records are swapped field by field */
                /* Swap for name */
                strcpy(str, E[j].ename);
                strcpy(E[j].ename, E[j + 1].ename);
                strcpy(E[j + 1].ename, str);
                /* Swap for employee no */
                no = E[j].eno;
                E[j].eno = E[j + 1].eno;
                E[j + 1].eno = no;
                /* Swap for employee salary */
                sal1 = E[j].esalary;
                E[j].esalary = E[j + 1].esalary;
                E[j + 1].esalary = sal1;
                /* Swap for employee grade */
                gr1 = E[j].egrade;
                E[j].egrade = E[j + 1].egrade;
                E[j + 1].egrade = gr1;
            }
        }
    }
    printf("\n The sorted employee list is: ");
    for (i = 0; i < n; i++)
    {
        printf("\n%d\t%s\t%f\t%d", E[i].eno, E[i].ename,
               E[i].esalary, E[i].egrade);
    }
    return 0;
}
Enter how many employees 5
Enter emp-no 7171
Enter emp-name T. Dey
Enter employee salary 6000
Enter the grade 2
Enter emp-no 5911
Enter emp-name P. Roy
Enter employee salary 3500
Enter the grade 3
Enter emp-no 2110
Enter emp-name A. Sharma
Enter employee salary 5000
Enter the grade 2
Enter emp-no 1010
Enter emp-name R. Kumar
Enter employee salary 8500
Enter the grade 1
Enter emp-no 3950
Enter emp-name J. Dutta
Enter employee salary 9700
Enter the grade 1
The sorted employee list is:
2110 A. Sharma 5000.000000 2
3950 J. Dutta 9700.000000 1
5911 P. Roy 3500.000000 3
1010 R. Kumar 8500.000000 1
7171 T. Dey 6000.000000 2
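The field-by-field exchange in Example 10.11 can be shortened, because C allows one structure to be assigned to another of the same type, and such an assignment copies every member, including the character array. The sketch below assumes the record layout of Example 10.11; the function names swap_employee and sort_by_name are introduced here only for illustration.

```c
#include <string.h>

/* Record layout assumed to match Example 10.11. */
struct employee {
    int eno;
    char ename[30];
    float esalary;
    int egrade;
};

/* Exchange two records. Structure assignment copies all members,
   so no separate loop is needed for the name. */
void swap_employee(struct employee *a, struct employee *b)
{
    struct employee tmp = *a;
    *a = *b;
    *b = tmp;
}

/* Bubble sort on ename using the whole-record swap. */
void sort_by_name(struct employee E[], int n)
{
    int i, j;

    for (i = 0; i < n - 1; i++)
        for (j = 0; j < n - 1 - i; j++)
            if (strcmp(E[j].ename, E[j + 1].ename) > 0)
                swap_employee(&E[j], &E[j + 1]);
}
```

The inner loop here stops at n - 1 - i because, after pass i, the last i records are already in their final places. This is a small refinement and not part of the original program.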
EXERCISES
1. Write a program to sort the following sequence of integers
57, 27, 22, 95, 79, 45, 96
using insertion sort. Keep track of the number of exchanges and comparisons
required.
2. What starting conditions are necessary to produce the worst-case behavior of bubble sort?
3. Both insertion sort and bubble sort are particularly deficient in their ability to move
elements over long distances. Shell sort attempts to move elements over long distances. It is
a fairly simple matter to modify the insertion sort algorithm so that it becomes a Shell sort
algorithm. Write an algorithm to implement this.
4. Mean sort uses the mean value of the elements being partitioned. Implement mean sort for
an array containing the following integers:
482, 231, 928, 204, 105, 428, 379, 47
5. A sorting algorithm is stable if it preserves the order of elements with equal keys. Which of
these algorithms (quick sort, heap sort, or merge sort) is stable?
6. The algorithm for heap sort places elements in ascending or descending order. What changes
to a C program are needed so that the result will be in ascending order if it originally sorts
in descending order, or vice versa?
7. A drawback of merge sort for arrays is its requirement for twice as much memory as any of
the other algorithms. Merge sort moves all elements during each pass, so it requires n log2 n
moves no matter how the data are initially ordered. It will therefore perform better with a
linked list data structure than with an array. Implement the merge sort algorithm using a linked
structure.
8. It is noted that (3/2) n log2 n is a conservative bound for the number of exchanges and
comparisons required for the heap sort algorithm. Run tests on random sequences of integers to
determine the number of exchanges and comparisons actually required. Plot the experimental
results along with the value (3/2) n log2 n for merge sort.
9. What are similarities and differences between selection sort and heap sort?
SEARCHING

11.1 INTRODUCTION
We now consider different methods of searching large amounts of data to find one particular
piece of information. This chapter focuses on four general types of searching: sequential
search, binary search, indexed search, and hashing schemes. The purpose here is not to present
an exhaustive review of all possible search techniques, but to highlight how the main techniques
can be implemented.
We first define some common terms. A table or a file is a collection of elements, each of
which is called a record. Each record consists of a set of fields. One of the fields has a special
meaning and is called the key field. A key stored within the record is also called an internal key.
Sometimes there is a separate table of keys that includes pointers to the records. Such keys are
called external keys. For each file, there is at least one key that is unique, that is, no two
records have the same key value; it is called a primary key. However, keys need not always be
unique. A key that is not unique is called a secondary key.
A searching method is a process that accepts an argument x and tries to locate a record
whose key value is x. The process may return the entire record or it may return the address of
the record. If the search for a record is unsuccessful, a null value is returned. We start our
discussion with the simplest searching technique, which is known as sequential search.

11.2 SEQUENTIAL SEARCH

A search that looks through a list from top to bottom while checking each item for a match
is a sequential search or linear search. This search is applicable to a table organised either as an
array or as a linked list. The item to be matched is called a key. Let us assume that x is an array
of n keys, x[0] through x[n-1], and r an array of records, r[0] through r[n-1], such that x[i] is the
key of r[i]. The search begins at array index 0 and ends either when a match is found or the end
of the array is reached. We want to return the smallest integer i such that x[i] equals key if such
an i exists and -1 otherwise. The program segment may look as follows:
for (i = 0; i < n; i++)
    if (key == x[i])
        return(i);
return(-1);
The program segment is simple enough. It examines each key in turn and returns its index
if it matches the search argument; otherwise -1 is returned.
The sequential search does not depend on the entries in the list being in any particular order.
If the key is the last entry in the list, or if it does not occur in the list at all (for example,
when it is larger than all the entries), then the algorithm will do n comparisons for n elements.
The following algorithm illustrates the idea of sequential search.
Algorithm 11.1: Sequential search
Input: An array A with n elements (indexed 1, ..., n) and x, the item to be searched for.
Output: The location of x in array A; 0 if x is not found.
Step 1: Set i = 1
Step 2: While i <= n and A[i] ≠ x do
Step 3:     Set i = i + 1
        end while
Step 4: if i > n then set i = 0
        end if
Step 5: stop
The sequential search algorithm terminates with index i equal to the index of the first
occurrence of x in A, if x is present, and equal to 0 otherwise.
The sequential-search algorithm is implemented with a C program as follows:
Example 11.1
#include <stdio.h>
/* Program for Sequential search */
#define N 7
int A[] = {20, 17, 18, 7, 5, 6, 19};
int sequential(int key);
int main(void)
{
    int x, y;
    printf("\n\n Enter the item to be searched :\n");
    scanf("%d", &x);
    y = sequential(x);
    if (y == -1)
        printf("\n Element is not found \n");
    else
        printf("\n Element is found and its position is %d\n", y);
    return 0;
}
/* Function sequential: returns the index of key in A, or -1 if absent */
int sequential(int key)
{
    int i;
    for (i = 0; i < N; i++)
        if (key == A[i])
            return (i);
    return (-1);
}
A sample run is shown below :
Input: 17
Output: Element is found and its position is 1
Input: 16
Output: Element is not found
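As noted earlier, a sequential search applies equally well to a table organised as a linked list. The following sketch shows one possible form; the node type struct lnode and the function name list_search are assumptions made for this illustration and do not appear in the programs above.

```c
#include <stddef.h>

/* Hypothetical node type for a singly linked list of integer keys. */
struct lnode {
    int key;
    struct lnode *next;
};

/* Walk the list from the head; return the first node whose key
   equals x, or NULL if no such node exists. */
struct lnode *list_search(struct lnode *head, int x)
{
    struct lnode *p;

    for (p = head; p != NULL; p = p->next)
        if (p->key == x)
            return p;
    return NULL;
}
```

Here the address of the matching node plays the role of the array index, and NULL plays the role of -1.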

11.3 BINARY SEARCH

The most efficient method of searching a sequential table without the use of indices or tables is
the binary search. This searching technique is a divide-and-conquer method applicable to sorted
data items. The data can be in an array or a file. When each data item is a structure, a key field
is used. The primary limitation of the method is that the data must be in sorted order. Let us
first explain why this method is better than the sequential method.
Consider an array whose elements have been placed in some order. For example,
a dictionary or telephone book may be thought of as an array whose entries are in alphabetical
order. Suppose we want to look up a name in a telephone book to find a telephone number,
or a word in a dictionary. In sequential search, each item of the array is examined
in turn and compared with the item being searched for, until a match occurs. If the list is
unordered, the sequential search may be the only way to find anything in it. Such a method
would never be used in looking for a name in the telephone directory, since it would take far
too long. Instead, the book is opened to a random page and the names on
that page are examined. Since the names are ordered alphabetically, such an examination
determines whether the search should proceed in the first half or second half of the book. This is
the basic idea of the binary search.
In the binary search, an array is divided into two parts. First, the item being
searched for is compared with the item at the middle of the array. If they are equal, the search
has been completed successfully. If the middle element is greater than the item being searched
for, the search process is repeated for the left half of the array, since the item can appear only
in the first half. On the other hand, if the middle item is less than the item being searched for,
the process is repeated in the second half. This method halves the number of elements yet to be
searched with each comparison. For large arrays, this method is superior to the sequential
search, in which each comparison reduces the number of elements
yet to be searched by only one. Let us illustrate the method with the following example.
Suppose we want to locate the data associated with key value 1649 in the following list:
1129 1203 1211 1519 1609 1649 2821 3575 9279 9289
We begin the search in the middle of the list. In our example, we first examine the key found at
position 5. Since the key value 1649 is greater than the key (i.e., 1609) at position 5, we conclude
that the key we want to find must lie between positions 6 and 10, if it is to be found at all. We
again take the middle, by halving the sum of 6 and 10, and find that the key at position 8 is
greater than the key sought. This process continues until we find the key 1649 at position 6.
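The walk-through above can be reproduced with a small function that counts how many positions are examined. This is only a sketch: the name bsearch_trace and its probes parameter are introduced for illustration, and positions are counted from 1 as in the text, although the C array itself is indexed from 0.

```c
/* Binary search over the sorted array a[0..n-1].
   Returns the position of key counted from 1 (as in the text),
   or 0 if key is absent. *probes receives the number of
   elements that were examined. */
int bsearch_trace(const int a[], int n, int key, int *probes)
{
    int lo = 1, hi = n;              /* positions counted from 1 */

    *probes = 0;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;

        (*probes)++;
        if (a[mid - 1] == key)
            return mid;
        if (a[mid - 1] < key)
            lo = mid + 1;            /* continue in the upper half */
        else
            hi = mid - 1;            /* continue in the lower half */
    }
    return 0;
}
```

For the ten keys listed above and the key 1649, the function examines positions 5, 8 and 6 in that order, so it returns 6 after three probes.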
11.3.1 Algorithm Binary Search


Input: A is an ordered array with n entries and x is the item sought.
Output: If x is not in A, return 0; otherwise return the position of x in A.
Step 1: Set l = 1 and r = n
        /* l and r indicate the indices of the first */
        /* and last entries of the array respectively */
Step 2: Set flag = 0
Step 3: While l <= r and flag = 0 do
Step 4:     set i = (l + r)/2 /* Index of middle entry */
Step 5:     if x = A[i] then flag = 1
            else
                if x < A[i] then r = i - 1
                else l = i + 1
            end if
        end while
        if not flag then i = 0
        end
The idea behind the binary search algorithm is to eliminate half the entries with each
comparison. Initially, we compare the element sought, say x, with the middle element of the
list. If x is larger than the middle element of the list, then it can only be in the second half, and
the first half is eliminated. If x is smaller than the entry in the middle of the list, the second
half of the list is eliminated from consideration. If x is equal to the middle entry of the list, the
searching process terminates. In each comparison the size of the section of the list that may
contain x is cut in half. The same process continues until x is found or it is determined that x is
not in the list. The binary-search algorithm requires about log2 n comparisons in the worst case,
whereas the sequential or linear search requires n comparisons in
the worst case.
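The logarithmic bound can be checked by counting how many times a list of n entries can be halved. The sketch below assumes the three-way comparison used in the algorithm above, for which the largest possible number of probes is floor(log2 n) + 1, that is, roughly log2 n; the function name max_probes is chosen here for illustration.

```c
/* Largest number of probes a binary search can need on n sorted
   entries: the number of times n can be halved (using integer
   division) before it reaches 0, i.e. floor(log2 n) + 1 for n >= 1. */
int max_probes(int n)
{
    int count = 0;

    while (n > 0) {
        count++;
        n /= 2;
    }
    return count;
}
```

For example, max_probes(10) is 4 and max_probes(1000) is 10, whereas a sequential search of the same lists may need 10 and 1000 comparisons respectively.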
The following program implements the binary-search algorithm.
Example 11.2
#include <stdio.h>
/* Program for Binary search */
int x[] = {1, 5, 7, 8, 10, 12, 14, 17, 18, 20, 22, 23, 24};
int n = 12;                 /* index of the last element of x */
int binarySearch(int key);
int main(void)
{
    int i, key;
    printf("\n Enter element to be searched \n");
    scanf("%d", &key);
    for (i = 0; i <= n; i++)
        printf("%3d", x[i]);
    printf("\n\n\n");
    binarySearch(key);
    return 0;
}
/* Binary search function: returns the index of key in x, or -1 if absent */
int binarySearch(int key)
{
    int l, r, m;
    l = 0;
    r = n + 1;
    /* If key is present, it always lies in x[l] ... x[r-1] */
    while (r != (l + 1))
    {
        m = (l + r) / 2;
        if (x[m] <= key)
            l = m;
        else
            r = m;
    }
    if (x[l] == key)
    {
        printf("\n Element is found and its position is = %d\n", l);
        return l;
    }
    else
    {
        printf("\n Element is not found \n");
        return -1;
    }
}
A sample run is given below:
Input:
Enter element to be searched
17
Output:
1 5 7 8 10 12 14 17 18 20 22 23 24
Element is found and its position is = 7
11.4 INDEXED SEQUENTIAL SEARCH


The indexed sequential search approach is especially applicable to searching direct access
secondary storage devices. The idea behind the use of an index is similar to the way we use an
address book to find a person. This search technique generates record addresses by means of a
table, called an index, which stores the key values and relative addresses of the records in a file.
Given a key value, the index is searched sequentially for it, and if it exists, its
relative address is picked from the index and the corresponding record is directly accessed
using this relative address. A considerable amount of time is saved, but extra space is needed to
store the index. It is rather like the use of a card index in a library. The user looks up the
name of the book he wants in the card index, and the index gives the catalogue number, which
is like a relative address of the position on the shelves.
The algorithm used for index search is straightforward. The general strategy of an indexed
search is to use the key to search the index, find the relative record position of the related
data, and from there make only one access into the data. Since the parallel lists of keys and
relative record positions require much less storage than the data do, frequently the entire
index can be loaded and permanently held in the main memory so that only one disk access is
needed for each record being sought. For large indexes, large blocks of keys and associated
pointers have to be manipulated together for better searching efficiency.
The strategy for conducting indexed sequential search method (ISAM) is based on the
following.
* Scan the index table for a key that is greater than or equal to the given key that is to be searched.
* Next follow the corresponding pointer to search sequentially for the corresponding key.
The following program implements the indexed sequential search technique for finding
the target key.
Example 11.3

Indexed sequential search

#include <stdio.h>
struct INDEX
{
    int kindex;     /* key value held in the index */
    int *pindex;    /* address of that key in the data file */
};
struct INDEX index[10];     /* assumes at most 10 index entries */
int datafile[25], n, size;
int main(void)
{
    int i, j = 0, found, x, indxsize, key;
    int *low, *high, *temp;
    printf("Enter how many elements in the file (Not more than 25) ====>");
    scanf("%d", &n);
    printf("\n Enter the data: In sorted order \n");
    for (i = 0; i < n; i++)
        scanf("%d", &datafile[i]);
    printf("Enter the size of the index table ===>");
    scanf("%d", &size);
    printf("\n The input data file of integers is as follows:");
    printf("\n ================\n");
    for (i = 0; i < n; i++)
    {
        /* %p prints an address; its exact form is system dependent */
        printf("\n At address ");
        printf("%p <=====> Element%d ---------> %d", (void *)&datafile[i],
               i + 1, datafile[i]);
    }
    /* Build the index: one entry for every size-th element */
    for (i = size - 1; i < n; i += size)
    {
        index[j].kindex = datafile[i];
        index[j].pindex = &datafile[i];
        j++;
    }
    /* Make sure the last element is also covered by the index */
    if (n % size != 0)
    {
        index[j].kindex = datafile[n - 1];
        index[j].pindex = &datafile[n - 1];
        j++;
    }
    indxsize = j;
    printf("\n The index table formed from the input file is as follows:");
    printf("\n ======================================\n\n");
    printf("DATA FIELD ADDRESS FIELD (POINTER)\n");
    printf("---------------------");
    for (i = 0; i < j; i++)
    {
        printf("\n%5d%17p", index[i].kindex, (void *)index[i].pindex);
    }
    printf("\n Enter the number to be searched ===>");
    scanf("%d", &key);
    found = 0; x = 0;
    /* Find the first index entry whose key is greater than the search key */
    while ((x < indxsize) && (found == 0))
    {
        if (index[x].kindex > key)
            found = 1;
        else
            x = x + 1;
    }
    /* The key, if present, lies between low and high (inclusive) */
    if (x == 0)
        low = &datafile[0];
    else
        low = index[x - 1].pindex;
    if (found == 1)
        high = index[x].pindex - 1;
    else
        high = &datafile[n - 1];
    temp = low;
    found = 0;
    while ((temp <= high) && (found == 0))
    {
        if (*temp == key)
            found = 1;
        else
            temp++;
    }
    if (found == 1)
        printf("\n Found at address %p: Element is %d", (void *)temp, *temp);
    else
        printf("\n Data not found");
    return 0;
}
Enter how many elements in the file (Not more than 25) ====>10
Enter the data: In sorted order
12 19 23 35 42 46 55 67 70 85
Enter the size of the index table ===>4
The input data file of integers is as follows:
================
At address 1810 <=====> Element1 ---------> 12
At address 1812 <=====> Element2 ---------> 19
At address 1814 <=====> Element3 ---------> 23
At address 1816 <=====> Element4 ---------> 35
At address 1818 <=====> Element5 ---------> 42
At address 1820 <=====> Element6 ---------> 46
At address 1822 <=====> Element7 ---------> 55
At address 1824 <=====> Element8 ---------> 67
At address 1826 <=====> Element9 ---------> 70
At address 1828 <=====> Element10 ---------> 85
The index table formed from the input file is as follows:
DATA FIELD     ADDRESS FIELD (POINTER)
   35             1816
   67             1824
   85             1828
Enter the number to be searched ====> 55
Found at address 1822: Element is 55
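The two-step strategy (scan the index, then search one block of the data sequentially) can also be written with array subscripts in place of pointers, which makes the block boundaries easier to see. The following is a sketch under stated assumptions: the type struct index_entry and the functions build_index and indexed_search are names introduced here, and the caller is assumed to supply an index array large enough to hold every entry.

```c
#include <stddef.h>

/* One index entry: a key and the subscript at which it is stored. */
struct index_entry {
    int key;
    size_t pos;
};

/* Build an index holding every step-th key of the sorted array
   data[0..n-1], plus the last key if it was not already included.
   Returns the number of entries written (at most cap). */
size_t build_index(const int data[], size_t n, size_t step,
                   struct index_entry idx[], size_t cap)
{
    size_t count = 0, i;

    if (step == 0)
        return 0;
    for (i = step - 1; i < n && count < cap; i += step) {
        idx[count].key = data[i];
        idx[count].pos = i;
        count++;
    }
    if (n % step != 0 && count < cap) {
        idx[count].key = data[n - 1];
        idx[count].pos = n - 1;
        count++;
    }
    return count;
}

/* Step 1: scan the index for the first key >= the search key.
   Step 2: search sequentially inside the block that ends there.
   Returns the subscript of key in data, or -1 if it is absent. */
long indexed_search(const int data[], const struct index_entry idx[],
                    size_t count, int key)
{
    size_t x = 0, lo, i;

    while (x < count && idx[x].key < key)
        x++;
    if (x == count)
        return -1;                  /* key exceeds every indexed key */
    lo = (x == 0) ? 0 : idx[x - 1].pos + 1;
    for (i = lo; i <= idx[x].pos; i++)
        if (data[i] == key)
            return (long)i;
    return -1;
}
```

With the ten data values of the sample run and a step of 4, the index holds the keys 35, 67 and 85, and a search for 55 scans only subscripts 4 to 7 before finding the key at subscript 6.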

11.5 HASHING SCHEMES


In the previous methods we assumed that the record being sought is stored in a table and that it
is necessary to pass through some number of keys before finding the desired one. In hashing
methods we would like to have a table organization and search technique in which no unnecessary
comparisons are needed for finding the desired key. If each key is to be retrieved in a single
access, the location of the record within the table can depend only on the key.
When the number of keys actually stored is small relative to the total number of possible
keys, hash tables become an efficient alternative to directly addressing an array, since a
hash table typically uses an array of size proportional to the number of keys actually stored. For
example, in an inventory control application, product identification codes are numbered from
10,000 onwards. To locate a product, we need only look at position (key-9999) in the list. Such a
method is known as a key-to-address transformation or hash function.
With direct addressing, an element with key k is stored in slot k. With hashing, this element
is stored in slot h(k), that is, a hash function h is used to compute the slot from the key k.
Unfortunately, it is also possible that two keys may hash to the same slot; this is called a
collision. Fortunately, there are effective techniques for resolving the conflict created by
collisions.
A good hash function satisfies the assumption of simple uniform hashing, that is, each
key is equally likely to hash to any of the m slots. Most hash functions assume that the keys are
natural numbers. Various techniques are available for generating the hash functions.
In the division method for creating hash functions, we map a key k into one of the m slots
by taking the remainder of k divided by m, that is, h(k) = k mod m. For example, if the hash table
has size 20 and the key value is 119, then h(k) = 119 mod 20 = 19. The method requires only a
single division operation and is therefore fast. Note that for a key value of 139 the hash function
h(k) has the same value as for the key value 119. This indicates a collision. To reduce the number
of collisions, it is usually better to select a prime number as the value of m.
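The division method takes only one line of C. The sketch below uses a table size of 13, a prime chosen here purely as an example; the function names hash_div and slot_load are likewise assumptions made for illustration.

```c
/* Division-method hash: maps key k to one of the m slots 0..m-1.
   Assumes m > 0. */
unsigned hash_div(unsigned k, unsigned m)
{
    return k % m;
}

/* Count how many of the n given keys fall into one particular slot.
   A count greater than 1 means that keys collide in that slot. */
unsigned slot_load(const unsigned keys[], unsigned n, unsigned m,
                   unsigned slot)
{
    unsigned i, count = 0;

    for (i = 0; i < n; i++)
        if (hash_div(keys[i], m) == slot)
            count++;
    return count;
}
```

With m = 13, the keys 27 and 40 both map to slot 1 (27 = 2 x 13 + 1 and 40 = 3 x 13 + 1), so they collide, while 15 maps to slot 2 and 26 to slot 0.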
In some applications, the key value is not a simple integer. For example, in the case of an
insurance number such as
567-96-1505
the folding technique for creating a hash function works better than the other techniques.
In the shift-folding method, the above number is viewed as three separate numbers to be
added:
 567
  96
1505
----
2168
The resulting number can be treated as the hash position itself, or a further hashing technique
such as division remainder can be applied to it to get a final hash position in the desired range.
The shift-folding method has the great advantage that it can transform non-integer
keys into integers suitable for further hashing action.
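Shift folding is easy to express for a key held as a character string. In the sketch below every run of digits is treated as one group and every other character as a separator; the function names fold_key and fold_hash, and the sample key "123-45-6789", are assumptions chosen for illustration.

```c
#include <ctype.h>

/* Shift folding: add up the digit groups of a key such as
   "123-45-6789". Any non-digit character ends the current group. */
unsigned long fold_key(const char *s)
{
    unsigned long sum = 0, group = 0;

    for (; *s != '\0'; s++) {
        if (isdigit((unsigned char)*s)) {
            group = group * 10 + (unsigned long)(*s - '0');
        } else {
            sum += group;
            group = 0;
        }
    }
    return sum + group;             /* add the final group */
}

/* Folding followed by division remainder, giving a slot in 0..m-1.
   Assumes m > 0. */
unsigned long fold_hash(const char *s, unsigned long m)
{
    return fold_key(s) % m;
}
```

For the key "123-45-6789" the groups are 123, 45 and 6789, whose sum is 6957; with a table of 13 slots the final position is 6957 mod 13 = 2.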
We now present the simplest collision-resolution technique, called chaining. In chaining,
we put all the elements that hash to the same slot in a linked list. In a hash table T with m slots
that stores n elements, the average number of elements stored in a chain is n/m. In the worst
case, all n keys hash to the same slot, creating a list of length n.
The following program illustrates the idea of searching a hash table using the chaining
technique.
Example 11.4

HASH-TABLE SEARCHING BY THE METHOD OF CHAINING

#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#define B 13
struct nodetype
{
    int data;
    struct nodetype *link;
};
typedef struct nodetype node;
node *arr[B];               /* one chain per slot of the hash table */
int num, count = 0;
node *root, *ptr;
int hash(int);
void display(void);
node *search(int);
int main(void)
{
    int i, item, y;
    srand((unsigned) time(NULL));
    for (i = 0; i < B; i++) arr[i] = NULL;
    printf("How many data to be inserted in the hash table=====>");
    scanf("%d", &num);
    for (i = 0; i < num; i++)
    {
        item = rand() % 100;        /* random item in the range 0..99 */
        y = hash(item);
        root = (node *) malloc(sizeof(node));
        if (root == NULL)
        {
            printf("Out of memory\n");
            return 1;
        }
        root->data = item;
        root->link = NULL;
        if (arr[y] == NULL)
        {
            arr[y] = root;          /* first element of this chain */
        }
        else
        {
            /* Append the new node at the end of the chain */
            ptr = arr[y];
            while (ptr->link != NULL)
            {
                ptr = ptr->link;
            }
            ptr->link = root;
        }
    }
    printf("The hash table is as follows:\n");
    printf("=======================\n");
    display();
    printf(" \n Enter the data to be searched ======>");
    scanf("%d", &item);
    ptr = search(item);
    if (ptr != NULL)
        printf("\n Data found: At position %d in hash table [%2d]",
               count, hash(item));
    else
        printf("Data not found");
    return 0;
}
/* Division-method hash function */
int hash(int n)
{
    int p;
    p = n % B;
    return (p);
}
/* Prints every chain of the hash table */
void display(void)
{
    int i;
    for (i = 0; i < B; i++)
    {
        root = arr[i];
        printf("Hash table [%2d] ===> ", i);
        while (root != NULL)
        {
            printf("%d --> ", root->data);
            root = root->link;
        }
        printf("NULL\n");
    }
}
/* Searches the chain of n; count records the position reached */
node *search(int n)
{
    int found, i;
    node *p;
    found = 0;
    i = hash(n);
    p = arr[i];
    while ((p != NULL) && (!found))
    {
        count++;
        if (p->data == n)
            found = 1;
        else
            p = p->link;
    }
    if (found == 1)
        return (p);
    else
        return (NULL);
}
How many data to be inserted in the hash table =====>20
The hash table is as follows:
=======================
Hash table [ 0] ===> 91 --> 39 --> NULL
Hash table [ 1] ===> NULL
Hash table [ 2] ===> 41 --> 2 --> NULL
Hash table [ 3] ===> 3 --> 3 --> NULL
Hash table [ 4] ===> 43 --> 82 --> 56 --> NULL
Hash table [ 5] ===> 70 --> 96 --> 57 --> 18 --> NULL
Hash table [ 6] ===> 19 --> 45 --> NULL
Hash table [ 7] ===> 59 --> NULL
Hash table [ 8] ===> 47 --> 47 --> NULL
Hash table [ 9] ===> NULL
Hash table [10] ===> NULL
Hash table [11] ===> 11 --> NULL
Hash table [12] ===> 77 --> NULL
Enter the data to be searched =====> 57
Data found: At position 3 in hash table [ 5]

How many data to be inserted in the hash table =====>20
The hash table is as follows:
=======================
Hash table [ 0] ===> 52 --> 0 --> NULL
Hash table [ 1] ===> 53 --> 14 --> 92 --> NULL
Hash table [ 2] ===> NULL
Hash table [ 3] ===> 94 --> 42 --> NULL
Hash table [ 4] ===> 82 --> NULL
Hash table [ 5] ===> NULL
Hash table [ 6] ===> 84 --> 45 --> 84 --> 45 --> NULL
Hash table [ 7] ===> 98 --> 72 --> NULL
Hash table [ 8] ===> NULL
Hash table [ 9] ===> 48 --> NULL
Hash table [10] ===> 49 --> NULL
Hash table [11] ===> 11 --> 37 --> NULL
Hash table [12] ===> 25 --> 25 --> NULL
Enter the data to be searched =====> 67
Data not found
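A variation worth noting is to insert each new element at the head of its chain, which avoids walking to the end of the list on every insertion. The sketch below is not the program above but a reduced version under stated assumptions: the names cnode, table, chain_insert, chain_find and chain_clear are introduced here, and the table size of 13 simply follows the example.

```c
#include <stdlib.h>

#define NSLOTS 13

/* Hypothetical chain node. */
struct cnode {
    int data;
    struct cnode *link;
};

struct cnode *table[NSLOTS];        /* zero-initialised: all chains empty */

/* Division-method hash that also copes with negative keys. */
static int slot_of(int key)
{
    return ((key % NSLOTS) + NSLOTS) % NSLOTS;
}

/* Insert key at the head of its chain.
   Returns 0 on success, -1 if no memory is available. */
int chain_insert(int key)
{
    struct cnode *p = malloc(sizeof *p);

    if (p == NULL)
        return -1;
    p->data = key;
    p->link = table[slot_of(key)];
    table[slot_of(key)] = p;
    return 0;
}

/* Return the position of key in its chain, counted from 1,
   or 0 if the key is not in the table. */
int chain_find(int key)
{
    int pos = 0;
    struct cnode *p;

    for (p = table[slot_of(key)]; p != NULL; p = p->link) {
        pos++;
        if (p->data == key)
            return pos;
    }
    return 0;
}

/* Release every node and leave all chains empty. */
void chain_clear(void)
{
    int i;

    for (i = 0; i < NSLOTS; i++) {
        while (table[i] != NULL) {
            struct cnode *next = table[i]->link;
            free(table[i]);
            table[i] = next;
        }
    }
}
```

With head insertion the most recently inserted key is found first: after inserting 70, 96 and 57 in that order (all of which hash to slot 5), the key 57 is at position 1 of its chain and 70 at position 3.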
In the next chapter, another important data structure, the tree, will be discussed.

EXERCISES
1. What are the advantages and disadvantages of the sequential search algorithm?
2. Binary search, a technique that takes advantage of the sorted order of the list, takes an
amount of work O(log2 n). What is the maximum number of probes made by a binary search
in a list of 128 elements?
3. Determine the list size for which binary search becomes more efficient than sequential search.
4. What are the main advantages of indexed sequential search over sequential search?
5. The division hash function H(k) = k mod m is usually a good hash function if m has no
small divisors. Explain why this restriction is placed on m.
6. A perfect hash function is one that causes no collisions. How many probes are needed
to locate an element that has a given key value?
7. When are perfect hash functions feasible?
8. The open address hashing method attempts to place second and subsequent keys that hash to
an already occupied table location into some other position in the table that is unoccupied.
What are the main drawbacks of this method?
9. External chaining has a linked list associated with each hash table address. Each element is
added to the linked list at its home address. What are the advantages of external chaining
over the open address hashing technique?
10. With a table size of 50000, after how many insertion operations does hashing with open
addressing display the same behavior as binary search?
TREES

We have seen that the linked-list data structure is very useful for processing dynamic lists. It
imposes a structure in which an element may have a predecessor and a successor. But many
natural applications require data structures that reflect a hierarchy. This hierarchical
nature is not available in the linked data structures that we have studied earlier. However, the
tree data structure imposes a hierarchical structure on a collection of elements. In this
chapter we will consider trees together with the operations on them and their applications.

12.1
A tree consists of a finite collection of elements called nodes or vertices, one of which is
distinguished as the root, and a finite collection of directed arcs that connect pairs of nodes. In
a non-empty tree the root node has no incoming arcs. Every other node in the tree can be reached
from the root by following a unique sequence of consecutive arcs. A NULL tree is a tree with no
nodes.
Natural trees have their roots in the ground and grow their branches upwards in the
air. Although trees derive their name from such natural trees, computer scientists usually
portray tree data structures upside down, that is, with the root at the top and
the branches growing downwards.
For illustration, the conventional method of portraying a tree is shown below. In this
picture the direction of the arcs is not shown but it is assumed that the direction of all the arcs is
downwards.
The tree shown above has twelve nodes, with the root node a at the top having
no incoming arcs. The nodes i, e, j, k, l, h do not have any outgoing arcs and are
called leaves. The nodes that are directly accessible from a given node are called children of that
node. A node is the parent of its children. For example, d, e, f are the children of b, while b is the
parent of d, e and f. The nodes d, e, f have the common parent b and so they are siblings. If
there is a path from node n1 to node n2 we say n1 is an ancestor of n2 and n2 is a descendant of
n1. For example, c is an ancestor of l and l is a descendant of c. Clearly, a node is an ancestor
and descendant of itself. An ancestor or descendant of a node that is different from the node itself
is known as a proper ancestor or proper descendant, respectively. So another way of defining a
leaf is as a node with no proper descendants.
The depth or level of a node may be defined as follows: the depth of the root node is 0 and
the depth of any other node in the tree is the depth of its parent plus 1, that is, the depth of a node
is the length of the unique path from the root to that node. The height of a tree is the depth
of the node that is at the largest depth in the tree plus 1.
In the following sections we will discuss a more specific type of tree: the binary tree. This
type of tree has many applications and has different forms. In a binary tree there are at most
two outgoing arcs from a node. If every intermediate (non-leaf) node in a binary tree has exactly
two non-empty children, then it is a strictly binary tree. If, in addition, all the leaves of a strictly
binary tree are at the same depth, say d, it becomes a complete binary tree. The binary tree in
Fig. 12.1 is a complete binary tree of depth 3 and hence also a strictly binary tree.

Fig. 12.1 A complete binary tree of depth 3

Clearly, in a binary tree if there are b nodes at depth d then at depth (d+1) there is a
maximum of 2b nodes. So in a complete binary tree of depth d there are $2^d$ leaf nodes. Hence the
total number of nodes in a complete binary tree of depth d is given by t where
$$t = 1 + 2^1 + 2^2 + \cdots + 2^d = 2^{d+1} - 1$$
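The node count can be checked numerically by adding up the levels and comparing the result with the closed form. In the sketch below the function names are chosen for illustration, and d is assumed to be small enough (below 31) for the result to fit in an unsigned long.

```c
/* Total number of nodes in a complete binary tree of depth d,
   obtained by adding the 2^k nodes of each level k = 0, ..., d. */
unsigned long nodes_by_level(unsigned d)
{
    unsigned long total = 0, level = 1;   /* level holds 2^k */
    unsigned k;

    for (k = 0; k <= d; k++) {
        total += level;
        level *= 2;
    }
    return total;
}

/* The same count from the closed form 2^(d+1) - 1. Assumes d < 31. */
unsigned long nodes_closed_form(unsigned d)
{
    return (1UL << (d + 1)) - 1;
}
```

For d = 3, the depth of the tree in Fig. 12.1, both functions give 1 + 2 + 4 + 8 = 15.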
Fig. 12.2 Each path marked in a binary tree

Trees in which every node has a maximum of two children are called binary trees. For example,
a binary tree is shown in Fig. 12.2. Each path from the root to a leaf node of this tree corresponds
to a particular outcome in flipping a coin three times in succession. For instance, the path
A-C-F-L represents the outcome THH in the diagram.
A formal recursive definition of a binary tree may be written as follows:
A binary tree is either empty, that is, it is a binary tree with no nodes, or consists of a node called
the 'root' which has two pointers to two different binary trees called the 'left subtree' and the
'right subtree'.
Note that the left subtree and the right subtree are two different binary trees and hence we
distinguish between them, that is, the binary trees shown in Fig. 12.3 are two distinct binary trees
in the sense that the root of the binary tree in Fig. 12.3(a) has an empty right subtree while the
root of the binary tree in Fig. 12.3(b) has an empty left subtree. It is also to be noted that a binary
tree having n nodes has (n-1) arcs (edges).

The above definition suggests a natural implementation of a binary tree using a linked structure.
Each node in the binary tree has two links, one pointing to the left subtree of the node (this link
will be NULL if the left subtree is empty) and the other pointing to its right subtree (NULL if
the right subtree is empty), together with an information part. Since each node of the binary tree
can be reached from the root by following a unique path from the root, a pointer to the root of the
binary tree will allow us to access any node of the binary tree. So we make the following
declarations to implement a binary tree:
typedef struct treetype{
int info; /* Information part */
struct treetype *left; /* Pointer to left subtree */
struct treetype *right; /* Pointer to right subtree */
} treenode;
typedef treenode *nodeptr;
nodeptr tree; /* Pointer to the root of the binary tree */
In the above declaration we assumed the information part of each node in the binary tree
to be an integer value. Instead of int, one should substitute whatever type name fits the
information part in the actual problem domain. For illustration, consider the
following binary tree with integers as the information part of each node.

In the above picture the pointer that is not linked to a node is a NULL pointer. This style
is followed throughout the chapter.
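With these declarations, a tree can be built by allocating nodes one at a time and linking them together. The helper functions below are a sketch: makenode, countnodes and freetree are hypothetical names introduced for this illustration, and the declarations are repeated so that the fragment is complete in itself.

```c
#include <stdlib.h>

typedef struct treetype {
    int info;                   /* Information part */
    struct treetype *left;      /* Pointer to left subtree */
    struct treetype *right;     /* Pointer to right subtree */
} treenode;
typedef treenode *nodeptr;

/* Allocate a node holding x with both subtrees empty.
   Returns NULL if no memory is available. */
nodeptr makenode(int x)
{
    nodeptr p = malloc(sizeof *p);

    if (p != NULL) {
        p->info = x;
        p->left = NULL;
        p->right = NULL;
    }
    return p;
}

/* Number of nodes in the tree whose root is t (0 for an empty tree). */
int countnodes(nodeptr t)
{
    if (t == NULL)
        return 0;
    return 1 + countnodes(t->left) + countnodes(t->right);
}

/* Release every node of the tree, children before their parent. */
void freetree(nodeptr t)
{
    if (t != NULL) {
        freetree(t->left);
        freetree(t->right);
        free(t);
    }
}
```

For example, after tree = makenode(5), tree->left = makenode(3) and tree->right = makenode(8), the pointer tree refers to a three-node binary tree whose leaves have NULL in both link fields.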

12.3 TRAVERSALS OF BINARY TREE



Binary trees are used to organize data so that they can be accessed very efficiently in many
applications. Most such applications need to be able to walk through all the nodes of the
binary tree, visiting each node exactly once. This walking through the tree is referred to as the
process of traversing the binary tree. Clearly, a linear order of nodes is the outcome of a complete
traversal of a binary tree and this linear order depends on the traversal algorithm. The definition
of a binary tree suggests that a traversal algorithm requires the following three basic steps:
(a) visit a node (denoted in this book as T),
(b) traverse the left subtree (denoted as L), and
(c) traverse the right subtree (denoted as R).
Obviously, the order in which these steps are performed determines the order in which the
nodes of the tree are visited. Moreover, there are 3! = 6 different orders in which these steps can
be performed. These are given by
LTR
TLR
LRT
RTL
RLT
TRL
The ordering LTR corresponds to the following traversal algorithm:
if (the binary tree is non-empty)
{
traverse left subtree;
visit the root;
traverse right subtree;
}
The last three orders (RTL, RLT and TRL) are mirror images of LTR, LRT and TLR respectively:
each one simply traverses the right subtree before the left subtree. The first three orders
are the principal orders of traversal and are named as follows:
LTR — inorder traversal
TLR — preorder traversal
LRT — postorder traversal
It is simple to implement any of these algorithms by writing a recursive function. Before
writing the functions we assume the existence of a function visit(treenode *node)
which performs the desired task (the visit) for the node pointed to by the pointer node. For our
purpose let us consider the following function visit, which prints the information part of
the node visited.
void visit(treenode *node)
{
printf ("%d\t", node->info);
}
We also assume that the pointer tree points to the root of the tree. The three traversal
functions inorder, preorder, and postorder are listed below in examples 12.1, 12.2 and
12.3 respectively.

Example 12.1

/* Inorder traversal function*/


void inorder(treenode *tree)
{
if (tree)
{
inorder (tree->left);
visit (tree);
inorder (tree->right);
}
}
Example 12.2

/* Preorder traversal function */


void preorder(treenode *tree)
{
if (tree)
{
visit (tree);
preorder (tree->left);
preorder (tree->right);
}
}
Example 12.3

/* Postorder traversal function */


void postorder(treenode *tree)
{
if (tree)
{
postorder (tree->left);
postorder (tree->right);
visit (tree);
}
}
To visualize that the names preorder, inorder, and postorder are appropriate, consider
the following arithmetic expression A*B+C/D-E. Notice that this is an infix expression and
uses only binary operators (+, -, *, /). A binary tree may be used to represent this expression.
The binary tree that represents an expression is known as an expression tree. The expression
tree for the above expression is shown in Fig. 12.4.

Fig. 12.4 Expression tree for the infix expression A * B + C/D - E

From Fig. 12.4 it can be seen that all the operands of the expression appear at the leaves of
the binary tree. Moreover, the preorder traversal of this tree gives the expression
+ * A B - / C D E
which is the prefix expression. The inorder traversal generates the original infix expression
A * B + C / D - E. It can also be seen that the postorder traversal of the tree generates the
postfix (RPN) expression A B * C D / E - +.
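These three sequences can be checked mechanically. The following is only a sketch: the node type exprnode, the helper mknode(), the buffer-writing traversal functions and build_fig_12_4() are names assumed here for illustration, and the tree shape is the one implied by the postfix form above (+ at the root, A*B as its left subtree, (C/D)-E as its right subtree).

```c
#include <stdlib.h>

/* Hypothetical node type: one character (operator or operand) per node */
typedef struct exprtype {
    char info;
    struct exprtype *left;
    struct exprtype *right;
} exprnode;

/* Allocate a node with the given children (NULL, NULL for a leaf) */
exprnode *mknode(char info, exprnode *left, exprnode *right)
{
    exprnode *p = (exprnode *) malloc(sizeof(exprnode));
    if (p == NULL)
        exit(1);
    p->info = info;
    p->left = left;
    p->right = right;
    return p;
}

/* Each traversal appends the visited symbols to out and returns the
   position just past the last symbol written */
char *pre_str(exprnode *t, char *out)
{
    if (t) {
        *out++ = t->info;               /* T */
        out = pre_str(t->left, out);    /* L */
        out = pre_str(t->right, out);   /* R */
    }
    return out;
}

char *in_str(exprnode *t, char *out)
{
    if (t) {
        out = in_str(t->left, out);     /* L */
        *out++ = t->info;               /* T */
        out = in_str(t->right, out);    /* R */
    }
    return out;
}

char *post_str(exprnode *t, char *out)
{
    if (t) {
        out = post_str(t->left, out);   /* L */
        out = post_str(t->right, out);  /* R */
        *out++ = t->info;               /* T */
    }
    return out;
}

/* Build the expression tree for A*B+C/D-E as described in the lead-in */
exprnode *build_fig_12_4(void)
{
    exprnode *ab = mknode('*', mknode('A', NULL, NULL), mknode('B', NULL, NULL));
    exprnode *cd = mknode('/', mknode('C', NULL, NULL), mknode('D', NULL, NULL));
    exprnode *rhs = mknode('-', cd, mknode('E', NULL, NULL));
    return mknode('+', ab, rhs);
}
```

Calling each function with a character buffer of at least ten bytes and storing '\0' at the returned position yields the strings +*AB-/CDE, A*B+C/D-E and AB*CD/E-+ respectively.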
At this point we should be careful to notice that different binary trees may have the
same traversal sequence. For example, the inorder traversals of the trees in Fig. 12.5(a) and
Fig. 12.5(b) produce the same sequence x y z w.

Fig. 12.5 Same inorder sequence of two different trees

Though the inorder traversal sequences of these two trees are the same, their preorder
traversals, y x w z and w z y x, are not. In fact, provided that no value occurs in more than
one node, there is only one binary tree corresponding to a given pair of inorder and preorder
sequences. The construction of such a binary tree is also interesting. Let us construct a
binary tree for which the inorder traversal sequence is given as x y z w and the preorder
sequence as y x w z.
From the preorder sequence it is clear that the root of the binary tree contains y, so the tree
is of the form shown below, where T1 and T2 are the left and right subtrees, respectively.

From the above tree it can be said that its inorder sequence is of the following
form: (inorder sequence of T1 subtree) y (inorder sequence of T2 subtree).
Matching this form against the given inorder sequence tells us that the inorder sequence of
T1 is x and the inorder sequence of T2 is z w. Hence T1 is a single-node subtree containing x,
whereas T2 has more than one node. Removing y and x from the given preorder sequence, we
conclude that T2 has the preorder sequence w z. Thus the root of T2 contains w, with a left
subtree T3 and a right subtree T4, and its inorder traversal is of the form

(inorder sequence of T3 subtree) w (inorder sequence of T4 subtree)

which must equal z w. Hence the inorder sequence of T3 is z and the inorder sequence of T4 is
empty, that is, T4 is a NULL subtree. So the required binary tree has y at its root, x as the
left child of y, w as the right child of y, and z as the left child of w.
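The construction just described can be sketched as a recursive function. This is an illustration under stated assumptions rather than a listing from the text: the node type charnode and the function rebuild() are hypothetical names, each node holds a single character, and all symbols are assumed to be distinct.

```c
#include <stdlib.h>

/* Hypothetical node type holding one character */
typedef struct chartree {
    char info;
    struct chartree *left;
    struct chartree *right;
} charnode;

/* Rebuild the tree whose preorder sequence is pre[0..n-1] and whose
   inorder sequence is in[0..n-1]. If a root symbol cannot be found in
   the inorder sequence, that subtree is returned as NULL. */
charnode *rebuild(const char *pre, const char *in, int n)
{
    charnode *root;
    int k;

    if (n <= 0)
        return NULL;
    /* The first preorder symbol is the root; locate it in the inorder */
    for (k = 0; k < n && in[k] != pre[0]; k++)
        ;
    if (k == n)
        return NULL;            /* the two sequences disagree */
    root = (charnode *) malloc(sizeof(charnode));
    if (root == NULL)
        return NULL;
    root->info = pre[0];
    /* k symbols lie to the left of the root, n-k-1 to its right */
    root->left = rebuild(pre + 1, in, k);
    root->right = rebuild(pre + 1 + k, in + k + 1, n - k - 1);
    return root;
}
```

For example, rebuild("yxwz", "xyzw", 4) returns a tree with y at the root, x as its left child, w as its right child and z as the left child of w.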

12.4 THREADED BINARY TREE
Binary tree traversals are important operations in many applications. For an inorder traversal,
as we have seen, one has to recursively traverse the left subtree first, then visit the root, and
finally traverse the right subtree. Because the routine is recursive, an implicit stack is
maintained, which allows it to backtrack whenever necessary. A routine that implements inorder
traversal non-recursively must therefore maintain an explicit stack for backtracking. A threaded
binary tree is a simple modification of the ordinary binary-tree structure that allows inorder
traversal of a binary tree without recursion and without a stack.

In a binary tree the inorder successor of a node M is the node N that appears immediately
after M in the inorder traversal of the binary tree. In recursive algorithms this inorder
successor is found by automatic backtracking. As mentioned earlier, a non-recursive algorithm
must maintain an explicit stack to do this backtracking. A careful look at the binary tree shows
that if the right subtree of a node is non-empty then its inorder successor must be found in
this right subtree; otherwise we need to backtrack to find it. In the latter case the right
link of the node is NULL, so a threaded tree uses this link to hold a pointer to the node's
inorder successor. Such a link must not be confused with a link to a right subtree (the node
has no right subtree); it is known as a thread. To distinguish a thread from a link to a right
subtree, an extra logical field is added to each tree node, as shown pictorially in Fig. 12.6.

Fig. 12.6 Node structure of a threaded binary tree

Thus a node in threaded binary tree may be defined as follows:


typedef struct treetype{
int info;
struct treetype *left;
struct treetype *right;
int thread; /* thread=1 implies right link is a thread
               and is not pointing to right subtree */
} treenode;
typedef treenode *nodeptr;
For illustration, consider the following binary tree.

The binary tree with threads equivalent to this binary tree is shown below.

A function to implement the inorder traversal in a threaded binary tree may be written as
in example 12.4:
Example 12.4

void thread_in(nodeptr treeptr)
{
    nodeptr ptr1, ptr2;

    ptr1 = treeptr;
    do {
        ptr2 = NULL;
        /* Descend along left links as far as possible */
        while (ptr1 != NULL)
        {
            ptr2 = ptr1;
            ptr1 = ptr1->left;
        }
        if (ptr2 != NULL) /* Non Null treeptr */
        {
            printf("%d\n", ptr2->info); /* Visit node */
            ptr1 = ptr2->right;
            while (ptr1 != NULL && ptr2->thread)
            { /* Visit inorder successor through thread */
                printf("%d\n", ptr1->info);
                ptr2 = ptr1;
                ptr1 = ptr1->right;
            }
        }
    } while (ptr2 != NULL);
}
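The traversal above presupposes that the threads are already in place. One possible way to maintain them while the tree is being built is sketched below. The names thnode, th_maketree(), th_setleft(), th_setright() and th_successor() are assumptions made for this illustration and do not appear in the text; a fresh node is taken to have thread = 1 with a NULL right link.

```c
#include <stdlib.h>

/* Hypothetical threaded node, same fields as the declaration above */
typedef struct thtype {
    int info;
    struct thtype *left;
    struct thtype *right;
    int thread;             /* 1: right is a thread, 0: right subtree */
} thnode;

/* Create a single node; it has no inorder successor yet */
thnode *th_maketree(int info)
{
    thnode *p = (thnode *) malloc(sizeof(thnode));
    if (p == NULL)
        exit(1);
    p->info = info;
    p->left = NULL;
    p->right = NULL;
    p->thread = 1;
    return p;
}

/* Attach a new left child to p (p->left must be NULL).
   The inorder successor of the new node is p itself. */
void th_setleft(thnode *p, int info)
{
    thnode *q = th_maketree(info);
    q->right = p;
    q->thread = 1;
    p->left = q;
}

/* Attach a new right child to p (p must have no right subtree).
   The new node inherits p's old thread; p now has a real right link. */
void th_setright(thnode *p, int info)
{
    thnode *q = th_maketree(info);
    q->right = p->right;
    q->thread = 1;
    p->right = q;
    p->thread = 0;
}

/* Inorder successor of p, or NULL if p is the last node in inorder */
thnode *th_successor(thnode *p)
{
    thnode *q = p->right;
    if (p->thread)
        return q;           /* follow the thread */
    while (q != NULL && q->left != NULL)
        q = q->left;        /* leftmost node of the right subtree */
    return q;
}
```

Starting from the leftmost node and calling th_successor() repeatedly until it returns NULL visits the nodes in inorder without recursion or a stack.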

12.5 BINARY SEARCH TREES


A binary search tree (BST) is an important data structure: a special type of binary tree
(having specific properties) used mainly in applications where tree sorting and searching are
necessary. To illustrate the properties of a binary search tree consider the binary tree shown in
Fig. 12.7. This binary tree has nodes that correspond to records whose keys are alphabetic
names (some of the characters from the Mahabharata) given by (in alphabetical order) Arjuna,
Bhima, Bidur, Draupadi, Duryodhana, Ganga, Judhistra, Karna, Krishna, Nakul, Sahadeva,
Sakuni and Viswa.

Fig. 12.7 A Binary Search Tree

Though the binary tree in Fig. 12.7 looks a little awkward, it has the following property: for
every node, each value (name) stored in its left subtree, if that subtree is non-empty, is less
than (in lexicographic order) the value in the node, and each value stored in its right subtree,
if non-empty, is greater than the value in the node. Because of this property, if we traverse
this binary tree using inorder traversal we visit the nodes in the order Arjuna, Bhima, Bidur,
Draupadi, Duryodhana, Ganga, Judhistra, Karna, Krishna, Nakul, Sahadeva, Sakuni and Viswa,
which is nothing but the alphabetic order.

A careful look at Fig. 12.7 leads to the following observations about this binary tree.
• Each node has a distinct value (name), that is, no two nodes in this binary tree
have the same value.
• The value of every node in this binary tree is greater than (in lexicographic order)
every value in its left subtree (if it exists).
• The value of every node is less than every value in its right subtree (if it exists).
A binary tree having these properties is called a binary search tree (BST). Once the
information is organized in a BST, the search process becomes simpler. For example, to find
out whether the name Judhistra is present, one may traverse the tree in the following
manner.
Begin the search by traversing the root of the tree.
Is it Judhistra?
Response is no. (It is Nakul)
Is it less than Judhistra?
Response is no. (Means it is greater than Judhistra)
Traverse the left subtree by traversing its root.
Is it Judhistra?
Response is no. (It is Ganga)
Is it less than Judhistra?
Response is yes.
Traverse the right subtree by traversing its root.
Is it Judhistra?
Response is no. (It is Karna)
Is it less than Judhistra?
Response is no. (Means it is greater than Judhistra)
Traverse the left subtree by traversing its root.
Is it Judhistra?
Response is yes.
If instead we search for the name Jayadhrata in this BST, the response to the last
comparison will be 'no', and as the name Jayadhrata is less than Judhistra in lexicographic
order, the left subtree of Judhistra is to be searched. Since that left subtree is NULL, we
conclude that Jayadhrata is not present in the BST. Note that detecting the NULL subtree
costs one more test than the successful search above. A BST may be implemented as follows:
typedef struct treetype{
char info[15]; /* Node information type */
struct treetype *left;
struct treetype *right;
} treenode;
typedef treenode *bst_ptr;
bst_ptr rootptr; /* Pointer to the root of the BST */

A function to search for a key in a binary search tree may now be implemented as in
example 12.5. The function bst_search() searches for a node with a specified
key in a BST. It accepts two arguments:
(i) the pointer rootptr to the root of the BST, and
(ii) an item giving the key (node information) to search for.
If the search is successful, the function returns a pointer to the node containing the
specified item. If not successful, it returns a NULL pointer. The function that follows is
recursive. Its main drawback is that for a tall (badly unbalanced) tree it may require a
large amount of stack space.
Example 12.5

/* strcmp() is declared in <string.h> */
bst_ptr bst_search(bst_ptr rootptr, char item[])
{
    int c; /* Result of the key comparison */

    if (rootptr == NULL) /* Empty (sub)tree: item is not present */
        return NULL;
    if ((c = strcmp(item, rootptr->info)) < 0)
        /* Search left subtree and pass its result back */
        return bst_search(rootptr->left, item);
    else if (c > 0)
        /* Search right subtree and pass its result back */
        return bst_search(rootptr->right, item);
    else
        return rootptr; /* Item found */
}
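Since the recursive call is the last action on every path, the search can also be written as a loop, which avoids the stack-space drawback mentioned above. The function name bst_search_iter() is an assumption made for this sketch; the node type repeats the BST declaration given earlier so that the fragment is self-contained.

```c
#include <string.h>

typedef struct treetype {
    char info[15];              /* Node information type */
    struct treetype *left;
    struct treetype *right;
} treenode;
typedef treenode *bst_ptr;

/* Non-recursive search: returns the node holding item, or NULL */
bst_ptr bst_search_iter(bst_ptr rootptr, const char *item)
{
    bst_ptr curptr = rootptr;   /* Current search pointer */
    int c;

    /* Move down one level per comparison until a match or a NULL link */
    while (curptr != NULL && (c = strcmp(item, curptr->info)) != 0)
        curptr = (c < 0) ? curptr->left : curptr->right;
    return curptr;
}
```

The loop performs the same comparisons as the recursive version but uses a constant amount of extra storage regardless of the height of the tree.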

12.5.1 Building a Binary Search Tree


Let us now concentrate on the process of creating a BST from scratch. For illustration, we
take as the data set the names used earlier, stored in the initialized array of pointers
given below.
char *a[] = {
    "Arjuna",
    "Bhima",
    "Bidur",
    "Draupadi",
    "Duryodhana",
    "Ganga",
    "Judhistra",
    "Karna",
    "Krishna",
    "Nakul",
    "Sahadeva",
    "Sakuni",
    "Viswa"
};
We write a function build_bst() to build a BST from this array (data set). The function
should receive the following arguments: a pointer to the pointer to the root node of the BST
(the root pointer is possibly NULL at the beginning); a pointer to the array containing the
data set; and finally the array size, that is, the number of elements in the data set.
The function build_bst() may be implemented as follows.

/* insert_bst() is listed in example 12.6 and must be declared before this point */
void build_bst(bst_ptr *rootptr, char *data[], int size)
{
    int index;

    for (index = 0; index < size; index++)
        insert_bst(rootptr, data[index]);
}
The above function simply loops through the elements, inserting each one into the BST by
calling the function insert_bst(). As insert_bst() inserts an element into a BST, it must
receive the following arguments: a pointer to the pointer to the current subtree node (the
root of the current subtree) under which the element is to be inserted; and the element to be
inserted. In our case, as the element is a string of characters (a name), this argument is a
pointer to the string.
The function insert_bst(bst_ptr *rootptr, char *element) first checks whether
*rootptr is NULL, that is, whether the (sub)tree is empty. If so, it inserts the element as
the root and sets the new value of *rootptr accordingly. If the tree is not empty, it compares
the element with the root of the current subtree and moves to the left or right subtree
depending on whether the element is smaller or larger than that root.
If the element matches the root, that is, there is already a node with the same element, it
simply returns. Our implementation is recursive, though for a tall tree
it may require a large stack. The function insert_bst() is listed in example 12.6.
Example 12.6
void insert_bst(bst_ptr *rootptr, char *element)
{ /* A recursive version */
    bst_ptr newptr;
    int c;

    if (*rootptr == NULL)
    {
        newptr = (bst_ptr) malloc(sizeof(treenode));
        if (newptr == NULL)
            return; /* Out of memory: tree left unchanged */
        newptr->left = NULL;
        newptr->right = NULL;
        strcpy(newptr->info, element); /* element must fit in info[] */
        *rootptr = newptr;
    }
    else
    {
        if ((c = strcmp(element, (*rootptr)->info)) == 0)
            return; /* Element already present in the BST */
        else if (c < 0) /* Insert element to the left subtree */
            insert_bst(&(*rootptr)->left, element);
        else /* Insert element to the right subtree */
            insert_bst(&(*rootptr)->right, element);
    }
}
With respect to time, an iterative counterpart of this function is more advantageous, since it
avoids the overhead of the recursive calls. An iterative function insert_bst_iter() that
searches a binary search tree and inserts a new element into the tree if the search is
unsuccessful may be implemented as shown in example 12.7. This time let the arguments to the
function be the following.
(i) A pointer to the root of the BST, instead of a pointer to pointer to the root as in the case of
the recursive version given earlier.
(ii) The element to be inserted.
The function returns a pointer to the inserted node if the insertion takes place; otherwise it
returns the pointer to the node where the element is already present. Because the root pointer
is passed by value, a caller inserting into an empty BST must take the returned pointer as the
new root. It may be written as follows:
Example 12.7

bst_ptr insert_bst_iter(bst_ptr rootptr, char *element)
{
    bst_ptr ptr1, ptr2, newptr;
    int c = 0;

    ptr1 = rootptr;
    ptr2 = NULL; /* ptr2 trails ptr1 as its parent */
    while (ptr1 != NULL)
    {
        if ((c = strcmp(element, ptr1->info)) == 0)
            /* Element already present in the BST */
            return ptr1;
        ptr2 = ptr1;
        ptr1 = (c < 0) ? ptr1->left : ptr1->right;
    }
    newptr = (bst_ptr) malloc(sizeof(treenode));
    if (newptr == NULL)
        return NULL; /* Out of memory */
    newptr->left = newptr->right = NULL;
    strcpy(newptr->info, element);
    /* If ptr2 is NULL the BST was empty and newptr is its root; the
       caller must record the returned pointer as the new root.
       Otherwise c still holds the comparison with the parent ptr2. */
    if (ptr2 != NULL)
    {
        if (c < 0)
            ptr2->left = newptr;
        else
            ptr2->right = newptr;
    }
    return newptr;
}
It may easily be noted that after insertion of a node into a BST using the above routine the
resulting tree is still a BST.
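One way to test this claim on a concrete tree is a function that checks the ordering property directly. The sketch below is not from the text: is_bst() is a hypothetical name, and a NULL bound is taken to mean "no bound on that side". Note that it compares each node against bounds inherited from all of its ancestors; comparing a node only with its two children would not be enough.

```c
#include <string.h>

typedef struct treetype {
    char info[15];              /* Node information type */
    struct treetype *left;
    struct treetype *right;
} treenode;

/* Returns 1 if every key in the subtree rooted at t lies strictly
   between lo and hi and the subtree is ordered as a BST, else 0.
   A NULL bound means that side is unbounded. */
int is_bst(const treenode *t, const char *lo, const char *hi)
{
    if (t == NULL)
        return 1;               /* an empty tree is a BST */
    if (lo != NULL && strcmp(t->info, lo) <= 0)
        return 0;
    if (hi != NULL && strcmp(t->info, hi) >= 0)
        return 0;
    /* Keys on the left must stay below t, keys on the right above t */
    return is_bst(t->left, lo, t->info) && is_bst(t->right, t->info, hi);
}
```

A typical call is is_bst(rootptr, NULL, NULL), made after a series of insertions.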

12.5.2 Deleting a Node from a Binary Search Tree


Deletion of a node from a BST is more complex than insertion. We need to consider three cases
to delete a node from a BST. These are as follows:
(i) The node to delete is a leaf.
(ii) The node to delete has only one non-empty child (either left or right).
(iii) The node to delete has two non-empty children.
The first case is very simple. Since the node to delete has no child, it may simply be
deleted by setting the appropriate (left or right) pointer in its parent node to NULL, and freeing
the node itself.
For example, to delete the leaf node containing 111 from the BST in Fig. 12.8 we simply
set the right pointer in its parent containing 99 to NULL and free the node 111. The BST after
deletion looks like in Fig. 12.9.
The second case is also very simple. For example, if we want to delete the node containing
35 (having one non-empty child) from the BST in Fig. 12.9, we need to set the right pointer
in the node containing 25 (that is, the appropriate pointer of the parent of the node to
delete) to the node containing 39 (that is, the only child of the node to delete). After deleting the
node with 35 the BST takes the form shown in Fig. 12.10.
The two cases discussed above can be combined into one function that handles a node
having at most one subtree. Such a function requires (i) the pointer to the node to delete
(say delptr), and (ii) the pointer to the parent of the node to delete (say pptr).

Fig. 12.9 BST configuration after deletion of 111 from Fig. 12.8

So, to delete the node holding a given information part from a BST, we must first search the
BST to obtain the above two pointers; only then can we delete the node. Our algorithm
uses the following variables, whose purposes are listed below.
delptr: pointer to the node to delete.
pptr: pointer to the parent of the node to delete.
auxptr: an auxiliary pointer that will be set to point to the node by which the node delptr
will be replaced.
The function in example 12.8 deletes the node whose information part is given by val (an
integer value) from the BST whose root is pointed to by the argument tree; here, as in the
figures, the information part of each node is taken to be an int. If no such node is found,
nothing is deleted and the function returns leaving the BST unaltered. If the node is found,
the function deletes it and returns the pointer to the root of the modified tree, which is
also a BST.
Example 12.8
bst_ptr bst_delete(bst_ptr tree, int val)
/* tree is the pointer to root of the BST */
/* val is the information part of the node to delete */
{
    bst_ptr delptr, pptr;
    bst_ptr auxptr; /* An auxiliary pointer */

    /* Initialize */
    delptr = tree;
    pptr = NULL;
    /* Find node with value val(if any). If found set delptr to
       point to the node and pptr to point to its parent(if exists) */
    while (delptr != NULL && delptr->info != val)
    {
        pptr = delptr;
        delptr = (val < delptr->info) ? delptr->left : delptr->right;
    }
    if (delptr == NULL) /* Val not found */
        return tree;
    /* We assume delptr has at the most one child */
    /* Set auxptr to point to the node by which the node pointed by
       delptr will be replaced */
    auxptr = (delptr->left == NULL) ? delptr->right : delptr->left;
    if (pptr == NULL) /* Node to be deleted is the root itself */
        tree = auxptr;
    else if (delptr == pptr->left)
        pptr->left = auxptr;
    else
        pptr->right = auxptr;
    free(delptr);
    return tree;
}
The function bst_delete() given above considers only the first two cases of deleting a
node from a BST. The last case, where the node to be deleted has two non-empty children, is
yet to be considered. This case can be reduced to one of the first two cases
by replacing the node to be deleted by its inorder successor (or predecessor) and
deleting this inorder successor (or predecessor) from the BST. The inorder successor (or
predecessor) of the value stored in a given node is the successor (or predecessor) of the node
in the inorder traversal of the tree. For illustration, consider the tree in Fig. 12.10 where we
want to delete the node in which the stored value is 72. This node has two non-empty
children. Now we need to locate the inorder successor of this node. A careful look at the
inorder traversal process shows that we can reach the inorder successor of such a node by
starting at its right child and then descending through left children as far as we can. In our
case this is the node with value 77, pointed to by isucc in Fig. 12.10. To delete the node with
value 72, we first replace the content of this node by the content of its inorder successor
(that is, the node with value 77). Then we need only delete this inorder successor. The
successor node always has an empty left subtree, so this deletion falls under one of the first
two cases. After replacing the value 72 by the value of its inorder successor node the tree
will look as in Fig. 12.11.
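The step "start at the right child and descend through left children" can be isolated in a small helper. The following sketch uses hypothetical names (intnode, bst_isucc()) and an integer information part as in the figures; it also reports the parent of the successor, which the deletion needs in order to unlink that node.

```c
#include <stddef.h>

/* Hypothetical BST node with an integer information part */
typedef struct inttree {
    int info;
    struct inttree *left;
    struct inttree *right;
} intnode;

/* Return the inorder successor of node within its right subtree, or
   NULL if the right subtree is empty. If parent is not NULL, *parent
   receives the parent of the returned node (node itself when the
   successor is node's right child or does not exist). */
intnode *bst_isucc(intnode *node, intnode **parent)
{
    intnode *p = node;
    intnode *s = node->right;

    while (s != NULL && s->left != NULL)
    {
        p = s;              /* remember the parent */
        s = s->left;        /* keep descending to the left */
    }
    if (parent != NULL)
        *parent = p;
    return s;
}
```

Because the descent stops only when the left link is NULL, the node returned always has an empty left subtree, which is why deleting it falls under one of the first two cases.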

Fig. 12.11 BST configuration after replacement by inorder successor

Finally, after deleting this inorder successor the tree takes the form depicted in Fig. 12.12.

Fig. 12.12 BST configuration after deleting inorder successor

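Combining this case with the function bst_delete() we wrote earlier, the function is
finally written as in example 12.9. Instead of copying the content of the inorder successor,
this version relinks the successor node itself into the place of the deleted node.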
Example 12.9

bst_ptr bst_delete(bst_ptr tree, int val)
/* Tree is the pointer to the root of the BST */
/* Val is the information part of the node to delete */
{
    bst_ptr delptr, pptr, auxptr;
    bst_ptr pauxptr, lptr;

    delptr = tree;
    pptr = NULL;
    /* Find node with value val(if any). If found, set delptr to
       point to the node and pptr to point to its parent node */
    while (delptr != NULL && delptr->info != val)
    {
        pptr = delptr;
        delptr = (val < delptr->info) ? delptr->left : delptr->right;
    }
    if (delptr == NULL) /* Val not found */
        return tree; /* Tree unchanged */
    /* Set auxptr to point to the node by which the node delptr
       will be replaced */
    if (delptr->left == NULL)
        auxptr = delptr->right;
    else if (delptr->right == NULL)
        auxptr = delptr->left;
    else { /* delptr has two nonempty children */
        /* Set auxptr to the inorder successor of delptr and
           pauxptr to the parent of auxptr */
        pauxptr = delptr;
        auxptr = delptr->right;
        lptr = auxptr->left;
        while (lptr != NULL)
        {
            pauxptr = auxptr;
            auxptr = lptr;
            lptr = auxptr->left;
        }
        /* auxptr is now pointing to inorder successor of delptr */
        if (pauxptr != delptr)
        { /* delptr is not the parent of auxptr, and so
             pauxptr->left equals auxptr */
            pauxptr->left = auxptr->right;
            auxptr->right = delptr->right;
        }
        auxptr->left = delptr->left;
    }
    if (pptr == NULL)
        tree = auxptr;
    else if (pptr->left == delptr)
        pptr->left = auxptr;
    else
        pptr->right = auxptr;
    free(delptr);
    return tree;
}

In the earlier sections we have discussed binary trees, threaded binary trees, and binary search
trees. From the above discussions it is easy to understand that for a BST the search time is
O(log n) when the tree is well balanced, but in the worst case (where the BST is degenerate)
the search time is O(n). This suggests that keeping the tree balanced reduces the search time
for a node. The performance of an algorithm using trees mostly depends on how quickly we can
search for a node in the tree, and so the reduction in search time is of great importance.
An AVL tree is basically a BST in which the heights of the two subtrees of each node
never differ by more than one. That is why it is also called a height balanced tree. AVL
trees are named after the two Russian mathematicians, G M Adelson-Velskii and E M Landis,
who introduced them in 1962.
It is easy to see that the height of an ordinary binary tree that has degenerated to one
side can be $O(n)$, where n is the number of nodes in the tree. Hence the search time for a
node in such a tree is also $O(n)$, and the performance degrades as n increases. A binary tree
of height h holds at most $n = 2^h - 1$ nodes, so $h \ge \log_2(n+1)$ for any binary tree. The
balance condition keeps the height of an AVL tree within a constant factor of this bound
(roughly $1.44 \log_2 n$ in the worst case), so $h = O(\log_2 n)$. Hence the time to search for a
node along the full height of an AVL tree having n nodes, from the root to a leaf, is
$O(\log_2 n)$, which is significantly less than $O(n)$.
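The logarithmic height can be made concrete by counting the fewest nodes a height balanced tree of a given height can have: the sparsest such tree of height h consists of a root, a sparsest subtree of height h-1 and a sparsest subtree of height h-2. The sketch below tabulates this count; the function names avl_min_nodes() and max_nodes() are assumptions of this illustration, and a single node is taken to have height 1, in line with the 2^h - 1 formula for the maximum.

```c
/* Minimum number of nodes in a height balanced tree of height h
   (empty tree: height 0, single node: height 1), from the recurrence
   N(h) = N(h-1) + N(h-2) + 1 with N(0) = 0 and N(1) = 1 */
long avl_min_nodes(int h)
{
    long a = 0, b = 1, t;   /* a = N(i-2), b = N(i-1) */
    int i;

    if (h <= 0)
        return 0;
    for (i = 2; i <= h; i++)
    {
        t = a + b + 1;
        a = b;
        b = t;
    }
    return b;
}

/* Maximum number of nodes in any binary tree of height h: 2^h - 1 */
long max_nodes(int h)
{
    return (1L << h) - 1;
}
```

The minimum grows exponentially with h (1, 2, 4, 7, 12, 20, ... for h = 1, 2, 3, ...), so the height of an AVL tree with n nodes can grow only logarithmically with n.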
An AVL (height balanced) tree must keep information about the balance between the
heights of the left and right subtrees of each node in the tree. The balance of a node is defined as
the difference between the height of its left subtree and the height of its right subtree, that is,
balance = height of left subtree - height of right subtree
The operations on a BST and an AVL tree are very similar, except for insertion of a node
into an AVL tree and deletion of a node from an AVL tree: after each insertion or deletion
the height balance must be restored if it has been disturbed. The above discussion suggests
the implementation of an AVL tree as follows.
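typedef int data_type;
typedef struct avl {
    data_type info; /* Stored by value; example 12.10 compares and assigns it directly */
    struct avl *left;
    struct avl *right;
    int balance; /* Height of left subtree - height of right subtree */
} avlnode;
typedef avlnode *avlptr;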
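While developing insertion and deletion routines it helps to be able to verify the stored balance values. The sketch below is illustrative only: avl_height() and avl_balanced() are hypothetical names, the information part is assumed to be stored by value, and the ordering of the keys is not checked. Heights are recomputed at every node, so the check is meant for testing rather than for regular use.

```c
#include <stddef.h>

typedef int data_type;
typedef struct avl {
    data_type info;
    struct avl *left;
    struct avl *right;
    int balance;        /* height(left) - height(right) */
} avlnode;

/* Height of a subtree: 0 for an empty tree, 1 for a single node */
int avl_height(const avlnode *t)
{
    int hl, hr;

    if (t == NULL)
        return 0;
    hl = avl_height(t->left);
    hr = avl_height(t->right);
    return 1 + (hl > hr ? hl : hr);
}

/* Returns 1 if, at every node, the stored balance equals
   height(left) - height(right) and is -1, 0 or 1; otherwise 0 */
int avl_balanced(const avlnode *t)
{
    int b;

    if (t == NULL)
        return 1;
    b = avl_height(t->left) - avl_height(t->right);
    if (b < -1 || b > 1 || b != t->balance)
        return 0;
    return avl_balanced(t->left) && avl_balanced(t->right);
}
```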

12.6.1 Inserting a Node into an AVL Tree


Obviously, inserting a node into an empty AVL tree is trivial. When the AVL tree is not
empty we need to determine the search path, that is, the path from the root node to the place
where the insertion should occur. If the node to be inserted is already present in the
AVL tree, the insertion is skipped. Otherwise, the node is inserted into the AVL tree just as we
did for a binary search tree in Section 12.5. After the insertion the tree may no longer be
height balanced. If it has become unbalanced, it is rebalanced as part of the insertion to
restore the height balance of the AVL tree.

Fig. 12.13 Different situations of unbalancing in an AVL tree

At this point we must first determine the situations in which the insertion of a node
makes an AVL tree unbalanced. To visualize this let us consider the tree in Fig. 12.13. The figure
indicates different insertion points in an AVL tree by using dashed lines. Each node in the tree
is labelled with a letter and holds its balance value before any insertion. At each insertion
point a square box is connected with a dashed line. The value H within a square box indicates
that if an insertion takes place at the corresponding point the AVL tree will remain height
balanced, and the value U indicates that the insertion will unbalance the tree. A careful look at
Fig. 12.13 shows that the tree becomes unbalanced only in the following two situations.


• When the newly inserted node is a left descendant of a node that previously had a
balance of 1.
• When the newly inserted node is a right descendant of a node that previously had a
balance of -1.
In Fig. 12.13 the unbalance of the tree for insertion of a node at nodes n, o, and m is due to
the first case, and the same at nodes p, q, and l is due to the second case.
The youngest ancestor that becomes unbalanced at each insertion is listed below.
• d becomes unbalanced for insertion at n and o.
• e becomes unbalanced for insertion at p and q.
• f becomes unbalanced for insertion at l.
• g becomes unbalanced for insertion at m.
So from now on we concentrate on the subtree rooted at the youngest ancestor that
becomes unbalanced as a result of the insertion. Firstly, consider the case where the balance
of this youngest ancestor was 1. Let the youngest ancestor that has become unbalanced be P and
the root of its left subtree be Q. Clearly Q is non-null since P had a balance of 1. It should also
be noted that since P is the youngest ancestor of the new node to become unbalanced, the node Q
must have had balance 0 before insertion, that is, both the left and right subtrees of Q are of
the same height n (say). In that case the height of the right subtree of P must also be n,
because P had a balance of 1. These subtrees of height n may also be NULL, that is, the value
of n may be 0. A careful analysis shows that P becomes unbalanced in only two situations,
illustrated in Fig. 12.14. In Fig. 12.14(a) the new node is inserted into the left subtree of Q,
which changes the balance of Q to 1 and the balance of P to 2. In the other case, shown in
Fig. 12.14(b), the new node is inserted into the right subtree of Q, which changes the balance
of Q to -1 and the balance of P to 2.

Fig. 12.14 Unbalancing situations: (a) new node inserted into the left subtree of Q; (b) new node inserted into the right subtree of Q



These unbalanced trees must now be rebalanced to maintain the height balance property.
We must also ensure that the inorder traversal of the rebalanced tree matches that of the
unbalanced tree, so that the tree remains a BST. Before rebalancing, let us digress briefly
to look at tree rotations; we will return to the main discussion shortly.
Consider the tree rooted at a shown in Fig. 12.15(a). If we apply a left rotation to this
tree at a we get the tree shown in Fig. 12.15(b).

Fig. 12.15 Left and right rotation of rooted trees

Notice that the root of this new tree is changed to c but the inorder sequences of both the
trees are the same. An algorithm to implement this left rotation of a tree rooted at x (pointer
to the root of the subtree) may simply be written as follows; after these steps y points to the
root of the rotated subtree.
y = x->right;
save = y->left;
y->left = x;
x->right = save;
For illustration, the right rotation of the rooted tree in Fig. 12.15(a) is shown in Fig. 12.15(c).
Here the root of the rotated tree is changed to b. But again the inorder sequence of this rotated
tree is the same as that of the tree in Fig. 12.15(a). The algorithm to implement the right
rotation of a tree rooted at x may be written as follows.

y = x->left;
save = y->right;
y->right = x;
x->left = save;
We have seen that both left and right rotations preserve the inorder sequence. So to
rebalance an unbalanced tree we may apply any number of left or right rotations to it, and
the inorder sequence will still be preserved.
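The two four-line fragments can be wrapped into functions that return the new root of the rotated subtree. This is a sketch with assumed names (rotnode, rotate_left(), rotate_right()), not a listing from the text; it leaves the balance fields out entirely. The caller is responsible for storing the returned pointer in the parent's link, or in the root pointer of the tree, which the fragments above leave implicit.

```c
#include <stddef.h>

/* Hypothetical node type; balance information is omitted here */
typedef struct rottype {
    int info;
    struct rottype *left;
    struct rottype *right;
} rotnode;

/* Left rotation about x (x->right must not be NULL).
   Returns the new root of the subtree. */
rotnode *rotate_left(rotnode *x)
{
    rotnode *y = x->right;
    rotnode *save = y->left;

    y->left = x;
    x->right = save;    /* the subtree between x and y changes parent */
    return y;
}

/* Right rotation about x (x->left must not be NULL).
   Returns the new root of the subtree. */
rotnode *rotate_right(rotnode *x)
{
    rotnode *y = x->left;
    rotnode *save = y->right;

    y->right = x;
    x->left = save;
    return y;
}
```

A left rotation followed by a right rotation about the new root restores the original subtree, which is one quick way to convince oneself that neither operation disturbs the inorder sequence.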
Now let us return to the main discussion. To maintain the height balance
property of the tree in Fig. 12.14(a) we simply apply a right rotation on the subtree rooted at P.
This right rotation will yield a new subtree rooted at Q shown in Fig. 12.16. The figure also
shows the balance value of the nodes after rotation.

Fig. 12.16 Tree after right rotation of the tree in Fig. 12.14(a)

The unbalanced subtree in Fig. 12.14(b) requires a little more attention. Here the new node is
inserted into the right subtree of Q. Let R be the root of the right subtree of Q. Apparently, there
are three different situations for insertion. But they are analogous to each other. These three
situations are
• R itself may be the newly inserted node (arises when n=0).
• New node is inserted into the left subtree of R.
• New node is inserted into the right subtree of R.
We discuss the situation in Fig. 12.14(b) that illustrates the case where the new node is
inserted into the left subtree of R. At first we give a left rotation to the subtree rooted at Q. After
this rotation the subtree rooted at P will look like in Fig. 12.17(a).
Note that after this rotation the inorder sequences of the subtrees in Fig. 12.14(b) and
Fig. 12.17(a) are the same. To rebalance, we finally give a right rotation to the subtree rooted
at P shown in Fig. 12.17(a). After this right rotation the final subtree will look like
Fig. 12.17(b). Here also it should be noted that the inorder sequence of this subtree and that
of the subtree in Fig. 12.14(b) are the same. The other case is symmetrical to the above and is
left to the reader. A C function avl_insert() is presented below. The function searches for
an element in an AVL tree and inserts it if it is not found. The implementation of a node in
an AVL tree was discussed earlier in this section. The function shown in example 12.10 returns
the pointer to the root of the AVL tree after node insertion.
■ TREESM 279

Example 12.10
avlptr avl_insert(avlptr tree, data_type item)
{
    int imbal_dir;     /* Imbalance direction */
    avlptr p = tree,
           y = p,      /* y points to the youngest unbalanced ancestor */
           cy = NULL,  /* cy points to the child of y */
           py = NULL,  /* py points to the parent of y */
           pp = NULL,  /* pp points to the parent of p */
           q = NULL;   /* q points to the new node */

    /* Search into the AVL tree (also a BST) */
    while (p != NULL)
    {
        if (p->info == item)
            return(tree);          /* Item already present; nothing to insert */
        q = (item < p->info) ? p->left : p->right;
        if (q != NULL && q->balance != 0)
        {
            py = p;
            y = q;
        }
        pp = p;
        p = q;
    }
    /* Insert new record into the BST */
    q = (avlptr)malloc(sizeof(avlnode));
    q->info = item;
    q->left = q->right = NULL;
    q->balance = 0;
    if (pp == NULL)
        return(q);                 /* Tree was empty; q is the new root */
    if (item < pp->info)
        pp->left = q;
    else
        pp->right = q;
    /* At this point the balance between q and y must be adjusted
       because of insertion */
    p = (item < y->info) ? y->left : y->right;
    cy = p;
    while (p != q)
        if (item < p->info)
        {
            p->balance = 1;
            p = p->left;
        }
        else {
            p->balance = -1;
            p = p->right;
        }
    /* Check if the tree has become unbalanced */
    imbal_dir = (item < y->info) ? 1 : -1;
    if (y->balance == 0)
    {   /* Another level is added to the tree and since y is the
           youngest ancestor, the tree remains balanced */
        y->balance = imbal_dir;
        return(tree);
    }
    if (y->balance != imbal_dir)
    {   /* Node is inserted to the opposite direction of imbalance,
           so the tree remains balanced */
        y->balance = 0;
        return(tree);
    }
    /* Now it has been found that the tree has become unbalanced.
       Rebalance the tree by applying required rotations and adjusting
       the balance of involved nodes */
    /* Note that, q is pointing to the inserted node,
       y is pointing to its youngest unbalanced ancestor,
       py is pointing to the parent of y,
       and cy is pointing to the child of y in the direction of
       imbalance */
    if (cy->balance == imbal_dir)
    {   /* cy and y are unbalanced in same direction */
        p = cy;
        if (imbal_dir == 1)
            rotate_right(y);
        else
            rotate_left(y);
        y->balance = 0;
        cy->balance = 0;
    }
    else {   /* cy and y are unbalanced in opposite directions */
        if (imbal_dir == 1)
        {
            p = cy->right;
            rotate_left(cy);
            y->left = p;
            rotate_right(y);
        }
        else {
            p = cy->left;
            rotate_right(cy);
            y->right = p;
            rotate_left(y);
        }
        /* Now adjust balance of nodes involved */
        if (p->balance == 0)
        {   /* p is the node inserted */
            y->balance = 0;
            cy->balance = 0;
        }
        else if (p->balance == imbal_dir)
        {
            y->balance = -imbal_dir;
            cy->balance = 0;
        }
        else {
            y->balance = 0;
            cy->balance = imbal_dir;
        }
        p->balance = 0;
    }
    /* Attach the rebalanced subtree (now rooted at p) to the parent of y */
    if (py == NULL)
        tree = p;
    else if (y == py->right)
        py->right = p;
    else
        py->left = p;
    return(tree);
}

12.6.2 Deleting a Node from an AVL Tree


In case the AVL tree consists of a single node the deletion is simple. Otherwise we first delete a
node containing x (say) from an AVL tree (having more than one node) just as we did in case of
a BST. Then we rebalance the tree to maintain the height balance property of the AVL tree.
Consider that the deletion has occurred at the node pointed to by p. Then
the height of the subtree rooted at p has reduced by one. Let pp be the parent of the
node p. Check the balance of pp. We may now rebalance the subtree rooted at pp as we did in
insertion, if required. If we find that the height of the subtree rooted at pp is not reduced then
deletion from AVL tree is over. Otherwise, we move one step up towards the root and do the
same. In fact, in the worst case we may have to look at the root. The complexity of deletion is
more than that of insertion (though the same concept of rotations is used) because in case of
deletion we may need as much rebalancing as the path length from the deleted node to the root.
The C implementation of this deletion process is left as an exercise to the reader.
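When writing the deletion routine (or testing insertion), it helps to have a checker that recomputes subtree heights and compares them with the stored balance values. The following is a minimal sketch under stated assumptions: data_type is taken to be int, the node layout mirrors the fields used by avl_insert() (info, balance, left, right), and balance is taken to mean height(left) - height(right), which is what avl_insert() implies by setting balance to 1 when the insertion goes left. The names avl_check() and avl_assert_valid() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef int data_type;              /* assumption: integer keys */
typedef struct avlnode {
    data_type info;
    int balance;                    /* height(left) - height(right) */
    struct avlnode *left, *right;
} avlnode, *avlptr;

/* Returns the height of the subtree (an empty tree has height 0), or -1
   if some node violates the height balance property or stores a balance
   value that does not match the recomputed heights. */
int avl_check(const avlnode *t)
{
    int hl, hr, diff;
    if (t == NULL)
        return 0;
    hl = avl_check(t->left);
    hr = avl_check(t->right);
    if (hl < 0 || hr < 0)
        return -1;
    diff = hl - hr;
    if (diff < -1 || diff > 1 || diff != t->balance)
        return -1;
    return 1 + (hl > hr ? hl : hr);
}

/* Debugging aid: abort if the tree is not a valid AVL tree. */
void avl_assert_valid(const avlnode *t)
{
    assert(avl_check(t) >= 0);
}
```

Calling avl_assert_valid() after every insertion or deletion during development catches a wrong rotation or a wrong balance update at the step where it happens. The check visits every node, so it is meant for testing rather than for production use.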

12.7 B-TREES

B-trees are basically a generalization of BSTs in which a node can hold more than one key.
They are balanced search trees that are very useful for external searching
and work well on secondary storage devices. Consider a tree in which a node can have a maximum of
m children. We say that it is a B-tree of order m. In a B-tree all leaf nodes are at the
same level. If a node has exactly k non-empty children, then the node contains exactly
(k-1) keys. The leaf nodes and the intermediate nodes are distinguishable. Precisely speaking, a
B-tree of order m maintains the following properties.


• Each internal node except possibly the root has at the most m non-empty children and at
least $\lceil m/2 \rceil$ non-empty children. The root may have no children or any number of
children between 2 and m.
• Each internal node may have at the most (m-1) keys and the node will have one more
child than the number of keys it has.
• Within a node all the keys are arranged in a predefined order. All keys in the subtree to
the left of a key are predecessors and the same to the right are successors of the key.
• All leaves are at the same level.
• Leaves and internal nodes are distinguishable.
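The bounds in the list above follow mechanically from the order m. A small sketch, with hypothetical helper names and the added assumption that m is at least 3, makes the arithmetic explicit; the minimum values apply to internal nodes other than the root.

```c
#include <assert.h>

/* Bounds for a B-tree of order m, where m is the maximum number of
   children of a node. The function names are illustrative only. */
int max_children(int m)
{
    assert(m >= 3);
    return m;
}

int min_children(int m)
{
    assert(m >= 3);
    return (m + 1) / 2;             /* integer form of ceil(m/2) */
}

/* A node always holds one key fewer than it has children. */
int max_keys(int m) { return max_children(m) - 1; }
int min_keys(int m) { return min_children(m) - 1; }
```

For a B-tree of order 5 these give at most 5 and at least 3 children, and therefore at most 4 and at least 2 keys per non-root node.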

12.7.1 Generation of a B-tree


For a better understanding let us take an example of how a B-tree is generated gradually by
inserting keys into it. We consider, for simplicity, the keys as integers. Assume that the follow-
ing keys are to be inserted into an initially empty B-tree of order 5 (say). The insertion is to be
done in the order of the keys given. Let the keys in order be as follows:
282 314 307 289 393 299 337 407 354 302 462 347 448 482 293 399 418 468 471 436
Obviously, the first four keys will be inserted into a single node. These keys must be
placed into a proper (sorted) order as they get inserted into the node. This is shown in
Fig. 12.18(a). At this point no more keys can be inserted to this node since we are considering
that the B-tree is of order 5 (recall that a node in a B-tree of order m can have a maximum of
(m-1) keys). So to insert the fifth key 393 the node splits into two and the middle key 307 is
moved up to a newly created node and is the new root of the B-tree. This situation is depicted in
Fig. 12.18(b). Clearly the split nodes are half full and so some more keys can be inserted to it
without any difficulty. At this point we must note that while inserting keys into a node, the keys
are stored in the node in a proper order. It can easily be seen that the next three keys can be
inserted into the tree without any problem, as shown in Fig. 12.18(c). Now the next key 354 will
split the node having four keys in Fig. 12.18(c) into two. In this case the middle key is 354 itself
and it will move up and enter into the root. The status after this insertion is depicted in Fig.
12.18(d). The next four keys 302, 462, 347, and 448 can easily be inserted into the B-tree in a
simple manner as shown in Fig. 12.18(e). Now the rightmost node having four keys splits into
two while inserting the next key 482 and the middle key 448 moves up to the root node as shown
in Fig. 12.18(f). The node with four keys in this figure splits into two while the next key 293 gets
inserted into the tree. The middle key is 293 itself and moves up to join the root node in a similar
manner. The next four keys 399, 418, 468, and 471 can now be simply inserted to the tree. The
status of the tree after inserting these five keys is depicted in Fig. 12.18(g).
The insertion process of the last key 436 needs special attention. Clearly, this will first
split the node containing 393, 399, 407, and 418 into two. After splitting, the middle key 407 will
move upwards to get entry into the root node, which is already holding the keys 293, 307, 354,
and 448 and is full. So, in turn, this root node splits into two and its middle key 354 moves
upwards and creates a new root. This final state of the B-tree is shown in Fig. 12.18(h).

(a) A single node holding the keys 282, 289, 307, and 314

Fig. 12.18 Generation of B-tree

The basic operations that can be performed on a B-tree are the following: searching for a
key in the B-tree, inserting a key into the B-tree, deleting a key from the B-tree.
The above discussion suggests the following implementation of the B-tree node structure.
#define ATMOST 4 /* Maximum number of keys in a node (4 for a B-tree of order 5) */
#define ATLEAST 2 /* Minimum number of keys in a node = ceiling of ATMOST/2 */
typedef int key_data;
typedef struct node {
int n; /* Number of keys in the node */
key_data key[ATMOST+1];
struct node *bough[ATMOST+1];
} btreenode;
typedef btreenode *btreeptr;
286 ■ DATA STRUCTURES USING C ■

12.7.2 Searching for a Key in B-tree


Searching a B-tree is a generalization of searching in a BST. A function btree_lookup () is
presented. The input arguments to this function are as in the following.
• A key k to be searched into the B-tree.
• A pointer to the root node of the B-tree rootptr (may be a subtree of the B-tree) into
which the searching is taking place.
The function returns the pointer to the node where the key is found, or NULL if the key
is not found. If the key is found, its position in the node is given by position, the
last argument to the function, which is a pointer to an integer. Unlike BST search, the B-tree search
method must search among several keys within a node. This node searching is done in another
function, namely, node_search(). If the key is not found in the node, it must guide the search
through the proper descendant of the node, and so on. The function btree_lookup() presented
is a recursive function and uses the function node_search(). If node_search() finds the
key in the node, it returns TRUE (1); otherwise it returns FALSE (0). In case the key is not present in
the node, node_search() sets the integer pointed to by position to that index of the
bough array of the node through which we reach the right descendant so that the search can be
continued. For a B-tree of large order the function node_search() may be changed a little to
use binary search within the node instead of sequential search. The function btree_lookup()
together with node_search() is given in Example 12.11 below.
Example 12.11
int node_search(key_data k, btreeptr ptr, int *pos);   /* Defined below */

btreeptr btree_lookup(key_data k, btreeptr rootptr, int *position)
{
    if (rootptr == NULL)
        return NULL;
    else if (node_search(k, rootptr, position))
        return rootptr;
    else
        return btree_lookup(k, rootptr->bough[*position], position);
}

/* Keys are stored in key[1..n]; bough[i] leads to the keys that lie
   between key[i] and key[i+1] */
int node_search(key_data k, btreeptr ptr, int *pos)
{
    if (k < ptr->key[1])
    {
        *pos = 0;
        return 0;   /* Return FALSE */
    }
    else {   /* Sequential search within the node follows */
        *pos = ptr->n;   /* Set the last index */
        while (k < ptr->key[*pos] && *pos > 1)
            (*pos)--;
        return ((k == ptr->key[*pos]) ? 1 : 0);
    }
}
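The text notes that, for a B-tree of large order, the search within a node may use binary search instead of sequential search. A possible sketch of such a variant follows. It keeps the conventions of node_search() (keys in key[1..n], boughs in bough[0..n], the same meaning of *pos on return), but the name node_search_binary() and the value chosen for ATMOST are assumptions made for this illustration.

```c
#include <assert.h>
#include <stddef.h>

#define ATMOST 4                    /* assumption: B-tree of order 5 */
typedef int key_data;
typedef struct node {
    int n;                          /* number of keys in the node */
    key_data key[ATMOST + 1];       /* keys in key[1..n], ascending */
    struct node *bough[ATMOST + 1]; /* children in bough[0..n] */
} btreenode, *btreeptr;

/* Binary-search variant of node_search(). Returns 1 and sets *pos to the
   index of k if k is in the node; otherwise returns 0 and sets *pos to
   the index of the bough through which the search must continue. */
int node_search_binary(key_data k, const btreenode *ptr, int *pos)
{
    int lo = 1, hi;
    assert(ptr != NULL && pos != NULL);
    hi = ptr->n;
    while (lo <= hi)
    {
        int mid = lo + (hi - lo) / 2;
        if (k == ptr->key[mid])
        {
            *pos = mid;
            return 1;
        }
        if (k < ptr->key[mid])
            hi = mid - 1;
        else
            lo = mid + 1;
    }
    *pos = hi;   /* largest index whose key is smaller than k, or 0 */
    return 0;
}
```

When the loop ends without a match, every key in key[1..hi] is smaller than k and every key from key[hi+1] onwards is larger, so bough[hi] is the subtree in which k would have to lie. This is the same index the sequential version produces.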

12.7.3 Inserting a Key into B-tree


Inserting a key into the B-tree is extensively discussed earlier in this section where we consid-
ered the generation of a B-tree. We have already seen that as we insert nodes in a BST, it grows
at the leaf. Unlike BSTs, a B-tree grows at the root as we insert keys into it. This growing at the
root is mainly because of the property that all the leaves are at the same level in a B-tree. In
general, when a record with a known key is to be inserted into a B-tree, first the key is
searched for in the tree using a process like btree_lookup(). If the key is already in the
tree, the insertion is skipped. Otherwise the lookup process finds the leaf (L, say) into
which the key should be inserted. If this leaf L is not full, the record is inserted
into the leaf. But if L is full, a new leaf (L', say) is created so that the
keys in L together with the new key are distributed evenly between L and L' and the middle key is sent
upwards. That is, the leaf L is split into two leaves L and L' (considering that the
order n of the B-tree is odd) such that L contains the first $\lfloor n/2 \rfloor$ keys, L' contains the last
$\lfloor n/2 \rfloor$ keys, and the middle key (which must exist for odd n) is inserted into the parent if the
parent is not full. If the parent is full, it is split in the same way, and the process
continues upwards, possibly as far as the root.
We assume that the key to be inserted, k, is not present in the B-tree. Then the function
btree_insert() requires the following two arguments: k, the key to be inserted, and rootptr,
the pointer to the root node of the B-tree. The function returns the pointer to the root node of
the B-tree after insertion is over.
The function btree_insert() is a recursive one and may be written as shown in
Example 12.12.
Example 12.12
int movedown(key_data k, btreeptr ptr, key_data *pm, btreeptr *pmr);

btreeptr btree_insert(key_data k, btreeptr rootptr)
{
    int moveup;       /* 1 or 0 depending on whether the height of the
                         B-tree has increased or not */
    key_data newk;    /* Key to be reinserted as new root node */
    btreeptr newkr;   /* Pointer to right subtree of newk */
    btreeptr ptr;     /* An auxiliary pointer */

    moveup = movedown(k, rootptr, &newk, &newkr);
    if (moveup)
    {
        ptr = (btreeptr)malloc(sizeof(btreenode));
        ptr->n = 1;
        ptr->key[1] = newk;
        ptr->bough[0] = rootptr;
        ptr->bough[1] = newkr;
        return ptr;
    }
    return rootptr;
}
Note that the function btree_insert() calls the function movedown(), which
searches for the key k in the B-tree pointed to by rootptr to find its insertion point. This function
uses four arguments as given below.
• k, the key to be searched for in the B-tree for insertion.
• ptr, the pointer to the root node of the subtree in which the search takes place.
• pm, pointer to the key to be inserted into a newly created root (in case of splitting).
• pmr, pointer to the new node (the right subtree of *pm) after splitting.
The function movedown() returns TRUE when the key pointed to by pm is to be placed in a new
root node, and the height of the entire tree increases. When the height of the tree does not
increase, the function returns FALSE. In such a situation the function movedown() inserts the
key into the proper node of the B-tree, if required. movedown() is a recursive function and it
terminates when ptr, the pointer to the root node of the subtree in which the search is taking place,
is NULL. When a recursive call to movedown() returns TRUE, an attempt is made to
insert the key pointed to by pm into the current node. This is straightforward if there is room for the
key in the current node. Otherwise, the current node *ptr splits into *ptr and *pmr and the
middle key is moved upwards through the tree. The function movedown() in turn uses the
following three other functions.
• node_search(), to search for a key within a node as shown earlier.
• putkey(), which puts the key into the node *ptr when possible. Actually the
function is called only when *ptr has room for a key.
• split(), which splits the node *ptr into *ptr and *pmr.
The movedown() function may now be written as shown in Example 12.13.
Example 12.13
int movedown(key_data k, btreeptr ptr, key_data *pm, btreeptr *pmr)
{
    int b;   /* On which bough to continue the search */

    if (ptr == NULL)
    {
        *pm = k;
        *pmr = NULL;
        return TRUE;
    }
    else {
        if (!node_search(k, ptr, &b))
            if (movedown(k, ptr->bough[b], pm, pmr))
            {
                if (ptr->n < ATMOST)
                {
                    putkey(*pm, *pmr, ptr, b);
                    return FALSE;
                }
                else {
                    split(*pm, *pmr, ptr, b, pm, pmr);
                    return TRUE;
                }
            }
        return FALSE;
    }
}
The function putkey() puts a key into a node *ptr after making room at the proper index
within the node and also sets the pointer to its branch at the correct position. This may be written
as in Example 12.14.
Example 12.14
void putkey(key_data k, btreeptr pmr, btreeptr ptr, int pos)
{
    int c;

    /* Shift keys and boughs to the right of pos one place up */
    for (c = ptr->n; c > pos; --c)
    {
        ptr->key[c+1] = ptr->key[c];
        ptr->bough[c+1] = ptr->bough[c];
    }
    ptr->key[c+1] = k;
    ptr->bough[c+1] = pmr;
    ++ptr->n;
}
Finally, the split() function splits the node *ptr with key k and pointer pmr at position
pos into nodes *ptr and *qmr with middle key m. Clearly, the key k cannot be inserted directly
into a node with full capacity. So we must first determine whether k will go to the left or right
half, split the node accordingly, and then insert k at its correct place. During the process the
middle key m is stored in the left half. The function split() is shown in Example 12.15.
Example 12.15

void split(key_data k, btreeptr pmr, btreeptr ptr, int pos,
           key_data *m, btreeptr *qmr)
{
    int c, mid;

    mid = (pos <= ATLEAST) ? ATLEAST : ATLEAST+1;
    *qmr = (btreeptr)malloc(sizeof(btreenode));
    /* Move the upper keys and boughs into the new right node */
    for (c = mid+1; c <= ATMOST; ++c)
    {
        (*qmr)->key[c-mid] = ptr->key[c];
        (*qmr)->bough[c-mid] = ptr->bough[c];
    }
    (*qmr)->n = ATMOST-mid;
    ptr->n = mid;
    if (pos <= ATLEAST)
        putkey(k, pmr, ptr, pos);
    else
        putkey(k, pmr, *qmr, pos-mid);
    /* The last key of the left half is the middle key that moves up */
    *m = ptr->key[ptr->n];
    (*qmr)->bough[0] = ptr->bough[ptr->n];
    ptr->n--;
    return;
}
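One way to sanity-check the insertion routines is to walk the finished B-tree in order: the keys must come out in ascending order, and the height can be read off the leftmost path because all leaves lie at the same level. The sketch below assumes the node layout given earlier with ATMOST taken as 4; the function names btree_inorder() and btree_height() are hypothetical and not part of the book's code.

```c
#include <assert.h>
#include <stddef.h>

#define ATMOST 4                    /* assumption: B-tree of order 5 */
typedef int key_data;
typedef struct node {
    int n;                          /* number of keys in the node */
    key_data key[ATMOST + 1];       /* keys in key[1..n] */
    struct node *bough[ATMOST + 1]; /* children in bough[0..n] */
} btreenode, *btreeptr;

/* Append the keys of the subtree rooted at t to out[] in ascending
   order, starting at index count; returns the new count. */
int btree_inorder(const btreenode *t, key_data out[], int count)
{
    int i;
    if (t == NULL)
        return count;
    assert(t->n >= 1 && t->n <= ATMOST);
    count = btree_inorder(t->bough[0], out, count);
    for (i = 1; i <= t->n; i++)
    {
        out[count++] = t->key[i];
        count = btree_inorder(t->bough[i], out, count);
    }
    return count;
}

/* Number of levels, measured along the leftmost path. */
int btree_height(const btreenode *t)
{
    int h = 0;
    for (; t != NULL; t = t->bough[0])
        h++;
    return h;
}
```

For the tree described for Fig. 12.18(b), with root key 307 over the leaves holding 282, 289 and 314, 393, the walk yields 282, 289, 307, 314, 393 and the height is 2.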

12.7.4 Deleting a Key from B-tree


The deletion of a key from B-tree is even more complex than insertion of a key in it. We discuss
the deletion process through an example for a better understanding. This is shown in Fig. 12.19
in a step-by-step manner. Consider that we have a B-tree as shown in Fig. 12.18(h) which we
generated earlier. Firstly, let us delete the key 337 from it. This deletion is simple because the
leaf node containing the key 337 is having more than the minimum number of keys in it. What
we do is, simply place its successor key 347 into the place of 337 as shown using a dashed arrow
in Fig. 12.19(a) and shift the other keys in the node accordingly. Next, we delete the key 448
from the B-tree. Note that, in this case we are going to delete from a non-leaf node. We therefore
move the immediate successor of this key, which is 462, into this position and then delete 462
from the node containing it, which is a leaf, as earlier. The key movement from place to place is
also shown with dashed arrows in Fig. 12.19(a). After these deletions the tree takes the form

Fig. 12.19

as in Fig. 12.19(b). Now let us consider the case where we try to delete a key from a leaf node
which is having minimum number of keys in it. This means that a key deletion from this node
will leave its node with too few keys. Consider that we have to delete the key 436 from its node.
This situation is tackled in the following way. Notice that its successor 462 is in its parent node.
Therefore 462 is moved down to take the place of 436 and is removed (deleted) from its original
place as earlier, that is, its original place is replaced by its successor 468 and 468 is deleted from
its leaf node. The process of deletion is depicted in Fig. 12.19(b). The deletion of the key
299 needs special attention. Notice that the deletion of 299 leaves its leaf node with too few keys.
Moreover, neither of its siblings can spare a key for this leaf. Therefore, the node is combined
with one of its siblings together with the middle key from its parent node as shown with the
loop in Fig. 12.19(c). After combining, the configuration of the B-tree is shown in Fig. 12.19(d). A
careful look at this figure tells us that the effect of combining will leave the parent node with too
few keys, and again its sibling cannot spare a key for it. Therefore, the top three nodes are again
combined to a single node as shown in Fig. 12.19(d) to yield the final B-tree after deletion which
is shown in Fig. 12.19(e).
All the cases of deletion of keys from a B-tree are discussed above in detail. We leave the
development of a detailed program for deleting a key from B-tree to the reader.

EXERCISES
1. For each node in the tree in Fig. 12.7
(i) Name the parent node
(ii) List the children
(iii) List the siblings
(iv) Find the depth and height of the tree
2. Define binary tree and binary search tree and distinguish between them.
3. Give the visiting sequence of vertices of each of the following binary trees under
(i) Preorder
(ii) Inorder
(iii) Postorder traversal

5. (a) Show the tree after inserting the following integers in sequence into an initially empty
binary search tree (BST).
18 6 20 25 32 8 12 16
(b) Show the BST after deleting the root from your BST.
6. Construct a binary tree, given the preorder and inorder sequences as below.
preorder: abceifjdghkl
inorder: eicfjbgdkhla
7. Show that in a binary tree with n nodes, there are (n+1) NULL pointers representing chil-
dren.
8. Show that the number of leaves in a non-empty binary tree is equal to the number of full
nodes plus one, where a full node is a node with two non-empty children.
9. How does a binary search tree (BST) differ from an AVL tree, and in what situation is a BST
inefficient as compared to an AVL tree with respect to searching time?
10. What is a threaded binary tree and what is its main advantage over an ordinary binary tree?
11. Write a function to generate the AVL tree of height h with minimum number of nodes in it.
12. Write a routine to perform deletion from a B-tree.
13

GRAPHS

Graphs are the most common 'abstract' structures encountered in computer science. Any sys-
tem that consists of discrete states or sites (called nodes or vertices) and connections between
them can be modeled by a graph. In computer science, mathematics, engineering, and many
other disciplines we often need to model a symmetric relationship between objects. The objects
are represented by nodes and the connections between nodes of a graph are called edges (in
case the connections are between unordered pairs of nodes), or directed edges (in case the
connections are between ordered pairs of nodes). Connections may also carry additional information
as labels or weights related to the interpretation of the graph. Consequently, there are many
types of graphs and many basic notions that capture aspects of the structure of graphs. Also,
many applications require efficient algorithms that essentially operate on graphs.
Graphs occur often in real life, and we encounter them in natural situations. For example,
a road map showing the interstate highway connections between cities is an excellent example
of an undirected graph, since all interstate highways are two-way. We could also add weights
to each edge to indicate the distance in miles between the two cities, producing a weighted
undirected graph.
The sequence of courses that one must take to complete a degree in computer science can
also be represented as a graph. It is a directed graph in which the direction of an edge gives the
order in which the courses must be taken.
Another example that is a little more closely related to computer science is the graph that
represents resource allocations. It describes the relationship between a process, that is, a pro-
gram in execution and other resources in the system such as memory, a file, or printer. A re-
source allocation graph is a directed graph with edges going from a process node to a resource
node.
In a computer network, computers are interconnected via high-speed communication
channels such as phone lines, optical fibres, microwave relays, or satellites. We can use a
graph-based representation to determine how to route messages from one node to another and to find
backup routes in case a node fails.
Graphs are frequently applied in diverse areas such as artificial intelligence, cybernetics,
chemical structures of crystals, transportation networks, electrical circuitry, and the analysis of
programming languages.
In this chapter we give graph representation and traversal algorithms. We also describe
the details of the carefully tuned data structures that may be needed to achieve the ultimate
bounds on time and space complexity for the algorithms.

13.2 GRAPH FUNDAMENTALS


When a computational problem is modeled in terms of graphs, the resulting graphs must be
generated and represented in some form in order to facilitate the operations of an algorithm.
The desired representation may be an 'abstract' data structure that supports certain operations
very efficiently, or it may be a 'concrete' display or drawing of the graph that allows visual
inspection and interactive manipulation.
A graph is the most general of all data representations. A graph structure is a
generalization of a hierarchical structure. The essence of a hierarchical structure is that each of its
components, with a few exceptions, has a unique predecessor and a bounded number of succes-
sors. In a graph, each component may be related to any other component. Thus a graph is a
(many:many) data structure in which each component of the graph can have an arbitrary num-
ber of successors and an arbitrary number of predecessors.
A graph is a structure G = (V, E) in which V is a finite set of nodes and E is a finite set of
edges. We will denote nodes by drawing a circle around a node identifier and edges by using
line segments. A graph is a simple graph if it has no self-loops and no parallel edges.
An edge connects two nodes. If the edges have a direction associated with them, then the
graph is called a directed graph or sometimes a digraph. If there is an edge from node $n_i$
to node $n_j$ in a directed graph, then $n_i$ is called the tail of the edge and $n_j$ is called the head of the
edge. We also say that $n_j$ is adjacent to or incident to node $n_i$. The neighbours of a node are all
nodes that are adjacent to it.
If no direction is associated with an edge, then the graph is called an undirected graph. The
presence of an edge implies that there is a connection in each direction.
Fig. 13.1 and Fig. 13.2 show the structures of a directed graph and an undirected graph, respectively.

Fig. 13.1 Directed graph Fig. 13.2 Undirected graph

Note that in a simple undirected graph there can be at the most n(n-1)/2 edges for n nodes,
since each pair of distinct nodes can be joined by at most one edge.
A weighted graph is a graph in which each edge has an associated value. A weighted
edge has a scalar value W associated with it. The weight is a measure of the cost of using this
edge to go from the source node to the destination node.

(a) Directed, weighted edge with weight 15    (b) Undirected, weighted edge with weight 15

Fig. 13.3 Weighted edge for directed and undirected graphs



A city map showing only the one-way streets is an example of a digraph where nodes are
the intersections of streets and the edges are the streets. A map of two-way streets is an example of an
undirected graph.
Any sequence of edges of a directed graph such that the terminal node of each edge in the
sequence is the initial node of the edge, if any, appearing next in the sequence defines a path of
the graph. In other words, a simple path through a graph is a sequence of vertices $V_1, V_2, \ldots, V_k$
such that all vertices, except possibly the first and the last, are distinct and each pair of nodes
$V_i$, $V_{i+1}$ ($i = 1, \ldots, k-1$) is connected by an edge. Consider Fig. 13.4.

Fig. 13.4 Directed graph showing path

In the above figure, the sequence ABCD is a simple path. DFABCD is also a path.
However, ABE is not a path since there is no edge directly connecting nodes B and E. BCDBC is
not a simple path since its nodes are not all distinct (the nodes B and C appear twice in the sequence).
A cycle or circuit is a simple path $V_1, V_2, \ldots, V_k$ exactly as defined above but with the
added requirement that $V_1 = V_k$.
For example, CDC and ABCDFEA are cycles. A cycle is called a simple cycle if no edge
appears more than once; otherwise it is just a cycle.
The outdegree of a node in a directed graph is the number of edges exiting from the node;
the indegree of a node is the number of edges entering the node. In the following figures, the
indegree of CALCUTTA is 2 and its outdegree is 1. The indegree and outdegree of NEW DELHI
are 2 and 2, respectively. The indegree and outdegree of a node in a directed graph indicate its
relative importance in that graph. A node whose outdegree is 0 acts primarily as a sink node and
a node whose indegree is 0 is called a source node.


Fig. 13.5 Directed graph Fig. 13.6 Undirected graph

For example, in the figure, KALKA is the source node and PUNE is the sink node. Since
indegree and outdegree cannot apply to a node in an undirected graph, the degree of a node is
defined as the number of edges connected directly to the node. In Fig. 13.6 the degree of each
node is 2.

13.3 GRAPH REPRESENTATION


A pictorial representation of a graph has limited usefulness for computation. There are various
popular approaches for representing graph structures. We will consider only two basic approaches:
the adjacency matrix and the adjacency list. The matrix representation has a number of
advantages. Matrices are easy to store and manipulate, and hence the graphs represented by them
are easily processed by computer. Matrix algebra can be used to obtain paths, cycles, and other
features of a graph.

13.3.1 Adjacency Matrix


The adjacency matrix A for a graph G of n nodes is an n x n matrix. Any element of the
adjacency matrix is either 0 or 1. We set $A_{ij} = 1$ if there is an edge from $V_i$ to $V_j$, and $A_{ij} = 0$ if there is
no such edge. It is mathematically written as
\[ A_{ij} = \begin{cases} 1, & \text{if there is an edge from } V_i \text{ to } V_j \\ 0, & \text{otherwise} \end{cases} \]
The above matrix is also called a bit matrix or a boolean matrix. The ith row in the
adjacency matrix is determined by the edges that originate in the node $V_i$. Fig. 13.7 shows a graph
and the corresponding adjacency matrix.
Since we are using only simple graphs (i.e., no self-loops and no parallel edges), the adjacency
matrix has zeros on its main diagonal, that is, $A_{ii} = 0$ for $1 \le i \le n$ (i.e., $A_{11} = 0$, $A_{22} = 0$, and so on).

      V1  V2  V3  V4  V5  V6
V1     0   1   1   0   0   0
V2     0   0   1   0   0   0
V3     0   0   0   1   0   0
V4     1   0   0   0   1   1
V5     0   0   0   1   0   0
V6     0   0   0   0   0   0

(a) Directed graph (b) Adjacency matrix

Fig. 13.7
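With the adjacency matrix in hand, the outdegree of a node is the sum of its row and the indegree is the sum of its column. The following sketch stores the matrix of Fig. 13.7(b) in a C array and computes both. The array name fig13_7, the constant NV, the function names, and the 0-based indexing (index 0 stands for V1) are choices made for this illustration.

```c
#include <assert.h>

#define NV 6   /* number of vertices in Fig. 13.7 */

/* Adjacency matrix of Fig. 13.7(b); row and column 0 stand for V1. */
static const int fig13_7[NV][NV] = {
    {0, 1, 1, 0, 0, 0},
    {0, 0, 1, 0, 0, 0},
    {0, 0, 0, 1, 0, 0},
    {1, 0, 0, 0, 1, 1},
    {0, 0, 0, 1, 0, 0},
    {0, 0, 0, 0, 0, 0}
};

/* Outdegree of vertex v: number of 1s in row v. */
int outdegree(const int a[NV][NV], int v)
{
    int j, d = 0;
    assert(v >= 0 && v < NV);
    for (j = 0; j < NV; j++)
        d += a[v][j];
    return d;
}

/* Indegree of vertex v: number of 1s in column v. */
int indegree(const int a[NV][NV], int v)
{
    int i, d = 0;
    assert(v >= 0 && v < NV);
    for (i = 0; i < NV; i++)
        d += a[i][v];
    return d;
}
```

For example, V4 (index 3) has outdegree 3 and indegree 2, while V6 (index 5) has outdegree 0 and is therefore a sink node.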

Similarly, we can draw adjacency matrix of an undirected graph. The adjacency matrix
and the corresponding graph for an undirected graph are shown in the figures below.

      V1  V2  V3  V4
V1     0   1   1   1
V2     1   0   1   1
V3     1   1   0   1
V4     1   1   1   0

Fig. 13.8 Undirected graph Fig. 13.9 Adjacency matrix



The adjacency matrix for an undirected graph is always symmetric whereas the adjacency
matrix for a directed graph need not be symmetric. In other words, if the graph is
undirected, then $A_{ij} = A_{ji}$ for all i, j.
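The symmetry property is easy to test in code, and it also means that each undirected edge is recorded twice in the matrix. The sketch below uses the matrix of Fig. 13.9; the names fig13_9, NU, is_symmetric(), and undirected_edge_count() are assumptions made for this illustration.

```c
#include <assert.h>
#include <stddef.h>

#define NU 4   /* number of vertices in Fig. 13.8 */

/* Adjacency matrix of Fig. 13.9; row and column 0 stand for V1. */
static const int fig13_9[NU][NU] = {
    {0, 1, 1, 1},
    {1, 0, 1, 1},
    {1, 1, 0, 1},
    {1, 1, 1, 0}
};

/* Returns 1 if a[i][j] == a[j][i] for all i, j; otherwise 0. */
int is_symmetric(const int a[NU][NU])
{
    int i, j;
    assert(a != NULL);
    for (i = 0; i < NU; i++)
        for (j = i + 1; j < NU; j++)
            if (a[i][j] != a[j][i])
                return 0;
    return 1;
}

/* Counts each undirected edge once by scanning only the part of the
   matrix above the main diagonal. */
int undirected_edge_count(const int a[NU][NU])
{
    int i, j, edges = 0;
    assert(a != NULL);
    for (i = 0; i < NU; i++)
        for (j = i + 1; j < NU; j++)
            edges += a[i][j];
    return edges;
}
```

The graph of Fig. 13.8 has every possible edge, so the count is 6, which equals n(n-1)/2 for n = 4.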

13.3.2 Adjacency Lists


The adjacency list is a potentially more space-efficient method for representing graphs.
Fig. 13.10(b) is an adjacency-list representation of the undirected graph G in Fig. 13.10(a).
Similarly, Fig. 13.11(b) is an adjacency-list representation of the directed graph G1 in Fig. 13.11(a). The
corresponding adjacency matrices are shown in Fig. 13.10(c) and Fig. 13.11(c). The symbol '/'
indicates the end of the corresponding list.

Adjacency list of graph G:
1 -> 2 -> 5 /
2 -> 1 -> 5 -> 3 -> 4 /
3 -> 2 -> 4 /
4 -> 2 -> 5 -> 3 /
5 -> 4 -> 1 -> 2 /

Adjacency matrix of graph G:
     1  2  3  4  5
1    0  1  0  0  1
2    1  0  1  1  1
3    0  1  0  1  0
4    0  1  1  0  1
5    1  1  0  1  0

(a) Graph G (b) Adjacency list of graph G (c) Adjacency matrix of graph G

Fig. 13.10

Adjacency list of graph G1:
1 -> 2 -> 4 /
2 -> 5 /
3 -> 6 -> 5 /
4 -> 2 /
5 -> 4 /
6 -> 6 /

Adjacency matrix of graph G1:
     1  2  3  4  5  6
1    0  1  0  1  0  0
2    0  0  0  0  1  0
3    0  0  0  0  1  1
4    0  1  0  0  0  0
5    0  0  0  1  0  0
6    0  0  0  0  0  1

(a) Graph G1 (b) Adjacency list of graph G1 (c) Adjacency matrix of graph G1

Fig. 13.11

In the adjacency list representation, we store all the vertices in a list and then, for each vertex,
we keep a linked list of its adjacent vertices. In the next section, graph traversal algorithms
will be discussed.
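The adjacency list representation described above can be sketched in C as an array of linked lists, one list per vertex. All type and function names below (adjnode, graph, graph_create(), graph_add_arc(), and so on) are hypothetical and chosen for this illustration. Vertices are numbered from 1 as in Fig. 13.10, and new neighbours are added at the front of a list, so the order within a list may differ from the figure.

```c
#include <assert.h>
#include <stdlib.h>

/* One entry in a vertex's linked list of neighbours. */
typedef struct adjnode {
    int vertex;
    struct adjnode *next;
} adjnode;

/* head[v] is the neighbour list of vertex v, for v = 1..nvertices. */
typedef struct {
    int nvertices;
    adjnode **head;
} graph;

graph *graph_create(int n)
{
    int v;
    graph *g = malloc(sizeof *g);
    assert(g != NULL && n > 0);
    g->nvertices = n;
    g->head = malloc(((size_t)n + 1) * sizeof *g->head);
    assert(g->head != NULL);
    for (v = 0; v <= n; v++)
        g->head[v] = NULL;
    return g;
}

/* Add the directed edge u -> v at the front of u's list. */
void graph_add_arc(graph *g, int u, int v)
{
    adjnode *node;
    assert(u >= 1 && u <= g->nvertices && v >= 1 && v <= g->nvertices);
    node = malloc(sizeof *node);
    assert(node != NULL);
    node->vertex = v;
    node->next = g->head[u];
    g->head[u] = node;
}

/* An undirected edge is stored as two directed edges. */
void graph_add_edge(graph *g, int u, int v)
{
    graph_add_arc(g, u, v);
    graph_add_arc(g, v, u);
}

/* Length of u's list: degree (undirected) or outdegree (directed). */
int graph_degree(const graph *g, int u)
{
    int d = 0;
    const adjnode *p;
    for (p = g->head[u]; p != NULL; p = p->next)
        d++;
    return d;
}

/* Returns 1 if v appears in u's list, otherwise 0. */
int graph_has_arc(const graph *g, int u, int v)
{
    const adjnode *p;
    for (p = g->head[u]; p != NULL; p = p->next)
        if (p->vertex == v)
            return 1;
    return 0;
}

void graph_free(graph *g)
{
    int v;
    for (v = 1; v <= g->nvertices; v++)
    {
        adjnode *p = g->head[v];
        while (p != NULL)
        {
            adjnode *next = p->next;
            free(p);
            p = next;
        }
    }
    free(g->head);
    free(g);
}
```

The storage used is proportional to the number of vertices plus the number of edges, whereas the adjacency matrix always needs n x n entries; the price is that testing whether a particular edge exists requires a scan of one list instead of a single array access.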

13.4 GRAPH TRAVERSAL


Graph search is the graph analogue of tree traversal. In many practical applications of graphs, there
is a need to visit systematically all the vertices of a graph. One such application occurs when the
organizers of a political campaign want to make a list of all important political centres for their
candidate to visit. The presence or absence of edges between such centres will determine the
possible ways through which all the centres could be visited. Two standard ways of searching a
graph are breadth-first search and depth-first search. In the next two subsections we shall in-
vestigate the methods that ensure that all nodes are visited.

13.4.1 Breadth-first Search


Breadth-first search is one of the simplest algorithms for searching a graph. Although it can be
used both for directed and undirected graphs, we shall describe it in relation to undirected
graphs.
Given a graph G=(V,E) and a distinguished source vertex s, breadth-first search
systematically explores the edges of G to discover every vertex that is reachable from s.
It begins at a given node and then proceeds to all nodes directly connected to that node.
The following steps are involved in the algorithm.
Algorithm 13.1: Breadth-first search
The graph G may be either directed or undirected and is represented by an adjacency
matrix.
Step 1: Start with any vertex and mark it as visited.
Step 2: Using the adjacency matrix of the graph, find a vertex adjacent to the vertex in
Step 1. Mark it as visited.
Step 3: Return to the vertex in Step 1 and move along an edge towards an unvisited
vertex, and mark the new vertex as visited.
Step 4: Repeat Step 3 until all vertices adjacent to the vertex selected in Step 1 have
been marked as visited.
Step 5: Repeat Step 1 through Step 4 starting from the vertex visited in Step 2, then
starting from the vertices visited in Step 3, in the order in which they were visited. When
all vertices have been visited, continue to the next step.
Step 6: Stop.
In a breadth-first search, the graph is searched breadthwise by exploring all the nodes
adjacent to a node. A queue is a convenient data structure to keep track of nodes that are visited
during a breadth-first search.
The following program implements the breadth-first algorithm as described earlier.
Example 13.1

Searching all the vertices of an undirected graph that can be reached from a specific
vertex in the same graph by the breadth-first search method

#include <stdio.h>
#include <stdlib.h>
#include <time.h>    /* needed by the randomize() macro */
#include <conio.h>   /* clrscr() and getch(): Turbo C console routines */

void insert(int);
int Remove(void);

/* graph[][] holds at most 7 vertices */
int queue[20], rear = -1, graph[7][7], row;

/* Add x at the rear of the queue. */
void insert(int x)
{
    rear = rear + 1;
    queue[rear] = x;
}

/* Remove and return the front element, shifting the rest forward. */
int Remove(void)
{
    int item, k;
    item = queue[0];
    for (k = 0; k < rear; k++)
        queue[k] = queue[k + 1];
    rear = rear - 1;
    return (item);
}

int main(void)
{
    int i, j, w, visited[10], v, jl;
    int l, vertices[10], count = 0, final[10];
    clrscr();
    randomize();
    printf("How many vertices are there ====>");
    scanf("%d", &row);
    printf("\n The graph's adjacency matrix is as follows:\n\n\n");
    printf(" ");
    for (j = 0; j < row; j++)
        printf("Vertex %d", j);
    printf("\n");
    /* Fill the upper triangle at random and mirror it,
       so that the matrix is symmetric with a zero diagonal. */
    for (i = 0; i < row; i++)
    {
        for (j = count; j < row; j++)
        {
            if (i != j)
            {
                graph[i][j] = random(2);
                graph[j][i] = graph[i][j];
            }
            else
                graph[i][j] = 0;
        }
        count++;
    }
    for (i = 0; i < row; i++)
    {
        printf("Vertex %d", i);
        for (j = 0; j < row; j++)
            printf("%8d", graph[i][j]);
        printf(" \n\n");
    }
    for (i = 0; i < row; i++)
        visited[i] = 0;
    printf("\n Enter the vertex number from which to start ====>");
    scanf("%d", &v);
    visited[v] = 1;
    insert(v);
    getch();
    clrscr();
    printf("\n Starting vertex =====> Vertex %d \n\n", v);
    count = 1;
    jl = 0;
    while (rear >= 0)
    {
        v = Remove();
        final[jl] = v;      /* record the order in which vertices leave the queue */
        jl++;
        l = 0;
        /* Collect every vertex adjacent to v. */
        for (i = 0; i < row; i++)
            if (graph[v][i] == 1)
            {
                vertices[l] = i;
                l++;
            }
        /* Enqueue the adjacent vertices that have not been visited yet. */
        for (i = 0; i < l; i++)
        {
            w = vertices[i];
            printf("Step %d: Vertex visited ===> Vertex %d\n", count, w);
            if (visited[w] != 1)
            {
                insert(w);
                visited[w] = 1;
            }
        }
        printf("Elements in the queue ===>");
        if (rear >= 0)
            for (j = 0; j <= rear; j++)
                printf("%d,", queue[j]);
        else
            printf("EMPTY QUEUE: TRAVERSAL COMPLETE");
        count++;
        printf("\n\n\n");
        getch();
        clrscr();
    }
    /* final[0] is the starting vertex; v was overwritten inside the loop. */
    printf("\n The vertices that are traversed ");
    printf("from vertex %d are as follows:\n", final[0]);
    if (count == 2)
        printf("\n Vertex %d: Is an isolated vertex", final[0]);
    else
        for (i = 0; i < jl; i++)
            printf("Vertex %d,", final[i]);
    getch();
    return 0;
}
The program produces the following output corresponding to the given input.
How many vertices are there ====> 5
The graph's adjacency matrix is as follows:
          Vertex 0  Vertex 1  Vertex 2  Vertex 3  Vertex 4
Vertex 0         0         1         1         0         0
Vertex 1         1         0         0         1         0
Vertex 2         1         0         0         1         1
Vertex 3         0         1         1         0         0
Vertex 4         0         0         1         0         0
Enter the vertex number from which to start ====> 3
Starting vertex =====> Vertex 3
Step 1: Vertex visited ===> Vertex 1
Step 1: Vertex visited ===> Vertex 2
Elements in the queue ===> 1,2,
Step 2: Vertex visited ===> Vertex 0
Step 2: Vertex visited ===> Vertex 3
Elements in the queue ===> 2,0,
Step 3: Vertex visited ===> Vertex 0
Step 3: Vertex visited ===> Vertex 3
Step 3: Vertex visited ===> Vertex 4
Elements in the queue ===> 0,4,
Step 4: Vertex visited ===> Vertex 1
Step 4: Vertex visited ===> Vertex 2
Elements in the queue ===> 4,
Step 5: Vertex visited ===> Vertex 2
Elements in the queue ===> EMPTY QUEUE: TRAVERSAL COMPLETE
The vertices that are traversed from vertex 3 are as follows:
Vertex 3, Vertex 1, Vertex 2, Vertex 0, Vertex 4,

13.4.2 Depth-first Search


Depth-first search (DFS) can be described in a manner analogous to breadth-first search. The
main difference is the use of a stack instead of a queue.
In a DFS, initially all nodes in a graph are marked unvisited. Select any node, say V0, in
the graph and proceed in the following way whenever a node Vi is reached. If Vi was not
reached before, mark it as visited. If there are no unexplored edges left in the edge list of
Vi, then backtrack to the node from which Vi was reached. Otherwise traverse the first
unexplored edge (i, j) in the edge list of i. If j was visited before, mark the edge and
backtrack to i. If j was not visited before, mark the edge and 'visit' the node reached in a
similar manner.
A depth-first graph traversal algorithm utilizing the above approach is given below.
Algorithm 13.2: Depth-first search
A graph G is represented by an adjacency matrix. The input graph G may be undirected
or directed.
Step 1: Choose an arbitrary node in the graph, designate it as the search node, and mark it as
visited.
Step 2: From the adjacency matrix of the graph, find a node adjacent to the search node that has
not been visited as yet. Mark it as visited and designate it as the new search node.
Step 3: Repeat Step 2 using the new search node. If no node satisfies the condition in Step 2,
return to the previous search node and continue the process from there.
Step 4: When a return to the previous search node in Step 3 is impossible, the search from the
originally chosen search node is complete.
Step 5: If there are any nodes in the graph which are still unvisited, choose any node that has not
been visited and repeat Step 1 through Step 4.
Step 6: Stop.

A connected component of an undirected graph G is a maximal connected subgraph, that
is, a connected subgraph that is not contained in any larger connected subgraph. The problem
of finding the connected components of a graph can be solved by using depth-first search with
very little modification. We may start with an arbitrary vertex and do a depth-first search to
find all other vertices (and edges) in the same component; then, if some vertices remain,
choose one and repeat.
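
The procedure just described can be sketched as follows. This is an illustrative sketch only: it uses a recursive depth-first search over an adjacency matrix, and the names connected_components, dfs_label, comp and MAXV are assumptions that do not appear in the original text. Vertices are numbered from 0.

```c
#define MAXV 10   /* assumed upper bound on the number of vertices */

/* Label v and everything reachable from v with the given component number. */
static void dfs_label(int g[MAXV][MAXV], int n, int v, int label, int comp[])
{
    int w;
    comp[v] = label;
    for (w = 0; w < n; w++)
        if (g[v][w] == 1 && comp[w] == -1)
            dfs_label(g, n, w, label, comp);
}

/* Fill comp[v] with the component number of each vertex (0, 1, 2, ...)
   and return the number of connected components of the undirected graph g. */
int connected_components(int g[MAXV][MAXV], int n, int comp[])
{
    int v, count = 0;
    for (v = 0; v < n; v++)
        comp[v] = -1;                 /* -1 means "not visited yet" */
    for (v = 0; v < n; v++)
        if (comp[v] == -1) {          /* some vertex remains: start a new search */
            dfs_label(g, n, v, count, comp);
            count++;
        }
    return count;
}
```

Each call to dfs_label from the outer loop corresponds to one "choose one and repeat" step, so the number of such calls is the number of components.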
A stack is the convenient data structure for implementing depth-first search.
Illustration: Consider the directed graph and the corresponding adjacency matrix shown
in Fig. 13.12(a) and Fig. 13.12(b). Although we will illustrate the depth-first search algorithm for
a directed graph, the approach can also be applied to undirected graphs.

      V1  V2  V3  V4  V5
V1     0   1   1   0   0
V2     0   0   0   0   0
V3     0   1   0   0   0
V4     1   1   1   0   1
V5     0   1   0   0   0

(b) Adjacency matrix

Fig. 13.12

Let us choose V1 as the starting node in the graph. First designate it as the search node and
mark it as visited. The nodes V2 and V3 are adjacent to V1. Next we mark V2 and search for a
possible path from it. There is no node adjacent to V2, so we move back to V1 to consider the
next possibility, that is, V3. All nodes adjacent to V3 have also been visited, so we return
to V1. The search from V1 is now complete, since all nodes except V4 and V5 have been visited.
We now choose V4 and mark it as visited. Since V5 is the only unvisited node adjacent
to V4, we proceed to V5. V2 is the only node adjacent to V5, and it has already been
visited. We then return to V4 and the total search is complete.
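
The walk-through above can be reproduced with a short recursive routine. The sketch below is an illustration, not the book's program: the names dfs_visit, dfs_all and MAXV are assumptions, and the vertices V1 to V5 are stored at indices 0 to 4. Recursion takes the place of the explicit return to the previous search node.

```c
#define MAXV 10   /* assumed upper bound on the number of vertices */

/* Visit v, then search from each unvisited adjacent vertex in index order.
   Returning from the call is the "return to the previous search node". */
static void dfs_visit(int g[MAXV][MAXV], int n, int v,
                      int visited[], int order[], int *count)
{
    int w;
    visited[v] = 1;
    order[(*count)++] = v;
    for (w = 0; w < n; w++)
        if (g[v][w] == 1 && !visited[w])
            dfs_visit(g, n, w, visited, order, count);
}

/* Depth-first traversal of the whole graph, beginning at start and then
   restarting from any vertex that is still unvisited (Step 5).
   Writes the visiting order into order[] and returns the number visited. */
int dfs_all(int g[MAXV][MAXV], int n, int start, int order[])
{
    int visited[MAXV] = {0};
    int count = 0, v;
    dfs_visit(g, n, start, visited, order, &count);
    for (v = 0; v < n; v++)
        if (!visited[v])
            dfs_visit(g, n, v, visited, order, &count);
    return count;
}
```

Applied to the adjacency matrix of Fig. 13.12(b) with start index 0, the routine fills order[] with 0, 1, 2, 3, 4, that is, V1, V2, V3, V4, V5, which agrees with the sequence traced in the illustration.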
Depth-first search is a generalization of traversing trees in preorder. The starting vertex
may be determined by the problem or may be chosen arbitrarily. While traversing vertices
starting from the initial vertex, a dead end may be reached. A dead end is a vertex such that
all its neighbours, that is, vertices adjacent to it, have already been visited. At a dead end
we back up along the last edge traversed and branch out in another direction. The beauty of
the depth-first algorithm, developed by J E Hopcroft and R E Tarjan, is that it serves as the
basis for many other important algorithms.
In a breadth-first search, vertices are visited in order of increasing distance from the
starting vertex, say V, where distance is simply the number of edges in a shortest path. An
efficient implementation of either method must keep a list of vertices that have been visited
but whose adjacent vertices have not yet been visited. When the depth-first search backs up
from a dead end, it is supposed to branch out from the most recently visited vertex before
pursuing new paths from vertices that were visited earlier. Thus the list of vertices from
which some paths remain to be traversed must be a stack. On the other hand, in a breadth-first
search, in order to ensure that vertices close to V are visited first, the list must be a queue.
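
The distance property can be made concrete with a small routine that records, for every vertex, the number of edges on a shortest path from the starting vertex. This is a sketch under stated assumptions: the names bfs_distances, dist and MAXV are hypothetical, vertices are numbered from 0, and the queue is a plain array with separate head and tail indices instead of the element-shifting queue used in Example 13.1.

```c
#define MAXV 10   /* assumed upper bound on the number of vertices */

/* Breadth-first search from start over the adjacency matrix g.
   On return, dist[v] is the number of edges on a shortest path
   from start to v, or -1 if v cannot be reached. */
void bfs_distances(int g[MAXV][MAXV], int n, int start, int dist[])
{
    int queue[MAXV];          /* each vertex enters the queue at most once */
    int head = 0, tail = 0;
    int v, w;

    for (v = 0; v < n; v++)
        dist[v] = -1;         /* -1 marks an unvisited vertex */
    dist[start] = 0;
    queue[tail++] = start;

    while (head < tail) {
        v = queue[head++];    /* oldest waiting vertex: first in, first out */
        for (w = 0; w < n; w++)
            if (g[v][w] == 1 && dist[w] == -1) {
                dist[w] = dist[v] + 1;
                queue[tail++] = w;
            }
    }
}
```

For the five-vertex matrix printed in the output of Example 13.1 and starting vertex 3, the routine gives distance 0 for vertex 3, distance 1 for vertices 1 and 2, and distance 2 for vertices 0 and 4. Because vertices leave the queue in the order in which they entered it, no vertex at distance 2 is processed before every vertex at distance 1.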

The following C program implements the depth-first algorithm.


Example 13.2

Searching all the vertices of an undirected graph that can be reached from a specific
vertex, by the depth-first search method

#include <stdio.h>
#include <stdlib.h>
#include <time.h>    /* needed by the randomize() macro */
#include <conio.h>   /* clrscr() and getch(): Turbo C console routines */

struct stack
{
    int data[20];
    int top;
};
void push(struct stack *, int);
int pop(struct stack *);
int graph[10][10], row;

int main(void)
{
    struct stack S;
    int i, j, count = 0, v, visited[15];
    int l, vertices[15], w, final[15], indx = 0;
    clrscr();

    randomize();
    printf("How many vertices are there ====>");
    scanf("%d", &row);
    printf("\n The graph's adjacency matrix is as follows:\n\n\n");
    printf(" ");
    for (j = 0; j < row; j++)
        printf("Vertex %d", j);
    printf("\n");
    /* Fill the upper triangle at random and mirror it,
       so that the matrix is symmetric with a zero diagonal. */
    for (i = 0; i < row; i++)
    {
        for (j = count; j < row; j++)
        {
            if (i != j)
            {
                graph[i][j] = random(2);
                graph[j][i] = graph[i][j];
            }
            else
                graph[i][j] = 0;
        }
        count++;
    }
    for (i = 0; i < row; i++)
    {
        printf("Vertex %d", i);
        for (j = 0; j < row; j++)
            printf("%8d", graph[i][j]);
        printf("\n\n");
    }
    for (i = 0; i < 20; i++)
        S.data[i] = 0;
    S.top = -1;               /* empty stack */
    printf("Enter the vertex from which traversal starts ===>");
    scanf("%d", &v);
    getch(); clrscr();
    for (i = 0; i < row; i++)
        visited[i] = 0;
    visited[v] = 1;
    push(&S, v);
    printf("\n Starting vertex ====> vertex %d \n\n", v);
    count = 1;
    while (S.top >= 0)
    {
        v = pop(&S);
        final[indx] = v;      /* record the order in which vertices leave the stack */
        indx++;
        l = 0;
        /* Collect every vertex adjacent to v. */
        for (i = 0; i < row; i++)
            if (graph[v][i] == 1)
            {
                vertices[l] = i;
                l++;
            }
        /* Push the unvisited adjacent vertices, highest index first. */
        for (i = l - 1; i >= 0; i--)
        {
            w = vertices[i];
            printf("Step %d: Vertex visited ===> Vertex %d\n", count, w);
            if (visited[w] != 1)
            {
                push(&S, w);
                visited[w] = 1;
            }
        }
        printf("Elements in the stack ====>");
        if (S.top >= 0)
            for (j = 0; j <= S.top; j++)
                printf("%d,", S.data[j]);
        else
            printf("EMPTY STACK: TRAVERSAL COMPLETE");
        getch(); clrscr();
        count++;
    }
    printf("The traversal of the vertices by depth-first ");
    printf("search is as follows: \n \n");
    for (i = 0; i < indx; i++)
        printf("Vertex %d,", final[i]);
    getch();
    return 0;
}

/* Place x on top of the stack. */
void push(struct stack *sl, int x)
{
    sl->top = sl->top + 1;
    sl->data[sl->top] = x;
}

/* Remove and return the element on top of the stack. */
int pop(struct stack *sl)
{
    int p;
    p = sl->data[sl->top];
    sl->top = sl->top - 1;
    return (p);
}
The program produces the following output corresponding to the given input:
How many vertices are there ====> 5
The graph's adjacency matrix is as follows:
          Vertex 0  Vertex 1  Vertex 2  Vertex 3  Vertex 4
Vertex 0         0         0         1         1         0
Vertex 1         0         0         0         1         0
Vertex 2         1         0         0         1         0
Vertex 3         1         1         1         0         1
Vertex 4         0         0         0         1         0
Enter the vertex from which traversal starts ===> 2
Starting vertex ====> vertex 2
Step 1: Vertex visited ===> Vertex 3
Step 1: Vertex visited ===> Vertex 0
Elements in the stack ====> 3,0,
Step 2: Vertex visited ===> Vertex 3
Step 2: Vertex visited ===> Vertex 2
Elements in the stack ====> 3,
Step 3: Vertex visited ===> Vertex 4
Step 3: Vertex visited ===> Vertex 2
Step 3: Vertex visited ===> Vertex 1
Step 3: Vertex visited ===> Vertex 0
Elements in the stack ====> 4,1,
Step 4: Vertex visited ===> Vertex 3
Elements in the stack ====> 4,
Step 5: Vertex visited ===> Vertex 3
Elements in the stack ====> EMPTY STACK: TRAVERSAL COMPLETE
The traversal of the vertices by depth-first search is as follows:
Vertex 2, Vertex 0, Vertex 3, Vertex 1, Vertex 4,

EXERCISES
1. Describe the adjacency matrix representation of a graph. What are its advantages and
disadvantages?
2. What are the main advantages of the adjacency list representation of a graph over the
adjacency matrix representation?
3. Explain the difference between a directed graph and an undirected one.
4. Write pseudocode to implement the breadth-first search algorithm.
5. Consider a graph in which the relationship among the nodes is linear. Describe the order in
which the nodes will be processed during a depth-first search that starts at one of the nodes
with only one neighbour. What happens if the search starts at one of the nodes with two
neighbours?
6. Both breadth-first and depth-first search procedures probe each node in the graph at least
once and each edge twice. Prove that this effort is O(n+e), where n is the number of nodes
and e denotes the number of edges.
7. Write a recursive routine to implement Breadth-first search.
8. Write an algorithm to produce the shortest path from a node m to another node n in an
undirected graph if a path exists, or an indication that no path exists between two nodes.
9. Let D be a directed graph and TD be the directed graph formed by adding an edge from d1
to d2 whenever d1 and d2 (with no direct edge) are nodes in D and there is a path from d1 to
d2. TD is the transitive closure of D. Write an algorithm to compute the transitive closure of
a digraph.
10. A topological order is a linear relationship among the nodes of a directed graph such that
each directed edge goes from a node to one of its successors. The basic idea behind topo-
logical sorting is to find a node with no successor, remove it from the graph and add it to a
list. Repeat this process until the graph is empty. Write an algorithm to implement topologi-
cal sorting.
11. Explain briefly the following terms:
a. digraph b. adjacency list c. traversal of a graph
