Chapter 1: Introduction
1.1 Introduction
1.2 Basic Terminology: Elementary Data Organization
1.3 Types of Data structutes
1.4 Data Structure operations
1.5 Algorithm Complexity and
1.6 Time-Space trade-off
1.1 Introduction:
In a beginning programming course, variables are introduced that store a single datum. In all
but trivial programs, however, we need to store multiple related data items for convenience. For
example, data items like students records many need to be stored under a single name for ease of
processing. Such convenient structuring of data is called data organization: a container for the
data is called data structure.
A data structure is a means of organizing data in primary memory in a form that is convenient to
process by a program.
In the definition of data structure, structure means a set of rules that holds the data together. In
other words, if we take a group of data and fit them into a structure such that we can define its
relating rules, we have made a data structure.
Data is stored either in main memory or in secondary memory. In order to represent data we
need some models.
The different models (logical or mathematical) to represent/organize/store data in the main
memory are together referred to as data structures.
The data structure is collection of data organized in a specific manner in computers main
memory.
The data structure is a way Abstract Data Type (ADT) can be implemented.
In the beginning of computer programming, if we wanted to read a file , we wrote the code to
read the physical file device. It did not take long to realize that we were writing the same code
over and over again. So, we created what is known as abstract data type(ADT). We wrote the
code to read the file and placed in a library for all programmers to use.
An abstract data type is a data declaration packaged together with the operations that are
meaningful for the data type.
In other words, ADT is the encapsulation of the data and the operations on the data in such a
way that they are hidden from the users.
ADT consists of a set of definitions that allow programmers to use functions while hiding the
implementation.
With an ADT, users are not concerned with how the task is done but rather with what it can do.
Suppose we have to solve a problem that requires queue data structure. The queue data structure
is not available in programming languages. There are two ways to solve this problem:
One way is to write a program that simulates the queue we need. But this is good only for one
application.
Another way is to write a queue ADT that can be used to solve any queue problem.
If we choose the latter way, the other programmers writing other applications can concentrate
on the applications rather than writing the queue data structure.
The different models (logical or mathematical) to represent/organize/store data in the secondary
memory are called file structures or data storage structures.
A data structure could be a programming construct provided in a language or it can be defined
by the programmer. Example of data structure include: Arrays, Linked lists, Stacks, Queues,
Trees, Graphs. Data structures are applied in sorting, searching, hash tables, graph algorithms,
pattern matching, data compressing etc.
The content of this course has not changed much in the last two decades except for inclusion of
some new algorithms.
Good programming and problem solving requires knowledge of data structures. Without a
sufficient understanding of data structures, the level at which a programmer can work will be
severely limited.
Motivation for Data Storage Structures:
The main purpose for computer systems is to execute programs.
These programs together with the data they act upon, must be stored in main memory during
execution.
Ideally, we would want the programs and data to reside in the main memory permanently.
However, this is not possible because
the main memory is usually too small to store all needed data permanently.
the main memory is a volatile storage device that looses its content when power is turned off or
lost.
Since large data sets cannot fit into main memory, secondary storage structures are necessary
to handle such data.
Magnetic disk is the most common form of secondary storage in computer systems.
With storage structures, a small portion of the data is kept in primary storage, and additional
data is read from secondary storage as needed.
A data storage structures is a means of organizing data as blocks in secondary memory that
optimizes I/O read/write operations.
Data storage structures include: Sequential files, Random access files, Indexed files, B-Trees,
B*-Trees.
1.2 Basic Terminology: Elementary data organization
Data: The term data means a value or set of values. For example, marks of students, figures
obtained during exit polls etc.
Data item:- A data item means a single unit of values. For example, roll number, name, address
etc.
Entity: Entity is something that has certain qualities, characteristics, properties or attributes that
may contain some values. For example, Student is an entity. The attributes of student may be roll
number, name, address, etc,. The values of these attributes may be 100, Ram, House No:133-A,
Pragati Vihar, Dehradun.
Entity Set: An entity set is a group of or set of similar entities. For example, employees of an
organization, students of a class etc.
Information: When the data are processed by applying certain rules, new processed data is
called information. The data are not useful for decision marking whereas information is useful
for decision making.
Field: File is a single elementary unit of information representing an attribute of an entity. For
example, 1, 2, 3, 4.etc are represented by a single unit called roll number field.
Record: Record is a collection of field values of a given entity. For example, roll number, name,
address etc of a particular student
File: File is a collection of records of the entities in a given entity set. For example, file
containing records of students of a particular class.
Key: A key is one or more field(s) in a record that take(s) unique values and can be used to
distinguish one record from the others.
There are three cases:
Case 1: When more than one field may have unique values.
In that case, there exist multiple keys, but at a time we use only one field as a key, called primary
key. The other key(s) are called as alternate key(s).
Case 2: There is no field that has unique values.
Then a combination of two or more fields can be used to form a key. Such a key is called
composite key.
Case 3: There is no possibility of forming a key from the within the record.
Then an extra field can be added to the record that can be used as a key.
1.3: Types of data structures
(i) Linear vs Non-Linear Data structures:-
Linear data structures: Data items form a sequence. For examples:-arrays, linked lists, stacks,
queues.
Non-Linear Data structures: Data items do not form a sequence. For examples:- trees, graphs.
(ii) Homogeneous vs Non-Homogeneous Data structures
Homogeneous Data structures: All data items are of same type.For example, arrays.
Non-Homogeneous:- All data items are may or may not be similar. For example, record or
structure.
1.4 Data structure operations:-
The common data structures that can be performed on different data structures are:-
(i) Traversal: Accessing each data item of data structure in order to process it is called
traversal.
(ii) Searching: Finding the location of a given data item.
(iii) Insertion: Adding new data item to the data structure.
(iv) Deletion: Removing an existing data item from a data structure.
(v) Sorting:- Arranging the data items in some logical order(Ascending or descending order).
(vi) Merging:- Combining the elements of two similar sorted data structures into a single
structure.
1.5: Algorithm Complexity: The time-space trade-off
Algorithm
A well-defined computational procedure that takes some value, or a set of values, as input and
produces some value, or a set of values, as output.
Sequence of computational steps that transform the input into the output.
Sequence of unambiguous instructions for solving a problem, i.e., for obtaining a required
output for any legitimate input in a finite amount of time.
An algorithm can be expressed in three ways:-
(i) in any natural language such as English, called pseudocode.
(ii) in a programming language or
(iii) in the form of a flowchart.
Complexity of algorithm is a function of size of input of a given problem instance which
determines how much running time/memory space is needed by the algorithm in order to run to
completion.
Time Complexity: Time complexity of an algorithm is the amount of time it needs in order to run
to completion.
Space Complexity: Space Complexity of an algorithm is the amount of space it needs in order to
run to completion.
There are two points which we should consider about computer programming:-
(i) An appropriate data structure and
(ii) An appropriate algorithm.
There may be more than one algorithm to solve a particular problem.
E.g.: sorting n numbers: 106 numbers
Friends computer = 109 instructions/second
Friends algorithm = 2n2 instructions
Your computer = 107 instructions/second
Your algorithm = 50nlgn instructions
Your friend =
You =
20 times better!!
Time Space Trade-off
The best algorithm to solve a given problem is one that requires less memory space and less time
to run to completion. But in practice, it is not always possible to obtain both of these objectives.
One algorithm may require less memory space but may take more time to complete its execution.
On the other hand, the other algorithm may require more memory space but may take less time
to run to completion. Thus, we have to sacrifice one at the cost of other. In other words, there is
Space-Time trade-off between algorithms.
If we need an algorithm that requires less memory space, then we choose the first algorithm at
the cost of more execution time. On the other hand if we need an algorithm that requires less
time for execution, then we choose the second algorithm at the cost of more memory space.
Arrays
2.1 Array Definition
2.2 Representation
2.3 Analysis
2.4 Single and Multidimensional Arrays
2.5 Address calculation
2.6 Application of arrays
Array definition:-Array is linear, homogeneous data structures whose elements are stored in
contiguous memory locations.
Arrays are subscripted variables stored in contiguous memory locations.
Accessing Array elements: Elements of arrays are accessed by using index or subscripts.
Types of Arrays:
One-dimensional Array or linear array: requires only one index to access an element.
Two-Dimensional Array: requires two indices to access an element.
Multidimensional Array: requires two or more indices to access an element
Indices of array are integer numbers.
In C/C++/Java index starts from 0,that is the smallest index of array is 0.
In C/C++/Java index are written in brackets [ ].
Stacks
Stack is an abstract data type with a bounded(predefined) capacity. It is a simple data structure
that allows adding and removing elements in a particular order. Every time an element is added,
it goes on the top of the stack, the only element that can be removed is the element that was at
the top of the stack, just like a pile of objects.
Basic features of Stack
1. Stack is an ordered list of similar data type.
2. Stack is a LIFO structure. (Last in First out).
3. push() function is used to insert new elements into the Stack and pop() is used to delete
an element from the stack. Both insertion and deletion are allowed at only one end of
Stack called Top.
4. Stack is said to be in Overflow state when it is completely full and is said to be in
Underflow state if it is completely empty.
Applications of Stack
The simplest application of a stack is to reverse a word. You push a given word to stack - letter
by letter - and then pop letters from the stack.
There are other uses also like : Parsing, Expression Conversion(Infix to Postfix, Postfix to Prefix
etc) and many more.
Implementation of Stack
Stack can be easily implemented using an Array or a Linked List. Arrays are quick, but are
limited in size and Linked List requires overhead to allocate, link, unlink, and deallocate, but is
not limited in size. Here we will implement Stack using array.
Applications of stack
Infix, Prefix and Postfix Notation
We are accustomed to write arithmetic expressions with the operation between the two operands:
a+b or c/d. If we write a+b*c, however, we have to apply precedence rules to avoid the
ambiguous evaluation (add first or multiply first?).
There's no real reason to put the operation between the variables or values. They can just as well
precede or follow the operands. You should note the advantage of prefix and postfix: the need
for precedence rules and parentheses are eliminated.
Infix Prefix Postfix
a+b +ab ab+
a+b*c +a*bc abc*+
(a + b) * (c - d) *+ab-cd ab+cd-*
b*b-4*a*c
40 - 3 * 5 + 1
Postfix expressions are easily evaluated with the aid of a stack.
https://www.slideshare.net/archijamwal5/ds-ppt
The above link provides the detailed stack ppt along with algorithms for push and pop.
Recursion
Some computer programming languages allow a module or function to call itself. This technique
is known as recursion. In recursion, a function either calls itself directly or calls a function
that in turn calls the original function . The function is called recursive function.
Example a function calling itself.
Properties
A recursive function can go infinite like a loop. To avoid infinite running of recursive function,
there are two properties that a recursive function must have
Base criteria There must be at least one base criteria or condition, such that, when this
condition is met the function stops calling itself recursively.
Progressive approach The recursive calls should progress in such a way that each time
a recursive call is made it comes closer to the base criteria.
Implementation
Many programming languages implement recursion by means of stacks. Generally, whenever a
function (caller) calls another function (callee) or itself as callee, the caller function transfers
execution control to the callee. This transfer process may also involve some data to be passed
from the caller to the callee.
This implies, the caller function has to suspend its execution temporarily and resume later when
the execution control returns from the callee function. Here, the caller function needs to start
exactly from the point of execution where it puts itself on hold. It also needs the exact same data
values it was working on. For this purpose, an activation record (or stack frame) is created for
the caller function.
Analysis of Recursion
One may argue why to use recursion, as the same task can be done with iteration. The first
reason is, recursion makes a program more readable and because of latest enhanced CPU
systems, recursion is more efficient than iterations.
Tower of Hanoi Problem
Tower of Hanoi, is a mathematical puzzle which consists of three towers (pegs) and more than
one rings is as depicted
These rings are of different sizes and stacked upon in an ascending order, i.e. the smaller one sits
over the larger one. There are other variations of the puzzle where the number of disks increase,
but the tower count remains the same.
Rules
The mission is to move all the disks to some another tower without violating the sequence of
arrangement. A few rules to be followed for Tower of Hanoi are
Only one disk can be moved among the towers at any given time.
Only the "top" disk can be removed.
No large disk can sit over a small disk.
Following is an animated representation of solving a Tower of Hanoi puzzle with three disks.
Tower of Hanoi puzzle with n disks can be solved in minimum 2n1 steps. This presentation
shows that a puzzle with 3 disks has taken 23 - 1 = 7 steps.
Algorithm
To write an algorithm for Tower of Hanoi, first we need to learn how to solve this problem with
lesser amount of disks, say 1 or 2. We mark three towers with name, source, destination and
aux (only to help moving the disks). If we have only one disk, then it can easily be moved from
source to destination peg.
If we have 2 disks
First, we move the smaller (top) disk to aux peg.
Then, we move the larger (bottom) disk to destination peg.
And finally, we move the smaller disk from aux to destination peg.
So now, we are in a position to design an algorithm for Tower of Hanoi with more than two
disks. We divide the stack of disks in two parts. The largest disk (nth disk) is in one part and all
other (n-1) disks are in the second part.
Our ultimate aim is to move disk n from source to destination and then put all other (n1) disks
onto it. We can imagine to apply the same in a recursive way for all given set of disks.
The steps to follow are
Step 1 Move n-1 disks from source to aux
Step 2 Move nth disk from source to dest
Step 3 Move n-1 disks from aux to dest
A recursive algorithm for Tower of Hanoi can be driven as follows
START
Procedure Hanoi(disk, source, dest, aux)
IF disk == 0, THEN
move disk from source to dest
ELSE
Hanoi(disk - 1, source, aux, dest) // Step 1
move disk from source to dest // Step 2
Hanoi(disk - 1, aux, dest, source) // Step 3
END IF
END Procedure
STOP