Advanced Data Structures
Saurabh Pandey
                                               Department of Computer Application
                                            Azad Institute of Engineering & Technology
                                                           Lucknow, India
                                             e-mail: saurabhpandey7@indiatimes.com
Abstract— This paper investigates how to maintain an efficient       and communication. We hope to give a cleaner and more
dynamic ordered set of bit strings, which is an important            modern view than you might have seen before.
problem in the field of information search and information               • Data structures for range queries (and other geometric
processing. Generally, a dynamic ordered set is required to          problems) and problems on trees. There are surprising
support 5 essential operations including search, insertion,          equivalences between such problems, as well as interesting
deletion, max-value retrieval and next-larger-value retrieval.       solutions.
Based on previous research fruits, we present an advanced
data structure named rich binary tree (RBT), which follows
both the binary-search-tree property and the digital-search-
tree property. Also, every key K keeps the most significant
difference bit (MSDB) between itself and the next larger value
among K’s ancestors, as well as that between itself and the next
smaller one among its ancestors. With the new data structure,
we can maintain a dynamic ordered set in O(L) time. Since
computers represent objects in binary mode, our method has a
big potential in application. In fact, RBT can be viewed as a
general-purpose data structure for problems concerning order,
such as search, sorting and maintaining a priority queue. For
example, when RBT is applied in sorting, we get a linear-time
algorithm with regard to the key number and its performance
is far better than quick-sort. What is more powerful than
                                                                                      Figure 1: Structure in lab view
quick-sort is that RBT supports constant-time dynamic                    Data structures optimized for external memory, and
insertion/deletion.                                                  cache-oblivious data structures. Any problem (e.g., sorting,
                                                                     priority queues) is different when you're dealing with disk
    Keywords- - information processing - dynamic ordered set -       instead of main memory, or you care about cache
algorithms and data structures - rich binary tree Introduction.      performance. Memory hierarchies have become important in
                                                                     practice because of the recent escalation in data size.
                         I.   INTRODUCTION
                                                                                     I.    COMPLEXITY OF ALGORTHIM
    Data structures play a central role in modern computer
science. You interact with data structures much more often               An algorthim is a sequence of steps that gives method of
than with algorithms (think of Google, your mail server, and         solving a problem.It creates the logic program.Efficency of
even your network routers). In addition, data structures are         alogrithm depends upon on two major criteria,first one is run
essential building blocks in obtaining efficient algorithms.         time of that algorthim and second is the space.Run time of
This course will cover major results and current directions of       algorthim means the time taken by program for
research in data structures:                                         exeucation.complexity tends to be used to characterize
    •      Classic comparison-based data structures. The area        something with many parts in intricate arrangement. The
is still rich with open problems, such as whether there is a         study of these complex linkages is the main goal of network
single best (dynamically optimal) binary search tree.                theory and network science. In science[1] there are at this
    •      Dynamic graph problems. In almost any network, a          time a number of approaches to characterizing complexity,
link's availability and speed are anything but a constant,           many of which are reflected in this article. In a business
which has led to a re-evaluation of the common                       context, complexity management is the methodology to
understanding of graph problems: how to maintain essential           minimize value-destroying complexity and efficiently
information such as a minimum-weight spanning forest                 control value-adding complexity in a cross-functional
while the graph changes.
    •      Integer data structures: beating the O(lg n) barrier in
sorting and searching. If you haven't seen this before, beating
O(lg n) may come as a surprise. If you have seen this before,
you might think that it's about a bunch of grudgy bit-tricks.
In fact, it is about fundamental issues regarding information
approach.                                                       much time might be needed in the worst case to guarantee
                                                                that the algorithm will always finish on time.
                                                                     Average performance and worst-case performance are
                                                                the most used in algorithm analysis. Less widely found is
                                                                best-case performance, but it does have uses: for example,
                                                                where the best cases of individual tasks are known, they can
                                                                be used to improve the accuracy of an overall worst-case
                                                                analysis.
                                                                     The term best-case performance is used in computer
                                                                science to describe the way an algorithm behaves under
                                                                optimal conditions. For example, the best case for a simple
                                                                linear search on a list occurs when the desired element is the
                                                                first element of the list.
                     Figure 2: Web Mining Process                    Worst-case performance analysis and average case
                 II.     CHOOSING   AN ALOGORITHM               performance analysis have some similarities, but in practice
                                                                usually require different tools and approaches.
    • For every problem there is a multitude of algorithms           On the other hand some algorithms like hash tables have
that solve the problem. So youhave a choice of algorithms to    very poor worst case behaviours, but a well written hash
code up as programs.                                            table of sufficient size will statistically never give the worst
                                                                case; the average number of operations performed follows an
    • If a program is likely to be used only once on a small    exponential decay curve, and so the run time of an operation
amount of data, then you should select the algorithm that is    is statistically bounded.
easiest to implement. Code it up correctly, run it and move          Linear search on a list of n elements. In the worst case,
on to something else.                                           the search must visit every element once. This happens when
                                                                the value being searched for is either the last element in the
    • But if the program will be used many times and has a      list, or is not in the list. However, on average, assuming the
lifetime that makes maintenance likely, then other factors      value searched for is in the list and each list element is
come into play including readability, extensibility,            equally likely to be the value searched for, the search visits
portability, reusability, ease of use and efficiency. It is     only n/2 elements.
efficiency that we be looking at in this part of the module.         Quicksort applied to a list of n elements, again assumed
                                                                to be all different and initially in random order. This popular
                                                                sorting algorithm has an average-case performance of O(n
              III.     BEST ,WORST AND AVERAGE CASE
                                                                log n), which contributes to making it a very fast algorithm
   In computer science, best, worst and average cases of a      in practice. But given a worst-case input, its performance
given algorithm express what the resource usage is at least,    degrades to O(n2).
at most and on average, respectively. Usually the resource
being considered is running time, but it could also be                             IV.   SORTING ALGORITHM
memory or other resources.                                           In computer science and mathematics, a sorting
                                                                algorithm is an algorithm that puts elements of a list in a
                                                                certain order. The most-used orders are numerical order and
                                                                lexicographical order. Efficient sorting is important to
                                                                optimizing the use of other algorithms (such as search and
                                                                merge algorithms) that require sorted lists to work correctly;
                                                                it is also often useful for canonical data and for producing
                                                                human-readable output.
                                                                     The output is in non decreasing order (each element is no
                                                                smaller than the previous element according to the desired
                                                                total order).
                                                                     since the dawn of computing, the sorting problem has
                                                                attracted a great deal of research, perhaps due to the
                                                                complexity of solving it efficiently despite its simple,
                                                                familiar statement. For example, bubble sort was analyzed as
                                                                early as 1956.Although many consider it a solved problem,
                                                                useful new sorting algorithms are still being invented (for
                                                                example, library sort was first published in 2004). Sorting
                                                                algorithms are prevalent in introductory computer science
                                                                classes, where the abundance of algorithms for the problem
                                                                provides a gentle introduction to a variety of core algorithm
                                                                concepts, such as big O notation, divide-and-conquer
    In real-time computing, the worst-case execution time is    algorithms, data structures, randomized algorithms, best,
often of particular concern since it is important to know how
worst and average case analysis, time-space trade offs, and            A red-black tree is a binary search tree where each node
lower bounds.                                                      has a color attribute, the value of which is either red or black.
                                                                   In addition to the ordinary requirements imposed on binary
                                                                   search trees, the following requirements apply to red-black
                                                                   trees:
                                                                       A node is either red or black.
                                                                       The root is black. (This rule is sometimes omitted from
                                                                   other definitions. Since the root can always be changed from
                                                                   red to black but not necessarily vice-versa this rule has little
                                                                   effect on analysis.)
                                                                       All leaves are black.
                                                                       Both children of every red node are black.
                                                                       Every simple path from a given node to any of its
                                                                   descendant leaves contains the same number of black nodes.
                                                                      Time complexity
                                                                      in big O notation
                                                                                Average           Worst case
                                                                      Space          O(n)          O(n)
                                                                      Search         O(log n)      O(log n)
                                                                      Insert         O(log n)      O(log n)
                                                                      Delete         O(log n)       O(log n)
                 Figure 3: Bubble Sort                                                  VI.   EULER TOUR TREE
                                                                       Dynamic trees, also known as link-cut trees. Link-cut
                      V.    RED BLACK TREE                         trees are able to represent a dynamic forest of rooted trees in
                                                                   O(log n) amortized time per operation. examining a simpler,
                                                                   although not strictly better, alternative to link-cut trees
    A red-black tree is a type of self-balancing binary search
                                                                   known as Euler tour trees. We will then use Euler tour trees
tree, a data structure used in computing science, typically
                                                                   to achieve dynamic connectivity in general graphs in O(log2
used to implement associative arrays. The original structure
                                                                   n) time. Finally we will survey some of what is and is not
was invented in 1972 by Rudolf Bayer[1] and named
                                                                   known for dynamic graphs.
"symmetric binary B-tree," but acquired its modern name in
a paper in 1978 by Leonidas J. Guibas and Robert
                                                                       Euler Tour trees are due to Henzinger and King [1] and
Sedgewick.[2] It is complex, but has good worst-case
                                                                   are an alternative to link-cut trees for representing dynamic
running time for its operations and is efficient in practice: it
                                                                   trees. Euler tour trees are simpler and easier to analyze than
can search, insert, and delete in O(log n) time, where n is
                                                                   link-cut trees, but do not naturally store aggregate
total number of elements in the tree. Put very simply, a red-
                                                                   information about paths in the tree. Euler tour trees are well
black tree is a binary search tree that inserts and removes
                                                                   suited for storing aggregate information on subtrees, which is
intelligently, to ensure the tree is reasonably balanced.. The
                                                                   a feature we will use in the next section. The idea behind
resulting algorithm is named as ‘Weighted Page Rank’[9].
                                                                   Euler Tour trees is to store the Euler tour of the tree. In an
    A red-black tree is a special type of binary tree, used in     arbitrary graph, an Euler tour is a path that traverses each
computer science to organize pieces of comparable data,            edge exactly once. For trees we say that each edge is
such as text fragments or numbers. Red-black trees, like all       bidirectional, so the Euler tour of a tree is the path through
binary search trees, allow efficient in-order traversal in the     the tree that begins at the root and ends at the root, traversing
fashion, Left-Root-Right, of their elements. The search-time       each edge exactly twice | once to enter the subtree, and once
results from the traversal from root to leaf, and therefore a      to exit it. The Euler tour of a tree is essentially the depth
balanced tree, having the least possible tree height, results in   traversal of a tree that returns to the root at the end.
O(log n) search time.
                                                                     input, are cited in most algorithms going quadratic in real-
                                                                     life applications.3 No matter how hard implementers try,
                                                                     they cannot (without great sacrifice of speed) defend against
                                                                     all inputs. This note describes an adversarial method that
                                                                     finds chinks in the defenses of any implementation. A
                                                                     polymorphic implementation of quicksort, such as the
                                                                     standard C function qsort, never looks at the data. It relies
                                                                     instead on an externally supplied comparison function. And
                                                                     that allows us to monitor and influence the program’s
                                                                     progress noninvasively. To do so we make a comparison
                                                                     function that observes the pattern of comparisons and
                                                                     constructs adverse data on the fly. Recall that quicksort sorts
                                                                     a sequence of n data items in three phases:
                                                                          1. Pick a data item as pivot.We assume that this phase
Figure 5:The correspondence between a tree and its Euler             uses O(1) comparisons.
tour                                                                      2. Partition the data into three parts that respectively
                                                                     contain all items less than the pivot, the pivot item itself, and
An Euler-Tour tree supports the following operations:                all items greater than the pivot. The placement of items equal
FindRoot(v) Find the root of the tree containing node v. In          to the pivot varies among implementations.
the Euler tour of a tree, the root is Visited rest and last.                        quicksort( void *a, int low, int high )
Therefore we simply return the minimum or maximum                                                         {
element in the BST.                                                                                  int pivot;
Cut(v) Cut the subtree rooted at v from the rest of the tree.                            /* Termination condition! */
Note that the Euler tour of v's subtree is a contiguous                                         if ( high > low )
subsequence of visits that starts and ends with v, contained in                                            {
the sequence of visits for the whole tree. To cut the subtree                           pivot = partition( a, low, high );
rooted at v, we may simply split the BST before its rst and                                quicksort( a, low, pivot-1 );
after its last visit to v. This splitting gives us the Euler tour of
the tree before reaching v, the tour of v's subtree, and the tour                         quicksort(   a, pivot+1, high );
of the tree after leaving v. Concatenating the rst and last                                                }
pieces together, and possibly deleting one redundant                                                      }
visitation between the end of the rst piece and beginning of
the last, gives us our answer.
Link(u; v) Insert u's subtree as a child of v. In the resulting           Quicksort is a divide and conquer algorithm. Quicksort
Euler tour, we need to traverse u's subtree immediately after        first divides a large list into two smaller sub-lists: the low
and immediately before visits to v. Therefore we will split v's      elements and the high elements. Quicksort can then
traversal before the last visit to v, and then concatenate onto      recursively sort the sub-lists.
the left piece a singleton visit to v, followed by the Euler tour         The steps are:
of u's subtree, followed by the right piece.                              Pick an element, called a pivot, from the list.
Each of these operations performs a constant number of                    Reorder the list so that all elements with values less than
search, split, and merge operations on the Euler tour tree.          the pivot come before the pivot, while all elements with
Each of these operations takes O(log n) per operation on a           values greater than the pivot come after it (equal values can
balanced BST data structure. Consequently, the total running         go either way). After this partitioning, the pivot is in its final
time for Euler-Tour trees is O(log n) per operation. If we use       position. This is called the partition operation.
a B-tree with fanout of (log n) instead of a balanced BST in              Recursively sort the sub-list of lesser elements and the
our Euler tour trees, we can achieve O(log n= log log n)             sub-list of greater elements.we boost the weights of links in
searches (from the depth of the tree) and O(log2 n= log log          whose anchor - a window of a fixed width - query terms
n) updates (from the depth times the branching factor).              occur.
                                                                          AVERAGE COMPLEXITY:-
                             QUICK SORT                                   Even if pivots aren't chosen randomly, quicksort still
                                                                     requires only time over all possible permutations of its
                                                                     input. Because this average is simply the sum of the times
 Quicksort can be made to go quadratic by constructing input         over all permutations of the input divided by n factorial, it's
on the fly in response to the sequence of items compared.The         equivalent to choosing a random permutation of the input.
technique is illustrated by a specific adversary for the             When we do this, the pivot choices are essentially random,
standard C qsort function. The generalmethod works against           leading to an algorithm with the same running time as
any implementation of quicksort–even a randomizing one–              randomized quicksort. More precisely, the average number
that satisfies certain very mild and realistic assumptions.          of comparisons over all permutations of the input sequence
When using quicksort one often feels a nagging tension:              can be estimated accurately by solving the recurrence
suppose it goes quadratic? Tactics to avoid embarrassing             relation:
results in some low-entropy cases, such as already ordered
Here, n − 1 is the number of comparisons the partition uses.
Since the pivot is equally likely to fall anywhere in the sorted
list order, the sum is averaging over all possible splits.
This means that, on average, quicksort performs only about
39% worse than the ideal number of comparisons, which is
its best case. In this sense it is closer to the best case than the
worst case. This fast average runtime is another reason for
quicksort's practical dominance over other sorting
algorithms. The disadvantage of the simple version above is
that it requires O(n) extra storage space, which is as bad as
merge sort. The additional memory allocations required can
also drastically impact speed and cache performance in
practical implementations. There is a more complex version
which uses an in-place partition algorithm and can achieve
the complete sort using O(log n). The outline of a formal
proof of the O(nlogn) expected time complexity follows.
Assume that there are no duplicates as duplicates could be
handled with linear time pre- and post-processing, or
considered cases easier than the analyzed. Choosing a pivot,
uniformly at random from 0 to n − 1, is then equivalent to
choosing the size of one particular partition, uniformly at
random from 0 to n − 1.
                                                                            Figure 8. Quick sort
                                                                      search time. The operations we are allowed to do are to
                                                                      copy a finger, follow a pointer to a node, and to retrieve or
                      VII. CONCLUSION                                 change the contents of a node’s fields. So, the BST model
                                                                      allows for a subset of the data structures allowable in the
Various algorithms and sorting used in advanced data                  pointer-machine model. We will focus on data structures
struture to arrange the data in memory. Algorithm analysis            in this model that solve the dynamic connectivity problem.
is important in practice because the accidental or
unintentional use of an inefficient algorithm can
significantly impact system performance. In time-sensitive                                  VIII. ACKNOWLEGMENT
applications, an algorithm taking too long to run can
render its results outdated or useless. Quicksort is a space-         I would like to acknowledge and extend my sincere
optimized version of the binary tree sort. Instead of                 gratitude to the following persons who have made the
inserting items sequentially into an explicit tree, quicksort         completion of this Research possible: Head of the
organizes them concurrently into a tree that is implied by            Department of Computer Science and Engineering Mr.
the recursive calls. The algorithms make exactly the same             Shafeeq Ahmad. Asst. Prof. Mr. Hemant Kumar Singh, for
comparisons, but in a different order. The most direct                providing me with timely guidance and motivation and all
competitor of quicksort is heapsort. Heapsort's worst-case            Computer Science and Engineering Department faculty
running time is always . But, heapsort is assumed to be on            members and Staff.
average somewhat slower than quicksort. This is still
debated and in research, with some publications indicating                                         REFERENCES
the opposite. In Quicksort remains the chance of worst
case performance except in the introsort variant, which
switches to heapsort when a bad case is detected. If it is            [1]    Cormen, Thomas H.; Leiserson, Charles E.; Rivest, Ronald L. &
known in advance that heapsort is going to be necessary,                     Stein, Clifford (2001). Introduction to Algorithms. Chapter 1:
                                                                             Foundations (Second ed.). Cambridge, MA: MIT Press and
using it directly will be faster than waiting for intro sort to              McGraw-Hill. pp. 3–122. ISBN 0262032937.
switch to it. Red-black trees, like all binary search trees,
                                                                      [2]    Sedgewick, Robert (1998). Algorithms in C, Parts 1-4:
allow efficient in-order traversal in the fashion, Left-Root-                Fundamentals, Data Structures, Sorting, Searching (3rd ed.).
Right, of their elements. The search-time results from the                   Reading,     MA:     Addison-Wesley      Professional.  ISBN
traversal from root to leaf, and therefore a balanced tree,                  9780201314526.
having the least possible tree height, results in O(log n)
[3]   Knuth, Donald. The Art of Computer Programming. Addison-          [10] ^ J. M. Zayed, N. Nouvel, U. Rauwald, O. A. Scherman, Chemical
      Wesley.                                                                Complexity – supramolecular self-assembly of synthetic and
[4]   Greene, Daniel A.; Knuth, Donald E. (1982). Mathematics for the        biological building blocks in water, Chemical Society Reviews,
      Analysis of Algorithms (Second ed.). Birkhäuser. ISBN                  2010,                        39,                       2806–2816
      374633102X.                                                            http://pubs.rsc.org/en/Content/ArticleLanding/2010/CS/b922348g
[5]   Goldreich, Oded (2008). Computational Complexity: A               [11] ^ Weaver, Warren (1948). "Science and Complexity". American
      Conceptual Perspective. Cambridge University Press. ISBN               Scientist       36      (4):      536.      PMID       18882675.
      9780521884730.                                                         http://www.ceptualinstitute.com/genre/weaver/weaver-1947b.htm.
                                                                             Retrieved 2007-11-21
[6]   Mathworld: Red-Black Tree
                                                                        [12] ^ R.E. Tarjan, U. Vishkin: "Finding biconnected components and
[7]   San Diego State University: CS 660: Red-Black tree notes, by           computing tree functions in logarithmic parallel time"
      Roger Whitney
                                                                        [13] A. LaMarca and R. E. Ladner. "The Influence of Caches on the
[8]   Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and          Performance of Sorting." Proceedings of the Eighth Annual ACM-
      Clifford Stein. Introduction to Algorithms, Second Edition. MIT        SIAM Symposium on Discrete Algorithms, 1997. pp. 370–379.
      Press and McGraw-Hill, 2001. ISBN 0-262-03293-7 . Chapter 13:
      Red-Black Trees                                                   [14] Faron Moller. Analysis of Quicksort. CS 332: Designing
                                                                             Algorithms. Department of Computer Science, Swansea
[9]   Brin, Sergey; Page, Lawrence; “The Anatomy of a Large-Scale            University.
      Hypertextual Web Search Engine;” 7th Int. WWW Conf.
      Proceedings, Brisbane, Australia; April 1998.                          Conrado Martínez and Salvador Roura, Optimal sampling
                                                                             strategies in quicksort and quickselect. SIAM J. Computing
                                                                             31(3):683-705, 2001.