A Comparison of Current Graph Database Models
A Comparison of Current Graph Database Models
     Abstract— The limitations of traditional databases, in partic-               The benefits of using a graph data model are given by: the
  ular the relational model, to cover the requirements of current                 introduction of a level of abstraction which allows a more
  applications has lead the development of new database technolo-                 natural modeling of graph data; query languages and operators
  gies. Among them, the Graph Databases are calling the attention
  of the database community because in trendy projects where a                    for querying directly the graph structure; and ad-hoc structures
  database is needed, the extraction of worthy information relies                 and algorithms for storing and querying graphs.
  on processing the graph-like structure of the data. In this paper               Motivation. In order to choose the most suitable graph
  we present a systematic comparison of current graph database
  models. Our review includes general features (for data storing                  database, to be used in some application domain, we need
  and querying), data modeling features (i.e., data structures,                   to known its features, advantages and disadvantages. In par-
  query languages, and integrity constraints), and the support for                ticular, the data model implemented by a database play an
  essential graph queries.                                                        important role in its selection because this ensure (in advance)
                          I. I NTRODUCTION                                        an innate support for storing and querying the data for a
                                                                                  desired application domain.
     The limitations of traditional databases, in particular the
  relational model, to cover the requirements of current applica-                 Related Work. Concerning the origins of graph databases,
  tion domains, has lead the development of new technologies                      Angles and Gutierrez [2] developed a survey on graph database
  called NOSQL databases [1]. According to its data model,                        models proposed before the year 2002. The authors synthe-
  these databases can be categorized as: Wide-column stores,                      sized the notion of a “graph database model” and compare the
  which follow the BigTable model of Google (e.g., Cassandra);                    proposals available at the moment. It is important to emphasize
  Document stores, which are oriented to store semi-structured                    that most of the works reviewed by the authors followed a
  data (e.g., MongoDB); Key-value stores, which implement a                       theoretical interest more than practical developments. Hence,
  key to value persistent map for data indexing and retrieval                     a practical evaluation of the models is not available.
  (e.g. BerkeleyDB); and Graph Databases, which are oriented                         With respect to the recent developments in the area. Pere
  to store graph-like data.                                                       Burton [9] reviewed six graph databases (Neo4j, Hyper-
     Activity around graph databases flourished in the first half                   GraphDB, DEX, InfoGrid, Sones and VertexDB) and pub-
  of the nineties and then the topic almost disappeared [2]. Re-                  lished a comparison-matrix that included information like
  cently the area is gaining attention because in trendy projects                 software features (e.g., license), schema features (e.g., types
  where a database is needed (for example chemistry [3], biology                  of nodes and edges), query features (e.g., language and traver-
  [4], Web Mining [5] and semantic Web [6]), the importance                       sals), general database features (e.g., transactions, indexing),
  of the information relies on the relations more or equal than                   database operation utilities (e.g., protocols), language bindings
  on the entities (a basic principle of every graph database).                    and operating systems. This work summarizes the features
  Moreover, the continued emergence and increase of massive                       but does not include major discussion nor analysis. Another
  and complex graph-like data makes a graph database a crucial                    informal review [10] included an interesting questionnaire
  requirement. This renascence is showed by the availability of                   about desirable features for graph databases. Domingues-Sal
  several graph databases systems.                                                et al.[11] evaluated the performance of three graph databases
     One of the most important elements conforming a database                     (Neo4j, HypergraphDB and DEX) and an RDF Database
  is its database model (or simply data model). In the most                       (Jena). The tests, that included the evaluation of several typical
  general sense a data model is a collection of conceptual tools                  graph operations over different graph sizes, shown that DEX
  used to model representations of real-world entities and the                    and Neo4j were the most efficient implementations.
  relations among these entities [7]. From a database point of                       In summary, a systematic analysis and comparison of the
  view, a data model consists of three components: a set of data                  current graph data models is, to the best of our knowledge,
  structure types, a set of operators or inference rules, and a set               not available.
  of integrity rules [8].                                                         Objective and Contribution. In this paper we compare
     Graph database models can be characterized as those where                    current graph databases concentrating on their data model
  data structures for the schema and instances are modeled as                     features, that is data structures, query facilities, and integrity
  graphs or generalizations of them, and data manipulation is ex-                 constraints. We restrict our study to the logical level and avoid
  pressed by graph-oriented operations and type constructors [2].                 physical and implementation considerations. Additionally, and
        Authorized licensed use limited to: Zhejiang University. Downloaded on October 12,2021 at 11:58:12 UTC from IEEE Xplore. Restrictions apply.
considering that graph databases are oriented to store and                      Graph Stores. This category grouped implementations pro-
query graph data, we evaluate graph databases in terms of                       viding basic facilities for storing and querying graphs. Among
their support for querying a set of essential graph queries.                    them, Filament [20] is a project for a graph storage library
   The paper is organized as follows: In Section II, we sur-                    with default support for SQL through JDB; G-Store [21]
veyed current graph databases. Section III, presents the com-                   is a basic storage manager for large vertex-labeled graphs;
parison of graph database models. The support for essential                     redis graph [22] provides a basic Python implementation for
graph queries is discussed in Section IV. Finally, in Section                   storing graphs; and VertexDB [23] implements a graph store
V we draw some conclusions and future work.                                     on top of TokyoCabinet (a B-tree key/value disk store). Ad-
                                                                                ditionally, we can mention CloudGraph [24], Horton [25] and
              II. C URRENT G RAPH DATABASES
                                                                                Trinity [26] as prototypes of graph databases.
   In the last time there have been an increasing work on graph
                                                                                Technologies related to graph databases. Graph databases
databases. Next we present a list of current implementations,
                                                                                are oriented to store any type of graph, hence they are distinct
including a short description of each of them. Considering
                                                                                from specialized data management tools that use graph notions
their level of maturity, in terms of the facilities provided by
                                                                                in their implementation. Among them we can mention: Web-
a database management system, we consider two types of
                                                                                oriented databases, which use graph structures for modeling
developments: graph databases and graph stores. Additionally,
                                                                                data used in Web-oriented applications (e.g., InfoGrid and
we review several developments related to graph databases.
                                                                                FlockDB); Document-oriented databases, which implement
Graph Databases. We assume that a graph database must                           graph algorithms in order to traverse the relations between
provide most of the major components in database manage-                        documents (e.g., OrientDB [27]); and Triple Stores (also called
ment systems, being them: external interfaces (user interface                   RDF databases), which are oriented to store RDF graphs
or API), database languages (for data definition, manipulation                   consisting in statements of the form subject-predicate-object
and querying), query optimizer, database engine (middle-level                   (e.g., 4Store, Virtuoso and Bigdata). Giraph [28] is a graph
model), storage engine (low-level model), transaction engine,                   processing infrastructure that runs on Hadoop (see Pregel);
management and operation features (tuning, backup, recovery,                    AnGrapa [29] is an large-scale graph data management frame-
etc.). Among the developments satisfying the above condition,                   work for analytical processing; GoldenOrb [30] is a cloud-
we found AllegroGraph, DEX, HypergraphDB, InfiniteGraph,                         based open source project for massive-scale graph analysis;
Neo4J and Sones.                                                                Phoebus [31] is an implementation of Google’s Pregel for
   AllegroGraph[12] is one of the precursors in the current                     distributed processing of very large graphs.
generation of graph databases. Although it was born as a                           Additionally, we can found several in-memory graph tools
graph database, its current development is oriented to meet                     characterized by their restriction to work with small graphs.
the Semantic Web standards (i.e., RDF/S, SPARQL and                             For example, we can mention complex analysis tools (e.g.,
OWL). Additionally, AllegroGraph provides special features                      Cytoscape), and graph visualization tools (e.g., JUNG, IGraph,
for GeoTemporal Reasoning and Social Network Analysis.                          GraphViz, Gephi and NodeXL).
   DEX[13], [14] provides a Java library for management of
persistent and temporary graphs. Its implementation, based on                       III. C OMPARISON OF G RAPH DATABASE M ODELS
bitmaps and other secondary structures, is oriented to ensure                      A comparison among databases is typically done by either
a good performance in the management of very large graphs.                      using a set of common features or by defining a general model
   HyperGraphDB [15], [16] is a database that implements the                    used as a comparison basis. The evaluation presented in this
hypergraph data model where the notion of edge is extended                      section is oriented to evaluate the data model provided by each
to connect more than two nodes. This model allows a natural                     graph database, in terms of data structures, query language and
representation of higher-order relations, and is particularly use-              integrity constraints.
ful for modeling data of areas like knowledge representation,                      As an initial approach we consider some general features for
artificial intelligence and bio-informatics.                                     data storing, operation and manipulation (see Table I). From
   InfiniteGraph [17] is a database oriented to support large-                   the point of view of data storing, we review the support for
scale graphs in a distributed environment. It aims the efficient                 three storing schemas (i.e., main memory, external memory,
traversal of relations across massive and distributed data                      and back-end storage) and the implementation of indexes. It is
stores. Its focus of attention is to extend business, social and                important to emphasize that managing a huge amount of data
government intelligence with graph analysis.                                    is a important requirement in real-life applications for graph
   Neo4j [18] is based on a network oriented model where                        databases. Hence the support for external memory storage is
relations are first class objects. It implements an object-                      a main requirement. Additionally, indexes are the basis to
oriented API, a native disk-based storage manager for graphs,                   improve data retrieval operations.
and a framework for graph traversals.                                              From the point of view of data operation and manipulation,
   Sones [19] is a graph database which provides an inherent                    we evaluate whether a graph database implements database
support for high-level data abstraction concepts for graphs                     languages, application programming interfaces (API) and
(e.g., walks). It defines its own graph query language and a                     graphical user interfaces (GUI). We consider three database
underlying distributed file system.                                              languages: the Data Definition Language, which allows to
                                                                          172
                                                                          163
      Authorized licensed use limited to: Zhejiang University. Downloaded on October 12,2021 at 11:58:12 UTC from IEEE Xplore. Restrictions apply.
modify the schema of the database by adding, changing, or                          We consider four graph data structures: simple graphs,
deleting its objects; the Data Manipulation Language, which                     hypergraphs, nested graphs and attributed graphs. The basic
allows to insert, delete and update data in the database; and                   structure is a simple flat graph defined as a set of nodes
the Query Language, which allows to retrieve data by using                      (or vertices) connected by edges (i.e., a binary relation over
a query expression. Data operation and manipulation features                    the set of nodes). An Hypergraph extends this notion by
are summarized in Table II.                                                     allowing an edge to relate an arbitrary set of nodes (called
   In comparison with the traditional approach in databases,                    an hyperedge). A nested graph is a graph whose nodes can
where high-level languages for data operation and manipu-                       be themselves graphs (called hypernodes). Attributed graphs
lation are provided, the most common mechanism in graph                         are graphs where nodes and edges can contain attributes for
databases is the use of APIs. It means several advantages:                      describing their properties [32]. Additionally, over the above
standard vocabulary (for functions and procedures), easy de-                    types of graphs, we consider directed or undirected edges,
velopment of applications, and an unlimited power for query-                    labeled or unlabeled nodes/edges, and attributed nodes/edges
ing data. However, it also brings serious problems: lower level                 (i.e., edges between edges are possible).
of abstraction (for the general user), programming language                        Note that most graph databases are based on simple graphs
restrictions, implementation-dependent efficiency, and decid-                    or attributed graphs. Only two support hypergraphs and no
ability problems.                                                               one nested graphs. We can remark that hypergraphs and
   An important feature, not included in Table I, is the support                attributed graphs can be modeled by nested graphs. In contrast,
to import and export data in different data formats. Although                   the multilevel nesting provided by nested graphs cannot be
there exists some data formats for encoding graphs (e.g,                        modeled by any of the other structures [2].
GraphML and TGV) none of them has been selected as                                 In comparison with past graph database models, the in-
the standard one. This issue is particularly relevant for data                  clusion of attributes for nodes and edges is a particular
exchange and sharing.                                                           feature in current proposals. The introduction of attributes is
                             TABLE I                                            oriented to improve the speed of retrieval for the data directly
                      DATA STORING FEATURES                                     related to a given node. This feature shows the influence of
                                                                                implementation issues in the selection and definition of the
         Graph           Main      External    Backend     Indexes
                                                                                data structures (and consequently of the data model).
        Database        memory     memory      Storage
      AllegroGraph        •           •                         •                                             TABLE III
          DEX             •           •                         •                                      G RAPH DATA STRUCTURES
        Filament          •                       •
         G-Store                        •
     HyperGraphDB          •            •         •             •                                                       Graphs                                            Nodes                                      Edges
                                        •                       •
                                                                                                                                                    Attributed graphs
      InfiniteGraph
Node attribution
                                                                                                                                                                                                                                     Edge attribution
          Neo4j            •            •                       •
                                                                                                      Simple graphs
Nested graphs
Node labeled
                                                                                                                                                                                                                      Edge labeled
                                                                                                                      Hypergraphs
          Sones            •                                    •
        vertexDB                        •         •
                                                                                                                                                                                                          Directed
                                                                                   Graph Database
                               TABLE II                                             AllegroGraph        •                                                                •                                •             •
                                                                                        DEX                                                            •                 •              •                 •             •               •
             O PERATION AND MANIPULATION FEATURES                                     Filament          •                                                                •                                •             •
                                                                                       G-Store          •                                                                •                                •             •
                    Data          Data         Query      API       GUI            HyperGraphDB                         •                                                •                                •             •
     Graph        Definition     Manipulat.    Language                              InfiniteGraph                                                       •                 •              •                 •             •               •
    Database      Language      Language                                                Neo4j                                                          •                 •              •                 •             •               •
  AllegroGraph       •             •              •         •        •                  Sones                           •                              •                 •              •                 •             •               •
      DEX                                                   •                         vertexDB          •                                                                •                                •             •
    Filament                                                •
     G-Store           •                          •         •
 HyperGraphDB                                               •                      The expressive power for data modeling can be analyzed by
  InfiniteGraph                                              •
      Neo4j                                                 •
                                                                                comparing the support for representing entities, properties and
      Sones            •            •             •         •        •          relations at both instance and schema levels. This evaluation
    vertexDB                                                •                   is shown in Table IV.
                                                                                   At the schema level we found that models support the defini-
                                                                                tion of node, attribute and relation types. We also evaluate the
A. Graph data structures                                                        support for several nodes and relations at the instance level: an
   The data structures refer to the types of entities or objects                object node, identified by an object-ID, represents an instance
that can be used to model data. In the case of graph databases,                 of a node type; a value node represents an entity identified
the data structure is naturally defined around the notions of                    by a primitive value (i.e., its name); a complex node can
graphs, nodes and edges (see Table III).                                        represent an special complex entity, for example a tuple or a
                                                                          173
                                                                          164
      Authorized licensed use limited to: Zhejiang University. Downloaded on October 12,2021 at 11:58:12 UTC from IEEE Xplore. Restrictions apply.
set; an object relation, identified by a relation-ID, is an instance                                                                                                                   Data retrieval is the main objective in current graph
of a relation type; a simple relation represents a node-edge-                                                                                                                      databases. AllegroGraph supports reasoning via its Prolog
node instance; a complex relation is a relation with special                                                                                                                       implementation. Data analysis is supported in terms of special
semantics, for example grouping, derivation, and inheritance.                                                                                                                      functions (e.g., shortest path) for querying graph properties.
   Value nodes and simple relations are supported by all the                                                                                                                          The lack of a standard query language is a disadvantage
models. The reason is that both conform the most basic and                                                                                                                         of current graph databases. Recall that in mature databases
simple model for representing graph data. The inclusion of                                                                                                                         the operation of the database is performed via standard and
object-oriented concepts (e.g., IDs for objects) for representing                                                                                                                  well-defined database languages. Instead, the focus in current
entities and relations reflects the use of APIs as the favorite                                                                                                                     graph databases is to provide APIs for popular programming
high-level interface for the database. Note that this issue is not                                                                                                                 languages. Hence, the selection is hardly determined by the
new in graph databases. In fact, it was naturally introduced by                                                                                                                    programmer skills or by application requirements.
the so called graph object-oriented data models [2].
   Finally, the use of objects (for both nodes and relations)                                                                                                                                                  TABLE V
is different of using values. For example, an object node                                                                                                                             C OMPARISON OF QUERY FACILITIES (• INDICATES SUPPORT, AND ◦
represents an entity identified by an object-ID, but it does                                                                                                                                               PARTIAL SUPPORT )
                                                                                                                                                                                                                                    Graphical Q. L.
in order to specify the name of the entity. The same applies for
                                                                                                                                                                                                               Query Lang.
relations. This issue generates an unnatural form of modeling
                                                                                                                                                                                                                                                                  Reasoning
                                                                                                                                                                                                                                                      Retrieval
                                                                                                                                                                                                                                                                              Analysis
graph data.
                                                                                                                                                                                                                              API
                           TABLE IV                                                                                                                                                          Graph Database
                                                                                                                                                                                              AllegroGraph       ◦            •        •               •            •          •
           R EPRESENTATION OF ENTITIES AND RELATIONS
                                                                                                                                                                                                  DEX                         •                        •                       •
                                                                                                                                                                                                Filament                      •                        •
                                  Schema                                                            Instance                                                                                     G-Store         •                                     •
                                                                                                                                                                                             HyperGraphDB                     •                        •
                                                                                                                                                         Complex relations
                                                                                                                                                                                                                              •                        •
                                                                                                                                      Simple relations
                                                                                                                                                                                              InfiniteGraph
                                                                                                                   Object relations
                                                                                                   Complex nodes
                                    Property types
Relation types
                                                                                                                                                                                                  Neo4j          ◦            •                        •
                                                                      Object nodes
                                                                                     Value nodes
                     Node types
                                                                                                                                                                                                  Sones          •                     •               •                       •
                                                                                                                                                                                                vertexDB                      •                        •
   Graph Database
    AllegroGraph                                                                      •                                                  •
        DEX            •                                •               •             •                               •                  •
      Filament                                                                        •                                                  •                                         C. Integrity constraints
       G-Store                                                                        •                                                  •
   HyperGraphDB        •                                •                             •                                                  •                  •                         Integrity constraints are general statements and rules that
    InfiniteGraph       •                                •               •             •                               •                  •                                         define the set of consistent database states, or changes of state,
        Neo4j                                                           •             •                               •                  •                                         or both [2]. Table VI shows that integrity constraints are poorly
        Sones                                                                         •                                                  •                  •
      vertexDB                                                                        •                                                  •
                                                                                                                                                                                   studied in graph databases. In fact, there are not important
                                                                                                                                                                                   variations of the notions studied in the past.
                                                                                                                                                                                      We consider several integrity constraints: types checking,
                                                                                                                                                                                   to test the consistency of an instance with respect to the
B. Query languages                                                                                                                                                                 definitions in the schema; node/edge identity, to verify that
   A query language is a collection of operators or inference                                                                                                                      an entity or relation can be identified by either a value (e.g.,
rules that can be applied to any valid instance of the database,                                                                                                                   name or ID) or the values of its attributes (e.g., neighborhood
this with the objective of manipulating and querying data in                                                                                                                       identification); referential integrity, to test that only existing
any combination desired [2]. As is shown in Table III, query                                                                                                                       entities are referenced; cardinality checking, to verify unique-
languages are not frequent in current graph databases. In fact,                                                                                                                    ness of properties or relations; functional dependency, to test
there is not proposal for a standard one.                                                                                                                                          that an element in the graph determines the value of another;
   AllegroGraph supports SPARQL, the standard query lan-                                                                                                                           and graph pattern constraints, to verify an structural restriction
guage for RDF. SPARQL is based on graph pattern matching                                                                                                                           (e.g., path constraints).
but is not oriented to querying the graph structure of RDF                                                                                                                            The support for evolving schemas is a characteristic of graph
data. Neo4j is developing Cypher, a query language for                                                                                                                             databases that is commonly used to justify the lack of integrity
property graphs. G-Store and Sones include SQL-based query                                                                                                                         constraints. We aim that is not a valid argument assuming that
languages with special instructions for querying graphs. To the                                                                                                                    data consistency in a database is equal or even more important
best of our knowledge, there is not a formal definition of the                                                                                                                      than a flexible schema. Moreover, an evolving schema can be
semantics for the above query languages, making a systematic                                                                                                                       supported by allowing flexible structures in the schema (as in
study of their complexity and expressive power difficult.                                                                                                                           semi-structure data models). For example, the definition of a
                                                                                                                                                                             174
                                                                                                                                                                             165
      Authorized licensed use limited to: Zhejiang University. Downloaded on October 12,2021 at 11:58:12 UTC from IEEE Xplore. Restrictions apply.
relation type as optional, enables the user to decide either the                                                                                                             complexity. For example, finding simple paths with desired
inclusion or the absence of such relation for a given entity.                                                                                                                properties in direct graphs is an NP-complete problem [34].
                                                                                                                                                                                3) Pattern matching queries: Graph pattern matching con-
                               TABLE VI
                                                                                                                                                                             sists in to find all sub-graphs of a data graph that are
               C OMPARISON OF INTEGRITY CONSTRAINTS
                                                                                                                                                                             isomorphic to a pattern graph. Particularly, it deals with two
                                                                                                                                                                             problems: the graph isomorphism problem that has a unknown
                                                                                                                    Functional dependency
                                                                                             Cardinality checking
                                                                                                                                                                             problem which is an NP-complete problem [33].
                                                                     Referential integrity
                                                Node/edge identity
                               Types checking
                                                                                                                                                                                Pattern matching has attracted a great deal of attention
                                                                                                                                                                             in database theory [33], [34], [46], [47], data mining [48],
                                                                                                                                                                             bioinformatics [49], and semantic Web [50].
                                                                                                                                                                                4) Summarization queries: This type of queries are not
           Graph Database                                                                                                                                                    related to consult the graph structure. Instead they are based
                DEX               •                 •                    •
           HyperGraphDB           •                 •
                                                                                                                                                                             on special functions that allow to summarize or operate on
            InfiniteGraph          •                 •                                                                                                                        the query results, normally returning a single value. Aggregate
               Sones                                •                                            •                                                                           functions (e.g., average, Count, maximum, etc.) are included
                                                                                                                                                                             in this group.
                                                                                                                                                                                Additionally, we consider functions to compute some prop-
  IV. C URRENT G RAPH DATABASES AND THEIR SUPPORT                                                                                                                            erties of a graph and its elements. For example: the order of
                      FOR QUERYING GRAPHS                                                                                                                                    the graph (i.e., the number of vertices), the degree of a node
                                                                                                                                                                             (i.e., the number of neighbors of the node), the minimum,
   In the database literature we can found several works
                                                                                                                                                                             maximum and average degree in the graph, the length of a path
studying problems related to storing and querying graphs [33],
                                                                                                                                                                             (i.e., the number of edges in the path), the distance between
[34]. Such problems play an important role for applications
                                                                                                                                                                             nodes (i.e., the length of a shortest path between the nodes),
that use graphs as their data model [4]. In this section, we
                                                                                                                                                                             the diameter of the graph (i.e., the greatest distance between
compare graph databases in terms of their facilities to solve
                                                                                                                                                                             any two nodes), etc.
several queries which can be considered essential in graphs.
                                                                                                                                                                                In Table VII, we summarized the support that current graph
We grouped then in adjacency, reachability, pattern matching
                                                                                                                                                                             databases provide for answering the queries defined above.
and summarization queries.1
                                                                                                                                                                             Recall that most graph databases implements an API instead
   1) Adjacency queries: The primary notion in this type of
                                                                                                                                                                             of a query language. Hence, we evaluate whether an API
queries is node/edge adjacency. Two nodes are adjacent (or
                                                                                                                                                                             can answer an essential query by using a combination of
neighbors) when there is and edge between them. Similarly,
                                                                                                                                                                             basic functions, more than the facilities to implement an
two edges are adjacent when they share a common node.
                                                                                                                                                                             algorithm that solve the query (feature clearly supported by
   Some typical queries in this group are: basic node/edge
                                                                                                                                                                             a programming language).
adjacency, to test whether two nodes (edges) are adjacent [36];
                                                                                                                                                                                Additionally, Table VIII presents the results of a previous
and k-neighborhood of a node, to list all the neighbors of the
                                                                                                                                                                             study [35] about (past) graph query languages and their sup-
node [37]. Adjacency queries are useful in several contexts, for
                                                                                                                                                                             port for querying essential graph queries. Such study provides
example in molecular biology [38], information retrieval[39]
                                                                                                                                                                             a positive conclusion about the feasibility of developing a well-
and the semantic Web [40].
                                                                                                                                                                             designed graph query language. It is important to note that
   2) Reachability queries: This type of queries is character-
                                                                                                                                                                             these languages were well-studied from a theoretical point
ized by path or traversal problems. The problem of reachability
                                                                                                                                                                             of view. Hence, they provide a formal background for the
tests whether two given nodes are connected by a path.
                                                                                                                                                                             definition of a standard query language for graph databases.
   In this context, we can consider two types of paths: fixed-
length paths, which contain a fixed number of nodes and                                                                                                                                             V. C ONCLUSIONS
edges; and regular simple paths, which allow some node and
                                                                                                                                                                                In this paper we surveyed current graph databases and
edge restrictions (e.g., regular expressions). A related but more
                                                                                                                                                                             compare them according to their data modeling features. We
complicated problem is to find the shortest path, that is to
                                                                                                                                                                             shown that most graph database models provide an innate
compute the quickest/shortest route between two nodes.
                                                                                                                                                                             support for different graph structures, query facilities in the
   Reachability queries are studied and required in several
                                                                                                                                                                             form of APIs (most of the models) and query languages
application domains. For example in database theory [41],
                                                                                                                                                                             (a few of them), and basic notions of integrity constraints.
[42] (where recursive queries are in practice graph traversals),
                                                                                                                                                                             Additionally, we defined a set of essential graph queries and
spatial databases [43], biological databases [44], and the
                                                                                                                                                                             evaluated the query facilities provided by graph databases in
semantic Web [45]. One of the challenges to incorporate reach-
                                                                                                                                                                             order to answer such queries.
ability queries into a query language is their computational
                                                                                                                                                                                The review shown that some aspects of current graph
  1 A similar classification was proposed and used in a previous evaluation                                                                                                   database models deserve more development. In particular, the
of graph query languages [35].                                                                                                                                               definition of standard graph database languages (for defining,
                                                                                                                                                                       175
                                                                                                                                                                       166
       Authorized licensed use limited to: Zhejiang University. Downloaded on October 12,2021 at 11:58:12 UTC from IEEE Xplore. Restrictions apply.
                                                TABLE VII                                                                                                                                                                                                      [4] B. A. Eckman and P. G. Brown, “Graph data management for molecular
  C URRENT GRAPH DATABASES AND THEIR SUPPORT FOR ESSENTIAL                                                                                                                                                                                                         and cell biology,” IBM Journal of Research and Development, vol. 50,
                                                                                                                                                                                                                                                                   no. 6, pp. 545–560, 2006.
                                        GRAPH QUERIES                                                                                                                                                                                                          [5] A. Schenker, H. Bunke, M. Last, and A. Kandel, Graph-Theoretic
                                                                                                                                                                                                                                                                   Techniques for Web Content Mining, ser. Series in Machine Perception
                                                                                                                                                                                                                                                                   and Artificial Intelligence. World Scientific, 2005, vol. 62.
                          Adjacency                                                                          Reachability
                                                                                                                                                                                                                                                               [6] J. Hayes and C. Gutierrez, “Bipartite Graphs as Intermediate Model for
Node/edge adjacency
                                                                                                       Fixed-length paths
                                                                                                                                                                                                                                                                   (ISWC), ser. LNCS, no. 3298. Springer-Verlag, Nov 2004, pp. 47–61.
                                                                                                                                                                                                                     Pattern matching
                                                                 k-neighborhood
                                                                                                                                                                                                                                                               [7] A. Silberschatz, H. F. Korth, and S. Sudarshan, “Data Models,” ACM
                                                                                                                                                                                                                                        Summarization
                                                                                                                                                                                 Shortest path
                                                                                                                                                                                                                                                                   Computing Surveys, vol. 28, no. 1, pp. 105–108, 1996.
                                                                                                                                                                                                                                                               [8] E. F. Codd, “Data Models in Database Management,” in Proceedings
                                                                                                                                                                                                                                                                   of the 1980 Workshop on Data abstraction, Databases and Conceptual
                                                                                                                                                                                                                                                                   Modeling. ACM Press, 1980, pp. 112–114.
      Graph Database                                                                                                                                                                                                                                           [9] P.        Urbón,       “Nosql        graph       database        matrix,”
          Allegro             •                                                                            •                                                                                                            •                                          http://nosql.mypopescu.com/post/619181345/nosql-graph-database-
           DEX                •                                                                            •                                     •                                 •                                    •                                          matrix, May 2010.
         Filament             •                                                                            •                                                                                                            •                                     [10] “Short overview on the emerging world of graph databases,”
          G-Store             •                                                                            •                                     •                                 •                                    •                                          http://www.graph-database.org/overview.html.
       HyperGraph             •                                                                                                                                                                                         •                                     [11] D. Dominguez-Sal, P. Urbón-Bayes, A. Giménez-Vañó, S. Gómez-
          Infinite             •                                                                            •                                     •                                 •                                    •                                          Villamor, N. Martı́nez-Bazán, and J. L. Larriba-Pey, “Survey of graph
           Neo4j              •                                                                            •                                     •                                 •                                    •                                          database performance on the hpc scalable graph analysis benchmark,”
           Sones              •                                                                                                                                                                                         •                                          in Proc. of the 2010 international conference on Web-age information
         vertexDB             •                                                                            •                                     •                                                                      •                                          management (WAIM). Springer-Verlag, 2010, pp. 37–48.
                                                                                                                                                                                                                                                              [12] “AllegroGraph,” http://www.franz.com/agraph/allegrograph/.
                                                                                                                                                                                                                                                              [13] N. Martı́nez-Bazan, V. Muntés-Mulero, S. Gómez-Villamor, J. Nin, M.-
                                                TABLE VIII                                                                                                                                                                                                         A. Sánchez-Martı́nez, and J.-L. Larriba-Pey, “DEX: High-Performance
                                                                                                                                                                                                                                                                   Exploration on Large Graphs for Information Retrieval,” in Proceedings
 PAST GRAPH QUERY LANGUAGES AND THEIR SUPPORT FOR ESSENTIAL
                                                                                                                                                                                                                                                                   of the 16th Conference on Information and Knowledge Management
   GRAPH QUERIES   (• INDICATES SUPPORT, AND ◦ PARTIAL SUPPORT )                                                                                                                                                                                                   (CIKM). ACM, 2007, pp. 573–582.
                                                                                                                                                                                                                                                              [14] “DEX,” http://www.sparsity-technologies.com/dex.
                                                                                                                                                                                                                                                              [15] “HyperGraphDB,” http://www.hypergraphdb.org/.
                                                                                                                                                                                            Distance between nodes
                                                                                                                                                                                                                                                        176
                                                                                                                                                                                                                                                        167
      Authorized licensed use limited to: Zhejiang University. Downloaded on October 12,2021 at 11:58:12 UTC from IEEE Xplore. Restrictions apply.
[39] C.-S. Chang and A. L. P. Chen, “Supporting conceptual and neighbor-
     hood queries on the world wide web,” IEEE Transactions on Systems,
     Man, and Cybernetics (TSMC), vol. 28, no. 2, pp. 300–308, 1998.
[40] R. Guha, R. McCool, and E. Miller, “Semantic Search,” in Proceedings
     of the 12th International Conference on World Wide Web (WWW). ACM
     Press, 2003, pp. 700–709.
[41] P. Barceló, C. Hurtado, L. Libkin, and P. Wood, “Expressive Languages
     for Path Queries over Graph-structured Data,” in Proc. of the 29th
     ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database
     Systems (PODS), 2010, pp. 3–14.
[42] W. Fan, J. Li, S. Ma, N. Tang, and Y. Wu, “Adding regular expressions
     to graph reachability and pattern queries,” in Proc. of the IEEE 27th
     International Conference on Data Engineering (ICDE), 2011, pp. 39–
     50.
[43] J. Paredaens and B. Kuijpers, “Data Models and Query Languages for
     Spatial Databases,” Data & Knowledge Engineering (DKE), vol. 25, no.
     1-2, pp. 29–53, 1998.
[44] A. Matono, T. Amagasa, M. Yoshikawa, and S. Uemura, “An Eficient
     Pathway Search Using an Indexing Scheme for RDF,” Genome Infor-
     matics, no. 14, pp. 374–375, 2003.
[45] A. Gubichev and T. Neumann, “Path query processing on very large rdf
     graphs,” in Proc. of the 14th International Workshop on the Web and
     Databases (WebDB), 2011.
[46] P. Barcelo, L. Libkin, and J. Reutter, “Querying graph patterns,” in Proc.
     of the 30th ACM SIGMOD-SIGACT-SIGART Symposium on Principles
     of Database Systems (PODS), 2011, pp. 199–210.
[47] J. Cheng, J. X. Yu, B. Ding, P. S. Yu, and H. Wang, “Fast graph pattern
     matching,” in Proc. of the 2008 IEEE 24th International Conference on
     Data Engineering (ICDE). IEEE Computer Society, 2008, pp. 913–922.
[48] T. Washio and H. Motoda, “State of the Art of Graph-based Data
     Mining,” SIGKDD Explorer Newsletter, vol. 5, no. 1, pp. 59–68, 2003.
[49] X. Wang, “Finding patterns on protein surfaces: Algorithms and appli-
     cations to protein classification,” IEEE Transactions on Knowledge and
     Data Engineering, vol. 17, pp. 1065–1078, 2005.
[50] J. Carroll, “Matching RDF Graphs,” in Proceedings of the International
     Semantic Web Conference (ISWC), 2002.
                                                                                  177
                                                                                  168
Authorized licensed use limited to: Zhejiang University. Downloaded on October 12,2021 at 11:58:12 UTC from IEEE Xplore. Restrictions apply.