RDF and Property Graphs
RDF and Property Graphs
Olaf Hartig
University of Waterloo
http://olafhartig.de
arXiv:1409.3288v2 [cs.DB] 13 Nov 2014
Abstract
Both the notion of Property Graphs (PG) and the Resource Description Framework (RDF) are
commonly used models for representing graph-shaped data. While there exist some system-
specific solutions to convert data from one model to the other, these solutions are not entirely
compatible with one another and none of them appears to be based on a formal foundation. In
fact, for the PG model, there does not even exist a commonly agreed-upon formal definition.
The aim of this document is to reconcile both models formally. To this end, the document
proposes a formalization of the PG model and introduces well-defined transformations between
PGs and RDF. As a result, the document provides a basis for the following two innovations: On
one hand, by implementing the RDF-to-PG transformations defined in this document, PG-based
systems can enable their users to load RDF data and make it accessible in a compatible, system-
independent manner using, e.g., the graph traversal language Gremlin or the declarative graph
query language Cypher. On the other hand, the PG-to-RDF transformation in this document
enables RDF data management systems to support compatible, system-independent queries over
the content of Property Graphs by using the standard RDF query language SPARQL. Addi-
tionally, this document represents a foundation for systematic research on relationships between
the two models and between their query languages.
1 Introduction
This document reconciles two commonly used graph-based data models, namely the Property
Graphs model which is used by popular graph database systems such as Neo4j1, Titan2 and Spark-
see3, and the RDF data model [CWL+ 14], which has been standardized by the World Wide Web
Consortium (W3C) and is supported by numerous systems including IBM’s DB24, OpenLink’s Vir-
tuoso5 and Systap’s Bigdata6. The primary goal of this reconciliation is to enable a user who is
familiar with either of these data models to access data, represented in the other model, based on
a well-defined, system-independent view of this data given in the model most familiar to the user.
1
http://neo4j.org/
2
http://thinkaurelius.github.com/titan/
3
http://sparsity-technologies.com/#sparksee
4
http://www-01.ibm.com/software/data/db2/linux-unix-windows/nosql-support.html
5
http://virtuoso.openlinksw.com/
6
http://www.systap.com/
1
The document is structured as follows: Section 2 gives a very brief informal overview of Property
Graphs, RDF, and an extension of RDF that provides for a more user-friendly representation of
statement-level metadata and serves as a basis of the reconciliation. Section 3 then describes the
proposal informally. Thereafter, Sections 4 to 7 provide the formal details.
Example 1. Figure 1 illustrates a simple Property Graph with two nodes and two relationships
between them. One of these relationships has the label mentioned, the start node is the node for
Orson Welles, and the end node is the node for Stanley Kubrick. The other relationship, labeled
influencedBy, starts from the Kubrick node and ends in the Welles node. The light gray boxes as-
sociated with some of the graph elements represent properties of these elements. For instance, the
node for Stanley Kubrick has two properties giving the name and the birth year of the famous direc-
tor, and the influencedBy relationship has a property that specifies the certainty of whether Kubrick
was influenced by Welles. The other relationship does not have a property in the given graph.
In contrast to Property Graphs, RDF is a standardized data model. This model represents data as
sets of triples where each triple consists of three elements that are referred to as the subject, the
predicate, and the object of the triple. These triples allow users to describe arbitrary things
in terms of their attributes and their relationships to other things. That is, such things may be
the subject or object of an RDF triple in which, usually, they are denoted by an Internationalized
Resource Identifier (IRI)—which is a form of a Web-wide unique identifier (hence, in contrast to
vertex and edge identity that exists only within a given Property Graph, IRIs identify things across
datasets). Relationships and attributes are also denoted by IRIs, which appear as the predicate
of RDF triples. Finally, attribute values in RDF are called literals and appear in the object of
RDF triples. Any set of RDF triples can be conceived of as a directed graph in which each triple
2
Figure 2: A simple RDF graph consisting of four RDF triples.
represents an edge from the subject to the object of that triple; hence, the vertices in this graph
are all subjects and objects of all triples in the set.
Example 2. The following RDF data, represented in the Turtle format [PC14], contains four RDF
triples that describe two persons, Alice and Bob, denoted by the IRIs http://example.org/alice
and http://example.org/bob, respectively. The objects of the last three triples are literals. Fig-
ure 2 illustrates the graph representation of these four triples.
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix ex: <http://example.org/> .
A shortcoming that RDF has been widely criticized for is the lack of an approach to represent
statement-level metadata that is as intuitive and user-friendly as edge properties in Property
Graphs (such as the certainty statement in the sample Property Graph in Figure 1). While RDF
provides a notion of reification to support this use case [HPS14], this approach is awkward to use,
the resulting metadata is cumbersome to query, and it may blow up the dataset size significantly.
However, a recently proposed extension of RDF addresses this shortcoming by making triples about
triples a first class citizen in the data model [HT14]. That is, this extension, which is called RDF⋆,
augments RDF with the possibility to use a triple directly as the subject or object of other triples.
Example 3. Consider an extended version of the RDF Turtle format, called Turtle⋆ [HT14], that
implements the idea of embedding RDF triples into other RDF triples by enclosing the embedded
triple in ’<<’ and ’>>’. Then, the data in Example 2 can be augmented with metadata as follows:
<<ex:alice foaf:knows ex:bob>> ex:certainty 0.5 .
ex:alice foaf:name "Alice" .
ex:bob foaf:name "Bob" .
<<ex:bob foaf:age 23>> ex:certainty 0.9 .
Note that the last line represents a metadata triple whose subject is the triple that provides infor-
mation about the age of Bob, and the metadata triple describes the certainty of this information.
Similarly, the first line represents a metadata triple about the triple that is embedded as subject.
It is important to emphasize that RDF⋆ is simply a syntactic extension of RDF that makes dealing
with statement-level metadata more intuitive. In fact, there exists a well-defined transformation7
of RDF⋆ data back to standard RDF data [HT14]. On the other hand, standard RDF data can be
understood trivially as RDF⋆ data. This document leverages these properties and uses RDF⋆ as a
basis for reconciling the RDF data model and the Property Graphs model.
7
In a nutshell, this transformation employs standard RDF reification to convert any RDF⋆ metadata triple into a
set of ordinary RDF triples [HT14].
3
3 Informal Overview of the Proposal
This document formalizes three transformations: two from RDF⋆ data to Property Graphs, and
one from Property Graphs to RDF⋆ data. All three transformations also cover ordinary RDF data
because of the aforementioned relationship between RDF⋆ and the standard RDF data model. This
section provides an informal overview of the transformations and outlines some use cases.
Figure 3: An (RDF-like) Property Graph that represents the RDF⋆ data in Example 3.
A caveat of the straightforward approach to represent RDF⋆ data as Property Graphs is that the
transformation is not applicable in some cases because of differences in the expressiveness of RDF⋆
and RDF on the one hand and Property Graphs on the other. For instance, the object of a
metadata triple (with an RDF triple as its subject) may be an IRI that is also mentioned in other
triples, whereas the value of an edge (or vertex) property cannot be a vertex itself. For a more
detailed discussion of these differences and a formal characterization of the RDF⋆ data that the
transformation can be applied to, refer to Section 4.3. A formalization of the transformation itself
can be found in Section 5.
Even if not generally applicable, for the large number of cases in which the transformation can be
applied, the transformation can be used to enable users who are familiar with the graph query lan-
guage Cypher to query RDF/RDF⋆ data without having to learn the RDF query language SPARQL.
Example 5. The following Cypher query can be used to query the sample data to find the name of
persons known by somebody whose name is Alice (an execution over the sample data would return
a single value, namely, Bob).
START a=node(*)
MATCH (a)-[:foaf:name]->(bn { literal="Alice" }),
(a)-[:foaf:knows]->(p)-[:foaf:name]->(pn)
RETURN pn.literal
4
An equivalent 8 SPARQL representation of this Cypher query is given as follows.
SELECT ?pn WHERE {
?a foaf:name "Alice" .
?a foaf:knows ?p .
?p foaf:name ?pn }
The example demonstrates that Property Graphs obtained by the proposed transformation can
be queried using Cypher queries for which there exist corresponding SPARQL over the original
RDF (or RDF⋆ ) data. However, the transformation also enables users to benefit from features of
Cypher that are not available in SPARQL. For instance, Cypher allows for path expressions that
are more powerful than the property path feature provided by SPARQL 1.1. Another example is
querying statement-level metadata by accessing the corresponding edge properties using Cypher.
Example 6. Consider an extension of the query in the previous example with the additional condi-
tion that the certainty of the information that Alice knows the listed persons must be greater than 0.7.
The following Cypher query captures this condition by filtering on the corresponding edge property.
START a=node(*)
MATCH (a)-[:foaf:name]->(bn { literal="Alice" }),
(a)-[r:foaf:knows]->(p)-[:foaf:name]->(pn)
WHERE r.ex:certainty > 0.7
RETURN pn.literal
Any system that is based on Property Graphs can enable users to query RDF (or RDF⋆ ) data using
Cypher as outlined in the examples. To this end, the system only has to implement an import
procedure that applies the first (or the second) transformation as formalized in this document. On
the other hand, any RDF-based system—independent of whether it supports the RDF⋆ extension
or not—may support Cypher queries on top of a virtual Property Graph view of the data (where the
view is defined by the given transformation). However, for RDF⋆-enabled systems, if the primary
use case for supporting Cypher are more user-friendly queries over statement-level metadata (as
demonstrated in Example 6), a more native approach is SPARQL⋆ , which is an RDF⋆-specific
extension of SPARQL that carries over the idea of embedding triples and comes with a well-defined
query semantics [HT14].
Example 7. The following SPARQL⋆ query is equivalent to the Cypher query in Example 6.
SELECT ?pn WHERE {
?a foaf:name "Alice" .
<<?a foaf:knows ?p>> ex:certainty ?c .
FILTER ( ?c > 0.7 )
?p foaf:name ?pn }
An alternative form of this SPARQL⋆ query is to use a BIND clause as follows.
SELECT ?pn WHERE {
?a foaf:name "Alice" .
BIND( <<?a foaf:knows ?p>> AS ?t )
?t ex:certainty ?c .
FILTER ( ?c > 0.7 )
?p foaf:name ?pn }
8
The equivalence cannot be shown formally because Cypher does not have a formally defined query semantics.
5
Figure 4: A simple Property Graph that represents the RDF⋆ data in Example 3 partially.
6
Example 9. An RDF⋆ representation (given in Turtle⋆ format) that is the result of applying the
proposed transformation to the Property Graph in Example 1 (cf. Figure 1) is given as follows:
@prefix p: <http://example.org/property/> .
@prefix r: <http://example.org/relationship/> .
The symbols :b1 and :b2 represent RDF blank nodes in Turtle and in Turtle⋆ format. Blank nodes
can be understood as dataset-specific identifiers that cannot be referred to from other datasets (in
contrast to IRIs). Hence, a blank node in some RDF or RDF⋆ data is similar to a vertex identity
in a Property Graph. However, while this example assumes a mapping of vertices to blank nodes,
the transformation also allows a user to specify a mapping of vertices to IRIs (cf. Definition 13).
For this transformation, there also exists a minor limitation: The transformation cannot be used
for a Property Graph that contains distinct edges with the same start node, the same end node, and
the same label.9 Section 7 elaborates more on this limitation and formalizes the transformation.
As a counterpart to the first two transformations, this third transformation enables RDF-based
systems to import Property Graphs and execute SPARQL or SPARQL⋆ queries over the resulting
RDF or RDF⋆ data. On the other hand, a Property Graph system may provide virtual RDF⋆ views
of its Property Graphs that can be queried using SPARQL⋆ .
Example 10. Given the RDF⋆ representation in Example 9, a user can query the data originally
represented in the Property Graph in Figure 1 by using SPARQL⋆ queries such as the following.
SELECT ?n WHERE {
?w p:name "Orson Welles" .
<<?p r:influencedBy ?w>> p:certainty ?c .
?p p:name ?n .
}
ORDER BY ?c
The query returns an ordered list of persons influenced by Orson Welles; the order represents the
certainty of whether Welles influenced these persons. An equivalent Cypher query that can be
evaluated over the original Property Graph is given as follows.
START p=node(*)
MATCH (p)-[r:influencedBy]->(w { name="Orson Welles" })
RETURN p.name
ORDER BY r.certainty
The remainder of this document defines the proposed transformations formally. As a basis, Section 4
provides a formalization of the data models. Thereafter, Section 5 defines the lossless transformation
of RDF⋆ data to RDF-like Property Graphs (as outlined in Section 3.1 above), Section 6 defines the
lossy transformation of RDF⋆ data to simple Property Graphs (cf. Section 3.2 above), and Section 7
formalizes the transformation of Property Graphs to RDF⋆ (cf. Section 3.3 above).
9
It remains to be seen whether this limitation has an impact in practice (there exist alternative, more explicit
approaches to represent information in a Property Graph without having to use multiple edges between vertices).
7
4 Formalization of the Data Models
This section recalls the relevant definitions of RDF and RDF⋆, provides a formal definition of the no-
tion of a Property Graph, and identifies a class of RDF⋆ data that has an expressive power analogous
to Property Graphs and, thus, can be used as a basis for the transformations in this document.
4.1 RDF⋆
RDF⋆ is an extension of the RDF data model that makes metadata statements a first class citizen.
This section provides the relevant parts of the formalization of RDF [CWL+ 14] and RDF⋆ [HT14].
Assume pairwise disjoint sets I (all IRIs), B (blank nodes), and L (literals). Each literal is
associated with the IRI of a datatype [CWL+ 14]. Hence, formally, there exists a (total) mapping
dtype : L → I. Additionally, literals may be associated with a language tag [CWL+ 14]. Hence,
there exists a partial mapping lang : L → SLTags such that SLTags is the set of all strings that
represent well-formed language tags as defined in [PD09].
An RDF triple is a tuple hs, p, oi ∈ (I ∪ B) × I × (I ∪ B ∪ L) and a set of RDF triples is called
an RDF graph.
RDF⋆ extends such triples by permitting the embedding of a given triple in the subject or object
position of another triple. Triples whose subject or object is an embedded triple represent some
form of metadata. An embedded triple may itself be a metadata triple and, thus, may also contain
embedded triples; and so forth. The following definition captures this notion.
Moreover, for any tuple hs, p, oi ∈ Te and any natural number k, call the tuple k-nested if either:
Then, an RDF⋆ triple is a tuple t ∈ Te for which there exists a natural number k such that t is
k-nested. A set of RDF⋆ triples is called an RDF⋆ graph.
Note that the notion of k-nestedness in Definition 1 rules out infinitely deep nesting of RDF⋆ triples.
Hereafter, the set of all RDF⋆ triples is denoted by T ⋆, and, for any RDF⋆ triple t ∈ T ⋆, Elmts+(t)
denotes the set of all RDF
terms and all RDF⋆ triples mentioned in t; i.e., let t = hs, p, oi, then
Elmts+(t) = {s, p, o} ∪ x′ ∈ Elmts+(x) x ∈ {s, o} ∩ T ⋆ . An RDF⋆ triple t with Elmts+(t) ∩ T ⋆ 6= ∅
is called a metadata triple (note that any other RDF⋆ triple is an ordinary S RDF triple).
Overloading function Elmts , for any RDF graph G , Elmts (G ) = t∈G⋆ Elmts+(t). Further-
+ ⋆ ⋆ + ⋆
more, Emb+(G⋆ ) denotes the set of all RDF⋆ triples that are (recursively) embedded in RDF⋆ triples
of RDF⋆ graph G⋆ ; i.e., Emb+(G⋆ ) = Elmts+(G⋆ ) ∩ T ⋆.
8
Example 11. Formally, the Turtle⋆ document in Example 3 represents the following RDF⋆ graph.
G⋆ex = hex:alice, foaf:knows, ex:bobi, ex:certainty, 0.5 ,
hex:alice, foaf:name, "Alice"i,
hex:bob, foaf:name, "Bob"i,
hex:bob, foaf:age, 23i, ex:certainty, 0.9
Note that this RDF⋆ graph consists of four RDF⋆ triples, two of which are ordinary RDF triples
and the other are metadata triples. It holds that
Elmts+(G⋆ex ) = ex:alice, foaf:knows, ex:bob, hex:alice, foaf:knows, ex:bobi, ex:certainty, 0.5,
foaf:name, "Alice", "Bob", foaf:age, 23, hex:bob, foaf:age, 23i, 0.9
and, thus,
Emb+(G⋆ex ) = hex:alice, foaf:knows, ex:bobi, hex:bob, foaf:age, 23i .
For a more detailed discussion of RDF⋆, including a well-defined transformation from RDF⋆ graphs
to ordinary RDF graphs, an RDF⋆-enabled extension of the Turtle syntax (as used in Examples
3 and 9), and a definition of SPARQL⋆ (the RDF⋆-aware extension of SPARQL; cf. Examples 7
and 10), refer to [HT14].
– a set of vertices V ,
– a set of edges E,
– a function src : E → V that associates each edge with its source vertex,
– a function tgt : E → V that associates each edge with its target vertex, and
– a function lbl : E → dom(S) that associates each edge its label; and
• P is a function P : (V ∪ E) → 2P.
9
The following definition identifies a more restricted class of Property Graphs; that is, Property
Graphs in which, for each vertex or edge, all properties of this vertex or edge have a unique key.
Definition 3. A Property Graph G = hV, E, src, tgt, lbl , Pi is property-unique if, for each vertex
or edge x ∈ (V ∪ E), k1 6= k2 for all pairs of distinct properties hk1 , v1 i, hk2 , v2 i ∈ P(x).
Example 12. Reconsider the Property Graph introduced in Example 1 (cf. Figure 1). A formal
representation of this graph is given by the tuple Gex = hVex , Eex , src ex , tgt ex , lbl ex , Pex i consisting of
the following elements:
• Eex = {e1 , e2 }
• Pex (e1 ) = ∅
• Pex (e2 ) = h"certainty", 0.8i
While the given definition of a Property Graph does not restrict the types of property values, the
remainder of this document assumes that all values of the data types in D can be mapped to distinct
RDF literals. More
S precisely, the following definitions assume a bijective function vm : V → L⋆
such that V = D∈D dom(D) and L is a set of literals; i.e., L⋆ ⊆ L and |L⋆ | = |V|. Hereafter,
⋆
function vm is called the value-to-literal mapping. Note that this assumption rules out Property
Graphs with property values that are of some data type for collections.
2. Metadata triples embed triples as their subject only (not as their object).
10
4. For any literal in the triples it must be possible to convert the literal to a data value.
/ T ⋆.
2. It holds that o ∈ (i.e., o is not an embedded triple; cf. condition 2 above)
4. For every literal l ∈ L, if l ∈ Elmts+(t), then l ∈ dom(vm−1 ); where vm−1 is the inverse of
the aforementioned (bijective) value-to-literal mapping vm. (cf. condition 4 above)
Example 13. The sample RDF⋆ graph G⋆ex (cf. Example 11) is PG-convertible.
An even stronger notion of PG-convertibility is necessary for the second transformation. As men-
tioned in Section 3.2, this transformation can be applied only to RDF⋆ graphs that satisfy the
conditions for PG-convertibility and, additionally, do not contain metadata triples about attribute
triples. The following definition captures this stronger notion of PG-convertibility.
Example 14. The sample RDF⋆ graph G⋆ex (cf. Example 11) is not strongly PG-convertible because
it contains the metadata triple hex:bob, foaf:age, 23i, ex:certainty, 0.9 . However, the following
subgraph G⋆ex′ ⊆ G⋆ex , which does not contain this metadata triple, is strongly PG-convertible.
G⋆ex′ = hex:alice, foaf:knows, ex:bobi, ex:certainty, 0.5 ,
hex:alice, foaf:name, "Alice"i,
hex:bob, foaf:name, "Bob"i .
11
Definition 6. Let G⋆ be an RDF⋆ graph that is PG-convertible. Furthermore, let:
meta(G⋆ ) = t ∈ G⋆ Elmts+(t) ∩ T ⋆ 6= ∅ ,
ord(G⋆ ) = G⋆ ∪ Emb+(G⋆ ) \ meta(G⋆ ), and
SOTerms+(G⋆ ) = x ∈ (I ∪ B ∪ L) hs, p, oi ∈ ord(G⋆ ) and x ∈ {s, o} ;
i.e., meta(G⋆ ) is the set of all metadata triples in G⋆, ord(G⋆ ) is the set of all ordinary RDF triples
in G⋆, and SOTerms+(G⋆ ) is the set of all RDF terms in the subject position or object position
of some ordinary triple in G⋆. The RDF-like Property Graph representation of G⋆ is the
Property Graph G = hV, E, src, tgt, lbl , Pi that has the following properties:
1. The set of vertices V contains nV = SOTerms+(G⋆ ) vertices, each of which represents a differ-
ent RDF term in SOTerms+(G⋆ ); i.e., there exists a bijective function v : SOTerms+(G⋆ ) → V
that maps each RDF term x ∈ SOTerms+(G⋆ ) to a (different) vertex v(x) ∈ V .
+ ⋆
2. For each IRI u ∈ I with
u ∈ SOTerms (G ), the set of properties P v(u) of vertex v(u) ∈ V
is defined as P v(u) = h"kind", "IRI"i, h"IRI", im(u)i where im is the aforementioned
IRI-to-string mapping.
+ ⋆
3. For each blank node b ∈ B with b ∈ SOTerms (G ), the set of properties P v(b) of vertex
v(b) ∈ V is defined as P v(b) = h"kind", "blank node"i .
4. For each literal l ∈ L with l ∈ SOTerms+(G⋆ ), the set of properties P v(l) of vertex v(l) ∈ V
is defined as
P v(l) = h"kind", "literal"i, h"literal", vm−1 (l)i, h"datatype", im dtype(l) i ∪ lang
where vm−1 is the inverse of the aforementioned (bijective) value-to-literal mapping vm, and
(
h"language", lang(l)i if l ∈ dom(lang),
lang =
∅ else.
5. The set of edges E contains nE = ord(G⋆ ) edges, each of which represents a different RDF
triple t ∈ ord(G⋆ ). Hence, there exists a bijective function e : ord(G⋆ ) → E that maps each
t ∈ ord(G⋆ ) to a (different) edge e(t) ∈ E.
6. For each triple t ∈ ord(G⋆ ) with t = hs, p, oi, the label of edge e(t) ∈ E is im(p), and
the source
and target vertex of e(t) is v(s) and v(o), respectively; i.e., lbl e(t) = im(p),
src e(t) = v(s), and tgt e(t) = v(o).
7. For each triple t ∈ ord(G⋆ ), the set of properties P e(t) of edge e(t) ∈ E is defined as
[
P e(t) = him(p), vm−1 (o)i .
hs,p,oi∈meta(G⋆ ) such that s=t
12
Example 15. Since the sample RDF⋆ graph G⋆ex (cf. Example 11) is PG-convertible, the given
transformation can be used to convert G⋆ex to a Property Graph. Figure 3 (in Example 4) illustrates
the resulting Property Graph G′ex , which is the RDF-like Property Graph representation of G⋆ex .
′ , src ′ , tgt ′ , lbl ′ , P′ i that
Formally, this Property Graph is given by the tuple G′ex = hVex′ , Eex ex ex ex ex
consists of the following elements:
• Vex′ = {v1 , v2 , v3 , v4 , v5 }
′ = {e , e , e , e }
• Eex 1 2 4 4
• src ′ex (e1 ) = v1 , tgt ′ex (e1 ) = v3 , and lbl ′ex (e1 ) = "http://xmlns.com/foaf/0.1/name"
• src ′ex (e2 ) = v1 , tgt ′ex (e2 ) = v2 , and lbl ′ex (e2 ) = "http://xmlns.com/foaf/0.1/knows"
• src ′ex (e3 ) = v2 , tgt ′ex (e3 ) = v4 , and lbl ′ex (e3 ) = "http://xmlns.com/foaf/0.1/name"
• src ′ex (e4 ) = v2 , tgt ′ex (e4 ) = v5 , and lbl ′ex (e4 ) = "http://xmlns.com/foaf/0.1/age"
• P′ex (v1 ) = h"kind", "IRI"i, h"IRI", "http://example.org/alice"i
• P′ex (v2 ) = h"kind", "IRI"i, h"IRI", "http://example.org/bob"i
• P′ex (v3 ) = h"kind", "literal"i, h"literal", "Alice"i
• P′ex (v4 ) = h"kind", "literal"i, h"literal", "Bob"i
• P′ex (v5 ) = h"kind", "literal"i, h"literal", 23i
• P′ex (e1 ) = ∅
• P′ex (e2 ) = h"http://example.org/certainty", 0.5i
• P′ex (e3 ) = ∅
• P′ex (e4 ) = h"http://example.org/certainty", 0.9i
Remark 1. For any PG-convertible RDF⋆ graph G⋆ it holds that if G⋆ contains two (distinct)
metadata triples that differ only in their objects (i.e., there exist hs, p, oi, hs′, p′, o′ i ∈ G⋆ such that
s = s′, p = p′, o 6= o′, s ∈ T ⋆, and s′ ∈ T ⋆), then the RDF-like Property Graph representation of G⋆
is not property-unique.
While the given transformation is lossless (i.e., resulting Property Graphs contain all information
present in the original RDF⋆ graph), some use cases may have the stronger requirement that the
original RDF⋆ graph can be reconstructed exactly from its RDF-like Property Graph representation.
Hence, for these use cases the transformation must be invertible.
To ensure an invertible transformation, any RDF⋆ graph to be transformed must not contain
redundant RDF⋆ triples; that is, RDF⋆ triples that are embedded in metadata triples in the RDF⋆
graph and, additionally, appear directly (as a separate element) in the RDF⋆ graph. The following
definition captures this notion of redundancy formally.
Definition 7. Let G⋆ be an RDF⋆ graph and let t ∈ G⋆ be an RDF⋆ triple in G⋆ . RDF⋆ triple t is
redundant in G⋆ if there exists another (metadata) triple t′ ∈ G⋆ such that t ∈ Elmts+(t′ ).
RDF⋆ graphs must have the following minimality property to ensure an invertible transformation.
13
Definition 8. An RDF⋆ graph G⋆ is minimal if there does not exist an RDF⋆ triple t ∈ G⋆ that
is redundant in G⋆.
Example 16. It is easily verified that the sample RDF⋆ graph G⋆ex (cf. Example 11) is minimal.
Remark 2. Note that every RDF⋆ graph can be converted into a minimal RDF⋆ graph by removing
all redundant RDF⋆ triples. Then, any unfolded RDF graph [HT14] of the resulting (minimal) RDF⋆
graph is equivalent to the corresponding unfolded RDF graph of the original RDF⋆ graph.
i.e., meta(G⋆ ) is the set of all metadata triples in G⋆, ord(G⋆ ) is the set of all ordinary RDF triples
in G⋆, ordA(G⋆ ) is the set of all ordinary attribute triples in G⋆, ordR(G⋆ ) is the set of all ordinary
relationship triples in G⋆, and SO+(G⋆ ) is the set of all IRIs and blank nodes in the subject position
or object position of some ordinary triple in G⋆. The simple Property Graph representation
of G⋆ is the Property Graph G = hV, E, src, tgt, lbl , Pi that has the following properties:
1. The set of vertices V contains nV = SO+(G⋆ ) vertices, each of which represents a different
IRI or blank node x ∈ SO+(G⋆ ). Hence, there exists a bijective function v : SO+(G⋆ ) → V
that maps each x ∈ SO+(G⋆ ) to a (different) vertex v(x) ∈ V .
2. For each IRI u ∈ I with u ∈ SO+(G⋆ ), the set of properties P v(u) of vertex v(u) ∈ V is
defined as
[
P v(u) = h"IRI", im(u)i ∪ him(p), vm−1 (o)i ,
hs,p,oi∈ordA(G⋆ ) such that s=u
where im is the aforementioned IRI-to-string mapping, and vm−1 is the inverse of the afore-
mentioned (bijective) value-to-literal mapping vm.
14
3. For each blank node b ∈ B with b ∈ SO+(G⋆ ), the set of properties P v(b) of vertex v(b) ∈ V
is defined as [
P v(b) = him(p), vm−1 (o)i .
hs,p,oi∈ordA(G⋆ ) such that s=b
4. The set of edges E contains nE = ordR(G⋆ ) edges, each of which represents a different (re-
lationship) triple t ∈ ordR(G⋆ ). Hence, there exists a bijective function e : ordR(G⋆ ) → E
that maps each t ∈ ordR(G⋆ ) to a (different) edge e(t) ∈ E.
5. For each (relationship) triple t ∈ ordR(G⋆ ) with t = hs, p, oi, the label of edge e(t) ∈ E is
im(p), and
the source and target
vertex is v(s) and v(o), respectively; i.e., lbl e(t) = im(p),
src e(t) = v(s), and tgt e(t) = v(o).
6. For each (relationship) triple t ∈ ordR(G⋆ ), the set of properties P e(t) of edge e(t) ∈ E is
defined as [
P e(t) = him(p), vm−1 (o)i .
hs,p,oi∈meta(G⋆ ) such that s=t
Example 17. The RDF⋆ graph G⋆ex′ , as introduced in Example 14, is strongly PG-convertible and,
thus, it can be converted to a simple Property Graph representation. Figure 4 (in Example 8)
illustrates the resulting Property Graph G′′ex , which is the simple Property Graph representation of
′′ , src ′′ , tgt ′′ , lbl ′′ , P′′ i that
G⋆ex′ . Formally, this Property Graph is given by the tuple G′′ex = hVex′′ , Eex ex ex ex ex
consists of the following elements:
• Vex′′ = {v1 , v2 }
′′ = {e }
• Eex 1
• src ′′ex (e1 ) = v1 , tgt ′′ex (e1 ) = v2 , and lbl ′′ex (e1 ) = "http://xmlns.com/foaf/0.1/knows"
• P′′ex (v1 ) = h"IRI", "http://example.org/alice"i, h"http://xmlns.com/foaf/0.1/name", "Alice"i
• P′′ex (v2 ) = h"IRI", "http://example.org/bob"i, h"http://xmlns.com/foaf/0.1/name", "Bob"i
• P′′ex (e1 ) = h"http://example.org/certainty", 0.5i
Remark 3. For any strongly PG-convertible RDF⋆ graph G⋆ it holds that if G⋆ contains two (dis-
tinct) RDF⋆ triples that differ only in their objects and these objects are literals (i.e., there exist
hs, p, oi, hs′, p′, o′ i ∈ G⋆ ∪ Emb+(G⋆ ) such that s = s′, p = p′, o 6= o′, o ∈ L, and o′ ∈ L), then the
simple Property Graph representation of G⋆ is not property-unique.
15
ordinary triples, and edge properties are represented as metadata triples whose subject is the triple
for the corresponding edge.
While representing each edge (including its label) as a single triple is perhaps the most intuitive
approach to transform such edges, this approach has the following shortcoming: If there are two (or
more) distinct edges that connect the same vertices and have the same label (but may have different
properties), the approach would represent both edges by a single triple. As a result, this triple
would not represent any one of the edges unambiguously when embedded in a metadata triple for
a property of the edge. To avoid this problem, the transformation is restricted to Property Graphs
that do not contain distinct edges with the same source vertex, the same target vertex, and the
same label. Hereafter, these Property Graphs are called edge-unique.
Definition 10. A Property Graph G = hV, E, src, tgt , lbl , Pi is edge-unique if there does not
exist a pair of edges (e, e′ ) ∈ E × E such that e 6= e′, src(e) = src(e′ ), tgt(e) = tgt(e′ ), and
lbl (e) = lbl (e′ ).
Example 18. The Property Graph Gex in Example 12 is edge-unique (cf. Figure 1).
Remark 4. Note that the requirement of edge-uniqueness does not present a restriction on the
expressiveness of Property Graphs. Information represented by introducing multiple edges between
vertices can also be modeled by using an alternative, more explicit approach. For instance, the
relationship captured by each of the multiple edges may be modeled as a separate vertex.
The transformation assumes three user-specified templates for generating IRIs. The first two of
these templates can be used to generate IRIs that denote arbitrary edge labels and arbitrary proper-
ties keys, respectively. The following two mappings capture the notion of these templates formally.
Definition 11. An edge label mapping lm is a bijective function lm : dom(S) → I such that I
is a set of IRIs; i.e., I ⊆ I and |I| = |dom(S)|. (Recall that dom(S) is the set of all strings.)
Example 19. An example of a property key mapping is a function kmex such that, for every
string s ∈ dom(S), function kmex returns the IRI http://example.org/property/urlenc(s) where
urlenc(s) is the URL encoded version of string s. Similarly, lmex is an edge label mapping such that,
for every string s ∈ dom(S), lmex returns the IRI http://example.org/relationship/urlenc(s).
Remark 5. While a user-specified edge label mapping (resp. a property key mapping) can be used
for transforming different Property Graphs, this should be done only if the same labels (resp. the
same property keys) in these different Property Graphs have the same meaning. If that is not the
case, a different edge label mapping (resp. property key mapping) should be used.
The third user-specified template can be used to generate IRIs or blank nodes for the vertices of a
Property Graph. Formally, the following mapping captures such a template.
Definition 13. Let G = hV, E, src, tgt, lbl , Pi be a Property Graph. A vertex identity mapping
id for G is an injective function id : V → (B ∪ I).
16
Definition 14. Let G = hV, E, src, tgt, lbl , Pi be a Property Graph that is property-unique (cf. Def-
inition 3) and edge-unique, let id be a vertex identity mapping for G,
let lm be an edge label
map-
ping, and, for each edge e ∈ E, let te be the RDF triple id src(e) , lm lbl (e) , id tgt(e) . Then,
given a property key mapping km (and the value-to-literal mapping vm), the (id, lm, km)-specific
RDF⋆ representation of G is the set of RDF⋆ triples G⋆ = G⋆vp ∪ G⋆ep ∪ G⋆en that consists of the
following three subsets:
G⋆vp = hid(v), km(k), vm(x)i v ∈ V and hk, xi ∈ P(v) (vertex properties)
G⋆ep = hte , km(k), vm(x)i e ∈ E and hk, xi ∈ P(e) (edges with properties)
G⋆en = te e ∈ E and P(e) = ∅ (edges without properties)
Example 20. Recall the sample Property Graph Gex as given in Example 12 (and illustrated in
Figure 1). Assume a vertex identity mapping idex that maps each vertex in Gex to a distinct blank
node; i.e., idex (Kubrick) = bK and idex (Welles) = bW , with bK , bW ∈ B. Then, the (idex , lmex , kmex )-
specific RDF⋆ representation of Property Graph Gex is the following RDF⋆ graph (Example 9 provides
a Turtle⋆ serialization of this RDF⋆ graph):
bK , http://example.org/property/name, "Stanley Kubrick" ,
bK , http://example.org/property/birthyear, 1928 ,
bW , http://example.org/property/name, "Orson Welles" ,
bW , http://example.org/relationship/mentioned, bK ,
hbK , http://example.org/relationship/influencedBy, bW i,
http://example.org/property/certainty, 0.8 .
8 Acknowledgements
The author would like to thank a number of people for providing feedback on the proposal, as well
as on earlier versions of this document. In alphabetical order, these are: Bryan Thompson, Cosmin
Basca, Grant Weddell, Juan Sequeda, Kavitha Srinivas, Maria-Esther Vidal, Mike Personick, Peter
Boncz, and Tamer Özsu.
References
[CWL+ 14] Richard Cyganiak, David Wood, Markus Lanthaler, Graham Klyne, Jeremy J. Carroll,
and Brian McBride. RDF 1.1 Concepts and Abstract Syntax. W3C Recommendation,
Online at http://www.w3.org/TR/rdf11-concepts/, February 2014.
[HPS14] Patrick J. Hayes and Peter F. Patel-Schneider. RDF 1.1 Semantics. W3C Recommen-
dation, Online at http://www.w3.org/TR/rdf11-mt/, February 2014.
[HT14] Olaf Hartig and Bryan Thompson. Foundations of an Alternative Approach to Reifica-
tion in RDF. CoRR, abs/1406.3399, 2014. Online: http://arxiv.org/abs/1406.3399.
17
[PC14] Eric Prud’hommeaux and Gavin Carothers. RDF 1.1 Turtle. W3C Recommendation,
Online at http://www.w3.org/TR/turtle/, February 2014.
[PD09] A. Phillips and M. Davis. Tags for Identifying Languages. IETF Best Current Practice,
Online at http://tools.ietf.org/html/bcp47, September 2009.
[RWE13] Ian Robinson, Jim Webber, and Emil Eifrém. Graph Databases. O’Reilly Media, 2013.
18