Neo4j Graph Database Modeling Guide


Copyrighted Material
Graph Database Modeling with neo4j
Copyright © 2020-21 by Ajit Singh, All Rights Reserved.

No part of this publication may be reproduced, stored in a retrieval system or
transmitted, in any form or by any means (electronic, mechanical, photocopying,
recording or otherwise) without prior written permission from the author, except for
the inclusion of brief quotations in a review.

For information about this title or to order other books and/or electronic media,
contact the publisher:

Ajit Singh & Anant Kumar
e: ajit_singh24@yahoo.com
e: anant@jdwcpatna.com
w: https://www.ajitvoice.wordpress.com
Preface

This book is designed to walk you through graph data modeling. You will be introduced
to the basic process of designing a graph data model that can answer a wide range of
business questions across a variety of domains.

Graph data modeling is the process in which a user describes an arbitrary domain as a
connected graph of nodes and relationships with properties and labels. A graph data
model is designed to answer questions in the form of Cypher queries and solve business
and technical problems by organizing a data structure for the graph database.

This book is simply an introduction to data modeling using a simple, straightforward
scenario. There are plenty of opportunities throughout the upcoming guides to practice
modeling domains and analyzing changes to the model that might need to be made.

Data is like water. It’s probably useless if you don’t put it in a helpful container.
The shape, size and functionality of that container depend on your intended use, but in
general, a container is necessary.

The same goes with data. When it comes to creating a new application or data solution,
you need to provide a structure for that data. That structuring process is known as data
modeling.

Often reserved solely for senior database administrators (DBAs) or principal developers,
data modeling is sometimes presented as an esoteric art unknowable to mere mortals. You
may worship the expert data modeler from afar.

While some data modeling scenarios really are best left up to the experts, it doesn’t
have to be difficult by default. In fact, data modeling is as much a business concern as
a technological one. So if you don’t know a single line of code, you’re in luck.

Anyone can do basic data modeling, and with the advent of graph database technology,
matching your data to a coherent model is easier than ever. Data modeling is an
abstraction process. You start with your business and user needs (i.e., what you want
your application to do). Then, in the modeling process you map those needs into a
structure for storing and organizing your data.

Every data model is unique, depending on the use case and the types of questions that
users need to answer with the data. Because of this, there is no “one-size-fits-all”
approach to data modeling. Using best practices and careful modeling will provide the
most valuable result in producing an accurate data model that benefits your processes
and use case.

Graph databases are suited to a very specific kind of dataset: huge amounts of highly
complex data, where entities are closely related to one another. That is because they
query efficiently through the relationships among entities, in contrast to relational
databases. Graph databases support algorithms for queries that are out of practical
reach for relational databases because of their tabular structure and static schema.
Also, the bigger the volume of data, the slower the queries tend to be in SQL, because
they require looking up joined tables with a great number of tuples. Graph databases
allow you to traverse the graph and reach a high level of depth without having to read
all the data stored.

Neo4j is an open source NoSQL graph database. It is a fully transactional database
(ACID) that stores data structured as graphs consisting of nodes, connected by
relationships. Inspired by the structure of the real world, it allows for high query
performance on complex data, while remaining intuitive and simple for the developer.

Neo4j is, by far, the leading graph database technology. It analyzes and traverses data
in real time and returns results very quickly. It has a great user interface and
support. But its greatest feature is that, even as data size grows exponentially, the
performance of Neo4j is not affected by it.

Using this book, you'll get to learn the theory of graph databases and how to use Neo4j
to build recommendations, model relationships, and calculate the shortest route between
two locations. With example data models, best practices, use cases, and an application
putting everything together, this book will give you everything you need to really get
started with Neo4j. Starting with a brief introduction to graph theory, this book will
show you the advantages of using graph databases along with data modeling techniques for
graph databases. You'll gain practical hands-on experience with commonly used and lesser
known features for updating a graph store with Neo4j's Cypher query language. This book
includes a lot of background information, helps you grasp the fundamental concepts
behind this radical new way of dealing with connected data, and will give you lots of
examples of use cases and environments where a graph database would be of great
interest.

Neo4j is being used by social media and e-commerce industry giants. You can take
advantage of Neo4j's powerful features and benefits. Add Beginning Neo4j to your library
today.

Contents

1. Graph Data Model
Graph databases
2. Graph Schemas
Selecting vertex labels
Examples of label selection
Drawing a graph schema
Summary
3. Converting ER models to graph schemas
ER models and diagrams
Example
Procedure to convert an ER model to a graph schema
Rule #1: Entity types become vertex types
Rule #2: Binary relationship types become edge types
Rule #3: N-ary relationship types become vertex types
Conversion example
Vertices are vertices, and edges are edges
Summary
4. Normalizing Graph Schemas
Normalization of relational databases
Transformation rules that produce equivalent schemas
Rule A: Renaming properties and labels
Rule B: Reversing edge directions
Rule C: Property displacement
Rule D: Specialization and generalization
Rule E: Edge promotion
Rule F: Property promotion
Rule G: Property expansion
Summary
5. One meta rule for normalization
Schemas and constraints
Graph universes, transformations and equivalence
Derived types
Meta rule: Adding and removing derived types
Proving the meta rule
Proving the 7 rules: Renaming, Reversing, Property Displacement,
Beyond transformation rules
Summary
6. Validating graph schemas
7. Pixy: First order logic on graph databases
Background
On SQL
On first order logic
On Gremlin
Pixy: First order logic with Gremlin
ER models in Pixy
Query requirements don't usually matter while modeling
8. Introduction to Database
State of the art of Databases
Types of DBMS
NoSQL DBMS
Comparison of DBMS
Current trends
9. Graph Databases
Graph Theory and Its Applications
Concepts of Graph Databases
Query performance
10. Neo4j
Introduction of Neo4j
Advantages of Neo4j
Properties of Neo4j
Performance In Neo4j
How To Increase Performance Of Neo4j?
Cypher Query Language
Structure
Operations In Cypher
Loading Data With Cypher
Use Cases of Neo4j
11. Getting started with neo4j
Installation or Setup
Installation & Starting a Neo4j server
Start Neo4j from console (headless, without web server)
Start Neo4j web server
Start Neo4j web server
Delete one of the databases
Cypher Query Language
RDBMS Vs Graph Database
Cypher - Implementation
Creation
Create a node
Create a relationship
Query Templates
Create an Edge
Deletion
Delete all nodes
Delete all nodes of a specific label
Match (capture group) and link matched nodes
Update a Node
Delete All Orphan Nodes
Python & Neo4j
12. Neo4j Application
Use Case Selected
Data
Implementing Data
Export data
Query Examples (Neo4j - SQL)
Shortest Path
Betweenness centrality
Closeness centrality
PageRank
Community Detection
Possible queries on SQL
Bibliography

Part I
Chapter 1

Graph Data Model

A relational database has a ledger-style structure. It can be queried through SQL, and
it is what most people are familiar with. Each entry is a row in a table. Tables are
related by foreign-key constraints, which let you connect one table’s information to
another by referencing its primary key. Slow multi-level joins are often involved when
querying relational databases.

For a graph, specifically a scatter plot, think of the elements as nodes, or dots. The
elements of a line graph are similarly represented by vertices. Each node has key-value
pairs and a label. Nodes are connected by relationships, or edges. Relationships have a
type and a direction, and they can have properties. A graph database is simply composed
of dots and lines. This type of database is simpler and more powerful when the meaning
is in the relationships between the data. Relational databases can easily handle direct
relationships, but indirect relationships are more difficult to deal with in relational
databases.
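
The building blocks just described (labeled nodes with key-value pairs, and typed,
directed relationships that may carry properties) can be sketched in a few lines of
Python. This is only an illustrative in-memory model, not how Neo4j stores data; the
class names and the sample people and products are assumptions made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A node: one label plus arbitrary key-value properties."""
    label: str
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    """A directed, typed edge that may carry its own properties."""
    start: Node
    type: str
    end: Node
    properties: dict = field(default_factory=dict)


# Hypothetical example data, not taken from the book.
alice = Node("Person", {"name": "Alice"})
toaster = Node("Product", {"name": "Toaster"})
bought = Relationship(alice, "BOUGHT", toaster, {"coupon": True})


def describe(rel: Relationship) -> str:
    """Render a relationship in an arrow notation that shows its direction."""
    return f"({rel.start.properties['name']})-[:{rel.type}]->({rel.end.properties['name']})"


print(describe(bought))  # (Alice)-[:BOUGHT]->(Toaster)
```

Note that the relationship itself holds the `coupon` property, which is the kind of
information a relational design would typically push into a separate join table.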

Figure 1a

When a relational database is built, it is built with questions in mind. What kinds of
questions will we want to answer? For example, you want to know how many people who
bought a toaster live in Kansas, have a criminal record, and used a coupon to buy that
toaster. If the database administrator, or the person who created the database, did not
anticipate a question like this, it may be very difficult to retrieve that information
from a relational database. With graph databases, it is possible to answer unanticipated
questions. With a graph, you can answer any question as long as the data exists and
there is a path between the items involved. A graph is designed to traverse indirect
relationships. With graph databases you can even add more relationships and still
maintain performance. A graph database goes beyond storing data points: it stores the
relationships between them.
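
To make the idea of "a path between them" concrete, here is a minimal sketch of a
breadth-first traversal that checks whether two items are connected through any chain
of relationships. The graph contents and the function name `find_path` are hypothetical
and chosen only for illustration.

```python
from collections import deque

# Hypothetical toy graph: each key points to the nodes it is directly related to.
graph = {
    "Alice": ["Toaster", "Kansas"],
    "Toaster": ["Coupon42"],
    "Kansas": [],
    "Coupon42": [],
    "Bob": ["Kansas"],
}


def find_path(graph, start, goal):
    """Breadth-first search; returns one shortest path as a list, or None."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor in graph.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None


print(find_path(graph, "Alice", "Coupon42"))  # ['Alice', 'Toaster', 'Coupon42']
print(find_path(graph, "Bob", "Coupon42"))    # None
```

The question "is Alice connected to this coupon?" was never planned for in the data
layout, yet it can be answered because the relationships themselves are stored.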
There are two properties of graph databases we should consider when investigating graph
database technologies:

The underlying storage
Some graph databases use native graph storage that is optimized and designed for storing
and managing graphs. Not all graph database technologies use native graph storage,
however. Some serialize the graph data into a relational database, an object-oriented
database, or some other general-purpose data store.

The processing engine
Some definitions require that a graph database use index-free adjacency, meaning that
connected nodes physically “point” to each other in the database. Here we take a
slightly broader view: any database that from the user’s perspective behaves like a
graph database (i.e., exposes a graph data model through CRUD operations) qualifies as a
graph database. We do acknowledge, however, the significant performance advantages of
index-free adjacency, and therefore use the term native graph processing to describe
graph databases that leverage index-free adjacency.
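
The difference between index-free adjacency and a lookup-based design can be hinted at
with a small Python sketch. It is a conceptual analogy only, under the assumption that
object references stand in for physical pointers and a dictionary stands in for an
index; it says nothing about the actual storage format of any product.

```python
class Node:
    """A node that holds direct references to its neighbours."""

    def __init__(self, name):
        self.name = name
        self.neighbors = []  # direct object references, no lookup needed


# Index-free adjacency: following an edge is a pointer dereference.
a, b, c = Node("a"), Node("b"), Node("c")
a.neighbors.append(b)
b.neighbors.append(c)
two_hops_native = [n2.name for n1 in a.neighbors for n2 in n1.neighbors]

# Non-native alternative: edges live in a separate structure and every hop
# goes through a lookup (here a dict standing in for an index).
edge_index = {"a": ["b"], "b": ["c"], "c": []}
two_hops_indexed = [n2 for n1 in edge_index["a"] for n2 in edge_index[n1]]

print(two_hops_native, two_hops_indexed)  # ['c'] ['c']
```

Both variants return the same answer; the point of the native design is that the cost
of each hop does not depend on the size of a global index.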

From a database point of view, the conceptual tools defining a DB-Model should address
at least the structuring and description of the data, its maintainability and the way to
retrieve or query the data. According to these criteria, a DB-Model is defined as a
combination of three components: first, a collection of data structure types; second, a
collection of operators or inference rules; and third, a collection of general integrity
rules. Note that several proposals of DB-Models define only the data structures,
sometimes omitting operators and/or integrity rules.

Due to the importance of modeling conceptually, philosophically and in practice,
DB-Models have become essential abstraction tools. Among the purposes of a DB-Model are:
a tool for specifying the kinds of data permissible; a general design methodology for
databases; coping with the evolution of databases; development of families of high-level
languages for query and data manipulation; a focus in DBMS architecture; and a vehicle
for research into the behavioral properties of alternative organizations of data.

Since the emergence of database management systems, there has been an ongoing debate
about what the DB-Model for such a system should be. The evolution and diversity of
existing DB-Models show that there is no silver bullet for data modeling. The parameters
influencing their development are manifold, and among the most important we can mention
the characteristics or structure of the domain to be modeled, the type of intellectual
tools that appeal to the user, and of course, the hardware and software constraints
imposed. Additionally, each DB-Model proposal is grounded in certain theoretical tools,
and serves as a base for the development of related models.
Figure 1b: Evolution of database models. Rectangles denote models, arrows indicate
influences, and circles denote theoretical developments. On the left-hand side is a
timeline in years.
Database Models Evolution: Brief Historical Overview
In the beginnings of the design of DB-Models, physical (hardware) constraints were one
of the fundamental parameters to be considered. Before the advent of the relational
model, most DB-Models focused essentially on the specification of the structure of data
in actual file systems. Kerschberg et al. developed a taxonomy of DB-Models prior to
1976, comparing essentially their mathematical structures and foundations, and the
levels of abstraction used.

Two representative DB-Models are the hierarchical and network models, which emphasize
the physical level and offer the user the means to navigate the database at the record
level, thus providing low-level operations to derive more abstract structures.

The relational DB-Model was introduced by Codd and highlights the concept of level of
abstraction by introducing the idea of separation between physical and logical levels.
It is based on the notions of sets and relations. Due to its simplicity of modeling, it
gained wide popularity among business applications.

Semantic DB-Models allow database designers to represent objects and their relations in
a natural and clear manner to the user (as opposed to previous models). They were
intended to provide the user with tools that could faithfully capture the semantics of
the information to be modeled. A well-known example is the entity-relationship model.

Object oriented DB-Models appeared in the eighties, when most of the research was
concerned with so-called “advanced systems for new types of applications”. These
DB-Models are based on the object oriented paradigm and their goal is representing data
as a collection of objects that are organized in classes and have complex values
associated with them.

Semistructured DB-Models are designed to model data with a flexible structure, e.g.,
documents and Web pages. Semistructured data (also called unstructured data) is neither
raw nor strictly typed as in conventional database systems. Additionally, data is mixed
with the schema, a feature which allows extensible exchange of data. These DB-Models
appeared in the nineties and are currently in evolution.

The XML (eXtensible Markup Language) model did not originate in the database community.
Although originally introduced as a standard to exchange and model documents, it soon
became a general purpose model, with a focus on information with a tree-like structure.
As in the semistructured model, schema and data are mixed. See Section 2.3 for a more
in-depth comparison among these models.

Other Models and Frameworks. There are other important DB-Models designed for particular
applications, as well as modeling frameworks not directly focusing on database issues,
which indirectly concern graph database modeling. Among the DB-Models are spatial
databases, Geographical Information Systems (GIS), temporal DB-Models and
multidimensional DB-Models. Frameworks related to our topic, but not directly focusing
on database issues, are Semantic Networks.

Graph Database Models: Brief Historical Overview

The notion of graph DB-Model made its appearance almost in parallel with the object
oriented DB-Models, as an alternative to the limitations of traditional DB-Models for
capturing the inherent graph structure of data appearing in applications such as
hypertext or geographic database systems, where the interconnectivity of data is an
important aspect.

Activity around graph databases flourished in the first half of the nineties and then
the topic almost disappeared. The reasons for this decline are manifold: the database
community moved toward semistructured data (a research topic which did not have links
to the graph database work in the nineties); the emergence of XML captured all the
attention of the work on hypertext; people working on graph databases moved to
particular applications like spatial data, the web, and documents; and the tree-like
structure is enough for most applications. Figure 2 reflects this evolution by means of
papers published in main conferences and journals.

Graph DB-Models emerged with the objective of modeling information whose structure is a
graph. In an early approach, Roussopoulos and Mylopoulos, facing the failure of the
systems of the time to take into account the semantics of the database, proposed a
semantic network to store data about the database. An implicit structure of graphs for
the data itself was presented in the Functional Data Model, whose goal was to provide a
“conceptually natural” database interface. A different approach proposed the Logical
Data Model, where an explicit graph DB-Model was intended to generalize the relational,
hierarchical and network models. Years later Kunii proposed a graph DB-Model for
representing complex structures of knowledge called GBASE.

Graph Data Modeling
What is a Graph Data Model?

A graph DB-Model is conceptualized according to the three basic components of a
DB-Model, namely data structures, transformation language, and integrity constraints.
A graph DB-Model is characterized by:

The data and/or the schema are represented by graphs, or by data structures generalizing
the notion of graph (hypergraphs, hypernodes, hygraphs, etc.). Almost everybody agrees
on this point, modulo slight variations.

Let us review different wordings of authors on this issue. The approach is to model the
database directly and entirely as a graph [58]. A graph DB-Model is one whose single
underlying data structure is a labeled directed graph; the database consists of a single
digraph. A database schema in this model is a directed graph, where leaves represent
data and internal nodes represent connections between the data. Directed labeled graphs
are used as the formalism to specify and represent database schemes, instances, and
rules. The model is basically defined as a labeled directed graph. In this model, a
database is described in terms of a labeled directed graph called a schema graph. A
graph DB-Model formalizes the representation of the data structures stored in the
databases as a graph. The schema as well as the instance of an object database is
represented by a graph. The nodes of the instance graph represent the objects of the
database. Database instances and database schemes are described by certain types of
labeled graphs [68]. The model for data is organized as graphs. Labeled graphs are used
to represent schemes and instances.

On top of these descriptions, one could add the fact that sometimes the schema and the
data (instances) are difficult to differentiate in these models, a fact that closely
resembles semistructured models. But in most cases the schema and the instances are
separated.

Data manipulation is expressed by graph transformations, or by operations whose main
primitives directly address typical features of graphs, like paths, neighborhoods,
subgraphs, graph patterns, connectivity, and statistics about graphs (diameter,
centrality, etc.). The DB-Model defines a flexible collection of type constructors and
operators which create and access the graph data structures or, in other terms, the
approach is to express all queries in terms of a few powerful graph manipulation
primitives. The operators of the language can be based on pattern matching, i.e. finding
all occurrences of a prototypical piece of an instance graph.

The existence of integrity constraints enforcing the consistency of the data, which are
directly related to the graph data structure. For example: labels with unique names,
typing constraints on nodes, functional dependencies, and domain and range of
properties.
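
As an illustration of the third item, the sketch below checks two of the constraint
kinds mentioned: typing constraints on nodes, and the domain and range of an edge type.
The schema, the label names and the function `violations` are assumptions invented for
this example; they do not correspond to any particular database's constraint API.

```python
# Hypothetical schema: allowed node labels and, for each edge type,
# the label required at its start (domain) and end (range).
NODE_LABELS = {"Person", "City"}
EDGE_RULES = {"LIVES_IN": ("Person", "City")}


def violations(nodes, edges):
    """Return a list of human-readable constraint violations.

    nodes: dict mapping node id -> label
    edges: list of (start id, edge type, end id)
    """
    problems = []
    for node_id, label in nodes.items():
        if label not in NODE_LABELS:  # typing constraint on nodes
            problems.append(f"node {node_id}: unknown label {label}")
    for start, etype, end in edges:
        if etype not in EDGE_RULES:
            problems.append(f"edge {etype}: unknown type")
            continue
        domain, range_ = EDGE_RULES[etype]
        if nodes.get(start) != domain or nodes.get(end) != range_:
            problems.append(f"edge {start}-{etype}->{end}: domain/range mismatch")
    return problems


nodes = {"n1": "Person", "n2": "City", "n3": "Person"}
edges = [("n1", "LIVES_IN", "n2"), ("n2", "LIVES_IN", "n3")]
print(violations(nodes, edges))
```

Here the second edge is reported because it starts at a City and ends at a Person,
which contradicts the declared domain and range of `LIVES_IN`.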

Summarizing, a graph DB-Model is a model where the data structures for the schema and/or
instances are modeled as a (labeled) (directed) graph, or generalizations of the graph
data structure, where data manipulation is expressed by graph-oriented operations and
type constructors, and which has integrity constraints appropriate for the graph
structure.

Why a Graph Data Model?

The application areas of graph DB-Models are those where information about the
interconnectivity or the topology of the data is more important than, or as important
as, the data itself. This is usually accompanied by the fact that data and relations
among data are at the same level. In fact, introducing graphs as a modeling tool has
several advantages for this type of data.

First, it leads to a more natural modeling: graph structures are visible to the user.
They allow a natural way of handling data appearing in applications (e.g. hypertext or
geographic databases). Graphs have an important advantage: they can keep all the
information about an entity in a single node and show related information by arcs
connected to it. Graph objects (like paths, neighborhoods) may have first-order
citizenship; a user
Type of Model    Abstraction level   Base data structure   Main focus         Data complexity/homogeneity

Network          physical            pointers + records    records            simple/hom.
Relational       logical             relations             data/attributes    simple/hom.
Semantic         user                graphs                schema/relations   medium/hom.
Object           logical/physical    objects               object/methods     high/het.
Semistructured   logical             tree                  data/components    medium/het.
Graph            logical             graph                 data/relations     medium/het.

Table 1: A coarse-granularity comparative view of different general-purpose database
models. The parameters are: abstraction level, base data structure used, the types of
information objects the DB-Model focuses on, and the complexity and homogeneity of the
data items modeled.

Second, queries can refer directly to this graph structure. Associated with graphs are
specific graph operations in the query language algebra, such as finding shortest paths,
determining certain subgraphs, and so forth. Explicit graphs and graph operations allow
a user to express a query at a very high level. To some extent, this is in contrast to
graph manipulation in deductive databases, where often fairly complex rule programs need
to be written. Last but not least, for purposes of browsing it may be convenient to
forget the schema.
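
One of the graph operations named above, finding shortest paths, can be sketched with
Dijkstra's algorithm. This is a generic textbook technique shown on a hypothetical
weighted graph; it is not meant to describe how any specific query language implements
its shortest-path operator.

```python
import heapq

# Hypothetical weighted graph: node -> list of (neighbor, edge cost).
roads = {
    "A": [("B", 4), ("C", 1)],
    "B": [("D", 1)],
    "C": [("B", 2), ("D", 5)],
    "D": [],
}


def shortest_path(graph, source, target):
    """Dijkstra's algorithm; returns (total cost, path) or (None, [])."""
    heap = [(0, source, [source])]
    settled = set()
    while heap:
        cost, node, path = heapq.heappop(heap)
        if node == target:
            return cost, path
        if node in settled:
            continue
        settled.add(node)
        for neighbor, weight in graph.get(node, []):
            if neighbor not in settled:
                heapq.heappush(heap, (cost + weight, neighbor, path + [neighbor]))
    return None, []


print(shortest_path(roads, "A", "D"))  # (4, ['A', 'C', 'B', 'D'])
```

A user of a graph query language would state only the endpoints; the traversal logic
above is what the system supplies on their behalf.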

Third, as far as implementation is concerned, graph databases may provide special
storage structures for the representation of graphs and the most efficient graph
algorithms available for realizing specific operations. Although the data may have some
structure, the structure is not as rigid, regular or complete as in a traditional DBMS.
Full knowledge of the structure is not required to express meaningful queries. The
system can use efficient graph algorithms designed to utilize the special graph data
structures [58].

Comparison with other Database Models
In this section we compare the most influential DB-Models with graph DB-Models. Table 1
presents a coarse-granularity overview of the most influential models. Below we present
the details.

Physical DB-Models. They were the first ones to offer the possibility of organizing
large collections of data. Among the most important ones are the hierarchical and
network models. These models lack a good abstraction level and are very close to
physical implementations. The data structuring is not flexible and not apt for modeling
nontraditional applications. For our discussion they do not have much relevance.

The relational DB-Model was introduced by Codd to highlight the concept of level of
abstraction by introducing a clean separation between physical and logical levels.
Gradually the focus shifted to modeling data as seen by applications and users. This is
the emphasis and the achievement of the relational model, at a time when the domains of
application were basically simple data (banks, payments, commercial and administrative
applications).

The relational model was a landmark development because it provided a mathematical basis
for the discipline of data modeling. It is based on the simple notion of relation,
which together with its associated algebra and logic made the relational model a primary
model for database research. In particular, its standard query and transformation
language, SQL, became a paradigmatic language for querying.

The differences between graph DB-Models and the relational DB-Model are manifold. Among
the most relevant ones are: the relational model was directed to simple record-type data
with a structure known in advance (airline reservations, accounting, inventories, etc.).
The schema is fixed and extensibility is a difficult task. Integration of different
schemes is neither easy nor automatable. The query language does not support paths,
neighborhoods and several other graph operations, like connectivity (an exception is
transitivity). There are no object identifiers, only values.

Semantic DB-Models have their origin in the necessity to provide more expressiveness and
incorporate a richer set of semantics into the database from the user point of view.
They allow database designers to represent objects and their relations in a natural and
clear manner (similar to the way the user views an application) by using high-level
abstraction concepts such as aggregation, classification and instantiation, sub- and
super-classing, attribute inheritance and hierarchies. A well-known example is the
entity-relationship model. It has become a basis for the early stages of database
design, but due to its lack of preciseness it cannot replace models like the relational
or object oriented ones. Other examples of semantic DB-Models are IFO and SDM. For graph
DB-Models research, semantic DB-Models are relevant because they are based on a
graph-like structure which highlights the relations between the entities to be modeled.

Object oriented (OO) DB-Models [75] appeared in the eighties, when the database
community realized that the relational model was inadequate for data intensive domains
(knowledge bases, engineering applications). OO databases were motivated by the
emergence of nonconventional database applications consisting of complex object systems
with many semantically interrelated components, as in CAD/CAM, computer graphics or
information retrieval. According to the OO programming paradigm on which they are based,
their objective is representing data as a collection of objects that are organized in
classes and have complex values and methods associated with them. Although OO DB-Models
permit much richer structures than the relational DB-Model, they still require that all
data conform to a predefined schema.

OO DB-Models have been related to graph DB-Models due to the explicit or implicit graph
structure in their definitions. Nevertheless, there remain important differences rooted
in the way each of them models the world. OO DB-Models view the world as a set of
complex objects having a certain state (data) and interacting among themselves through
methods. On the contrary, graph DB-Models view the world as a network of relations,
emphasizing the interconnection of the data and the properties of these relations. The
emphasis of OO DB-Models is on the dynamics of the objects, their values and methods.
In contrast, graph DB-Models emphasize the interconnection while maintaining the
structural and semantic complexity of the data.

Semistructured DB-Models. The need for semistructured data (also called unstructured
data) was motivated by the increased existence of unstructured data, data exchange, and
data browsing. In semistructured data the structure is irregular, implicit and partial;
the schema does not restrict the data, it only describes it, and it is very large and
rapidly evolving; the information associated with a schema is contained within the data
(data contains data and its description, so it is self-describing). Among the most
representative models are OEM, Lorel, UnQL, ACeDB and Strudel. Generally, semistructured
data is represented by a tree-like structure. Nevertheless, cycles between data are
possible, establishing in this way a structural relation with graph DB-Models. Some
authors characterize semistructured data as rooted directed connected graphs.

Graph Data Model Motivations and Applications

Graph DB-Models are motivated by real-life applications where information about the
interconnectivity of its pieces is a salient feature. We will divide these application
areas into Classical and Complex networks.

Classical Applications. The applications that motivated the introduction of the notion
of graph databases were manifold:

Generalizations of classical DB-Models. Classical models were criticized for their lack
of semantics, the flat structure of the data they allow, the difficulties for the user
to “see” the connectivity of the data, and the difficulty of modeling complex objects.

In the same direction, the observation that graphs have been an integral part of the
database design process in semantic and object oriented DB-Models brought the idea of
introducing a model in which both data manipulation and data representation were graph
based.

Limitations of the expressive power of languages for complex applications also motivated
the search for models that resemble such applications more closely.

Limitations (at the time) of knowledge representation systems, and the need for
intricate but flexible knowledge representation and derivation techniques.

The need for improving functionalities of object oriented DB-Models. In this direction
the applications in mind were CASE, CAD, image processing, and scientific data analysis.

Graphical and visual interfaces, geographical, pictorial and multimedia systems.

Applications where data complexity exceeded the relational DB-Model capabilities also
motivated graph databases. For instance, managing transport networks (train, plane,
water, telecommunications) and spatially embedded networks like highways and public
transport. Several of these applications are now in the field of geographical
information systems and spatial databases.

There are other applications that motivated graph DB-Models: software systems,
integration.

Complex Networks. Several areas have witnessed the emergence of huge networks of data
which share some particular mathematical parameters, called complex networks. The need
for database management for some classes of these networks has been recently
highlighted. Although it is not yet evident whether, from the point of view of
databases, one can treat them as a whole, we will describe them together for
presentation purposes. Following the survey of Newman, we will group them in four
categories: social networks, information networks, technological networks and biological
networks. Below we describe specific examples for each of them.

In social networks, nodes are people and groups while links show relationships or flows
between the nodes. Some examples are friendship, business relationships, patterns of
sexual contacts, research networks (collaboration, co-authorship), communication records
(mail, telephone calls, email), computer networks, and national security. There is
growing activity in the area of social network analysis, visualization and data
processing in such networks.

In information networks, relations occur such as citations between academic papers, the
World Wide Web (hypertext, hypermedia), peer-to-peer networks, relations between word
classes in a thesaurus, and preference networks.

In technological networks the structure is mainly governed by space and geography. Some
examples are the Internet (as a network of computers), electric power grids, airline
routes, telephone networks, and delivery networks (post office). The area of Geographic
Information Systems (GIS) today covers a big part of this area (roads, railways,
pedestrian traffic, rivers).

Biological networks represent biological information whose volume, management and
analysis have become an issue due to the automation of the process of data gathering. A
good example is the area of genomics, where networks occur in gene regulation, metabolic
pathways, chemical structure, map order and homology relationships between species.
There are other kinds of biological networks, such as food webs, neural networks, etc.
The area has a tremendous growth rate. The reader can consult database proposals for
genomics, an overview of models for biochemical pathways, a tutorial on Graph Data
Management for Biology, and a model for Chemistry.

It is important to stress that classical query languages offer little help when dealing
with the type of query needed in the above areas. As examples, data processing in GIS
includes geometric operations (area or boundary, intersection, inclusions, etc.),
topological operations (connectedness, paths, neighbors, etc.) and metric operations
(distance between entities, diameter of the network, etc.). In genetic regulatory
networks, examples of measures are connected components (interactions between proteins)
and degrees of nearest neighbors (strong pair correlations). In social networks, typical
measures are distance, neighborhoods, clustering coefficient of a vertex, clustering
coefficient of a network, betweenness, size of giant connected components, and size
distribution of finite connected components. Similar problems arise in the Semantic Web,
where querying RDF data increasingly needs graph features.
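
Two of the social network measures listed above, the clustering coefficient of a vertex
and connected components, are easy to state in code. The following is a minimal sketch
over a hypothetical undirected friendship graph, using the common definition of the
local clustering coefficient as the fraction of neighbor pairs that are themselves
linked.

```python
# Hypothetical undirected friendship graph as adjacency sets.
friends = {
    "ann": {"bob", "cy", "dee"},
    "bob": {"ann", "cy"},
    "cy": {"ann", "bob"},
    "dee": {"ann"},
    "eve": set(),
}


def clustering_coefficient(graph, v):
    """Fraction of pairs of v's neighbours that are themselves connected."""
    nbrs = list(graph[v])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(
        1
        for i in range(k)
        for j in range(i + 1, k)
        if nbrs[j] in graph[nbrs[i]]
    )
    return links / (k * (k - 1) / 2)


def connected_components(graph):
    """Group vertices into components with an iterative depth-first search."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        components.append(component)
    return components


print(clustering_coefficient(friends, "bob"))
print(sorted(len(c) for c in connected_components(friends)))
```

The size of the largest component returned by `connected_components` corresponds to
the "giant connected component" measure mentioned in the text.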

Representative Graph Database Models

In this section we describe in some detail the most representative graph DB-Models,
choosing those that explicitly define and use graph structures or generalizations of
them. Additionally we describe other related models that use graphs but do not fit
properly as graph DB-Models. In them, graphs are used, for example, for navigation, for
defining views, or as language representation.
For each proposal, we present their data structures and, when available, their query
languages and integrity constraint rules. In general, there are few implementations and
no standard benchmarks, hence we avoid surveying this issue. To give a flavor of the
modeling in each proposal, we will run the following example about a toy genealogy shown
in Figure 2.

Figure 2: A genealogy diagram (right-hand side) represented as two tables (left-hand
side), NAME-LASTNAME and PERSON-PARENT.
(Children inherit the last name of the father just for modeling purposes.)
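
To show how the two tables of the genealogy example relate to a graph, the sketch below
loads PERSON-PARENT rows into an adjacency structure and follows parent edges
transitively. The rows are hypothetical placeholders, since the actual contents of the
figure are not reproduced in the text; the function names are likewise assumptions.

```python
# Hypothetical rows; the actual contents of Figure 2 are not reproduced here.
NAME_LASTNAME = [("George", "Jones"), ("Ana", "Stone"), ("Paul", "Jones"), ("Mary", "Jones")]
PERSON_PARENT = [("Paul", "George"), ("Paul", "Ana"), ("Mary", "Paul")]


def build_graph(parent_rows):
    """Turn PERSON-PARENT rows into a child -> set of parents adjacency map."""
    graph = {}
    for person, parent in parent_rows:
        graph.setdefault(person, set()).add(parent)
        graph.setdefault(parent, set())
    return graph


def ancestors(graph, person):
    """All ancestors of person, found by following parent edges transitively."""
    found, stack = set(), list(graph.get(person, ()))
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(graph.get(current, ()))
    return found


parents = build_graph(PERSON_PARENT)
print(sorted(ancestors(parents, "Mary")))  # ['Ana', 'George', 'Paul']
```

In the tabular form, each additional generation would need another self-join on
PERSON-PARENT; in the graph form it is one more step of the same traversal.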

Figure 3: Logical Data Model. The schema (on the left) uses two basic type nodes for
representing data values (N and L), and two product type nodes (NL and PP) to establish
relations between data values in a relational style. The instance (on the right) is a
collection of tables, one for each node of the schema. Note that internal nodes use
pointers (names) to make reference to basic and set data values defined by other nodes.

Logical Data Model (LDM)

Motivated by the lack of semantics in the relational DB-Model, Kuper and Vardi proposed
a DB-Model that generalizes the relational, hierarchical and network models. The model
describes mechanisms to restructure data, a logical query language and an algebraic
query language.

In LDM a schema is an arbitrary directed graph where each node has one of the following
types: the Basic type describes a node that contains the data stored; the Composition
type ⊗ describes a node that contains tuples whose components are taken from its
children; the Collection type ⊛ describes a node that contains sets, whose elements are
taken from its children. Summarizing, internal nodes are of type ⊗ or ⊛, representing
structured data, terminal nodes are of the Basic type and represent atomic data, and
edges represent connections between data.

Asecondversionofthemodel,besidesrenamingthenodes ⊗and
⊛ asproductandpowerrespectively,incorporatesanewtype,theUniontype∪ ,
intendedtorepresentacollectionwhosedomainistheunionofthedomainsofits
children(seeexampleinFigure4).

ALDMdatabaseinstanceconsistsofanassignmentofvaluestoeachnodeofthe
schema.Inthissense,theinstanceofanodeisasetofelementsfrom the
underlyingdomain(forbasictypenodes)andtuplesorsetstakenfromtheinstance
ofthenode’schildren(for⊗,⊛andtypes).

With the objective of avoiding cyclicity at the instance level, the model proposes to
keep a distinction between memory locations and their content. Thus, instances
consist of a set of l-values (the address space), plus an r-value (the data space)
assigned to each of them. These features make it possible to model transitive
relations like hierarchies and genealogies.
Over this structure a first-order many-sorted language is defined. With this language,
a query language and integrity constraints are defined. Finally, an algebraic
language, equivalent to the logical language, is proposed, providing operations
for node and relation creation, transformation and reduction of instances, and other
operations like union, difference and projection.

LDM is a complete DB-Model (i.e., data structures plus query languages and integrity
constraints). The model supports modeling of complex relations (e.g., hierarchies,
recursive relations). The notion of virtual records (pointers to physical records)
proves useful to avoid redundancy of data by allowing cyclicity at the schema and
instance level. Because the model is a generalization of other models (like the
relational model), their techniques or properties can be translated into the
generalized model. A relevant example is the definition of integrity constraints.

Figure 4: Hypernode Model. The schema (left) defines a person as a complex object
with the properties name and lastname of type string, and parent of type person
(recursively defined). The instance (on the right) shows the relations in the genealogy
among different instances of person.

Chapter 2

Graph Schemas

Selecting vertex labels

The Tinkerpop property graph model can be summarized as follows. A graph has a set
of vertices and a set of edges. Each edge connects an out vertex to an in vertex.
Vertices and edges can have properties, which are key-value pairs with String keys
and pretty much any value that the underlying database supports.

So far, the model looks schemaless, since vertices and edges can't be distinguished
from other vertices and edges without knowing what the properties mean. However,
edges have always had labels. And with Tinkerpop 3, vertices will have labels as
well. The same is true with Neo4J's latest major version.

If every vertex must be labeled, what is the correct method to select a label?
What should a label say about a vertex or an edge, from the application's perspective?
We think a vertex label should represent the most granular type of the vertex, where
each "vertex type" is associated with a unique combination of:
meaning (semantics),
set of property key names and value types, and
set of outgoing edge labels, where each label type is annotated with the possible
directions of the edge (in/out/both) and cardinality.

Why so?
Because labels representing vertex types give the application the most detailed
information about the behavior of that vertex, thereby ensuring that the application
can process the vertex accordingly. In other words, one should not be able to
subdivide a vertex type to get two vertex types that behave differently from the
application's standpoint.

Examples of label selection

Let's go through the label selection exercise with the classic 6-vertex TinkerGraph
shown in the property graph model page. Since this is a Tinkerpop 2 style graph, it
doesn't have vertex labels. We'll now try to come up with the vertex labels by simply
looking at the vertex behavior.
Figure 1: TinkerGraph example

If you look closely, there are two types of vertices: ones with 'name' and 'age', and
ones with 'name' and 'lang'. Let us label the former vertex type as 'Person' and the
latter vertex type as 'Software'. In other words, you have persons named 'marko',
'vadas', 'peter' and 'josh', and software named 'lop' and 'ripple'.

After analyzing the edge labels and direction, you could say that the 'Person' vertex
type has:
Property keys 'name' and 'age'
Edges labeled 'knows' in the OUT direction
Edges labeled 'created' in the OUT direction

The 'Software' vertex type has:
Property keys 'name' and 'lang'
Edges labeled 'created' in the IN direction

Now, an application looking at this graph automatically knows what to expect when it
reads a vertex labeled 'Person' or 'Software'. We can define two different indexes on
'name', one for Person and one for Software, to make sure that software searches
don't pick up people, or vice versa.
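
The grouping step used above can be sketched in a few lines of Python. This is only an illustration: the dictionary layout, the function name group_by_property_keys and the mapping from key sets to labels are assumptions made for this sketch, while the vertex data is the standard six-vertex TinkerGraph sample.

```python
from collections import defaultdict

# Vertices of the classic six-vertex TinkerGraph: id -> properties.
vertices = {
    1: {"name": "marko", "age": 29},
    2: {"name": "vadas", "age": 27},
    3: {"name": "lop", "lang": "java"},
    4: {"name": "josh", "age": 32},
    5: {"name": "ripple", "lang": "java"},
    6: {"name": "peter", "age": 35},
}


def group_by_property_keys(vertices):
    """Group vertex ids by the set of property keys each vertex carries."""
    groups = defaultdict(list)
    for vertex_id, props in vertices.items():
        groups[frozenset(props)].append(vertex_id)
    return dict(groups)


# Hypothetical label choice: the designer still names each group by hand.
labels = {
    frozenset({"name", "age"}): "Person",
    frozenset({"name", "lang"}): "Software",
}

vertex_types = {
    labels[keys]: ids for keys, ids in group_by_property_keys(vertices).items()
}
print(vertex_types)
```

Grouping by property keys alone is a starting point; edge labels and semantics still need a human decision.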

The label selection process can't be fully mechanical though. For instance, a person
with no friends can be thought of as a separate vertex type, because there are no
adjacent 'knows' edges to such vertices. However, unless this makes sense in the
context of the application or the data model, there is no point in subdividing the
'Person' vertex type as 'Loner' and 'Person with Friends'. The same argument goes for
subdividing the person vertex type as 'Developer' and 'NonDeveloper' based on
whether that person created a software.

To recap, the right way to select vertex labels for a property graph is to first figure
out the vertex types and the behaviors of each vertex type. The totality of these
behaviors is the graph schema.

Drawing a graph schema
The best way to represent a graph schema is, of course, a graph. This is how the
graph schema looks for the classic Tinkerpop graph.
Figure 2: Example graph schema shown as a property graph

The graph schema is pretty much a property graph. The vertices correspond to vertex
types, and edges correspond to edge types. The property keys are named after the
allowed property keys for that vertex type. Every property value in the schema graph
contains the name of the most specific superclass representing the corresponding
property values of the instance graph. Optional properties can have a '?' after the
class name (not shown here).

Edge properties are like vertex properties, except that there is a special property
named '#' that holds the cardinality from the out to in vertex types. Common sense
dictates that the cardinality is M:N, i.e., many-to-many, for both 'knows' and
'created'. One could be misled to think that some of these relationships are 1:N by
looking at the 6-vertex graph. This is another reason for not fully relying on reverse
engineering methods to derive schemas.
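
To see how an instance can mislead, the following Python sketch counts edge degrees in the six-vertex sample graph and reports the cardinality that the data alone suggests. The function name observed_cardinality and the tuple layout are assumptions made for this illustration.

```python
from collections import Counter

# Edges of the six-vertex sample graph as (out vertex, label, in vertex).
edges = [
    (1, "knows", 2), (1, "knows", 4),
    (1, "created", 3), (4, "created", 5),
    (4, "created", 3), (6, "created", 3),
]


def observed_cardinality(edges, label):
    """Cardinality of an edge label as it appears in one instance (out:in)."""
    out_degree = Counter(o for o, lbl, i in edges if lbl == label)
    in_degree = Counter(i for o, lbl, i in edges if lbl == label)
    # Left side: how many out vertices share a single in vertex.
    left = "M" if max(in_degree.values(), default=0) > 1 else "1"
    # Right side: how many in vertices a single out vertex reaches.
    right = "N" if max(out_degree.values(), default=0) > 1 else "1"
    return f"{left}:{right}"


print(observed_cardinality(edges, "knows"))    # 1:N in this instance
print(observed_cardinality(edges, "created"))  # M:N
```

In this instance nobody is known by more than one person, so 'knows' looks like 1:N even though the intended cardinality is M:N.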

We have gone through a similar exercise for the Grateful Dead graph. As you can see,
the graph schema is very simple, although the visualization of the graph shown in the
link looks complicated.

Figure 3: Grateful Dead graph schema
Our third and final example is the schema for the Kennedy family tree graph. Again,
the schema is extremely simple (simplistic given recent US Supreme Court rulings).
Figure 4: Family tree graph schema

Note that in the Pixy schema, the property lists are the same for 'Man' and 'Woman',
but the direction of the 'wife' edge is functionally dependent on the value of the
'sex' property. This is very interesting because this means that graph schemas could
be normalized using rules like relational databases. We will discuss this in later
sections!

Summary

This section introduced the idea of schemas for property graphs and described how
the schema itself can be represented as a property graph. Furthermore, it described a
method to derive the graph schema for an existing property graph by finding the most
granular division of its vertices into vertex types.

Graph schemas (or schema graphs) help application developers better understand the
graph's structure.
In the next section, we will look at the problem the other way around. Can we derive
a graph schema from a higher level conceptual model such as an Entity Relationship
model? Could this be a systematic method to select vertex and edge labels, and
property keys, when designing a graph database application?

Chapter 3

Converting ER models to graph schemas

This section will describe a general method to convert an entity relationship model
to a property graph schema. Using this method, a database designer can develop ER
models using standard conceptual modeling practices, but store the data in a graph
database instead of a relational database.

ER models and diagrams

The entity relationship model was proposed by Peter Chen in his 1976 paper titled
"The Entity-Relationship Model: Toward a Unified View of Data". The ideas in this
paper are taught in most database courses. This course page gives a quick
description of the ER model.

Conceptual modeling is a particularly useful exercise when embarking on a project
that involves a new domain. The goal of this exercise is to identify key concepts in
the domain that must be captured in the data model. One of the techniques in
conceptual modeling is to look at the natural language description of an application's
requirements. These requirements can be analyzed to identify the entity and
relationship types, using Chen's "rules of thumb" (quoted from Wikipedia):

Common noun        Entity type
Proper noun        Entity
Transitive verb    Relationship type
Intransitive verb  Attribute type
Adjective          Attribute for entity
Adverb             Attribute for relationship

Example
Let us consider the following requirements:

Model a system where users create pages, which they own. Users can invite other
users to look at certain pages that they own. A page can specify one or more tags
which are then used to recommend other sections to the authors and invited readers.
You could analyze this requirement and come up with three entity types, viz. User,
Page and Tag. The relationship types Owns, Invites and TaggedAs capture the
relationships. Note that not all verbs become relationships (like create). Similarly,
the fact that invitations only apply to pages that a user owns is lost in this model.

Figure 5: Example ER diagram

The square-shaped boxes show entity types, which represent sets of similar entities.
The diamond-shaped boxes show relationship types, which represent sets of similar
relationships. A relationship type relates two or more entity types to each other.

The diagram shows the cardinality of each entity's contribution to a relationship,
such as 1:N (one-to-many) or N:N (many-to-many). The cardinality is specified using
the 'look-across' method. For example, a User owns N pages, and a page is owned by
1 user. There are known limitations of look-across cardinality for ternary
relationships like Invites.

The diagram also shows some oval-shaped attributes, like username. These attributes
must be assigned to entity or relationship types. Attributes that serve as external
identifiers must be underlined.

Now, it is arguable whether Tag must be an entity or not in the final data model. But
from an ER perspective, it makes sense to model tag as an entity, especially if tags
are used to establish relationships across users for recommendations.

Procedure to convert an ER model to a graph schema

The procedure to convert an ER model to a relational model is well known and
discussed in the same OSU course notes that we referenced earlier. We will now go
through a similar procedure for converting the ER diagram to a graph schema, using
the above example.

Rule #1: Entity types become vertex types
Entity types such as User, Page and Tag become vertex types.
The name of the entity type becomes the label of the vertex type.
The associated attributes become the properties of the vertex type.

Note that we are drawing a graph schema, not a graph instance. So the User type
refers to any number of users in both the ER and the graph schema representation.
Hence we use the term "vertex type" and not vertex. The entity relationship model
uses similar terms such as "entity types" (like User) and entities (like John Doe, the
user).

Rule #2: Binary relationship types become edge types
All binary relationship types in the ER diagram can be converted to edge types in the
graph schema.
The name of the relationship type becomes the label of the edge type.
The associated attributes become the properties of the edge type.
The endpoints of the edge type are the vertex types corresponding to the related
entity types. The direction doesn't matter.

Here is an example showing the "Owns" relationship type translated to an "owns" edge
type:
Note that one-to-many and many-to-many binary relationships can be modeled as edges
without introducing new vertices. With relational models, you would need an
additional table to capture many-to-many relationships.

Figure 7: Owns relationship converted to an owns edge

A minor point is that the cardinality is written as 1:N because the User (out vertex
type) to Page (in vertex type) relationship is a 1:N relationship, using the
look-across method. In other words, a user has N pages and a page has 1 user. If the
direction of the edge were reversed, the cardinality would be N:1.

Rule #3: N-ary relationship types become vertex types
N-ary relationship types relate more than two entity types. Such relationship types
become vertex types in the property graph model.
The name of the relationship type becomes the label of the vertex type.
The associated attributes become the properties of the vertex type.

The new vertex type includes edges to the vertex types corresponding to the related
entity types (see example). These edge types are labeled after the role of the
participating entity in the relationship. The direction doesn't matter for any of
these edges.

Here is an example showing the ternary relationship Invites translated to the vertex
type Invitation:
The cardinality in the graph schema is N:1 because the Invitation to Page
relationship is an N:1 relationship, using the look-across method. In other words, an
invitation could be issued to 1 page, and a page (in vertex) could be part of N
invitations. It is possible to just reverse some of the role types, like invitee,
without affecting the overall model. In that case the cardinality will be 1:N.

Figure 8: Invites relationship converted to an Invitation vertex type

We haven't shown the process for weak entity types and identifying relationship
types, but these are handled exactly the same way as entity types and relationship
types. Graph databases are more forgiving than relational databases in that they
allow two vertices to have the same label and property key-value pairs. This
simplifies the translation of weak entity types and identifying relationship types
into the property graph model.

Conversion example

Here is the graph schema corresponding to the example ER diagram. As you can see,
this diagram provides enough information for an application developer to work with
the graph database.

Figure 9: Graph schema for User-Page-Tag ER diagram

This is the "logical model" for the example conceptual model introduced in the first
figure. We can tweak this model further by renaming the labels, changing directions
of the edges, and so on. This will be the topic of the next section.
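
The three conversion rules can be sketched as a small Python function. This is a minimal sketch under stated assumptions: the input dictionaries, the function name er_to_graph_schema, the role names 'owner', 'inviter' and 'invitee', and the optional 'vertex_label' key are all hypothetical, chosen only to mirror the User-Page-Tag example.

```python
def er_to_graph_schema(entity_types, relationship_types):
    """Apply rules #1 to #3 to a dictionary description of an ER model."""
    # Rule 1: every entity type becomes a vertex type with the same attributes.
    vertex_types = {name: list(attrs) for name, attrs in entity_types.items()}
    edge_types = {}
    for name, rel in relationship_types.items():
        roles = rel["roles"]  # role name -> participating entity type
        attrs = list(rel.get("attributes", []))
        if len(roles) == 2:
            # Rule 2: a binary relationship type becomes one edge type.
            out_type, in_type = roles.values()
            edge_types[(out_type, name.lower(), in_type)] = attrs
        else:
            # Rule 3: an n-ary relationship type becomes a vertex type with
            # one edge type per role, labeled after the role.
            label = rel.get("vertex_label", name)
            vertex_types[label] = attrs
            for role, entity in roles.items():
                edge_types[(label, role, entity)] = []
    return vertex_types, edge_types


entities = {"User": ["username"], "Page": [], "Tag": []}
relationships = {
    "Owns": {"roles": {"owner": "User", "page": "Page"}},
    "TaggedAs": {"roles": {"page": "Page", "tag": "Tag"}},
    "Invites": {
        "roles": {"inviter": "User", "invitee": "User", "page": "Page"},
        "vertex_label": "Invitation",
    },
}
vertex_types, edge_types = er_to_graph_schema(entities, relationships)
print(sorted(vertex_types))
print(sorted(edge_types))
```

Cardinalities are left out of the sketch; they would be carried over from the ER diagram using the look-across method.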

Vertices are vertices, and edges are... edges

N-ary relationships are very common in conceptual models. For example, "Joe bought a
headphone at Target" is an example of a "Bought" relationship that relates a User to
a Product to a Store. Such relationships must be modeled as vertices, not edges
(unless you are using hypergraphs). Hence we think it is misleading to think of edges
as relationships and vertices as entities.

It is better to think of graphs as visualizable representations of a conceptual
model. We emphasize the visual nature of graphs because drawing and thinking in terms
of graphs is easy. For instance, if you go to the Wikipedia entry for hypergraphs,
you will see why visualizing hypergraphs isn't as easy as visualizing (binary) graphs.
Summary

This section showed that it is possible to convert any entity relationship model to a
property graph schema. In other words, a data architect can use standard methods to
model a domain as an ER diagram and then follow this procedure to convert it to a
property graph schema. This type of translation is not obvious for other popular
NoSQL models like key-value stores and document stores.

Chapter 4

Normalizing Graph Schemas

This section looks at how graph schemas can be manipulated and transformed to
equivalent graph schemas. This is similar to the splitting and merging of tables in
relational data models, typically performed to normalize or denormalize a relational
schema.

Normalization of relational databases

The goal of database normalization is to make sure that relational schemas are easy
to modify, easy to extend, informative to users and supportive of various query
patterns. The various normal forms, such as 1NF, 2NF, and so on, define constraints
that a table must satisfy to be compliant with that normal form. Although the
definitions of the normal forms can be mathematical, the basic idea is to break up
tables with duplicate information. Here is an example from the Wikipedia page on 3NF:
The previous figure breaks up the tournament winners table into two tables, one with
player details and one with the tournament details. The actual rules on "functional
dependencies" and "non-prime attributes" are hard to remember, but the process of
splitting and merging tables comes intuitively with experience. For example, if there
was an existing table which had one row per player, we'd probably move the "date of
birth" to that table.

Transformation rules that produce equivalent schemas

This section lists some transformation rules that produce equivalent graph schemas.
A graph schema is equivalent to another graph schema if the data stored in one
schema, along with the applications that access it, can be ported to the other
schema, and vice versa. These rules are like splitting and merging tables in
relational models.
The transformation rules in this section can be mechanically applied to any schema,
and have nothing to do with its semantics. By applying a combination of these rules,
you could simplify the semantics and improve the usability of your graph model.

Rule A: Renaming properties and labels
This rule consists of three transformations that result in equivalent schemas:
Any vertex label can be renamed, so long as the new name doesn't refer to an existing
vertex label.
Any edge label can be renamed, so long as the new name doesn't refer to an existing
edge label between the out and in vertex types.
Any vertex/edge property can be renamed, so long as the new name doesn't refer to an
existing property of the vertex/edge type.
The following figure illustrates some example applications of this rule on vertex and
edge labels:

Figure 10: Renaming properties and labels

The schema shown in the top is a simple graph schema showing family relationships.
This schema is transformed to the schema shown in the bottom of the figure using the
following transformations:
Vertex labels Man and Woman are renamed to Male and Female.
Edge labels mother (2 instances), father (2 instances) are renamed to parent.

Even though it seems like some information is lost by renaming mother/father to
parent, this isn't true because the vertex labels at the endpoints (Male/Female) have
that information. This same transformation wouldn't be so obvious while looking at an
instance of this graph like the Kennedy family tree.

Note that you cannot rename 'wife' to 'parent' in the bottom schema. This is because
there already exists a parent edge type from Male to Female.
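
The edge-label part of rule A, including its precondition, can be sketched in Python as follows. The representation of a schema as a set of (out type, label, in type) triples and the function name rename_edge_label are assumptions made for this illustration.

```python
def rename_edge_label(edge_types, out_type, old, new, in_type):
    """Rename one edge type in a schema given as (out, label, in) triples."""
    if (out_type, old, in_type) not in edge_types:
        raise KeyError(f"no edge type {old!r} from {out_type} to {in_type}")
    if (out_type, new, in_type) in edge_types:
        # Precondition of rule A: the new label must not already be in use
        # between the same out and in vertex types.
        raise ValueError(f"{new!r} already exists from {out_type} to {in_type}")
    return (edge_types - {(out_type, old, in_type)}) | {(out_type, new, in_type)}


# Family schema after Man/Woman were renamed to Male/Female.
schema = {
    ("Male", "mother", "Female"), ("Female", "mother", "Female"),
    ("Male", "father", "Male"), ("Female", "father", "Male"),
    ("Male", "wife", "Female"),
}
for out_type, old, in_type in sorted(schema):
    if old in ("mother", "father"):
        schema = rename_edge_label(schema, out_type, old, "parent", in_type)
print(sorted(schema))
```

Trying to rename 'wife' to 'parent' on the resulting schema raises ValueError, which matches the restriction noted above.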
Rule B: Reversing edge directions

This rule states that an edge type can be reversed provided it is a self loop, or
there is no edge type with the same label in the reverse direction. The cardinality
of the edge type is reversed as well.

Figure 11: Reversing edge directions
The following figure illustrates an example transformation using this rule and the
previous one. The transformation involves the following steps:
The 'wife' edge is renamed to 'husband' (rule A) and then reversed.
Each parent edge is renamed to 'son' or 'daughter' and reversed.

Note that the reversal is done in the graph instance as well as the schema. In other
words, JFK Jr. --parent--> JFK Sr. becomes JFK Sr. --son--> JFK Jr.
You could always rename the four 'son' and 'daughter' edge types to 'child' using
rule A. Again, no information is lost since the vertex labels are still unique.

You would, however, not be able to rename 'husband' to 'son'. You could rename
'husband' to 'daughter' (though absurd). The application will have to interpret "male
daughters" as husbands. But after you rename husband to daughter, you would not be
able to reverse its direction.
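
Rule B can be sketched in the same style. Here a schema is assumed to be a dictionary from (out type, label, in type) to a cardinality string; that layout and the function name reverse_edge_type are hypothetical.

```python
def reverse_edge_type(edge_types, out_type, label, in_type):
    """Reverse one edge type; edge_types maps (out, label, in) -> 'x:y'."""
    key = (out_type, label, in_type)
    reversed_key = (in_type, label, out_type)
    if key != reversed_key and reversed_key in edge_types:
        # Not a self loop, and the label is already used the other way round.
        raise ValueError(f"{label!r} already exists in the reverse direction")
    result = dict(edge_types)
    left, right = result.pop(key).split(":")
    result[reversed_key] = f"{right}:{left}"  # the cardinality flips as well
    return result


schema = {("User", "owns", "Page"): "1:N"}
print(reverse_edge_type(schema, "User", "owns", "Page"))
```

With a 'daughter' edge type present in both directions between Male and Female, the same call raises ValueError, which is why the renamed husband edge can no longer be reversed.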

As you can see already, some applications of these rules may be quite hard to derive
if you are thinking in terms of graph instances, rather than graph schemas.
Rule C: Property displacement

Figure 12: Property displacement

This rule states that a property on an edge type can be moved to either adjacent
vertex type, provided its look-across cardinality is 1. The reverse rule states that
a property in a vertex type can be moved to an adjacent edge type with look-across
cardinality of 1, provided the edge always exists when the property exists.

The adjoining figure clarifies the rule, where the 'dateOfBirth' property is moved to
the 'mother' relationship because there is exactly one mother relationship per
Man/Woman and it is defined when the dateOfBirth is defined. If you rename
'dateOfBirth' to 'deliveryDate', one could argue that the property belongs in the edge
and not the vertex.

Note that a Man's dateOfBirth cannot be displaced to the wife relationship because
that would mean that the dateOfBirth cannot be stored unless the person is married.
Similarly, the dateOfBirth in the edge type labeled 'mother' from Man to Woman in the
bottom schema cannot be moved to Woman because of the cardinality restrictions in
the rule.
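
On a graph instance, the edge-to-vertex direction of rule C might look like the Python sketch below. The dictionary layout, the function name displace_edge_property and the sample ids and dates are all made up for this illustration.

```python
def displace_edge_property(vertices, edges, label, key, side="out"):
    """Move property `key` from edges labeled `label` to the vertex on `side`."""
    labeled = [e for e in edges if e["label"] == label]
    targets = [e[side] for e in labeled]
    if len(targets) != len(set(targets)):
        # Some vertex has several such edges, so the value would be ambiguous.
        raise ValueError(f"cardinality of {label!r} is not 1 on the {side} side")
    for e in labeled:
        if key in e["props"]:
            vertices[e[side]][key] = e["props"].pop(key)


vertices = {"child1": {}, "child2": {}, "mom": {}}
edges = [
    {"out": "child1", "label": "mother", "in": "mom",
     "props": {"dateOfBirth": "2001-01-01"}},
    {"out": "child2", "label": "mother", "in": "mom",
     "props": {"dateOfBirth": "2003-03-03"}},
]
# Each child has exactly one 'mother' edge, so the move to the child is safe.
displace_edge_property(vertices, edges, "mother", "dateOfBirth", side="out")
print(vertices)
```

Asking for side="in" raises ValueError here because the mother vertex has two incoming 'mother' edges, which is the cardinality restriction described above.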

Using this rule, you can move the properties around the schema to come up with a
better looking design. This rule is also useful in satisfying indexing requirements
of various graph databases. For example, if a graph database only supports indexes on
vertex properties, you could move searchable properties from the edges to vertices.
Similarly, if a graph database supports vertex-centric indexes based on properties on
adjacent edges/vertices, you can use this rule to bring the indexed property closer
to the vertex type of interest.

Rule D: Specialization and generalization
This rule states that:
Any vertex type can be divided into two disjoint vertex types based on a Boolean test
on the properties and adjacent edge labels of a vertex belonging to that type.
Any edge type can be divided into two disjoint edge types based on a Boolean test on
the properties and adjacent vertex labels of an edge belonging to that type.
Figure 13: Generalization
In other words, if we provide a Boolean function that can give a T/F result given a
vertex/edge, we can use that function to divide a vertex/edge type into two different
types.
The reverse rule states that:
Any vertex/edge type can be merged into another vertex/edge type provided there is a
Boolean test that can distinguish its vertices/edges from the merged vertices.
The adjoining figure shows an example transformation involving the following steps:
Male and Female are generalized as Person, because the Boolean test, sex equals 'M',
can distinguish Male from Female.
After that, son and daughter edge types are generalized as child because the Boolean
test, sex of in vertex equals 'M', can distinguish son from daughter.
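
For vertex types, rule D and its reverse can be sketched as a pair of relabeling functions in Python. The vertex layout (a 'label' entry next to the properties) and the function names are assumptions of this sketch, not part of any graph database API.

```python
def relabel(vertices, old_labels, choose_label):
    """Return a copy of `vertices` with new labels for the selected types."""
    return {
        vid: dict(v, label=choose_label(v)) if v["label"] in old_labels else dict(v)
        for vid, v in vertices.items()
    }


def generalize(vertices, labels, new_label):
    """Merge several vertex types into one."""
    return relabel(vertices, set(labels), lambda v: new_label)


def specialize(vertices, label, test, true_label, false_label):
    """Split one vertex type in two using a Boolean test on each vertex."""
    return relabel(vertices, {label},
                   lambda v: true_label if test(v) else false_label)


people = {
    1: {"label": "Male", "sex": "M"},
    2: {"label": "Female", "sex": "F"},
}
merged = generalize(people, ["Male", "Female"], "Person")
restored = specialize(merged, "Person", lambda v: v["sex"] == "M", "Male", "Female")
print(merged)
print(restored == people)
```

The round trip only works because the 'sex' property survives the merge; without such a test the generalization would lose information.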

This rule is useful in increasing the specificity, or reducing the complexity, of the
graph schema. As a general principle, it is better to use this rule for
specialization, i.e., increasing the specificity, because that allows the different
vertex and edge types to embrace different behavior in terms of properties and
adjacent edges. However, there are instances where the differences between the vertex
types are so minor that specialization only results in application complexity. This
argument could apply to the above generalization of Male and Female to Person.

Rule E: Edge promotion

Figure 14: Edge promotion

This rule states that an edge type can be promoted to a vertex type by adding two
"out" edge types to the endpoints. The properties of the edge type become properties
of the new vertex type. The cardinality of the new edge types is N:1 or 1:1,
depending on the look-across cardinality of the original endpoint vertex's type.

Note that the direction of the new edge types can be changed using the rename and
reverse rules, respectively. We only mention the "out" direction to simplify the way
in which cardinality for the new edge types is derived.

The adjoining figure shows the husband edge promoted to a vertex type called
'Marriage'. The edge types 'husband' and 'wife' point to the two endpoints of the
vertex type.
The edge promotion rule is useful in preparing a binary relationship to become an
N-ary relationship.
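
Applied to a graph instance, edge promotion could be sketched in Python like this. The tuple layout, the generated vertex ids and the function name promote_edge are assumptions of this sketch, and the sample 'year' property is made up.

```python
def promote_edge(vertices, edges, label, vertex_label, out_role, in_role):
    """Turn every edge labeled `label` into a vertex with two outgoing edges."""
    new_vertices = dict(vertices)
    new_edges = []
    for n, (out_v, lbl, in_v, props) in enumerate(edges):
        if lbl != label:
            new_edges.append((out_v, lbl, in_v, props))
            continue
        vid = f"{vertex_label}-{n}"  # hypothetical id scheme for new vertices
        # The edge's properties move to the new vertex.
        new_vertices[vid] = {"label": vertex_label, **props}
        new_edges.append((vid, out_role, out_v, {}))
        new_edges.append((vid, in_role, in_v, {}))
    return new_vertices, new_edges


vertices = {"w1": {"label": "Female"}, "h1": {"label": "Male"}}
edges = [("w1", "husband", "h1", {"year": 2000})]
new_vertices, new_edges = promote_edge(
    vertices, edges, "husband", "Marriage", out_role="wife", in_role="husband")
print(new_vertices)
print(new_edges)
```

Once the relationship is a vertex, further participants can be attached with additional edges.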

The reverse rule states that any vertex type with two property-less edge types, with
same-side cardinality of exactly 1, can be demoted to an edge between the adjacent
vertices. This process is useful to simplify schemas. You can use the property
displacement rule (rule C) to move properties out of edges.

Rule F: Property promotion

Figure 15: Property promotion

This rule states that any group of properties can be promoted to a new vertex type
with those properties, provided the new vertex type has edges connecting it to all
existing vertex types that include the property group. The same-side cardinality of
the new edge type is 1.
The adjoining figure shows the 'sex' property converted to a new vertex type. This
vertex type will have exactly two nodes corresponding to male and female. So in other
words, every person in the new graph will have an outgoing 'isa' edge to one of the
two new vertices.

This rule is equivalent to the splitting of a relation into two relations, as shown
in the first figure of this section. Any group of properties, typically ones that
repeat, can be promoted to a vertex.
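
For a single property, the promotion can be sketched in Python as below. The vertex layout, the id scheme for the new vertices and the function name promote_property are assumptions made for this illustration.

```python
def promote_property(vertices, key, vertex_label, edge_label):
    """Replace property `key` with an edge to a shared vertex per value."""
    value_vertices = {}
    new_vertices, new_edges = {}, []
    for vid, props in vertices.items():
        props = dict(props)
        if key in props:
            value = props.pop(key)
            target = f"{vertex_label}:{value}"  # one new vertex per value
            value_vertices[target] = {"label": vertex_label, key: value}
            new_edges.append((vid, edge_label, target))
        new_vertices[vid] = props
    new_vertices.update(value_vertices)
    return new_vertices, new_edges


people = {
    "p1": {"label": "Person", "sex": "M"},
    "p2": {"label": "Person", "sex": "F"},
    "p3": {"label": "Person", "sex": "M"},
}
new_vertices, new_edges = promote_property(people, "sex", "Sex", "isa")
print(new_vertices)
print(new_edges)
```

Three persons end up sharing two 'Sex' vertices, which is the graph counterpart of moving a repeated column into its own table.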

While applying this rule, it is better to include all vertex types that have the same
group of properties. For example, if there is a 'sex' property in a different Animal
type, it is better to point that to the new Sex vertex type as well. If you have edge
types with the property group, you can first promote those edge types to vertices.

The reverse of this rule is that a vertex type that has property-less edge types with
same-side cardinality of 1 can be demoted to the group of properties that it holds.
These properties must be added to every adjacent vertex type. This is the equivalent
of the denormalization of a table in the relational model, which is useful to reduce
the number of joins (or traversals in the case of graph databases).
Rule G: Property expansion

Figure 16: Property expansion

This rule states that a property of a vertex type that represents a list of values
can be moved to a separate vertex type which stores each value. The new vertex type
must have an "in" edge type from the existing vertex type with cardinality 1:N. The
adjoining figure shows this rule applied to the nickname property which holds a list
of Strings.
The reverse rule states that any vertex type with exactly one property-less edge type
with look-across cardinality of exactly 1 can be removed after moving its properties
to a list in the adjacent vertex type.

This is the equivalent of 1NF in the relational model. Unlike relational databases,
however, many graph databases support lists as a valid type for property values. So
the choice of storing nicknames as a List or a separate vertex type is up to the
designer.
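
The expansion of a list-valued property might be sketched in Python as follows. The vertex layout, the generated ids, the 'value' property name and the sample nicknames are all hypothetical.

```python
def expand_list_property(vertices, key, vertex_label, edge_label):
    """Move each value of list property `key` into its own vertex."""
    new_vertices = {
        vid: {k: v for k, v in props.items() if k != key}
        for vid, props in vertices.items()
    }
    new_edges = []
    for vid, props in vertices.items():
        for n, value in enumerate(props.get(key, [])):
            target = f"{vid}/{key}/{n}"  # hypothetical id for the value vertex
            new_vertices[target] = {"label": vertex_label, "value": value}
            # The edge runs from the existing vertex to the new value vertex.
            new_edges.append((vid, edge_label, target))
    return new_vertices, new_edges


people = {"p1": {"label": "Person", "nickname": ["Jack", "Johnny"]}}
new_vertices, new_edges = expand_list_property(
    people, "nickname", "Nickname", "nickname")
print(new_vertices)
print(new_edges)
```

Each person keeps a 1:N edge to its nickname vertices, so the original list can be rebuilt by following those edges.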
Summary

Rule-based schema transformations are tools that a data model designer can use to
rewrite a graph schema without losing any information in the process. In other words,
a data model designer can use these rules to select the directions of edges, the
names of different labels and keys, the locations of various properties, and so on.
These changes don't matter from a pure information perspective, but could make a big
difference in usability and efficiency. In that sense, a data model designer can go
back to Codd's original goals for normalization: designing schemas that are easy to
modify, easy to extend, informative to users and supportive of various query patterns.

Chapter 5

One metarule for normalization

The previous section listed seven rule-based schema transformations such as renaming
labels, reversing edges, promoting edges and properties to vertices, and so on. Such
rule-based transformations can be mechanically applied to any graph schema, without
losing any information in the process. Using these rules, a graph database designer
can start with a design generated from an entity relationship model and tweak it to
get a final design.

This section describes a single metarule from which the seven previously described
rules can be derived. It also formalizes some of the ideas presented in the previous
sections using set theory.
Schemas and constraints

Figure 17: Example graph schema
The above figure shows an example graph schema describing constraints on the graph
data model such as:

What are the legal labels for vertices?
What are the legal edge labels between two vertex types?
What are the legal property keys and value types at each edge or vertex type?

The reality, however, is that a graph model could have other constraints that aren't
expressed in the schema. For example, the 'inviter' edge in every Invitation must be
to the User who has an 'owns' edge to the Page at the end of the Invitation's 'page'
edge. This constraint isn't captured in the above schema.

The question is: How can we model complex constraints in a graph model?
Graph universes, transformations and equivalence
A graph universe U is a set of graphs, typically an infinite set. A graph universe
represents a data model in the sense that it captures every valid graph that belongs
to the data model.

A graph universe U is compatible with a graph schema S if every graph in the universe
is compatible with S. In other words, although the graph universe is a precise
description of the model, it can still be understood as a refinement of a more
loosely defined graph schema.

Redefining equivalence using transformation functions

Figure 18: An invertible function
A graph transformation T is a function that takes graphs from one universe U to
another universe V. In short, T: U → V.

A universe U is equivalent to a universe V if there is a transformation function
T: U → V, where T is invertible. Invertible and bijective are terms that characterize
functions that establish a one-to-one correspondence between two sets, which in this
case are graph universes.

In other words, given any graph G ∈ U, we can use T(G) to get a graph G' ∈ V. Then we
can use the inverse function T⁻¹(G') to get back G, hence establishing equivalence of
the two universes.
A programming perspective

If we are upgrading from one graph model to another, the transformation function is
the upgrade script that we would implement to move to the new model. If we can also
write a downgrade script, then we have two equivalent models (or universes). In other
words, two graph models, represented as universes or schemas, are equivalent if they
are forward and backward compatible.
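
The upgrade and downgrade scripts can be sketched as a pair of Python functions that are inverses of each other. The sketch assumes graphs are sets of (out, label, in) triples, that the old universe never uses the label 'child' and the new one never uses 'parent'; the vertex ids are made up.

```python
def upgrade(graph):
    """T: replace each child --parent--> parent edge by parent --child--> child."""
    return {(i, "child", o) if lbl == "parent" else (o, lbl, i)
            for o, lbl, i in graph}


def downgrade(graph):
    """Inverse of T: turn every 'child' edge back into a 'parent' edge."""
    return {(i, "parent", o) if lbl == "child" else (o, lbl, i)
            for o, lbl, i in graph}


old_graph = {("jr", "parent", "sr"), ("sr", "wife", "jk")}
new_graph = upgrade(old_graph)
print(new_graph)
print(downgrade(new_graph) == old_graph)
```

Because downgrade(upgrade(G)) returns G for every graph in the old universe, the two models carry the same information.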

Derived types

Consider a graph universe U that is compatible with a schema S. A vertex type in S
can be called a derived vertex type in U if every graph G ∈ U is such that its
vertices (and adjacent edges) belonging to the vertex type can be calculated from the
rest of the graph.

In other words, given any graph in the universe U, after we remove all vertices
corresponding to the derived vertex type, there should be a way to calculate those
vertices again. Derived edge and property types can be defined similarly. Note that
all derived element types are defined in graph schemas, but are specific to graph
universes that are compatible with that schema.
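
As an illustration of a derived edge type, the Python sketch below takes the constraint stated earlier (the 'inviter' of an Invitation is the User who owns its Page) and recomputes the 'inviter' edges after they have been removed. The triple layout, the function name derive_inviter_edges and the sample ids are hypothetical.

```python
def derive_inviter_edges(edges):
    """Recompute 'inviter' edges from the 'owns' and 'page' edges."""
    owner_of = {page: user for user, lbl, page in edges if lbl == "owns"}
    return {(invitation, "inviter", owner_of[page])
            for invitation, lbl, page in edges if lbl == "page"}


graph = {
    ("alice", "owns", "home"),
    ("inv1", "page", "home"),
    ("inv1", "inviter", "alice"),
    ("inv1", "invitee", "bob"),
}
# Remove the derived edges, then calculate them again from the rest.
stripped = {e for e in graph if e[1] != "inviter"}
restored = stripped | derive_inviter_edges(stripped)
print(restored == graph)
```

The edge type is derived only in universes where the ownership constraint holds; in a universe that allows invitations to pages owned by others, the same edges could not be recomputed.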

Metarule: Adding and removing derived types
Finally, here is the metarule behind all schema transformations:
Given any graph universe U compatible with a schema S, we can add a derived
vertex/edge/property type to produce an equivalent graph universe V compatible with
the schema S ∪ {derived type}.

The reverse rule states that:
Given any graph universe U compatible with a schema S, we can remove a derived
vertex/edge/property type to produce an equivalent graph universe V compatible
with the schema S \ {derived type}.
Figure 19: Modified graph schema

The 'invitee' edge type in the graph schema shown in the first figure is a derived
edge type. This is because the 'invitee' edges can be calculated by going from the
Invitation vertices to the Page and back to the user through the 'owns' edge (reverse
direction). We can simplify the original schema to the version shown in the adjoining
figure by applying these rules:

(Metarule) Remove derived edge type 'invitee'
(Edge promotion) Demote the binary relationship Invitation to an edge called
'invited'.
As you can see, the updated schema is simpler than the original schema derived from
an ER diagram.
Proving the metarule

The metarule is easy to prove because of the way derived types are defined. The
transformation function to remove a derived type simply removes all elements that
belong to that type. The inverse function calculates the derived types from the
remaining graph. Hence the universe with the derived type is equivalent to the
universe without it.

Proving the 7 rules: Renaming, Reversing, Property Displacement, ...
Consider the example of renaming an edge type. This rule was stated in the last
section as:
Any edge label can be renamed, so long as the new name doesn't refer to an existing
edge label between the out and in vertex types.
We can prove this in two steps:
Add a derived edge type with the new name as a copy of the old edge type.
Remove the old edge type, which is now derivable from the new edge type.

Of course, step 1 requires that the edge type with the new name doesn't already exist
in the schema. Otherwise, all edges of the edge type can't be derived. Hence the
condition "as long as the new name doesn't refer to an existing edge label."

In this manner, we can prove each rule by performing some steps to first add new
derived types and then remove the existing types which become derived types
themselves.
Beyond transformation rules

Thinking in terms of graph universes, derived types and transformation functions lets
us do more radical transformations to our graph model. The manner in which we apply
these rules or transformations depends on our overall strategy for data modeling.

One strategy is to minimize the number of implicit constraints not captured by the
schema. For instance, the schema shown in the second figure doesn't have the implicit
constraint on the 'invitee' edge type shown in the first figure. Generally, fewer
implicit constraints mean less duplication of data and less chance of bugs while
updating the database. This is similar to normalization in relational databases.

A different strategy is to tune the graph for its specific querying needs. Such
approaches have been popularized by "denormalization" techniques such as dimensional
modeling. For instance, we could add a "shortcut" derived edge type called 'latest'
from User to Page to show the last created page for each user. The important thing
then is to ensure that any change to the rest of the graph is accurately reflected in
the derived element types. The code that operates on the graph must be designed with
these constraints in mind.

Summary
This section introduced set-theoretic representations of graph models called graph
universes, which are more powerful than graph schemas. Secondly, this section showed
that two graph universes are equivalent if there is an invertible graph
transformation function between them. Finally, this section showed that all schema
transformation rules presented in the earlier section can be derived from one
metarule that deals with adding and removing derived types.

Validating graph schemas

The last few sections have discussed how property graph schemas can help design graph
databases from ER models and refine the data model through schema manipulations.
After reading this thread on the Gremlin users group, we realized that it is easy to
validate graphs against schemas with Gremlin and Groovy.

Figure 20: Tinkergraph schema
This gist on Github shows how you can take an instance graph and check to see if it
is compatible with a schema graph. The schema graph has vertices and edges
corresponding to vertex and edge types. Here's the code to create a schema graph
inside a Gremlin shell for the classic Tinkerpop schema shown here:

sg = new TinkerGraph()
person = sg.addVertex()
person.setProperty('_label', 'person')
person.setProperty('name', 'java.lang.String')
person.setProperty('age', 'java.lang.Integer')
software = sg.addVertex()
software.setProperty('_label', 'software')
software.setProperty('name', 'java.lang.String')
software.setProperty('lang', 'java.lang.String')
knows = person.addEdge('knows', person)
knows.setProperty('weight', 'java.lang.Float')
created = person.addEdge('created', software)
created.setProperty('weight', 'java.lang.Float')
created.setProperty('_minIn', 1) // Someone must create the software
created.setProperty('_minIn',1)//Someonemustcreatethesoftware

The properties have values corresponding to the Java class of the property
values in the instance graph. The property keys can end with '?' to indicate
that the property is optional. The edges in the schema graph can have four
special properties, viz. _minIn, _maxIn, _minOut and _maxOut, to indicate
cardinality restrictions for various edge types.

Any instance graph, g, can be validated against the schema stored in sg, using
the Gremlin script:
g.V.filter({ checkVertex(it, sg) })
You can look at the full Github gist to see how the validation is done.
The current version of Tinkerpop doesn't support vertex labels. So the mapping
from the vertex to the vertex type is specific to the graph, like this:
vertexType = { v, sg -> v.age ?
    sg.V('_label', 'person').next() : sg.V('_label', 'software').next() }
Most graph schemas typically have a property named 'type' that would make this
mapping easier.
However, with Tinkerpop 3, this method can be standardized to use the label:
vertexType = { v, sg -> sg.V('label', v.label).next() }
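
The checks described above (typed properties, optional keys ending in '?', and
cardinality bounds such as _minIn) can be sketched in plain Python. This is not
the code from the gist: the function names check_vertex and check_cardinality
are hypothetical, and Python types stand in for the Java class names stored in
the schema graph.

```python
def check_vertex(props, schema_props):
    """Return True if a vertex's properties satisfy a schema vertex.

    schema_props maps property keys to expected Python types; a key ending
    in '?' marks the property as optional.
    """
    for key, expected_type in schema_props.items():
        optional = key.endswith("?")
        name = key[:-1] if optional else key
        if name not in props:
            if not optional:
                return False  # a mandatory property is missing
            continue
        if not isinstance(props[name], expected_type):
            return False  # the property has the wrong type
    return True


def check_cardinality(count, minimum=None, maximum=None):
    """Check an edge count against bounds such as _minIn or _maxOut."""
    if minimum is not None and count < minimum:
        return False
    if maximum is not None and count > maximum:
        return False
    return True


# Hypothetical schema for 'person' vertices, with one optional property.
person_schema = {"name": str, "age": int, "nickname?": str}
```

For example, check_vertex({"name": "marko", "age": 29}, person_schema) accepts
the vertex, while a vertex without 'age' is rejected. A 'software' vertex with
no incoming 'created' edge would fail check_cardinality(0, minimum=1).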

Pixy: First-order logic on graph databases

The previous sections have shown that any ER model can be converted to a
property graph schema, and that the schema can be normalized using rules.
However, one key question remains:

Do graph databases offer the same querying capabilities as relational databases?
In other words, any data that fits in an ER model can be stuffed into a graph
database. But can data stuffed in this fashion be queried effectively?
This is the subject of this section.
Background
On SQL

SQL is the query standard for relational databases. It first appeared in the
1970s and was standardized in the 80s and 90s. The theoretical foundation of SQL
is relational algebra. Codd showed that relational algebra is equivalent to
relational calculus, a form of first-order logic. His theorem is the bedrock of
SQL's expressive power.

On first-order logic

Using relational algebra, we can write any query of the form "Find all rows from
tables A, B, C, ..., matching some predicate", as long as the predicate can be
expressed in first-order logic.
Specifically, the predicate is formed using:

● various comparisons on rows and columns,
● logical operations "and" (∧), "or" (∨) and "not" (¬), and
● the universal "for every" (∀) and existential "there exists" (∃) quantifiers
  that operate on rows of a given table.

Let's consider tables named person, car and ticket. We could express a query
like "find me people who own only BMW cars, but have at least one speeding
ticket". The predicate can be written as:
my_query(person) =
(∀ car, person owns car ∧ car.make = 'BMW') ∧ (∃ ticket, person has ticket)

On Gremlin
Gremlin is a standard graph traversal language. It is part of the Tinkerpop
stack and works across all Blueprints-compatible databases. You can read more
about Gremlin here:

● Gremlin Wiki on Github
● Gremlin Docs
● The Pathological Gremlin (presentation)

Gremlin is great for step-based queries. For example, something like "find the
friend of a friend of vertex v" can be written as v.out('friend').out('friend').
This style of traversal with vertices and edges isn't natural in SQL with
tuples.

The declarative querying style of SQL is, however, different from Gremlin. The
SQL2Gremlin tutorial goes through some examples. But you can see that the
translation isn't obvious.
Pixy: First-order logic with Gremlin

Pixy is a bridge from first-order logic to Gremlin. The first-order logic of
Pixy operates on vertices and edges. We can ask questions like "Find vertices
and edges that match some predicate", where the predicate is formed by

● various comparisons on vertex and edge properties,
● logical operations "and" (∧), "or" (∨) and "not" (¬), and
● the universal "for every" (∀) and existential "there exists" (∃) quantifiers
  that operate on vertices and edges.
Pixy queries are expressed using Prolog rules, not SQL. Rules in Prolog are
expressed as Horn clauses.
Prolog, like SQL, has the full expressive power of first-order logic.
Let's take the predicate from the earlier discussion,
my_query(person) =
(∀ car, person owns car ∧ car.make = 'BMW') ∧ (∃ ticket, person has ticket)
Let's say that we represent people as vertices with outgoing edge types named
'car' and 'ticket' to vertices representing cars and tickets. Now, we could
express the above predicate using Horn clauses as follows:

my_query(Person, Ticket) :- out(Person, 'ticket', Ticket),
    not(not_all_bmw(Person)).
not_all_bmw(Person) :- out(Person, 'car', Car),
    property(Car, 'make', Make),
    Make <> 'BMW'.
Note that out and property are predefined predicates in Pixy. You can see that
the ∃ part of the query is easy. This is a matter of finding a ticket. The
∀ part of the query is implemented using two nots. In other words, saying "every
car is a BMW" is the same as saying "there is no car that isn't a BMW".
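
The double-negation trick can be checked with a short Python sketch. It is only
an illustration under stated assumptions: the dictionary people and its contents
are made up, and Python's any() and all() stand in for the ∃ and ∀ quantifiers.
The universal part is read here as "every car the person owns is a BMW".

```python
# Hypothetical sample data: each person has a list of car makes and tickets.
people = {
    "ann": {"cars": ["BMW", "BMW"], "tickets": ["speeding"]},
    "bob": {"cars": ["BMW", "Audi"], "tickets": ["speeding"]},
    "cy": {"cars": ["BMW"], "tickets": []},
}


def not_all_bmw(person):
    # "There exists a car that isn't a BMW" (the inner negation).
    return any(make != "BMW" for make in people[person]["cars"])


def my_query(person):
    # Existential part: at least one ticket. Universal part: two negations.
    has_ticket = len(people[person]["tickets"]) > 0
    return has_ticket and not not_all_bmw(person)


def my_query_direct(person):
    # The same predicate written with a direct universal quantifier.
    has_ticket = len(people[person]["tickets"]) > 0
    return has_ticket and all(make == "BMW" for make in people[person]["cars"])


matches = [p for p in people if my_query(p)]
```

Both formulations select the same people. Note that, in this reading, a person
with a ticket and no cars at all also satisfies the predicate, because the
universal part is then vacuously true.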

ER models in Pixy

If you use an ER model as a starting point for your design, you can reconstitute
the ER model from the final graph schema using Pixy. Consider the previously
referenced ER model with entities named User, Page and Tag and relationships
named Owns, Invites and TaggedAs.
Figure 21: ER model for User-Page-Tag application
This was translated to a graph schema with four types of vertices, viz. User,
Page, Tag and Invitation.

Figure 22: Graph schema for User-Page-Tag application
Now, we can reconstitute the ER model from the graph schema using Pixy with the
following clauses:

% Entities
user(User, Name, Login) :- property(User, 'name', Name), property(User, 'login', Login).
page(Page, Uri, Html, CreateTs) :- property(Page, 'uri', Uri), ...
tag(Tag, Hashtag, Description) :- property(Tag, 'hashtag', Hashtag), ...
% Relationships
owns(User, Page) :- out(User, 'owns', Page).
taggedAs(Page, Tag) :- out(Page, 'taggedas', Tag).

invites(Invitation, Inviter, Invitee, Page) :-
    out(Invitation, 'invitee', Invitee),
    out(Invitation, 'inviter', Inviter),
    out(Invitation, 'page', Page).

Every predicate corresponds to an entity or a relationship. The predicate
operates on vertices, edges and properties that belong to the graph schema. Now,
you get the full power of first-order logic on the ER model. In other words, any
first-order predicate that applies to entities and relationships can be written
as a Pixy query that uses the above clauses.

Let's take an example predicate that matches all users invited to pages tagged
'tinkerpop' created in 2014. You could express this as follows:
tinkerpop_invitee(User, Page) :- invites(_, _, User, Page),
    page(Page, _, _, CreateTs),
    CreateTs > 1388534400L, % Unix timestamp for 1/1/2014
    taggedAs(Page, Tag),
    tag(Tag, 'tinkerpop').
Note that '_' is used to represent anonymous variables.
Query requirements don't usually matter while modeling

It isn't surprising that queries in first-order logic can be compiled to
Gremlin, since Gremlin is Turing complete. The surprising thing is that Pixy
converts any first-order logic query on an ER model to something that executes
"efficiently" on the corresponding graph database.

By "efficiently", we mean that the Pixy/Gremlin query will always traverse edges
to go from one entity/relationship to another. Edge traversal operations in
graph databases are typically orders of magnitude faster than index-based joins
in relational databases.
Queries on properties will, of course, need indexes for efficient querying. But
as long as your starting ER model is accurate, your application will not have to
simulate joins using these property indexes. In that sense, the graph schema
design is independent of the query requirements.

Part II
Chapter 8: Introduction to Database

Database systems evolution: Databases and database technology are vital to
modern organizations, supporting both daily operations and decision making.
Database technology has undergone remarkable evolution over 50 years. Despite
Oracle's dominance of the enterprise DBMS marketplace, the industry remains
highly competitive with a continued high level of innovation [12].
Figure 1: Evolution of database technology
Major periods of database technology evolution [12]:

● 1st Generation (1960s): File oriented. Supported sequential and random
  searching of files, but the user was required to write computer programs to
  access data. The database software industry had few or no standards during
  this period.

● 2nd Generation (1970s): Navigational. Could manage multiple entity types and
  relationships. Computer programs still had to be written. Progress on
  standards.

● 3rd Generation (1980s): Relational with non-procedural access. Foundation
  based on mathematical relations and associated operators. Optimization
  technology was developed. IBM performed pioneering research to enable
  commercialization of relational database technology.

● 4th Generation (1990s+): Object oriented. Extending the boundaries of database
  technology. New kinds of distributed processing and data warehouse processing.
  Can store and manipulate unconventional data types. Convenient ways to publish
  static and dynamic Web data.
DBMS marketplace: Despite Oracle's dominance of the enterprise DBMS marketplace,
with more than 40% overall market share, the industry remains highly competitive
with a continued high level of innovation. In some environments, its competition
is Microsoft SQL Server, IBM DB2, Teradata and SAP Sybase. Open-source DBMS
products have begun to challenge the commercial DBMS products at the low end of
the enterprise DBMS marketplace. The open-source DBMS category is led by MySQL,
followed by MongoDB, PostgreSQL and MariaDB. In the desktop DBMS market,
Microsoft Access dominates because of the dominance of Microsoft Office.

Figure 2: DBMS marketplace
Innovation in the industry: The advances in DBMS in recent years support
business intelligence processing for data integration and usage of summary data.
NoSQL technology has been developed to support the needs of Big Data, as modern
web-scale databases. Since 2009, the most accepted definition of NoSQL is
next-generation databases that are non-relational, distributed, open-source and
horizontally scalable. Other characteristics that usually apply are schema-free,
scalability, global availability, easy replication support, simple API,
eventually consistent/BASE (not ACID), and large-scale data. [5][19]

Types of DBMS

Ranking: In this section we observe rankings created by DB-Engines. DB-Engines
is an initiative that provides information on the popularity of the DBMS
available in the market. It publishes different rankings for every DBMS type,
which are updated monthly.
Figure 3: DBMS developed by database model pie chart

The pie chart above represents the categories of DBMS that comprise the most
systems developed. The database model with the most systems is the relational
DBMS, with 137 systems in this category. It is followed by key-value stores,
with 63 systems, document stores, with 43 systems, and graph DBMS, with 27
systems.
In the overall classification of database models, the following DBMS types are
distinguished.
Types of DBMS:
Relational DBMS                   Graph DBMS
Key-value stores                  Time Series DBMS
Document stores                   RDF stores
Object oriented DBMS (Atkinson)   Native XML DBMS
Search engines                    Content stores
Multivalue DBMS                   Event Stores
Wide column stores                Navigational DBMS

The 14 database models with the most systems developed are listed above. If,
instead of counting the systems developed, the database models are ranked by
popularity, the list of models to be considered shrinks. Most users work on
relational DBMS, 79.5%, followed by document stores, 7.3%, search engines, 4.3%,
key-value stores, 3.5%, wide column stores, 3.1%, and graph DBMS, 1.1%.
The pie chart below represents the most recent popularity ranking.
Figure 4: DBMS popularity by database model pie chart

In the pie chart above, it is clear that relational DBMS are the ones used by
default. However, the state of the art is changing through innovations in
database technology. Even though the popularity percentages of NoSQL databases
are minimal compared to relational DBMS, the fact that they are recent, growing
technologies is reason enough to evaluate them more deeply.
NoSQL DBMS

Many different NoSQL DBMS have been developed, but they are generally classified
in four types [5]:
● Key-value stores: Their structure consists of pairing keys to values. When
  changing a value, the entire value other than the key must be updated. They
  scale well because of their simplicity. However, this can limit the complexity
  of the queries and other advanced features. [18] Examples: Dynamo, Azure Table
  Storage, BerkeleyDB

● Document stores: The records stored are called documents, which consist of
  groupings of key-value pairs. Values can be nested to arbitrary depths. [18]
  Examples: Elastic, MongoDB, Azure DocumentDB

● Wide column stores: While RDBMS store all the data of a particular table's
  rows together on disk, making it fast to retrieve a particular row,
  column-family databases can retrieve a large amount of a specific attribute
  fast by serializing all the values of a particular column together on disk.
  This approach is useful for aggregate queries. [18] Examples: Hadoop/HBase,
  Cassandra, Amazon SimpleDB

● Graph databases: Ideal for dealing with interconnected data. Their structure
  consists of connections, or edges, between nodes. Both nodes and their edges
  can store additional properties such as key-value pairs. The strength of a
  graph database is in traversing the connections between the nodes. Their
  downside is that they generally require all data to fit on one machine,
  limiting their scalability. [18] Examples: Neo4j, InfiniteGraph, Titan

● Other types: Multimodel Databases, Object Databases, Grid & Cloud Database
  Solutions, XML Databases, Multidimensional Databases, Multivalue Databases,
  Event Sources, Time Series/Streaming Databases
(a) Example of Key-Value Store    (b) Example of Document Store

Figure 5: Four main types of NoSQL databases

Consistency models for NoSQL databases: Before NoSQL, ACID was the
quintessential model that databases were meant to follow. A brief reminder of
the ACID properties:

● Atomicity: All operations in a transaction succeed or every operation is
  rolled back.
● Consistent: On the completion of a transaction, the database is structurally
  sound.
● Isolated: Transactions do not contend with one another. Contentious access to
  data is moderated by the database so that transactions appear to run
  sequentially.
● Durable: The results of applying a transaction are permanent, even in the
  presence of failures.
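
Atomicity can be observed with a minimal Python sketch that uses the standard
sqlite3 module. SQLite is a relational database chosen here only because it
ships with Python; the table account, the function transfer and the amounts are
hypothetical. A transfer consists of two updates, and when the second one
violates a constraint the first one is rolled back as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    "name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("a", 100), ("b", 0)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move money between accounts; either both updates apply or neither."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok = transfer(conn, "a", "b", 30)       # both updates are committed
failed = transfer(conn, "a", "b", 500)  # debit violates the CHECK, so the credit is undone
balances = dict(conn.execute("SELECT name, balance FROM account"))
```

After the failed transfer the balances are unchanged from the state left by the
successful one, which is the all-or-nothing behavior the atomicity property
describes.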

However, NoSQL databases break with the usual SQL model of ACID properties. BASE
properties seem to suit most NoSQL databases better, and they are as follows:

● Basic Availability: The database appears to work most of the time.
● Soft-state: Stores don't have to be write-consistent, nor do different
  replicas have to be mutually consistent all the time.
● Eventual consistency: Stores exhibit consistency at some later point (e.g.,
  lazily at read time).

ACID transactions can be considered stricter than needed for many NoSQL cases,
as they apply many constraints for safety's sake. BASE transactions, on the
other hand, prioritize scale and resilience. The BASE model is used by aggregate
stores, such as column-family, key-value and document stores. In contrast, graph
databases use the ACID model. BASE databases promise availability of the data at
the expense of data consistency (the consistency of the data is only assured at
concrete snapshots). [16] Graph databases differentiate themselves from other
NoSQL databases by focusing more on data consistency.
The comparison made above is shown in the table below:

              ACID               BASE
Properties    Atomicity          Basic Availability
              Consistent         Soft-state
              Isolated           Eventual consistency
              Durable
NoSQL DBMS    Graph databases    Aggregate stores

Table 1: Comparison of ACID and BASE Consistency Models
Comparison of DBMS

Relational DBMS are clearly the benchmark among database systems. The mass
adoption of this DBMS type is an important factor in choosing it as the main
system in many companies. However, current trends show that the four main types
of NoSQL databases should also be taken into account before installing a DBMS.
To give a more objective view of the benefits of each model, the use cases for
which it performs best and the ones for which it performs worst are listed
below.

Use cases for relational databases

Positive use cases: transaction-oriented databases (banking applications,
online reservations), where the concurrency of many transactions must be
supported and the integrity of the data must be protected.

Negative use cases: data warehouses, which are analytically oriented databases
with a large amount of data and infrequent updates. The constraints of the
relational database wouldn't support the scalability.

Use cases for key-value stores
Positive use cases:
– For storing user session data
– Maintaining schema-less user profiles
– Storing user preferences
– Storing shopping cart data
Negative use cases:
– To query the database by specific data value
– With relationships between data values
– To operate on multiple unique keys
– If the business needs to update a part of the value frequently
Use cases for document stores
Positive use cases:
– E-commerce platforms
– Content management systems
– Analytics platforms
– Blogging platforms
Negative use cases:
– To run complex search queries
– If the application requires complex multi-operation transactions
Use cases for wide-column stores
Positive use cases:
– Content management systems
– Blogging platforms
– Systems that maintain counters
– Services that have expiring usage
– Systems that require heavy write requests (like log aggregators)
Negative use cases:
– To use complex querying
– If the query patterns change frequently
– Without an established database requirement
Use cases for graph databases [19]
Positive use cases:
– Fraud detection
– Graph-based search
– Network and IT operations
– Social networks
Negative use cases:
– Data warehouses so big that they require the BASE model
Figure 6: Positions of NoSQL databases (source: Neo4j)

In the figure above, the five types of DBMS being compared are displayed
according to the size and complexity of their databases. It can be concluded
that each of those DBMS works for some specific use cases, depending on the
amount and complexity of the data that is going to be stored. Their use cases do
not overlap, which is why all five of them must be considered before
implementing a DBMS in a company.

Chapter 9: Graph Databases

Graph databases are databases whose specific purpose is the storage of
graph-oriented data structures. An introduction to graph theory is therefore
given, to be consistent when using its terminology.

Concepts of Graph Databases

Positioning: It has previously been explained that NoSQL databases address
several issues that relational databases do not: availability for the processing
of large datasets, partitioning, flexibility of the schema, and modelling and
processing complex structures like trees and graphs. Graph databases specialize
in processing highly connected data, managing complex and flexible data models
and improving the performance of complex queries by traversing the graph.

Model: Another quality of graph databases is the simplicity of their model. The
figures below show the difference in modeling the same use case in a relational
database and in a graph database. The model of the graph database is more
similar to the business model, which makes it more accessible to non-technical
profiles. [8]
(a) Relational Database Model    (b) Graph Database Model
Figure 7: Model Comparison

A graph is a pictorial representation of objects, some pairs of which are
connected by links. A graph contains two kinds of elements: nodes (vertices)
and relationships (edges).

What is a Graph Database
A graph database is a database which is used to model data in the form of a
graph. It stores any kind of data using:

● Nodes
● Relationships
● Properties
Nodes: Nodes are the records/data in graph databases. Data is stored as
properties, and properties are simple name/value pairs.
Nodes can be grouped together by applying a label to each member. A node can
have zero or more labels. Labels do not have any properties. Storing data in
Neo4j is similar to adding records in other databases.

Relationships: A relationship is used to connect nodes. It specifies how the
nodes are related.

● Relationships always have a direction.
● Relationships always have a type.
● Relationships form patterns of data.

Properties: Properties are named data values.
Popular Graph Databases
Neo4j is the most popular graph database. Other graph databases are:

● Oracle NoSQL Database
● OrientDB
● HyperGraphDB
● GraphBase
● InfiniteGraph
● AllegroGraph etc.
Why GraphDB

Graph databases are very useful nowadays because in graph databases data exists
in the form of relationships between different objects. The relationships
between the data can be more valuable than the data itself.

Relational databases store highly structured data, with many records storing the
same type of data, so they are well suited to structured data. However, they do
not store the relationships between the data, while graph databases store
relationships and connections as first-class entities.

The data model for graph databases is simple compared to other databases, and
they can be used with OLTP systems. They provide features like transactional
integrity and operational availability.

GraphDB vs NoSQL Database
Following are some points which specify why GraphDB is better than other NoSQL
databases for connected data:

● Most NoSQL databases store sets of disconnected aggregates. This makes it
  difficult to use them for connected data and graphs.
● One well-known strategy for adding relationships to such stores is to embed an
  aggregate's identifier inside a field belonging to another aggregate,
  effectively introducing foreign keys.
● But this requires joining aggregates at the application level, which quickly
  becomes prohibitively expensive.
See the use cases of different types of databases:
Relational database: Data is represented in tabular form, so it is best for
tasks such as calculating income.
Key-Value Store: It is best for building a shopping cart.
Document store: Data is stored as a document, so it is best for storing
structured product information.
GraphDB: It follows a graph structure. It is best for describing how a user got
from point A to point B.
Neo4j Data Model
Neo4j follows the Property Graph Model for storing and managing its data.
Neo4j is a graph database which contains the following features of the Property
Graph Model.

● The graph model contains nodes, relationships and properties, which specify
  data and its operation.
● Properties are key-value pairs.
● Nodes are represented using circles and relationships are represented using
  arrows.
● A relationship specifies the relation between two nodes.
● There are two types of relationships between nodes according to their
  directions: unidirectional and bidirectional.
● Each relationship connects two nodes: a "Start Node" or "From Node" and a
  "To Node" or "End Node".
● Both nodes and relationships contain properties.
● Relationships should be directional in the Property Graph Data Model. If you
  create a relationship without a direction, it will throw an error message.

There are three main building blocks of a GraphDB data model:

● Nodes
● Relationships
● Properties
Following is a simple example of a Property Graph.

Figure 8: Simple Graph

Here, we have represented nodes using circles. Relationships are represented
using arrows. Relationships are directional. We can represent a node's data in
terms of properties (key-value pairs). In this example, we have represented each
node's Id property within the node's circle.
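
The three building blocks can be sketched as plain Python data classes. This is
only an illustration of the property graph model, not Neo4j's internal
representation; the class and field names (Node, Relationship, rel_type and so
on) and the sample data are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: int
    labels: set = field(default_factory=set)         # zero or more labels
    properties: dict = field(default_factory=dict)   # key-value pairs


@dataclass
class Relationship:
    rel_type: str   # every relationship has a type
    start: Node     # "Start Node" / "From Node"
    end: Node       # "End Node" / "To Node"
    properties: dict = field(default_factory=dict)

    def __post_init__(self):
        # Reject relationships that lack a type or a direction.
        if not self.rel_type:
            raise ValueError("a relationship must have a type")
        if self.start is None or self.end is None:
            raise ValueError("a relationship needs a start node and an end node")


alice = Node(1, {"Person"}, {"name": "Alice"})
bob = Node(2, {"Person"}, {"name": "Bob"})
knows = Relationship("KNOWS", alice, bob, {"since": 2020})
```

The direction is encoded by which node is stored as start and which as end, so
a relationship from Bob to Alice would be a separate object with the two fields
swapped.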

Query performance

Graph databases' competitive advantage: It has been said that graph databases
exist because they outperform relational databases in complex queries. They are
particularly good when the relationships between items are significant. The use
case that is better suited for graph databases is "find all entities of a kind"
(myEntity.findAll). The execution of such a query starts with an index lookup to
find the starting node(s) for the traversal. Then the relationships in the graph
are traversed simultaneously. Because of the concurrency of the traversal, the
bigger the volume of data, the more it outperforms relational databases.
Figure 9: Query execution in graph databases

Relational databases are less suited to querying through relationships. It would
mean querying through different tables, following foreign keys and other
indexes, and it would considerably increase the execution time. Graph database
traversals are performed by following physical pointers, while foreign keys are
logical pointers. [8] The query in the figure includes the time of each index
scan. The more tables are included in the query, the larger the execution time
will become.
Figure 10: Query execution in relational databases
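
The contrast between following pointers and following foreign keys can be
sketched in Python. This is a simplified illustration, not a benchmark: the
friendship data and function names are hypothetical, an adjacency dictionary
stands in for the graph store, and a list of rows scanned without any index
stands in for a join table (a real relational database would normally use an
index for each hop).

```python
from collections import deque

# Graph style: each node holds direct references to its neighbours.
friends = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}


def reachable_within(start, max_hops):
    """Breadth-first traversal that only touches nodes near the start node."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, hops = frontier.popleft()
        if hops == max_hops:
            continue  # do not expand beyond the requested depth
        for neighbour in friends[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                frontier.append((neighbour, hops + 1))
    return seen - {start}


# Relational style: friendships are rows; each hop is a pass over the table.
friend_rows = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]


def friends_of_friends_join(start):
    first_hop = {dst for src, dst in friend_rows if src == start}
    return {dst for src, dst in friend_rows if src in first_hop}
```

In the traversal, the work depends on the number of nodes and relationships
actually visited. In the join-style version, every additional hop requires
another pass over (or index lookup into) the whole friendship table.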

Relational databases' competitive advantage: On the other hand, because of the
internal structure of the tables, relational databases would outperform graph
databases when the output requires all the attributes of a table (findAll-like
queries). Their ideal use case is to aggregate over a complete dataset. [8]

Graph databases ranking: The figure below shows the DB-Engines ranking of graph
DBMS. Neo4j leads the ranking, and its score is triple that of the next DBMS,
Microsoft Azure Cosmos DB. Neo4j has been leading the graph database sector for
some years, as we can see in the trend scatter plot. It must be taken into
account that the score is displayed on a logarithmic scale, therefore the
difference in popularity is really significant.

It can also be seen in the trend scatter plot that Microsoft Azure Cosmos DB
appeared in the graph database landscape in 2014, and since then its rise in
popularity has been quite steep. One argument for that is that Microsoft Azure
is well integrated in the software marketplace.
Success factor: It has been stated, when comparing the NoSQL DBMS, that graph
databases have a limitation in size. Therefore, it is a competitive advantage to
work on facilitating the partitioning of a graph. While OrientDB and
InfiniteGraph state that they have accomplished this, Neo4j seems to be the DBMS
that is improving graph partitioning most successfully. [8]

Figure 11: Graph DBMS Ranking
Figure 12: Trend Graph DBMS popularity scatter plot

Chapter 10: Neo4j
Necessity of Neo4j

Why Neo4j? By using a graph database like Neo4j, which focuses on data
relationships, patterns and trends can be seen easily, unlike with relational
databases. Due to today's growing business demands and competitive atmosphere,
using the right tool is very important, and when it comes to widely connected
data Neo4j is a strong choice because it can be orders of magnitude faster than
traditional databases for such queries. Neo4j analyzes and traverses all the
data in real time and gives results very fast. Neo4j is widely used by many big
companies like eBay, Walmart, Cisco, UBS and many more.

What is Neo4j? Neo4j is an open-source NoSQL graph database written in Java and
Scala, and according to db-engines.com, Neo4j is currently the world's leading
graph database. There are many reasons for this. First of all, Neo4j provides
ACID transaction compliance, cluster support, runtime failover, high
availability and high-speed querying through traversals. It scales to billions
of nodes and relationships. It has a great user interface and it is easy to
learn because there are lots of free online resources on the web. It also has a
great community that can help with any problems. In general terms, Neo4j is
designed for linking relationships and it handles these relationships with
speed, ease, and extreme flexibility. With Neo4j, models can easily be converted
to a database schema. If the data is densely connected or various conceptual
models need to be tried for the data, then Neo4j is the solution.

Neo4jVersions

Version ReleaseDate Neo4jVersion1.0 February2010 Neo4jVersion2.0


December2013 Neo4jVersion3.0 April2016

Graph databases use a relationship-first approach to storing and querying your
data. They store data in a much more logical fashion, a way that represents the
real world and prioritizes the representation, discoverability and
maintainability of data relationships. But data integrity is important for many
developers who care about data relationships, so ACID properties were brought
back to at least one NoSQL database, Neo4j. This allows us to use Neo4j as a
transactional data store, storing your most critical business data.

Graph databases give developers a more intuitive data model, faster queries and
better agility to adapt to changes in the business.
Figure 13: Neo4j as a Leading Graph Database

How is Neo4j different from traditional databases? Graph databases are much
different from traditional relational (SQL) databases. Instead of using tables
with rows and columns, graph databases use a graph with nodes and relationships.
Both of these types of databases have their place. A relational database is
great for tabular data that is not really closely related. If we have a lot of
nested relationships in a relational database, it can get very complicated with
join tables and join queries; we need all kinds of primary and foreign keys, it
can be really hard to deal with and, even worse, it can be really costly on the
system. Graph databases are built to fix that problem and work with data that is
much more closely related and more dynamic.
Thus, because of the reasons stated above, we chose Neo4j as our database.
Figure 14: eBay's comment about Neo4j
Neo4j Working
Neo4j stores and displays data in the form of a graph. In Neo4j, data is
represented by nodes and relationships between those nodes.

Neo4j databases (as with any graph database) are a lot different from relational
databases such as MS Access, SQL Server, MySQL, etc. Relational databases use
tables, rows, and columns to store data. They also present data in a tabular
fashion.

Neo4j doesn't use tables, rows, or columns to store or present data.
Neo4j is best for storing data that has many interconnecting relationships.
That's why graph databases like Neo4j have an advantage and are much better at
dealing with highly related data than relational databases are.

The graph model doesn't usually require a predefined schema. So there is no need
to create the database structure before you load the data (like you do in a
relational database). In Neo4j, the data is the structure. Neo4j is a
"schema-optional" DBMS.
In Neo4j, there is no need to set up primary key/foreign key constraints to
predetermine which fields can have a relationship, and to which data. You just
have to define the relationships between the nodes you need.

Features of Neo4j Graph Database

● It has a simple, SQL-like query language, Neo4j CQL
● It supports indexes by using Apache Lucene
● It contains a UI to execute CQL commands, i.e. the Neo4j Data Browser
● It supports UNIQUE constraints
● It supports full ACID properties
● It uses native graph storage with a native GPE (Graph Processing Engine)
● It follows the Property Graph Data Model
● It provides a REST API that can be accessed from any programming language,
  like Java, Spring, Scala and so forth
● It supports exporting query data to JSON and XLS format
Advantages of Neo4j
Properties of Neo4j

Figure 15: General Look at Neo4j
Following are the properties of Neo4j:
● Data model (flexible schema): Neo4j has a property graph model. That is, the
  graph has nodes, and these nodes are connected with each other. Nodes and
  their relationships store data in key-value pairs known as properties. Neo4j
  also has a flexible schema, which means properties can be added or removed
  when necessary.

● ACID properties: Neo4j supports full ACID (Atomicity, Consistency, Isolation,
  and Durability) rules.

● Scalability and reliability: The database can be scaled by increasing the
  number of reads/writes and the volume without affecting the query processing
  speed and data integrity. Neo4j also provides support for replication for data
  safety and reliability.

● The traversal of the graph: The traversal is the operation of visiting a set
  of nodes in the graph by moving between nodes connected with relationships.
  It's an operation unique to the graph model for data retrieval. Querying the
  data using a traversal only takes into account the data that's required, so
  there is no need to query the entire dataset in an expensive operation, as is
  the case with join operations on relational data. [1]

● Cypher Query Language: Neo4j provides a powerful declarative query language
  known as Cypher. It uses ASCII art for depicting graphs. Cypher is easy to
  learn and can be used to create and retrieve relations between data without
  using complex queries like joins. [9]

● Built-in web application: Neo4j provides a built-in Neo4j Browser web
  application. It can be used to create and query graph data.
● Drivers: Neo4j can work with:
  – a REST API, to work with programming languages such as Java, Spring, Scala
    etc.
  – JavaScript, to work with UI MVC frameworks such as Node JS.
  – two kinds of Java API, the Cypher API and the Native Java API, to develop
    Java applications.
● Indexing: Neo4j supports indexes by using Apache Lucene.
Advantages of Neo4j Graph Database

Neo4j is very popular in lots of industries and it is the first choice of many
companies. Neo4j gives an advantage in many respects. First of all, it is built
for handling complex data connections; as the volume and connectedness of data
increase, these companies gain many benefits over their competitors. Following
are the advantages of Neo4j.

● Easy to represent connected data: It makes it both easy and fast to traverse
  or navigate large amounts of data that has some sort of relationship.
● Can represent semi-structured data easily: Data that does not fall into a
  natural structure can be easily represented in a graph database.

● Cypher commands: Cypher commands are human readable and very easy to learn.
● Simple and powerful data model: The property graph data model is simple yet
  still very powerful. The basic building blocks are nodes and relationships,
  and they can contain data in the form of key-value pairs or properties, unlike
  the relational model.

● Join aspect: There's no need for complex and costly joins to retrieve
  connected or related data. Instead the graph database uses the natural concept
  of relationships. Relationships in a graph form paths, so querying or
  traversing a graph involves following those paths, and because of that
  path-oriented nature of the graph data model, the majority of path-based
  operations are extremely efficient.

● Performance: Traversing a relationship is done in constant time, so query
  performance does not decrease when data grows, and Cypher is designed for
  graphs, so it is very simple to write graph traversals based on pattern
  matching.

● Neo4j is the only graph database that combines native graph storage, a
  scalable architecture optimized for speed, and ACID compliance to ensure
  predictability of relationship-based queries. [10]

● Real-time insights: Neo4j provides results based on real-time data.
● High availability: Neo4j is highly available for large enterprise real-time
  applications with transactional guarantees. [15]
● Biggest graph community in the world: Neo4j has the largest and most active
  graph community.
● Easy to learn: Mature UI with intuitive interaction and built-in learning.
  [10]
Performance in Neo4j
Neo4j provides a fast and efficient graph experience, and its strongest point is
that it can traverse millions of nodes in milliseconds. Also, unlike with
relational databases, even exponentially increasing data size does not affect
the performance of Neo4j.

Volker Pacher, eBay developer and Neo4j client: "Our Neo4j solution is literally
a thousand times faster than the previous MySQL solution, with searches that
require between 10 and 100 times less code".

Figure 16: Query times for Oracle Exadata vs Neo4j
Figure 17: Tomtom's Comparison of Neo4j with MySQL
How to Increase Performance of Neo4j?
● Increase the size of available heap memory (between 8G and 16G).
● Increase the open file limit from the default 1024 to at least 40000, to be
  sure.
● To avoid costly disk access, make sure that relevant graph data is cached in
  memory.
● Reserve sufficient memory (at least 16G) for the non-Neo4j tasks running on
  the computer.
● Simple algorithms lead to increased performance.
● All related nodes and edges should be kept in server memory before giving
  results.
● Traversals should be independent.
● Indexes should be used.
What can Neo4j be used for?

Neo4j is highly suitable for storing data that has many interconnecting relationships. This is where graph databases can make a huge difference. In fact, graph databases like Neo4j are much better at dealing with relational (that is, relationship-rich) data than relational databases are. This is in part due to the fact that the graph model doesn't usually require a predefined schema. You don't need to create the database structure before you load the data (like you do in a relational database). In Neo4j, the data is the structure. Neo4j is a "schema optional" DBMS.

But the main reason Neo4j is better for relational data is the way it allows you to create relationships. Neo4j is built around relationships. There is no need to set up primary key/foreign key constraints to predetermine which fields can have a relationship, and to which data. With Neo4j, you just add a relationship between any nodes whenever you need one.

This makes Neo4j extremely well suited for social networking applications like Facebook, Twitter, etc. But there are many other areas where Neo4j excels. Here are some of the main areas that Neo4j can be used for:

● Social networks
● Real-time product recommendations
● Network diagrams
● Fraud detection
● Access management
● Graph-based search of digital assets
● Master data management

Cypher Query Language

Cypher is a declarative language for working with graphs and graph data, for both reading from and writing to the graph, and it is very expressive and powerful. Cypher queries describe patterns in the graph data.

Cypher is a declarative language: this means that we specify the data that we are interested in. We do not specify how to get that data from the database.
Cypher is a very human-readable language, accessible not just to developers; anyone can learn and use it easily.

Cypher has expressions similar to SQL, like WHERE and ORDER BY, and simple comparison operators like <, = and >. Its difference from SQL is that Cypher is designed to represent graph data patterns. For example, it has the MATCH clause, which is built around specifying and finding patterns in the data.
Structure
Nodes: Nodes represent data entities, and each node represents a single entity. A node is roughly equivalent to a record in a relational database. Nodes can have labels, and they can also have properties, which are basically attributes. Nodes are written in parentheses, like (p:Product).

Figure 18: Node Representation

Relationships: In Cypher, the lines between nodes represent the relationships between them. Relationships can also have properties, just like nodes, which is quite different from SQL. Relationships also have a direction. A relationship is written as --> between two nodes.
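
As a rough illustration of the structure just described, the following Python sketch models nodes (with labels and properties) and directed relationships (with a type and properties). The class names, the `pattern` helper and the sample Customer/Product data are assumptions made for this example; they are not part of Neo4j or of any driver.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A graph node: a set of labels plus key-value properties."""
    labels: set
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    """A directed, typed relationship that can carry its own properties."""
    start: Node
    rel_type: str
    end: Node
    properties: dict = field(default_factory=dict)


def pattern(rel):
    """Render a relationship in a Cypher-like (a)-[:TYPE]->(b) notation."""
    def node_text(node):
        return "(" + "".join(":" + label for label in sorted(node.labels)) + ")"
    return f"{node_text(rel.start)}-[:{rel.rel_type}]->{node_text(rel.end)}"


# Hypothetical sample data, invented for illustration only.
customer = Node({"Customer"}, {"name": "Sample Customer"})
product = Node({"Product"}, {"name": "Sample Product"})
bought = Relationship(customer, "BOUGHT", product, {"quantity": 2})
text = pattern(bought)
```

Here `text` holds the pattern string for the sample relationship, and `bought.properties` shows that the relationship itself stores data, which is the main difference from a foreign key in a relational table.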

Operations In Cypher
Create: It is used to create nodes and relationships between them.
We created a node representing us with five properties:
Name: 'Ajit Singh'
Country: 'India'
City: 'Patna'
DateOfBirth: '21.05.1984'
School: 'PWC'
With this Cypher code:

CREATE (n:Person {name: 'Ajit Singh', country: 'India', city: 'Patna', DateOfBirth: '21.05.1984', School: 'PWC'}) RETURN n

We created a second node with these properties:
Name: 'Anna Turu Pi'
Country: 'Spain'
City: 'Barcelona'
DateOfBirth: '30.07.1995'
School: 'PWC'
With this Cypher code:

CREATE (n:Person {name: 'Anna Turu Pi', country: 'Spain', city: 'Barcelona', DateOfBirth: '30.07.1995', School: 'PWC'}) RETURN n

We created a relationship called "FRIENDS_WITH" with the property "SINCE", with this Cypher code:

MATCH (a:Person), (b:Person) WHERE a.name = 'Ajit Singh' AND b.name = 'Anna Turu Pi'
CREATE (a)-[r:FRIENDS_WITH {SINCE: "17/09/2017"}]->(b)
RETURN r
(a) Result in Console (b) After Creating Relationship
Figure 19: Create Relationship Between Two Nodes
Match: Match finds specified patterns in the data.
Figure 20: Relationships
With this Cypher code we showed all the people whom Esteban Zimányi teaches:
MATCH (a:Person)<-[:TEACHES_TO]-(b:Person {name: 'Esteban Zimányi'}) RETURN a.name
Set: This is used to update properties on nodes and relationships.
With this Cypher code we changed Esteban Zimányi's date of birth to '01.01.1966':
MATCH (n {name: 'Esteban Zimányi'}) SET n.DateOfBirth = '01.01.1966'
RETURN n
Delete: This operator deletes nodes or relationships from the data.
With this Cypher code we deleted Ajit Singh. Because the node still has a FRIENDS_WITH relationship, DETACH DELETE is used so that the relationship is removed together with the node; a plain DELETE fails for a node that still has relationships:
MATCH (n:Person {name: 'Ajit Singh'}) DETACH DELETE n
Loading Data With Cypher

There are many ways to import data into Neo4j, but the most common is to load it from a CSV file. The LOAD CSV clause is built into Neo4j and is used for small or medium-sized datasets of up to 10 million records. If we want to load data with more than 10 million records, then we should use the [USING PERIODIC COMMIT [n]] option. Without this option, the whole file is processed in one run and everything is created in a single transaction.

LOAD CSV: This clause is used for importing CSV files into Neo4j.
Figure 21: Load CSV Operator Structure
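
The idea behind periodic commits, processing a large file in fixed-size batches instead of one huge transaction, can be sketched in plain Python. This is only an illustration of the batching principle using the standard library; it does not talk to Neo4j, and the function name, the sample rows and the batch size are assumptions made for the example.

```python
import csv
import io
from itertools import islice


def batched_rows(csv_text, batch_size):
    """Yield lists of at most `batch_size` CSV rows (as dicts).

    Each yielded list stands for the rows that would be written
    in one transaction before committing.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    while True:
        batch = list(islice(reader, batch_size))
        if not batch:
            return
        yield batch


# Hypothetical sample file contents, invented for illustration only.
sample = "name,city\nPerson One,Patna\nPerson Two,Barcelona\nPerson Three,Brussels\n"
batches = list(batched_rows(sample, batch_size=2))
```

With a batch size of 2, the three sample rows are split into one batch of two rows and one batch of a single row, so no single step has to hold the whole file.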
Use Cases of Neo4j

Figure 22: Use Cases Of Neo4j
The common use cases are:

Real-Time Recommendations: Recommendation algorithms find relationships between people, products and other related services based on a user's previous behavior. Neo4j is able to store interconnected data about customers and products, and since Neo4j does not need an index lookup at every step of a suggestion, it handles real-time data quickly and effectively. Walmart uses Neo4j for this purpose.
Master Data Management: In large organizations, different systems store information about customers, employees, titles and the supply chain. With the graph model it is easy to bring together data from different systems, create views about customers, or keep track of all the information about the organizational system itself. Cisco uses Neo4j for this purpose, and the company also uses Neo4j for its help desk solution.
Figure 23: Master Data Management Graph Design

Fraud Detection: Fraud detection is very important in the finance industry. Nowadays, in order not to be detected by a bank's fraud algorithms, fraudsters use approaches such as opening several bank accounts with valid information and making normal transactions so that no account looks like an outlier. They open false bank accounts that share the same identity token and later withdraw all the money from all of the accounts. This behavior is hard to detect account by account, but it is easy to see in a graph, because people opening bank accounts with the same identity token form a recognizable pattern.
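
The shared-identity-token pattern described above can be sketched in a few lines of Python. This is a minimal illustration with made-up data, not a fraud detection system: the function name, the account records and the token strings are all assumptions for the example.

```python
from collections import defaultdict


def find_shared_identifier_rings(accounts, min_size=2):
    """Group account holders by identity token.

    Returns only the tokens that are shared by at least
    `min_size` different holders.
    """
    holders_by_token = defaultdict(set)
    for account in accounts:
        for token in account["tokens"]:
            holders_by_token[token].add(account["holder"])
    return {
        token: sorted(holders)
        for token, holders in holders_by_token.items()
        if len(holders) >= min_size
    }


# Hypothetical sample data, invented for illustration only.
accounts = [
    {"holder": "A", "tokens": {"phone:555-0100", "addr:1 Main St"}},
    {"holder": "B", "tokens": {"phone:555-0100", "addr:9 Oak Ave"}},
    {"holder": "C", "tokens": {"phone:555-0199", "addr:9 Oak Ave"}},
    {"holder": "D", "tokens": {"phone:555-0142", "addr:3 Elm Rd"}},
]
rings = find_shared_identifier_rings(accounts)
```

In the sample data, holders A and B share a phone number and holders B and C share an address, so A, B and C are linked through shared tokens while D is not. In a graph database the same grouping falls out of a pattern match over holder and token nodes.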

Graph-Based Search: Metadata is available for things like products, articles, etc. Being able to model metadata as a graph makes it possible to enhance search, meaning that users are able to find results that are more relevant to them. Take LinkedIn, for example: when a search is executed we do not see random or alphabetically sorted results; we first see the relevant ones. Lufthansa uses Neo4j for this purpose.

Network & IT Operations: If a data center is modelled as a graph, then dependency analysis can easily be applied to network systems to reach conclusions such as how many applications will be affected if one virtual machine goes down. HP uses Neo4j to model the networks of some large telecommunication providers.
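
The dependency analysis mentioned above amounts to following "depends on" links backwards from the failed component. A minimal Python sketch, assuming a small invented data center in which every component lists what it depends on, could look like this (all component names are hypothetical):

```python
from collections import deque


def affected_by_failure(depends_on, failed):
    """Return every component that directly or transitively depends on `failed`."""
    # Invert the edges: for each component, record who depends on it.
    dependents = {}
    for component, requirements in depends_on.items():
        for requirement in requirements:
            dependents.setdefault(requirement, set()).add(component)

    affected = set()
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for dependent in dependents.get(current, ()):
            if dependent not in affected:
                affected.add(dependent)
                queue.append(dependent)
    return affected


# Hypothetical dependency data, invented for illustration only.
depends_on = {
    "app-billing": ["vm-1", "db-main"],
    "app-portal": ["vm-2", "app-billing"],
    "db-main": ["vm-1"],
    "app-reports": ["vm-3"],
}
impacted = affected_by_failure(depends_on, "vm-1")
```

For the sample data, a failure of vm-1 reaches the database, the billing application and, through billing, the portal. The traversal does not need to know in advance how many levels of dependency exist.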

Figure 24: Network IT Operations Graph Design
Identity & Access Management: Within large organizations there are hundreds of users, and controlling who can access which information is crucial for security reasons. Creating groups and roles for each user comes in handy in this situation. This kind of data is very rich and connected and can be easily handled by Neo4j. UPC London uses Neo4j for that, and it received a 2014 Graphie award for "Best Identity and Access Management App".

Chapter 11: Getting started with Neo4j
Requirements
Spring Data Neo4j 5.1.x at minimum requires:
JDK version 8 and above.
Neo4j Graph Database 3.1 and above.
Spring Framework {springVersion} and above.
If you plan on altering the version of Neo4j-OGM, make sure it is a 3.0.0+ release.
Download Neo4j
First download Neo4j from its official website: https://neo4j.com/download/
You can choose either a free Enterprise Trial or the free Community Edition. Here, we are using the Community Edition.
Run the downloaded file and follow the instructions given below:
Start Neo4j:
Start the Server

Click on the installed Neo4j Community Edition.
Initialization started:
Neo4j is started. It is ready to use.
Open a browser and go to http://localhost:7474/browser/
or http://127.0.0.1:7474/browser/
Start Neo4j web server

Visit the sub-directory /bin of the extracted folder and execute in a terminal: ./neo4j start
Visit http://localhost:7474/

The first time only, you will have to sign in with the default account and change the default password. As of community version 3.0.3, the default username and password are neo4j and neo4j.

You can now enter Neo4j queries in the console provided in your web browser and visually investigate the results of each query.
Set up a new database

Each Neo4j server (in the community edition) can currently host a single Neo4j database, so in order to set up a new database:
Visit the sub-directory /bin and execute ./neo4j stop to stop the server.

Visit the sub-directory /conf and edit the file neo4j.conf, changing the value of the parameter dbms.active_database to the name of the new database that you want to create.

Visit the sub-directory /bin again and execute ./neo4j start
The web server has started again with the new, empty database. You can visit http://localhost:7474/ again to work with the new database.
The created database is located in the sub-directory /data/databases, under a folder with the name specified in the parameter dbms.active_database.

Delete one of the databases

Make sure the Neo4j server is not running: go to the sub-directory /bin and execute ./neo4j status. If the output message shows that the server is running, also execute ./neo4j stop.
Then go to the sub-directory /data/databases and delete the folder of the database you want to remove.
Cypher Query Language

This is Cypher, Neo4j's query language. In many ways, Cypher is similar to SQL if you are familiar with it, except that SQL refers to items stored in a table while Cypher refers to items stored in a graph.

First, we should start out by learning how to create a graph and add relationships, since that is essentially what Neo4j is all about.
CREATE (ab:Object {age: 30, destination: "England", weight: 99})
You use CREATE to create data.
To indicate a node, you use parentheses: ()

The ab:Object part can be broken down as follows: a variable 'ab' and a label 'Object' for the new node. Note that the variable can be anything, but you have to be consistent within a Cypher query.
To add properties to the node, use braces: {}
Next, we will learn about finding matches with MATCH.
MATCH (abc:Object) WHERE abc.destination = "England" RETURN abc;

MATCH specifies that you want to search for a certain node/relationship pattern. (abc:Object) refers to a one-node pattern (with label Object) which stores the matches in the variable abc. You can think of this entire line as the following:

abc = find the matches that are an Object WHERE the destination is England.

In this case, WHERE adds a constraint, which is that the destination must be England. You must include a RETURN at the end of every read-only MATCH query: Neo4j will not accept just a MATCH, so the query must end by returning some value. This depends on the type of query you are writing; we will talk more about this later as we introduce the other types of queries you can make.

The next query will be explained later, after we go over some more elements of the Cypher Query Language. It is here to give you a taste of what we can do with this language. Below, you will find an example which gets the cast of movies whose title starts with 'T':

MATCH (actor:Person)-[:ACTED_IN]->(movie:Movie)
WHERE movie.title STARTS WITH "T"
RETURN movie.title AS title, collect(actor.name) AS cast
ORDER BY title ASC LIMIT 10;
A complete list of commands and their syntax can be found in the official Neo4j Cypher Reference Card.
RDBMS vs Graph Database
RDBMS               Graph Database
Table               Graph
Rows                Nodes
Columns and data    Properties and their values
Constraints         Relationships
Joins               Traversal

Cypher
Introduction
Cypher is the query language used by Neo4j. You use Cypher to perform tasks and matches against a Neo4j graph.
Cypher is "inspired by SQL" and is designed to be intuitive in the way you describe the relationships, i.e. typically the drawing of the pattern will look similar to the Cypher representation of the pattern.

Examples
Creation

Create a node
CREATE (neo:Company) // create node with label 'Company'
CREATE (neo:Company {name: 'Neo4j', hq: 'San Mateo'}) // create node with properties

Create a relationship
CREATE (beginning_node)-[:edge_name {attribute_one: 1, attribute_two: 'two'}]->(ending_node)

Query Templates
Running Neo4j locally, in the browser GUI (default: http://localhost:7474/browser/), you can run the following command to get a palette of queries.
:play query template
This helps you get started creating and merging nodes and relationships by typing queries.
Create an Edge
CREATE (beginning_node)-[:edge_name {attribute_one: 1, attribute_two: 'two'}]->(ending_node)

Delete all nodes
MATCH (n)
DETACH DELETE n
DETACH doesn't work in older versions (earlier than 2.3); for previous versions use
MATCH (n)
OPTIONAL MATCH (n)-[r]-()
DELETE n, r

Delete all nodes of a specific label
MATCH (n:Book)
DELETE n
Match (capture group) and link matched nodes
MATCH (node_name:node_type {}), (node_name_two:node_type_two {})
CREATE (node_name)-[:edge_name {}]->(node_name_two)
Update a Node
MATCH (n)
WHERE n.some_attribute = "some identifier"
SET n.other_attribute = "a new value"
Delete All Orphan Nodes
Orphan nodes/vertices are those lacking all relationships/edges.
MATCH (n)
WHERE NOT (n)--()
DELETE n

Python & Neo4j
Examples

Install neo4jrestclient
pip install neo4jrestclient
Connect to Neo4j
from neo4jrestclient.client import GraphDatabase
db = GraphDatabase("http://localhost:7474", username="neo4j", password="mypass")
Create some nodes with labels
user = db.labels.create("User")
u1 = db.nodes.create(name="user1")
user.add(u1)
u2 = db.nodes.create(name="user2")
user.add(u2)

You can associate a label with many nodes in one go
language = db.labels.create("Language")
b1 = db.nodes.create(name="C++")
b2 = db.nodes.create(name="Python")
language.add(b1, b2)
Create relationships
u1.relationships.create("likes", b1)
u1.relationships.create("likes", b2)
u2.relationships.create("likes", b1)

Bi-directional relationships
u1.relationships.create("friends", u2)
Match using neo4jrestclient
from neo4jrestclient import client
q = 'MATCH (u:User)-[r:likes]->(m:Language) WHERE u.name="user1" RETURN u, type(r), m'

# "db" as defined above
results = db.query(q, returns=(client.Node, str, client.Node))

Print results
for r in results:
    print("(%s)-[%s]->(%s)" % (r[0]["name"], r[1], r[2]["name"]))

Output (row order may vary):
(user1)-[likes]->(C++)
(user1)-[likes]->(Python)

Chapter 12: Neo4j Application
Software
For the graph database, Neo4j Community Edition 3.2.5 has been used, and for the relational database, SQL Server 2017.
Use Case Selected

As proposed in graph database benchmark guidelines [4], the best tests to benchmark a graph database are: traversal (which includes the calculation of the shortest path), graph analysis, connected components, communities, centrality measures, pattern matching and graph anonymisation. It is also noted that the domains where graph databases prove most beneficial include shortest-path graph analysis and real-time analysis of traffic networks. In our implementation, we are going to model flight routes, as they have the ideal properties to benchmark a graph database. Airports and airlines are elements where the information lies in their interconnections.

Data

The dataset selected to perform the benchmark was a dataset of flight routes provided by OpenFlights.org [13]. It provided three flat files: airlines.dat, airports.dat and routes.dat.

Because of size concerns, we created synthetic data in addition to our existing data tables. Before creating new data we had 67,663 different routes, and now we have 1,193,413 different routes. The rows we created contain dummy values; they have no connection with the existing data except for their types, so our queries mostly returned results from the initial data. This data creation process was applied because the more data we have, the more accurate the benchmarking results we get. Also, unlike with traditional databases, adding more data to Neo4j is not expected to affect the performance of traversal queries.
Implementing Data

Figure 25: OpenFlights.org
Neo4j: To create the Neo4j database we developed Python code. This code uses the py2neo library to access the Neo4j database, and it reads our data (external source) to create nodes, relationships, properties and indexes.
Figure 26: Structure of the Python code

The original airport data had latitude and longitude attributes. In order to present a better visualization, we created a function that calculates the distance between two connected airports. The route data has source_airport and destination_airport, so we created a route node and stored the distance between source_airport and destination_airport as an attribute of the route node. In the end, the three types of nodes are Airlines, Airports and Routes, and they have the following relationships:

Route -> TO -> Airport
Route -> FROM -> Airport
Route -> OF -> Airline

Table 2: Graph database schema
We implemented our data in Neo4j with this schema:

Figure 27: Initial Schema
Figure 28: Example of a query in Neo4j
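
The source does not show the distance function itself. One common way to compute the distance between two points given their latitude and longitude is the haversine (great-circle) formula, sketched below. Whether the authors used this formula is an assumption; the function name and the Earth radius constant are choices made for this example.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    delta_phi = phi2 - phi1
    delta_lambda = math.radians(lon2 - lon1)
    a = (math.sin(delta_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(delta_lambda / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


# A quarter of the equator: from longitude 0 to longitude 90 at latitude 0.
quarter_equator = haversine_km(0.0, 0.0, 0.0, 90.0)
```

A value computed this way could be stored as the distance attribute of each route node, using the latitude and longitude of the route's source and destination airports.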
SQL: A relational database was created by importing each flat file as a table, and then we created foreign key references between the tables.
Export data

To export the Neo4j database, we chose to use the APOC library. Neo4j has to be authorized to let the export procedures write files. For that, this line has to be added to neo4j.conf:
apoc.export.file.enabled=true

Export to CSV

apoc.export.csv.query(query,file,config): exports results from the Cypher statement as CSV to the provided file
apoc.export.csv.all(file,config): exports the whole database as CSV to the provided file
apoc.export.csv.data(nodes,rels,file,config): exports given nodes and relationships as CSV to the provided file
apoc.export.csv.graph(graph,file,config): exports given graph object as CSV to the provided file
We exported the entire database by executing the following command in Cypher:
CALL apoc.export.csv.all("/temp/neo4j_database_csv_file.csv", {batchSize:10}) YIELD file, source, format, nodes, relationships, properties, time, rows

Export to Cypher script

apoc.export.cypher.all(file,config): exports the whole database incl. indexes as Cypher statements to the provided file
apoc.export.cypher.data(nodes,rels,file,config): exports given nodes and relationships incl. indexes as Cypher statements to the provided file
apoc.export.cypher.graph(graph,file,config): exports given graph object incl. indexes as Cypher statements to the provided file
apoc.export.cypher.query(query,file,config): exports nodes and relationships from the Cypher statement incl. indexes as Cypher statements to the provided file
apoc.export.cypher.schema(file,config): exports all schema indexes and constraints to Cypher

The database was also exported to a Cypher script:

CALL apoc.export.cypher.all("/temp/neo4j_database_cypher_file.cypher", {batchSize:10}) YIELD file, source, format, nodes, relationships, properties, time, rows
Figure 29: Exporting the Neo4j database to a Cypher script
Query Examples (Neo4j - SQL)
Figure 30: Algorithms for graph databases

Add libraries: As mentioned, Neo4j includes graph algorithms that allow us to perform queries that would be impossible or impractical to perform in SQL. Libraries of algorithms can be downloaded and added to Neo4j as plugins.
Figure 31: Add jar files in the plugin folder

Neo4j has to be authorized to run the plugins. For that, this line has to be added to neo4j.conf (e.g., for the APOC library):
dbms.security.procedures.unrestricted=apoc.*

After that, Neo4j needs to be restarted, and it can be verified that the plugin is working by running the following command in the Neo4j browser:
CALL dbms.procedures() YIELD name, signature, description
WHERE name STARTS WITH "apoc"
RETURN name, signature, description
Shortest Path

This algorithm is the one that best justifies the existence of graph databases. Its calculation is impractical in plain SQL, where the number of layers (hops) in the route has to be specified in advance.
First query example: find the shortest path from an airport in Madrid to an airport in Seoul.
MATCH p = shortestPath((src:Airport {city: 'Madrid'})-[r:FROM|TO*..15]-(dest:Airport {city: 'Seoul'})) RETURN p

Figure 32: Shortest path query from Madrid to Seoul
Figure 33: Pipeline of the shortest path query
The nodes can be expanded, and we see the airline to which each route belongs.
Figure 34: Expanded shortest path query
Second query example: find the shortest path between an airport in Seoul and an airport in Antwerp.
MATCH p = shortestPath((src:Airport {city: 'Seoul'})-[r:FROM|TO*..15]-(dest:Airport {city: 'Antwerp'})) RETURN p
Figure 35: Shortest path query from Seoul to Antwerp

Paying attention to the relationships, it can be seen that the query does not output a physically possible travel route from the origin city to the destination city. In the first query, one of the paths ends up in Seoul, but the other has two sources, Madrid and Seoul, and they both end up in Beijing. The second query has three origin airports, one in Antwerp and two in Seoul, and all the routes finish in Geneva.

The purpose of the algorithm is to find the shortest path connecting two nodes, independently of its physical meaning, but real routes can be created with the following modification:

Persistent inferred relationships: For each route going from one airport to another, a relationship connecting both airports has been added. This way, the shortest path query can look for only one type of relationship. If the objective is to find physically possible paths between two airports (e.g., without stepping through an airline node), searching only over that inferred relationship ensures that airports are connected directly to airports.
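
To illustrate what an unweighted shortest-path search over direct airport-to-airport links does, here is a minimal breadth-first search sketch in Python. It is not how Neo4j implements its shortest path function; the function name and the tiny connection table are assumptions, and the links in the table are invented for the example rather than taken from the OpenFlights data.

```python
from collections import deque


def shortest_route(connections, source, target):
    """Breadth-first search over directed links.

    Returns the list of airports on a path with the fewest hops,
    or None when the target cannot be reached.
    """
    if source == target:
        return [source]
    previous = {source: None}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        for neighbour in connections.get(current, ()):
            if neighbour not in previous:
                previous[neighbour] = current
                if neighbour == target:
                    # Walk back from the target to the source.
                    path = [target]
                    while previous[path[-1]] is not None:
                        path.append(previous[path[-1]])
                    return path[::-1]
                queue.append(neighbour)
    return None


# Hypothetical direct connections, invented for illustration only.
connections = {
    "MAD": ["CDG", "FRA"],
    "CDG": ["ICN"],
    "FRA": ["PEK"],
    "PEK": ["ICN"],
    "ICN": [],
}
route = shortest_route(connections, "MAD", "ICN")
```

Because every link in the table goes from one airport to another in the direction of travel, any path the search returns is one that could actually be flown, which is the point of adding the inferred relationship.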

Relationship CONNECTED. This relationship has the property weight, whose value is the number of routes from one airport to the other. It is used in the shortest path queries and the community detection queries.

Cypher code to create the relationship:
MATCH (ap1:Airport)<-[:FROM]-(r:Route)-[:TO]->(ap2:Airport)
WHERE id(ap1) <> id(ap2)
WITH ap1, ap2, COUNT(*) AS weight
CREATE (ap1)-[c:CONNECTED]->(ap2)
SET c.weight = weight
The figure below displays the database schema after adding the inferred relationship:
Figure 36: Neo4j DB schema after adding Connected relationships
Cypher code to delete the relationship:
MATCH (ap1:Airport)-[r:CONNECTED]->(ap2:Airport) DELETE r

Relationship GOINGTO. This relationship saves the route and airline information in its properties. It is used in the shortest path queries and the community detection queries.
Cypher code to create the relationship:

MATCH (ap1:Airport)<-[:FROM]-(r:Route)-[:TO]->(ap2:Airport)
WHERE id(ap1) <> id(ap2)
WITH ap1, ap2, r
MATCH (r)-[:OF]->(al:Airline)
CREATE (ap1)-[g:GOINGTO]->(ap2)
SET g.distance = r.distance
SET g.route = id(r)
SET g.airline = al.name
The figure below displays the database schema after adding the inferred relationship:
Figure 37: Neo4j DB schema after adding Goingto relationships
Cypher code to delete the relationship:
MATCH (:Airport)-[r:GOINGTO]->(:Airport) DELETE r
The first shortest path query is now run again with the inferred relationships:
MATCH p = shortestPath((src:Airport {city: 'Madrid'})-[r:GOINGTO*..15]->(dest:Airport {city: 'Seoul'})) RETURN p

Figure 38: Shortest path between Madrid and Seoul

Now the airports are directly connected to each other. The route node cannot be seen, but its identifier is saved as one of the relationship properties. With the following query it can be verified whether the route matches the requirements:

MATCH (r:Route) WHERE id(r) = 50276 RETURN r
It is verified that the relationship GOINGTO is equivalent to a real outbound route between Madrid and Seoul. The return route is also verified:
MATCH (r:Route) WHERE id(r) = 50205 RETURN r
Figure 39: Shortest path return route output

Shortest path in SQL Server: SQL Server has the limitation that the number of layers in the path needs to be specified. An alternative is to use a recursive query, but in our experience it was not effective.

When executing the query, we obtain the following message: "The statement terminated. The maximum recursion 100 has been exhausted before statement completion."
Figure 40: Pipeline of the Neo4j query on the Antwerp-Patna shortest path
Betweenness centrality:

The betweenness centrality of a node in a network is the number of shortest paths between two other members in the network on which a given node appears. Betweenness centrality is an important metric because it can be used to identify “brokers of information” in the network or nodes that connect disparate clusters. [6]
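
As a small worked illustration of this definition, the sketch below computes betweenness centrality for an unweighted directed graph using Brandes' algorithm. This is a swapped-in textbook technique, not the APOC implementation used in this chapter. Note one refinement over the plain definition: when several shortest paths exist between two nodes, each one contributes a fraction, so that every ordered pair of nodes contributes at most 1 in total. The function name and the star-shaped sample graph are assumptions for the example.

```python
from collections import deque


def betweenness(graph):
    """Betweenness centrality (Brandes' algorithm), counting ordered node pairs."""
    nodes = set(graph) | {t for targets in graph.values() for t in targets}
    score = dict.fromkeys(nodes, 0.0)
    for source in nodes:
        # Breadth-first search that counts shortest paths from `source`.
        order = []
        predecessors = {v: [] for v in nodes}
        path_count = dict.fromkeys(nodes, 0)
        path_count[source] = 1
        distance = dict.fromkeys(nodes, -1)
        distance[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph.get(v, ()):
                if distance[w] < 0:
                    distance[w] = distance[v] + 1
                    queue.append(w)
                if distance[w] == distance[v] + 1:
                    path_count[w] += path_count[v]
                    predecessors[w].append(v)
        # Accumulate dependencies, farthest nodes first.
        dependency = dict.fromkeys(nodes, 0.0)
        while order:
            w = order.pop()
            for v in predecessors[w]:
                dependency[v] += path_count[v] / path_count[w] * (1 + dependency[w])
            if w != source:
                score[w] += dependency[w]
    return score


# Hypothetical star-shaped network: every leaf connects only through the hub.
star = {
    "HUB": ["A", "B", "C"],
    "A": ["HUB"],
    "B": ["HUB"],
    "C": ["HUB"],
}
scores = betweenness(star)
```

In the star network every shortest path between two leaves passes through the hub, so the hub collects the whole score and the leaves collect none. This is the kind of node that acts as a transfer point between otherwise separate parts of a network.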

This query shows the airports that routes most often have to cross to go from one airport to another; in other words, the airports where more transfers take place. As displayed in the figure below, the highlighted airports act like bottlenecks that connect clusters of airports.

MATCH (ap:Airport)
WITH collect(ap) AS airports
CALL apoc.algo.betweenness(['CONNECTED'], airports, 'OUTGOING')
YIELD node, score
SET node.betweenness = score
RETURN node AS Airport, score ORDER BY score DESC LIMIT 25
Figure 41: Betweenness centrality query result

The query outputs five big airports, which are commonly used for transfers during intercontinental journeys. It makes sense that they have the highest betweenness centrality.

Closeness centrality:

Closeness centrality is the inverse of the average distance to all other nodes in the network. Nodes with high closeness centrality are often highly connected within clusters in the graph, but not necessarily highly connected outside of the cluster. [6]
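
The definition can be made concrete with a short Python sketch that measures distances in hops with a breadth-first search. One detail the definition leaves open is how to treat nodes that cannot be reached; this sketch averages only over the reachable nodes, which is an assumption of the example and not necessarily what the APOC procedure does. The function name and the sample graphs are also invented for illustration.

```python
from collections import deque


def closeness(graph):
    """Closeness centrality: inverse of the average hop distance to reachable nodes."""
    result = {}
    for source in graph:
        distance = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            for w in graph.get(v, ()):
                if w not in distance:
                    distance[w] = distance[v] + 1
                    queue.append(w)
        total = sum(distance.values())
        # A node that reaches nothing gets a score of 0.
        result[source] = (len(distance) - 1) / total if total else 0.0
    return result


# Hypothetical undirected path A - B - C, written with both directions.
path_graph = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
path_scores = closeness(path_graph)
```

In the path A - B - C, the middle node is one hop from both others and scores 1.0, while each end node has an average distance of 1.5 and scores about 0.67. Under this convention, two nodes that are connected only to each other also score 1.0, so small, tightly knit groups can rank highly even when they are cut off from the rest of the graph.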

This query outputs the airports that have more connections to different airports. In other words, it shows locations that are too geographically isolated to be reached easily by other means of transport (e.g. islands). It can output the airports with more direct flights from different locations or the airlines that operate more routes.

Figure 42: Concept of closeness centrality
Query example: output the five airports with the highest closeness centrality:
MATCH (ap:Airport)
WITH collect(ap) AS airports
CALL apoc.algo.closeness(['CONNECTED'], airports, 'OUTGOING')
YIELD node, score
RETURN node AS Airport, score ORDER BY score DESC LIMIT 5
Figure 43: Closeness centrality query result

As predicted, the query outputs airports that are in highly touristic but geographically isolated locations: Lopez Island near Seattle, the river Araguaia in the middle of Brazil, the Grand Canyon of Colorado...
Figure 44: Location of the airports with highest closeness centrality
Query performance: Writing PROFILE before the Cypher query outputs the pipeline of the query execution.

Figure 45: Pipeline of the closeness centrality query
PageRank:
The secret of Google's success was its search algorithm, PageRank. PageRank works by counting the number and quality of links to a page to determine a rough estimate of how important the website is. The underlying assumption is that more important websites are likely to receive more links from other websites [11]. This algorithm can output the most connected airport or the most powerful airline (the node connected to more routes).
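
The counting of "number and quality of links" can be sketched as a simple power iteration in Python. This is the textbook formulation with a damping factor, not the APOC implementation; the damping value of 0.85, the iteration count, the handling of nodes without outgoing links and the sample link table are all assumptions made for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """PageRank by power iteration over a dict of node -> list of link targets."""
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    count = len(nodes)
    rank = dict.fromkeys(nodes, 1.0 / count)
    for _ in range(iterations):
        # Rank held by nodes without outgoing links is spread evenly over all nodes.
        dangling = sum(rank[v] for v in nodes if not links.get(v))
        new_rank = dict.fromkeys(nodes, (1 - damping) / count + damping * dangling / count)
        for v in nodes:
            targets = links.get(v, ())
            for target in targets:
                new_rank[target] += damping * rank[v] / len(targets)
        rank = new_rank
    return rank


# Hypothetical link structure, invented for illustration only.
links = {
    "A": ["HUB"],
    "B": ["HUB"],
    "C": ["HUB"],
    "HUB": ["A"],
}
ranks = pagerank(links)
best = max(ranks, key=ranks.get)
```

In the sample data the node that receives links from all the others ends up with the highest rank, and the node it links to comes second because it inherits part of that importance. The ranks always sum to 1, so they can be read as shares of overall importance.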

First query: output the most important airports.
MATCH (ap:Airport) WITH collect(ap) AS airports
CALL apoc.algo.pageRank(airports) YIELD node, score
RETURN node, score ORDER BY score DESC
Figure 46: Pipeline of the airports PageRank query
The most important airports are from London, Paris, Frankfurt, Patna, Dubai, Beijing and the USA. The output is not surprising.
Second query: output the most popular airlines.
MATCH (node:Airline) WITH collect(node) AS airlines
CALL apoc.algo.pageRank(airlines) YIELD node, score
RETURN node, score ORDER BY score DESC
Figure 47: Pipeline of the airlines PageRank query
As a result we can see that Ryanair is the leading airline, followed by four companies from the USA and three from China.
Community Detection:

There are many algorithms for community detection: triangle counting, strongly connected components, ... These algorithms cluster together the nodes that are more closely related to each other. We have chosen an algorithm from the APOC library, and what the code below does is run 40 iterations of its label propagation procedure, which classifies the airport nodes into partitions. The classification is determined by the weight of the CONNECTED relationships (the number of routes between each pair of airports).

Seeing as airports are geographical locations, and routes are physical journeys between them, it is expected that geographically neighbouring airports will be clustered together. That hypothesis is verified below.

CALL apoc.algo.community(40, ['Airport'], 'partition', 'CONNECTED', 'OUTGOING', 'weight', 10000)
MATCH (ap:Airport) WHERE exists(ap.partition) RETURN ap

Figure 48: Community detection graph

The figure above shows the shape of the graph after the nodes have been classified into partitions. To see which nodes belong to each partition, the partition number must be returned as output:

CALL apoc.algo.community(40, ['Airport'], 'partition', 'CONNECTED', 'OUTGOING', 'weight', 10000)

MATCH (ap:Airport) WHERE exists(ap.partition)
RETURN ap.partition, ap.country, COUNT(*) AS num
ORDER BY ap.partition, num DESC

Figure 49: Community detection table

Going back to the visualization of the community detection for airports, the partitions can be recognized and verified by looking at the table. The cluster of six nodes disconnected from the rest of the airports is comprised of Papua New Guinea airports (the country can be seen by hovering over the nodes). They belong to the first partition in the table, 6394.

The following part of the graph is a bit scattered, but it can be seen that the nodes are all connected to the central nodes. Hovering over them, we see that they all belong to Canada, and we can suppose that the more separated nodes are regional airports connected to bigger, more important airports. That part of the graph is equivalent to seven partitions in the table.

Next to Canada, a group of nodes is separated, and those airports are all from Algeria. They must belong to partition 6624.

The more centralized part of this subgraph consists of the airports from Finland. Some of those are connected with a Greenland airport, which connects with other Greenland and Iceland airports.

The next subgraph shows airports from different African countries interconnected with each other. On the left side, there are airports, and airports from African countries highly connected to them, and on the right side there are mainly Nigerian airports, among other African airports too.

Going back to the center of the graph, it is hard to recognize more than one partition, as it shows the central European airports, which are highly interconnected.
Finally, a partition was detected in the table, 8355. Checking whether those airports are geographically related, it has been determined that they are on islands between Polynesia, Micronesia and Melanesia.
(a) Partition table
(b) Geographical location
Figure 50: Australasia partition
Possible queries on SQL
The previous section showed operations that cannot be done with SQL. Now we will present operations applicable to both.
Finding flights between two airports that have no direct route between them:

Neo4j:
MATCH p = allShortestPaths((ap1:Airport {city: 'Antwerp'})-[*]->(ap2:Airport {city: 'Patna'}))
WITH extract(node in nodes(p) | node.name) as cities,
     extract(rel in relationships(p) | rel.airline) as airlines
RETURN cities, airlines

SQL:
select distinct A1.Name as [1st Airport],
       airline1.name as [1st Airline],
       A2.Name as [2nd Airport],
       airline2.name as [2nd Airline],
       A3.Name [3rd Airport],
       airline3.name [3rd Airline],
       a4.name [4th Airport]
FROM routes r
INNER JOIN airports a1 ON r.source_airport_id = a1.ID
INNER JOIN airlines airline1 ON airline1.id = r.airline_id
INNER JOIN airports a2 ON r.destination_airport_id = a2.ID
INNER JOIN routes r2 on a2.ID = r2.source_airport_id
INNER JOIN airlines airline2 on airline2.id = r2.airline_id
INNER JOIN airports a3 ON r2.destination_airport_id = a3.ID
INNER JOIN routes r3 on a3.id = r3.source_airport_id
INNER JOIN airlines airline3 on airline3.id = r3.airline_id
INNER JOIN airports a4 on a4.id = r3.destination_airport_id
WHERE a1.city = 'Antwerp' and a4.city = 'Patna'

(a) Neo4j Result
(b) SQL Result
Figure 51: Comparison of Queries - first query
As can be seen here, finding all possible routes between two airports is easy in Neo4j. Besides that, Neo4j provides visualization.

There is one important point here: in SQL we have to specify the level of depth to find results. For example, in this query we searched for 3-level flights between Antwerp and Patna. If we had searched for 1 or 2 levels, the query would have returned no result. In Neo4j we do not have to specify the level; it finds all routes between the two airports and even calculates the shortest route. This is one of the drawbacks of using SQL on data that has levels.
Nearest airport to city by distance

Neo4j:
MATCH (airport1:Airport {city: 'Bologna'})<-[:FROM]-(route:Route)-[:TO]->(airport2:Airport)
RETURN airport1, route, airport2
ORDER BY route.distance ASC LIMIT 1

SQL:
Select top 1 A2.name, a2.city, a2.country,
       dbo.DistanceKM(a.latitude, a2.latitude, A.longitude, A2.longitude) as distance
from routes r
INNER JOIN airports a on a.id = r.source_airport_id
INNER JOIN airports a2 on a2.id = r.destination_airport_id
WHERE A.city = 'Bologna'
order by distance asc

While we were loading our data into Neo4j we created a node called route. This node has three relationships (TO, FROM and OF), and as a descriptive property we assigned the calculated distance to the route node. To be on the same page, we created a function in SQL that calculates the distance between airports given the latitude and longitude attributes of the airports, which already exist in our data. Both approaches give the same result, but Neo4j also provides visualization.
Most connected airports

Neo4j:
MATCH (airport:Airport)<-[:FROM]-(r:Route)
WITH airport, count(r) as departures
MATCH (r2:Route)-[:TO]->(airport)
RETURN airport.name as airport_name, departures, count(r2) as arrivals
order by departures + arrivals desc

SQL:
SELECT A.Name, A.City, A.Country, SUM(A.route_count) AS route_count
FROM (
    (SELECT a.Name, a.City, a.Country, COUNT(*) as route_count
     FROM routes R
     INNER JOIN airports A ON A.ID = source_airport_id
     GROUP BY a.Name, a.City, a.Country)
    UNION ALL
    (SELECT a.Name, a.City, a.Country, COUNT(*) as route_count
     FROM routes R
     INNER JOIN airports A ON A.ID = destination_airport_id
     GROUP BY a.Name, a.City, a.Country)) A
GROUP BY A.Name, A.City, A.Country ORDER
(a) Neo4j Query
(b) SQL query
Figure 52: Comparison of queries - third query
With these queries we found the most interconnected airport by counting the number of incoming and outgoing flights. The SQL query uses UNION ALL so that an airport whose departure and arrival counts happen to be equal is not collapsed into a single row before the sum. As can be seen, the query is very easy to write in Neo4j.
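
The counting logic behind both queries can be shown in a few lines of Python: count how often each airport appears as a source and as a destination, then add the two counts. The function name and the route list are assumptions made for this example; the routes are invented and do not come from the OpenFlights data.

```python
from collections import Counter


def most_connected(routes, top=3):
    """Rank airports by departures plus arrivals.

    `routes` is a list of (source_airport, destination_airport) pairs.
    """
    departures = Counter(source for source, _ in routes)
    arrivals = Counter(destination for _, destination in routes)
    totals = departures + arrivals
    return totals.most_common(top)


# Hypothetical routes, invented for illustration only.
routes = [("A", "B"), ("A", "C"), ("B", "A"), ("C", "A"), ("B", "C")]
ranking = most_connected(routes)
```

Airport A appears in four of the five sample routes, twice as a source and twice as a destination, so it comes first. Adding the two counters keeps both contributions even when they are equal, which is the same reason the SQL version needs to keep both rows per airport before summing.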

Bibliography
Aleksa Vukotic, Nicki Watt, Tareq Abedrabbo, Dominic Fox, Jonas Partner. Neo4j in Action. Manning Publications, 2015.
Stephan C. Carlson. Graph theory. Encyclopedia Britannica. Available at https://www.britannica.com/topic/graph-theory, May 2013. Accessed: 2017-11-30.
DB-Engines. Knowledge base of relational and NoSQL database management systems. Available at https://db-engines.com/en/, 2017. Accessed: 2017-10-20.
D. Dominguez-Sal, N. Martinez-Bazan, V. Muntes-Mulero, P. Baleta, J. L. Larriba-Pey. A discussion on the design of graph database benchmarks. September 2010.
Stefan Edlich. NoSQL archive. Available at http://nosql-database.org/. Accessed: 2017-11-20.
Mathigon. Graphs and networks. Accessed: 2017-11-30.
Thomas Vial, Michel Domenjoud. Graph databases: an overview. Octo Talks, July 2012. Accessed: 2017-11-30.
Neo4j. Intro to Cypher.
Neo4j. Top ten reasons for choosing Neo4j. Available at https://neo4j.com/top-tenreasons/.
Neo4j. Neo4j graph algorithms. GitHub, October 2017. Accessed: 2017-12-8.
University of Colorado. Database management essentials. Available at https://www.youtube.com/playlist?list=PL73oFZbnYuixa9w-dL-EsM7Vy5BQGBIeO. Accessed: 2017-10-21.
OpenFlights.org. Airport, airline and route data. Available at https://openflights.org/data.html. Accessed: 2017-11-3.
TutorialsPoint. Graph theory: Introduction. Available at https://www.tutorialspoint.com/graph_theory/graph_theory_introduction.htm. Accessed: 2017-11-30.
James Serra. Relational databases vs non-relational databases. Big Data and Data Warehousing. James Serra's Blog, August 2015. Accessed: 2017-11-29.
James Serra. Types of NoSQL databases. Big Data and Data Warehousing. James Serra's Blog, April 2015. Accessed: 2017-11-29.
Roopendra Vishwakarma. The different types of NoSQL databases. OpenSourceForU, May 2017. Accessed: 2017-11-29.
S. Abiteboul. Querying Semi-Structured Data. In Proc. of the 6th Int. Conf. on Database Theory (ICDT), volume 1186 of LNCS, pages 1–18. Springer, Jan 1997.
S. Abiteboul and R. Hull. IFO: A Formal Semantic Database Model. In Proc. of the 3rd Symposium on Principles of Database Systems (PODS), pages 119–132. ACM Press, 1984.
S. Abiteboul, D. Quass, J. McHugh, J. Widom, and J. L. Wiener. The Lorel query language for semistructured data. International Journal on Digital Libraries (JODL), 1(1):68–88, 1997.
S. Abiteboul and V. Vianu. Queries and Computation on the Web. In Proc. of the 6th Int. Conf. on Database Theory (ICDT), volume 1186 of LNCS, pages 262–275. Springer, Jan 1997.
R. Agrawal and H. V. Jagadish. Efficient Search in Very Large Databases. In Proc. of the 14th Int. Conf. on Very Large Data Bases (VLDB), pages 407–418. Morgan Kaufmann, Aug-Sept 1988.
R. Agrawal and H. V. Jagadish. Materialization and Incremental Update of Path Information. In Proc. of the 5th Int. Conf. on Data Engineering (ICDE), pages 374–383. IEEE Computer Society, Feb 1989.

R.AgrawalandH.V.Jagadish.AlgorithmsforSearchingMassiveGraphs.IEEE
TransactionsonKnowledgeandDataEngineering(TKDE),6(2):225–238,1994.
R.AlbertandA.L.Barabasi.Statisticalmechanicsofcomplexnetworks.Reviewsof
ModernPhysics,74:47,Jan2002.
N.Alechina,S.Demri,andM.deRijke.AModalPerspectiveonPathConstraints.Jou
rnalof LogicandComputation,13(6):939–956,2003.
B.AmannandM.Scholl.Gram:AGraphDataModelandQueryLanguage.InEuropea
n ConferenceonHypertextTechnology(ECHT),pages201–
211.ACM,NovDec1992.

M.AndriesandG.Engels.AHybridQueryLanguageforanExtendedEntityRelations
hip
Model.TechnicalReportTR9315,InstituteofAdvancedComputerScience,Univer
siteit Leiden,May1993.

M.Andries,M.Gemis,J.Paredaens,I.Thyssens,andJ.V.denBussche.Conceptsfor
GraphOrientedObjectManipulation.InProc.ofthe3rdInt.Conf.onExtendingDatab
ase Technology(EDBT),volume580ofLNCS,pages21–
38.Springer,March1992.

R.AnglesandC.Gutierrez.QueryingRDFDatafromaGraphDatabasePerspective.I
nProc.
2ndEuropeanSemanticWebConference(ESWC),number3532inLNCS,pages346
–360,
2005.

M.A.AufaurePortierand C.Tr´epied.A SurveyofQueryLanguages


forGeographic
InformationSystems.InProc.ofthe3rdInt.WorkshoponInterfacestoDatabases,pag
es 431–438,July1976.

M.AzmoodehandH.Du.GQL,AGraphicalQueryLanguageforSemanticDatabases
.In Proc.ofthe4th Int.Conf.on Scientificand StatisticalDatabaseManagement
(SSDBM),volume339ofLNCS,pages259–277.Springer,June1988.

C.Beeri.DataModelsandLanguagesforDatabases.InProc.ofthe2ndInt.Conf.on
DatabaseTheory(ICDT),volume326ofLNCS,pages19–
40.Springer,AugSept1988.
G.Benk¨o,C.Flamm,andP.F.Stadler.AGraphBasedToyModelofChemistry.Journ
al ofChemicalInformationandComputerSciences(JCISD),43(1):1085–
1093,Jan2003.
C.Berge.GraphsandHypergraphs.NorthHolland,Amsterdam,1973.
U.Brandes.NetworkAnalysis.Number3418inLNCS.SpringerVerlag,2005.

T.Bray,J.Paoli,andC.M.SperbergMcQueen.ExtensibleMarkupLanguage(XML)
1.0, W3C Recommendation 10 February 1998.
http://www.w3.org/TR/1998/RECxml19980210.

A.Broder,R.Kumar,F.Maghoul,P.Raghavan,S.Rajagopalan,R.Stata,A.Tomkins,
and J.Wiener.GraphstructureintheWeb.In
Proc.ofthe9thInt.WorldWideWebconferenceonComputernetworks:theinternatio
naljournalof computerandtelecommunicationsnetworking,pages309–
320.NorthHollandPublishingCo.,2000.
P.Buneman.SemistructuredData.InProc.ofthe16thSymposium onPrinciplesof
DatabaseSystems(PODS),pages117–121.ACMPress,May1997.
