Publishing Data Using Semantic
       Web Technologies
  An introduction for software engineers


              Nikolaos Konstantinou, Ph.D.
        National Documentation Centre / N.H.R.F.
Lecture Outline
•   Introduction to the Semantic Web
•   Semantic Web Languages
•   Publishing RDF Using OpenLink’s Virtuoso
•   Linked Open Data and Examples




November 9, 2011   National Documentation Centre / NHRF   2
The Problem (1)
• Keyword-based queries cannot be expressive
• E.g. search for:
      – Cities in the U.S. with more than 100,000
        inhabitants
      – Italian painters of the 18th century
• Web resources
      – Do not (usually) convey their meaning




The Problem (2)
• Seeking specific information in the Web or a
  repository
• Integrating distributed data sources
• Need for data annotation
      – Necessary for data that is not human-readable
            • E.g. binary information, multimedia
      – Annotation may be redundant, incomplete, or
        erroneous
      – When it is present it does not necessarily follow a
        standard pattern
The Semantic Web Paradigm (1)
• ‘Web of Data’ as in a ‘Web of Documents’
      – Web resources uniquely identified by their URI
• Assign an unambiguously defined meaning to
  information, its semantics
      – Ontology, a well defined vocabulary
      – Queries can be posed by any third parties
• Knowledge modeled in the form of a graph
      – subject, predicate, object
• Interconnected data sets on the Web
      – Provide context
The Semantic Web Paradigm (2)
• Enables semantic annotation, interoperability,
  integration of information
• Enables reasoning
      – Extract implicit information
      – Assure concept consistency
• Variety of mature, open source tools available
      – Protégé, Jena, Virtuoso, D2RQ, …
• Allows information to be exposed as Linked Open
  Data (to be discussed later on)
• Data → Information → Knowledge
What is an Ontology?
• In philosophy, Ontology is the study of beings
      – Onto (ὤν/ὄντος) + logy (λογία)
      – Along with their properties and relations
• In computer science, an ontology is the formal
  representation of knowledge
      – A formal, explicit specification of a shared
        conceptualisation
      – Concepts of a domain, objects and their relations
      – Allows complexity in schemas
• The RDF and OWL approaches
Lecture Outline
•   Introduction to the Semantic Web
•   Semantic Web Languages
•   Publishing RDF Using OpenLink’s Virtuoso
•   Linked Open Data and Examples




The Resource Description Framework
  • The Resource Description Framework is about
    describing resources
      – Was initially proposed for describing Web resources
  • RDF can be viewed as a graph where
      – Subjects and objects are graph nodes
      – Properties are graph edges
  • Example graph (figure), written as triples
      ex:Author foaf:name         "J. Smith"
      ex:Author ex:participatesIn ex:Publication
      ex:Author foaf:knows        ex:anotherAuthor
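The graph view of these triples can be sketched in a few lines of Python. This is only an illustration using plain tuples and prefixed names as strings; the helper name `outgoing_edges` is an assumption and no RDF library is involved.

```python
# The three triples from the slide, as (subject, predicate, object) tuples.
triples = {
    ("ex:Author", "foaf:name", '"J. Smith"'),
    ("ex:Author", "ex:participatesIn", "ex:Publication"),
    ("ex:Author", "foaf:knows", "ex:anotherAuthor"),
}

def outgoing_edges(graph, node):
    """Return the (predicate, object) pairs leaving `node`, sorted for stable output."""
    return sorted((p, o) for s, p, o in graph if s == node)

# Nodes are every subject and object; each predicate labels an edge between two nodes.
nodes = {s for s, _, _ in triples} | {o for _, _, o in triples}
edges = outgoing_edges(triples, "ex:Author")
```

Here `nodes` holds the four graph nodes and `edges` the three labelled edges that leave `ex:Author`.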
The RDF Schema (1)
• Describing Web Resources using RDF
      – rdfs:Resource
            • All things described by RDF are resources
      – rdfs:Class
            • The class of resources that are classes, i.e. the class of
              classes
      – rdf:type
            • States that a resource is a member (instance) of a class
            • E.g.: ex:Person rdf:type rdfs:Class
      – rdf:Property
            • The relations between subjects and objects
The RDF Schema (2)
• Describing Web Resources using RDF
      – rdfs:subClassOf
            • E.g.: foaf:Person rdfs:subClassOf foaf:Agent
      – rdfs:subPropertyOf
            • Allow class and property hierarchies
            • E.g.: ex:hasFirstName rdfs:subPropertyOf ex:hasName
      – rdfs:domain
            • ex:employer rdfs:domain foaf:Person
      – rdfs:range
            • ex:employer rdfs:range foaf:Organization
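What rdfs:domain and rdfs:range let a reasoner conclude can be sketched as follows. This is a minimal illustration of the two RDFS rules, not a full RDFS reasoner; the instance triple about `ex:alice` and `ex:acme` is hypothetical and does not come from the slides.

```python
RDF_TYPE, DOMAIN, RANGE = "rdf:type", "rdfs:domain", "rdfs:range"

# Schema statements taken from the slide.
schema = {
    ("ex:employer", DOMAIN, "foaf:Person"),
    ("ex:employer", RANGE, "foaf:Organization"),
}
# Hypothetical instance data, used only for illustration.
data = {("ex:alice", "ex:employer", "ex:acme")}

def infer_types(schema, data):
    """Apply the domain and range rules once and return the inferred rdf:type triples."""
    inferred = set()
    for s, p, o in data:
        for prop, kind, cls in schema:
            if prop != p:
                continue
            if kind == DOMAIN:      # the subject of p belongs to the domain class
                inferred.add((s, RDF_TYPE, cls))
            elif kind == RANGE:     # the object of p belongs to the range class
                inferred.add((o, RDF_TYPE, cls))
    return inferred

inferred = infer_types(schema, data)
```

Note that domain and range do not validate data; they add type statements for the subject and the object.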

The RDF Schema (3)
• Describing Web Resources using RDF
      – rdfs:Container
            • rdf:Bag
            • rdf:Seq
            • rdf:Alt
            • rdfs:ContainerMembershipProperty
            • rdfs:member
      – rdf:List
            • rdf:first
            • rdf:rest
            • rdf:nil
      – rdf:Statement
      – rdf:subject
      – rdf:predicate
      – rdf:object
      – rdf:value
      – rdfs:seeAlso
      – rdfs:isDefinedBy
      – rdfs:label
      – rdfs:comment
The RDF Schema (4)
• Example 1
                   <rdfs:Class rdf:ID="animal" />
                     <rdfs:Class rdf:ID="horse">
                     <rdfs:subClassOf rdf:resource="#animal"/>
                   </rdfs:Class>

• Example 2
                   <rdf:Description rdf:about="http://www.ekt.gr">
                    <dc:description>National Documentation Centre</dc:description>
                    <dc:publisher>NHRF</dc:publisher>
                    <dc:date>2001-02-16</dc:date>
                    <dc:format>text/html</dc:format>
                    <dc:language>el</dc:language>
                   </rdf:Description>
Web Ontology Language (1)
• Based on Description Logics
      – Decidable fragment of First Order Logic
• Allows more complex schema definitions
• OWL builds on top of RDF
• Current version is OWL 2
• Example class definitions
      – Woman ≡ Person ∩ Female
      – Father ≡ Man ∩ ∃hasChild.Person
      – Wife ≡ Woman ∩ ∃hasHusband.Man
      – MotherWithoutDaughter ≡ Mother ∩ ∀hasChild.¬Woman
• [Figure: OWL vocabulary on top of RDF Schema. rdfs:Class and rdf:Property derive from rdfs:Resource; owl:Class derives from rdfs:Class; owl:DatatypeProperty, owl:ObjectProperty and owl:FunctionalProperty derive from rdf:Property]
Web Ontology Language (2)
• Class description
      – owl:intersectionOf
      – owl:unionOf
      – owl:complementOf
      – owl:equivalentClass
      – owl:disjointWith
      – Cardinality
            • owl:maxCardinality
            • owl:minCardinality
            • owl:cardinality
• Property description
      – owl:DatatypeProperty
      – owl:ObjectProperty
      – owl:equivalentProperty
      – owl:inverseOf
            • isTaughtBy ↔ teaches
      – owl:FunctionalProperty
      – owl:InverseFunctionalProperty
      – owl:TransitiveProperty
      – owl:SymmetricProperty
Web Ontology Language (3)
• owl:Thing
• owl:Nothing
• Version information
      – owl:versionInfo
      – owl:priorVersion
      – owl:backwardCompatibleWith
      – owl:incompatibleWith
      – owl:DeprecatedClass
      – owl:DeprecatedProperty
• Individuals
      – owl:sameAs
      – owl:differentFrom
      – owl:AllDifferent
• Value constraints
      – owl:allValuesFrom
      – owl:someValuesFrom
      – owl:hasValue
Web Ontology Language (4)
• Example 1
                   :RedBordeaux rdf:type owl:Class ;
                                owl:equivalentClass [ rdf:type owl:Class ;
                                owl:intersectionOf ( :Bordeaux :RedWine ) ] .

• Example 2
                   :locatedIn rdf:type owl:ObjectProperty ,
                                       owl:TransitiveProperty ;
                              rdfs:domain owl:Thing ;
                              rdfs:range :Region .

• Example 3
                   :BordeauxRegion rdf:type owl:NamedIndividual ,
                                            :Region ;
                                   :locatedIn :FrenchRegion .
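Because :locatedIn is declared an owl:TransitiveProperty in Example 2, a reasoner may derive further :locatedIn statements. The sketch below computes that closure over plain pairs; the fact that :FrenchRegion is located in :Europe is a hypothetical addition made only to have something to infer.

```python
def transitive_closure(pairs):
    """Return `pairs` plus every pair implied by transitivity."""
    closure = set(pairs)
    changed = True
    while changed:                      # repeat until no new pair can be added
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure

located_in = {
    (":BordeauxRegion", ":FrenchRegion"),   # from Example 3
    (":FrenchRegion", ":Europe"),           # hypothetical extra fact
}
inferred = transitive_closure(located_in) - located_in
```

The only new pair is (:BordeauxRegion, :Europe), which is the kind of implicit triple a reasoner would add.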
Web Ontology Language (5)
• Example 4
    :hasColor rdf:type owl:FunctionalProperty ,
                       owl:ObjectProperty ;
              rdfs:domain :Wine ;
              rdfs:range :WineColor ;
              rdfs:subPropertyOf :hasWineDescriptor .

• Example 5
     :CabernetSauvignon rdf:type owl:Class ;
               owl:equivalentClass [ rdf:type owl:Class ;
                      owl:intersectionOf ( :Wine
                         [ rdf:type owl:Restriction ;
                           owl:onProperty :madeFromGrape ;
                           owl:hasValue :CabernetSauvignonGrape
                         ] [ rdf:type owl:Restriction ;
                             owl:onProperty :madeFromGrape ;
                             owl:maxCardinality "1"^^xsd:nonNegativeInteger ] ) ] .
Web Ontology Language (6)
• OWL 1 flavors
      – OWL Full, full language expressivity
      – OWL DL, maximal subset allowing reasoner support
      – OWL Lite, minimal useful subset of language features
• OWL 2 profiles
      – OWL 2 EL, for large numbers of classes/properties
      – OWL 2 QL, large volume of instance data support,
        relational database-friendly
      – OWL 2 RL, RDFS with extra expressivity, scalable
        reasoning
Reasoning
•   Check ontology consistency
•   Class expression subsumption
•   Concept satisfiability
•   Infer implicit information
      – Produces extra (inferred) triples
• Numerous reasoners available
      – Free
            • Pellet, FaCT++, Jena, HermiT
      – Non-free
            • OWLIM, OntoBroker
Ontology Authoring (1)
• Protégé is a prominent GUI solution
      – Java-based, open-source
      – OWL/RDF capabilities
      – Includes FaCT++ reasoner
      – WebProtégé in beta
      – Extensible through plugins
             • E.g. OntoGraf

      Available online at http://protege.stanford.edu/
Ontology Authoring (2)
• Using HP’s Jena
      – Large, active community
      – Apache Maven group id com.hp.hpl.jena
      – API Example
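           // Imports needed: com.hp.hpl.jena.rdf.model.*, com.hp.hpl.jena.vocabulary.DC, java.io.*
           String ns = "http://example.com/sample#";
           Model model = ModelFactory.createDefaultModel();
           Resource resource = model.createResource(ns + "Individual1");
           resource.addProperty(DC.title, "Sample title");   // placeholder literal
           // Serialise as RDF/XML; the file name is a placeholder
           OutputStream file = new FileOutputStream("sample.rdf");
           model.write(file, "RDF/XML");
           file.close();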
Adding Reasoning Capabilities
• Using HP’s Jena
      – Create an RDFS model using the Jena API
           String ns = "http://www.example.com/ex#";
           Model rdfsEx = ModelFactory.createDefaultModel();
           Property p = rdfsEx.createProperty(ns, "p");
           Property q = rdfsEx.createProperty(ns, "q");
           rdfsEx.add(p, RDFS.subPropertyOf, q);            // p is a sub-property of q
           rdfsEx.createResource(ns + "a").addProperty(p, "foo");
      – Adding the internal RDFS reasoner
           Reasoner reasoner = ReasonerRegistry.getRDFSReasoner();
           InfModel inf = ModelFactory.createInfModel(reasoner, rdfsEx);
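What the RDFS reasoner is expected to add to this model can be sketched without Jena. The Python below applies only the sub-property rule to the same two statements; it is an illustration of the rule under that assumption, not Jena's implementation, and the function name is made up.

```python
SUB_PROPERTY_OF = "rdfs:subPropertyOf"
ns = "http://www.example.com/ex#"

# The same statements as in the Jena model: p is a sub-property of q, and a p "foo".
model = {
    (ns + "p", SUB_PROPERTY_OF, ns + "q"),
    (ns + "a", ns + "p", "foo"),
}

def rdfs_subproperty_closure(triples):
    """Return the triples plus everything implied by rdfs:subPropertyOf."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        supers = {(s, o) for s, p, o in inferred if p == SUB_PROPERTY_OF}
        for s, p, o in list(inferred):
            for sub, sup in supers:
                if p == sub and (s, sup, o) not in inferred:
                    inferred.add((s, sup, o))   # s sub o implies s sup o
                    changed = True
    return inferred

inf = rdfs_subproperty_closure(model)
```

The closure contains the extra statement that `a` has `q` value "foo", which is the inferred triple an inference model would expose for this data.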



Querying Ontologies
• SPARQL is to ontologies what SQL is to
  relational databases
      – W3C recommendation since 2008
• Designed using an SQL-like syntax
      – SELECT … FROM … WHERE
• The WHERE clause consists of triple patterns
• SELECT returns a table of variable bindings; CONSTRUCT and DESCRIBE return graphs
• Example
           SELECT ?x ?y ?z WHERE { ?x ?y ?z }
          returns all the triples in the graph
Introduction to SPARQL (1)
• Selecting a single value
    SELECT ?x
    WHERE { ?x <ex:hasName> "John Smith" }
• Matching values from a graph
    SELECT ?x ?fname
    WHERE {?x <ex:hasName> ?fname}
• Also
     SELECT ?attr ?value
     WHERE { ?x <ex:hasAttribute> ?attr .
             ?attr <ex:hasValue> ?value . }
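How a SPARQL engine evaluates several triple patterns can be approximated as a join on shared variables. The sketch below is a toy matcher over an in-memory list, assuming terms are plain strings and variables start with "?"; the data and the names `match` and `query` are hypothetical.

```python
def match(pattern, triple, binding):
    """Try to extend `binding` so that `pattern` matches `triple`; return None on failure."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):                    # a variable
            if result.setdefault(term, value) != value:
                return None                         # already bound to something else
        elif term != value:                         # a constant must match exactly
            return None
    return result

def query(graph, patterns):
    """Evaluate the patterns as a conjunction, joining on shared variables."""
    solutions = [{}]
    for pattern in patterns:
        solutions = [
            extended
            for binding in solutions
            for triple in graph
            if (extended := match(pattern, triple, binding)) is not None
        ]
    return solutions

# Hypothetical data for illustration.
graph = [
    ("ex:item1", "ex:hasAttribute", "ex:attr1"),
    ("ex:attr1", "ex:hasValue", "42"),
    ("ex:item1", "ex:hasName", "John Smith"),
]
rows = query(graph, [("?x", "ex:hasAttribute", "?attr"),
                     ("?attr", "ex:hasValue", "?value")])
everything = query(graph, [("?x", "?y", "?z")])
```

The two-pattern query yields one solution because ?attr must take the same value in both patterns, while the single pattern with three variables returns one solution per triple.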
Introduction to SPARQL (2)
• String matching using regular expressions
      SELECT ?y
      WHERE
      { ?x vcard:Given ?y .
        FILTER regex(?y, "r", "i") }
• Filtering values
      SELECT ?resource
      WHERE {
        ?resource info:age ?age .
        FILTER (?age >= 24) }
Introduction to SPARQL (3)
• The OPTIONAL construct to return information
  where available
      SELECT ?name ?age
      WHERE {
        ?person vcard:FN ?name .
        OPTIONAL { ?person info:age ?age } }
• The UNION construct
      SELECT ?name
      WHERE {
       { [] foaf:name ?name } UNION { [] vcard:FN ?name } }
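The effect of OPTIONAL and UNION on the result can be illustrated with ordinary Python collections. This is only an analogy under simplifying assumptions (one value per person, hypothetical data), not how a SPARQL engine is implemented.

```python
# Hypothetical data: every person has a name, only one has an age.
names = {"ex:p1": "Alice", "ex:p2": "Bob"}   # ?person vcard:FN ?name
ages = {"ex:p1": 30}                         # ?person info:age ?age

# OPTIONAL behaves like a left outer join: every name is kept,
# and the age is filled in only where it exists (None otherwise).
optional_rows = [(name, ages.get(person)) for person, name in sorted(names.items())]

# UNION returns the solutions of either alternative pattern.
foaf_names = ["Alice"]     # matches of { [] foaf:name ?name }
vcard_names = ["Bob"]      # matches of { [] vcard:FN ?name }
union_rows = foaf_names + vcard_names
```

Without OPTIONAL, the person lacking an age would disappear from the result altogether.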
Introduction to SPARQL (4)
• Result handling
    – ORDER BY, DISTINCT, OFFSET and LIMIT
        • Same as in SQL
    – CONSTRUCT
        • Ability to construct a new graph based on the results
              PREFIX foaf: <http://xmlns.com/foaf/0.1/>
              CONSTRUCT { <http://example.com/person#Alice> foaf:knows ?x }
              FROM <http://example.org/foaf/people>
              WHERE { ?x foaf:name ?name }
              ORDER BY desc(?name)
              LIMIT 10
Common Vocabularies (1)
• DC (Dublin Core)
  – Describes library asset information
• SKOS
  – Simple Knowledge Organization System
• FOAF
  – Friend of a Friend
• SIOC
  – Semantically Interlinked Online Communities
• DBpedia
  – Extracts structured information from Wikipedia
Common Vocabularies (2)
• Music Ontology
      – Describes music concepts
• GoodRelations
      – Used in the e-commerce context
      – Supported by Google and Yahoo
• Basic Geo Vocabulary
      – Expresses spatial information using WGS84
• Creative Commons
      – Expresses copyright information
Common Vocabularies (3)
• Microformats are open data standards for
  publishing structured information on the Web
• Simple, solve specific problems
• No change in display
• Examples
  – hCard
  – hCalendar
  – RDFa
• For SEO, see also schema.org
Embedded RDF
• RDFa
     – Embed RDF in XHTML documents
     – Uses <span>, <div>
     – Allows nested descriptions
• GRDDL
     – Obtain RDF from HTML pages
     – Uses XSLT for XML

Both RDFa and GRDDL are W3C recommendations
Ontology Serialisation Formats
• RDF/XML
      <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
        xmlns:dc="http://purl.org/dc/elements/1.1/">
        <rdf:Description rdf:about="http://en.wikipedia.org/wiki/Tony_Benn">
           <dc:title>Tony Benn</dc:title>
           <dc:publisher>Wikipedia</dc:publisher>
        </rdf:Description>
      </rdf:RDF>

• N3 and Turtle (Turtle ⊆ N3)
      @prefix dc: <http://purl.org/dc/elements/1.1/>.
      <http://en.wikipedia.org/wiki/Tony_Benn>
       dc:title "Tony Benn";
       dc:publisher "Wikipedia".

• ... and, of course, in the database!
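To make the idea of serialisation concrete, the same two statements can be written out by hand in N-Triples, the line-based subset of Turtle. The sketch assumes every object is a plain string literal; the function names are illustrative and no RDF library is used.

```python
DC = "http://purl.org/dc/elements/1.1/"
subject = "http://en.wikipedia.org/wiki/Tony_Benn"

# The two statements from the examples above, with full IRIs.
triples = [
    (subject, DC + "title", "Tony Benn"),
    (subject, DC + "publisher", "Wikipedia"),
]

def literal(text):
    """Quote a plain literal, escaping backslashes and double quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'

def to_ntriples(triples):
    """Serialise (IRI, IRI, plain literal) triples, one statement per line."""
    return "\n".join(f"<{s}> <{p}> {literal(o)} ." for s, p, o in triples)

text = to_ntriples(triples)
```

Each output line is a complete statement, which is why this format is convenient for bulk loading and line-oriented tools.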

Lecture Outline
•   Introduction to the Semantic Web
•   Semantic Web Languages
•   Publishing RDF Using OpenLink’s Virtuoso
•   Linked Open Data and Examples




Triplestores
• A triplestore contains records in the form
      – (subject, predicate, object)
• Often uses a relational database backend
• Saving an Ontology in a triplestore
      – Jena
            • http://jena.sourceforge.net
      – Sesame
            • http://www.openrdf.org
      – Virtuoso
            • http://virtuoso.openlinksw.com
      – Oracle
            • http://www.oracle.com/technetwork/database
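The idea of keeping (subject, predicate, object) records in a relational backend can be tried out with SQLite from the Python standard library. This is a deliberately naive single-table layout chosen for illustration; it is not the schema used by any of the products listed above.

```python
import sqlite3

# An in-memory database with one three-column table of triples.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (subject TEXT, predicate TEXT, object TEXT)")
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        ("ex:Author", "foaf:name", "J. Smith"),
        ("ex:Author", "ex:participatesIn", "ex:Publication"),
        ("ex:Author", "foaf:knows", "ex:anotherAuthor"),
    ],
)

# Roughly equivalent to: SELECT ?o WHERE { ex:Author foaf:name ?o }
names = [
    row[0]
    for row in conn.execute(
        "SELECT object FROM triples WHERE subject = ? AND predicate = ?",
        ("ex:Author", "foaf:name"),
    )
]
total = conn.execute("SELECT COUNT(*) FROM triples").fetchone()[0]
```

A query with several triple patterns would translate into self-joins on this table, which is one reason dedicated stores add their own indexing.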
RDB to RDF Mapping Language
• Several tools proposed in the early years
      – Triplify, D2OMapper, DB2OWL, VisaVis, R2O,
        MapOnto, …
• R2RML: a W3C working draft
      – Implementations
            • D2RQ
            • Virtuoso




Virtuoso Overview (1)
• Open source and commercial version
• Can be used as
      – A web application server
      – A relational database repository
            • Offers a JDBC Driver
            • Collaborates with Jena
            • Offers Conductor, a GUI for server administration
      – A web service server
      – A triplestore
            • Export RDF data from same DB or others
Virtuoso Overview (2)
• RDF Views
      – Export relational data as triples
• SPARQL 1.1 support, plus
      – Full Text Queries
      – Geo Spatial Queries
      – Business Analytics and Intelligence
      – SQL Stored Procedure and Built-In Function
        exploitation from SPARQL
      – Create, Update, and Delete (SPARUL)
• Cluster Configuration
      – Parallel and Horizontal scaling
Virtuoso Overview (3)
• Extendable through VAD* Packages
      – Interactive SPARQL Query Builder
            • A GUI to create SPARQL queries
      – Sponger Middleware
            • Offers RDF Mappers to import data into Virtuoso
      – PubSubHubbub Protocol (for RSS)
            • Can be used to allow push behavior and subscriptions
              by clients
      – OAT (OpenLink AJAX Toolkit) Framework
            • Rich web application development


* Virtuoso Application Distribution
Virtuoso Reasoning Engine
• Backward-chaining OWL reasoner coverage
      –   rdfs:subClassOf
      –   rdfs:subPropertyOf
      –   owl:sameAs
      –   owl:equivalentClass
      –   owl:equivalentProperty
      –   owl:InverseFunctionalProperty
      –   owl:inverseOf
      –   owl:SymmetricProperty
      –   owl:TransitiveProperty


Virtuoso Sponger
• An RDF-iser to bring data into the Semantic
  Web
• Sponger extracts RDF data from non-RDF
  sources
• A Cartridge per data source
• XSLT templates do the work
• Customisable and Programmable
      – Virtuoso PL, C++, Java


Virtuoso as a DB Server (1)
• Conductor: a GUI for server administration




Virtuoso as a DB Server (2)
• Can export data as RDF using RDF Views
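The principle behind exporting relational rows as triples can be sketched as follows. This is not Virtuoso's RDF Views mechanism, only an illustration of the mapping idea: the primary key gives each row a URI and every other column becomes a property. The URI layout imitates the sample result URIs shown elsewhere in this deck; the column names and the property URIs are assumptions.

```python
DATA = "http://localhost:8890/Demo"
SCHEMA = "http://localhost:8890/schemas/Demo"

def row_to_triples(table, pk_column, row):
    """Map one relational row to triples; the primary key identifies the subject."""
    subject = f"{DATA}/{table}/PK/{row[pk_column]}#this"
    triples = [(subject, "rdf:type", f"{SCHEMA}/{table}")]
    for column, value in row.items():
        if column != pk_column:
            # Hypothetical property naming: one property per column.
            triples.append((subject, f"{SCHEMA}/{column}", value))
    return triples

# Hypothetical row of a "temperature" table.
triples = row_to_triples("temperature", "id", {"id": 4, "celsius": 21.5})
```

Without a primary key there would be no stable value to build the subject URI from.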




Virtuoso as an RDF Server (1)
• A URI for every resource, browseable repository




Virtuoso as an RDF Server (2)
• Example: Measurement URI




Virtuoso as an RDF Server (3)
• RDF data also accessible via
      – ODBC, JDBC, OLE DB, XMLA, ADO.NET
• Difficulties in extracting RDF Data
      – Tables must have a primary key
      – Mappings are defined using regular expressions
        and tend to be complicated
      DB.DBA.URLREWRITE_CREATE_REGEX_RULE (
        'demo_rule6', 1,
        '/Demo/objects/([^#]*)',
        vector('path'), 1,
        '/sparql?query=DESCRIBE+%%3Chttp%%3A//^{URIQADefaultHost}^/Demo/objects/%U%%3E+FROM+%%3Chttp%%3A//^{URIQADefaultHost}^/Demo%%23%%3E&format=%U',
        vector('path', '*accept*'), null,
        '(text/rdf.n3)|(application/rdf.xml)', 2, null );
Virtuoso Jena Provider
• Offered by OpenLink
• Native Graph Model Storage Provider
• Enables access to the Virtuoso RDF Quad store through Jena
Querying Remote Repositories
• XML over HTTP (RESTful approach)
      – http://demo.openlinksw.com/sparql?default-graph-uri=urn:lsid:ubio.org:namebank:11815&should-sponge=soft&query=SELECT+*+WHERE+{?s+?p+?o}&format=text/html
• No create/update/delete capabilities
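A request URL of this kind can be assembled programmatically instead of by hand. The sketch below only builds and decodes the URL with the Python standard library, using the parameters of the example request; it does not contact the endpoint, and whether the demo server is still reachable is not assumed.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

endpoint = "http://demo.openlinksw.com/sparql"
params = {
    "default-graph-uri": "urn:lsid:ubio.org:namebank:11815",
    "should-sponge": "soft",
    "query": "SELECT * WHERE {?s ?p ?o}",
    "format": "text/html",
}

# urlencode percent-encodes reserved characters and turns spaces into '+'.
url = endpoint + "?" + urlencode(params)

# Decoding the query string again gives back the original parameter values.
decoded = parse_qs(urlsplit(url).query)
```

Encoding the query text this way avoids problems with characters such as braces and question marks that have a special meaning in URLs.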
SPARQL Query Interface (1)
• SPARQL queries can be named and stored
      – A query named sparql-demo listens to:
        http://localhost:8890/DAV/sparql-demo
• Can return results over HTTP (XML by default)
• MIME type of the RDF data
      – 'rdf+xml' (default) | 'n3' | 'turtle' | 'ttl'
SPARQL Query Interface (2)
• SPARQL results example in RDF/XML
<ROOT>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:rs="http://www.w3.org/2005/sparql-results#"
         xmlns:xsd="http://www.w3.org/2001/XMLSchema#">
  <rs:results rdf:nodeID="rset">
    <rs:result rdf:nodeID="sol193">
      <rs:binding rdf:nodeID="sol193-0" rs:name="x">
        <rs:value rdf:resource="http://localhost:8890/Demo/temperature/PK/4#this"/>
      </rs:binding>
      <rs:binding rdf:nodeID="sol193-1" rs:name="y">
        <rs:value rdf:resource="http://www.w3.org/1999/02/22-rdf-syntax-ns#type"/>
      </rs:binding>
      <rs:binding rdf:nodeID="sol193-2" rs:name="z">
        <rs:value rdf:resource="http://localhost:8890/schemas/Demo/temperature"/>
      </rs:binding>
    </rs:result>
    …
  </rs:results>
</rdf:RDF>
</ROOT>
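A client can turn such a response into variable bindings with a standard XML parser. The sketch below embeds a shortened copy of the result (two bindings, without the ROOT wrapper) and reads it with ElementTree; it assumes every value is a resource reference, as in the sample, and the function name is illustrative.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RS = "http://www.w3.org/2005/sparql-results#"

# Shortened copy of the sample result, kept well-formed.
document = f"""
<rdf:RDF xmlns:rdf="{RDF}" xmlns:rs="{RS}">
  <rs:results rdf:nodeID="rset">
    <rs:result rdf:nodeID="sol193">
      <rs:binding rdf:nodeID="sol193-0" rs:name="x">
        <rs:value rdf:resource="http://localhost:8890/Demo/temperature/PK/4#this"/>
      </rs:binding>
      <rs:binding rdf:nodeID="sol193-1" rs:name="y">
        <rs:value rdf:resource="{RDF}type"/>
      </rs:binding>
    </rs:result>
  </rs:results>
</rdf:RDF>
"""

def read_bindings(xml_text):
    """Return one dict per rs:result, mapping variable name to resource URI."""
    root = ET.fromstring(xml_text)
    rows = []
    for result in root.iter(f"{{{RS}}}result"):
        row = {}
        for binding in result.iter(f"{{{RS}}}binding"):
            value = binding.find(f"{{{RS}}}value")
            row[binding.get(f"{{{RS}}}name")] = value.get(f"{{{RDF}}}resource")
        rows.append(row)
    return rows

rows = read_bindings(document)
```

Each dictionary in `rows` corresponds to one solution of the query, keyed by the variable names x and y.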
Lecture Outline
•   Introduction to the Semantic Web
•   Semantic Web Languages
•   Publishing RDF Using OpenLink’s Virtuoso
•   Linked Open Data and Examples




The Linked Open Data Cloud (1)
       • Data available on the Web
              – Under an open license
       • Available as structured data
              – Excel sheet instead of a scanned image
       • Use non-proprietary format
              – CSV, RDF instead of DOC, XLS
       • Use linked data format
              – URIs to identify things
       • Linked to other people’s data
              – Provision of context
Also see: http://lab.linkeddata.deri.ie/2010/star-scheme-by-example/
The Linked Open Data Cloud (2)
• Interconnected datasets using URIs and RDF
[Figure: the LOD cloud diagram in 2007 and in 2011]
Source: http://linkeddata.org
Also see the datahub: http://thedatahub.org/group/lodcloud
The Linked Open Data Cloud (3)
    • Consumer capabilities
         – Access it, print it, store it locally, enter the data in
           another system
         – Process, aggregate, visualise, manipulate, export
           in another format, reuse
         – Avoid vendor lock-ins
    • Publisher capabilities
         – Make data discoverable
         – Increase the value of the data
              • Allow added-value services
         – Fine-grained control over the data
Also see: http://lab.linkeddata.deri.ie/2010/star-scheme-by-example/
Open Government Data
• Prominent examples include
      – data.gov (US)
      – data.gov.uk (UK)
      – data.london.gov.uk (UK)
      – digitaliser.dk (DK)
      – data.govt.nz (NZ)
      – linkedopendata.it (IT)
      – geodata.gov.gr (GR)


      Promotional video: http://opengovernmentdata.org/film/
The DBpedia Project
• Structured information based on Wikipedia
• SPARQL endpoint: dbpedia.org/sparql
• Example: Greece on DBpedia
New York Times Public Data
• News data
The Claros Project
• The world of art on the Semantic Web
• Uses the CIDOC-CRM vocabulary to describe
      – Objects
      – Places
      – Periods
      – People
• OWL DL
• RESTful

The OpenCalais Project
• Creates semantic metadata for submitted content
• By Thomson Reuters
• Free to all
• Extracts RDF using NLP
• Uses Web services
The TrueKnowledge Engine
• Based on facts, not keywords
Thank you for your attention!
        Questions?
Appendix
• Installing Virtuoso as a system service
      – Windows 7
            • Download and extract from
              http://virtuoso.openlinksw.com/dataspace/dav/wiki/Main/VOSDownload
            • Open a command line prompt as administrator
            • Register required DLL
                   – regsvr32 virtodbc.dll
            • Install service
                   – virtuoso-t +service screate +instance "DB" +configfile virtuoso.ini
      – Ubuntu 10.04 LTS
            • sudo apt-get install virtuoso-server
November 9, 2011                 National Documentation Centre / NHRF               62

Publishing Data Using Semantic Web Technologies

  • 1.
    Publishing Data UsingSemantic Web Technologies An introduction for software engineers Nikolaos Konstantinou, Ph.D. National Documentation Centre / N.H.R.F.
  • 2.
    Lecture Outline • Introduction to the Semantic Web • Semantic Web Languages • Publishing RDF Using OpenLink’s Virtuoso • Linked Open Data and Examples November 9, 2011 National Documentation Centre / NHRF 2
  • 3.
    The Problem (1) •Keyword-based queries cannot be expressive • E.g. search for: – Cities in the U.S. with more than 100,000 inhabitants – Italian painters of the 18th century • Web resources – Do not (usually) convey their meaning November 9, 2011 National Documentation Centre / NHRF 3
  • 4.
    The Problem (2) •Seeking specific information in the Web or a repository • Integrating distributed data sources • Need for data annotation – Necessary for data non-readable by human • E.g. binary information, multimedia – Annotation may be redundant, incomplete, or erroneous – When it is present it does not necessarily follow a standard pattern November 9, 2011 National Documentation Centre / NHRF 4
  • 5.
    The Semantic WebParadigm (1) • ‘Web of Data’ as in a ‘Web of Documents’ – Web resources uniquely identified by their URI • Assign an unambiguously defined meaning to information, its semantics – Ontology, a well defined vocabulary – Queries can be posed by any third parties • Knowledge modeled in the form of a graph – subject, predicate, object • Interconnected data sets on the Web – Provide context Documentation Centre / NHRF November 9, 2011 National 5
  • 6.
    The Semantic WebParadigm (2) • Enables semantic annotation, interoperability, integration of information • Enables reasoning – Extract implicit information – Assure concept consistency • Variety of mature, open source tools available – Protégé, Jena, Virtuoso, D2RQ, … • Allows information to be exposed as Linked Open Data (to be discussed later on) • Data  Information  Knowledge November 9, 2011 National Documentation Centre / NHRF 6
  • 7.
    What is anOntology? • In philosophy, Ontology is the study of beings – Onto (ὤν/ὄντος) + logy (λογία) – Along with their properties and relations • In computer science, an ontology is the formal representation of knowledge – A formal, explicit specification of a shared conceptualisation – Concepts of a domain, objects and their relations – Allows complexity in schemas • The RDF and OWL approaches November 9, 2011 National Documentation Centre / NHRF 7
  • 8.
    Lecture Outline • Introduction to the Semantic Web • Semantic Web Languages • Publishing RDF Using OpenLink’s Virtuoso • Linked Open Data and Examples November 9, 2011 National Documentation Centre / NHRF 8
  • 9.
    The Resource DescriptionFramework • The Resource Description Framework is about describing resources – Was initially proposed for describing Web resources • RDF can be viewed as a graph where – Objects are graph nodes – Properties are graph edges foaf:name Graph triples ex:Author “J. Smith” ex:Author foaf:name “J. Smith” ex:Author ex:participatesIn ex:Publication ex:participatesIn ex:Author foaf:knows ex:anotherAuthor foaf:knows ex:Publication ex:anotherAuthor National Documentation Centre / NHRF
  • 10.
    The RDF Schema(1) • Describing Web Resources using RDF – rdfs:Resource • All things described by RDF are resources – rdfs:Class • The class of resources that are classes, i.e. the class of classes – rdf:type • States resource membership • E.g.: ex:Person rdf:type rdfs:Class – rdf:Property • The relations between subjects and objects November 9, 2011 National Documentation Centre / NHRF
  • 11.
    The RDF Schema(2) • Describing Web Resources using RDF – rdfs:SubClassOf • foaf:Agent rdfs:subClassOf foaf:Person – rdfs:SubPropertyOf • Allow class and property hierarchies • E.g.: ex:hasFirstName rdfs:subpropertyOf ex:hasName – rdfs:domain • ex:employer rdfs:domain foaf:Person – rdfs:range • ex:employer rdfs:range foaf:Organization November 9, 2011 National Documentation Centre / NHRF
  • 12.
    The RDF Schema(3) • Describing Web Resources using RDF – rdfs:Container – rdf:List – rdf:statement • rdf:Bag • rdf:first – rdf:subject • rdf:Seq • rdf:rest – rdf:predicate • rdf:Alt • rdf:nil – rdf:object • rdfs:ContainerMembershipProperty – rdf:value • rdfs:member – rdfs:seeAlso – rdfs:label – rdfs:isDefinedBy – rdfs:comment November 9, 2011 National Documentation Centre / NHRF
  • 13.
    The RDF Schema(4) • Example 1 <rdfs:Class rdf:ID="animal" /> <rdfs:Class rdf:ID="horse"> <rdfs:subClassOf rdf:resource="#animal"/> </rdfs:Class> • Example 2 <rdf:Description rdf:about="http://www.ekt.gr"> <dc:description>National Documentation Centre</dc:description> <dc:publisher>NHRF</dc:publisher> <dc:date>2001-02-16</dc:date> <dc:format>text/html</dc:format> <dc:language>el</dc:language> </rdf:Description> November 9, 2011 National Documentation Centre / NHRF
  • 14.
    Web Ontology Language(1) • Based on Description Logics – Decidable fragment of First Order Logic • Allows more complex schema definitions • OWL builds on top of RDF Woman ≡ Person ∩ Female Father ≡ Man ∩ ∃hasChild.Person • Current version is OWL 2 Wife ≡ Woman ∩ ∃hasHusband.Man rdfs:Resource MotherWithoutDaughter ≡ Mother ∩ ∀hasChild. ¬Woman rdfs:Class rdf:Property owl:datatypeProperty owl:functionalProperty owl:Class owl:objectTypeProperty November 9, 2011 National Documentation Centre / NHRF
  • 15.
    Web Ontology Language(2) • Class description • Property description – owl:intersectionOf – owl:datatypeProperty – owl:unionOf – owl:objectProperty – owl:complementOf – owl:equivalentProperty – owl:equivalentClass – owl:inverseOf – owl:disjointWith • isTaughtBy ↔ teaches – Cardinality – owl:functionalProperty • owl:maxCardinality – owl:inverseFunctionalProperty • owl:minCardinality – owl:transitiveProperty • owl:cardinality – owl:symmetricProperty November 9, 2011 National Documentation Centre / NHRF
  • 16.
    Web Ontology Language (3) • owl:Thing • owl:Nothing • Version information – owl:versionInfo – owl:priorVersion – owl:backwardCompatibleWith – owl:incompatibleWith – owl:DeprecatedClass – owl:DeprecatedProperty • Individuals – owl:sameAs – owl:differentFrom – owl:AllDifferent • Value constraints – owl:allValuesFrom – owl:someValuesFrom – owl:hasValue
  • 17.
    Web Ontology Language (4) • Example 1 :RedBordeaux rdf:type owl:Class ; owl:equivalentClass [ rdf:type owl:Class ; owl:intersectionOf ( :Bordeaux :RedWine ) ] . • Example 2 :locatedIn rdf:type owl:ObjectProperty , owl:TransitiveProperty ; rdfs:domain owl:Thing ; rdfs:range :Region . • Example 3 :BordeauxRegion rdf:type owl:NamedIndividual , :Region ; :locatedIn :FrenchRegion .
  • 18.
    Web Ontology Language (5) • Example 4 :hasColor rdf:type owl:FunctionalProperty , owl:ObjectProperty ; rdfs:domain :Wine ; rdfs:range :WineColor ; rdfs:subPropertyOf :hasWineDescriptor . • Example 5 :CabernetSauvignon rdf:type owl:Class ; owl:equivalentClass [ rdf:type owl:Class ; owl:intersectionOf ( :Wine [ rdf:type owl:Restriction ; owl:onProperty :madeFromGrape ; owl:hasValue :CabernetSauvignonGrape ] [ rdf:type owl:Restriction ; owl:onProperty :madeFromGrape ; owl:maxCardinality "1"^^xsd:nonNegativeInteger ] ) ] .
  • 19.
    Web Ontology Language (6) • OWL 1 flavors – OWL Full, full language expressivity – OWL DL, maximal subset allowing reasoner support – OWL Lite, minimal useful subset of language features • OWL 2 profiles – OWL 2 EL, for large numbers of classes/properties – OWL 2 QL, large volume of instance data support, relational database-friendly – OWL 2 RL, RDFS with extra expressivity, scalable reasoning
  • 20.
    Reasoning • Check ontology consistency • Class expression subsumption • Concept satisfiability • Infer implicit information – Produces extra (inferred) triples • Numerous reasoners available – Free • Pellet, FaCT++, Jena, HermiT – Non-free • OWLIM, OntoBroker
  • 21.
    Ontology Authoring (1) • Protégé is a prominent GUI solution – Java-based, open-source – OWL/RDF capabilities – Includes FaCT++ reasoner – WebProtégé in beta – Extensible through plugins • E.g. OntoGraf Available online at http://protege.stanford.edu/
  • 22.
    Ontology Authoring (2) • Using HP’s Jena – Large, active community – Apache Maven group id com.hp.hpl.jena – API Example String ns = "http://example.com/sample#"; String title = "Sample title"; /* hypothetical value */ OutputStream file = new FileOutputStream("sample.rdf"); /* hypothetical output path */ Model model = ModelFactory.createDefaultModel(); Resource resource = model.createResource(ns + "Individual1"); resource.addProperty(DC.title, title); model.write(file, "RDF/XML");
  • 23.
    Adding Reasoning Capabilities • Using HP’s Jena – Create an RDFS model using the Jena API String ns = "http://www.example.com/ex#"; Model rdfsEx = ModelFactory.createDefaultModel(); Property p = rdfsEx.createProperty(ns, "p"); Property q = rdfsEx.createProperty(ns, "q"); rdfsEx.add(p, RDFS.subPropertyOf, q); rdfsEx.createResource(ns + "a").addProperty(p, "foo"); – Adding the internal RDFS reasoner Reasoner reasoner = ReasonerRegistry.getRDFSReasoner(); InfModel inf = ModelFactory.createInfModel(reasoner, rdfsEx);
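    To make concrete what the RDFS reasoner adds for the model above, the following is a minimal sketch in plain Java of the single RDFS entailment rule involved: if p is a sub-property of q and a triple uses p, the same triple with q is entailed. It is a hand-rolled illustration and not how Jena implements its reasoner; the class name SubPropertySketch, the method infer and the ex: prefixed names are assumptions introduced here.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hand-rolled illustration of the rdfs:subPropertyOf rule; not the Jena implementation. */
final class SubPropertySketch {
    static final String SUB_PROPERTY_OF = "rdfs:subPropertyOf";

    /** Returns the asserted triples plus every triple entailed through rdfs:subPropertyOf. */
    static Set<List<String>> infer(Set<List<String>> asserted) {
        Set<List<String>> closure = new LinkedHashSet<>(asserted);
        boolean changed = true;
        // Repeat until no new triple appears, so chains p -> q -> r are followed too.
        while (changed) {
            changed = false;
            List<List<String>> derived = new ArrayList<>();
            for (List<String> schema : closure) {
                if (!schema.get(1).equals(SUB_PROPERTY_OF)) {
                    continue;
                }
                for (List<String> triple : closure) {
                    // (s p o) and (p subPropertyOf q) entail (s q o)
                    if (triple.get(1).equals(schema.get(0))) {
                        derived.add(List.of(triple.get(0), schema.get(2), triple.get(2)));
                    }
                }
            }
            for (List<String> triple : derived) {
                if (closure.add(triple)) {
                    changed = true;
                }
            }
        }
        return closure;
    }

    public static void main(String[] args) {
        Set<List<String>> data = new LinkedHashSet<>();
        data.add(List.of("ex:p", SUB_PROPERTY_OF, "ex:q"));
        data.add(List.of("ex:a", "ex:p", "foo"));
        for (List<String> triple : infer(data)) {
            System.out.println(String.join(" ", triple));
        }
    }
}
```

    With the two asserted triples from the slide, the result also contains the inferred triple (ex:a, ex:q, foo), which is the kind of extra statement an inference model exposes on top of the base model.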
  • 24.
    Querying Ontologies • SPARQL is to ontologies what SQL is to relational databases – W3C recommendation since 2008 • Designed using an SQL-like syntax – SELECT … FROM … WHERE • The WHERE conditions are triple patterns • Queries graphs instead of tables (CONSTRUCT and DESCRIBE also return graphs) • Example SELECT ?x ?y ?z WHERE { ?x ?y ?z } returns all the triples in the graph
  • 25.
    Introduction to SPARQL (1) • Selecting a single value SELECT ?x WHERE { ?x <ex:hasName> "John Smith" } • Matching values from a graph SELECT ?x ?fname WHERE {?x <ex:hasName> ?fname} • Also SELECT ?name ?value WHERE { ?x <ex:hasAttribute> ?attr . ?attr <ex:hasValue> ?value . }
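    The way a single triple pattern produces variable bindings can be sketched in a few lines of plain Java. This is only an illustration of the matching idea under simplifying assumptions (one pattern, string terms, no joins, filters or datatypes), not a SPARQL engine; the class name TriplePatternSketch, the method match and the sample data are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative single-pattern matcher; terms starting with '?' are variables. */
final class TriplePatternSketch {

    /** Returns one binding map per triple of the graph that matches the pattern. */
    static List<Map<String, String>> match(List<String[]> graph, String[] pattern) {
        List<Map<String, String>> solutions = new ArrayList<>();
        for (String[] triple : graph) {
            Map<String, String> binding = new LinkedHashMap<>();
            boolean matches = true;
            for (int i = 0; i < 3 && matches; i++) {
                String term = pattern[i];
                if (term.startsWith("?")) {
                    // A variable binds to the value; a repeated variable must agree.
                    String previous = binding.putIfAbsent(term, triple[i]);
                    matches = previous == null || previous.equals(triple[i]);
                } else {
                    // A constant must be identical to the value in the triple.
                    matches = term.equals(triple[i]);
                }
            }
            if (matches) {
                solutions.add(binding);
            }
        }
        return solutions;
    }

    public static void main(String[] args) {
        List<String[]> graph = new ArrayList<>();
        graph.add(new String[] {"ex:john", "ex:hasName", "John Smith"});
        graph.add(new String[] {"ex:mary", "ex:hasName", "Mary Jones"});
        // Corresponds to: SELECT ?x WHERE { ?x <ex:hasName> "John Smith" }
        System.out.println(match(graph, new String[] {"?x", "ex:hasName", "John Smith"}));
    }
}
```

    Passing the pattern (?x, ?y, ?z) instead returns one binding per triple, which mirrors the "all triples in the graph" query.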
  • 26.
    Introduction to SPARQL (2) • String matching using regular expressions SELECT ?y WHERE { ?x vcard:Given ?y . FILTER regex(?y, "r", "i") } • Filtering values SELECT ?resource WHERE { ?resource info:age ?age . FILTER (?age >= 24) }
  • 27.
    Introduction to SPARQL (3) • The OPTIONAL construct to return information where available SELECT ?name ?age WHERE { ?person vcard:FN ?name . OPTIONAL { ?person info:age ?age } } • The UNION construct SELECT ?name WHERE { { [] foaf:name ?name } UNION { [] vCard:FN ?name } }
  • 28.
    Introduction to SPARQL (4) • Result handling – ORDER BY, DISTINCT, OFFSET and LIMIT • Same as in SQL – CONSTRUCT • Ability to construct a new graph based on the results PREFIX foaf: <http://xmlns.com/foaf/0.1/> CONSTRUCT { <http://example.com/person#Alice> foaf:knows ?x } FROM <http://example.org/foaf/people> WHERE { ?x foaf:name ?name } ORDER BY desc(?name) LIMIT 10
  • 29.
    Common Vocabularies (1) • DC (Dublin Core) – Describe library asset information • SKOS – Simple Knowledge Organization System • FOAF – Friend of a Friend • SIOC – Semantically Interlinked Online Communities • DBpedia – Extract structured information from Wikipedia
  • 30.
    Common Vocabularies (2) • Music Ontology – Describe music concepts • GoodRelations – Used in the e-commerce context – Supported by Google and Yahoo • Basic Geo Vocabulary – Expresses spatial information using WGS84 • Creative Commons – Express copyright information
  • 31.
    Common Vocabularies (3) • Microformats are open data standards for publishing structured information on the Web • Simple, solve specific problems • No change in display • Examples – hCard – hCalendar – RDFa • For SEO, see also schema.org
  • 32.
    Embedded RDF • RDFa – Embed RDF in XHTML documents – Uses <span>, <div> – Allows nested descriptions • GRDDL – Obtain RDF from HTML pages – Uses XSLT for XML • Both RDFa and GRDDL are W3C recommendations
  • 33.
    Ontology Serialisation Formats • RDF/XML <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dc="http://purl.org/dc/elements/1.1/"> <rdf:Description rdf:about="http://en.wikipedia.org/wiki/Tony_Benn"> <dc:title>Tony Benn</dc:title> <dc:publisher>Wikipedia</dc:publisher> </rdf:Description> </rdf:RDF> • N3 and Turtle (Turtle ⊆ N3) @prefix dc: <http://purl.org/dc/elements/1.1/>. <http://en.wikipedia.org/wiki/Tony_Benn> dc:title "Tony Benn"; dc:publisher "Wikipedia". • ... and, of course, in the database!
  • 34.
    Lecture Outline • Introduction to the Semantic Web • Semantic Web Languages • Publishing RDF Using OpenLink’s Virtuoso • Linked Open Data and Examples
  • 35.
    Triplestores • A triplestore contains records in the form – (subject, predicate, object) • Uses a relational database backend • Saving an Ontology in a triplestore – Jena • http://jena.sourceforge.net – Sesame • http://www.openrdf.org – Virtuoso • http://virtuoso.openlinksw.com – Oracle • http://www.oracle.com/technetwork/database
  • 36.
    RDB to RDF Mapping Language • Several tools proposed in the early years – Triplify, D2OMapper, DB2OWL, VisaVis, R2O, MapOnto, … • R2RML: a W3C working draft – Implementations • D2RQ • Virtuoso
  • 37.
    Virtuoso Overview (1) • Open source and commercial versions • Can be used as – A web application server – A relational database repository • Offers a JDBC Driver • Collaborates with Jena • Offers Conductor, a GUI for server administration – A web service server – A triplestore • Export RDF data from same DB or others
  • 38.
    Virtuoso Overview (2) • RDF Views – Export relational data as triples • SPARQL 1.1 support, plus – Full Text Queries – Geo Spatial Queries – Business Analytics and Intelligence – SQL Stored Procedure and Built-In Function exploitation from SPARQL – Create, Update, and Delete (SPARUL) • Cluster Configuration – Parallel and Horizontal scaling
  • 39.
    Virtuoso Overview (3) • Extendable through VAD* Packages – Interactive SPARQL Query Builder • A GUI to create SPARQL queries – Sponger Middleware • Offers RDF Mappers to import data into Virtuoso – PubSubHubbub Protocol (for RSS) • Can be used to allow push behavior and subscriptions by clients – OAT (OpenLink AJAX Toolkit) Framework • Rich web application development * Virtuoso Application Distribution
  • 40.
    Virtuoso Reasoning Engine • Backward-chaining OWL reasoner coverage – rdfs:subClassOf – rdfs:subPropertyOf – owl:sameAs – owl:equivalentClass – owl:equivalentProperty – owl:InverseFunctionalProperty – owl:inverseOf – owl:SymmetricProperty – owl:TransitiveProperty
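    Backward chaining means that entailed facts are derived on demand while a query is answered, instead of being materialised in the store beforehand. A minimal plain-Java sketch of that idea for a transitive property such as :locatedIn follows. It illustrates the general technique only and makes no claim about Virtuoso's internals; the class name TransitiveQuerySketch, the method holds and the Europe node are assumptions introduced here.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Goal-driven check of a transitive property; nothing is materialised. */
final class TransitiveQuerySketch {

    /** True if 'to' can be reached from 'from' through one or more asserted edges. */
    static boolean holds(Map<String, Set<String>> edges, String from, String to) {
        Deque<String> pending = new ArrayDeque<>(edges.getOrDefault(from, Set.of()));
        Set<String> visited = new HashSet<>();
        while (!pending.isEmpty()) {
            String current = pending.pop();
            if (current.equals(to)) {
                return true;
            }
            // The visited set keeps the search finite when the data contains cycles.
            if (visited.add(current)) {
                pending.addAll(edges.getOrDefault(current, Set.of()));
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Asserted :locatedIn statements; the second one is hypothetical sample data.
        Map<String, Set<String>> locatedIn = Map.of(
                "BordeauxRegion", Set.of("FrenchRegion"),
                "FrenchRegion", Set.of("Europe"));
        System.out.println(holds(locatedIn, "BordeauxRegion", "Europe")); // prints true
    }
}
```

    The trade-off is the usual one: nothing extra is stored and updates stay cheap, but each query pays for the traversal.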
  • 41.
    Virtuoso Sponger • An RDF-iser to bring data into the Semantic Web • Sponger extracts RDF data from non-RDF sources • A Cartridge per data source • XSLT templates do the work • Customisable and Programmable – Virtuoso PL, C++, Java
  • 42.
    Virtuoso as a DB Server (1) • Conductor: a GUI for server administration
  • 43.
    Virtuoso as a DB Server (2) • Can export data as RDF using RDF Views
  • 44.
    Virtuoso as an RDF Server (1) • A URI for every resource, browseable repository
  • 45.
    Virtuoso as an RDF Server (2) • Example: Measurement URI
  • 46.
    Virtuoso as an RDF Server (3) • RDF data also accessible via – ODBC, JDBC, OLE DB, XMLA, ADO.NET • Difficulties in extracting RDF Data – Tables must have a primary key – Mappings are defined using regular expressions and tend to be complicated DB.DBA.URLREWRITE_CREATE_REGEX_RULE ( 'demo_rule6', 1, '/Demo/objects/(*^#+*)', vector('path'), 1, '/sparql?query=DESCRIBE+%%3Chttp%%3A//^{URIQADefaultHost}^/Demo/objects/%U%%3E+FROM+%%3Chttp%%3A//^{URIQADefaultHost}^/Demo%%23%%3E&format=%U', vector('path', '*accept*'), null, '(text/rdf.n3)|(application/rdf.xml)', 2, null );
  • 47.
    Virtuoso Jena Provider • Offered by OpenLink • Native Graph Model Storage Provider • Enables access to the Virtuoso RDF Quad store through Jena
  • 48.
    Querying Remote Repositories • XML over HTTP (RESTful approach) – http://demo.openlinksw.com/sparql?default-graph-uri=urn:lsid:ubio.org:namebank:11815&should-sponge=soft&query=SELECT+*+WHERE+{?s+?p+?o}&format=text/html • No create/update/delete capabilities
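    A request URL like the one above can be assembled programmatically by percent-encoding each parameter value. The sketch below uses only the Java standard library and performs no network call; the class name SparqlUrlSketch and the method buildQueryUrl are hypothetical, the parameter names are taken from the slide, and the Virtuoso-specific should-sponge parameter is left out for brevity.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Builds a SPARQL-over-HTTP GET URL; sending the request is left to the caller. */
final class SparqlUrlSketch {

    static String buildQueryUrl(String endpoint, String defaultGraphUri,
                                String query, String format) {
        return endpoint
                + "?default-graph-uri=" + encode(defaultGraphUri)
                + "&query=" + encode(query)
                + "&format=" + encode(format);
    }

    /** Form-style encoding: spaces become '+', reserved characters become %XX. */
    private static String encode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(buildQueryUrl(
                "http://demo.openlinksw.com/sparql",
                "urn:lsid:ubio.org:namebank:11815",
                "SELECT * WHERE {?s ?p ?o}",
                "text/html"));
    }
}
```

    Unlike the hand-written URL on the slide, the encoder also escapes the braces and question marks of the query, which is generally the safer form to send.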
  • 49.
    SPARQL Query Interface (1) • SPARQL queries can be named and stored – A query named sparql-demo listens to: http://localhost:8890/DAV/sparql-demo • Can return results over HTTP (XML by default) • MIME type of the RDF data – 'rdf+xml' (default) | 'n3' | 'turtle' | 'ttl'
  • 50.
    SPARQL Query Interface (2) • SPARQL results example in RDF/XML <ROOT> <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:rs="http://www.w3.org/2005/sparql-results#" xmlns:xsd="http://www.w3.org/2001/XMLSchema#"> <rs:results rdf:nodeID="rset"> <rs:result rdf:nodeID="sol193"> <rs:binding rdf:nodeID="sol193-0" rs:name="x"> <rs:value rdf:resource="http://localhost:8890/Demo/temperature/PK/4#this"/> </rs:binding> <rs:binding rdf:nodeID="sol193-1" rs:name="y"> <rs:value rdf:resource="http://www.w3.org/1999/02/22-rdf-syntax-ns#type"/> </rs:binding> <rs:binding rdf:nodeID="sol193-2" rs:name="z"> <rs:value rdf:resource="http://localhost:8890/schemas/Demo/temperature"/> </rs:binding> </rs:result> … </rs:results> </rdf:RDF> </ROOT>
  • 51.
    Lecture Outline • Introduction to the Semantic Web • Semantic Web Languages • Publishing RDF Using OpenLink’s Virtuoso • Linked Open Data and Examples
  • 52.
    The Linked Open Data Cloud (1) • Data available on the Web – Under an open license • Available as structured data – Excel sheet instead of a scanned image • Use non-proprietary format – CSV, RDF instead of DOC, XLS • Use linked data format – URIs to identify things • Linked to other people’s data – Provision of context Also see: http://lab.linkeddata.deri.ie/2010/star-scheme-by-example/
  • 53.
    The Linked Open Data Cloud (2) • Interconnected datasets using URIs and RDF [Figure: the Linked Open Data cloud diagram in 2007 and in 2011] Source: http://linkeddata.org Also see the datahub: http://thedatahub.org/group/lodcloud
  • 54.
    The Linked Open Data Cloud (3) • Consumer capabilities – Access it, print it, store it locally, enter the data in another system – Process, aggregate, visualise, manipulate, export in another format, reuse – Avoid vendor lock-ins • Publisher capabilities – Make data discoverable – Increase the value of the data • Allow added-value services – Fine-granular control over the data Also see: http://lab.linkeddata.deri.ie/2010/star-scheme-by-example/
  • 55.
    Open Government Data • Prominent examples include – data.gov (US) – data.gov.uk (UK) – data.london.gov.uk (UK) – digitaliser.dk (DK) – data.govt.nz (NZ) – linkedopendata.it (IT) – geodata.gov.gr (GR) Promotional video: http://opengovernmentdata.org/film/
  • 56.
    The DBpedia Project • Structured information based on Wikipedia • SPARQL endpoint: dbpedia.org/sparql • Example: Greece on DBpedia
  • 57.
    New York Times Public Data • News data
  • 58.
    The Claros Project • The world of art on the Semantic Web • Uses the CIDOC-CRM vocabulary to describe – Objects – Places – Periods – People • OWL DL • RESTful
  • 59.
    The OpenCalais Project • Creates semantic metadata for submitted content • By Thomson Reuters • Free to all • Extracts RDF using NLP • Uses web services (WS)
  • 60.
    The True Knowledge Engine • Based on facts, not keywords
  • 61.
    Thank you for your attention! Questions?
  • 62.
    Appendix • Installing Virtuoso as a system service – Windows 7 • Download and extract from http://virtuoso.openlinksw.com/dataspace/dav/wiki/Main/VOSDownload • Open a command line prompt as administrator • Register required DLL – regsvr32 virtodbc.dll • Install service – virtuoso-t +service screate +instance "DB" +configfile virtuoso.ini – Ubuntu 10.04 LTS • sudo apt-get install virtuoso-server