Consuming Linked Data 4/5 (SemTech 2011)
Consuming Linked Data
Juan F. Sequeda
Semantic Technology Conference, June 2011
Now what can we do with this data?
Linked Data Applications
A software system that makes use of data on the web from multiple datasets and that benefits from links between the datasets.
Characteristics of Linked Data Applications
Consume data that is published on the web following the Linked Data principles: an application should be able to request, retrieve and process the accessed data.
Discover further information by following the links between different data sources: the fourth principle enables this.
Combine the consumed Linked Data with data from other sources (not necessarily Linked Data).
Expose the combined data back to the web following the Linked Data principles
Offer value to end-users.

Generic Applications
Linked Data Browsers
Linked Data Browsers
Not actually separate browsers: they run inside HTML browsers.
They view the data returned after looking up a URI, in tabular form.
The user can navigate between data sources by following RDF links.
(IMO) Poor usability.
Linked Data Browsers
http://browse.semanticweb.org/
Tabulator
OpenLink Data Explorer
Zitgist
Marbles
Explorator
Disco
LinkSailor
Linked Data (Semantic Web) Search Engines
Linked Data (Semantic Web) Search Engines
Just like conventional search engines (Google, Bing, Yahoo), they crawl RDF documents and follow RDF links. Conventional search engines don't crawl this data, unless it is RDFa.
Human-focused search:
Falcons - keyword
SWSE - keyword
VisiNav - complex queries
Machine-focused search:
Sindice - data instances
Swoogle - ontologies
Watson - ontologies
Uberblic - curated, integrated data instances
(Semantic) SEO ++
Mark up your HTML with RDFa.
Use standard vocabularies (ontologies):
Google Vocabulary
GoodRelations
Dublin Core
Google and Yahoo will crawl this data and use it for better rendering.
On-the-fly Mashups
http://sig.ma
Domain Specific Applications
Domain Specific Applications
Government: Data.gov, Data.gov.uk, http://data-gov.tw.rpi.edu/wiki/Demos
Music: Seevl.net, DBpedia Mobile
Life Science: LinkedLifeData
Sports: BBC World Cup
Faceted Browsers
http://dbpedia.neofonie.de/browse/
http://dev.semsol.com/2010/semtech/
Query your data
Find all the locations of all the original paintings of Modigliani
Select all proteins that are linked to a curated interaction from the literature and to inflammatory response.
http://linkedlifedata.com/
SPARQL Endpoints
Linked Data sources usually provide a SPARQL endpoint for their dataset(s).
SPARQL endpoint: a SPARQL query processing service that supports the SPARQL protocol*.
Send your SPARQL query, receive the result.
* http://www.w3.org/TR/rdf-sparql-protocol/
Where can I find SPARQL Endpoints?
DBpedia: http://dbpedia.org/sparql
MusicBrainz: http://dbtune.org/musicbrainz/sparql
U.S. Census: http://www.rdfabout.com/sparql
http://esw.w3.org/topic/SparqlEndpoints
Accessing a SPARQL Endpoint
SPARQL endpoints are RESTful Web services. Issuing a SPARQL query to a remote SPARQL endpoint is basically an HTTP GET request to the endpoint, with the parameter query holding the URL-encoded SPARQL query:

GET /sparql?query=PREFIX+rd... HTTP/1.1
Host: dbpedia.org
User-agent: my-sparql-client/0.1
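The request above can be sketched in a few lines of Python (the deck lists Python SPARQL libraries; here only the standard library is used, and the query is illustrative):

```python
from urllib.parse import urlencode

def build_sparql_get_url(endpoint, query):
    # The query text goes URL-encoded into the 'query' parameter,
    # exactly as in the raw HTTP GET request shown above.
    return endpoint + "?" + urlencode({"query": query})

url = build_sparql_get_url(
    "http://dbpedia.org/sparql",
    "SELECT ?s WHERE { ?s a ?o } LIMIT 5",
)
# Spaces become '+', '?' becomes '%3F', '{' becomes '%7B', etc.
```

Sending this URL with any HTTP client (e.g. urllib.request) performs the query; a library such as SPARQL Wrapper hides this plumbing.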
Query Result Formats
SPARQL endpoints usually support different result formats:
XML, JSON, plain text (for ASK and SELECT queries)
RDF/XML, N-Triples, Turtle, N3 (for DESCRIBE and CONSTRUCT queries)
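For SELECT queries, the JSON format follows the W3C SPARQL Query Results JSON structure: a head listing the variables and a results.bindings array of solutions. A minimal sketch of parsing it (the data values here are invented):

```python
import json

# A tiny SELECT result in the standard SPARQL JSON format
# (application/sparql-results+json); values are made up.
raw = '''{
  "head": { "vars": ["name", "bday"] },
  "results": { "bindings": [
    { "name": {"type": "literal", "value": "Marlene Dietrich"},
      "bday": {"type": "literal", "value": "1901-12-27"} }
  ] }
}'''

data = json.loads(raw)
# Each binding maps a variable to {"type": ..., "value": ...};
# unbound variables are simply absent from the binding.
rows = [
    {var: b[var]["value"] for var in data["head"]["vars"] if var in b}
    for b in data["results"]["bindings"]
]
```

This is the shape your client sees when it sends Accept: application/sparql-results+json, as shown on a later slide.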
Query Result Formats

PREFIX dbp: <http://dbpedia.org/ontology/>
PREFIX dbpprop: <http://dbpedia.org/property/>
SELECT ?name ?bday
WHERE {
  ?p dbp:birthPlace <http://dbpedia.org/resource/Berlin> .
  ?p dbpprop:dateOfBirth ?bday .
  ?p dbpprop:name ?name .
}
Query Result Formats
Use the Accept header to request the preferred result format:

GET /sparql?query=PREFIX+rd... HTTP/1.1
Host: dbpedia.org
User-agent: my-sparql-client/0.1
Accept: application/sparql-results+json
Query Result Formats
As an alternative, some SPARQL endpoint implementations (e.g. Joseki) provide an additional parameter out:

GET /sparql?out=json&query=... HTTP/1.1
Host: dbpedia.org
User-agent: my-sparql-client/0.1
Accessing a SPARQL Endpoint
More convenient: use a library.
SPARQL JavaScript Library: http://www.thefigtrees.net/lee/blog/2006/04 sparql_calendar_demo_a_sparql.html
ARC for PHP: http://arc.semsol.org/
RAP - RDF API for PHP: http://www4.wiwiss.fu-berlin.de/bizer/rdfapi/index.html
Accessing a SPARQL Endpoint
Jena / ARQ (Java): http://jena.sourceforge.net/
Sesame (Java): http://www.openrdf.org/
SPARQL Wrapper (Python): http://sparql-wrapper.sourceforge.net/
PySPARQL (Python): http://code.google.com/p/pysparql/
Accessing a SPARQL Endpoint
Example with Jena/ARQ:

import com.hp.hpl.jena.query.*;

String service = "..."; // address of the SPARQL endpoint
String query = "SELECT ..."; // your SPARQL query
QueryExecution e = QueryExecutionFactory.sparqlService(service, query);
ResultSet results = e.execSelect();
while ( results.hasNext() ) {
    QuerySolution s = results.nextSolution();
    // ...
}
e.close();
Querying a single dataset is quite boring
compared to
issuing queries over multiple datasets.
Creating a Linked Data Application
Linked Data Architectures
Follow-up queries
Querying a local cache
Crawling
Federated query processing
On-the-fly dereferencing
Follow-up Queries
Idea: issue follow-up queries over other datasets based on results from previous queries, substituting placeholders in query templates.
// Find a list of companies filtered by some criteria and
// return DBpedia URIs for them
String s1 = "http://cb.semsol.org/sparql";
String s2 = "http://dbpedia.org/sparql";
String qTmpl = "SELECT ?c WHERE { <%s> rdfs:comment ?c }";
String q1 = "SELECT ?s WHERE { ...";
QueryExecution e1 = QueryExecutionFactory.sparqlService(s1, q1);
ResultSet results1 = e1.execSelect();
while ( results1.hasNext() ) {
    QuerySolution sol = results1.nextSolution();
    String q2 = String.format( qTmpl, sol.getResource("s").getURI() );
    QueryExecution e2 = QueryExecutionFactory.sparqlService(s2, q2);
    ResultSet results2 = e2.execSelect();
    while ( results2.hasNext() ) {
        // ...
    }
    e2.close();
}
e1.close();
Follow-up Queries
Advantage:
Queried data is up-to-date
Drawbacks:
Requires the existence of a SPARQL endpoint for each dataset
Requires program logic
Very inefficient
Querying a Local Cache
Idea: use an existing SPARQL endpoint that provides access to a set of copies of relevant datasets, built from the RDF dumps of each dataset.
SPARQL endpoints over a majority of datasets from the LOD cloud:
http://uberblic.org
http://lod.openlinksw.com/sparql
Querying a Collection of Datasets
Advantages:
No need for specific program logic
Includes the datasets that you want
Complex queries and high performance
Even reasoning
Drawbacks:
Depends on the existence of RDF dumps
Requires effort to set up and to operate the store
How to keep the copies in sync with the originals?
Queried data might be out of date
Crawling
Crawl RDF in advance by following RDF links; integrate, clean, and store the data in your own triplestore, the same way we crawl HTML today.
LDSpider
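The crawling loop itself is a plain breadth-first traversal over RDF links. A toy sketch (the WEB dictionary stands in for live HTTP dereferencing, and all URIs and triples are invented):

```python
from collections import deque

# Mock "web of data": each URI maps to (triples, outgoing RDF links).
WEB = {
    "ex:alice": ([("ex:alice", "foaf:knows", "ex:bob")], ["ex:bob"]),
    "ex:bob":   ([("ex:bob", "foaf:name", "Bob")], []),
}

def crawl(seed, max_uris=100):
    """Breadth-first crawl: dereference URIs, store the triples,
    and follow the RDF links found in each document."""
    store, seen, queue = [], {seed}, deque([seed])
    while queue and len(seen) <= max_uris:
        uri = queue.popleft()
        triples, links = WEB.get(uri, ([], []))
        store.extend(triples)
        for link in links:
            if link not in seen:       # avoid re-fetching a URI
                seen.add(link)
                queue.append(link)
    return store

triples = crawl("ex:alice")
```

A real crawler such as LDSpider adds politeness delays, content negotiation, and persistence, but the control flow is the same.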
Crawling
Advantages:
No need for specific program logic
Independent of the existence, availability, and efficiency of SPARQL endpoints
Complex queries with high performance
Can even reason about the data
Drawbacks:
Requires effort to set up and to operate the store
How to keep the copies in sync with the originals?
Queried data might be out of date
Federated Query Processing
Idea: query a mediator that distributes sub-queries to the relevant sources and integrates the results.
Federated Query Processing
Instance-based federation:
Each thing is described by only one data source
Untypical for the Web of Data
Triple-based federation:
No restrictions
Requires more distributed joins
Statistics about the datasets are required in both cases.
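Triple-based federation can be sketched in miniature: the mediator uses per-source statistics to route each triple pattern, then joins the answers locally. Sources, statistics, and data below are all invented for illustration:

```python
# Invented per-source statistics (which predicates each source holds)
STATS = {"src1": {"foaf:name"}, "src2": {"dbp:birthPlace"}}
# Invented source contents
DATA = {
    "src1": [("ex:p1", "foaf:name", "Juan")],
    "src2": [("ex:p1", "dbp:birthPlace", "ex:austin")],
}

def evaluate(pattern):
    """Route one triple pattern to the sources whose statistics
    list its predicate, and collect the matching triples."""
    s, p, o = pattern
    rows = []
    for src, preds in STATS.items():   # source selection via statistics
        if p in preds:
            rows += [t for t in DATA[src] if t[1] == p]
    return rows

# Distributed join on the shared subject variable ?x
names = evaluate(("?x", "foaf:name", "?n"))
places = evaluate(("?x", "dbp:birthPlace", "?c"))
joined = [(s1, n, c) for (s1, _, n) in names
          for (s2, _, c) in places if s1 == s2]
```

Engines such as DARQ perform this source selection and join planning automatically, against real SPARQL endpoints rather than in-memory lists.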
Federated Query Processing
DARQ (Distributed ARQ): http://darq.sourceforge.net/
Query engine for federated SPARQL queries
Extension of ARQ (the query engine for Jena)
Last update: June 2006
Semantic Web Integrator and Query Engine (SemWIQ): http://semwiq.sourceforge.net/
Last update: March 2010
Commercial…
Federated Query Processing
Advantages:
No need for specific program logic
Queried data is up to date
Drawbacks:
Requires the existence of a SPARQL endpoint for each dataset
Requires effort to set up and configure the mediator
In any case:
You have to know the relevant data sources:
When developing the app using follow-up queries
When selecting an existing SPARQL endpoint over a collection of dataset copies
When setting up your own store with a collection of dataset copies
When configuring your query federation system
You restrict yourself to the selected sources.
There is an alternative: remember, URIs link to data.
On-the-fly Dereferencing
Idea: discover further data by looking up relevant URIs in your application on the fly.
Can be combined with the previous approaches.
Linked Data browsers
Link Traversal Based Query Execution
Applies the idea of automated link traversal to the execution of SPARQL queries.
Idea: intertwine query evaluation with the traversal of RDF links, discovering data that might contribute to query results during query execution.
Alternate between:
Evaluating parts of the query
Looking up URIs in intermediate solutions
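The alternation can be made concrete with a toy two-pattern query: evaluate the first pattern, then dereference the URIs bound in its intermediate solutions to find data for the second pattern. The WEB dictionary stands in for live HTTP lookups, and all URIs and data are invented:

```python
# Mock dereferenceable web: URI -> triples in that URI's document
WEB = {
    "ex:alice":  [("ex:alice", "based_near", "ex:berlin")],
    "ex:berlin": [("ex:berlin", "label", "Berlin")],
}

def lookup(uri, graph, fetched):
    # Dereference a URI once and merge its triples into the local graph
    if uri in WEB and uri not in fetched:
        graph.extend(WEB[uri])
        fetched.add(uri)

graph, fetched = [], set()
lookup("ex:alice", graph, fetched)     # seed: look up a URI from the query
# Pattern 1: (?person, based_near, ?city) over the data fetched so far
people = [(s, o) for (s, p, o) in graph if p == "based_near"]
results = []
for person, city in people:
    lookup(city, graph, fetched)       # traverse the link on the fly
    # Pattern 2: (?city, label, ?name) over the now-enlarged graph
    results += [(person, o) for (s, p, o) in graph
                if s == city and p == "label"]
```

A real engine (e.g. SWClLib/SQUIN, below) does this for arbitrary SPARQL queries, with parallel lookups and a proper evaluation strategy.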
Link Traversal Based Query Execution
Advantages:
No need to know all data sources in advance
No need for specific programming logic
Queried data is up to date
Does not depend on the existence of SPARQL endpoints provided by the data sources
Drawbacks:
Not as fast as a centralized collection of copies
Unsuitable for some queries
Results might be incomplete (do we care?)
Implementations
Semantic Web Client Library (SWClLib) for Java: http://www4.wiwiss.fu-berlin.de/bizer/ng4j/semwebclient/
SWIC for Prolog: http://moustaki.org/swic/
Implementations
SQUIN: http://squin.org
Provides SWClLib functionality as a Web service
Accessible like a SPARQL endpoint
Install package: unzip and start, in less than 5 minutes!
Convenient access with the SQUIN PHP tools:

$s = 'http:// ...'; // address of the SQUIN service
$q = new SparqlQuerySock( $s, '... SELECT ...' );
$res = $q->getJsonResult(); // or getXmlResult()

What else?
Vocabulary mapping: foaf:name vs foo:name
Identity resolution: ex:Juan owl:sameAs foo:Juan
Provenance
Data quality
License
Getting Started
Finding URIs: use search engines
Finding SPARQL endpoints