Pubby A Linked Data Frontend for SPARQL Endpoints
http://wifo5-03.informatik.uni-mannheim.de/pubby/
Pubby
A Linked Data Frontend for SPARQL Endpoints
Richard Cyganiak
Chris Bizer
Pubby can be used to add Linked Data interfaces to SPARQL endpoints.
Much Semantic Web data lives inside triple stores and can be accessed only by sending SPARQL queries to a SPARQL endpoint. It is hard to
connect information in these stores with other external data sources.
Linked Data is a style of publishing data on the Semantic Web that makes it easy to interlink, discover and consume data on the Semantic
Web. It allows a wide variety of existing RDF browsers (e.g. Disco, Tabulator, OpenLink Browser), RDF crawlers (e.g. SWSE, Swoogle), and
query agents (e.g. SemWeb Client Library, SWIC) to access the data.
Pubby makes it easy to turn a SPARQL endpoint into a Linked Data server. It is implemented as a Java web application.
News
2011-01-26: Pubby 0.3.3 released. This version switches Pubby from using N3 syntax to the (almost identical) Turtle syntax. Configuration files now
use the .ttl extension instead of .n3. The source code was also moved to a GitHub repository.
2011-01-25: Pubby 0.3.2 released. This version fixes a bug in the metadata extension. This bug caused problems generating RDF/XML output.
2011-01-20: Alternative tool released. Epimorphics has released Elda, a Linked Data publishing tool that can be used as an alternative to Pubby.
2010-07-27: Pubby 0.3.1 released. The default metadata template in this version has been updated to release v0.5 of the Provenance Vocabulary.
2009-09-26: Pubby 0.3 released. This version adds a metadata extension that, by default, provides provenance information.
2007-10-22: Pubby 0.2 released. This version adds multi-dataset support, has improved content negotiation, adds the conf:datasetURIPattern
option, plus various small improvements and bug fixes.
2007-06-20: Pubby 0.1 released. Today marks the release of the first alpha version of Pubby.
Features
Provides a Linked Data interface to local or remote SPARQL protocol servers
Provides dereferenceable URIs by rewriting URIs found in the SPARQL-exposed dataset into the Pubby server's namespace
Provides a simple HTML interface showing the data available about each resource
Takes care of handling 303 redirects and content negotiation
Compatible with Tomcat and Jetty servlet containers
Includes a metadata extension to add metadata to the provided data
How It Works
Many triple stores and other SPARQL endpoints can be accessed only by SPARQL client applications that use the SPARQL protocol. They
cannot be accessed by the growing variety of Linked Data clients. Pubby is designed to provide a Linked Data interface to those RDF data
sources.
6/23/2016 5:39 PM
In RDF, resources are identified by URIs. The URIs used in most SPARQL datasets are not dereferenceable, meaning they cannot be accessed
in a Semantic Web browser: they either return 404 Not Found errors, or use non-dereferenceable URI schemes, as in the fictional URI
tag:dbpedia.org,2007:Berlin.
When setting up a Pubby server for a SPARQL endpoint, you will configure a mapping that translates those URIs to dereferenceable URIs
handled by Pubby. If your server is running at http://myserver.org:8080/pubby/, then the Berlin URI above might be mapped to
http://myserver.org:8080/pubby/Berlin.
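Conceptually, this mapping is prefix rewriting between the dataset URI space and the Pubby web space. The following is a minimal Python sketch, not Pubby's Java implementation. The function names are hypothetical, and the dataset prefix tag:dbpedia.org,2007: is assumed from the fictional Berlin URI above. The reverse direction is what a request needs: it recovers the original dataset URI so that the SPARQL endpoint can be asked about it.

```python
# Minimal sketch of Pubby-style URI rewriting (hypothetical helper names;
# Pubby itself is a Java web application and may differ in details).

def to_web_uri(dataset_uri, dataset_base, web_base, resource_prefix=""):
    """Map a dataset URI into the Pubby namespace, or return None if unmapped."""
    if not dataset_uri.startswith(dataset_base):
        return None  # only URIs under the dataset prefix are published
    local_name = dataset_uri[len(dataset_base):]
    return web_base + resource_prefix + local_name


def to_dataset_uri(web_uri, dataset_base, web_base, resource_prefix=""):
    """Reverse mapping: recover the original dataset URI from a web URI."""
    start = web_base + resource_prefix
    if not web_uri.startswith(start):
        return None
    return dataset_base + web_uri[len(start):]


print(to_web_uri("tag:dbpedia.org,2007:Berlin",
                 "tag:dbpedia.org,2007:", "http://myserver.org:8080/pubby/"))
# prints http://myserver.org:8080/pubby/Berlin
```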
Pubby will handle requests to the mapped URIs by connecting to the SPARQL endpoint, asking it for information about the original URI, and
passing back the results to the client. It also handles various details of the HTTP interaction, such as the 303 redirect required by Web
Architecture, and content negotiation between HTML, RDF/XML and Turtle descriptions of the same resource.
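The redirect-plus-negotiation step can be sketched as follows. This is a minimal Python illustration only. The page/ and data/ redirect targets, the text/turtle media type and the fallback to HTML are assumptions, not documented Pubby behavior. The sketch parses the Accept header, gives each offered type the q-value of its most specific matching media range, and answers with a 303 to the chosen representation.

```python
# Minimal sketch of a 303 redirect driven by content negotiation.
# Illustration only: the "page/" and "data/" targets, the media types, and
# the HTML fallback are assumptions, not Pubby's documented behavior.

OFFERED = ("text/html", "application/rdf+xml", "text/turtle")


def parse_accept(header):
    """Split an Accept header into (media_range, q) pairs."""
    ranges = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        if not fields[0]:
            continue
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        ranges.append((fields[0].lower(), q))
    return ranges


def quality(ranges, media_type):
    """q-value for media_type; the most specific matching range wins."""
    main_type = media_type.split("/")[0]
    best_specificity, q = -1, 0.0
    for rng, rq in ranges:
        if rng == media_type:
            specificity = 2
        elif rng == main_type + "/*":
            specificity = 1
        elif rng == "*/*":
            specificity = 0
        else:
            continue
        if specificity > best_specificity:
            best_specificity, q = specificity, rq
    return q


def negotiate(accept_header):
    """Pick the offered type with the highest q; ties go to the earlier entry."""
    ranges = parse_accept(accept_header or "")
    if not ranges:
        return OFFERED[0]  # no Accept header: fall back to HTML (assumption)
    best = max(OFFERED, key=lambda t: quality(ranges, t))
    return best if quality(ranges, best) > 0 else OFFERED[0]


def redirect(web_base, local_name, accept_header):
    """Return (status, Location) for a GET on a resource URI."""
    target = "page/" if negotiate(accept_header) == "text/html" else "data/"
    return 303, web_base + target + local_name
```

For example, redirect("http://myserver/mydataset/", "Berlin", "application/rdf+xml") returns (303, "http://myserver/mydataset/data/Berlin"). Because HTML is listed first, ties favor it, so a client that sends only */* is sent to the HTML page.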
Download and Installation
1. Download Pubby. Current version: v0.3.3 (alpha), released 2011-01-26.
2. If you haven't already, download and install a servlet container. Pubby has been tested with Tomcat and Jetty. I will assume your server is set up
to run at http://myserver/.
3. Unzip the Pubby distribution and copy the webapp directory into the servlet container's webapps folder. If Pubby is the only web application you want
to run in the container, then rename the webapp directory to ROOT. Otherwise, rename it to something like mydataset. The Pubby root will then be
http://myserver/mydataset/.
4. Modify the configuration file to suit your needs. It is located within Pubby's webapp directory, at /WEB-INF/config.ttl. See the next section for a list
of supported configuration directives.
Configuration
The Pubby configuration file uses Turtle syntax. It typically starts with some boilerplate prefix declarations, followed by a server configuration
section, and one or more dataset configuration sections:
<> a conf:Configuration;
    conf:option1 value1;
    conf:option2 value2;
    (...)
    conf:dataset [
        conf:option1 value1;
        conf:option2 value2;
    ];
.
There is an example configuration file.
Note that punctuation is significant, e.g. URIs are always enclosed in angle brackets, while literal values are enclosed in quotes. All directives
are optional unless otherwise noted.
Server Configuration Section
Below is a list of all supported directives for the server configuration section.
conf:projectName "Project Name";
The name of the project, for display in page titles.
conf:projectHomepage <project_homepage_url.html>;
A project homepage or similar URL, for linking in page titles.
conf:webBase <server_base_uri>;
Required. The root URL where the Pubby web application is installed, e.g. http://myserver/mydataset/.
conf:labelProperty ex:property1, ex:property2, ...;
The values of these RDF properties, if present in the dataset, will be used as labels and page titles for resources. Defaults to rdfs:label,
dc:title, foaf:name.
conf:commentProperty ex:property1, ex:property2, ...;
The values of these RDF properties, if present in the dataset, will be used as a short textual description of the item. Defaults to rdfs:comment,
dc:description.
conf:imageProperty ex:property1, ex:property2, ...;
The values of these RDF properties, if present in the dataset, will be used as an image URL to show a depiction of the item. Defaults to foaf:depiction.
conf:usePrefixesFrom <file.rdf>;
Links to an RDF document whose prefix declarations will be used in output. Defaults to the empty URL, which means the prefixes from the configuration file
will be used.
conf:defaultLanguage "en";
If labels and comments in multiple languages are present (using different language tags on RDF literals), then this language will be preferred. Defaults to "en".
conf:indexResource <dataset_uri>;
The URI of a resource whose description will be displayed as the home page of the Pubby installation. Note that you have to specify a dataset URI, not a
mapped web URI.
conf:dataset [ ... ];
Required. Introduces a dataset configuration section. There can be one or more dataset sections.
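The conf:labelProperty and conf:defaultLanguage directives together decide which literal becomes a resource's label. The following is a minimal Python sketch under two assumptions: literals arrive as hypothetical (property, text, language_tag) tuples with prefixed property names, and the preferred language outranks property order. The text above does not specify the exact precedence.

```python
# Hypothetical sketch of label selection; not Pubby's Java implementation.
DEFAULT_LABEL_PROPERTIES = ("rdfs:label", "dc:title", "foaf:name")


def language_matches(tag, wanted):
    """True if a tag such as 'en' or 'en-GB' matches the wanted language 'en'."""
    return bool(tag) and tag.lower().split("-")[0] == wanted.lower()


def pick_label(literals, label_properties=DEFAULT_LABEL_PROPERTIES,
               default_language="en"):
    """Choose a label from (property, text, language_tag) tuples."""
    candidates = [lit for lit in literals if lit[0] in label_properties]
    if not candidates:
        return None
    # Stable sort: earlier label properties first, original order otherwise.
    candidates.sort(key=lambda lit: label_properties.index(lit[0]))
    for _, text, tag in candidates:
        if language_matches(tag, default_language):
            return text
    return candidates[0][1]  # no literal in the preferred language
```

With literals in German, English and Italian, the English one wins under the default "en". Any other language is used only when no literal carries the preferred tag.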
Dataset Configuration Section
Below is a list of all supported directives for the dataset configuration section.
conf:sparqlEndpoint <sparql_endpoint_url>;
Required. The URL of the SPARQL endpoint whose data we want to expose.
conf:sparqlDefaultGraph <sparql_default_graph_name>;
If the data of interest is not located in the SPARQL dataset's default graph, but within a named graph, then its name must be specified here.
conf:datasetBase <dataset_uri_prefix>;
Required. The common URI prefix of the resource identifiers in the SPARQL dataset; only resources with this prefix will be mapped and made available by
Pubby.
conf:datasetURIPattern "regular expression";
If present, only dataset URIs matching this Java-style regular expression will be mapped and made available by Pubby. The regular expression must match
everything after the datasetBase part of the URI.
conf:datasetBase <http://example.org/>;
conf:datasetURIPattern "(users|documents)/.*";
This example configuration will publish the dataset URI http://example.org/users/alice, but not http://example.org/invoices/5395842,
because the URI part invoices/5395842 does not match the regular expression.
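The example above can be checked with a short Python sketch. The helper name is hypothetical, and Python's regex dialect is not identical to Java's, although the two agree on simple patterns like this one. Since the pattern must cover everything after datasetBase, the sketch uses re.fullmatch, which has the same whole-string semantics as Java's Matcher.matches().

```python
import re


def is_published(dataset_uri, dataset_base, uri_pattern=None):
    """Check a dataset URI against datasetBase and an optional datasetURIPattern."""
    if not dataset_uri.startswith(dataset_base):
        return False  # outside the dataset prefix: never mapped
    if uri_pattern is None:
        return True
    rest = dataset_uri[len(dataset_base):]
    # The pattern must cover everything after datasetBase, like Java's
    # Matcher.matches(); re.fullmatch gives the same whole-string semantics.
    return re.fullmatch(uri_pattern, rest) is not None


base, pattern = "http://example.org/", "(users|documents)/.*"
print(is_published("http://example.org/users/alice", base, pattern))       # True
print(is_published("http://example.org/invoices/5395842", base, pattern))  # False
```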
conf:addSameAsStatements "true"/"false";
If set to "true", an owl:sameAs statement of the form <web_uri> owl:sameAs <dataset_uri> will be present in Linked Data output.
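The direction of the link matters: the mapped web URI is the subject and the original dataset URI is the object. The following sketch writes the statement as a single N-Triples line. N-Triples is chosen only for brevity; Pubby's own output uses RDF/XML and Turtle. The helper name is hypothetical, and the URIs combine the Berlin example with the http://myserver/mydataset/ root used earlier.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def same_as_triple(web_uri, dataset_uri):
    """Return the owl:sameAs link as one N-Triples line (web URI is the subject)."""
    return f"<{web_uri}> <{OWL_SAME_AS}> <{dataset_uri}> ."


print(same_as_triple("http://myserver/mydataset/Berlin", "tag:dbpedia.org,2007:Berlin"))
# prints <http://myserver/mydataset/Berlin> <http://www.w3.org/2002/07/owl#sameAs> <tag:dbpedia.org,2007:Berlin> .
```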
conf:loadRDF <data1.rdf>, <data2.rdf>, ...;
Load one or more RDF documents from the Web or the file system and use them as the data source. The SPARQL endpoint configured above will be
ignored. Allows using Pubby as an RDF server for publishing static RDF files.
conf:rdfDocumentMetadata [ statement1; statement2; ...; ];
All statements inside a conf:rdfDocumentMetadata block will be added as document metadata to the RDF documents published for this dataset. This feature
can be used, for instance, to add licensing information to your published documents.
conf:rdfDocumentMetadata [
    dc:publisher <http://richard.cyganiak.de/foaf.rdf#cygri>;
];
conf:metadataTemplate "metadata.ttl";
Refers to a metadata template that is used by the metadata extension. This file is expected in the directory ./WEB-INF/templates/.
conf:webResourcePrefix "uri_prefix/";
If present, this string will be prefixed to the mapped web URIs. This is useful if you have to avoid potential name clashes with URIs already used by the server
itself. For example, if the dataset includes a URI http://mydataset/page, and the dataset prefix is http://mydataset/, then there would be a clash after
mapping because Pubby reserves the mapped URI http://myserver/mydataset/page for its own use. In this case, you may specify a prefix like "resource/",
which will result in a mapped URI of http://myserver/mydataset/resource/page.
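The following minimal Python sketch shows why the prefix helps. Only the reserved path segment page comes from the text above. The RESERVED set and the helper name are hypothetical, and a real Pubby installation may reserve further paths.

```python
# Hypothetical sketch; only "page" is known from the text to be reserved.
RESERVED = {"page"}


def mapped_uri(local_name, web_base, resource_prefix=""):
    """Map a dataset local name to a web URI, refusing reserved paths."""
    path = resource_prefix + local_name
    if path.split("/")[0] in RESERVED:
        raise ValueError(f"{web_base + path} clashes with a path Pubby reserves")
    return web_base + path


print(mapped_uri("page", "http://myserver/mydataset/", "resource/"))
# prints http://myserver/mydataset/resource/page
```

Without the prefix, mapped_uri("page", "http://myserver/mydataset/") raises ValueError. This is the clash described above.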
conf:fixUnescapedCharacters "abc";
(Only needed if you have problems with unusual characters in the URIs when running Pubby behind an Apache proxy.)
conf:redirectRDFRequestsToEndpoint "true"/"false";
If set to "true", Pubby will not serve RDF documents itself but will redirect requests for RDF to DESCRIBE query results on the SPARQL server. This reduces Pubby's job to
serving HTML descriptions of resources. All features that affect the RDF output will have no effect; for example, URI rewriting and the addition of owl:sameAs statements won't
work. This is useful to improve performance in cases where the SPARQL dataset has been designed with Pubby publication in mind.
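A redirect to "DESCRIBE query results on the SPARQL server" can be pictured as a SPARQL protocol GET URL that carries the query for the original dataset URI. The following minimal Python sketch uses the standard SPARQL protocol parameters query and default-graph-uri. The helper name is hypothetical, and sending conf:sparqlDefaultGraph as default-graph-uri is an assumption about how the two options combine.

```python
from urllib.parse import urlencode


def describe_url(endpoint, dataset_uri, default_graph=None):
    """Build a SPARQL protocol GET URL for DESCRIBE <dataset_uri>."""
    params = [("query", f"DESCRIBE <{dataset_uri}>")]
    if default_graph:
        # Assumption: a configured conf:sparqlDefaultGraph is sent as default-graph-uri.
        params.append(("default-graph-uri", default_graph))
    separator = "&" if "?" in endpoint else "?"
    return endpoint + separator + urlencode(params)


print(describe_url("http://example.org/sparql", "http://example.org/users/alice"))
# prints http://example.org/sparql?query=DESCRIBE+%3Chttp%3A%2F%2Fexample.org%2Fusers%2Falice%3E
```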
Limitations
Only works for SPARQL endpoints that can answer DESCRIBE queries
Multiple dataset support may not work as expected: if a requested URI is matched by the conf:datasetURIPattern of more than one dataset (or if one of the datasets
doesn't have a conf:datasetURIPattern), then only one of the possible endpoints will be queried. Pubby will never try to query multiple
endpoints in order to create a single response. In most cases, it is recommended to simply set up a separate Pubby instance for each dataset.
Hash URIs on the web side are not supported.
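The multiple-dataset limitation can be illustrated with a small Python sketch. It takes the first matching dataset in configuration order, but the text above does not say which dataset Pubby actually picks, so that ordering is an assumption. The Dataset class is a hypothetical stand-in for a conf:dataset section. local_name is the part of the requested web URI after webBase (conf:webResourcePrefix is ignored for simplicity).

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Dataset:
    """Hypothetical stand-in for one conf:dataset section."""
    endpoint: str
    dataset_base: str
    uri_pattern: Optional[str] = None  # conf:datasetURIPattern, if any


def select_dataset(datasets, local_name):
    """Return (dataset, dataset_uri) for the first dataset that accepts local_name.

    Any later datasets that would also match are never consulted, so their
    endpoints are not queried for this request.
    """
    for ds in datasets:
        if ds.uri_pattern is None or re.fullmatch(ds.uri_pattern, local_name):
            return ds, ds.dataset_base + local_name
    return None, None
```

A dataset without a pattern matches every local name. If it is listed first, it therefore shadows every dataset after it, which is why separate Pubby instances are the safer setup.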
Support and feedback
Please email richard@cyganiak.de.
Source code and development
Pubby is open source (Apache License, Version 2.0). Pubby is hosted on GitHub. The official version of the source code is available from the
cygri/pubby repository.
Acknowledgements
This project has received contributions from Olaf Hartig and Boris Villazón-Terrazas.