Intertwingularity, Semantic Web and linked Geo data
Dan Brickley <danbri@few.vu.nl>
‘Semantic Web and linked Geo data’
Geonovum workshop,
Wageningen, 2010-10-12
Tuesday, 2 November 2010
Overview
• historical origins of the Semantic Web initiative
• example of SPARQL querying ‘Linked Data’
• some conclusions and suggestions
A brief introduction to Semantic Web data sharing,
focussing on underlying principles.
Part 1: RDF & history
Part 2: SemWeb today
• lessons: no global consistency; Web pages that
make claims; intertwingularity...
• what does this mean for modern RDF tools?
• how can we share and link data in the Web, in
practice?
over 24.7 billion triples
over 436 million links between datasets
USA
UK
Linked Data guidelines
• 1. Use URIs as names for things (eg. schools!)
• 2. Use HTTP URIs to allow people to get info.
• 3. Publish useful info there (eg. using RDF).
• 4. Include links to other URIs in your data.
see: http://www.w3.org/DesignIssues/LinkedData.html
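Guidelines 2 and 3 are usually put into practice with HTTP content negotiation: a client asks the thing's URI for a machine-readable representation. A minimal sketch, assuming the server offers RDF/XML for the district URI used later in this deck (the URI may no longer resolve, so the request is only built here, not sent; the helper name `rdf_request` is made up for illustration):

```python
from urllib.request import Request

# URI taken from the SPARQL example in this deck; it may no longer resolve.
DISTRICT_URI = "http://statistics.data.gov.uk/id/local-authority-district/00HA"


def rdf_request(uri: str) -> Request:
    """Build (but do not send) a GET request that asks for RDF/XML."""
    return Request(uri, headers={"Accept": "application/rdf+xml"})


req = rdf_request(DISTRICT_URI)
print(req.get_method(), req.full_url)
print("Accept:", req.get_header("Accept"))
```

Passing `req` to `urllib.request.urlopen` would perform the actual lookup; whether RDF comes back depends entirely on the publisher.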
RDF/SPARQL example
“Q: Which schools in the BANES area have a nursery?”
PREFIX sch-ont: <http://education.data.gov.uk/def/school/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT ?name WHERE {
  ?school a sch-ont:School ;
          sch-ont:establishmentName ?name ;
          sch-ont:districtAdministrative
            <http://statistics.data.gov.uk/id/local-authority-district/00HA> ;
          sch-ont:nurseryProvision "true"^^xsd:boolean .
}
ORDER BY ?name
examples by Leigh Dodds, Talis: http://blogs.talis.com/n2/archives/818
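To see what the query asks of the data, the same pattern can be matched by hand over a few triples. A rough sketch in plain Python, not a SPARQL engine: the three schools, their `example.org` URIs and the second district are invented for illustration; only the property URIs and the BANES district URI come from the query above.

```python
SCH = "http://education.data.gov.uk/def/school/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BANES = "http://statistics.data.gov.uk/id/local-authority-district/00HA"

# Hypothetical data: (subject, predicate, object) triples for made-up schools.
EX = "http://example.org/school/"
OTHER_DISTRICT = "http://example.org/district/elsewhere"
triples = {
    (EX + "1", RDF_TYPE, SCH + "School"),
    (EX + "1", SCH + "establishmentName", "Example Infant School"),
    (EX + "1", SCH + "districtAdministrative", BANES),
    (EX + "1", SCH + "nurseryProvision", True),
    (EX + "2", RDF_TYPE, SCH + "School"),
    (EX + "2", SCH + "establishmentName", "Another Example School"),
    (EX + "2", SCH + "districtAdministrative", BANES),
    (EX + "2", SCH + "nurseryProvision", False),
    (EX + "3", RDF_TYPE, SCH + "School"),
    (EX + "3", SCH + "establishmentName", "Out of Area School"),
    (EX + "3", SCH + "districtAdministrative", OTHER_DISTRICT),
    (EX + "3", SCH + "nurseryProvision", True),
}


def objects(subject, predicate):
    """All objects of triples with the given subject and predicate."""
    return {o for (s, p, o) in triples if s == subject and p == predicate}


def schools_with_nursery(district):
    """Names of schools in `district` with nursery provision, sorted by name."""
    schools = {s for (s, p, o) in triples
               if p == RDF_TYPE and o == SCH + "School"}
    names = []
    for s in schools:
        in_district = district in objects(s, SCH + "districtAdministrative")
        has_nursery = True in objects(s, SCH + "nurseryProvision")
        if in_district and has_nursery:
            names.extend(objects(s, SCH + "establishmentName"))
    return sorted(names)


print(schools_with_nursery(BANES))
```

Each condition in the loop corresponds to one line of the WHERE clause; the final `sorted` plays the role of ORDER BY.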
In RDF “nodes and arcs”:
Answer:
Fosse Way School, Fosseway Infant School, Keynsham Primary
School, King Edward's School, Midsomer Norton Primary School,
Monkton Prep School, Peasedown St John Primary School, Royal
High School, Southdown Community Infant School, St Andrew's
CofE Primary School, St Keyna Primary School, St Martin's
Garden Primary School, St Saviour's CofE Infant School, The
Paragon School, Junior School of Prior Park, College Trinity Coe
VC Primary, Twerton Infant School...
(according to the SPARQL RDF database at
http://services.data.gov.uk/education/sparql )
RDF/XML at http://statistics.data.gov.uk/id/local-authority-district/00HA ...
More SPARQL-able queries from UK linked data:
• Select the name, lowest and highest age ranges,
capacity and pupil:teacher ratio for all schools
in the Bath & North East Somerset district.
• What is the URI, name, and opening date
of the oldest school in the UK?
• Select the name, easting and northing for
the 100 newest schools in the UK.
• Select the URI, name, and the reason for closing for all
schools that are currently scheduled for closure. The reason
is a URI from a controlled vocabulary in the ontology.
• In which parliamentary constituencies did schools open in 2008?
examples by Leigh Dodds, Talis: http://blogs.talis.com/n2/archives/818
Lessons from part 1
• no global consistency: RDF and SPARQL
allow for contradictory, competing data
• semantics: RDF/XML, RDFa, GRDDL -
several ways to get RDF statements from a
document; several publishing models for
RDF in your Web site.
• intertwingularity: “the interconnectedness
of all things” as an engineering problem...
‘Scope creep’
• “intertwingularity” is a silly name for a
serious problem: scope creep
• Schema designers are under constant
pressure to change, add, improve their
designs. Problems are not tidily packaged.
• RDF is built to survive this: independent
schemas and datasets can be freely mixed
together, without always ‘asking permission’.
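Mixing "without asking permission" works because an RDF graph is just a set of triples, so merging two datasets is a set union. A minimal sketch, assuming two independently published datasets that happen to use the same URI for a school; the school URI, its name, and the opening-hours schema are all hypothetical:

```python
S = "http://example.org/id/school/1"  # hypothetical school URI

# Dataset A: a made-up triple using the schools vocabulary from the query slide.
schools = {
    (S, "http://education.data.gov.uk/def/school/establishmentName",
     "Example Infant School"),
}

# Dataset B: made-up triples from an unrelated, invented opening-hours schema.
hours = {
    (S, "http://example.org/def/hours/opens", "08:45"),
    (S, "http://example.org/def/hours/closes", "15:15"),
}

# Merging needs no schema negotiation: it is a plain set union.
merged = schools | hours


def describe(graph, subject):
    """All (predicate, object) pairs known about `subject`, sorted."""
    return sorted((p, o) for (s, p, o) in graph if s == subject)


for p, o in describe(merged, S):
    print(p, "->", o)
```

Neither schema had to know about the other; the shared identifier is the only point of contact.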
In practice
• Each school could have an HTML/RDFa
page (or RDF/XML too)
• Datasets that distinguish institution from
location might publish one set of RDF;
others that flatten these aspects together
can do likewise with their data.
• Cross-dataset consistency comes later, if at
all.
Problems don't come nicely scoped and packaged into cleanly distinct
domains. Whenever you try to solve one problem, it borders on a dozen
others that are a higher priority for people elsewhere.
You think you're working with 'events' data but find yourself with
information describing musicians; you think you're describing musicians,
but find yourself describing digital images; you think you're describing
digital images, but find yourself describing geographic locations; you
think you're building a database of geographic locations, and find
yourself modeling the opening hours of the businesses based at those
locations.
To a poet or idealist, these interconnections might be beautiful or
inspiring; to a project manager or product manager, they are as likely to
be terrifying.
By dropping in identifiers that link to a big pile of other people's
data, we can hopefully make it easier to keep projects nicely scoped
without needlessly restricting future functionality.
An events database can remain an events database, but use identifiers
for artists and performers, making it possible to filter events by
properties of those participants. A database of places can be only a
link or two away from records describing the opening hours or business
offerings of the things at those places.
“Pay as you go” integration
• there is no single “right” ontology
• data can be mixed and merged ad-hoc
• relations like owl:sameAs, skos:closeMatch
can be used to interlink datasets later
• common models emerge from bottom up,
“pave the cowpaths...” *
* analogy by Richard Cyganiak
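Interlinking "later" can be as small as one extra owl:sameAs triple. The sketch below shows one common way to act on such links, sometimes called smushing: group URIs connected by owl:sameAs and rewrite every triple to use one representative per group. It illustrates the idea only and is not a full OWL reasoner; the venue and place URIs and both schemas are hypothetical.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def make_finder(triples):
    """Return a function mapping each URI to a canonical representative,
    using a small union-find over the owl:sameAs links in `triples`."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for s, p, o in triples:
        if p == OWL_SAME_AS:
            a, b = find(s), find(o)
            if a != b:
                # Pick the lexicographically smaller URI as representative.
                parent[max(a, b)] = min(a, b)
    return find


def smush(triples):
    """Rewrite subjects and objects to canonical URIs; drop the sameAs links."""
    find = make_finder(triples)
    return {(find(s), p, find(o)) for (s, p, o) in triples if p != OWL_SAME_AS}


# Hypothetical URIs from an events dataset and a places dataset.
VENUE = "http://example.org/events/venue/42"
PLACE = "http://example.org/places/p7"
data = {
    (VENUE, "http://example.org/def/name", "Example Hall"),
    (PLACE, "http://example.org/def/opens", "09:00"),
    (VENUE, OWL_SAME_AS, PLACE),  # the link added "later"
}

for triple in sorted(smush(data)):
    print(triple)
```

skos:closeMatch is deliberately weaker than owl:sameAs, so treating it the same way would be a modelling decision rather than something the vocabulary licenses.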
Geo questions
• Can GML, KML etc. be handled in RDF?
  • yes: as links, as textual ‘islands’, or through the extensions
    that some RDF systems offer for spatial queries within SPARQL.
• Which geo-related ontology to use?
  • several exist, simple and complex. It depends.
• Is it better to use a common ontology, or capture
  our data exactly in a custom one?
  • you can do both and let others decide.
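As an illustration of the simple end of the ontology spectrum, points can be described with the W3C Basic Geo vocabulary (geo:lat, geo:long) and filtered by bounding box. The sketch below does that filtering in plain Python, roughly what a spatial extension would do inside the query engine; the place URIs and coordinates are made up.

```python
GEO = "http://www.w3.org/2003/01/geo/wgs84_pos#"  # W3C Basic Geo vocabulary
EX = "http://example.org/place/"                  # hypothetical place URIs

# Hypothetical points: latitude/longitude as plain floats.
points = {
    (EX + "a", GEO + "lat", 51.38), (EX + "a", GEO + "long", -2.36),
    (EX + "b", GEO + "lat", 51.97), (EX + "b", GEO + "long", 5.67),
}


def within_bbox(triples, south, west, north, east):
    """URIs of subjects whose geo:lat/geo:long fall inside the box."""
    lat = {s: o for (s, p, o) in triples if p == GEO + "lat"}
    lon = {s: o for (s, p, o) in triples if p == GEO + "long"}
    return sorted(
        s for s in lat.keys() & lon.keys()
        if south <= lat[s] <= north and west <= lon[s] <= east
    )


print(within_bbox(points, south=51.0, west=-3.0, north=52.0, east=-2.0))
```

Richer geometries (lines, polygons) are where the GML/KML "islands" or engine-specific spatial extensions come in.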
Suggestions
• Build a Linked Data test-bed with several
datasets whose coverage overlaps in scope
• each dataset initially mapped to its own RDF
• experiment with finding common models,
schemas/ontologies, and shared identifiers
• evaluate against use cases expressed as
SPARQL queries
Conclusions
• The Semantic Web project applies Web ideas to data
sharing.
• Linked RDF datasets have different emphasis (eg.
geo, schools, politics, events), accuracy and focus.
• Treated properly this is a strength, as it allows the
Web of data to grow organically without central
control.
• Location-related data is a natural ‘hub’, often mixed
with non-geo data. RDF and SPARQL offer Web
standards for sharing and querying such mixed data,
allowing for decentralised schemas.
Questions?
Credits: original NeXT browser, see
http://en.wikipedia.org/wiki/WorldWideWeb
Images: Tim Berners-Lee, Richard Cyganiak, Anja Jentzsch
