Linked Open Government Data in UK
Linked Data

John Sheridan
@johnlsheridan
18 January 2012
“We shape our tools and
 they in turn shape us”
    Marshall McLuhan




The Wealth of Networks
 “Different technologies make different kinds of human action and
 interaction easier or harder to perform. All other things being equal,
 things that are easier to do are more likely to be done and things that
 are harder to do are less likely to be done.
 All other things are *never* equal.
 That is why technological determinism in the strict sense–if you have
 technology “t” you should expect social structure or relation “s” to
 emerge–is false…Neither deterministic nor wholly malleable, technology
 sets some parameters of individual and social action. It can make some
 actions, relationships, organizations and institutions easier to pursue,
 and others harder…
 The same technologies of networked computers can be adopted in very
 different patterns. There is no guarantee that networked information
 technology will lead to the improvements in innovation, freedom and
 justice that I suggest are possible…The way we develop will, in
 significant measure, depend on choices we make in the next decade or
 so.”
 – Yochai Benkler, The Wealth of Networks
Information economics and data
•   Better informed markets operate more efficiently
•   Governments are making more data available on the web
•   We are at the beginning of an age of data abundance
•   Large scale data aggregation is now possible




Interoperability with the world?
• [DN: insert picture of globe]




UK POLICY CONTEXT


Transparency and data.gov.uk




Commitments




Which says…
16. GOVERNMENT TRANSPARENCY

    The Government believes that we need to throw open the doors of
    public bodies, to enable the public to hold politicians and public
    bodies to account. We also recognise that this will help to deliver
    better value for money in public spending, and help us achieve our
    aim of cutting the record deficit. Setting government data free will
    bring significant economic benefits by enabling businesses and non-
    profit organisations to build innovative applications and websites.

    We will ensure that all data published by public bodies is published in
    an open and standardised format, so that it can be used easily and
    with minimal cost by third parties.



Open Data Policy in the UK
• Open by default
• Open Government Licence
• Seeking to address substantial policy issues through the
  use of open data
• Health and Transport data are at the forefront of this drive
• Consultation in Autumn 2011, White Paper early this year




CHOICES


Choosing formats for data




• Formats for people
     o Focused on presentation or typographic layout
     o Look good, but hard to access the underlying data
• Formats for machines
     o Focused on data interchange between computers
     o Look dreadful, hard for people to understand but easy to
       import into other systems and use



A false dichotomy




• Formats for people
     o Focused on presentation or typographic layout
• Single source of data
• Formats for machines
     o Focused on data interchange between computers




Download or programmatic access?
• Download
     o Good for static information
     o Small files
     o Used for export/import
     o Easy for publishers
     o Most of the data registered on data.gov.uk
• Programmatic access
     o Good for dynamic or real-time information or very large datasets
     o Lets developers select and use just the information they need
     o Retains more control for the publisher
     o More complicated to implement but much more powerful
     o Vital for many useful datasets



STANDARDS


Henry Maudslay (1771–1831)

He also developed the first industrially
practical screw-cutting lathe in 1800,
allowing standardisation of screw thread
sizes for the first time. This allowed the
concept of interchangeability (an idea that
was already taking hold) to be practically
applied to nuts and bolts. Before this, all
nuts and bolts had to be made as matching
pairs only. This meant that when machines
were disassembled, careful account had to
be kept of the matching nuts and bolts
ready for when reassembly took place.
http://en.wikipedia.org/wiki/Henry_Maudslay
Joseph Whitworth (1803–1887)

In 1841, Joseph Whitworth created a
design that, through its adoption by many
British railroad companies, became a
national standard for the United Kingdom
called British Standard Whitworth. During
the 1840s through 1860s, this standard
was often used in the United States and
Canada as well, in addition to myriad
intra- and inter-company standards.
http://en.wikipedia.org/wiki/Screw_thread#History_of_standardization
Tim Berners-Lee five stars
*     make your stuff available on the Web (whatever format)
      under an open licence
**    make it available as structured data (e.g.,
      Excel instead of image scan of a table)
***   use non-proprietary formats (e.g., CSV
      instead of Excel)
****  use URIs to identify things, so that people
      can point at your stuff
***** link your data to other data to provide
      context

LINKED DATA


Linked Data
•    Give names, or web identifiers (URIs), to things
•    Publish information about them as Web Resources
•    Use RDF triples (subject, property, value)
•    Link to other data about those things
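
As a minimal sketch of the triple model, the Python below holds a few statements as (subject, property, value) tuples and looks up what is said about one thing. The station URI is one of the reference data examples in this deck; the example.org vocabulary, the operator URI and the literal values are hypothetical and only there for illustration.

```python
# Each statement is one (subject, property, value) triple.
STATION = "http://transport.data.gov.uk/id/station/WAT"
EX = "http://example.org/def/"                 # hypothetical vocabulary
OPERATOR = "http://example.org/id/operator/1"  # hypothetical linked thing

triples = {
    (STATION, EX + "name", "London Waterloo"),   # illustrative literal
    (STATION, EX + "operatedBy", OPERATOR),      # a link to another thing
    (OPERATOR, EX + "name", "Example Operator"),
}


def describe(subject, graph):
    """Return the sorted (property, value) pairs stated about `subject`."""
    return sorted((p, v) for s, p, v in graph if s == subject)


def values(subject, prop, graph):
    """Return every value of `prop` for `subject`."""
    return [v for s, p, v in graph if s == subject and p == prop]


# Follow the link from the station to whatever it points at.
linked = values(STATION, EX + "operatedBy", triples)[0]
print(describe(linked, triples))
```

Because the value of one triple can be the subject of another, following a link is just a second lookup.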




Benefits
• Enables web-scale data publishing - distributed
  publication with web-based discovery mechanisms
• Everything is a resource – follow your nose to
  discover more about properties, classes, or codes
  within a code list
• Everything can be annotated - make comments
  about observations, data series, points on a map
• Easy to extend - create new properties as required,
  no need to plan everything up-front
• Easy to merge - slot together RDF graphs, no need
  to worry about name clashes
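
The "easy to merge" point can be sketched with plain Python sets, assuming each graph is a set of triples whose names are full URIs. The department URI is one of the reference data examples in this deck; the two publisher vocabularies and all values are made up for illustration.

```python
DEPT = "http://reference.data.gov.uk/id/department/CO"

# Two hypothetical publishers. Each property name is a full URI, so both can
# have a local term called "name" without the two colliding.
graph_a = {
    (DEPT, "http://example.org/a/name", "Cabinet Office"),
}
graph_b = {
    (DEPT, "http://example.org/b/name", "CO"),
    (DEPT, "http://example.org/b/staffCount", 100),  # made-up figure
}

# Merging two graphs of triples is a set union; no schema mapping step.
merged = graph_a | graph_b

properties = sorted(p for s, p, _ in merged if s == DEPT)
print(properties)
```

Both "name" properties survive the merge because they are different URIs, which is the sense in which name clashes stop being a concern.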


You can do more with Linked Data
UK Government has been:
• developing standards for responsible publishing of key
  types of data (financial data, organisation data, aggregate
  statistics, location data)
• developing guidance, practices and tools that make it
  easy to publish data in Linked Data form, at low cost
• making it easy for people to consume data in a
  programmatic way
Types of data:
[Three illustrations, overlaid on the slide]

Organogram: Director General; Director (Operations), Director (Strategy);
Deputy Director (A), Deputy Director (A)

Table by year:
          2008      2009      2010
     A    1,345     1,456     2,301
     B    2,112     3,543     2,111
     C    2,345     2,987     2,455
     D    6,342     6,256     6,123
     E    7,435     7,432     8,102

Transactions:
     Transaction   Date         Supplier            Amount
     A-1263        09/09/2010   Spottiswoode & Co   £ 2,345
     A-1264        09/09/2010   JSB & Sons          £ 2,111
     A-1265        09/09/2010   BLG Ltd             £ 2,455
     A-1266        09/09/2010   Spottiswoode & Co   £ 6,123
     A-1267        09/09/2010   BLG Ltd             £ 8,102
Naming things with URIs
•   URI = uniform resource identifier
•   Everything starts with HTTP – which gives us actionable names
•   There is choice about how to make URIs
•   We are using
    {sector}.data.gov.uk/id/{something}
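
The pattern above can be sketched as a small helper. The function name and its argument handling are assumptions for illustration; only the {sector}.data.gov.uk/id/{something} shape comes from the slide, and no escaping of path parts is attempted.

```python
def id_uri(sector, *parts):
    """Build http://{sector}.data.gov.uk/id/{something} from its pieces."""
    if not sector or not parts:
        raise ValueError("sector and at least one path part are required")
    something = "/".join(str(part) for part in parts)
    return f"http://{sector}.data.gov.uk/id/{something}"


print(id_uri("reference", "department", "CO"))
print(id_uri("transport", "station", "WAT"))
```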




Location URIs for INSPIRE
Naming things in legislation
• If you visit legislation.gov.uk you will see we have taken
  great care with naming things



Returns an HTML document for United Kingdom Public General Act (ukpga),
2005, Chapter 14, Section 1




Returns an HTML document with a list from all legislation types where the
title contains “wildlife”
Some names are quite sophisticated…


•    UK Public General Act (ukpga)
•    1981
•    Chapter 69
•    Section 5
•    As it extends to England
•    As it stood on 30th January 2001
•    Displayed as an HTML document with the timeline on
•    Although URIs are opaque, this type of design
     changes how people use the service
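
A rough sketch of how such a name might be assembled from its components. The type/year/chapter/section ordering follows the legislation identifier shown among the reference data examples in this deck; where the extent and the point-in-time date sit in the path is an assumption made for illustration, and the timeline option is left out because the slide does not show how it is expressed. The class and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LegislationRef:
    doc_type: str                  # e.g. "ukpga"
    year: int
    chapter: int
    section: Optional[int] = None
    extent: Optional[str] = None   # e.g. "england" (placement assumed)
    as_at: Optional[str] = None    # ISO date, e.g. "2001-01-30" (placement assumed)

    def path(self):
        """Join the components that are present, in a fixed order."""
        parts = [self.doc_type, str(self.year), str(self.chapter)]
        if self.section is not None:
            parts += ["section", str(self.section)]
        if self.extent:
            parts.append(self.extent)
        if self.as_at:
            parts.append(self.as_at)
        return "/" + "/".join(parts)


ref = LegislationRef("ukpga", 1981, 69, section=5,
                     extent="england", as_at="2001-01-30")
print(ref.path())
```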


Legislation as Open Data
• Everything on legislation.gov.uk is available as open data
  under the terms of our Open Government Licence
• To access the data, visit any page and add:
     o /data.xml
     o /data.rdf
     o /data.xht
• For lists
     o /data.feed
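
The suffix rule above can be sketched as follows. The helper name is hypothetical, and the page address used in the example is an assumed one for the Act and section described earlier, not a URL given in the slides.

```python
PAGE_FORMATS = ("xml", "rdf", "xht")  # suffixes listed for individual pages
LIST_FORMATS = ("feed",)              # suffix listed for lists


def data_url(page_url, fmt):
    """Append /data.{fmt} to a page address."""
    if fmt not in PAGE_FORMATS + LIST_FORMATS:
        raise ValueError(f"unknown format: {fmt}")
    return page_url.rstrip("/") + "/data." + fmt


# Assumed page address, for illustration only.
page = "http://www.legislation.gov.uk/ukpga/2005/14/section/1"
for fmt in PAGE_FORMATS:
    print(data_url(page, fmt))
```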




Linked Data Standards
• Re-use where we can, create where we must
• Small, high-level, lightweight vocabularies
   o Examples include datacube, organization, provenance
• Create local specialisations
   o Examples include payments, central-government
• Post hoc linking




Data cube vocabulary
[Diagram of the Data Cube (qb:) vocabulary. Terms shown:]
• Classes: qb:DataSet, qb:DataStructureDefinition,
  qb:ComponentSpecification, qb:ComponentProperty, qb:DimensionProperty,
  qb:AttributeProperty, qb:MeasureProperty, qb:CodedProperty, qb:SliceKey,
  qb:Slice, qb:Observation
• Properties: qb:structure, qb:dataset, qb:slice, qb:subSlice, qb:sliceKey,
  qb:sliceStructure, qb:observation, qb:componentProperty, qb:dimension,
  qb:attribute, qb:measure, qb:measureType, qb:concept, qb:codeList
• Component attributes: qb:componentRequired : boolean,
  qb:componentAttachment : rdfs:Class, qb:order : xsd:int
• Related terms: skos:Concept, skos:ConceptScheme, sdmx:Concept,
  sdmx:CodeList, sdmx:ConceptRole (sdmx:FrequencyRole, sdmx:CountRole,
  sdmx:EntityRole, sdmx:TimeRole, ...)
Payments (a cube specialisation)
[Diagram of the payments vocabulary. Terms shown:]
• Classes: PaymentDataset, Payment, ExpenditureLine, Purchase, Item,
  ItemCategory
• Data Cube links: qb:structure, qb:dataset, qb:slice
• Properties: payment, payer, payee, unit, date, purchase, order, invoice,
  contract, transactionReference, paymentReference,
  totalAmountIncludingVAT, totalAmountExcludingVAT, expenditureLine,
  amountIncludingVAT, amountExcludingVAT, vatCategory, vatRate,
  expenditureCode, item, narrative, procurementCategory, redacted
• Other terms: foaf:Agent, org:OrganizationalUnit, interval:Interval,
  skos:Concept, revenue, capital



DATA


Reference data
 http://reference.data.gov.uk/id/day/2012-01-18

 http://reference.data.gov.uk/id/department/CO

 http://transport.data.gov.uk/id/station/WAT

 http://education.data.gov.uk/id/school/341451

 http://location.data.gov.uk/id/3245677362123

 http://www.legislation.gov.uk/id/ukpga/2009/12/section/2
British time intervals
 • http://reference.data.gov.uk/id/day/2011-06-1
 • There are similar URIs for seconds, minutes, hours,
   weeks, months, quarters, years
 • We were a bit slow (170 years) to move from the Julian
   to Gregorian Calendar (see the Calendar Act, 1750)
 • To transition, we lost 11 days in 1752
 • This leads to the convoluted explanation of why the tax year in the UK
   starts on the 6th April
 • Our URIs for time intervals work this way too and the
   British time intervals URI Set is linked to the
   legislation
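
A small sketch of the day URIs and of the 1752 gap. The helper name is hypothetical. The calculation assumes the changeover ran from 2 September 1752 straight to 14 September 1752; Python's `date` type is proleptic Gregorian, so the subtraction simply counts the date labels that were never used.

```python
from datetime import date


def day_uri(d):
    """URI for a calendar day, following the /id/day/YYYY-MM-DD pattern."""
    return "http://reference.data.gov.uk/id/day/" + d.isoformat()


# Last date label before the change and first one after it.
last_old_style = date(1752, 9, 2)
first_new_style = date(1752, 9, 14)

# Labels 3 to 13 September never occurred, hence the "lost" days.
skipped_days = (first_new_style - last_old_style).days - 1

print(day_uri(date(2012, 1, 18)))
print(skipped_days)
```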
PRODUCTION


Chop-O-Matic
• Malcolm Gladwell article on Ron Popeil from 2000 in the
  New Yorker:

• “And how do you persuade people to disrupt their lives? Not
  merely by ingratiation or sincerity, and not by being famous
  or beautiful. You have to explain the invention to consumers
  - not once or twice but three or four times, with a different
  twist each time. You have to show them exactly how it
  works and why it works, and make them follow your hands
  as you chop liver with it, and then tell them precisely how it
  fits into their routine, and, finally, sell them on the
  paradoxical fact that, revolutionary as the gadget is, it's not
  at all hard to use.”
Google Refine (formerly Gridworks)




Use Refine to map and export Linked Data




PUBLISHING


Linked Data API
•   Open Standard
•   Generic approach for creating APIs from Linked Data
•   Sits on top of a Linked Data store
•   Several implementations, most mature is Puelia




CASE STUDIES


Back to those commitments




Publishing Organisation Data
• We will require public bodies to publish online the job titles
  of every member of staff and the salaries and expenses of
  senior officials paid more than the lowest salary permissible
  in Pay Band 1 of the Senior Civil Service pay scale, and
  organograms that include all positions in those bodies.
Our first go…
• October 2010
• CSV template and PDFs of organograms, typically authored
  using PowerPoint
• Emphasis on visual appearance led to inconsistent
  datasets which are very hard to re-use
• No relationship between the organogram and data
• Not using web standards




Press Release

    “The Government has published
    the most comprehensive
    organisational charts of the UK
    Civil Service ever released online,
    taking another step towards its
    goal of being the most transparent
    government in the world and
    opening up the structure of the
    Civil Service to public scrutiny”
It’s *all* Linked Data
• 100s of UK Government Organisations published their
  organisation data as Linked Data
• Distributed data publishing
• The data is deeply linked (Departments, Grades,
  Professions, date of the snapshot)
• Cross dataset queries are perhaps the most interesting
• Proves Linked Data is moving from research topic to
  commodity publishing
• We can now extend this approach to other types of dataset
  and link our transparency data




Our aims with Organogram Data
• Make it as simple as possible for people in Departments to
  create Linked Data
• Create high quality, consistent data that matches the policy
  intent and guidance
• Distributed capture and publishing
• Create open data in open standards using open source
  tools
• Human readable and machine readable from single source
• Provide download and API access in different formats
  (CSV, XML, JSON, RDF, HTML)
• Evolutionary route to create longitudinal datasets,
  reconciling against previous data
• Enable everyone to publish 5 Star Linked Data

The process
• Capture organisation data using a spreadsheet, which
  verifies policy rules and datatypes
• Upload spreadsheet
• Preview organogram
• Download RDF and two CSVs
• Publish on your website and register with data.gov.uk




The Excel bit…
• It’s the tool most Civil Servants have
• This *does* also work in LibreOffice / OpenOffice etc.




Linked Data Publishing Infrastructure
[Architecture diagram. Numbered steps shown:]
1. Upload Excel
2. Create CSVs
3. Create Mapping
4. Query (SPARQL)
5. Create RDF
6. Load RDF
7. Query (SPARQL)
[Components shown: Excel file; Organogram (PHP); Senior CSV and Junior CSV;
Mapping (TriG); XLWrap; TDB; Reconciliation; RDF file; Sesame RDF Store;
Linked Data API with API Config; output as HTML, XML and JSON; Organogram in
HTML, CSS & JavaScript]

Linked Data adds value
• Implicit properties are made explicit (person, role, person in
  a role)
• Reconciliation adds value by automatic linking to other data
• Provenance
• Example data
• Explicit open licence
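
To illustrate the first point, here is a minimal sketch of turning one spreadsheet row into explicit statements about a person, a role, and a person in a role. It is not the actual organogram tooling: the column names, the example.org URIs, the property names and the sample row are all hypothetical.

```python
import csv
import io

BASE = "http://example.org/id/"   # hypothetical URI base
DEF = "http://example.org/def/"   # hypothetical vocabulary

# Hypothetical two-column extract of an organogram spreadsheet.
SHEET = "name,job_title\nA. Example,Director (Strategy)\n"


def slug(text):
    """Lower-case, hyphen-separated token for use in a URI."""
    cleaned = "".join(c if c.isalnum() else " " for c in text.lower())
    return "-".join(cleaned.split())


def row_to_triples(row):
    """One row implies three things: a person, a role, and a post linking them."""
    person = BASE + "person/" + slug(row["name"])
    role = BASE + "role/" + slug(row["job_title"])
    post = BASE + "post/" + slug(row["name"] + " " + row["job_title"])
    return [
        (person, DEF + "name", row["name"]),
        (role, DEF + "title", row["job_title"]),
        (post, DEF + "heldBy", person),   # person in a role
        (post, DEF + "role", role),
    ]


triples = [t for row in csv.DictReader(io.StringIO(SHEET))
           for t in row_to_triples(row)]
for triple in triples:
    print(triple)
```

In the spreadsheet the person and the job title share a row and nothing more; in the triples the post is a thing in its own right that other data can link to.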
On the web, everything is a claim
• How did you come by this information?
• What did you do with it?
• When, who and how?




An opportunity
• We are developing a new system for publishing legislation,
  operating inside the government secure intranet / extranet
• We want to provide evidence that supports the data we are
  publishing




Legislation workflows
• Complicated and vary by jurisdiction and content type
• We take documents in different formats (Word,
  FrameMaker) and convert them to a single format (XML)
• We store XML documents in an XML Database
• We take documents from a single format (XML) and
  transform them to different formats (HTML and PDF)
• Complex processes for handling images etc
• Sometimes mistakes are made, which can be corrected
  through a “Correction Slip”




Objectives for provenance with legislation
• Transparency and public trust - we substantiate our claim
  that this web page is what the legislation says
• The audit trail is repeatable
• Automatic checks are performed along the way, with evidence
  of that checking
• Use digital signatures rather than rely on the immutability of
  paper, to ensure authenticity
• Create a data source we can use to resolve any disputes
  (where did that footnote go?)
• Create a data source we can use to measure contractual
  performance (how long did it take to publish that
  document?)

Our technology choices
• We use both XML and RDF
• XML is brilliant for single source publishing solutions – one
  source, many outputs
• RDF provides a flexible data model for other types of
  information (bibliographic metadata, but also things like
  which item of legislation has changed what)
• We are recording provenance in RDF using the Open
  Provenance Model Vocabulary




Open Provenance Model Vocabulary

[Diagram: an opmv:Process opmv:used the earlier artifacts
(opmv:Artifact, k-1); the process opmv:wasControlledBy /
opmv:wasPerformedBy an opmv:Agent; the resulting document
(opmv:Artifact, k) opmv:wasGeneratedBy the process.]
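
A minimal sketch of how such a chain can be walked back from a published document to its source, assuming provenance is held as (subject, property, object) triples. The artifact and process names are hypothetical; only the property names come from the vocabulary above.

```python
USED = "opmv:used"
GENERATED_BY = "opmv:wasGeneratedBy"

# Hypothetical two-step workflow: Word document -> XML -> HTML.
triples = [
    ("doc.xml", GENERATED_BY, "word-to-xml"),
    ("word-to-xml", USED, "doc.doc"),
    ("doc.html", GENERATED_BY, "xml-to-html"),
    ("xml-to-html", USED, "doc.xml"),
]


def objects(subject, prop):
    """All objects of `prop` for `subject`."""
    return [o for s, p, o in triples if s == subject and p == prop]


def trace(artifact):
    """Walk back from an artifact through the processes that produced it."""
    chain = [artifact]
    for process in objects(artifact, GENERATED_BY):
        chain.append(process)
        for source in objects(process, USED):
            chain.extend(trace(source))
    return chain


print(trace("doc.html"))
```

The walk stops at an artifact that no recorded process generated, which in this sketch is the original Word file.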




Provenance chain audit trail
<urn:uuid:6F677120-152C-11E1-8715-95963F5713B6> {
  <http://w8www077254:9999/vsrs_api/bundle/2011-11-09/2/uksi/task/word-export-wml/1>
       a     ns0:Process ;
       rdfs: "Word Export to WML1 Process" ;
       ns0:wasControlledBy
            <http://www.legislation.gov.uk/id/software/MsWord/2003> ,
            <http://www.legislation.gov.uk/id/software/WordToClml/1.0> ;
       ns1:hasParentProcess
            <http://w8www077254:9999/vsrs_api/bundle/2011-11-09/2/uksi/task/word-to-xml> ;
       ns2:source <http://w8www077254:9999/vsrs_api/bundle/2011-11-09/2/uksi/data.doc> .
}
<urn:uuid:6FA2F380-152C-11E1-8715-C9B1D4C6E3FB> {
  <urn:uuid:6F677120-152C-11E1-8715-95963F5713B6>
       swp:assertedBy <urn:uuid:6FA2F380-152C-11E1-8715-C9B1D4C6E3FB> ;
       swp:digest
"N2U1ZGZhMzI3M2IzNmFjNDNlMmZkZTkyZTkwY2RlYWY4NmU5MDJiYw=="^^<http://www.w3.org/2001/XMLSchema#base64Binary> ;
       swp:digestMethod swp:JjcRdfC14N-sha1 .

  <urn:uuid:6FA2F380-152C-11E1-8715-C9B1D4C6E3FB>
       swp:assertedBy <urn:uuid:6FA2F380-152C-11E1-8715-C9B1D4C6E3FB> ;
       swp:authority <http://www.tsoshop.co.uk> ;
       swp:signature "kWcf…6g=="^^<http://www.w3.org/2001/XMLSchema#base64Binary> ;
       swp:signatureMethod swp:JjcRdfC14N-rsa-sha1 .
  <http://www.tsoshop.co.uk>
       swp:X509Certificate "MIIG …. " .
}

[Diagram labels alongside: Container1, Signature(c1), Container2,
Signature(c2), Container3, Signature(c3)]



Publishing provenance
• Provenance information may be associated by including a
  <link> element in the HTML <head> section:

<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
     <link rel="provenance" href="provenance-URI">
     <link rel="anchor" href="entity-URI">
     <title>Welcome to example.com</title>
  </head>
  <body> ... </body>
</html>
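
On the consuming side, a client could discover the provenance link with nothing more than the standard library. The sketch below parses the same markup as above; the class name is hypothetical, and it assumes at most one link per rel value.

```python
from html.parser import HTMLParser


class LinkFinder(HTMLParser):
    """Collect href values of <link> elements, keyed by their rel value."""

    def __init__(self):
        super().__init__()
        self.links = {}

    def handle_starttag(self, tag, attrs):
        # Self-closing <link ... /> tags also arrive here by default.
        if tag == "link":
            attr = dict(attrs)
            if "rel" in attr and "href" in attr:
                self.links[attr["rel"]] = attr["href"]


PAGE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
     <link rel="provenance" href="provenance-URI">
     <link rel="anchor" href="entity-URI">
     <title>Welcome to example.com</title>
  </head>
  <body> ... </body>
</html>"""

finder = LinkFinder()
finder.feed(PAGE)
print(finder.links.get("provenance"))
```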




Summary
• Linked Data is essential to realising the promise of Open
  Government Data
• Using Linked Data means working on
   o Standards
   o Reference Data
   o Production
   o Publishing
• Benefits grow with the more data you want to combine
• Lots of opportunities for international collaboration
• Best advice, just start
Questions?





Linked Open Government Data in UK

  • 1.
  • 2.
    “We shape ourtools and they in turn shape us” Marshall McLuhan 2
  • 3.
    The Wealth ofNetworks “Different technologies make different kinds of human action and interaction easier or harder to perform. All other things being equal, things that are easier to do are more likely to be done and things that are harder to do are less likely to be done. All other things are *never* equal. That is why technological determinism in the strict sense–if you have technology “t” you should expect social structure or relation “s” to emerge–is false…Neither deterministic nor wholly malleable, technology sets some parameters of individual and social action. It can make some actions, relationships, organizations and institutions easier to pursue, and others harder… The same technologies of networked computers can be adopted in very different patterns. There is no guarantee that networked information technology will lead to the improvements in innovation, freedom and justice that I suggest are possible…The way we develop will, in significant measure, depend on choices we make in the next decade or so.” – Yochai Benkler, The Wealth of Networks
  • 4.
    Information economics anddata • Better informed markets operate more efficiently • Governments are making more data available on the web • We are at the beginning of an age of data abundance • Large scale data aggregation is now possible 4
  • 5.
    Interoperability with theworld? • [DN: insert picture of globe] 5
  • 6.
  • 7.
  • 8.
  • 9.
    Which says… 16. GOVERNMENTTRANSPARENCY The Government believes that we need to throw open the doors of public bodies, to enable the public to hold politicians and public bodies to account. We also recognise that this will help to deliver better value for money in public spending, and help us achieve our aim of cutting the record deficit. Setting government data free will bring significant economic benefits by enabling businesses and non- profit organisations to build innovative applications and websites. We will ensure that all data published by public bodies is published in an open and standardised format, so that it can be used easily and with minimal cost by third parties. 9
  • 10.
    Open Data Policyin the UK • Open by default • Open Government Licence • Seeking to address substantial policy issues through the use of open data • Health and Transport data are at the forefront of this drive • Consultation in Autumn 2011, White Paper early this year 10
  • 11.
  • 12.
    Choosing formats fordata Formats for people Formats for machines  Focused on presentation or  Focused on data interchange typographic layout between computers  Look good, but hard to  Look dreadful, hard for people access the underlying data to understand but easy to import into other systems and use 12
  • 13.
    A false dichotomy Formats for Single Formats for people source of machines  Focused on  Focused on data presentation or data interchange typographic layout between computers 13
  • 14.
    Download or programmaticaccess? • Download o Good for static information o Small files o Used for export/import o Easy for publishers o Most of the data registered on data.gov.uk • Programmatic access o Good for dynamic or real-time information or very large datasets o Lets developers select and use just the information they need o Retains more control for the publisher o More complicated to implement but much more powerful o Vital for many useful datasets 14
  • 15.
  • 16.
    Henry Maudslay (1771–1831) Healso developed the first industrially practical screw-cutting lathe in 1800, allowing standardisation of screw thread sizes for the first time. This allowed the concept of interchangeability (a idea that was already taking hold) to be practically applied to nuts and bolts. Before this, all nuts and bolts had to be made as matching pairs only. This meant that when machines were disassembled, careful account had to be kept of the matching nuts and bolts ready for when reassembly took place. http://en.wikipedia.org/wiki/Henry_Maudslay
  • 17.
    Joseph Whitworth (1804-1887) In1841, Joseph Whitworth created a design that, through its adoption by many British railroad companies, became a national standard for the United Kingdom called British Standard Whitworth. During the 1840s through 1860s, this standard was often used in the United States and Canada as well, in addition to myriad intra- and inter-company standards. . http://en.wikipedia.org/wiki/Screw_thread #History_of_standardization
  • 18.
    Tim Berners-Lee fivestars * make your stuff available on the Web (whatever format) under an open licence ** make it available as structured data (e.g., Excel instead of image scan of a table) *** use non-proprietary formats (e.g., CSV instead of Excel) **** use URIs to identify things, so that people can point at your stuff ***** link your data to other data to provide context 18
  • 19.
  • 20.
    Linked Data • Give names, or web identifiers (URIs), to things • Publish information about them as Web Resources • Use RDF triples (subject, property, value) • Link to other data about those things 20
  • 21.
    Benefits • Enables web-scaledata publishing - distributed publication with web-based discovery mechanisms • Everything is a resource – follow your nose to discover more about properties, classes, or codes within a code list • Everything can be annotated - make comments about observations, data series, points on a map • Easy to extend - create new properties as required, no need to plan everything up-front • Easy to merge - slot together RDF graphs, no need to worry about name clashes 21
  • 22.
    You can domore with Linked Data
  • 23.
    UK Government hasbeen: • developing standards for responsible publishing of key types of data (financial data, organisation data, aggregate statistics, location data) • developing guidance, practices and tools that make it easy to publish data in Linked Data form, at low cost • making it easy for people to consume data in a programmatic way
  • 24.
    Types of data:
    Table of figures by year:
             2008    2009    2010
        A   1,345   1,456   2,301
        B   2,112   3,543   2,111
        C   2,345   2,987   2,455
        D   6,342   6,256   6,123
        E   7,435   7,432   8,102
    Organisation chart: Director General; Director (Operations), Director (Strategy); Deputy Director (A), Deputy Director (A)
    Table of transactions:
        Transaction   Date         Supplier            Amount
        A-1263        09/09/2010   Spottiswoode & Co   £ 2,345
        A-1264        09/09/2010   JSB & Sons          £ 2,111
        A-1265        09/09/2010   BLG Ltd             £ 2,455
        A-1266        09/09/2010   Spottiswoode & Co   £ 6,123
        A-1267        09/09/2010   BLG Ltd             £ 8,102
  • 25.
    Naming things with URIs • URI = uniform resource identifier • Everything starts with HTTP – which gives us actionable names • There is choice about how to make URIs • We are using {sector}.data.gov.uk/id/{something}
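The `{sector}.data.gov.uk/id/{something}` pattern can be sketched as a small helper. The function name `mint_uri` and its validation rule are assumptions made for illustration; only the URI pattern itself comes from the slide.

```python
from urllib.parse import quote


def mint_uri(sector: str, something: str) -> str:
    """Build an identifier of the form http://{sector}.data.gov.uk/id/{something}."""
    if not sector.isalnum():
        # Assumed rule: a sector is a single simple label such as "education".
        raise ValueError("sector must be a simple alphanumeric label")
    # Keep '/' as a path separator and percent-encode anything unsafe.
    path = quote(something.strip("/"), safe="/-")
    return f"http://{sector}.data.gov.uk/id/{path}"


print(mint_uri("education", "school/341451"))
print(mint_uri("reference", "day/2012-01-18"))
```

Both calls reproduce URIs that appear on the reference-data slide, which is a useful check that the pattern has been read correctly.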
  • 26.
  • 27.
    Naming things in legislation
  • 28.
    Naming things in legislation • If you visit legislation.gov.uk you will see we have taken great care with naming things • (URI shown on slide) returns an HTML document for United Kingdom Public General Act (ukpga), 2005, Chapter 14, Section 1 • (URI shown on slide) returns an HTML document with a list from all legislation types where the title contains “wildlife”
  • 29.
    Some names are quite sophisticated… • UK Public General Act (ukpga) • 1981 • Chapter 69 • Section 5 • As it extends to England • As it stood on 30th January 2001 • Displayed as an HTML document with the timeline on • Although URIs are opaque, having this type of design changes how people use the service
  • 30.
    Legislation as Open Data • Everything on legislation.gov.uk is available as open data under the terms of our Open Government Licence • To access the data, visit any page and add: o /data.xml o /data.rdf o /data.xht • For lists o /data.feed
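The "visit any page and add" rule can be sketched as a URL transformation. The suffixes come from the slide; the example page URLs (including the `title=wildlife` search) are assumptions about how the site's paths look, and the helper name `data_urls` is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit

PAGE_FORMATS = ("data.xml", "data.rdf", "data.xht")  # suffixes listed on the slide
LIST_FORMATS = ("data.feed",)                        # suffix for lists


def data_urls(page_url: str, is_list: bool = False) -> list[str]:
    """Return the open-data URLs for a page by appending each format suffix to its path."""
    parts = urlsplit(page_url)
    base_path = parts.path.rstrip("/")
    suffixes = LIST_FORMATS if is_list else PAGE_FORMATS
    return [
        # The query string (e.g. a search) is kept; any fragment is dropped.
        urlunsplit((parts.scheme, parts.netloc, f"{base_path}/{s}", parts.query, ""))
        for s in suffixes
    ]


# Assumed page URL for ukpga 2005, Chapter 14, Section 1.
for url in data_urls("http://www.legislation.gov.uk/ukpga/2005/14/section/1"):
    print(url)

# Assumed list URL for a title search.
print(data_urls("http://www.legislation.gov.uk/all?title=wildlife", is_list=True))
```

Appending to the path rather than using a query parameter keeps each representation addressable as its own resource.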
  • 31.
    Linked Data Standards • Re-use where we can, create where we must • Small, high-level, lightweight vocabularies o Examples include datacube, organization, provenance • Create local specialisations o Examples include payments, central-government • Post hoc linking
  • 32.
    Data cube vocabulary [diagram] • Classes shown: qb:DataSet, qb:DataStructureDefinition, qb:ComponentSpecification, qb:ComponentProperty, qb:DimensionProperty, qb:AttributeProperty, qb:MeasureProperty, qb:CodedProperty, qb:Slice, qb:SliceKey, qb:Observation, skos:Concept, skos:ConceptScheme, sdmx:Concept, sdmx:ConceptRole, sdmx:FrequencyRole, sdmx:CountRole, sdmx:EntityRole, sdmx:TimeRole, sdmx:CodeList ... • Properties shown: qb:structure, qb:componentRequired : boolean, qb:componentAttachment : rdfs:Class, qb:order : xsd:int, qb:sliceKey, qb:componentProperty, qb:dimension, qb:attribute, qb:measure, qb:slice, qb:sliceStructure, qb:subSlice, qb:dataset, qb:observation, qb:concept, qb:measureType, qb:codeList
  • 33.
    Payments (a cube specialisation) [diagram] • Classes shown: PaymentDataset, Payment, ExpenditureLine, Purchase, Item, ItemCategory, foaf:Agent, org:OrganizationalUnit, interval:Interval, skos:Concept • Other labels shown: qb:structure, qb:dataset, qb:slice, payer, payee, unit, date, payment, expenditureLine, purchase order, invoice, narrative, contract, amountIncludingVAT, amountExcludingVAT, procurementCategory, transactionReference, expenditureCode, vatCategory, vatRate, paymentReference, totalAmountIncludingVAT, totalAmountExcludingVAT, item, redacted, revenue, capital
  • 34.
  • 35.
    Reference data http://reference.data.gov.uk/id/day/2012-01-18 http://reference.data.gov.uk/id/department/CO http://transport.data.gov.uk/id/station/WAT http://education.data.gov.uk/id/school/341451 http://location.data.gov.uk/id/3245677362123 http://www.legislation.gov.uk/id/ukpga/2009/12/section/2
  • 36.
    British time intervals • http://reference.data.gov.uk/id/day/2011-06-1 • There are similar URIs for seconds, minutes, hours, weeks, months, quarters, years • We were a bit slow (170 years) to move from the Julian to Gregorian Calendar (see the Calendar Act, 1750) • To transition, we lost 11 days in 1752 • Convoluted explanation of why the tax year in the UK starts on the 6th April • Our URIs for time intervals work this way too and the British time intervals URI Set is linked to the legislation
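Two of the points above lend themselves to a short sketch. The day URI pattern is taken from the reference-data slide; the arithmetic follows the commonly given account of the tax year (old year start on 25 March, shifted by the 11 lost days, then by one more day in 1800), which is an assumption here since the slide only alludes to the explanation.

```python
from datetime import date, timedelta


def day_uri(d: date) -> str:
    """Build a day URI in the form used on the reference-data slide."""
    return f"http://reference.data.gov.uk/id/day/{d.isoformat()}"


print(day_uri(date(2012, 1, 18)))

# Worked arithmetic for the tax year (only the month and day matter below).
old_year_start = date(1753, 3, 25)                    # Lady Day, the old start of the year
after_1752 = old_year_start + timedelta(days=11)      # keep a full year despite the 11 lost days
after_1800 = after_1752 + timedelta(days=1)           # 1800: a leap year in the Julian calendar only

print(after_1752.strftime("%d %B"))
print(after_1800.strftime("%d %B"))
```

The second date is the "6th April" the slide refers to, reached without ever shortening a tax year.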
  • 37.
  • 38.
    Chop-O-Matic • Malcolm Gladwell article on Ron Popeil from 2000 in the New Yorker: • “And how do you persuade people to disrupt their lives? Not merely by ingratiation or sincerity, and not by being famous or beautiful. You have to explain the invention to consumers - not once or twice but three or four times, with a different twist each time. You have to show them exactly how it works and why it works, and make them follow your hands as you chop liver with it, and then tell them precisely how it fits into their routine, and, finally, sell them on the paradoxical fact that, revolutionary as the gadget is, it's not at all hard to use.”
  • 39.
  • 40.
    Use Refine to map and export Linked Data
  • 41.
  • 42.
  • 43.
    Linked Data API • Open Standard • Generic approach for creating APIs from Linked Data • Sits on top of a Linked Data store • Several implementations, most mature is Puelia
  • 44.
  • 45.
  • 46.
  • 47.
    Back to those commitments
  • 48.
    Publishing Organisation Data • We will require public bodies to publish online the job titles of every member of staff and the salaries and expenses of senior officials paid more than the lowest salary permissible in Pay Band 1 of the Senior Civil Service pay scale, and organograms that include all positions in those bodies.
  • 49.
    Our first go… • October 2010 • CSV template and PDFs of organograms, typically authored using PowerPoint • Emphasis on visual appearance led to inconsistent datasets which are very hard to re-use • No relationship between the organogram and data • Not using web standards
  • 50.
    Press Release “The Government has published the most comprehensive organisational charts of the UK Civil Service ever released online, taking another step towards its goal of being the most transparent government in the world and opening up the structure of the Civil Service to public scrutiny”
  • 51.
    It’s *all* Linked Data • 100s of UK Government Organisations published their organisation data as Linked Data • Distributed data publishing • The data is deeply linked (Departments, Grades, Professions, date of the snapshot) • Cross dataset queries are perhaps the most interesting • Proves Linked Data is moving from research topic to commodity publishing • We can now extend this approach to other types of dataset and link our transparency data
  • 52.
    Our aims with Organogram Data • Make it as simple as possible for people in Departments to create Linked Data • Create high quality, consistent data that matches the policy intent and guidance • Distributed capture and publishing • Create open data in open standards using open source tools • Human readable and machine readable from single source • Provide download and API access in different formats (CSV, XML, JSON, RDF, HTML) • Evolutionary route to create longitudinal datasets, reconciling against previous data • Enable everyone to publish 5 Star Linked Data
  • 53.
    The process • Capture organisation data using a spreadsheet, which verifies policy rules and datatypes • Upload spreadsheet • Preview organogram • Download RDF and two CSVs • Publish on your website and register with data.gov.uk
  • 54.
    The Excel bit… • It’s the tool most Civil Servants have • This *does* also work in Libre Office / Open Office etc
  • 55.
  • 56.
  • 57.
  • 58.
    Linked Data Publishing Infrastructure [architecture diagram] • Components shown: Organogram (HTML, CSS & JavaScript); Excel file; Organogram (PHP); Linked Data API (HTML, XML, JSON); Mapping; Mapping API; RDF file; Senior CSV; Junior CSV; TRiG; RDF; Config; XLWrap; Sesame; TDB; RDF Store; Reconciliation • Numbered steps shown: 1. Upload Excel 2. Create Mapping 3. Create RDF 4. Query (SPARQL) 5. Create CSVs 6. Load 7. Query (SPARQL)
  • 59.
    Linked Data adds value • Implicit properties are made explicit (person, role, person in a role) • Reconciliation adds value by automatic linking to other data • Provenance • Example data • Explicit open licence
  • 60.
  • 62.
    On the web, everything is a claim • How did you come by this information? • What did you do with it? • When, who and how?
  • 63.
    An opportunity • We are developing a new system for publishing legislation, operating inside the government secure intranet / extranet • We want to provide evidence that supports the data we are publishing
  • 64.
    Legislation workflows • Complicated and vary by jurisdiction and content type • We take documents in different formats (Word, Framemaker) and convert them to a single format (XML) • We store XML documents in an XML Database • We take documents from a single format (XML) and transform them to different formats (HTML and PDF) • Complex processes for handling images etc • Sometimes mistakes are made, which can be corrected through a “Correction Slip”
  • 65.
    Objectives for provenance with legislation • Transparency and public trust - we substantiate our claim that this web page is what the legislation says • The audit trail is repeatable • Perform automatic checks along the way and record evidence of that checking • Use digital signatures, rather than relying on the immutability of paper, to ensure authenticity • Create a data source we can use to resolve any disputes (where did that footnote go?) • Create a data source we can use to measure contractual performance (how long did it take to publish that document?)
  • 66.
    Our technology choices • We use both XML and RDF • XML is brilliant for single source publishing solutions – one source, many outputs • RDF provides a flexible data model for other types of information (bibliographic metadata, but also things like which item of legislation has changed what) • We are recording provenance in RDF using the Open Provenance Model Vocabulary
  • 67.
    Open Provenance Model Vocabulary [diagram] • Classes shown: Opmv:Agent, Opmv:Process, Opmv:Artifact • Properties shown: Opmv:used, Opmv:wasGeneratedBy, Opmv:wasControlledBy, Opmv:wasPerformedBy • Instances shown: Document(k-1) as Opmv:Artifact(k-1), Document(k) as Opmv:Artifact(k)
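The chain in the diagram above (an artifact generated by a process that used an earlier artifact) can be walked with a few lines of code. This is a sketch under assumptions: the `doc:` and `proc:` names are hypothetical stand-ins for the Word, XML and HTML stages of the legislation workflow, and triples are plain tuples rather than a real RDF store.

```python
USED = "opmv:used"
GENERATED_BY = "opmv:wasGeneratedBy"

# Hypothetical provenance records for one document passing through the workflow.
triples = [
    ("doc:html", GENERATED_BY, "proc:xml-to-html"),
    ("proc:xml-to-html", USED, "doc:xml"),
    ("doc:xml", GENERATED_BY, "proc:word-to-xml"),
    ("proc:word-to-xml", USED, "doc:word"),
]


def lineage(artifact, triples):
    """Walk back from an artifact, via the process that generated it, to earlier artifacts."""
    chain = [artifact]
    seen = {artifact}
    current = artifact
    while True:
        process = next((o for s, p, o in triples if s == current and p == GENERATED_BY), None)
        if process is None:
            break  # no record of how this artifact was made: start of the trail
        source = next((o for s, p, o in triples if s == process and p == USED), None)
        if source is None or source in seen:
            break  # incomplete record, or a cycle in the data
        chain.append(source)
        seen.add(source)
        current = source
    return chain


print(lineage("doc:html", triples))
```

A query of this shape is what answers a question like "where did that footnote go?": each step names the process to inspect.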
  • 68.
    Provenance chain audit trail [diagram: Container1, Container2 and Container3, signed by Signature(c1), Signature(c2) and Signature(c3)]
    <urn:uuid:6F677120-152C-11E1-8715-95963F5713B6> {
      <http://w8www077254:9999/vsrs_api/bundle/2011-11-09/2/uksi/task/word-export-wml/1>
          a ns0:Process ;
          rdfs: "Word Export to WML1 Process" ;
          ns0:wasControlledBy
              <http://www.legislation.gov.uk/id/software/MsWord/2003> ,
              <http://www.legislation.gov.uk/id/software/WordToClml/1.0> ;
          ns1:hasParentProcess <http://w8www077254:9999/vsrs_api/bundle/2011-11-09/2/uksi/task/word-to-xml> ;
          ns2:source <http://w8www077254:9999/vsrs_api/bundle/2011-11-09/2/uksi/data.doc> .
    }
    <urn:uuid:6FA2F380-152C-11E1-8715-C9B1D4C6E3FB> {
      <urn:uuid:6F677120-152C-11E1-8715-95963F5713B6>
          swp:assertedBy <urn:uuid:6FA2F380-152C-11E1-8715-C9B1D4C6E3FB> ;
          swp:digest "N2U1ZGZhMzI3M2IzNmFjNDNlMmZkZTkyZTkwY2RlYWY4NmU5MDJiYw=="^^<http://www.w3.org/2001/XMLSchema#base64Binary> ;
          swp:digestMethod swp:JjcRdfC14N-sha1 .
      <urn:uuid:6FA2F380-152C-11E1-8715-C9B1D4C6E3FB>
          swp:assertedBy <urn:uuid:6FA2F380-152C-11E1-8715-C9B1D4C6E3FB> ;
          swp:authority <http://www.tsoshop.co.uk> ;
          swp:signature "kWcf…6g=="^^<http://www.w3.org/2001/XMLSchema#base64Binary> ;
          swp:signatureMethod swp:JjcRdfC14N-rsa-sha1 .
      <http://www.tsoshop.co.uk> swp:X509Certificate "MIIG …. " .
    }
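The digest step in the audit trail above can be approximated as follows. This is only a sketch: the slide's digest method involves an RDF canonicalisation step that is not reproduced here, so the function simply hashes whatever bytes it is given. The digest value on the slide appears to be a hex SHA-1 string wrapped in base64, and the sketch follows that shape as an inference, not as a documented format. SHA-1 is used only because the slide names it.

```python
import base64
import hashlib
import hmac


def graph_digest(serialised_graph: bytes) -> str:
    """Return the SHA-1 hex digest of the bytes, base64-encoded (assumed format)."""
    hex_digest = hashlib.sha1(serialised_graph).hexdigest()
    return base64.b64encode(hex_digest.encode("ascii")).decode("ascii")


def matches(serialised_graph: bytes, recorded_digest: str) -> bool:
    """Check a graph against a recorded digest using a constant-time comparison."""
    return hmac.compare_digest(graph_digest(serialised_graph), recorded_digest)


# Hypothetical serialisation of a container's contents.
container = b'<urn:example:process> a <urn:example:Process> .'
recorded = graph_digest(container)

print(matches(container, recorded))
print(matches(container + b" ", recorded))
```

A signature over the digest, as in the second container on the slide, then ties the check to a named authority rather than to whoever holds the file.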
  • 69.
    Publishing provenance • Provenance information may be associated by including a <link> element in the HTML <head> section:
    <html xmlns="http://www.w3.org/1999/xhtml">
      <head>
        <link rel="provenance" href="provenance-URI">
        <link rel="anchor" href="entity-URI">
        <title>Welcome to example.com</title>
      </head>
      <body> ... </body>
    </html>
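A consumer could discover the provenance link above with Python's standard `html.parser` module. This is a minimal sketch: the `rel` values `provenance` and `anchor` are those shown on the slide, while the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values of <link> elements, grouped by rel token."""

    def __init__(self):
        super().__init__()
        self.links = {}

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        rel, href = attr.get("rel"), attr.get("href")
        if rel and href:
            # rel may hold several space-separated tokens.
            for token in rel.split():
                self.links.setdefault(token.lower(), []).append(href)


def provenance_links(html: str):
    """Return (provenance URIs, anchor URIs) declared in the page."""
    parser = LinkCollector()
    parser.feed(html)
    return parser.links.get("provenance", []), parser.links.get("anchor", [])


PAGE = """<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<link rel="provenance" href="provenance-URI">
<link rel="anchor" href="entity-URI">
<title>Welcome to example.com</title>
</head>
<body> ... </body>
</html>"""

print(provenance_links(PAGE))
```

In practice the placeholder values would be absolute URIs, and the next step would be to fetch the provenance URI and read the record it returns.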
  • 70.
    Summary • Linked Data is essential to realising the promise of Open Government Data • Using Linked Data means working on o Standards o Reference Data o Production o Publishing • Benefits grow with the more data you want to combine • Lots of opportunities for international collaboration • Best advice, just start
  • 71.

Editor's Notes

  • #5 Combination of OSS, cloud computing and other similar trends.
  • #29 Names are important, they provide the framework or the architecture around which
  • #56 Upload your spreadsheet
  • #57 Preview your data in different ways
  • #58 Then simply download your RDF!
  • #59 Architecture
  • #61 A “data explorer” view, for filtering and querying data and pulling it back in different formats