The Other Side of Linked Open Data: Managing Metadata Aggregation
The Other Side of Linked Data:
Managing Metadata Aggregation
ALCTS Metadata Interest Group
ALA Midwinter 2014
Where Are We Now?
• Major projects so far focused on exposing
selected portions of their data for
‘experimentation’
– Who’s using this data?
– Can LOD for libraries succeed on that basis?
• LOD is not just outputs; it needs actual use to
inform practice
– A more complete view of the environment and
workflow should help
Outline
• Limitations of the traditional database strategy
– Including records, normalization, de-duplication, etc.
• Components of a fuller view
– Workflow
– Inputs, outputs
– Data cache and services
– Need for automated orchestration
– The maintenance conundrum
Substituting a Cache for a Database
• Supports multiple streams of data
• Allows detailed provenance to be carried over
time
• Separates services from data storage
• Allows more extensive automation (and
orchestration of services)
• Focuses valuable human effort where it’s
needed: analysis, design and implementation
of improvement services
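
As a minimal sketch of the cache idea above, the following Python stores individual statements from more than one data stream and keeps provenance with each one. The class names, the field names (`source`, `received`), and the sample values are assumptions made for illustration; the slides do not specify a data model.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Statement:
    """One assertion about a resource, with hypothetical provenance fields."""
    subject: str
    prop: str
    value: str
    source: str      # which data stream supplied the statement (assumed field)
    received: date   # when it entered the cache (assumed field)


class StatementCache:
    """Append-only store; services read from it but are defined elsewhere."""

    def __init__(self):
        self._statements = []

    def add(self, statement):
        self._statements.append(statement)

    def about(self, subject):
        """Return every statement held about one resource."""
        return [s for s in self._statements if s.subject == subject]

    def sources_for(self, subject):
        """List the data streams that have said something about a resource."""
        return sorted({s.source for s in self.about(subject)})


cache = StatementCache()
cache.add(Statement("ex:book1", "title", "Moby Dick",
                    "library-a", date(2014, 1, 10)))
cache.add(Statement("ex:book1", "title", "Moby-Dick; or, The Whale",
                    "publisher-b", date(2014, 1, 12)))
```

Both titles stay in the cache side by side, each traceable to its stream, which is the property a single merged database record would lose.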
Workflow
• Obtain data (possibly as ‘records’)
• Store data as statements in cache
• Evaluate data by source or collection
• Improve data using specific services, as
determined by evaluation
• Publish improved data
• [Rinse, repeat]
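
The workflow steps above can be sketched as one pass over an incoming batch. Everything here is a hypothetical stand-in: statements are plain tuples, the "evaluation" only counts values with stray whitespace per source, and the single improvement service (`trim-service`) is invented for the example.

```python
def obtain(records):
    """Flatten incoming 'records' into (source, subject, prop, value) statements."""
    for source, record in records:
        subject = record["id"]
        for prop, value in record.items():
            if prop != "id":
                yield (source, subject, prop, value)


def evaluate(statements):
    """Count, per source, the values that carry stray whitespace."""
    report = {}
    for source, _subject, _prop, value in statements:
        if value != value.strip():
            report[source] = report.get(source, 0) + 1
    return report


def improve(statements, report):
    """Emit new, trimmed statements only for sources the evaluation flagged."""
    return [
        ("trim-service", subject, prop, value.strip())
        for source, subject, prop, value in statements
        if source in report and value != value.strip()
    ]


def run_pass(cache, records):
    incoming = list(obtain(records))          # obtain
    cache.extend(incoming)                    # store as statements
    report = evaluate(incoming)               # evaluate by source
    cache.extend(improve(incoming, report))   # improve, as evaluation dictates
    return list(cache)                        # publish (here, a snapshot)


records = [
    ("library-a", {"id": "ex:book1", "title": " Moby Dick "}),
    ("publisher-b", {"id": "ex:book1", "title": "Moby-Dick"}),
]
cache = []
published = run_pass(cache, records)
```

Calling `run_pass` again with the next harvest is the "rinse, repeat" step: the cache grows, and nothing already stored is rewritten.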
[Slide 7 diagram legend: yellow = data we use now; green = data we’re adding]
[Slide 9 diagram legend: yellow = data we share now; orange = data we propose to share; green = data categories we can share]
Developing and Defining Services
• Small, single-purpose services are easier to
develop and maintain
– What services you need are determined by goals,
evaluation results, etc.
– ‘Orchestration’ of services applies them to specific
kinds of data, in order
– Services can be described, and linked, to expose
who, what, when and how to downstream users
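
One way to picture small, described services and their orchestration is a registry applied in list order to the kind of data each service targets. This is a sketch under assumptions: the `Service` fields, the three sample services, and the use of a property name as the "kind of data" are all invented here, not taken from the presenters' system.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Service:
    name: str
    description: str            # exposed so downstream users can see what ran
    applies_to: str             # property name this service handles (assumed)
    run: Callable[[str], str]


SERVICES = [
    Service("trim", "Strip leading and trailing whitespace", "title", str.strip),
    Service("collapse", "Collapse runs of whitespace", "title",
            lambda v: " ".join(v.split())),
    Service("lower-lang", "Lowercase language codes", "language", str.lower),
]


def orchestrate(prop, value, services=SERVICES):
    """Apply matching services in list order; log each change that was made."""
    log = []
    for service in services:
        if service.applies_to != prop:
            continue
        new_value = service.run(value)
        if new_value != value:
            log.append((service.name, value, new_value))
        value = new_value
    return value, log


title, title_log = orchestrate("title", "  Moby   Dick ")
```

The returned log records which service changed what, which is the raw material for telling downstream users the who, what, when and how.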
Developing Automated Interaction
• Rule: Use humans for things requiring human
understanding and decision making
– Use machines for everything else
– A manual process for something a machine can do as
well or better is a failure
• Improvement services can be granular, invoked in
prescribed order, and report results for later use
– Continuous improvement necessary to respond to
continuous change
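
The rule above can be illustrated with a service that handles only the cases a machine can decide and queues the rest for a person. The date patterns below are illustrative assumptions, not a cataloging rule, and the function names are hypothetical.

```python
import re

# Patterns the machine is allowed to handle on its own (assumed for the example).
ISO_DATE = re.compile(r"^\d{4}(-\d{2}(-\d{2})?)?$")
BRACKETED_YEAR = re.compile(r"^\[(\d{4})\]$")
COPYRIGHT_YEAR = re.compile(r"^c(\d{4})$")


def normalize_date(value):
    """Return (value, needs_human); only unambiguous patterns are normalized."""
    value = value.strip()
    if ISO_DATE.match(value):
        return value, False
    for pattern in (BRACKETED_YEAR, COPYRIGHT_YEAR):
        match = pattern.match(value)
        if match:
            return match.group(1), False
    return value, True


def triage(values):
    """Split values into machine-normalized results and a human review queue."""
    done, review_queue = [], []
    for value in values:
        normalized, needs_human = normalize_date(value)
        (review_queue if needs_human else done).append(normalized)
    return done, review_queue


done, review_queue = triage(["2014", "[1999]", "c2001", "ca. 18th century?"])
```

Human effort then goes to the review queue, and any pattern that keeps recurring there is a candidate for a new automated service.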
Data Maintenance
• Improved data returns as statements to the data
cache, with provenance attached
• Statement strategy keeps newly arriving data from
overwriting ‘improved’ data
• Each new statement adds to what is known about a
described resource
• Statements can be cherry-picked and exposed to others as
statements or records, in ‘flavors’ or as ‘everything we
have’
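
A small sketch of the maintenance points above: a re-harvest is appended rather than written over the improved statement, and a "flavor" is just a selection over the cache. The tuple layout, the source names, and the preference-order approach to choosing a value are assumptions for illustration.

```python
def as_records(cache, preferred_sources):
    """Build records, taking each property from the most preferred source."""
    rank = {source: i for i, source in enumerate(preferred_sources)}
    chosen = {}  # (subject, prop) -> (rank, value)
    for source, subject, prop, value in cache:
        if source not in rank:
            continue  # this flavor does not include that source
        key = (subject, prop)
        if key not in chosen or rank[source] < chosen[key][0]:
            chosen[key] = (rank[source], value)
    records = {}
    for (subject, prop), (_rank, value) in chosen.items():
        records.setdefault(subject, {})[prop] = value
    return records


cache = [
    ("library-a", "ex:book1", "title", " Moby Dick "),
    ("trim-service", "ex:book1", "title", "Moby Dick"),
]
# A later re-harvest from the same source is appended, not written over
# the improved statement.
cache.append(("library-a", "ex:book1", "title", " Moby Dick "))

improved_flavor = as_records(cache, ["trim-service", "library-a"])
raw_flavor = as_records(cache, ["library-a"])
```

Exposing "everything we have" is then simply publishing the cache itself as statements, with no selection applied.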
Contact Information
Diane Hillmann
metadata.maven@gmail.com
Gordon Dunsire
gordon@gordondunsire.com
Jon Phipps
jonphipps@gmail.com
The First MetadataMobile

Editor's Notes

  • #3 If LOD exists in multiple versions, and nobody uses it, does it make noise?
  • #9 Evaluation using a statistical analysis tool, from: Naomi Dushay and Diane I. Hillmann, ‘Analyzing Metadata for Effective Use and Re-Use’. http://dcpapers.dublincore.org/pubs/article/view/744
  • #13 Revised diagram from: Jon Phipps, Diane I. Hillmann and Gordon Paynter, ‘Orchestrating metadata enhancement services: Introducing Lenny’. http://dcpapers.dublincore.org/pubs/article/view/803. Note that ‘XForms’ in this context means ‘Transforms’; the usage dates from well before the XForms standard, in which the term means something specific.