AN INTRODUCTION
TO OPEN DATA
Sally Jenkinson - Fronteers - Amsterdam - 09.10.2015
@sjenkinson | sally@recordssoundthesame.com
Digital solutions architect & consultant
Records Sound the Same Ltd
Sally Jenkinson
DATA
OPEN DATA
“Big data”
90% of the world’s total
data has been created
within the last 2 years
(IBM, 2014)
I ♡ DATA
sallyjenkinson.co.uk/labs/teatracker
BUT

“You agree to maintain your apps
and your systems in accordance with
industry standard quality levels”
DATA SHARING
WHAT IS OPEN DATA?
Open data and content can be
freely used, modified, and shared
by anyone for any purpose.
opendefinition.org
Re-publish
Derive new content or data
Make money by selling products
Charge a fee for access
“We observed that often people think of
open data as a specific ‘kind’ of data –
something separate and distinct from the
data they use day-to-day in their
organisation or team – rather than a choice
about how people publish data.”
theodi.org/blog/closed-shared-open-data-whats-in-a-name
theodi.org/guides/publishers-guide-open-data-licensing | theodi.org/guides/reusers-guide-open-data-licensing
OPEN LICENCES FOR
CREATIVE CONTENT
Public domain (CC0)
Attribution (CC-by)
Attribution & share-alike (CC-by-sa)
OPEN LICENCES FOR
DATABASES
Public domain (PDDL)
Attribution (ODC-by)
Attribution & share-alike (ODbL)
OTHER OPEN LICENCES
Open Government Licence
OS Open Licence
etc
WHERE CAN I GET IT FROM?
wiki.dbpedia.org
musicbrainz.org
earthquake.usgs.gov/earthquakes/search/
plaidplug.com
data.id/dataset/daftar-titik-reklame-di-dki-jakarta/resource/361ce01f-34ed-4e00-a204-6062c7b9ad64
web.archive.org/web/20150520175645/http://137.189.35.203/WebUI/CatDatabase/catData.html
vision.stanford.edu/aditya86/ImageNetDogs/
{
  "gilded": 0,
  "author_flair_text": "Male",
  "author_flair_css_class": "male",
  "retrieved_on": 1425124228,
  "ups": 3,
  "subreddit_id": "t5_2s30g",
  "edited": false,
  "controversiality": 0,
  "parent_id": "t1_cnapn0k",
  "subreddit": "AskMen",
  "body": "I can't agree with passing the blame, but I'm glad to hear it's at least helping you with the anxiety. I went the other direction and started taking responsibility for everything. I had to realize that people make mistakes including myself and it's gonna be alright. I don't have to be shackled to my mistakes and I don't have to be afraid of making them.",
  "created_utc": "1420070668",
  "downs": 0,
  "score": 3,
  "author": "TheDukeofEtown",
  "archived": false,
  "distinguished": null,
  "id": "cnasd6x",
  "score_hidden": false,
  "name": "t1_cnasd6x",
  "link_id": "t3_2qyhmp"
}
x ~1.7 billion
reddit.com/r/datasets/comments/3bxlg7/i_have_every_publicly_available_reddit_comment/
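The comment record shown above is plain JSON, so standard tooling can read it. What follows is a minimal sketch, not the dataset author's tooling: it assumes the dump stores one JSON object per line (the slide does not state the file layout), and the helper names `parse_comment` and `count_by_subreddit` are hypothetical. Note that `created_utc` is a string in the sample, so it is converted before use.

```python
import json
from collections import Counter
from datetime import datetime, timezone


def parse_comment(line):
    """Parse one JSON comment record and normalise a few fields."""
    record = json.loads(line)
    return {
        "id": record["id"],
        "author": record["author"],
        "subreddit": record["subreddit"],
        "score": int(record["score"]),
        # created_utc is a string of Unix seconds in the sample record.
        "created": datetime.fromtimestamp(
            int(record["created_utc"]), tz=timezone.utc
        ),
    }


def count_by_subreddit(lines):
    """Stream over records one at a time and count comments per subreddit."""
    counts = Counter()
    for line in lines:
        if line.strip():  # skip blank lines
            counts[parse_comment(line)["subreddit"]] += 1
    return counts


# A cut-down copy of the record shown above (body text omitted).
sample = (
    '{"author":"TheDukeofEtown","subreddit":"AskMen","score":3,'
    '"created_utc":"1420070668","id":"cnasd6x"}'
)
comment = parse_comment(sample)
print(comment["created"].isoformat())
print(count_by_subreddit([sample, sample]))
```

Because `count_by_subreddit` accepts any iterable of lines, an open file handle can be passed in directly, which avoids loading ~1.7 billion records into memory at once.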
♥
github.com/caesar0301/awesome-public-datasets
CONSUMING OPEN DATA
d3js.org
MORE THAN WEBSITES
iquantny.tumblr.com/post/92116352544/mapping-nyc-hydrant-revenue-upper-easts-19th
Generating value & making savings
+$3 trillion / year
mckinsey.com/insights/business_technology/open_data_unlocking_innovation_and_performance_with_liquid_information
open data
Transparency
“within two years chemical emissions
nationwide (at least as reported, and
presumably also in fact) had decreased
by 40 percent.
Some companies were launching
policies to bring their emissions down
by 90 percent, just because of the
release of previously sequestered
information.”
maban.co.uk/80
DATA & USER EXPERIENCES
“How far do you live from your
workplace? Chances are, you'd answer
that question in minutes rather than
miles.
An hour on the bus tells us a lot more than
47 miles. That's why we made
Mapumental.
Given any start point or destination, it'll
show everywhere within the chosen
commute time, by public transport.”
mapumental.com/services/travel-time
“How accessible is your nearest school, post
office, or GP’s surgery?
In Wales, that’s not always a simple question: the
country’s mountainous landscapes, rural
populations, and sometimes infrequent bus
services can mean that those without cars are
rather cut off from public service provision.”
mapumental.com/services/accessibility
“Just how quickly could fire engines reach
a given postcode in case of a fire?
It’s a question that’s pivotal to decisions
made by both the emergency services and
the insurance industry.”
mysociety.org/2013/04/22/fire-fire-mapumental-and-fire-engine-journey-times
Improved efficiency
Improved effectiveness
Impact measurement
Improved or new private products or
services & innovation
NOT JUST DIGITAL
opensensors.io
DOUG MCCUNE
dougmccune.com
STEFANIE POSAVEC
stefanieposavec.co.uk
“Air Transformed is a series of
wearable data objects that communicate
this physical burden in different ways.
Though seemingly decorative, they are
based entirely on open air quality data
from Sheffield, UK, a former steelmaking
city and notorious for its bad air.”
stefanieposavec.co.uk/data/#/airtransformed
Participation & self-empowerment
LINKED DATA
New knowledge from combined
data sources and patterns in large
data volumes
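Combining sources depends on both datasets identifying the same thing in the same way. The sketch below illustrates that idea under stated assumptions: the two datasets, their field names, and the `example.org` URIs are hypothetical placeholders, and `link_records` is an illustrative helper rather than part of any linked data library.

```python
def link_records(left, right):
    """Merge two datasets keyed by URI, keeping only things described in both."""
    linked = {}
    for uri in left.keys() & right.keys():
        # Fields from `right` win if both sources use the same field name.
        linked[uri] = {**left[uri], **right[uri]}
    return linked


# Hypothetical placeholder data: URIs and values are illustrative only.
air_quality = {
    "http://example.org/place/a": {"no2_level": "high"},
    "http://example.org/place/b": {"no2_level": "low"},
}
transport = {
    "http://example.org/place/a": {"bus_routes": 12},
    "http://example.org/place/c": {"bus_routes": 3},
}

combined = link_records(air_quality, transport)
print(combined)
```

Only the place described by both sources survives the join, which is why shared, stable identifiers matter more than the file format either publisher chose.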
Misrepresentation
tylervigen.com/spurious-correlations
Combining data sets & licences
clipol.org/tools/compatibility
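When data sets are combined, the obligations of each source licence carry over to the result. The following is a deliberately simplified sketch, assuming only the three categories named earlier in this talk (public domain, attribution, attribution and share-alike). It is not how the compatibility tool linked above works, and real compatibility depends on the full licence texts. The function name and the returned keys are hypothetical.

```python
# Simplified categories for the licences named in this talk. This mapping is
# an assumption of the sketch, not a substitute for reading the licences.
LICENCES = {
    "CC0": "public-domain",
    "PDDL": "public-domain",
    "CC-by": "attribution",
    "ODC-by": "attribution",
    "CC-by-sa": "share-alike",
    "ODbL": "share-alike",
}


def combined_obligations(licences):
    """Summarise what a combined work would need under the simplified model."""
    unknown = [name for name in licences if name not in LICENCES]
    if unknown:
        raise ValueError(f"Unknown licence(s): {unknown}")
    share_alike = sorted({n for n in licences if LICENCES[n] == "share-alike"})
    # Share-alike licences also require attribution, so only public-domain
    # sources are left out of this list.
    attribution = sorted({n for n in licences if LICENCES[n] != "public-domain"})
    return {
        "attribution_required_for": attribution,
        # With exactly one share-alike source, the result inherits that licence.
        "release_under": share_alike[0] if len(share_alike) == 1 else None,
        # Two different share-alike licences need a closer look by a human.
        "needs_review": len(share_alike) > 1,
    }


result = combined_obligations(["CC0", "CC-by", "ODbL"])
print(result)
```

The `needs_review` flag is the useful part of the sketch: mixing two different share-alike licences is the case where a quick category check is not enough.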
PUBLISHING OPEN DATA
“There are known knowns; there are things
we know we know. We also know there are
known unknowns; that is to say we know
there are some things we do not know. But
there are also unknown unknowns – the
ones we don't know we don't know.”
en.wikipedia.org/wiki/There_are_known_knowns
STEP ONE
Identification & planning
Clear licensing & usage information
Structure & quality
A plan for support
Accuracy
STEP TWO
Extracting & cleaning
Data privacy &
the individual
openrefine.org
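The cleaning step can also be scripted for small files. Here is a minimal sketch using only Python's standard library, not a reproduction of what OpenRefine does. It assumes a well-formed CSV with a header row, and the column names, sample rows, and the `clean_rows` helper are hypothetical. It trims stray whitespace, drops columns marked as private, and removes exact duplicate rows.

```python
import csv
import io


def clean_rows(raw_csv, private_columns):
    """Trim whitespace, drop private columns, and remove exact duplicate rows."""
    reader = csv.DictReader(io.StringIO(raw_csv))
    seen = set()
    cleaned = []
    for row in reader:
        tidy = {
            key.strip(): (value or "").strip()
            for key, value in row.items()
            if key.strip() not in private_columns
        }
        fingerprint = tuple(tidy.items())
        if fingerprint not in seen:  # keep the first copy of each row only
            seen.add(fingerprint)
            cleaned.append(tidy)
    return cleaned


# Hypothetical sample data with untidy spacing and a repeated row.
raw = (
    "name, email ,tea\n"
    "Ada, ada@example.org ,Assam\n"
    "Ada,ada@example.org,Assam\n"
    "Bob,bob@example.org, Earl Grey \n"
)
rows = clean_rows(raw, private_columns={"email"})
print(rows)
```

Dropping a column is only a first step towards protecting individuals: the remaining fields can still identify someone when combined, so the output needs reviewing before it is published.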
STEP THREE
Sharing
FIVE STAR DATA
5stardata.info
★ Make your data available on the web (in whatever format) under an open license.
★★ Make it available as structured data (e.g., Excel instead of image scan of a table).
★★★ Use non-proprietary formats (e.g., CSV instead of Excel).
★★★★ Use URIs to denote things, so that people can point at your data.
★★★★★ Link your data to other data to provide context.
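The five levels above build on each other, so a dataset's rating is the number of consecutive criteria it meets starting from the first. A small sketch of that rule follows. The `Dataset` fields and the `star_rating` function are hypothetical names chosen for illustration, and the cumulative reading of the scheme is the assumption being modelled.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    on_web_open_licence: bool      # one star
    structured: bool               # two stars
    non_proprietary_format: bool   # three stars
    uses_uris: bool                # four stars
    linked_to_other_data: bool     # five stars


def star_rating(dataset):
    """Count stars in order; each star requires all the previous ones."""
    criteria = [
        dataset.on_web_open_licence,
        dataset.structured,
        dataset.non_proprietary_format,
        dataset.uses_uris,
        dataset.linked_to_other_data,
    ]
    stars = 0
    for met in criteria:
        if not met:
            break
        stars += 1
    return stars


# An openly licensed CSV download with no URIs or links to other data.
csv_download = Dataset(True, True, True, False, False)
print(star_rating(csv_download))
```

Under this reading, a richly linked dataset with no open licence still scores zero, because the first star is the precondition for all the others.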
OPEN DATA CERTIFICATES
certificates.theodi.org
IN CONCLUSION

1. Choose open data
2. Publish your data
3. Link it
4. Use standards
5. Promote freedom
6. Do some good
7. Be creative
@sjenkinson
sally@recordssoundthesame.com
recordssoundthesame.com
THANK YOU. Thank you to these lovely people for making their
content open:
Linking Open Data cloud diagram 2014, by Max Schmachtenberg, Christian Bizer,
Anja Jentzsch and Richard Cyganiak - lod-cloud.net
The Data Spectrum - theodi.org/data-spectrum
Doug McCune - dougmccune.com
Stefanie Posavec - stefanieposavec.co.uk
Data abstract painting - flickr.com/photos/rachubarama/2709346242
IE Market Share vs Murder Rate - imgur.com/47D7zGq
Troy Marusek - flickr.com/photos/troymars/9113025616
The Roof of Wales - flickr.com/photos/stray_croc/4743302841
Fire Wall - flickr.com/photos/epleitez/1714341218
Money - flickr.com/photos/mikephotoart/12839909303
cc - flickr.com/photos/kalexanderson/7175627336
RDF - flickr.com/photos/gertcha/8292978031
Small Parts - flickr.com/photos/oskay/2156889157/
Hydrant - flickr.com/photos/pamhule/4677109732/
Upsala Glacier Retreat - flickr.com/photos/nasamarshall/10726540434/
