David De Roure 
 @dder
New and Emerging Forms of Data:
Past, Present, and Future
OXFORD E-RESEARCH CENTRE
http://www.data-archive.ac.uk/media/54761/ukda-40thanniversary.pdf
When did time begin?
More people
More machines
Big Data
High Performance
Computing
Conventional
computing
Web 2
Social Media
e-infrastructure
online
R&D
New and
Emerging
Forms of Data
deeply
about
society
Nigel Shadbolt et
https://twitter.com/CR_UK/status/446223117841494016/
Some people's smartphones
had autocorrected the word
"BEAT" to instead read
"BEAR".
"Thank you for choosing an
adorable polar bear," the
reply from the WWF said.
"We will call you today to set
up your adoption."
http://www.bbc.com/news/technology-26723457
http://www.parliament.uk/business/committees/committees-a-z/commons-select/science-and-technology-committee/news/report-responsible-use-of-data/
theODI.org
Social Media Triangle
social media
data and
analytics
social media
for engagement
with research
social media
as a subject
of research
Sam McGregor
New Forms of Data
▶ Internet data, derived from social
media and other online interactions
(including data gathered by
connected people and devices, eg
mobile devices, wearable
technology, Internet of Things)
▶ Tracking data, monitoring the
movement of people and objects
(including GPS/geolocation data,
traffic and other transport sensor
data, CCTV images etc)
▶ Satellite and aerial imagery (eg
Google Earth, Landsat, infrared,
radar mapping etc)
http://www.oecd.org/sti/sci-tech/new-data-for-understanding-the-human-condition.htm
What do we mean by real-time analytics?
▶  Live data streams vs live data analysis
▶  Different kinds of data, at a different pace
▶  Time-critical integration and analysis
▶  Influencing processes as they unfold, at speed & at scale
▶  New methodological apparatus
▶  New computational methods and infrastructure
▶  Not just social media – but social media is a rehearsal
New and emerging CDTs
Real life is and must be full of all kinds of social
constraint – the very processes from which society
arises. Computers can help if we use them to create
abstract social machines on the Web: processes in
which the people do the creative work and the
machine does the administration... The stage is set
for an evolutionary growth of new social engines.
The ability to create new forms of social process
would be given to the world at large, and
development would be rapid.
Berners-Lee, Weaving the Web, 1999 (pp. 172–175)
Social Machines
The Macroscope
Observing Social Machines (diagram labels):
Observer of one social machine
Observers using third party observatory
Observer of multiple social machines
Human participants in Social Machine
Human participants in multiple Social Machines
Observer of Social Machine infrastructure
@dder
De Roure, D., Hooper, C., Page, K., Tarte, S., and Willcox, P. 2015. Observing Social Machines Part 2: How to Observe? ACM Web Science
STORYTELLING AS A STETHOSCOPE
FOR SOCIAL MACHINES
1.  Sociality through storytelling potential
and realization
2.  Sustainability through reactivity and
interactivity
3.  Emergence through collaborative
authorship and mixed authority
Zooniverse is a highly storified Social Machine
Facebook doesn’t allow for improvisation
Wikipedia assigns authority rights rigidly
Tarte, S. M., De Roure, D., and Willcox, P. Working out the plot: the role of stories in social machines. In Proceedings of the
companion publication of the 23rd international conference on World wide web companion (2014), International World Wide Web
Conferences Steering Committee, pp. 909–914.
Seizing the tiger by the tail
▶  The Internet of Things
describes a world in which
everyday objects are
connected to a network so that
data can be shared
▶  But it is really as much about
people as the inanimate object
▶  It is impossible to anticipate
all the social changes that
could be created by connecting
billions of devices
https://www.gov.uk/government/publications/internet-of-things-blackett-review
PETRAS Privacy, Ethics, Trust, Reliability, Acceptability, and Security
for the Internet of Things
•  Use an integrated approach of collaborative social and
physical science expertise
•  Remove barriers to the beneficial adoption of Internet of
Things
•  Address generic knowledge gaps through case study
approaches covering major sectors
•  Use innovative methodologies including ‘in the wild’ and
citizen science
Principles
PETRAS Privacy, Ethics, Trust, Reliability, Acceptability, and Security
for the Internet of Things
Key Facts about PETRAS
•  9 world leading universities via
the core and spoke model (4
from the Alan Turing Institute)
•  Combined hub value: £23m
•  Blackett Review expertise
•  47 partners at submission
combining presence in the UK,
Central Europe and America
(giving International links and
perspective)
•  Inter- and multi-disciplinary
focus
Normal Accidents
Small events cascade through the system, with catastrophic consequences, when:
•  The system is complex
•  The system is tightly coupled
•  The system has catastrophic potential
doi:10.1016/B0-08-043076-7/04509-5
More people
More machines
Big Data
High Performance
Computing
Conventional
computing
Web 2
Social Media
e-infrastructure
online
R&D
New and
Emerging
Forms of Data
deeply
about
society
The future
increasing automation
machine learning
Data Detect Store Analytics Filter Analysts
Edwards, P. N., et al. (2013) Knowledge Infrastructures: Intellectual Frameworks and Research
Challenges. Ann Arbor: Deep Blue. http://hdl.handle.net/2027.42/97552
Findable
Accessible
Interoperable
Reusable
Jameson L. Toole, Yu-Ru Lin, Erich Muehlegger, Daniel Shoag, Marta C.
González, David Lazer. Journal of the Royal Society Interface. Volume 12,
issue 107. Published 27 May 2015. DOI: 10.1098/rsif.2015.0185
Tracking employment shocks using mobile phone data
A computationally-enabled
sense-making network of
expertise, data, software,
models and narratives
Big Data, in a
Big Data Centre
The R Dimensions
Research Objects facilitate research that is
reproducible, repeatable, replicable, reusable,
referenceable, retrievable, reviewable,
replayable, re-interpretable, reprocessable,
recomposable, reconstructable, repurposable,
reliable, respectful, reputable, revealable,
recoverable, restorable, reparable, refreshable?
@dder 14 April 2014
sci method
access
understand
new use
social
curation
Research Object
Principles
What are we trying to achieve?
My reflection is that the reason we seek
“reproducible research” is principally to achieve
two ends:
1.  Confidence in results, because they inform
policy, decision-making, and further research
2.  Sharing and citation of methods, data,
software, to make it easier to stand on each
other's shoulders, not toes
Let’s focus on (1)…
Research in the Wild (West)
Imagine you are a conference chair… or responsible for urban
planning, or security. Confidence in results is getting harder:








What interventions should we make to improve confidence
and quality? What (socio-)technology can we adopt?
Trusting the analysis that
is occurring
Automation of workflows,
crowd-sourced data reduction,
software vulnerabilities, increasing
adoption of machine learning,
and no critical human in the loop

Knowing what the data is,
where it has come from,
and what we can do with it
Multiple and partial data sources,
at speed and scale, in an evolving
ecosystem of data processing
intermediaries, with complexity in
permissions for data use
Provocation One
▶ Are there questions which are answered using
longitudinal studies data today that could be
answered in other ways?
▶  There is massive (voluntary) supply of data about
individuals on a huge scale
▶  The supply is set to increase with Internet of Things
▶  This data is “real time” (fitbit, smartphone,
accelerometer methods…)
Provocation Two
▶ Sometimes we really do need a longitudinal study
in order to answer a question
▶  So can we do that longitudinal study in a new way?
▶  By:
–  Supplementing existing studies, using linkage
–  Using new techniques with easy reporting at scale
–  Working internationally, regionally—shining the torch
Provocation Three
▶ Are we planning for how the world will be in
5 years?
▶  What have we learned from the rehearsal so far?
▶  Increasing automation, bots, robots
▶  Behaviour in the digital world (physical-digital world)
▶  Changing data ecosystem, e.g. personal data stores
consume
produce
compose
perform
capture
distribute
www.semanticaudio.ac.uk
Closing reflections
1.  Not just new forms of data, but new social
processes and new research questions
2.  What can we learn from the social media
analytics rehearsal?
3.  Are we ready?
– for the data supply ahead
– for inevitable automation
4.  How do we ensure the quality of research?
david.deroure@oerc.ox.ac.uk
@dder
Thanks to Peter Elias, Wendy Hall, Sam McGregor, Mark
Sandler, Nigel Shadbolt, Jeremy Watson, also Grant Miller,
Petar Radanliev, Ségolène Tarte, and Pip Willcox.
http://www.slideshare.net/dder/new-and-emerging-forms-of-data
www.oerc.ox.ac.uk
