Developing and assessing FAIR digital resources
Michel Dumontier, Ph.D.
Distinguished Professor of Data Science
"Most published research findings are false."
- John Ioannidis, Stanford University (PLoS Med 2005;2(8):e124)

Non-reproducibility: 64% in psychological studies and 65–89% in pharmacological studies.
Grand Challenge: How can we automatically find the evidence that supports or disputes a hypothesis, using the totality of available data, tools, and scientific knowledge?
https://doi.org/10.1016/j.radonc.2013.07.007
Can we empower scientists to make new discoveries from the analysis of other people's data?
A common rejection module (CRM) for acute rejection across multiple organs identifies novel therapeutics for organ transplantation. Khatri et al. J Exp Med. 2013;210(11):2205. DOI: 10.1084/jem.20122709
How important is data reuse? (survey scale: 0 = not important, 5 = very important)
- Tom Plasterer, http://bit.ly/BiopharmaDataStewardship
Is integrating internal data a challenge?
So what do we need to achieve this?
1. Data Science
Infrastructure to identify, represent, store, transport, retrieve, aggregate, query, and analyze data, and to execute services on demand in a reproducible manner.
Methods to continuously uncover plausible, supported, prioritized, and experimentally verifiable associations.
2. Community
A community to build a massive, decentralized network of interconnected and interoperable data and services.
15 principles to enhance the value of all digital resources (data, images, software, web services, repositories) and their metadata.
http://www.nature.com/articles/sdata201618
Rapid Adoption of the Principles
Developed and endorsed by researchers, publishers, funding agencies, and industry partners.
As of May 2017: 200+ citations since the 2016 publication.
Included in the G20 communiqué, EOSC, H2020, NIH, and more…
Findable
F1: (meta)data are assigned globally unique and persistent identifiers
F2: data are described with rich metadata
F3: metadata clearly and explicitly include the identifier of the data they describe
F4: (meta)data are registered or indexed in a searchable resource

Accessible
A1: (meta)data are retrievable by their identifier using a standardized communication protocol
A1.1: the protocol is open, free, and universally implementable
A1.2: the protocol allows for an authentication and authorization procedure, where necessary
A2: metadata should be accessible even when the data are no longer available

Interoperable
I1: (meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation
I2: (meta)data use vocabularies that follow FAIR principles
I3: (meta)data include qualified references to other (meta)data

Reusable
R1: meta(data) are richly described with a plurality of accurate and relevant attributes
R1.1: (meta)data are released with a clear and accessible data usage license
R1.2: (meta)data are associated with detailed provenance
R1.3: (meta)data meet domain-relevant community standards
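To make several of these principles concrete, here is a minimal sketch, not taken from the slides, of machine-readable dataset metadata built with Python's rdflib; the DOI, title, license, and ORCID values are hypothetical placeholders.

# A minimal sketch of F1-F3, R1.1, and R1.2 in practice; all identifiers
# below (DOI, ORCID) are hypothetical placeholders.
from rdflib import Graph, Literal, URIRef
from rdflib.namespace import DCAT, DCTERMS, RDF

g = Graph()
dataset = URIRef("https://doi.org/10.5281/zenodo.0000000")  # F1: persistent identifier

g.add((dataset, RDF.type, DCAT.Dataset))
g.add((dataset, DCTERMS.title, Literal("Example dataset")))              # F2: rich metadata
g.add((dataset, DCTERMS.identifier, Literal("10.5281/zenodo.0000000")))  # F3: identifier in the metadata
g.add((dataset, DCTERMS.license, URIRef("https://creativecommons.org/licenses/by/4.0/")))  # R1.1
g.add((dataset, DCTERMS.creator, URIRef("https://orcid.org/0000-0000-0000-0000")))         # R1.2

print(g.serialize(format="turtle"))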
The Semantic Web is a global web of FAIR data:
- standards for publishing, sharing, and querying facts, expert knowledge, and services
- a scalable approach for the discovery of independently constructed, collaboratively described, distributed knowledge
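As an illustration of querying this web of data, here is a minimal sketch, assuming the SPARQLWrapper package and the public Wikidata endpoint (neither is a resource from the slides):

# Query a public SPARQL endpoint for a handful of drugs.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://query.wikidata.org/sparql",
                       agent="fair-demo/0.1 (example)")
sparql.setQuery("""
    SELECT ?drug ?drugLabel WHERE {
      ?drug wdt:P31 wd:Q12140 .  # instance of: medication
      SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
    } LIMIT 5
""")
sparql.setReturnFormat(JSON)
for row in sparql.query().convert()["results"]["bindings"]:
    print(row["drugLabel"]["value"])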
We are building a massive decentralized knowledge graph.
(Linking Open Data cloud diagram 2014, by Max Schmachtenberg, Christian Bizer, Anja Jentzsch and Richard Cyganiak. http://lod-cloud.net/)
metadatacenter.org
NIH COMMONS
Dataset Metadata
http://www.w3.org/TR/hcls-dataset/
Core Metadata
• Identifiers
• Title
• Description
• Homepage
• License
• Language
• Keywords
• Concepts and vocabularies used
• Standard compliance
• Publication
Extended Metadata
• Provenance Metadata
• Versioning Metadata
• Content Metadata
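A hedged sketch of what such a dataset description can look like, with core fields plus versioning metadata written in Turtle and loaded with Python's rdflib; the dataset IRI, values, and version below are illustrative, not the HCLS recommendation verbatim.

# Illustrative HCLS-style dataset metadata: core fields plus versioning.
# All names and values are made up.
from rdflib import Graph

ttl = """
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix pav:  <http://purl.org/pav/> .

<http://example.org/dataset/mydata> a dcat:Dataset ;
    dct:title "Example dataset"@en ;
    dct:description "Illustrative summary-level record."@en ;
    dct:license <https://creativecommons.org/licenses/by/4.0/> ;
    dcat:keyword "transplantation", "gene expression" ;
    pav:hasCurrentVersion <http://example.org/dataset/mydata/1.0> .

<http://example.org/dataset/mydata/1.0>
    pav:version "1.0" ;
    pav:createdOn "2017-10-03"^^<http://www.w3.org/2001/XMLSchema#date> .
"""

g = Graph().parse(data=ttl, format="turtle")
print(len(g), "metadata triples loaded")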
Validata Demo
http://hw-swel.github.io/Validata/
- RDF constraint validation tool
- Configurable to any profile
- Declarative, reusable schema description
- Shape Expressions (ShEx) constraints
- Open-source JavaScript implementation
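Validata itself runs in the browser; as a rough Python analogue, here is a hedged sketch of ShEx validation assuming the pyshex package (its exact API is an assumption to check against the pyshex documentation). Schema, data, and URIs are illustrative.

# Validate one node of an RDF graph against a ShEx shape (assumed pyshex API).
from pyshex import ShExEvaluator

schema = """
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
<http://example.org/shapes/Dataset> {
    dct:title xsd:string ;    # exactly one string title
    dct:license IRI           # license must be an IRI
}
"""

rdf = """
@prefix dct: <http://purl.org/dc/terms/> .
<http://example.org/ds> dct:title "My dataset" ;
    dct:license <https://creativecommons.org/licenses/by/4.0/> .
"""

for r in ShExEvaluator(rdf=rdf, schema=schema,
                       focus="http://example.org/ds",
                       start="http://example.org/shapes/Dataset").evaluate():
    print(r.focus, "conforms" if r.result else f"fails: {r.reason}")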
smartAPI: semantic (meta)data to Link Data
Built on API metadata specification standards (Swagger / OpenAPI)
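The idea, sketched below with a hypothetical field name: enrich a standard Swagger/OpenAPI description with semantic annotations so that inputs and outputs can be linked across APIs. The "x-semanticType" key is illustrative only; consult the smartAPI specification for the actual extension fields.

# A Swagger-style API description carrying a (hypothetical) semantic
# annotation on a parameter, expressed as a plain Python dict.
import json

api_metadata = {
    "swagger": "2.0",
    "info": {"title": "Drug lookup API (example)", "version": "1.0"},
    "paths": {
        "/drugs/{id}": {
            "get": {
                "parameters": [{
                    "name": "id",
                    "in": "path",
                    "required": True,
                    "type": "string",
                    # hypothetical extension: values are ChEBI identifiers
                    "x-semanticType": "http://identifiers.org/chebi/",
                }],
                "responses": {"200": {"description": "a drug record"}},
            }
        }
    },
}

print(json.dumps(api_metadata, indent=2))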
Find new uses for existing drugs by exploring a probabilistic knowledge graph, and validate them against pipelines for drug discovery.
Finding melanoma drugs through a probabilistic knowledge graph. PeerJ Computer Science. 2017;3:e106. https://doi.org/10.7717/peerj-cs.106

Investigate the claims made by others
AUC 0.91 across all therapeutic indications
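For readers unfamiliar with the 0.91 figure, here is a minimal sketch, with made-up numbers and not the paper's actual pipeline, of how such an AUC is computed from model scores over candidate drug-indication pairs.

# Score candidate drug-indication edges against known labels and compute AUC.
from sklearn.metrics import roc_auc_score

labels = [1, 1, 0, 1, 0, 0, 1, 0]        # 1 = known therapeutic indication
scores = [0.92, 0.85, 0.30, 0.77, 0.72,  # model-assigned probabilities
          0.12, 0.68, 0.55]              # (illustrative values)

print(f"AUC = {roc_auc_score(labels, scores):.2f}")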
How do we measure how FAIR something is?
We can ask investigators what they intend to do…
Section 2. FAIR data:
1. Making data findable, including provisions for metadata (5 questions)
2. Making data openly accessible (10 questions)
3. Making data interoperable (4 questions)
4. Increasing data re-use, through clarifying licenses (4 questions)
Additional sections:
1. Data summary (6 questions, 5 of which also cover aspects of FAIRness)
2. Allocation of resources (4 questions)
3. Data security (2 questions)
4. Ethical aspects (2 questions)
5. Other issues (2 questions)
Total: 23 + 16 = 39 questions!
https://goo.gl/Strjua
FAIRness
FAIRness reflects the extent to which a digital resource addresses the FAIR principles, as per the expectations defined by a community of stakeholders.
Stakeholders
People worried about:
– Findability
– Accessibility
– Interoperability
– Reuse
– Provenance
– Licensing
– Recognition
– Value
People who are:
- Potential users
- Resource creators
- Academics
- Publishers
- Industry
- Funding agencies
- The public
Metrics: explicit measures of expectation
• A metric is a standard of measurement.
• It must provide a clear definition of what is being measured and why one wants to measure it.
• It must describe the process by which a valid measurement result is obtained, so that the measurement can be reproduced by others, and it must specify what counts as a valid result.
Candidate Metrics
FM-F1A - Identifier uniqueness
FM-F1B - Identifier persistence
FM-F2 - Machine-readability of metadata
FM-F3 - Identifier in metadata
FM-F4 - Findable in search results
FM-A1.1 - Access protocol
FM-A1.2 - Access authorization
FM-A2 - Metadata Longevity
FM-I1 - Use of a knowledge representation language
FM-I2 - Use FAIR vocabularies
FM-I3 - Use qualified references
FM-R1.1 - Accessible licenses
FM-R1.2 - Provenance
FM-R1.3 - Standard conformance
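Several of these metrics lend themselves to automated testing. Below is a hedged sketch, not the working group's reference implementation, of a check in the spirit of FM-F1A/FM-A1.1: does the identifier resolve over an open, standardized protocol (HTTP)?

# Test whether a resource identifier resolves via HTTP(S).
import requests

def identifier_resolves(identifier: str, timeout: float = 10.0) -> bool:
    """Return True if dereferencing the identifier yields a 2xx/3xx response."""
    try:
        response = requests.get(identifier, timeout=timeout, allow_redirects=True)
        return response.status_code < 400
    except requests.RequestException:
        return False

print(identifier_resolves("https://doi.org/10.7717/peerj-cs.106"))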
FAIRness Index
• A community, composed of clearly defined stakeholders (researchers, publishers, users, etc.), may define its own FAIRness Index (indicators) that expresses what makes a digital resource ideally or maximally FAIR.
• A FAIRness Index is a collection of metrics that are aligned to the FAIR principles and can be consistently and transparently evaluated.
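A minimal sketch of that idea, with hypothetical metric functions and an unweighted aggregation that a real community would replace with its own choices:

# Aggregate a small collection of metric tests into a transparent score.
from typing import Callable, Dict

def has_persistent_identifier(resource: dict) -> bool:  # in the spirit of FM-F1B
    return resource.get("identifier", "").startswith("https://doi.org/")

def has_license(resource: dict) -> bool:                # in the spirit of FM-R1.1
    return bool(resource.get("license"))

INDEX: Dict[str, Callable[[dict], bool]] = {
    "FM-F1B": has_persistent_identifier,
    "FM-R1.1": has_license,
}

def fairness_score(resource: dict) -> float:
    results = {name: test(resource) for name, test in INDEX.items()}
    for name, passed in results.items():
        print(f"{name}: {'pass' if passed else 'fail'}")
    return sum(results.values()) / len(results)

resource = {"identifier": "https://doi.org/10.7717/peerj-cs.106",
            "license": "CC-BY-4.0"}
print(f"FAIRness score: {fairness_score(resource):.2f}")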
Measures for Digital Repositories
• Data Seal of Approval
– 6 core requirements
– 16 criteria
• DIN 31644: Information and documentation - Criteria for trustworthy digital archives
– 10 core requirements
– 34 criteria
• ISO 16363: Audit and certification of trustworthy digital repositories
– 100+ criteria
DSA 16 Requirements
1. mission to provide access to and preserve data
2. licenses covering data access and use, and monitors compliance
3. continuity plan
4. ensures that data are created/used in compliance with norms
5. adequate funding and qualified staff through clear governance
6. mechanism(s) for expert guidance and feedback
7. guarantees the integrity and authenticity of the data
8. accepts data and metadata to ensure relevance and understandability
9. applies documented processes in archiving
10. documented responsibility for preservation
11. expertise to address data and metadata quality
12. archiving according to defined workflows
13. enables discovery and citation
14. enables reuse with appropriate metadata
15. infrastructure
16. infrastructure
https://www.datasealofapproval.org
Data Seal of Approval
• Self-assessment in the DSA online tool: the tool takes you through the 16 requirements and provides support along the way.
• Once you have completed your self-assessment, you can submit it for peer review.
How can we gather information to assess FAIRness?
A) Self-assessment
B) FAIR assessment team
C) Automated assessment
D) Crowdsourcing
E) All of the above
Redefining Scientific Publishing
http://www.tkuhn.org/pub/sempub/
Summary
• Coupling discovery science with research data management is the right incentive to produce high-quality data and metadata
• New infrastructure is needed to enable this collaboration
• A framework to assess the FAIRness of digital resources according to community expectations is being developed
michel.dumontier@maastrichtuniversity.nl
Website: http://maastrichtuniversity.nl/ids
Presentations: http://slideshare.com/micheldumontier


Editor's Notes

• #2 A talk prepared for the workshop "Working on data stewardship? Meet your peers!", 3 October 2017. https://www.surf.nl/agenda/2017/10/workshop-working-on-data-stewardship-meet-your-peers/index.html SURFacademy, in collaboration with LCRDM and the UKB Research Data working group, organized a networking event on 3 October 2017 for data stewards and others who support researchers at universities and research institutions in research data management, to meet colleagues and learn from each other's practice.
  • #6 Abstract Using meta-analysis of eight independent transplant datasets (236 graft biopsy samples) from four organs, we identified a common rejection module (CRM) consisting of 11 genes that were significantly overexpressed in acute rejection (AR) across all transplanted organs. The CRM genes could diagnose AR with high specificity and sensitivity in three additional independent cohorts (794 samples). In another two independent cohorts (151 renal transplant biopsies), the CRM genes correlated with the extent of graft injury and predicted future injury to a graft using protocol biopsies. Inferred drug mechanisms from the literature suggested that two FDA-approved drugs (atorvastatin and dasatinib), approved for nontransplant indications, could regulate specific CRM genes and reduce the number of graft-infiltrating cells during AR. We treated mice with HLA-mismatched mouse cardiac transplant with atorvastatin and dasatinib and showed reduction of the CRM genes, significant reduction of graft-infiltrating cells, and extended graft survival. We further validated the beneficial effect of atorvastatin on graft survival by retrospective analysis of electronic medical records of a single-center cohort of 2,515 renal transplant patients followed for up to 22 yr. In conclusion, we identified a CRM in transplantation that provides new opportunities for diagnosis, drug repositioning, and rational drug design.
  • #11 G20: http://europa.eu/rapid/press-release_STATEMENT-16-2967_en.htm EOSC: https://ec.europa.eu/research/openscience/pdf/realising_the_european_open_science_cloud_2016.pdf H2020: https://goo.gl/Strjua