Open Science in
Software Engineering
Blekinge Institute of Technology, Sweden
fortiss GmbH, Germany
www.mendezfe.org
mendezfe
Daniel Méndez
The Why, the What, and the How
About me
Head of research division
Requirements Engineering
mendezfe@acm.org
mendezfe.org
Munich
Karlskrona
Full Professor in
Software Engineering
Research areas / interests
• Empirical Software Engineering
• Requirements engineering
• RE in regulated domains (HealthCare, FinTech)
• RE for ML-intensive Systems
• Human-centric RE
Community engagement
• Academia-Industry Collaborations and Technology Transfer
• International Software Engineering Research Network
• Area editor: EMSE ‘Open Science’ and JSS ‘In-Practice’
RE research group
Gijon
DISCLAIMER
▪ Open Science is inherently difficult and often comes with very idealistic /
evangelical (and sometimes unrealistic) views.
▪ Software Engineering in particular poses many challenges that make Open
Science “by the book” impossible.
▪ The following talk focuses on key principles, but also on some pragmatic
advice based on my own experiences.
Special acknowledgement
▪ Daniel Graziotin with whom we have implemented Open Science
Policies for our major Empirical Software Engineering conferences
▪ Robert Feldt with whom we have implemented the Open Science
Initiative for the Empirical Software Engineering Journal
▪ Julian Frattini with whom we have adopted artefact sharing guidelines
▪ Davide Fucci with whom we have elaborated more practical advice
(and from whom I have stolen some slides)
Open Science, in a nutshell
Open Science describes the movement of making any research artefact
available to the public and includes, but is not limited to, open data, open
source, and open access.
Source: D. Mendez, D. Graziotin, S. Wagner, H. Seibold. Open Science in Software Engineering
Why do we need Open Science?
What do you think?
What can go wrong in research?
Ill formulated
No hypothesis
Flawed instrument
Inconsistent design
Software bugs
Wrong analysis
methods
Misinterpretation
Story telling
Post-diction
Extrapolation
File Drawer Bias
Paywalls
What can go wrong in sharing data?
-- Own example
Source: D. Mendez, B. Penzenstadler. Artefact-based Requirements Engineering: The AMDiRE Approach, 2014
5 years later
• Where can I find it?
-- Link in paper deprecated
• How exactly should I use it?
-- No readme instructions
• How should I cite it?
-- Repository not self-contained
• What can I do with the data?
-- No licence information
Even after moving to new repository…
Sharing data so that the community can efficiently and sustainably use it is difficult,
but it nevertheless is crucial!
Why we need Open Science
▪ Transparency: Publicly-funded research belongs to everyone.
14% of published research in public health disclosed their data.
▪ Replicability: Central to the creation of theories and their acceptance.
70% of researchers failed to reproduce natural science experiments.
▪ Discoverability: Research has no impact if people cannot find it.
43% of papers in SE are never cited*.
* Source: Garousi, V., Fernandes, J.M. Quantity versus impact of software engineering papers: a quantitative study.
Open Science in Software Engineering
Software Engineering (and research) is a
scientific, engineering-based approach to
software development
▪ Address insight-oriented questions
(e.g. exploring natural phenomena)
▪ Apply scientific methods to practical ends
(e.g. testing hypotheses, formal proofs)
Food for thought:
1. Who is this person to the left?
2. Did she engage in Open Science?
Image Source: https://en.wikipedia.org/wiki/Margaret_Hamilton_(software_engineer)
There is a lot to unpack when it comes to Open Science in Software Engineering
Open Science in Software Engineering
▪ Why do we need Open Science in Software Engineering?
… where I will provide a brief personal reflection
▪ What is Open Science in Software Engineering?
… where we will discuss the adoption of OS to Software Engineering (and why it is so challenging)
▪ How can we engage in Open Science in Software Engineering?
… where we will discover hands-on advice and guidelines
As a reviewer of manuscripts
As a reviewer, I am responsible for peer-reviewing articles and papers
submitted to journals, conferences, and workshops. Peer review is a key
activity to preserve the quality and integrity of scientific publications [1].
Typical guiding questions:
▪ Is the paper “well” written?
(encompasses various aspects)
▪ Are the research methods appropriate?
▪ Are the results credible and valid?
Source [1]: N. Ernst, J. Carver, D. Mendez, M. Torchiano. Understanding Peer Review of Software Engineering Papers (https://arxiv.org/abs/1904.06499)
Source [2]: L. Prechelt, D. Graziotin, D. Mendez. A Community’s Perspective on the Status and Future of Peer Review in Software Engineering (https://arxiv.org/abs/1706.07196)
Checking the credibility and validity
requires access to the data
Community survey [2]
As an area editor or PC co-chair
As an area editor and PC co-chair, I am responsible for organising and
overseeing the reviewing processes in journals and conferences and for
ensuring a high-quality programme.
Typical guiding questions:
▪ Does the topic fit the scope of the conference/journal?
▪ Do the reviewers recommend acceptance?
▪ Do the authors disclose data/material/source code?
▪ If not, are there valid reasons? (“Do they have something to hide?”)
▪ If yes, do they comply with copyrights, non-disclosure agreements, other regulations?
…
Quite often, even experienced researchers
violate regulations by accident.
As a researcher/author of manuscripts
As a researcher and author of manuscripts, I want to disseminate my
research results to the public for other researchers to build their work on top
of my own research (and cite it) and for industry partners to adopt my
research.
Typical guiding questions:
▪ When reading other people’s work:
▪ Can I access their publication or is it behind a paywall?
▪ Can I use their data?
▪ When writing and publishing myself:
▪ Are my results reproducible and the data reusable?
▪ Is my paper accessible to the public (free of costly subscriptions or APC*)?
* Article Processing Charge, i.e. the cost of gold open access publication, usually ~3,000 EUR per manuscript
Image Source: https://library.hkust.edu.hk/sc/das/
As a research manager…
As a research manager, I am responsible for coordinating my research group and
facilitating their research with guidance and external funding.
Typical guiding questions:
▪ Which are topic areas of high interest (for the community, for industry partners)?
▪ How can we foster collaborations in the community?
▪ How should results be disseminated?
▪ How can we ensure proper funding and how can we align openness with funding
regulations?
▪ Industry funding: typically opposed to open science
▪ Public funding: typically in favour (“research funded by tax payers’ money should also
belong to the public”)
What do funding agencies say
about Open Science?
“[…] the ERC expects its grantees to publish in peer-reviewed articles and
monographs. The ERC considers that providing free online access to these
materials is the most effective way of ensuring that the fruits of the research
it funds can be accessed, read, and used as the basis for further research.”
▪ Open Access: “Beneficiaries must ensure open, free-of-charge access to
the end-user to peer-reviewed scientific publications relating to their
results.”
▪ Open Data: “The data management plan must address the FAIR
principles”
What do funding agencies say
about Open Science?
“[…] The results of projects financed by DFG funding should be made available to
the public. […]”
▪ Open Access: “The recipients of DFG grants are hereby requested to publish
their project results for the purpose of scholarly communication in open access
[…] without delay.”
▪ Open Data: “The long-term archiving and accessibility of research data
contributes to the traceability and quality of scientific work and enables
researchers to carry on work begun by others. [Detailed instructions in the
guideline]”
TL;DR
Open Science is becoming
the norm.
What is happening in the
Software Engineering Research Community
[Timeline figure, 2017 to 2024, spanning workshops, conferences (e.g. ICSE), and journals]
▪ ACM SIGSOFT Artefact Evaluation
▪ EMSE Open Science Policy
▪ JSS Open Science Policy
▪ Systematisation via guidelines to support authors
▪ Reflection to make Open Science mandatory
Result of our joint work: multiple guidelines
▪ Book chapters: Open Science in Software Engineering
▪ Editorial
▪ Artefact Evaluation Guidelines
▪ Review guidelines
▪ Repositories: https://github.com/emsejournal/openscience
▪ …
…
So what matters most
for Software Engineering?
In essence, we focus on 4 aspects
▪ Open Access: The published manuscript is freely accessible.
▪ Pre-registered: At least one study’s design has been
preregistered with descriptions of the research design and study
materials. Done, e.g., via conferences.
▪ Open Materials: All materials necessary to reproduce and
replicate the reported results are publicly available.
Note: Open source is typically addressed separately.
▪ Open Data: All data used in the study is publicly available. Data
should be FAIR.
Source: Center for Open Science, www.cos.io (preregistered reports: https://osf.io/prereg/)
What is FAIR data?
Research data should be…
▪ Findable: Data Objects should be uniquely and persistently identifiable.
▪ Accessible: Data Objects can always be obtained by machines and humans upon
appropriate authorisation.
▪ Interoperable: Data Objects need to be integrated with others and interoperate
with workflows.
▪ Reusable: Data Objects need to be well-described to be replicated in different
settings.
FAIR principles put emphasis on enhancing the ability of machines to automatically
find and use the data, not necessarily humans. FAIR data is not necessarily open data.
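As a rough illustration of the four FAIR properties, the following sketch models a machine-readable description of one dataset. The class `DatasetRecord`, its field names, and the example values (including the DOI and URL) are hypothetical placeholders and do not follow any particular metadata standard.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class DatasetRecord:
    """Hypothetical minimal record; one or more fields per FAIR property."""
    identifier: str   # Findable: persistent identifier, e.g. a DOI
    access_url: str   # Accessible: where machines and humans obtain the data
    file_format: str  # Interoperable: an open, documented format
    licence: str      # Reusable: what others may do with the data
    description: str  # Reusable: enough context to use it elsewhere


def unfilled_fields(record):
    """Return the names of fields that are empty or whitespace only."""
    return [name for name, value in asdict(record).items() if not value.strip()]


if __name__ == "__main__":
    # All values below are placeholders, not a real dataset.
    record = DatasetRecord(
        identifier="10.1234/example",
        access_url="https://example.org/dataset",
        file_format="text/csv",
        licence="CC-BY-4.0",
        description="",
    )
    print(json.dumps(asdict(record), indent=2))  # machine-readable form
    print(unfilled_fields(record))
```

Serialising the record to JSON reflects the emphasis on machines being able to find and use the data; the check for empty fields is only a convenience for spotting gaps before archiving.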
Different stages of “openness”
Source (badges): ACM Artifact Review and Badging. https://www.acm.org/publications/policies/artifact-review-and-badging-current
Artefact found to be documented
Artefact documentation exceeds functional quality
Reusable & permanently and publicly available
Available & results are reproduced
Reproduced & replicated independently
Source (adoption): S. Abrahão, D. Mendez. ICSE‘21 Artifact Evaluation Track – Submission and Reviewing Guideline. 10.6084/m9.figshare.14123639
Every conference has its own adoption
With everything in place, how well
are we doing?
What do you think?
Software Engineering data sharing
in general
▪ ~70% of papers at recent conferences share their data [1].
▪ ~30% of the data shared is shared via personal websites or consumer cloud
storage (e.g. Google) [1]. These will eventually disappear [2].
Source [1]: Own observation from recent conference and track organisations (e.g. REFSQ 2024)
Source [2]: https://github.com/dgraziotin/disclose-data-dbr-first-then-opendata
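One way to spot links that are likely to disappear is to look at where they point. The sketch below sorts a data link into "archival", "consumer cloud", or "unknown" based on its host name. Both host lists are assumptions chosen for illustration and are far from complete; anything not listed (personal websites, GitHub, and so on) is reported as "unknown".

```python
from urllib.parse import urlparse

# Illustrative, incomplete host lists (assumptions, not an official registry).
ARCHIVAL_HOSTS = {"zenodo.org", "figshare.com", "doi.org"}
CONSUMER_HOSTS = {"drive.google.com", "dropbox.com", "onedrive.live.com"}


def _matches(host, known_hosts):
    """True if host equals a known host or is a subdomain of one."""
    return any(host == k or host.endswith("." + k) for k in known_hosts)


def classify_link(url):
    """Classify a data link by its host name."""
    host = (urlparse(url).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    if _matches(host, ARCHIVAL_HOSTS):
        return "archival"
    if _matches(host, CONSUMER_HOSTS):
        return "consumer cloud"
    return "unknown"


if __name__ == "__main__":
    for link in (
        "https://doi.org/10.5281/zenodo.8134402",
        "https://drive.google.com/file/d/abc",
        "https://www.mendezfe.org/data.zip",
    ):
        print(link, "->", classify_link(link))
```

A host-name check says nothing about whether the data behind the link is documented or licensed; it only flags storage that is unlikely to be permanent.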
Software Engineering data sharing
via proper repositories
Source: W. Hasselbring, L. Car, S. Hettrick, H. Packer, T. Tiropanis. FAIR and Open Computer Science Research Software
▪ ~27% of the GitHub repositories mentioned in the ACM Digital
Library are in the area Software and its Engineering.
▪ 15 days is their median life span…
▪ 1/3 are live for less than one day.
Software Engineering artefacts
(open data, open material) – 1/2
Source: C. Timperley, L. Herckis, C. Le Goues, M. Hilton. Understanding and Improving Artifact Sharing in Software Engineering Research
▪ 2/3 of all papers published in
major SE conferences* shared
their artefacts
▪ Trend grew from 50.56% in 2014
to 69.47% in 2018
▪ In tune with more specialised
conferences (e.g. RE)
* ICSE, ESEC/FSE, ESEM, ASE
“Some artifacts are dangerous,
difficult, or impossible to share
due to privacy and IP concerns or
belonging to a large ecosystem”
Software Engineering artefacts
(open data, open material) – 2/2
Source: D. Graziotin. https://ineed.coffee/5836/effectiveness-of-open-science-policies-at-esec-fse-2019/
A closer look at the example of the ESEC/FSE 2019 conference:
▪ 67% of the submissions with
accompanying data available for peer review
▪ 87% of submissions promised
open data in case of acceptance
▪ 25% of data properly archived
Reasons provided by authors:
▪ Lack of motivation
▪ Simply not possible
When having so many principles
and practices, why are we still not
engaging in Open Science?
What do you think?
Challenges in Open Science
for Software Engineering
▪ Often, researchers lack the awareness:
▪ It is perceived as very time-consuming and often not seen as worthwhile.
▪ It is easy to do it wrong (e.g. wrong licence or not properly anonymising data)
▪ Even if there is awareness, it is still inherently difficult.
Our research data is often:
mixed (qualitative and quantitative)
not self-explanatory
confidential & company-specific
difficult to anonymize
*Source: B. Hermann, S. Winter, J. Siegmund. Community expectations for research artifacts and evaluation processes
We still need an increased awareness of the importance of Open Science in
Software Engineering, and proper (adopted) guidelines and tools.
A few pragmatic recommendations only
How to make data publicly available?
How to make manuscripts publicly available?
Making manuscripts publicly available
▪ Open access means public access without financial, legal, or technical
restrictions
▪ 2 options
▪ Publisher makes manuscript available (against an Article Processing Charge / APC*)
▪ Self-archiving (for free ☺)
* On average around 3,000 EUR per article
Self-archiving is something we can always do (or at least try). It is important to
comply with the publishing agreement and to make the manuscript permanently available.
Making manuscripts publicly available
1. Am I allowed to self-archive my manuscript?
Check the Sherpa Romeo database for regulations.
2. How can I self-archive my manuscript?
Do it via arXiv.org
Making data publicly available
Preparing data and making it publicly available (following FAIR principles)
includes various aspects to consider:
▪ What information is relevant and how do I need to prepare it?
▪ How do I document it?
▪ What is an appropriate licence?
▪ Where should I archive it (and how) to make it permanently available?
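The questions above can be partly checked automatically before archiving. The following minimal sketch looks for usage instructions, licence information, and citation information in the top level of a repository folder. The file names it accepts are common conventions assumed for illustration; they are not taken from the checklist mentioned in this talk.

```python
import tempfile
from pathlib import Path

# Assumed file-name conventions (illustrative only, not an official checklist).
EXPECTED_FILES = {
    "usage instructions": ("README", "README.md", "README.txt"),
    "licence information": ("LICENSE", "LICENSE.md", "LICENSE.txt", "LICENCE"),
    "citation information": ("CITATION.cff", "CITATION", "CITATION.md"),
}


def missing_items(repo_dir):
    """Return the checklist items with no matching file in repo_dir."""
    present = {p.name for p in Path(repo_dir).iterdir() if p.is_file()}
    return [
        item
        for item, names in EXPECTED_FILES.items()
        if not any(name in present for name in names)
    ]


if __name__ == "__main__":
    # Demonstration on a throwaway folder that only contains a readme.
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "README.md").write_text("How to use the data\n")
        print(missing_items(tmp))
```

Such a check only confirms that the files exist. Whether the readme is understandable, the licence appropriate, and the data properly anonymised still needs a human look.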
Making data publicly available
1. How should I prepare the data?
Use our checklist
2. Where should I archive it?
Do it via GitHub and Zenodo (or figshare)
Concluding recommended readings
▪ What is Open Science for Software Engineering and how can I do it
(pragmatically)?
https://arxiv.org/abs/1904.06499
▪ How should I disclose my artefacts?
https://doi.org/10.5281/zenodo.8134402
Open Science in Software Engineering
Open science is inherently important for any
scientific endeavour.
Software Engineering research, however, poses
various challenges that require pragmatic guidelines.
As a community, we are continuously evolving, but it
remains the responsibility of all of us (yours too!) to engage.
