Sitemap

A list of all the posts and pages found on the site. For you robots out there, there is an XML version available for digesting as well.

Pages

Posts

The Research Software Alliance (ReSA) and the Community Landscape

5 minute read

(This post is cross-posted on the UK Software Sustainability Institute blog, the Netherlands eScience Center blog and the US Research Software Sustainability Institute blog.)

ReSA’s mission is to bring research software communities together to collaborate on the advancement of research software. Its vision is to have research software recognized and valued as a fundamental and vital component of research worldwide.

Given our mission, there are multiple reasons why it’s important for us to understand the landscape of communities involved with software, in aspects such as preservation, citation, career paths, productivity, and sustainability. One of these reasons is that ReSA seeks to be a link between these communities, which requires identifying and understanding them. We want to be sure that we are not missing any significant community organizations that should be involved in our work. Identifying gaps will also help us create opportunities and communities of practice where they are needed. When thinking about these communities, it’s clear that in addition to those that focus on software, there are others for which software is only a small part of their interest. Examples include communities that focus on open science, reproducibility, roles and careers for people who are less visible in research, publishing and review, and other types of scholarly products and digital objects. ReSA also wants to define how it fits into and interacts with that broader scholarly landscape.

How was this work undertaken?

In September 2019, a ReSA taskforce, consisting of the authors of this blog post, came together to map the software community landscape. This group distributed a survey to members of the ReSA Google Group to identify other groups interested in software. Other useful sources included:

The taskforce then met to consider the results and how to analyze them. The ReSA list of research software communities is now publicly available as a living community resource. The version of the list used by the taskforce in February 2020 and a copy of this post are archived in Zenodo. Suggested additions or corrections are welcome as comments on the list.

Some of the issues we’ve had in assembling this list are:
  • How much interest in software does an organization need to have to be listed?
  • When is an organization sufficiently research focused to be included?
  • What momentum/scale does an organization need to have so that we consider it relevant in the global picture?
On the other hand, once we started adding entries to the list, many of them immediately brought to mind similar organizations that should also be added. For example, some organizations have a geographic aspect, which led us to think of similar organizations in other regions, such as all the national and regional RSE associations.

What did we learn?

The analysis produced a range of interesting outcomes:
  • There are many, many communities that support research software, emphasizing the need for a coordinating organization such as ReSA. The importance of community development is captured in articles such as Community Organizations: Changing the Culture in Which Research Software is Developed and Sustained by Daniel S. Katz et al., which provides an overview of key groups and discusses opportunities to leverage their synergistic activities.
  • There is a wide and growing range of community initiatives. For example, the Open Science Grassroots Community Networks list has evolved into the Community of Open Scholarship Grassroots Networks (COSGN), whose networks communicate and coordinate on topics of common interest. COSGN has submitted an NSF proposal to formalize governance and coordination of the networks, to maximize impact and establish standard practices for sustainability.
  • The increasing focus on open software makes it hard to separate research and non-research initiatives. As the points above illustrate, it is very hard to define which initiatives are part of the research software community and which aren’t.
  • Some organizations that were originally data-centric now include a software focus. For example, the Research Data Alliance now includes the Software Source Code Interest Group, which provides a forum to discuss issues on management, sharing, discovery, archiving, and provenance of software source code.

What are the next steps?

We invite readers to continue adding to or correcting the ReSA list of research software communities by commenting on the list, which will continue to be curated by ReSA. We are also interested in hearing from community members who would like to work with us on a landscape paper based on further analysis. This paper could address questions such as: what axes define the space, where do the currently known organizations fit within it, and are there gaps where no organization is currently working? We also invite readers to consider getting involved in other ReSA activities, including taskforces.

Conclusion

The ever-growing number of constituents of the research software community both reflects and demonstrates the increasing recognition of research software. The research software community is now a complex ecosystem composed of a wide variety of organizations and initiatives, some of which are community networks themselves. Collaboration and coordination across these initiatives are important because they enable the broader community to work together towards bigger goals. ReSA aims to coordinate across these efforts to leverage investments and achieve the shared long-term goal of research software being valued as a fundamental and vital component of research worldwide. Join the ReSA Google Group to stay up to date on our activities.

portfolio

ExPaNDS

The ambitious ExPaNDS project is a collaboration between 10 national Photon and Neutron Research Infrastructures (PaN RIs) as well as EGI. The project aims to deliver standardised, interoperable, and integrated data sources and data analysis services for Photon and Neutron facilities.

publications

The FAIR Guiding Principles for scientific data management and stewardship

Published in Scientific Data, 2016

This is the first formalisation of the FAIR guiding principles for data management and stewardship, which aim to make data Findable, Accessible, Interoperable and Reusable (FAIR).

Recommended citation: Wilkinson, Mark D. and Dumontier, Michel and Aalbersberg, IJsbrand Jan and Appleton, Gabrielle and Axton, Myles and Baak, Arie and Blomberg, Niklas and Boiten, Jan-Willem and da Silva Santos, Luiz Bonino and Bourne, Philip E. and Bouwman, Jildau and Brookes, Anthony J. and Clark, Tim and Crosas, Mercè and Dillo, Ingrid and Dumon, Olivier and Edmunds, Scott and Evelo, Chris T. and Finkers, Richard and Gonzalez-Beltran, Alejandra and Gray, Alasdair J. G. and Groth, Paul and Goble, Carole and Grethe, Jeffrey S. and Heringa, Jaap and ’t Hoen, Peter A. C and Hooft, Rob and Kuhn, Tobias and Kok, Ruben and Kok, Joost and Lusher, Scott J. and Martone, Maryann E. and Mons, Albert and Packer, Abel L. and Persson, Bengt and Rocca-Serra, Philippe and Roos, Marco and van Schaik, Rene and Sansone, Susanna-Assunta and Schultes, Erik and Sengstag, Thierry and Slater, Ted and Strawn, George and Swertz, Morris A. and Thompson, Mark and van der Lei, Johan and van Mulligen, Erik and Velterop, Jan and Waagmeester, Andra and Wittenburg, Peter and Wolstencroft, Katherine and Zhao, Jun and Mons, Barend. "The FAIR Guiding Principles for scientific data management and stewardship", Scientific Data, https://doi.org/10.1038/sdata.2016.18

Data discovery with DATS: exemplar adoptions and lessons learned

Published in Journal of the American Medical Informatics Association, 2017

This paper analyses the implementation of the DATS model for data discovery in a set of exemplar data sources.

Recommended citation: Alejandra N Gonzalez-Beltran, John Campbell, Patrick Dunn, Diana Guijarro, Sanda Ionescu, Hyeoneui Kim, Jared Lyle, Jeffrey Wiser, Susanna-Assunta Sansone, Philippe Rocca-Serra. "Data discovery with DATS: exemplar adoptions and lessons learned" Journal of the American Medical Informatics Association, Volume 25, Issue 1, 1 January 2018, Pages 13–16, https://doi.org/10.1093/jamia/ocx119

PhenoMeNal: processing and analysis of metabolomics data in the cloud

Published in GigaScience, 2018

This paper presents PhenoMeNal, a cloud e-infrastructure solution for analysing metabolomics data. It provides easy-to-use web interfaces and can be scaled to any custom public or private cloud environment.

Recommended citation: Kristian Peters, James Bradbury, Sven Bergmann, Marco Capuccini, Marta Cascante, Pedro de Atauri, Timothy M D Ebbels, Carles Foguet, Robert Glen, Alejandra Gonzalez-Beltran, Ulrich L Günther, Evangelos Handakas, Thomas Hankemeier, Kenneth Haug, Stephanie Herman, Petr Holub, Massimiliano Izzo, Daniel Jacob, David Johnson, Fabien Jourdan, Namrata Kale, Ibrahim Karaman, Bita Khalili, Payam Emami Khonsari, Kim Kultima, Samuel Lampa, Anders Larsson, Christian Ludwig, Pablo Moreno, Steffen Neumann, Jon Ander Novella, Claire O'Donovan, Jake T M Pearce, Alina Peluso, Marco Enrico Piras, Luca Pireddu, Michelle A C Reed, Philippe Rocca-Serra, Pierrick Roger, Antonio Rosato, Rico Rueedi, Christoph Ruttkies, Noureddin Sadawi, Reza M Salek, Susanna-Assunta Sansone, Vitaly Selivanov, Ola Spjuth, Daniel Schober, Etienne A Thévenot, Mattia Tomasoni, Merlijn van Rijswijk, Michael van Vliet, Mark R Viant, Ralf J M Weber, Gianluigi Zanetti, Christoph Steinbeck; PhenoMeNal: processing and analysis of metabolomics data in the cloud, GigaScience, Volume 8, Issue 2, 1 February 2019, giy149, https://doi.org/10.1093/gigascience/giy149

Discovering Data Access and Use Requirements Using the Data Tag Suite (DATS)

Published in bioRxiv, 2019

This paper is about the representation of data access and data use requirements for the Data Tag Suite (DATS) model.

Recommended citation: George Alter, Alejandra Gonzalez-Beltran, Lucila Ohno-Machado, Philippe Rocca-Serra. "Discovering Data Access and Use Requirements Using the Data Tag Suite (DATS) Model". bioRxiv 518571; doi: https://doi.org/10.1101/518571

Interoperable and scalable data analysis with microservices: applications in metabolomics

Published in Bioinformatics, 2019

Developing a robust and performant data analysis workflow that integrates all necessary components whilst still being able to scale over multiple compute nodes is a challenging task. We introduce a generic method based on the microservice architecture, where software tools are encapsulated as Docker containers that can be connected into scientific workflows and executed using the Kubernetes container orchestrator.

Recommended citation: Payam Emami Khoonsari, Pablo Moreno, Sven Bergmann, Joachim Burman, Marco Capuccini, Matteo Carone, Marta Cascante, Pedro de Atauri, Carles Foguet, Alejandra N Gonzalez-Beltran, Thomas Hankemeier, Kenneth Haug, Sijin He, Stephanie Herman, David Johnson, Namrata Kale, Anders Larsson, Steffen Neumann, Kristian Peters, Luca Pireddu, Philippe Rocca-Serra, Pierrick Roger, Rico Rueedi, Christoph Ruttkies, Noureddin Sadawi, Reza M Salek, Susanna-Assunta Sansone, Daniel Schober, Vitaly Selivanov, Etienne A Thévenot, Michael van Vliet, Gianluigi Zanetti, Christoph Steinbeck, Kim Kultima, Ola Spjuth, Interoperable and scalable data analysis with microservices: applications in metabolomics, Bioinformatics, btz160, https://doi.org/10.1093/bioinformatics/btz160

FAIRsharing as a community approach to standards, repositories and policies

Published in Nature Biotechnology, 2019

Recommended citation: Susanna-Assunta Sansone, Peter McQuilton, Philippe Rocca-Serra, Alejandra Gonzalez-Beltran, Massimiliano Izzo, Allyson L. Lister, Milo Thurston & the FAIRsharing Community. "FAIRsharing as a community approach to standards, repositories and policies". Nat Biotechnol. 2019 Apr;37(4):358-367. doi: https://doi.org/10.1038/s41587-019-0080-8

Software Citation Checklist for Authors

Published in Zenodo, 2019

This document provides a simple, generic checklist that authors of academic work (papers, books, conference abstracts, blog posts, etc.) can use to ensure they follow good practice when referencing and citing the software they have used, whether they created it themselves for their research or obtained it from other sources. Journal editors, publishers and conference chairs may also use and adapt it as the basis for more specific guidance for their contributors and reviewers.

Recommended citation: Chue Hong, Neil P., Allen, Alice, Gonzalez-Beltran, Alejandra, de Waard, Anita, Smith, Arfon M., Robinson, Carly, … Pollard, Tom. (2019, October 15). Software Citation Checklist for Authors (Version 0.9.0). Zenodo. https://doi.org/10.5281/zenodo.3479199

Software Citation Checklist for Developers

Published in Zenodo, 2019

This document provides a minimal, generic checklist that developers of software (either open or closed source) used in research can use to ensure they are following good practice around software citation. This will help developers get credit for the software they create, and improve transparency, reproducibility, and reuse.

Recommended citation: Chue Hong, Neil P., Allen, Alice, Gonzalez-Beltran, Alejandra, de Waard, Anita, Smith, Arfon M., Robinson, Carly, Jones, Catherine, Bouquin, Daina, Katz, Daniel S., Kennedy, David, Ryder, Gerry, Hausman, Jessica, Hwang, Lorraine, Jones, Matthew B., Harrison, Melissa, Crosas, Mercè, Wu, Mingfang, Löwe, Peter, Haines, Robert, … Pollard, Tom. (2019). Software Citation Checklist for Developers (0.9.0). Zenodo. https://doi.org/10.5281/zenodo.3482769

Special Issue on Scholarly Data Analysis (Semantics, Analytics, Visualisation)

Published in Data Science, 2019

The increasing interest in analysing, describing, and improving the research process requires the development of new forms of scholarly data publication and analysis that integrate lessons and approaches from the fields of Semantic Technologies, Science of Science, Digital Libraries, and Artificial Intelligence. This editorial summarises the content of the Special Issue on Scholarly Data Analysis (Semantics, Analytics, Visualisation), which aims to showcase some of the most interesting research efforts in the field. The issue includes extended versions of the best papers from the last two editions of the “Semantics, Analytics, Visualisation: Enhancing Scholarly Dissemination” workshop (SAVE-SD 2017 and 2018) at The Web Conference.

Recommended citation: https://content.iospress.com/journals/data-science/2/1-2

The Data Tags Suite (DATS) model for discovering data access and use requirements

Published in GigaScience, 2020

This paper is about the representation of data access and data use requirements for the Data Tag Suite (DATS) model.

Recommended citation: George Alter, Alejandra Gonzalez-Beltran, Lucila Ohno-Machado, Philippe Rocca-Serra, The Data Tags Suite (DATS) model for discovering data access and use requirements, GigaScience, Volume 9, Issue 2, February 2020, giz165, https://doi.org/10.1093/gigascience/giz165

COPO: a metadata platform for brokering FAIR data in the life sciences

Published in F1000Research, 2020

COPO is a computational system that attempts to address some of the challenges of sharing FAIR data in the life sciences. It enables scientists to describe their research objects (raw or processed data, publications, samples, images, etc.) using community-sanctioned metadata sets and vocabularies. They can then use public or institutional repositories to share these objects with the wider scientific community.

Recommended citation: Shaw F, Etuk A, Minotto A et al. COPO: a metadata platform for brokering FAIR data in the life sciences [version 1; peer review: awaiting peer review]. F1000Research 2020, 9:495 https://doi.org/10.12688/f1000research.23889.1

Draft extended data policy framework for Photon and Neutron RIs

Published in Zenodo, 2020

We review the FAIR data policy landscape at European and national levels, consider the current state of data policy adoption and implementation at ExPaNDS partner facilities, and examine existing FAIR ecosystem data policy recommendations, in particular those from the Turning FAIR into Reality report and the recent FAIRsFAIR Deliverable 3.3: Policy enhancement recommendations. In response, we make twenty-six recommendations of our own that translate these recommendations to the local level of photon and neutron research infrastructures.

Recommended citation: Matthews, Brian, McBirnie, Abigail, Vukolov, Andrei, Ashton, Alun, Collins, Stephen, Da Graca Ramos, Sylvie, Gagey, Brigitte, Gonzalez-Beltran, Alejandra, Johnsson, Maria, Krahl, Rolf, Ounsy, Majid and Van Daalen, Mirjam 2020. Draft extended data policy framework for Photon and Neutron RIs. Zenodo https://doi.org/10.5281/zenodo.4014811

Report on status, gap analysis and roadmap towards harmonised and federated metadata catalogues for EU national Photon and Neutron RIs

Published in Zenodo, 2020

The ExPaNDS project aims to deploy data catalogues and data analysis services into EOSC. This document describes the status, a gap analysis, and the roadmap required to achieve harmonised and federated (meta)data catalogues within EOSC for the participating national Photon and Neutron (PaN) Research Infrastructures (RIs).

Recommended citation: Ashton, Alun, Da Graca Ramos, Sylvie & Gonzalez-Beltran, Alejandra. Report on status, gap analysis and roadmap towards harmonised and federated metadata catalogues for EU national Photon and Neutron RIs. (Zenodo, 2020). doi: https://doi.org/10.5281/zenodo.4146819

Fostering global data sharing: highlighting the recommendations of the Research Data Alliance COVID-19 working group

Published in Wellcome Open Research, 2020

The systemic challenges of the COVID-19 pandemic require cross-disciplinary collaboration in a global and timely fashion. Such collaboration needs open research practices and the sharing of research outputs, such as data and code, thereby facilitating research and research reproducibility and timely collaboration beyond borders.

Recommended citation: Austin CC, Bernier A, Bezuidenhout L et al. Fostering global data sharing: highlighting the recommendations of the Research Data Alliance COVID-19 working group [version 2; peer review: 1 approved, 2 approved with reservations]. Wellcome Open Res 2021, 5:267 (https://doi.org/10.12688/wellcomeopenres.16378.2)

Ten Simple Rules for making a vocabulary FAIR

Published in arXiv, 2020

We present ten simple rules that support converting a legacy vocabulary – a list of terms available in a print-based glossary or table not accessible using web standards – into a FAIR vocabulary. Various pathways may be followed to publish the FAIR vocabulary, but we emphasise particularly the goal of providing a distinct IRI for each term or concept. A standard representation of the concept should be returned when the individual IRI is de-referenced, using SKOS or OWL serialised in an RDF-based representation for machine-interchange, or in a web-page for human consumption. Guidelines for vocabulary and item metadata are provided, as well as development and maintenance considerations. By following these rules you can achieve the outcome of converting a legacy vocabulary into a standalone FAIR vocabulary, which can be used for unambiguous data annotation. In turn, this increases data interoperability and enables data integration.

Recommended citation: Simon J D Cox, Alejandra N Gonzalez-Beltran, Barbara Magagna, Maria-Cristina Marinescu. "Ten Simple Rules for making a vocabulary FAIR" https://arxiv.org/abs/2012.02325

ExPaNDS ontologies v1.0

Published in Zenodo, 2021

We present ontologies for the domain of photon and neutron (PaN) science. With the primary goal of supporting PaN FAIR data catalogue services, we have developed three ontologies: PaN experimental techniques (PaNET), an ontology of NeXus definitions (NeXusOntology), and a semantic integration ontology for the PaN domain (PaNmapping). The ontologies are presented as initial versions, supported by community development workflows. The work represents deliverable D3.2 of the Horizon 2020 ExPaNDS project.

Recommended citation: Collins, Steve P., da Graça Ramos, Silvia, Iyayi, Daniel, Görzig, Heike, González Beltrán, Alejandra, Ashton, Alun, Egli, Stefan, and Minotti, Carlo, 2021, ExPaNDS ontologies v1.0: Zenodo, doi:10.5281/zenodo.4806026. https://doi.org/10.5281/zenodo.4806026

Radical collaboration during a global health emergency: development of the RDA COVID-19 data sharing recommendations and guidelines

Published in Open Research Europe, 2021

The purpose of the present work was to explore how the RDA succeeded in engaging the participation of its community of scientists in a rapid response to the EC request. The three constructs of radical collaboration (inclusiveness, distributed digital practices, productive and sustainable collaboration) were found to be well supported in both the quantitative and qualitative analyses of the survey data. Other social factors, such as motivation and group identity, were also found to be important to the success of this extreme collaborative effort.

Recommended citation: Pickering B, Biro T, Austin CC et al. Radical collaboration during a global health emergency: development of the RDA COVID-19 data sharing recommendations and guidelines [version 1; peer review: awaiting peer review]. Open Research Europe 2021, 1:69 (https://doi.org/10.12688/openreseurope.13369.1)

Ten Simple Rules for making a vocabulary FAIR

Published in PLoS Computational Biology, 2021

We present ten simple rules that support converting a legacy vocabulary—a list of terms available in a print-based glossary or in a table not accessible using web standards—into a FAIR vocabulary. Various pathways may be followed to publish the FAIR vocabulary, but we emphasise particularly the goal of providing a globally unique resolvable identifier for each term or concept. A standard representation of the concept should be returned when the individual web identifier is resolved, using SKOS or OWL serialised in an RDF-based representation for machine-interchange and in a web-page for human consumption. Guidelines for vocabulary and term metadata are provided, as well as development and maintenance considerations. The rules are arranged as a stepwise recipe for creating a FAIR vocabulary based on the legacy vocabulary. By following these rules you can achieve the outcome of converting a legacy vocabulary into a standalone FAIR vocabulary, which can be used for unambiguous data annotation. In turn, this increases data interoperability and enables data integration.

Recommended citation: Cox SJD, Gonzalez-Beltran AN, Magagna B, Marinescu MC (2021) Ten simple rules for making a vocabulary FAIR. PLOS Computational Biology 17(6): e1009041. https://doi.org/10.1371/journal.pcbi.1009041
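As a rough illustration of the central idea (one distinct identifier per term, with a SKOS representation for machines), here is a minimal sketch that turns a small legacy glossary into SKOS concepts serialised as Turtle. It is not taken from the paper. The base IRI `https://example.org/vocab/`, the sample entries, and the helper names (`slugify`, `turtle_literal`, `glossary_to_skos`) are hypothetical. The sketch also leaves out the metadata, versioning and dereferencing infrastructure that the rules call for.

```python
import re

# Hypothetical base IRI: a real vocabulary needs a persistent, resolvable namespace.
BASE_IRI = "https://example.org/vocab/"

# Hypothetical legacy glossary entries as (label, definition) pairs.
GLOSSARY = [
    ("Sample holder", "A device that keeps a sample in place during a measurement."),
    ("Exposure time", "The duration for which a detector collects signal."),
]


def slugify(label: str) -> str:
    """Derive a local name from a label, e.g. 'Sample holder' -> 'sample-holder'."""
    return re.sub(r"[^a-z0-9]+", "-", label.lower()).strip("-")


def turtle_literal(text: str) -> str:
    """Quote a string as an English-tagged Turtle literal, escaping backslashes, quotes and newlines."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"@en'


def glossary_to_skos(glossary, base_iri: str) -> str:
    """Serialise each glossary entry as a skos:Concept with its own IRI in one concept scheme."""
    scheme = f"<{base_iri}>"
    lines = [
        "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .",
        "",
        f"{scheme} a skos:ConceptScheme .",
    ]
    for label, definition in glossary:
        concept = f"<{base_iri}{slugify(label)}>"
        lines += [
            "",
            f"{concept} a skos:Concept ;",
            f"    skos:prefLabel {turtle_literal(label)} ;",
            f"    skos:definition {turtle_literal(definition)} ;",
            f"    skos:inScheme {scheme} .",
        ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(glossary_to_skos(GLOSSARY, BASE_IRI))
```

This sketch derives local names from labels, which keeps IRIs readable but ties them to the current wording. Opaque codes are a common alternative because they survive relabelling.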

ExPaNDS Metadata Catalogue Release

Published in Zenodo, 2021

This document presents the milestone achieved for a metadata catalogue release in the domain of photon and neutron (PaN) science. With the primary goal of supporting PaN FAIR data catalogue services, we have developed a self-contained, stand-alone metadata catalogue release that facilities can download to test and experiment with.

Recommended citation: Minotti, Carlo, da Graca Ramos, Silvia, Ashton, Alun, Egli, Stephan, Bolmsten, Fredrik, Johansson, Henrik, Novelli, Massimiliano, Gonzalez-Beltran, Alejandra, & Pullinger, Stuart. (2021). ExPaNDS Metadata Catalogue Release. Zenodo. https://doi.org/10.5281/zenodo.5205909

FAIR Data Pipeline: provenance-driven data management for traceable scientific workflows

Published in arXiv, 2021

Modern epidemiological analyses to understand and combat the spread of disease depend critically on access to, and use of, data. Rapidly evolving data, such as data streams changing during a disease outbreak, are particularly challenging. Data management is further complicated by data being imprecisely identified when used. Public trust in policy decisions resulting from such analyses is easily damaged and is often low, with cynicism arising where claims of “following the science” are made without accompanying evidence. Tracing the provenance of such decisions back through open software to primary data would clarify this evidence, enhancing the transparency of the decision-making process. Here, we demonstrate a Findable, Accessible, Interoperable and Reusable (FAIR) data pipeline developed during the COVID-19 pandemic that allows easy annotation of data as they are consumed by analyses, while tracing the provenance of scientific outputs back through the analytical source code to data sources. Such a tool provides a mechanism for the public, and fellow scientists, to better assess the trust that should be placed in scientific evidence, while allowing scientists to support policy-makers in openly justifying their decisions. We believe that tools such as this should be promoted for use across all areas of policy-facing research.

Recommended citation: Sonia Natalie Mitchell, Andrew Lahiff, Nathan Cummings, Jonathan Hollocombe, Bram Boskamp, Dennis Reddyhoff, Ryan Field, Kristian Zarebski, Antony Wilson, Martin Burke, Blair Archibald, Paul Bessell, Richard Blackwell, Lisa A Boden, Alys Brett, Sam Brett, Ruth Dundas, Jessica Enright, Alejandra N. Gonzalez-Beltran, Claire Harris, Ian Hinder, Christopher David Hughes, Martin Knight, Vino Mano, Ciaran McMonagle, Dominic Mellor, Sibylle Mohr, Glenn Marion, Louise Matthews, Iain J. McKendrick, Christopher Mark Pooley, Thibaud Porphyre, Aaron Reeves, Edward Townsend, Robert Turner, Jeremy Walton, Richard Reeve. "FAIR Data Pipeline: provenance-driven data management for traceable scientific workflows" https://arxiv.org/abs/2110.07117

FAIR Data Pipeline: provenance-driven data management for traceable scientific workflows

Published in Phil. Trans. R. Soc. A, 2022

Modern epidemiological analyses to understand and combat the spread of disease depend critically on access to, and use of, data. Rapidly evolving data, such as data streams changing during a disease outbreak, are particularly challenging. Data management is further complicated by data being imprecisely identified when used. Public trust in policy decisions resulting from such analyses is easily damaged and is often low, with cynicism arising where claims of ‘following the science’ are made without accompanying evidence. Tracing the provenance of such decisions back through open software to primary data would clarify this evidence, enhancing the transparency of the decision-making process. Here, we demonstrate a Findable, Accessible, Interoperable and Reusable (FAIR) data pipeline. Although developed during the COVID-19 pandemic, it allows easy annotation of any data as they are consumed by analyses, or conversely traces the provenance of scientific outputs back through the analytical or modelling source code to primary data. Such a tool provides a mechanism for the public, and fellow scientists, to better assess scientific evidence by inspecting its provenance, while allowing scientists to support policymakers in openly justifying their decisions. We believe that such tools should be promoted for use across all areas of policy-facing research. This article is part of the theme issue ‘Technical challenges of modelling real-life epidemics and examples of overcoming these’.

Recommended citation: Sonia Natalie Mitchell, Andrew Lahiff, Nathan Cummings, Jonathan Hollocombe, Bram Boskamp, Dennis Reddyhoff, Ryan Field, Kristian Zarebski, Antony Wilson, Martin Burke, Blair Archibald, Paul Bessell, Richard Blackwell, Lisa A Boden, Alys Brett, Sam Brett, Ruth Dundas, Jessica Enright, Alejandra N. Gonzalez-Beltran, Claire Harris, Ian Hinder, Christopher David Hughes, Martin Knight, Vino Mano, Ciaran McMonagle, Dominic Mellor, Sibylle Mohr, Glenn Marion, Louise Matthews, Iain J. McKendrick, Christopher Mark Pooley, Thibaud Porphyre, Aaron Reeves, Edward Townsend, Robert Turner, Jeremy Walton, Richard Reeve. "FAIR Data Pipeline: provenance-driven data management for traceable scientific workflows", Phil. Trans. R. Soc. A (2022). https://doi.org/10.1098/rsta.2021.0300

The Trail from Data to Policy in the COVID-19 Pandemic and Beyond

Published in Scientific Computing Department, Annual Review 2021-22, 2022

COVID-19 was an event that marked a ‘before’ and ‘after’ in our lives. We will all remember the days when daily statistics on positive cases, hospitalisations and fatalities were reported in each region of the world, providing metrics on the looming situation. And some of those statistics will continue to be collected for the foreseeable future.

Recommended citation: https://www.scd.stfc.ac.uk/SiteAssets/SCD%20Annual%20Review%202021-2022.pdf

Machine actionable metadata models

Published in Scientific Data, 2022

Community-developed minimum information checklists are designed to drive the rich and consistent reporting of metadata, underpinning the reproducibility and reuse of the data. These reporting guidelines, however, are usually in the form of narratives intended for human consumption. Modular and reusable machine-readable versions are also needed for two reasons. First, they provide the necessary quantitative and verifiable measures of the degree to which metadata descriptors meet these community requirements, as the FAIR Principles require. Second, they encourage the creation of standards-driven templates for metadata authoring, especially when describing complex experiments that require multiple reporting guidelines to be used in combination or extended. We present new functionalities to support the creation and improvement of machine-readable models. We apply the approach to an exemplar set of reporting guidelines in the life sciences and discuss the challenges. Our work is targeted at developers of standards and those familiar with standards. It promotes the concept of compositional metadata elements and encourages the creation of community standards that are modular and interoperable from the outset.

Recommended citation: Batista, D., Gonzalez-Beltran, A., Sansone, SA. et al. Machine actionable metadata models. Sci Data 9, 592 (2022). https://doi.org/10.1038/s41597-022-01707-6
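The sketch below makes two ideas concrete: a "quantitative and verifiable measure" of how well a metadata record meets a checklist, and the notion of composing guidelines. It scores a record against a machine-readable checklist and merges two checklists into one. The checklist structure, the field names, the scoring rule and the functions `compliance` and `combine` are all hypothetical illustrations. They are not the models or tooling described in the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Requirement:
    """One item of a hypothetical machine-readable reporting checklist."""
    field: str
    mandatory: bool = True


# Hypothetical checklist; real reporting guidelines define their own fields.
CHECKLIST = [
    Requirement("organism"),
    Requirement("sample_type"),
    Requirement("collection_date"),
    Requirement("protocol_reference", mandatory=False),
]


def compliance(record: dict, checklist: list) -> tuple:
    """Return (fraction of mandatory fields filled in, list of missing mandatory fields)."""
    mandatory = [req.field for req in checklist if req.mandatory]
    missing = [name for name in mandatory if record.get(name) in (None, "")]
    if not mandatory:
        return 1.0, missing
    return (len(mandatory) - len(missing)) / len(mandatory), missing


def combine(*checklists):
    """Compose several checklists into one, keeping the first definition of each field."""
    merged = {}
    for checklist in checklists:
        for req in checklist:
            merged.setdefault(req.field, req)
    return list(merged.values())


if __name__ == "__main__":
    record = {"organism": "Homo sapiens", "sample_type": "blood", "collection_date": ""}
    print(compliance(record, CHECKLIST))
```

In the usage example, the record fills two of the three mandatory fields, so it scores 2/3 and `collection_date` is reported as missing. The optional `protocol_reference` does not affect the score.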

FAIR-IMPACT M5.3 Semantic artefact assessment methodology

Published in Zenodo, 2023

Semantic artefacts (i.e., ontologies, vocabularies and SKOS taxonomies, among others) define the structure, guide the construction of, and help validate many existing Knowledge Graphs. In the last years, a number of guidelines have been proposed (Poveda-Villalón et al. 2020; Garijo and Poveda-Villalón 2020; Hugo et al. 2020; Le Franc et al., 2022; Xu et al. 2023) to align semantic artefact best practices against the Findable, Accessible, Interoperable and Reusable principles (FAIR principles) (Wilkinson et al. 2016). Based on these guidelines, new validators and assistants have been developed (Garijo et al. 2021; Amdouni et al. 2022a; 2022b) in order to guide users assessing their own semantic artefacts against the FAIR principles. However, different tests are based on different interpretations of the FAIR principles, resulting in different scores and checks for semantic artefacts. To the best of our knowledge, there is no generic methodology grouping the types of tests to perform in semantic artefacts, in order to map existing assessment efforts in a consistent manner. In this document, we propose such a methodology. We do so by taking an ontology development perspective, dividing semantic artefacts into smaller parts (their code, content, ontology metadata, etc.) that can be individually assessed at different stages of their development process. We build on the Linked Open Terms (LOT) methodology (Poveda-Villalón et al. 2022), adding a “FAIR assessment” module, and, for each activity, we validate our approach by mapping to two existing semantic artefact FAIR assessment validators: FOOPS! (Garijo et al. 2021) and O’FAIRe (Amdouni et al. 2022a; 2022b). The rest of the document outlines our methodology, describes each step in detail, and maps it to existing FAIR principles and guidelines. Read more

Recommended citation: Garijo, Daniel, Poveda-Villalón, María, Flohr, Pascal, Gonzalez-Beltran, Alejandra, Le Franc, Yann, & Verburg, Maaike. (2023). M5.3 Semantic artefact assessment methodology (Version 1). Zenodo. https://doi.org/10.5281/zenodo.8305173

The W3C Data Catalog Vocabulary, Version 2: Rationale, Design Principles, and Uptake

Published in Data Intelligence, 2023

DCAT is an RDF vocabulary designed to facilitate interoperability between data catalogs published on the Web. Since its first release in 2014 as a W3C Recommendation, DCAT has seen wide adoption across communities and domains, particularly in conjunction with implementing the FAIR data principles (for findable, accessible, interoperable and reusable data). These implementation experiences, besides demonstrating the fitness of DCAT to meet its intended purpose, helped identify existing issues and gaps. Moreover, additional requirements for data catalogs have emerged over the last few years, driven by the increasing practice of documenting not only datasets but also data services and APIs. This paper presents the new version of DCAT, explains the rationale behind its main revisions and extensions based on the collected use cases and requirements, and outlines the issues yet to be addressed in future versions of DCAT.

Recommended citation: Riccardo Albertoni, David Browning, Simon Cox, Alejandra N. Gonzalez-Beltran, Andrea Perego, Peter Winstanley; The W3C Data Catalog Vocabulary, Version 2: Rationale, Design Principles, and Uptake. Data Intelligence 2023; doi: https://doi.org/10.1162/dint_a_00241
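A central extension in DCAT version 2 is the ability to describe data services (such as APIs) alongside datasets. As a minimal sketch, not taken from the paper, the Python snippet below builds a small DCAT 2 catalog as a JSON-LD dictionary using only the standard library: a `dcat:Catalog` lists one `dcat:Dataset` and one `dcat:DataService`, the service points back to the dataset through `dcat:servesDataset`, and the dataset's distribution links to the service through `dcat:accessService`. The `https://example.org` identifiers, the titles and the `build_catalog` helper are hypothetical placeholders.

```python
import json

# JSON-LD context mapping short prefixes to the DCAT and Dublin Core Terms namespaces.
CONTEXT = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
}


def build_catalog(base: str) -> dict:
    """Return a minimal DCAT 2 catalog (dataset + data service) as a JSON-LD dict.

    All identifiers under `base` and all titles are hypothetical placeholders.
    """
    dataset_id = f"{base}/dataset/1"
    service_id = f"{base}/api"

    # A data service (new class in DCAT 2) that serves the dataset via an endpoint.
    service = {
        "@id": service_id,
        "@type": "dcat:DataService",
        "dct:title": "Example API",
        "dcat:endpointURL": {"@id": f"{service_id}/v1"},
        "dcat:servesDataset": {"@id": dataset_id},
    }

    # A dataset whose distribution is accessed through the service above.
    dataset = {
        "@id": dataset_id,
        "@type": "dcat:Dataset",
        "dct:title": "Example dataset",
        "dcat:distribution": {
            "@id": f"{dataset_id}/distribution",
            "@type": "dcat:Distribution",
            "dcat:accessURL": {"@id": service_id},
            "dcat:accessService": {"@id": service_id},
        },
    }

    # The catalog lists both the dataset and the service.
    return {
        "@context": CONTEXT,
        "@id": f"{base}/catalog",
        "@type": "dcat:Catalog",
        "dct:title": "Example catalog",
        "dcat:dataset": [dataset],
        "dcat:service": [service],
    }


if __name__ == "__main__":
    print(json.dumps(build_catalog("https://example.org"), indent=2))
```

Serialising the result with `json.dumps` yields JSON-LD that an RDF toolkit could load; a real catalogue would typically also describe licences, publishers and other metadata for each resource.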

service

talks

What was the plan? A role for data standards, models and computational workflows in scholarly data publishing


This talk explores how principles derived from experimental design practice, data and computational models can greatly enhance data quality, data generation, data reporting, data publication and data review. To illustrate this, I presented a case study on reproducibility that was a collaboration between the GigaScience journal and the ISA-commons, Research Object and Nanopublication communities. You can read more about my presentation in Scott Edmunds’ blog post for the GigaScience journal, and my slides can be found below.

EBI Metagenomics Bioinformatics Course


Within the Metagenomics Bioinformatics Course, Eamonn Maguire and I gave a tutorial on “Metagenomic Data Provenance and Management using the ISA infrastructure — overview, implementation patterns & software tools”.

Better software + better data = better research


In April 2019, I gave this talk at CIFASIS (Centro Internacional Franco Argentino de Ciencias de la Información y de Sistemas), a research institute of the Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Argentina's National Scientific and Technical Research Council.

Research Reproducibility


I was invited to give a talk on “Research Reproducibility” during the “Research Day” of the EPSRC- and MRC-funded Oxford-Nottingham Centre for Doctoral Training in Biomedical Imaging (ONBI CDT).

Research object citation and cataloguing


Gemma Poulter and I were invited to speak at the CoSeC@CIUK session on 9th December 2021 about “Research object citation and cataloguing”. This session was organised by the Computational Science Centre for Research Communities (CoSeC) at the Computing Insight UK 2021 (CIUK2021) conference.

Towards FAIR research data management


I was invited to present at the CECAM workshop on “Machine actionable data for chemical sciences: Bridging experiments, simulations, and machine learning for spectral data” (MADICES), which took place online from 7th to 9th February 2022.

The Data Catalog Vocabulary (DCAT)


Peter Winstanley from Semantic Arts and I were invited to give a talk for the FAIRsFAIR webinar on “Using DCAT (Data Catalogue Vocabulary) to support metadata catalogue integration”. We gave an introduction to the DCAT vocabulary, described its evolution and gave an overview of the initial steps to implement it.

FAIR data pipeline


The Software Sustainability Institute Fellows Community meets regularly to share activities and opportunities and to discuss topics of interest.

teaching

EBI Metagenomics Bioinformatics Course

Training, EMBL-EBI, 2014

Within the Metagenomics Bioinformatics Course, Eamonn Maguire and I gave a tutorial on “Metagenomic Data Provenance and Management using the ISA infrastructure — overview, implementation patterns & software tools”.