(This post is cross-posted on the UK Software Sustainability Institute blog, the Netherlands eScience Center blog and the US Research Software Sustainability Institute blog.)

ReSA’s mission is to bring research software communities together to collaborate on the advancement of research software. Its vision is to have research software recognized and valued as a fundamental and vital component of research worldwide.

Given our mission, there are multiple reasons why it’s important for us to understand the landscape of communities that are involved with software, in aspects such as preservation, citation, career paths, productivity, and sustainability. One of these reasons is that ReSA seeks to be a link between these communities, which requires identifying and understanding them. We want to be sure that we are not overlooking significant community organizations that we should involve in our work. Identifying where there are gaps will also help us create opportunities and communities of practice as required.

When thinking about these communities, it’s clear that in addition to those that focus on software, there are others for which software is just a small part of their interest. Some examples are communities that focus on open science, reproducibility, roles and careers for people who are less visible in research, publishing and review, and other types of scholarly products and digital objects. ReSA also wants to define how we fit into and interact with that broader scholarly landscape.

How was this work undertaken?

In September 2019, a ReSA taskforce, consisting of the authors of this blog post, came together to map the software community landscape. This group distributed a survey to ReSA Google group members to identify other groups interested in software. Other useful sources included:

Netherlands eScience Center: Awesome-research-software-registries by Jurriaan Spaaks
eResearch-meeting-list by James Hetherington
International RSE groups by the Research Software Engineering (RSE) Association
Open Science Grassroots Community Networks, a consortium of 120 networks
In which journals should I publish my software? by Neil Chue Hong

The taskforce then met to consider the results and how to analyze them. The ReSA list of research software communities is now publicly available as a living community resource, with the version of this list used by the ReSA taskforce in February 2020 and a copy of this post archived in Zenodo. We welcome suggested additions or corrections, which can be made as comments in the list. Some of the issues we’ve had in assembling this list are:

How much interest in software does an organization need to have to be listed?
When is an organization sufficiently research focused to be included?
What momentum/scale does an organization need to have so that we consider it relevant in the global picture?

On the other hand, once we started adding entries to the list, many of them immediately brought to mind similar organizations that should also be added. For example, some organizations have a geographic aspect, and this led us to think of similar organizations with different geographic aspects, such as all the national and regional RSE associations.

What did we learn?

The analysis produced a range of interesting outcomes:

There are many, many communities that support research software, emphasizing the need for a coordinating organization such as ReSA. The importance of community development is captured in articles such as Community Organizations: Changing the Culture in Which Research Software is Developed and Sustained by Daniel S. Katz et al., which provides an overview of key groups and discusses opportunities to leverage their synergistic activities.
There is an increasing (and wide) range of community initiatives. For example, the Open Science Grassroots Community Networks list has evolved into the Community of Open Scholarship Grassroots Networks (COSGN), whose networks communicate and coordinate on topics of common interest. COSGN has submitted an NSF proposal to formalize governance and coordination of the networks to maximize impact and establish standard practices for sustainability.
The increasing focus on open software makes it hard to separate research and non-research initiatives. As per the points above, it is very hard to define which initiatives are part of the research software community, and which aren’t.
Some organizations that were originally data-centric now include a software focus. For example, the Research Data Alliance now includes the Software Source Code Interest Group, which provides a forum to discuss issues on management, sharing, discovery, archiving, and provenance of software source code.

What are the next steps?

We invite readers to continue to add or make corrections to the ReSA list of research software communities by making comments in the list, which will continue to be curated by ReSA. We are also interested to hear from community members who would like to engage with us in writing a landscape paper based on further analysis and work. This could address questions such as what are the axes that create the space, where do the currently-known organizations fit in the space, and are there gaps where no organization is currently working? We also invite readers to consider involvement in other ReSA activities, including Taskforces.

Conclusion

The ever-growing number of constituents of the research software community both reflects and demonstrates the increasing recognition of research software. The research software community is now a complex ecosystem composed of a wide variety of organizations and initiatives, some of which are community networks themselves. Collaboration and coordination across these initiatives are important to enable the broader community to work together to achieve bigger goals. ReSA aims to coordinate across these efforts to leverage investments and to achieve the shared long-term goal of having research software valued as a fundamental and vital component of research worldwide. Join the ReSA Google group to stay up-to-date on our activities. Read more

Web standards for describing datasets and profiles

6 minute read

Published: February 14, 2019

This blog post was published on the Software Sustainability Institute’s website. Read more

Interact to Interoperate

less than 1 minute read

Published: December 06, 2018

This blog post was published on the Software Sustainability Institute’s website. Read more

The #RSE18 Times

5 minute read

Published: September 27, 2018

This blog post was published on the Software Sustainability Institute’s website. Read more

Research Software Engineers and Data Scientists: More in Common

6 minute read

Published: April 05, 2018

This blog post was published on the Software Sustainability Institute’s website, and includes the conclusions of our discussions at the Research Software Engineers for Data Science (RSE4DataScience) meeting at the Alan Turing Institute in London. Read more

So you want to start a data science institute? Achieving sustainability

14 minute read

Published: April 05, 2018

Questions & Answers - Profile of Dr Gonzalez-Beltran for the OeRC 10th anniversary

6 minute read

Published: June 21, 2016

This blog post was published on the website of the Oxford e-Research Centre, University of Oxford, on the occasion of the Centre’s 10th anniversary. Read more

ISA-explorer: A demo tool for discovering and exploring Scientific Data’s ISA-tab metadata

2 minute read

Published: December 17, 2015

This blog post was published on Nature’s Scientific Data blog. Read more

Winner of the ORCID Codefest prize

less than 1 minute read

Published: December 17, 2015

On 23rd May 2013, I participated in the ORCID CodeFest (see also the details on the ORCID website) and won first prize, which was to attend the ODIN Codefest at CERN. Read more

portfolio

NIH Data Commons

Read more

Machine-Actionable Metadata Models

An approach and set of open source software tools to produce machine-actionable and FAIR metadata models. Read more

ExPaNDS

The ambitious ExPaNDS project is a collaboration between 10 national Photon and Neutron Research Infrastructures (PaN RIs) as well as EGI. The project aims to deliver standardised, interoperable, and integrated data sources and data analysis services for Photon and Neutron facilities.
Read more

FAIR vocabularies

Guidelines for FAIR vocabularies Read more

DataGateway - a portal for large-scale facilities data

Data discovery and access for large-scale science facilities. Read more

FAIR-impact

Expanding FAIR Solutions across EOSC.
Read more

OperationsGateway

Accessing operational data from large-scale facilities Read more

publications

The FAIR Guiding Principles for scientific data management and stewardship

Published in Scientific Data, 2016

This is the first formalisation of the FAIR guiding principles for data management and stewardship, which aim at making data Findable, Accessible, Interoperable and Reusable (FAIR). Read more

Recommended citation: Wilkinson, Mark D. and Dumontier, Michel and Aalbersberg, IJsbrand Jan and Appleton, Gabrielle and Axton, Myles and Baak, Arie and Blomberg, Niklas and Boiten, Jan-Willem and da Silva Santos, Luiz Bonino and Bourne, Philip E. and Bouwman, Jildau and Brookes, Anthony J. and Clark, Tim and Crosas, Mercè and Dillo, Ingrid and Dumon, Olivier and Edmunds, Scott and Evelo, Chris T. and Finkers, Richard and Gonzalez-Beltran, Alejandra and Gray, Alasdair J. G. and Groth, Paul and Goble, Carole and Grethe, Jeffrey S. and Heringa, Jaap and ’t Hoen, Peter A. C and Hooft, Rob and Kuhn, Tobias and Kok, Ruben and Kok, Joost and Lusher, Scott J. and Martone, Maryann E. and Mons, Albert and Packer, Abel L. and Persson, Bengt and Rocca-Serra, Philippe and Roos, Marco and van Schaik, Rene and Sansone, Susanna-Assunta and Schultes, Erik and Sengstag, Thierry and Slater, Ted and Strawn, George and Swertz, Morris A. and Thompson, Mark and van der Lei, Johan and van Mulligen, Erik and Velterop, Jan and Waagmeester, Andra and Wittenburg, Peter and Wolstencroft, Katherine and Zhao, Jun and Mons, Barend. "The FAIR Guiding Principles for scientific data management and stewardship", Scientific Data, https://doi.org/10.1038/sdata.2016.18

Data discovery with DATS: exemplar adoptions and lessons learned

Published in Journal of the American Medical Informatics Association, 2017

This paper analyses the implementation of the DATS model for data discovery in a set of exemplar data sources. Read more

Recommended citation: Alejandra N Gonzalez-Beltran, John Campbell, Patrick Dunn, Diana Guijarro, Sanda Ionescu, Hyeoneui Kim, Jared Lyle, Jeffrey Wiser, Susanna-Assunta Sansone, Philippe Rocca-Serra. "Data discovery with DATS: exemplar adoptions and lessons learned" Journal of the American Medical Informatics Association, Volume 25, Issue 1, 1 January 2018, Pages 13–16, https://doi.org/10.1093/jamia/ocx119

PhenoMeNal: processing and analysis of metabolomics data in the cloud

Published in GigaScience, 2018

PhenoMeNal provides a cloud e-infrastructure solution for analysing metabolomics data. It provides easy-to-use web interfaces that can be scaled to any custom public or private cloud environment. Read more

Recommended citation: Kristian Peters, James Bradbury, Sven Bergmann, Marco Capuccini, Marta Cascante, Pedro de Atauri, Timothy M D Ebbels, Carles Foguet, Robert Glen, Alejandra Gonzalez-Beltran, Ulrich L Günther, Evangelos Handakas, Thomas Hankemeier, Kenneth Haug, Stephanie Herman, Petr Holub, Massimiliano Izzo, Daniel Jacob, David Johnson, Fabien Jourdan, Namrata Kale, Ibrahim Karaman, Bita Khalili, Payam Emami Khonsari, Kim Kultima, Samuel Lampa, Anders Larsson, Christian Ludwig, Pablo Moreno, Steffen Neumann, Jon Ander Novella, Claire O'Donovan, Jake T M Pearce, Alina Peluso, Marco Enrico Piras, Luca Pireddu, Michelle A C Reed, Philippe Rocca-Serra, Pierrick Roger, Antonio Rosato, Rico Rueedi, Christoph Ruttkies, Noureddin Sadawi, Reza M Salek, Susanna-Assunta Sansone, Vitaly Selivanov, Ola Spjuth, Daniel Schober, Etienne A Thévenot, Mattia Tomasoni, Merlijn van Rijswijk, Michael van Vliet, Mark R Viant, Ralf J M Weber, Gianluigi Zanetti, Christoph Steinbeck; PhenoMeNal: processing and analysis of metabolomics data in the cloud, GigaScience, Volume 8, Issue 2, 1 February 2019, giy149, https://doi.org/10.1093/gigascience/giy149

Discovering Data Access and Use Requirements Using the Data Tag Suite (DATS)

Published in bioRxiv, 2019

This paper is about the representation of data access and data use requirements for the Data Tag Suite (DATS) model. Read more

Recommended citation: George Alter, Alejandra Gonzalez-Beltran, Lucila Ohno-Machado, Philippe Rocca-Serra. "Discovering Data Access and Use Requirements Using the Data Tag Suite (DATS) Model", bioRxiv 518571, https://doi.org/10.1101/518571

The FAIR Funder pilot programme to make it easy for funders to require and for grantees to produce FAIR Data

Published in arXiv, 2019

The FAIR Funders design envisions a data-management workflow having seven essential stages, where solution providers are openly invited to participate. The initial pilot programme will launch using existing computer-based tools of those who attended the M4M Workshop. Read more

Recommended citation: https://arxiv.org/abs/1902.11162

Interoperable and scalable data analysis with microservices: applications in metabolomics

Published in Bioinformatics, 2019

Developing a robust and performant data analysis workflow that integrates all necessary components whilst still being able to scale over multiple compute nodes is a challenging task. We introduce a generic method based on the microservice architecture, where software tools are encapsulated as Docker containers that can be connected into scientific workflows and executed using the Kubernetes container orchestrator. Read more

Recommended citation: Payam Emami Khoonsari, Pablo Moreno, Sven Bergmann, Joachim Burman, Marco Capuccini, Matteo Carone, Marta Cascante, Pedro de Atauri, Carles Foguet, Alejandra N Gonzalez-Beltran, Thomas Hankemeier, Kenneth Haug, Sijin He, Stephanie Herman, David Johnson, Namrata Kale, Anders Larsson, Steffen Neumann, Kristian Peters, Luca Pireddu, Philippe Rocca-Serra, Pierrick Roger, Rico Rueedi, Christoph Ruttkies, Noureddin Sadawi, Reza M Salek, Susanna-Assunta Sansone, Daniel Schober, Vitaly Selivanov, Etienne A Thévenot, Michael van Vliet, Gianluigi Zanetti, Christoph Steinbeck, Kim Kultima, Ola Spjuth, Interoperable and scalable data analysis with microservices: applications in metabolomics, Bioinformatics, btz160, https://doi.org/10.1093/bioinformatics/btz160

FAIRsharing as a community approach to standards, repositories and policies

Published in Nature Biotechnology, 2019

Read more

Recommended citation: Susanna-Assunta Sansone, Peter McQuilton, Philippe Rocca-Serra, Alejandra Gonzalez-Beltran, Massimiliano Izzo, Allyson L. Lister, Milo Thurston & the FAIRsharing Community. "FAIRsharing as a community approach to standards, repositories and policies", Nat Biotechnol. 2019 Apr;37(4):358-367, https://doi.org/10.1038/s41587-019-0080-8

Software Citation Implementation Challenges

Published in arXiv, 2019

The purpose of this document is to provide an explanation of current issues impacting scholarly attribution of research software, organize updated implementation guidance, and identify where best practices and solutions are still needed. Read more

Recommended citation: https://arxiv.org/abs/1905.08674v1

Software Citation Checklist for Authors

Published in Zenodo, 2019

This document provides a simple, generic checklist that authors of academic work (papers, books, conference abstracts, blog posts, etc.) can use to ensure they are following good practice when referencing and citing software they have used, both created by themselves for their research as well as obtained from other sources. It may also be used and adapted by journal editors, publishers and conference chairs as the basis of more specific guidance for their contributors and reviewers. Read more

Recommended citation: Chue Hong, Neil P., Allen, Alice, Gonzalez-Beltran, Alejandra, de Waard, Anita, Smith, Arfon M., Robinson, Carly, … Pollard, Tom. (2019, October 15). Software Citation Checklist for Authors (Version 0.9.0). Zenodo. https://doi.org/10.5281/zenodo.3479199

Software Citation Checklist for Developers

Published in Zenodo, 2019

This document provides a minimal, generic checklist that developers of software (either open or closed source) used in research can use to ensure they are following good practice around software citation. This will help developers get credit for the software they create, and improve transparency, reproducibility, and reuse. Read more

Recommended citation: Chue Hong, Neil P., Allen, Alice, Gonzalez-Beltran, Alejandra, de Waard, Anita, Smith, Arfon M., Robinson, Carly, Jones, Catherine, Bouquin, Daina, Katz, Daniel S., Kennedy, David, Ryder, Gerry, Hausman, Jessica, Hwang, Lorraine, Jones, Matthew B., Harrison, Melissa, Crosas, Mercè, Wu, Mingfang, Löwe, Peter, Haines, Robert, … Pollard, Tom. (2019). Software Citation Checklist for Developers (0.9.0). Zenodo. https://doi.org/10.5281/zenodo.3482769

Special Issue on Scholarly Data Analysis (Semantics, Analytics, Visualisation)

Published in Data Science Journal, 2019

The increasing interest in analysing, describing, and improving the research process requires the development of new forms of scholarly data publication and analysis that integrates lessons and approaches from the field of Semantic Technologies, Science of Science, Digital Libraries, and Artificial Intelligence. This editorial summarises the content of the Special Issue on Scholarly Data Analysis (Semantics, Analytics, Visualisation), which aims to showcase some of the most interesting research efforts in the field. This issue includes an extended version of the best papers of the last two editions of the “Semantics, Analytics, Visualisation: Enhancing Scholarly Dissemination” (SAVE-SD 2017 and 2018) workshop at The Web Conference. Read more

The Data Tags Suite (DATS) model for discovering data access and use requirements

Published in GigaScience journal, 2020

This paper is about the representation of data access and data use requirements for the Data Tag Suite (DATS) model. Read more

Recommended citation: George Alter, Alejandra Gonzalez-Beltran, Lucila Ohno-Machado, Philippe Rocca-Serra, The Data Tags Suite (DATS) model for discovering data access and use requirements, GigaScience, Volume 9, Issue 2, February 2020, giz165, https://doi.org/10.1093/gigascience/giz165

PaNOSC FAIR Research Data Policy framework

Published in Zenodo, 2020

Read more

Recommended citation: Gotz, Andy, Perrin, Jean-Francois, Fangohr, Hans, Salvat, Daniel, Gliksohn, Florian, Markvardsen, Anders, … Matthews, Brian. (2020). PaNOSC FAIR Research Data Policy framework (Version 1.1). Zenodo. https://doi.org/10.5281/zenodo.3738497

COPO: a metadata platform for brokering FAIR data in the life sciences

Published in F1000, 2020

COPO is a computational system that attempts to address some of these challenges by enabling scientists to describe their research objects (raw or processed data, publications, samples, images, etc.) using community-sanctioned metadata sets and vocabularies, and then use public or institutional repositories to share them with the wider scientific community. Read more

Recommended citation: Shaw F, Etuk A, Minotto A et al. COPO: a metadata platform for brokering FAIR data in the life sciences [version 1; peer review: awaiting peer review]. F1000Research 2020, 9:495 https://doi.org/10.12688/f1000research.23889.1

RDA COVID-19; Recommendations and Guidelines on Data Sharing, Final release 30 June 2020

Published in Research Data Alliance, 2020

This is the final version of the Recommendations and Guidelines from the RDA COVID-19 Working Group, and has been endorsed through the official RDA process. Read more

Recommended citation: RDA COVID-19 Working Group. Recommendations and Guidelines on data sharing. Research Data Alliance. 2020. https://doi.org/10.15497/rda00052

Draft extended data policy framework for Photon and Neutron RIs

Published in Zenodo, 2020

We review the FAIR data policy landscape at European and national levels, consider the current state of data policy adoption and implementation at ExPaNDS partner facilities, and examine existing FAIR ecosystem data policy recommendations, in particular those from the Turning FAIR into reality report and the recent FAIRsFAIR Deliverable 3.3: Policy enhancement recommendations. In response, we make twenty-six recommendations of our own that serve to translate these recommendations to the local level of photon and neutron research infrastructures. Read more

Recommended citation: Matthews, Brian, McBirnie, Abigail, Vukolov, Andrei, Ashton, Alun, Collins, Stephen, Da Graca Ramos, Sylvie, Gagey, Brigitte, Gonzalez-Beltran, Alejandra, Johnsson, Maria, Krahl, Rolf, Ounsy, Majid and Van Daalen, Mirjam 2020. Draft extended data policy framework for Photon and Neutron RIs. Zenodo https://doi.org/10.5281/zenodo.4014811

Report on status, gap analysis and roadmap towards harmonised and federated metadata catalogues for EU national Photon and Neutron RIs

Published in Zenodo, 2020

The ExPaNDS project aims at deploying into EOSC Data Catalogues and data analysis services. This document describes the status, a gap analysis, and a roadmap required to achieve harmonised and federated (meta)data catalogues within EOSC of the participating national Photon and Neutron (PaN) Research Infrastructures (RIs). Read more

Recommended citation: Ashton, Alun, Da Graca Ramos, Sylvie & Gonzalez-Beltran, Alejandra. Report on status, gap analysis and roadmap towards harmonised and federated metadata catalogues for EU national Photon and Neutron RIs. (Zenodo, 2020). https://doi.org/10.5281/zenodo.4146819

Fostering global data sharing: highlighting the recommendations of the Research Data Alliance COVID-19 working group

Published in Wellcome Open Research, 2020

The systemic challenges of the COVID-19 pandemic require cross-disciplinary collaboration in a global and timely fashion. Such collaboration needs open research practices and the sharing of research outputs, such as data and code, thereby facilitating research and research reproducibility and timely collaboration beyond borders. Read more

Recommended citation: Austin CC, Bernier A, Bezuidenhout L et al. Fostering global data sharing: highlighting the recommendations of the Research Data Alliance COVID-19 working group [version 2; peer review: 1 approved, 2 approved with reservations]. Wellcome Open Res 2021, 5:267 (https://doi.org/10.12688/wellcomeopenres.16378.2)

Ten Simple Rules for making a vocabulary FAIR

Published in arXiv, 2020

We present ten simple rules that support converting a legacy vocabulary – a list of terms available in a print-based glossary or table not accessible using web standards – into a FAIR vocabulary. Various pathways may be followed to publish the FAIR vocabulary, but we emphasise particularly the goal of providing a distinct IRI for each term or concept. A standard representation of the concept should be returned when the individual IRI is de-referenced, using SKOS or OWL serialised in an RDF-based representation for machine-interchange, or in a web-page for human consumption. Guidelines for vocabulary and item metadata are provided, as well as development and maintenance considerations. By following these rules you can achieve the outcome of converting a legacy vocabulary into a standalone FAIR vocabulary, which can be used for unambiguous data annotation. In turn, this increases data interoperability and enables data integration. Read more

Recommended citation: Simon J D Cox, Alejandra N Gonzalez-Beltran, Barbara Magagna, Maria-Cristina Marinescu. "Ten Simple Rules for making a vocabulary FAIR" https://arxiv.org/abs/2012.02325

Draft recommendations for FAIR Photon and Neutron Data Management

Published in Zenodo, 2020

This is a draft of the recommendations for FAIR Photon and Neutron Data Management, a deliverable of the EU ExPaNDS project. Read more

Recommended citation: Salvat, Daniel, Gonzalez-Beltran, Alejandra, Görzig, Heike, Matthews, Brian, McBirnie, Abigail, et al. 2020. Draft recommendations for FAIR Photon and Neutron Data Management. Zenodo. http://doi.org/10.5281/zenodo.4312825

Nine Best Practices for Research Software Registries and Repositories: A Concise Guide

Published in arXiv, 2020

We present a set of nine best practices that can help managers define the scope, practices, and rules that govern individual registries and repositories. Read more

Recommended citation: https://arxiv.org/abs/2012.13117

ExPaNDS ontologies v1.0

Published in Zenodo, 2021

We present ontologies for the domain of photon and neutron (PaN) science. With the primary goal of supporting PaN FAIR data catalogue services, we have developed three ontologies: PaN experimental techniques (PaNET), an ontology of NeXus definitions (NeXusOntology), and a semantic integration ontology for the PaN domain (PaNmapping). The ontologies are presented as initial versions, supported by community development workflows. The work represents deliverable D3.2 of the Horizon 2020 ExPaNDS project. Read more

Recommended citation: Collins, Steve P., da Graça Ramos, Silvia, Iyayi, Daniel, Görzig, Heike, González Beltrán, Alejandra, Ashton, Alun, Egli, Stefan, and Minotti, Carlo, 2021, ExPaNDS ontologies v1.0: Zenodo, https://doi.org/10.5281/zenodo.4806026

Radical collaboration during a global health emergency: development of the RDA COVID-19 data sharing recommendations and guidelines

Published in Open Research Europe, 2021

The purpose of the present work was to explore how the RDA succeeded in engaging the participation of its community of scientists in a rapid response to the EC request. The three constructs of radical collaboration (inclusiveness, distributed digital practices, productive and sustainable collaboration) were found to be well supported in both the quantitative and qualitative analyses of the survey data. Other social factors, such as motivation and group identity were also found to be important to the success of this extreme collaborative effort. Read more

Recommended citation: Pickering B, Biro T, Austin CC et al. Radical collaboration during a global health emergency: development of the RDA COVID-19 data sharing recommendations and guidelines [version 1; peer review: awaiting peer review]. Open Research Europe 2021, 1:69 (https://doi.org/10.12688/openreseurope.13369.1)

Ten Simple Rules for making a vocabulary FAIR

Published in PLoS Computational Biology, 2021

We present ten simple rules that support converting a legacy vocabulary—a list of terms available in a print-based glossary or in a table not accessible using web standards—into a FAIR vocabulary. Various pathways may be followed to publish the FAIR vocabulary, but we emphasise particularly the goal of providing a globally unique resolvable identifier for each term or concept. A standard representation of the concept should be returned when the individual web identifier is resolved, using SKOS or OWL serialised in an RDF-based representation for machine-interchange and in a web-page for human consumption. Guidelines for vocabulary and term metadata are provided, as well as development and maintenance considerations. The rules are arranged as a stepwise recipe for creating a FAIR vocabulary based on the legacy vocabulary. By following these rules you can achieve the outcome of converting a legacy vocabulary into a standalone FAIR vocabulary, which can be used for unambiguous data annotation. In turn, this increases data interoperability and enables data integration. Read more

Recommended citation: Cox SJD, Gonzalez-Beltran AN, Magagna B, Marinescu MC (2021) Ten simple rules for making a vocabulary FAIR. PLOS Computational Biology 17(6): e1009041. https://doi.org/10.1371/journal.pcbi.1009041
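
As a rough illustration of the central idea in the abstract above (a unique identifier per term, with a SKOS representation serialised in an RDF format), here is a minimal sketch in Python. It is not taken from the paper: the base IRI, the two glossary entries, and the helper names mint_iri and to_skos_turtle are all hypothetical, and the Turtle is assembled with plain string formatting rather than an RDF library.

```python
from urllib.parse import quote

# Hypothetical base IRI and legacy glossary entries, for illustration only.
BASE_IRI = "https://example.org/vocab/"
LEGACY_GLOSSARY = [
    ("sample", "Placeholder definition for the term 'sample'."),
    ("beam line", "Placeholder definition for the term 'beam line'."),
]


def mint_iri(label, base=BASE_IRI):
    """Derive one distinct IRI per term from its label (assumed naming scheme)."""
    slug = quote(label.strip().lower().replace(" ", "-"))
    return f"{base}{slug}"


def escape_literal(text):
    """Escape backslashes and double quotes for a Turtle string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def to_skos_turtle(glossary, scheme_iri=BASE_IRI):
    """Serialise (label, definition) pairs as SKOS concepts in Turtle syntax."""
    lines = [
        "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .",
        "",
        f"<{scheme_iri}> a skos:ConceptScheme .",
    ]
    for label, definition in glossary:
        lines += [
            "",
            f"<{mint_iri(label)}> a skos:Concept ;",
            f'    skos:prefLabel "{escape_literal(label)}"@en ;',
            f'    skos:definition "{escape_literal(definition)}"@en ;',
            f"    skos:inScheme <{scheme_iri}> .",
        ]
    return "\n".join(lines) + "\n"


turtle = to_skos_turtle(LEGACY_GLOSSARY)
print(turtle)
```

In practice the IRIs would also need to resolve on the web and the vocabulary would carry its own metadata, which this sketch does not attempt.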

ExPaNDS Metadata Catalogue Release

Published in Zenodo, 2021

This document presents the milestone achieved for a metadata catalogue release in the domain of photon and neutron (PaN) science. With the primary goal of supporting PaN FAIR data catalogue services, we have developed a self-contained, stand-alone metadata catalogue release that facilities can download to test and experiment with. Read more

Recommended citation: Minotti, Carlo, da Graca Ramos, Silvia, Ashton, Alun, Egli, Stephan, Bolmsten, Fredrik, Johansson, Henrik, Novelli, Massimiliano, Gonzalez-Beltran, Alejandra, & Pullinger, Stuart. (2021). ExPaNDS Metadata Catalogue Release. Zenodo. https://doi.org/10.5281/zenodo.5205909

FAIR Data Pipeline: provenance-driven data management for traceable scientific workflows

Published in arXiv, 2021

Modern epidemiological analyses to understand and combat the spread of disease depend critically on access to, and use of, data. Rapidly evolving data, such as data streams changing during a disease outbreak, are particularly challenging. Data management is further complicated by data being imprecisely identified when used. Public trust in policy decisions resulting from such analyses is easily damaged and is often low, with cynicism arising where claims of “following the science” are made without accompanying evidence. Tracing the provenance of such decisions back through open software to primary data would clarify this evidence, enhancing the transparency of the decision-making process. Here, we demonstrate a Findable, Accessible, Interoperable and Reusable (FAIR) data pipeline developed during the COVID-19 pandemic that allows easy annotation of data as they are consumed by analyses, while tracing the provenance of scientific outputs back through the analytical source code to data sources. Such a tool provides a mechanism for the public, and fellow scientists, to better assess the trust that should be placed in scientific evidence, while allowing scientists to support policy-makers in openly justifying their decisions. We believe that tools such as this should be promoted for use across all areas of policy-facing research. Read more

Recommended citation: Sonia Natalie Mitchell, Andrew Lahiff, Nathan Cummings, Jonathan Hollocombe, Bram Boskamp, Dennis Reddyhoff, Ryan Field, Kristian Zarebski, Antony Wilson, Martin Burke, Blair Archibald, Paul Bessell, Richard Blackwell, Lisa A Boden, Alys Brett, Sam Brett, Ruth Dundas, Jessica Enright, Alejandra N. Gonzalez-Beltran, Claire Harris, Ian Hinder, Christopher David Hughes, Martin Knight, Vino Mano, Ciaran McMonagle, Dominic Mellor, Sibylle Mohr, Glenn Marion, Louise Matthews, Iain J. McKendrick, Christopher Mark Pooley, Thibaud Porphyre, Aaron Reeves, Edward Townsend, Robert Turner, Jeremy Walton, Richard Reeve. "FAIR Data Pipeline: provenance-driven data management for traceable scientific workflows" https://arxiv.org/abs/2110.07117

FAIR Data Pipeline: provenance-driven data management for traceable scientific workflows

Published in Phil. Trans. R. Soc. A, 2022

Modern epidemiological analyses to understand and combat the spread of disease depend critically on access to, and use of, data. Rapidly evolving data, such as data streams changing during a disease outbreak, are particularly challenging. Data management is further complicated by data being imprecisely identified when used. Public trust in policy decisions resulting from such analyses is easily damaged and is often low, with cynicism arising where claims of ‘following the science’ are made without accompanying evidence. Tracing the provenance of such decisions back through open software to primary data would clarify this evidence, enhancing the transparency of the decision-making process. Here, we demonstrate a Findable, Accessible, Interoperable and Reusable (FAIR) data pipeline. Although developed during the COVID-19 pandemic, it allows easy annotation of any data as they are consumed by analyses, or conversely traces the provenance of scientific outputs back through the analytical or modelling source code to primary data. Such a tool provides a mechanism for the public, and fellow scientists, to better assess scientific evidence by inspecting its provenance, while allowing scientists to support policymakers in openly justifying their decisions. We believe that such tools should be promoted for use across all areas of policy-facing research. This article is part of the theme issue ‘Technical challenges of modelling real-life epidemics and examples of overcoming these’. Read more

The Trail from Data to Policy in the COVID-19 Pandemic and Beyond

Published in Scientific Computing Department, Annual Review 2021-22, 2022

COVID-19 was an event that marked a ‘before’ and ‘after’ in our lives. We will all remember the days when daily statistics on positive cases, hospitalisations and fatalities were reported in each region of the world, providing metrics on the looming situation. And some of those statistics will continue to be collected for the foreseeable future. Read more

Machine actionable metadata models

Published in Nature Scientific Data, 2022

Community-developed minimum information checklists are designed to drive the rich and consistent reporting of metadata, underpinning the reproducibility and reuse of the data. These reporting guidelines, however, are usually in the form of narratives intended for human consumption. Modular and reusable machine-readable versions are also needed. Firstly, to provide the necessary quantitative and verifiable measures of the degree to which the metadata descriptors meet these community requirements, a requirement of the FAIR Principles. Secondly, to encourage the creation of standards-driven templates for metadata authoring, especially when describing complex experiments that require multiple reporting guidelines to be used in combination or extended. We present new functionalities to support the creation and improvements of machine-readable models. We apply the approach to an exemplar set of reporting guidelines in Life Science and discuss the challenges. Our work, targeted to developers of standards and those familiar with standards, promotes the concept of compositional metadata elements and encourages the creation of community-standards which are modular and interoperable from the onset. Read more

Recommended citation: Batista, D., Gonzalez-Beltran, A., Sansone, SA. et al. Machine actionable metadata models. Sci Data 9, 592 (2022). https://doi.org/10.1038/s41597-022-01707-6

FAIR-IMPACT M5.3 Semantic artefact assessment methodology

Published in Zenodo, 2023

Semantic artefacts (i.e., ontologies, vocabularies and SKOS taxonomies, among others) define the structure, guide the construction of, and help validate many existing Knowledge Graphs. In the last years, a number of guidelines have been proposed (Poveda-Villalón et al. 2020; Garijo and Poveda-Villalón 2020; Hugo et al. 2020; Le Franc et al., 2022; Xu et al. 2023) to align semantic artefact best practices against the Findable, Accessible, Interoperable and Reusable principles (FAIR principles) (Wilkinson et al. 2016). Based on these guidelines, new validators and assistants have been developed (Garijo et al. 2021; Amdouni et al. 2022a; 2022b) in order to guide users assessing their own semantic artefacts against the FAIR principles. However, different tests are based on different interpretations of the FAIR principles, resulting in different scores and checks for semantic artefacts. To the best of our knowledge, there is no generic methodology grouping the types of tests to perform in semantic artefacts, in order to map existing assessment efforts in a consistent manner. In this document, we propose such a methodology. We do so by taking an ontology development perspective, dividing semantic artefacts into smaller parts (their code, content, ontology metadata, etc.) that can be individually assessed at different stages of their development process. We build on the Linked Open Terms (LOT) methodology (Poveda-Villalón et al. 2022), adding a “FAIR assessment” module, and, for each activity, we validate our approach by mapping to two existing semantic artefact FAIR assessment validators: FOOPS! (Garijo et al. 2021) and O’FAIRe (Amdouni et al. 2022a; 2022b). The rest of the document outlines our methodology, describes each step in detail, and maps it to existing FAIR principles and guidelines. Read more

Recommended citation: Garijo, Daniel, Poveda-Villalón, María, Flohr, Pascal, Gonzalez-Beltran, Alejandra, le Franc, Yann, & Verburg, Maaike. (2023). M5.3 Semantic artefact assessment methodology (Version 1). Zenodo. https://doi.org/10.5281/zenodo.8305173

The W3C Data Catalog Vocabulary, Version 2: Rationale, Design Principles, and Uptake

Published in Data Intelligence, 2023

DCAT is an RDF vocabulary designed to facilitate interoperability between data catalogs published on the Web. Since its first release in 2014 as a W3C Recommendation, DCAT has seen a wide adoption across communities and domains, particularly in conjunction with implementing the FAIR data principles (for findable, accessible, interoperable and reusable data). These implementation experiences, besides demonstrating the fitness of DCAT to meet its intended purpose, helped identify existing issues and gaps. Moreover, over the last few years, additional requirements emerged in data catalogs, given the increasing practice of documenting not only datasets but also data services and APIs. This paper illustrates the new version of DCAT, explaining the rationale behind its main revisions and extensions, based on the collected use cases and requirements, and outlines the issues yet to be addressed in future versions of DCAT. Read more

Recommended citation: Riccardo Albertoni, David Browning, Simon Cox, Alejandra N. Gonzalez-Beltran, Andrea Perego, Peter Winstanley; The W3C Data Catalog Vocabulary, Version 2: Rationale, Design Principles, and Uptake. Data Intelligence 2023; doi: https://doi.org/10.1162/dint_a_00241

Moving towards FAIR mappings and crosswalks

Published in FAIR principles for Ontologies and Metadata in Knowledge Management (FOAM) Workshop, Joint Ontology Workshops (JOWO), part of the 14th International Conference on Formal Ontology in Information Systems (FOIS 2024), 2024

Mappings and crosswalks are key elements to ensure semantic interoperability as well as metadata and data integration between different information systems. Designing FAIR compliant systems requires making sure all the elements that constitute the systems are themselves FAIR to support machine-actionability and automation. This paper describes the ongoing European and international effort to build a framework for FAIR Mappings and crosswalks. This framework aims to be generic enough to capture the diverse set of use cases and methodologies across domains and communities. It should be composed of a set of technical recommendations to aid compliance with FAIR principles, a set of models for machine actionable mappings and crosswalks as well as a practical framework with aligned good practices to support the creation of mappings by scientific communities. Developed in the context of FAIR-IMPACT, a Horizon Europe project, this work will be pursued within a more international context as a Research Data Alliance Working Group. Read more

Recommended citation: Jana Martínková, Nick Juty, Alejandra Gonzalez Beltran, Carole Goble and Yann Le Franc. Moving towards FAIR mappings and crosswalks. https://www.utwente.nl/en/eemcs/fois2024/resources/papers/martinkova-et-al-moving-towards-fair-mappings-and-crosswalks.pdf

Ontologies and vocabularies play a key role when standardising, organizing and integrating data from heterogeneous data sources into Knowledge Graphs. In order to develop ontologies, different engineering methodologies have been proposed throughout the years, whose application resulted in thousands of semantic artefacts (taxonomies, vocabularies and ontologies) in a wide range of domains. But how to ensure that ontologies follow the Findable, Accessible, Interoperable and Reusable principles (FAIR) from their inception? In this paper, we review existing guidelines to help make ontologies FAIR and map them to the ontology development lifecycle activities. Our analysis outlines the current gaps, where no guidelines exist for ontologies to become FAIRbyDesign. Read more

Recommended citation: María Poveda-Villalón, Daniel Garijo, Alejandra Gonzalez-Beltran, Clement Jonquet and Yann le Franc. Ontology Engineering and the FAIR principles: A Gap Analysis. https://www.utwente.nl/en/eemcs/fois2024/resources/papers/poveda-villalon-et-al-ontology-engineering-and-the-fair-principles.pdf

service

Research Software London & South East 2019

Venue: The Royal Society, London, Date: 2019

I was a member of the Organising and Programme Committees of the Second Research Software London & South East workshop. Read more

Research of Research Track, European Semantic Web Conference 2019

Venue: Portoroz, Slovenia, Date: 2019

I was a Co-Chair of the Extended Semantic Web Conference (ESWC19) Research of Research: Semantic Representation, Analysis, and Visualization track. Read more

Reproducibility Initiative, International Semantic Web Conference 2019

Venue: Auckland, New Zealand, Date: 2019

I was a Co-Chair of the Reproducibility Track within the International Semantic Web Conference. Read more

Research Software London & South East 2020

Venue: The Royal Society, London, Date: 2020

I am a member of the Organising and Programme Committees of the Second Research Software London & South East workshop. Read more

Research data management for Linked Open Science

Venue: , Date: 2020

I am a member of the Programme Committee of the First Workshop on Research data management for Linked Open Science - DaMaLOS 2020. Read more

Scientific Knowledge Graphs Workshop

Venue: 24th International conference on Theory and Practice of Digital Libraries, Date: 2020

I am a member of the Programme Committee of the First Scientific Knowledge Graphs Workshop co-located with the 24th International conference on Theory and Practice of Digital Libraries (TPDL). Read more

talks

Community-standards for reproducible and reusable research

Published: September 06, 2012

I presented this talk at the Drug Discovery 2012 conference in Manchester, UK. Read more

The ISA infrastructure for the biosciences: from data curation at source to the linked data cloud

Published: February 27, 2013

I presented this talk at the Conference on Semantics in Healthcare and Life Sciences (CSHALS 2013). Read more

Embedding underpinning mechanisms for data reuse and reproducibility in bioscience - The ISA exemplar behind the PDF

Published: March 20, 2013

I presented a talk within the Visions session at the Force11 Beyond the PDF 2 - 2013 conference. Read more

The ISA infrastructure: from experimental planning to data publication and case studies in toxicology

Published: March 20, 2013

The ISA infrastructure: from experimental planning to data publication

Read more

Bio-GraphIIn: a graph-based, integrative and semantically-enabled repository for life science experimental data

Published: October 16, 2013

I presented this talk at the NETTAB 2013 workshop whose topic was “Semantic, Social, and Mobile Applications for Bioinformatics and Biomedical Laboratories”. Read more

What was the plan? A role for data standards, models and computational workflows in scholarly data publishing

Published: July 15, 2014

This talk explores how principles derived from experimental design practice, data and computational models can greatly enhance data quality, data generation, data reporting, data publication and data review. For this, I presented a case study on reproducibility that was a collaboration between the GigaScience journal and the ISA-commons, Research Object and Nanopublication communities. You can read more about my presentation in Scott Edmunds’ blog post for the GigaScience journal, and my slides can be found below. Read more

EBI Metagenomics Bioinformatics Course

Published: September 01, 2014

Within the Metagenomics Bioinformatics Course, Eamonn Maguire and I gave a tutorial on “Metagenomic Data Provenance and Management using the ISA infrastructure — overview, implementation patterns & software tools”. Read more

Towards FAIR metadata standards

Published: October 15, 2018

I attended the first Metadata for Machines (M4M) workshop organised by the Research Data Alliance and the GO FAIR International Support and Coordination Office. Read more

Machine-Actionable Metadata Models: a toolbox including JSONLDSchema python module and JSONschema-documenter web application

Published: February 07, 2019

This talk combines two abstracts that we submitted to the Research Software London & SouthEast 2019 workshop that took place at the Royal Society in London, UK, on 7th February 2019. Read more

Better software + better data = better research

Published: April 23, 2019

In April 2019, I gave this talk at CIFASIS (Centro Internacional Franco Argentino de Ciencias de la Información y de Sistemas), a research institute of the Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), the National Council for Science and Technology of Argentina. Read more

Research Reproducibility

Published: December 06, 2019

I was invited to give a talk on “Research Reproducibility” during the “Research Day” of the EPSRC- and MRC-funded Oxford-Nottingham Centre for Doctoral Training in Biomedical Imaging (ONBI CDT). Read more

SciGateway & DataGateway: the portals to facilities science and facilities data

Published: February 06, 2020

Our submission was accepted for a full talk at the Research Software London and South East of England workshop. Read more

Towards sustainable software for the muon science computational project

Published: September 04, 2020

As part of the Ada Lovelace Centre project “Development of a sustainable and user-friendly software architecture for the Muon Spectroscopy Computational Project”, I was part of the Muon Site Calculation Meeting on 4th September 2020. Read more

Large-scale facilities experimental lifecycle & FAIRness

Published: October 02, 2020

I was invited to present at the ExPaNDS project FAIR workshops. Read more

Data bridges for scientific & official data integration

Published: October 16, 2020

I was invited to participate in the United Nations Data Forum 2020. Read more

Guidelines for FAIR vocabularies

Published: December 02, 2020

I gave a talk on Guidelines for FAIR vocabularies in the FAIR Convergence Symposium organised by CODATA and GO-FAIR. Read more

Platforms and Tools for Data Management in Large-Scale Facilities

Published: June 10, 2021

Presentation delivered at the UK Catalysis Hub virtual workshop on data management. Read more

Good practices and guidelines for semantic interoperability

Published: October 04, 2021

I was invited to participate in the United Nations Data Forum 2021. Read more

ISIS Neutron and Muon Source - Scientific Data Management

Published: December 07, 2021

This meeting was part of the EU ExPaNDS project. It aimed to go through each facility's gap and issue assessment in detail and to discuss the way forward in helping facilities overcome the difficulties they have with integrating their metadata catalogues. Read more

Research object citation and cataloguing

Published: December 09, 2021

Gemma Poulter and I were invited to talk in the CoSeC@CIUK session on 9th December 2021 about “Research object citation and cataloguing”. This session was organised by the Computational Science Centre for Research Communities (CoSeC) at the Computing Insight UK 2021 (CIUK2021) conference. Read more

The ICAT project: A modular ecosystem of tools for large-scale facilities data management

Published: January 24, 2022

I was invited to give a talk in the session “DAPHNE4NFDI: Science driven data management solutions for the user community”, one of the satellite meetings within the European XFEL Users’ Meeting 2022, DESY Photon Science Users Meeting 2022. Read more

Towards FAIR research data management

Published: February 07, 2022

I was invited to present in the CECAM workshop on “Machine actionable data for chemical sciences: Bridging experiments, simulations, and machine learning for spectral data” (MADICES), which took place online from 7th to 9th February 2022. Read more

The Data Catalog Vocabulary (DCAT)

Published: February 17, 2022

Peter Winstanley from Semantic Arts and I were invited to give a talk for the FAIRsFAIR webinar on “Using DCAT (Data Catalogue Vocabulary) to support metadata catalogue integration”. We gave an introduction to the DCAT vocabulary, described its evolution and gave an overview of the initial steps to implement it. Read more

ExPaNDS - Towards FAIR and open photon and neutron data

Published: August 26, 2022

I was an invited speaker at the 12th International Conference on Inelastic X-ray Scattering (IXS 2022), which was held in Oxford, UK in August 2022. Read more

FAIR data pipeline

Published: March 29, 2023

The Software Sustainability Institute Fellows Community meet regularly to share activities and opportunities, and discuss topics of interest. Read more

In April 2019, I taught a Carpentries workshop at the Centro Internacional Franco-Argentino de Ciencias de la Información y Sistemas (CIFASIS) in Rosario, Argentina. The workshop was entitled “Computational tools for researchers” (in Spanish: “Herramientas computacionales para investigadores”). Read more

Alejandra Gonzalez-Beltran
