Posts by Collection
portfolio
Machine-Actionable Metadata Models
An approach and set of open source software tools to produce machine-actionable and FAIR metadata models.
ExPaNDS
The ambitious ExPaNDS project is a collaboration between 10 national Photon and Neutron Research Infrastructures (PaN RIs) as well as EGI. The project aims to deliver standardised, interoperable, and integrated data sources and data analysis services for Photon and Neutron facilities.
FAIR vocabularies
Guidelines for FAIR vocabularies.
DataGateway - a portal for large-scale facilities data
Data discovery and access for large-scale science facilities.
FAIR-impact
Expanding FAIR Solutions across EOSC.
OperationsGateway
Accessing operational data from large-scale facilities.
publications
The FAIR Guiding Principles for scientific data management and stewardship
Published in Scientific Data, 2016
This is the first formalisation of the FAIR guiding principles for data management and stewardship, which aim to make data Findable, Accessible, Interoperable and Reusable (FAIR).
Recommended citation: Wilkinson, Mark D. and Dumontier, Michel and Aalbersberg, IJsbrand Jan and Appleton, Gabrielle and Axton, Myles and Baak, Arie and Blomberg, Niklas and Boiten, Jan-Willem and da Silva Santos, Luiz Bonino and Bourne, Philip E. and Bouwman, Jildau and Brookes, Anthony J. and Clark, Tim and Crosas, Mercè and Dillo, Ingrid and Dumon, Olivier and Edmunds, Scott and Evelo, Chris T. and Finkers, Richard and Gonzalez-Beltran, Alejandra and Gray, Alasdair J. G. and Groth, Paul and Goble, Carole and Grethe, Jeffrey S. and Heringa, Jaap and ’t Hoen, Peter A. C. and Hooft, Rob and Kuhn, Tobias and Kok, Ruben and Kok, Joost and Lusher, Scott J. and Martone, Maryann E. and Mons, Albert and Packer, Abel L. and Persson, Bengt and Rocca-Serra, Philippe and Roos, Marco and van Schaik, Rene and Sansone, Susanna-Assunta and Schultes, Erik and Sengstag, Thierry and Slater, Ted and Strawn, George and Swertz, Morris A. and Thompson, Mark and van der Lei, Johan and van Mulligen, Erik and Velterop, Jan and Waagmeester, Andra and Wittenburg, Peter and Wolstencroft, Katherine and Zhao, Jun and Mons, Barend. "The FAIR Guiding Principles for scientific data management and stewardship", Scientific Data, https://doi.org/10.1038/sdata.2016.18
Data discovery with DATS: exemplar adoptions and lessons learned
Published in Journal of the American Medical Informatics Association, 2017
This paper analyses the implementation of the DATS model for data discovery in a set of exemplar data sources.
Recommended citation: Alejandra N Gonzalez-Beltran, John Campbell, Patrick Dunn, Diana Guijarro, Sanda Ionescu, Hyeoneui Kim, Jared Lyle, Jeffrey Wiser, Susanna-Assunta Sansone, Philippe Rocca-Serra. "Data discovery with DATS: exemplar adoptions and lessons learned" Journal of the American Medical Informatics Association, Volume 25, Issue 1, 1 January 2018, Pages 13–16, https://doi.org/10.1093/jamia/ocx119
PhenoMeNal: processing and analysis of metabolomics data in the cloud
Published in GigaScience, 2018
This paper presents PhenoMeNal, a cloud e-infrastructure solution for analysing metabolomics data. It provides easy-to-use web interfaces and can be scaled to any custom public or private cloud environment.
Recommended citation: Kristian Peters, James Bradbury, Sven Bergmann, Marco Capuccini, Marta Cascante, Pedro de Atauri, Timothy M D Ebbels, Carles Foguet, Robert Glen, Alejandra Gonzalez-Beltran, Ulrich L Günther, Evangelos Handakas, Thomas Hankemeier, Kenneth Haug, Stephanie Herman, Petr Holub, Massimiliano Izzo, Daniel Jacob, David Johnson, Fabien Jourdan, Namrata Kale, Ibrahim Karaman, Bita Khalili, Payam Emami Khonsari, Kim Kultima, Samuel Lampa, Anders Larsson, Christian Ludwig, Pablo Moreno, Steffen Neumann, Jon Ander Novella, Claire O'Donovan, Jake T M Pearce, Alina Peluso, Marco Enrico Piras, Luca Pireddu, Michelle A C Reed, Philippe Rocca-Serra, Pierrick Roger, Antonio Rosato, Rico Rueedi, Christoph Ruttkies, Noureddin Sadawi, Reza M Salek, Susanna-Assunta Sansone, Vitaly Selivanov, Ola Spjuth, Daniel Schober, Etienne A Thévenot, Mattia Tomasoni, Merlijn van Rijswijk, Michael van Vliet, Mark R Viant, Ralf J M Weber, Gianluigi Zanetti, Christoph Steinbeck; PhenoMeNal: processing and analysis of metabolomics data in the cloud, GigaScience, Volume 8, Issue 2, 1 February 2019, giy149, https://doi.org/10.1093/gigascience/giy149
Discovering Data Access and Use Requirements Using the Data Tag Suite (DATS)
Published in bioRxiv, 2019
This paper is about the representation of data access and data use requirements for the Data Tag Suite (DATS) model.
Recommended citation: George Alter, Alejandra Gonzalez-Beltran, Lucila Ohno-Machado, Philippe Rocca-Serra. "Discovering Data Access and Use Requirements Using the Data Tag Suite (DATS) Model". bioRxiv 518571; doi: https://doi.org/10.1101/518571
The FAIR Funder pilot programme to make it easy for funders to require and for grantees to produce FAIR Data
Published in arXiv, 2019
The FAIR Funders design envisions a data-management workflow having seven essential stages, where solution providers are openly invited to participate. The initial pilot programme will launch using existing computer-based tools of those who attended the M4M Workshop.
Recommended citation: https://arxiv.org/abs/1902.11162
Interoperable and scalable data analysis with microservices: applications in metabolomics
Published in Bioinformatics, 2019
Developing a robust and performant data analysis workflow that integrates all necessary components whilst still being able to scale over multiple compute nodes is a challenging task. We introduce a generic method based on the microservice architecture, where software tools are encapsulated as Docker containers that can be connected into scientific workflows and executed using the Kubernetes container orchestrator.
Recommended citation: Payam Emami Khoonsari, Pablo Moreno, Sven Bergmann, Joachim Burman, Marco Capuccini, Matteo Carone, Marta Cascante, Pedro de Atauri, Carles Foguet, Alejandra N Gonzalez-Beltran, Thomas Hankemeier, Kenneth Haug, Sijin He, Stephanie Herman, David Johnson, Namrata Kale, Anders Larsson, Steffen Neumann, Kristian Peters, Luca Pireddu, Philippe Rocca-Serra, Pierrick Roger, Rico Rueedi, Christoph Ruttkies, Noureddin Sadawi, Reza M Salek, Susanna-Assunta Sansone, Daniel Schober, Vitaly Selivanov, Etienne A Thévenot, Michael van Vliet, Gianluigi Zanetti, Christoph Steinbeck, Kim Kultima, Ola Spjuth, Interoperable and scalable data analysis with microservices: applications in metabolomics, Bioinformatics, btz160, https://doi.org/10.1093/bioinformatics/btz160
FAIRsharing as a community approach to standards, repositories and policies
Published in Nature Biotechnology, 2019
Recommended citation: Susanna-Assunta Sansone, Peter McQuilton, Philippe Rocca-Serra, Alejandra Gonzalez-Beltran, Massimiliano Izzo, Allyson L. Lister, Milo Thurston & the FAIRsharing Community. "FAIRsharing as a community approach to standards, repositories and policies". Nat Biotechnol. 2019 Apr;37(4):358-367. doi: https://doi.org/10.1038/s41587-019-0080-8
Software Citation Implementation Challenges
Published in arXiv, 2019
The purpose of this document is to provide an explanation of current issues impacting scholarly attribution of research software, organize updated implementation guidance, and identify where best practices and solutions are still needed.
Recommended citation: https://arxiv.org/abs/1905.08674v1
Software Citation Checklist for Authors
Published in Zenodo, 2019
This document provides a simple, generic checklist that authors of academic work (papers, books, conference abstracts, blog posts, etc.) can use to ensure they are following good practice when referencing and citing software they have used, both created by themselves for their research as well as obtained from other sources. It may also be used and adapted by journal editors, publishers and conference chairs as the basis of more specific guidance for their contributors and reviewers.
Recommended citation: Chue Hong, Neil P., Allen, Alice, Gonzalez-Beltran, Alejandra, de Waard, Anita, Smith, Arfon M., Robinson, Carly, … Pollard, Tom. (2019, October 15). Software Citation Checklist for Authors (Version 0.9.0). Zenodo. https://doi.org/10.5281/zenodo.3479199
Software Citation Checklist for Developers
Published in Zenodo, 2019
This document provides a minimal, generic checklist that developers of software (either open or closed source) used in research can use to ensure they are following good practice around software citation. This will help developers get credit for the software they create, and improve transparency, reproducibility, and reuse.
Recommended citation: Chue Hong, Neil P., Allen, Alice, Gonzalez-Beltran, Alejandra, de Waard, Anita, Smith, Arfon M., Robinson, Carly, Jones, Catherine, Bouquin, Daina, Katz, Daniel S., Kennedy, David, Ryder, Gerry, Hausman, Jessica, Hwang, Lorraine, Jones, Matthew B., Harrison, Melissa, Crosas, Mercè, Wu, Mingfang, Löwe, Peter, Haines, Robert, … Pollard, Tom. (2019). Software Citation Checklist for Developers (0.9.0). Zenodo. https://doi.org/10.5281/zenodo.3482769
Special Issue on Scholarly Data Analysis (Semantics, Analytics, Visualisation)
Published in Data Science Journal, 2019
The increasing interest in analysing, describing, and improving the research process requires the development of new forms of scholarly data publication and analysis that integrate lessons and approaches from the fields of Semantic Technologies, Science of Science, Digital Libraries, and Artificial Intelligence. This editorial summarises the content of the Special Issue on Scholarly Data Analysis (Semantics, Analytics, Visualisation), which aims to showcase some of the most interesting research efforts in the field. This issue includes extended versions of the best papers from the last two editions of the “Semantics, Analytics, Visualisation: Enhancing Scholarly Dissemination” (SAVE-SD 2017 and 2018) workshop at The Web Conference.
Recommended citation: https://content.iospress.com/journals/data-science/2/1-2
The Data Tags Suite (DATS) model for discovering data access and use requirements
Published in GigaScience, 2020
This paper is about the representation of data access and data use requirements for the Data Tag Suite (DATS) model.
Recommended citation: George Alter, Alejandra Gonzalez-Beltran, Lucila Ohno-Machado, Philippe Rocca-Serra, The Data Tags Suite (DATS) model for discovering data access and use requirements, GigaScience, Volume 9, Issue 2, February 2020, giz165, https://doi.org/10.1093/gigascience/giz165
PaNOSC FAIR Research Data Policy framework
Published in Zenodo, 2020
Recommended citation: Gotz, Andy, Perrin, Jean-Francois, Fangohr, Hans, Salvat, Daniel, Gliksohn, Florian, Markvardsen, Anders, … Matthews, Brian. (2020). PaNOSC FAIR Research Data Policy framework (Version 1.1). Zenodo. https://doi.org/10.5281/zenodo.3738497
COPO: a metadata platform for brokering FAIR data in the life sciences
Published in F1000Research, 2020
COPO is a computational system that attempts to address some of the challenges of brokering FAIR data in the life sciences by enabling scientists to describe their research objects (raw or processed data, publications, samples, images, etc.) using community-sanctioned metadata sets and vocabularies, and then use public or institutional repositories to share them with the wider scientific community.
Recommended citation: Shaw F, Etuk A, Minotto A et al. COPO: a metadata platform for brokering FAIR data in the life sciences [version 1; peer review: awaiting peer review]. F1000Research 2020, 9:495 https://doi.org/10.12688/f1000research.23889.1
RDA COVID-19; Recommendations and Guidelines on Data Sharing, Final release 30 June 2020
Published in Research Data Alliance, 2020
This is the final version of the Recommendations and Guidelines from the RDA COVID-19 Working Group, and has been endorsed through the official RDA process.
Recommended citation: RDA COVID-19 Working Group. Recommendations and Guidelines on data sharing. Research Data Alliance. 2020. DOI: https://doi.org/10.15497/rda00052
Draft extended data policy framework for Photon and Neutron RIs
Published in Zenodo, 2020
We review the FAIR data policy landscape at European and national levels, consider the current state of data policy adoption and implementation at ExPaNDS partner facilities, and examine existing FAIR ecosystem data policy recommendations, in particular, from the Turning FAIR into reality report and the recent FAIRsFAIR Deliverable 3.3: Policy enhancement recommendations. In response, we make twenty-six recommendations of our own that serve to translate these recommendations to the local level of photon and neutron research infrastructures.
Recommended citation: Matthews, Brian, McBirnie, Abigail, Vukolov, Andrei, Ashton, Alun, Collins, Stephen, Da Graca Ramos, Sylvie, Gagey, Brigitte, Gonzalez-Beltran, Alejandra, Johnsson, Maria, Krahl, Rolf, Ounsy, Majid and Van Daalen, Mirjam 2020. Draft extended data policy framework for Photon and Neutron RIs. Zenodo https://doi.org/10.5281/zenodo.4014811
Report on status, gap analysis and roadmap towards harmonised and federated metadata catalogues for EU national Photon and Neutron RIs
Published in Zenodo, 2020
The ExPaNDS project aims to deploy data catalogues and data analysis services into EOSC. This document describes the status, a gap analysis, and the roadmap required to achieve harmonised and federated (meta)data catalogues within EOSC for the participating national Photon and Neutron (PaN) Research Infrastructures (RIs).
Recommended citation: Ashton, Alun, Da Graca Ramos, Sylvie & Gonzalez-Beltran, Alejandra. Report on status, gap analysis and roadmap towards harmonised and federated metadata catalogues for EU national Photon and Neutron RIs. (Zenodo, 2020). doi: https://doi.org/10.5281/zenodo.4146819
Fostering global data sharing: highlighting the recommendations of the Research Data Alliance COVID-19 working group
Published in Wellcome Open Research, 2020
The systemic challenges of the COVID-19 pandemic require cross-disciplinary collaboration in a global and timely fashion. Such collaboration needs open research practices and the sharing of research outputs, such as data and code, thereby facilitating research, research reproducibility and timely collaboration beyond borders.
Recommended citation: Austin CC, Bernier A, Bezuidenhout L et al. Fostering global data sharing: highlighting the recommendations of the Research Data Alliance COVID-19 working group [version 2; peer review: 1 approved, 2 approved with reservations]. Wellcome Open Res 2021, 5:267 (https://doi.org/10.12688/wellcomeopenres.16378.2) https://doi.org/10.12688/wellcomeopenres.16378.1
Ten Simple Rules for making a vocabulary FAIR
Published in arXiv, 2020
We present ten simple rules that support converting a legacy vocabulary – a list of terms available in a print-based glossary or table not accessible using web standards – into a FAIR vocabulary. Various pathways may be followed to publish the FAIR vocabulary, but we emphasise particularly the goal of providing a distinct IRI for each term or concept. A standard representation of the concept should be returned when the individual IRI is de-referenced, using SKOS or OWL serialised in an RDF-based representation for machine-interchange, or in a web-page for human consumption. Guidelines for vocabulary and item metadata are provided, as well as development and maintenance considerations. By following these rules you can achieve the outcome of converting a legacy vocabulary into a standalone FAIR vocabulary, which can be used for unambiguous data annotation. In turn, this increases data interoperability and enables data integration.
Recommended citation: Simon J D Cox, Alejandra N Gonzalez-Beltran, Barbara Magagna, Maria-Cristina Marinescu. "Ten Simple Rules for making a vocabulary FAIR" https://arxiv.org/abs/2012.02325
Draft recommendations for FAIR Photon and Neutron Data Management
Published in Zenodo, 2020
This is a draft of the recommendations for FAIR Photon and Neutron Data Management, a deliverable of the EU ExPaNDS project.
Recommended citation: Salvat, Daniel, Gonzalez-Beltran, Alejandra, Görzig, Heike, Matthews, Brian, McBirnie, Abigail, et al. 2020. Draft recommendations for FAIR Photon and Neutron Data Management. Zenodo. https://doi.org/10.5281/zenodo.4312825
Nine Best Practices for Research Software Registries and Repositories: A Concise Guide
Published in arXiv, 2020
We present a set of nine best practices that can help managers define the scope, practices, and rules that govern individual registries and repositories.
Recommended citation: https://arxiv.org/abs/2012.13117
ExPaNDS ontologies v1.0
Published in Zenodo, 2021
We present ontologies for the domain of photon and neutron (PaN) science. With the primary goal of supporting PaN FAIR data catalogue services, we have developed three ontologies: PaN experimental techniques (PaNET), an ontology of NeXus definitions (NeXusOntology), and a semantic integration ontology for the PaN domain (PaNmapping). The ontologies are presented as initial versions, supported by community development workflows. The work represents deliverable D3.2 of the Horizon 2020 ExPaNDS project.
Recommended citation: Collins, Steve P., da Graça Ramos, Silvia, Iyayi, Daniel, Görzig, Heike, González Beltrán, Alejandra, Ashton, Alun, Egli, Stefan, and Minotti, Carlo, 2021, ExPaNDS ontologies v1.0: Zenodo, doi:10.5281/zenodo.4806026. https://doi.org/10.5281/zenodo.4806026
Radical collaboration during a global health emergency: development of the RDA COVID-19 data sharing recommendations and guidelines
Published in Open Research Europe, 2021
The purpose of the present work was to explore how the RDA succeeded in engaging the participation of its community of scientists in a rapid response to the European Commission (EC) request. The three constructs of radical collaboration (inclusiveness, distributed digital practices, productive and sustainable collaboration) were found to be well supported in both the quantitative and qualitative analyses of the survey data. Other social factors, such as motivation and group identity, were also found to be important to the success of this extreme collaborative effort.
Recommended citation: Pickering B, Biro T, Austin CC et al. Radical collaboration during a global health emergency: development of the RDA COVID-19 data sharing recommendations and guidelines [version 1; peer review: awaiting peer review]. Open Research Europe 2021, 1:69. https://doi.org/10.12688/openreseurope.13369.1
Ten Simple Rules for making a vocabulary FAIR
Published in PLoS Computational Biology, 2021
We present ten simple rules that support converting a legacy vocabulary—a list of terms available in a print-based glossary or in a table not accessible using web standards—into a FAIR vocabulary. Various pathways may be followed to publish the FAIR vocabulary, but we emphasise particularly the goal of providing a globally unique resolvable identifier for each term or concept. A standard representation of the concept should be returned when the individual web identifier is resolved, using SKOS or OWL serialised in an RDF-based representation for machine-interchange and in a web-page for human consumption. Guidelines for vocabulary and term metadata are provided, as well as development and maintenance considerations. The rules are arranged as a stepwise recipe for creating a FAIR vocabulary based on the legacy vocabulary. By following these rules you can achieve the outcome of converting a legacy vocabulary into a standalone FAIR vocabulary, which can be used for unambiguous data annotation. In turn, this increases data interoperability and enables data integration.
Recommended citation: Cox SJD, Gonzalez-Beltran AN, Magagna B, Marinescu MC (2021) Ten simple rules for making a vocabulary FAIR. PLOS Computational Biology 17(6): e1009041. https://doi.org/10.1371/journal.pcbi.1009041
ExPaNDS Metadata Catalogue Release
Published in Zenodo, 2021
This document presents the milestone of releasing a metadata catalogue in the domain of photon and neutron (PaN) science. With the primary goal of supporting PaN FAIR data catalogue services, we have developed a self-contained, stand-alone metadata catalogue release that facilities can download to test and try out.
Recommended citation: Minotti, Carlo, da Graca Ramos, Silvia, Ashton, Alun, Egli, Stephan, Bolmsten, Fredrik, Johansson, Henrik, Novelli, Massimiliano, Gonzalez-Beltran, Alejandra, & Pullinger, Stuart. (2021). ExPaNDS Metadata Catalogue Release. Zenodo. https://doi.org/10.5281/zenodo.5205909
FAIR Data Pipeline: provenance-driven data management for traceable scientific workflows
Published in arXiv, 2021
Modern epidemiological analyses to understand and combat the spread of disease depend critically on access to, and use of, data. Rapidly evolving data, such as data streams changing during a disease outbreak, are particularly challenging. Data management is further complicated by data being imprecisely identified when used. Public trust in policy decisions resulting from such analyses is easily damaged and is often low, with cynicism arising where claims of “following the science” are made without accompanying evidence. Tracing the provenance of such decisions back through open software to primary data would clarify this evidence, enhancing the transparency of the decision-making process. Here, we demonstrate a Findable, Accessible, Interoperable and Reusable (FAIR) data pipeline developed during the COVID-19 pandemic that allows easy annotation of data as they are consumed by analyses, while tracing the provenance of scientific outputs back through the analytical source code to data sources. Such a tool provides a mechanism for the public, and fellow scientists, to better assess the trust that should be placed in scientific evidence, while allowing scientists to support policy-makers in openly justifying their decisions. We believe that tools such as this should be promoted for use across all areas of policy-facing research.
Recommended citation: Sonia Natalie Mitchell, Andrew Lahiff, Nathan Cummings, Jonathan Hollocombe, Bram Boskamp, Dennis Reddyhoff, Ryan Field, Kristian Zarebski, Antony Wilson, Martin Burke, Blair Archibald, Paul Bessell, Richard Blackwell, Lisa A Boden, Alys Brett, Sam Brett, Ruth Dundas, Jessica Enright, Alejandra N. Gonzalez-Beltran, Claire Harris, Ian Hinder, Christopher David Hughes, Martin Knight, Vino Mano, Ciaran McMonagle, Dominic Mellor, Sibylle Mohr, Glenn Marion, Louise Matthews, Iain J. McKendrick, Christopher Mark Pooley, Thibaud Porphyre, Aaron Reeves, Edward Townsend, Robert Turner, Jeremy Walton, Richard Reeve. "FAIR Data Pipeline: provenance-driven data management for traceable scientific workflows" https://arxiv.org/abs/2110.07117
FAIR Data Pipeline: provenance-driven data management for traceable scientific workflows
Published in Phil. Trans. R. Soc. A, 2022
Modern epidemiological analyses to understand and combat the spread of disease depend critically on access to, and use of, data. Rapidly evolving data, such as data streams changing during a disease outbreak, are particularly challenging. Data management is further complicated by data being imprecisely identified when used. Public trust in policy decisions resulting from such analyses is easily damaged and is often low, with cynicism arising where claims of ‘following the science’ are made without accompanying evidence. Tracing the provenance of such decisions back through open software to primary data would clarify this evidence, enhancing the transparency of the decision-making process. Here, we demonstrate a Findable, Accessible, Interoperable and Reusable (FAIR) data pipeline. Although developed during the COVID-19 pandemic, it allows easy annotation of any data as they are consumed by analyses, or conversely traces the provenance of scientific outputs back through the analytical or modelling source code to primary data. Such a tool provides a mechanism for the public, and fellow scientists, to better assess scientific evidence by inspecting its provenance, while allowing scientists to support policymakers in openly justifying their decisions. We believe that such tools should be promoted for use across all areas of policy-facing research. This article is part of the theme issue ‘Technical challenges of modelling real-life epidemics and examples of overcoming these’.
Recommended citation: Sonia Natalie Mitchell, Andrew Lahiff, Nathan Cummings, Jonathan Hollocombe, Bram Boskamp, Dennis Reddyhoff, Ryan Field, Kristian Zarebski, Antony Wilson, Martin Burke, Blair Archibald, Paul Bessell, Richard Blackwell, Lisa A Boden, Alys Brett, Sam Brett, Ruth Dundas, Jessica Enright, Alejandra N. Gonzalez-Beltran, Claire Harris, Ian Hinder, Christopher David Hughes, Martin Knight, Vino Mano, Ciaran McMonagle, Dominic Mellor, Sibylle Mohr, Glenn Marion, Louise Matthews, Iain J. McKendrick, Christopher Mark Pooley, Thibaud Porphyre, Aaron Reeves, Edward Townsend, Robert Turner, Jeremy Walton, Richard Reeve. "FAIR Data Pipeline: provenance-driven data management for traceable scientific workflows". Phil. Trans. R. Soc. A, 2022. https://doi.org/10.1098/rsta.2021.0300
The Trail from Data to Policy in the COVID-19 Pandemic and Beyond
Published in Scientific Computing Department, Annual Review 2021-22, 2022
COVID-19 was an event that marked a ‘before’ and ‘after’ in our lives. We will all remember the days when daily statistics on positive cases, hospitalisations and fatalities were reported in each region of the world, providing metrics on the looming situation. And some of those statistics will continue to be collected for the foreseeable future.
Recommended citation: https://www.scd.stfc.ac.uk/SiteAssets/SCD%20Annual%20Review%202021-2022.pdf
Machine actionable metadata models
Published in Nature Scientific Data, 2022
Community-developed minimum information checklists are designed to drive the rich and consistent reporting of metadata, underpinning the reproducibility and reuse of the data. These reporting guidelines, however, are usually in the form of narratives intended for human consumption. Modular and reusable machine-readable versions are also needed: first, to provide the necessary quantitative and verifiable measures of the degree to which the metadata descriptors meet these community requirements, a requirement of the FAIR Principles; and second, to encourage the creation of standards-driven templates for metadata authoring, especially when describing complex experiments that require multiple reporting guidelines to be used in combination or extended. We present new functionalities to support the creation and improvement of machine-readable models. We apply the approach to an exemplar set of reporting guidelines in Life Science and discuss the challenges. Our work, targeted to developers of standards and those familiar with standards, promotes the concept of compositional metadata elements and encourages the creation of community standards which are modular and interoperable from the outset.
Recommended citation: Batista, D., Gonzalez-Beltran, A., Sansone, SA. et al. Machine actionable metadata models. Sci Data 9, 592 (2022). https://doi.org/10.1038/s41597-022-01707-6
FAIR-IMPACT M5.3 Semantic artefact assessment methodology
Published in Zenodo, 2023
Semantic artefacts (i.e., ontologies, vocabularies and SKOS taxonomies, among others) define the structure, guide the construction of, and help validate many existing Knowledge Graphs. In recent years, a number of guidelines have been proposed (Poveda-Villalón et al. 2020; Garijo and Poveda-Villalón 2020; Hugo et al. 2020; Le Franc et al., 2022; Xu et al. 2023) to align semantic artefact best practices against the Findable, Accessible, Interoperable and Reusable principles (FAIR principles) (Wilkinson et al. 2016). Based on these guidelines, new validators and assistants have been developed (Garijo et al. 2021; Amdouni et al. 2022a; 2022b) in order to guide users in assessing their own semantic artefacts against the FAIR principles. However, different tests are based on different interpretations of the FAIR principles, resulting in different scores and checks for semantic artefacts. To the best of our knowledge, there is no generic methodology grouping the types of tests to perform on semantic artefacts, in order to map existing assessment efforts in a consistent manner. In this document, we propose such a methodology. We do so by taking an ontology development perspective, dividing semantic artefacts into smaller parts (their code, content, ontology metadata, etc.) that can be individually assessed at different stages of their development process. We build on the Linked Open Terms (LOT) methodology (Poveda-Villalón et al. 2022), adding a “FAIR assessment” module, and, for each activity, we validate our approach by mapping to two existing semantic artefact FAIR assessment validators: FOOPS! (Garijo et al. 2021) and O’FAIRe (Amdouni et al. 2022a; 2022b). The rest of the document outlines our methodology, describes each step in detail, and maps it to existing FAIR principles and guidelines.
Recommended citation: Garijo, Daniel, Poveda-Villalón, María, Flohr, Pascal, Gonzalez-Beltran, Alejandra, le Franc, Yann, & Verburg, Maaike. (2023). M5.3 Semantic artefact assessment methodology (Version 1). Zenodo. https://doi.org/10.5281/zenodo.8305173
The W3C Data Catalog Vocabulary, Version 2: Rationale, Design Principles, and Uptake
Published in Data Intelligence, 2023
DCAT is an RDF vocabulary designed to facilitate interoperability between data catalogs published on the Web. Since its first release in 2014 as a W3C Recommendation, DCAT has seen a wide adoption across communities and domains, particularly in conjunction with implementing the FAIR data principles (for findable, accessible, interoperable and reusable data). These implementation experiences, besides demonstrating the fitness of DCAT to meet its intended purpose, helped identify existing issues and gaps. Moreover, over the last few years, additional requirements emerged in data catalogs, given the increasing practice of documenting not only datasets but also data services and APIs. This paper illustrates the new version of DCAT, explaining the rationale behind its main revisions and extensions, based on the collected use cases and requirements, and outlines the issues yet to be addressed in future versions of DCAT.
Recommended citation: Riccardo Albertoni, David Browning, Simon Cox, Alejandra N. Gonzalez-Beltran, Andrea Perego, Peter Winstanley; The W3C Data Catalog Vocabulary, Version 2: Rationale, Design Principles, and Uptake. Data Intelligence 2023; doi: https://doi.org/10.1162/dint_a_00241
Moving towards FAIR mappings and crosswalks
Published in FAIR principles for Ontologies and Metadata in Knowledge Management (FOAM) Workshop, Joint Ontology Workshops (JOWO), part of the 14th International Conference on Formal Ontology in Information Systems (FOIS 2024), 2024
Mappings and crosswalks are key elements to ensure semantic interoperability as well as metadata and data integration between different information systems. Designing FAIR compliant systems requires making sure all the elements that constitute the systems are themselves FAIR to support machine-actionability and automation. This paper describes the ongoing European and international effort to build a framework for FAIR Mappings and crosswalks. This framework aims to be generic enough to capture the diverse set of use cases and methodologies across domains and communities. It should be composed of a set of technical recommendations to aid compliance with FAIR principles, a set of models for machine actionable mappings and crosswalks as well as a practical framework with aligned good practices to support the creation of mappings by scientific communities. Developed in the context of FAIR-IMPACT, a Horizon Europe project, this work will be pursued within a more international context as a Research Data Alliance Working Group.
Recommended citation: Jana Martínková, Nick Juty, Alejandra Gonzalez Beltran, Carole Goble and Yann Le Franc. Moving towards FAIR mappings and crosswalks. https://www.utwente.nl/en/eemcs/fois2024/resources/papers/martinkova-et-al-moving-towards-fair-mappings-and-crosswalks.pdf
Ontology Engineering and the FAIR principles: A Gap Analysis
Published in FAIR principles for Ontologies and Metadata in Knowledge Management (FOAM) Workshop, Joint Ontology Workshops (JOWO), part of the 14th International Conference on Formal Ontology in Information Systems (FOIS 2024), 2024
Ontologies and vocabularies play a key role when standardising, organizing and integrating data from heterogeneous data sources into Knowledge Graphs. In order to develop ontologies, different engineering methodologies have been proposed over the years, and their application has resulted in thousands of semantic artefacts (taxonomies, vocabularies and ontologies) in a wide range of domains. But how can we ensure that ontologies follow the Findable, Accessible, Interoperable and Reusable (FAIR) principles from their inception? In this paper, we review existing guidelines that help make ontologies FAIR and map them to the activities of the ontology development lifecycle. Our analysis outlines the current gaps, where no guidelines exist for ontologies to become FAIR by design.
Recommended citation: María Poveda-Villalón, Daniel Garijo, Alejandra Gonzalez-Beltran, Clement Jonquet and Yann le Franc. Ontology Engineering and the FAIR principles: A Gap Analysis. https://www.utwente.nl/en/eemcs/fois2024/resources/papers/poveda-villalon-et-al-ontology-engineering-and-the-fair-principles.pdf
service
Research Software London & South East 2019
Venue: The Royal Society, London, Date: 2019
I was a member of the Organising and Programme Committees of the Second Research Software London & South East workshop.
Research of Research Track, Extended Semantic Web Conference 2019
Venue: Portorož, Slovenia, Date: 2019
I was a Co-Chair of the Extended Semantic Web Conference (ESWC 2019) Research of Research: Semantic Representation, Analysis, and Visualization track.
Reproducibility Initiative, International Semantic Web Conference 2019
Venue: Auckland, New Zealand, Date: 2019
I was a Co-Chair of the Reproducibility Track within the International Semantic Web Conference.
Research Software London & South East 2020
Venue: The Royal Society, London, Date: 2020
I was a member of the Organising and Programme Committees of the Third Research Software London & South East workshop.
Research data management for Linked Open Science
Date: 2020
I was a member of the Programme Committee of the First Workshop on Research Data Management for Linked Open Science (DaMaLOS 2020).
Scientific Knowledge Graphs Workshop
Venue: 24th International Conference on Theory and Practice of Digital Libraries, Date: 2020
I was a member of the Programme Committee of the First Scientific Knowledge Graphs Workshop, co-located with the 24th International Conference on Theory and Practice of Digital Libraries (TPDL).
talks
Community-standards for reproducible and reusable research
I presented this talk at the Drug Discovery 2012 conference in Manchester, UK.
The ISA infrastructure for the biosciences: from data curation at source to the linked data cloud
I presented this talk at the Conference on Semantics in Healthcare and Life Sciences (CSHALS 2013).
Embedding underpinning mechanisms for data reuse and reproducibility in bioscience - The ISA exemplar behind the PDF
I presented a talk within the Visions session at the Force11 Beyond the PDF 2 (2013) conference.
The ISA infrastructure: from experimental planning to data publication and case studies in toxicology
The ISA infrastructure: from experimental planning to data publication
Bio-GraphIIn: a graph-based, integrative and semantically-enabled repository for life science experimental data
I presented this talk at the NETTAB 2013 workshop, whose topic was “Semantic, Social, and Mobile Applications for Bioinformatics and Biomedical Laboratories”.
What was the plan? A role for data standards, models and computational workflows in scholarly data publishing
This talk explores how principles derived from experimental design practice, data models and computational models can greatly enhance data quality, data generation, data reporting, data publication and data review. For this, I presented a case study on reproducibility that was a collaboration between the GigaScience journal and the ISA-commons, Research Object and Nanopublication communities. You can read more about my presentation in Scott Edmunds’ blog post for the GigaScience journal, and my slides can be found below.
EBI Metagenomics Bioinformatics Course
Within the Metagenomics Bioinformatics Course, Eamonn Maguire and I gave a tutorial on “Metagenomic Data Provenance and Management using the ISA infrastructure — overview, implementation patterns & software tools”.
Towards FAIR metadata standards
I attended the first Metadata for Machines (M4M) workshop organised by the Research Data Alliance and the GO FAIR International Support and Coordination Office.
Machine-Actionable Metadata Models: a toolbox including JSONLDSchema python module and JSONschema-documenter web application
This talk combines two abstracts that we submitted to the Research Software London & South East 2019 workshop, which took place at the Royal Society in London, UK, on 7th February 2019.
Better software + better data = better research
In April 2019, I gave this talk at CIFASIS. CIFASIS (in Spanish, Centro Internacional Franco Argentino de Ciencias de la Información y de Sistemas; in English, the French-Argentine International Centre for Information and Systems Sciences) is a research institute of the Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Argentina's National Scientific and Technical Research Council.
Research Reproducibility
I was invited to give a talk on “Research Reproducibility” during the “Research Day” of the EPSRC- and MRC-funded Oxford-Nottingham Centre for Doctoral Training in Biomedical Imaging (ONBI CDT).
SciGateway & DataGateway: the portals to facilities science and facilities data
Our submission was accepted for a full talk at the Research Software London & South East workshop.
Towards sustainable software for the muon science computational project
As part of the Ada Lovelace Centre project “Development of a sustainable and user-friendly software architecture for the Muon Spectroscopy Computational Project”, I took part in the Muon Site Calculation Meeting on 4th September 2020.
Large-scale facilities experimental lifecycle & FAIRness
I was invited to present at the ExPaNDS project FAIR workshops.
Data bridges for scientific & official data integration
I was invited to participate in the United Nations World Data Forum 2020.
Guidelines for FAIR vocabularies
I gave a talk on Guidelines for FAIR vocabularies at the FAIR Convergence Symposium organised by CODATA and GO FAIR.
Platforms and Tools for Data Management in Large-Scale Facilities
Presentation delivered at the UK Catalysis Hub virtual workshop on data management.
Good practices and guidelines for semantic interoperability
I was invited to participate in the United Nations World Data Forum 2021.
ISIS Neutron and Muon Source - Scientific Data Management
This meeting was part of the EU ExPaNDS project. It aimed to go through the gap and issue assessment of each facility in detail, and to discuss the way forward to help facilities overcome the difficulties they have with integrating their metadata catalogues.
Research object citation and cataloguing
Gemma Poulter and I were invited to talk in the CoSeC@CIUK session on 9th December 2021 about “Research object citation and cataloguing”. The session was organised by the Computational Science Centre for Research Communities (CoSeC) at the Computing Insight UK 2021 (CIUK2021) conference.
The ICAT project: A modular ecosystem of tools for large-scale facilities data management
I was invited to give a talk in the session “DAPHNE4NFDI: Science driven data management solutions for the user community”, one of the satellite meetings of the European XFEL Users’ Meeting 2022 and DESY Photon Science Users Meeting 2022.
Towards FAIR research data management
I was invited to present at the CECAM workshop on “Machine actionable data for chemical sciences: Bridging experiments, simulations, and machine learning for spectral data” (MADICES), which took place online from 7th to 9th February 2022.
The Data Catalog Vocabulary (DCAT)
Peter Winstanley from Semantic Arts and I were invited to give a talk for the FAIRsFAIR webinar on “Using DCAT (Data Catalogue Vocabulary) to support metadata catalogue integration”. We gave an introduction to the DCAT vocabulary, described its evolution and gave an overview of the initial steps to implement it.
ExPaNDS - Towards FAIR and open photon and neutron data
I was an invited speaker at the 12th International Conference on Inelastic X-ray Scattering (IXS 2022), which was held in Oxford, UK, in August 2022.
FAIR data pipeline
The Software Sustainability Institute Fellows Community meet regularly to share activities and opportunities, and discuss topics of interest.
teaching
Grid and Semantic computing for Cancer databases
Supervision of summer undergraduate intern, Department of Computer Science, University College London, 2011
EBI Metagenomics Bioinformatics Course
Training, EMBL-EBI, 2014
Within the Metagenomics Bioinformatics Course, Eamonn Maguire and I gave a tutorial on “Metagenomic Data Provenance and Management using the ISA infrastructure — overview, implementation patterns & software tools”.
First UK Data Carpentry Workshop
Data Carpentry Workshop, University of Manchester, 2014
This was the first Data Carpentry workshop run in the UK, and was supported by ELIXIR-UK.
The Carpentries - Online Instructor Training for Latin America
Workshop, Online, 2018
I participated as an instructor for The Carpentries online instructor training for Latin America. I taught the session on ‘Workshop Introductions’.
Computational tools for researchers (Herramientas computacionales para investigadores)
Workshop, CIFASIS, 2019
In April 2019, I taught a Carpentries workshop at the Centro Internacional Franco-Argentino de Ciencias de la Información y Sistemas (CIFASIS) in Rosario, Argentina. The workshop was entitled “Computational tools for researchers” (in Spanish: “Herramientas computacionales para investigadores”).