The ISA infrastructure for the biosciences: from data curation at source to the linked data cloud


I presented this talk at the Conference on Semantics in Healthcare and Life Sciences (CSHALS 2013).

The abstract and slides of the talk are available in the conference programme and are also included below.

The ISA infrastructure for the biosciences: from data curation at source to the linked data cloud

Experimental metadata is crucial for sharing, comparing, reproducing, and reusing data produced by biological experiments. The ISAtab format, a tabular format based on the concepts of Investigation/Study/Assay (ISA), was designed to support the annotation and management of experimental data at source, with a focus on multi-omics experiments. The format is accompanied by a set of open-source tools that facilitate, among other things, compliance with existing checklists and ontologies, production of ISAtab metadata, validation, conversion to other formats, and submission to public repositories. Together, the ISAtab format and these tools provide syntactic interoperability for the data and support the ISA commons, a growing community of international users and public or internal resources powered by one or more components of the ISA metadata tracking framework. The underlying semantics of the ISAtab format is currently left to the interpretation of biologists and/or curators. While this interpretation is assisted by the ontology-based annotations that can be included in ISAtab files, this information cannot currently be processed by machines, as it would be in the semantic web/linked data approach. In this presentation, we will introduce our ongoing isa2owl effort to transform ISAtab files into an RDF/OWL-based (Resource Description Framework/Web Ontology Language) representation, supporting semantic interoperability between ISAtab datasets. By using a semantic framework, we aim to:
1. make the ISAtab semantics explicit and machine-processable;
2. exploit the existing ontology-based annotations;
3. augment annotations over the native ISA syntax constructs with new elements anchored in a semantic model extending the Ontology for Biomedical Investigations (OBI);
4. facilitate the understanding and semantic querying of the experimental design;
5. facilitate data integration, knowledge discovery and reasoning over ISAtab metadata and associated data.
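
To make aim 2 more concrete, here is a minimal sketch of how an ontology annotation already present in an ISAtab study table could be turned into an RDF triple. It is an illustration under stated assumptions, not the isa2owl converter: the table is made up, and the prefix map (`TERM_SOURCES`), the `http://example.org/isa/` namespace and the `characteristic/...` predicate are hypothetical. It relies only on the ISA-Tab convention that a `Characteristics[...]` column may be followed by `Term Source REF` and `Term Accession Number` columns, and it writes N-Triples by hand with the Python standard library.

```python
import csv
import io

# Hypothetical prefix map from Term Source REF values to IRI prefixes
# (the OBO PURL pattern for NCBITaxon is used purely for illustration).
TERM_SOURCES = {"NCBITAXON": "http://purl.obolibrary.org/obo/NCBITaxon_"}
BASE = "http://example.org/isa/"  # hypothetical namespace for ISA individuals

# A made-up ISAtab-style study table (tab-separated).
STUDY_TABLE = (
    "Source Name\tCharacteristics[organism]\tTerm Source REF\tTerm Accession Number\t"
    "Protocol REF\tSample Name\n"
    "source1\tHomo sapiens\tNCBITAXON\t9606\tsample collection\tsample1\n"
    "source2\tMus musculus\tNCBITAXON\t10090\tsample collection\tsample2\n"
)


def annotated_characteristics(table_text):
    """Yield (source name, characteristic, ontology term IRI) for every
    Characteristics[...] cell that carries a resolvable ontology annotation."""
    rows = list(csv.reader(io.StringIO(table_text), delimiter="\t"))
    header, body = rows[0], rows[1:]
    source_col = header.index("Source Name")
    for row in body:
        for i, col in enumerate(header):
            if not col.startswith("Characteristics["):
                continue
            # ISA-Tab puts the annotation columns right after the value column.
            if header[i + 1 : i + 3] != ["Term Source REF", "Term Accession Number"]:
                continue
            prefix = TERM_SOURCES.get(row[i + 1])
            if prefix is None:
                continue  # unknown term source: only free text, no term IRI
            name = col[len("Characteristics[") : -1]
            yield row[source_col], name, prefix + row[i + 2]


def to_ntriples(table_text):
    """Render each annotation as one N-Triples line (no IRI escaping is done)."""
    return [
        f"<{BASE}source/{source}> <{BASE}characteristic/{name}> <{term}> ."
        for source, name, term in annotated_characteristics(table_text)
    ]


if __name__ == "__main__":
    for line in to_ntriples(STUDY_TABLE):
        print(line)
```

Running it prints one triple per annotated source, the first being `<http://example.org/isa/source/source1> <http://example.org/isa/characteristic/organism> <http://purl.obolibrary.org/obo/NCBITaxon_9606> .` The free-text value ("Homo sapiens") is dropped in this sketch; a fuller conversion would keep it, for example as a label.
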
The software architecture of the isa2owl component is engineered to support multiple mappings between the ISA syntax and semantic models. Given a specific mapping, a converter takes ISAtab datasets and produces OWL ontologies whose TBoxes (terminological axioms) are given by the mapping and whose ABoxes (assertional axioms) are populated with the ISAtab elements or with elements derived from them. These derived elements result from the analysis of the experimental workflow, as represented in the ISAtab format and its associated graph representation. The implementation relies on the OWLAPI. As a proof of concept, we have mapped the ISA syntax to a set of interoperable ontologies anchored in version 1 of the Basic Formal Ontology (BFO). These ontologies are part of the Open Biological and Biomedical Ontologies (OBO) Foundry and include OBI, the Information Artifact Ontology (IAO) and the Relations Ontology (RO). We will show how the isa2owl transformation allows users to perform richer queries over the experimental data, link to external resources available in the linked data cloud, and support knowledge discovery.
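
The TBox/ABox split and the derived workflow elements can be sketched in a few lines of Python. This is a hedged illustration, not the isa2owl implementation (which relies on the OWLAPI, a Java library): the `MAPPING` dictionary, its placeholder class and property IRIs under `http://example.org/isa/`, and the rule that each `Protocol REF` cell in a row yields one new process individual linking the material before it (input) to the material after it (output) are all hypothetical. A real mapping would target OBI, IAO and RO terms instead.

```python
import csv
import io

BASE = "http://example.org/isa/"  # hypothetical namespace for generated IRIs
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
OWL_CLASS = "http://www.w3.org/2002/07/owl#Class"

# Hypothetical mapping from ISA column headers to placeholder class IRIs,
# plus two placeholder properties for the derived process links.
MAPPING = {
    "classes": {
        "Source Name": BASE + "onto/Source",
        "Protocol REF": BASE + "onto/ProtocolApplication",
        "Sample Name": BASE + "onto/Sample",
    },
    "has_input": BASE + "onto/has_input",
    "has_output": BASE + "onto/has_output",
}

TABLE = (
    "Source Name\tProtocol REF\tSample Name\n"
    "source1\tsample collection\tsample1\n"
    "source1\tsample collection\tsample2\n"
)


def convert(table_text, mapping):
    """Return (tbox, abox) as sets of (subject, predicate, object) IRI tuples."""
    # TBox: comes from the mapping alone, whatever the dataset.
    tbox = {(iri, RDF_TYPE, OWL_CLASS) for iri in mapping["classes"].values()}
    rows = list(csv.reader(io.StringIO(table_text), delimiter="\t"))
    header, body = rows[0], rows[1:]
    abox = set()
    for r, row in enumerate(body):
        nodes = []  # (column header, individual IRI) in table order
        for col, value in zip(header, row):
            cls = mapping["classes"].get(col)
            if cls is None or not value:
                continue
            if col == "Protocol REF":
                # Derived element: the table names a protocol, not its execution,
                # so mint one process individual per row (hypothetical rule).
                iri = f"{BASE}process/{value.replace(' ', '_')}/{r}"
            else:
                iri = f"{BASE}{col.split()[0].lower()}/{value}"
            abox.add((iri, RDF_TYPE, cls))
            nodes.append((col, iri))
        # Link each process to the neighbouring nodes in the same row.
        for k, (col, iri) in enumerate(nodes):
            if col != "Protocol REF":
                continue
            if k > 0:
                abox.add((iri, mapping["has_input"], nodes[k - 1][1]))
            if k + 1 < len(nodes):
                abox.add((iri, mapping["has_output"], nodes[k + 1][1]))
    return tbox, abox


def samples_from(abox, source_iri, mapping):
    """Query: which nodes are outputs of processes that took source_iri as input?"""
    processes = {s for s, p, o in abox if p == mapping["has_input"] and o == source_iri}
    return sorted(o for s, p, o in abox if p == mapping["has_output"] and s in processes)


if __name__ == "__main__":
    tbox, abox = convert(TABLE, MAPPING)
    for triple in sorted(abox):
        print(triple)
    print(samples_from(abox, BASE + "source/source1", MAPPING))
```

The per-row process rule is a simplification: when several rows repeat the same source, protocol and sample it mints duplicate processes, and pooling or splitting of material would need the kind of workflow-graph analysis the abstract describes. `samples_from` hints at the richer queries such a graph supports, answered here by plain set filtering rather than SPARQL.
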

Slides: CSHALS 2013, by Alejandra Gonzalez-Beltran