This blog post was published on the Oxford e-Research Centre’s website, University of Oxford, on the occasion of the Centre’s 10th Anniversary.
Dr Alejandra Gonzalez-Beltran, Research Lecturer, talks about her recent Kellogg College Junior Research Fellowship, the reporting of statistical methods used in data analysis, and the technical challenges around data stewardship.
When did you start at the Centre and what was your first role here?
I joined the Centre in June 2012 as a Senior Research Software Engineer. Now, I am a Research Lecturer. From the start, I have been involved in projects related to enhancing and extending the Investigation/Study/Assay (ISA) infrastructure for tracking metadata about biological experiments. I have also contributed to the BioSharing portal of standards, databases and policies in the life, environmental and biomedical sciences (renamed FAIRsharing since this blog post was published).
What is your background?
I am a computer scientist: after a degree in Computer Science from the Universidad Nacional de Rosario, Argentina (equivalent to BSc + MSc), I was awarded a PhD in Computer Science from Queen’s University Belfast, UK. My PhD work was about efficient ways to access information on a distributed network using probabilistic data structures. I then worked as a post-doctoral researcher at University College London, UK, collaborating with the UK National Cancer Research Institute, the US National Cancer Institute, the UCL Cancer Institute and others, on methods to find and integrate distributed cancer data, as well as best practices to record information about therapy experiments.
Summarise the research you are doing / your research interests in a few sentences.
My research interests involve applying Computer Science methodologies to applications in the life, environmental and biomedical domains. In particular, I develop models, methods and software tools for data curation, data discovery, knowledge management, data publication and data analysis, with the aim of enabling data sharing, data re-use and reproducible research.
Why is this important (to the scientific community / the world at large)?
Technological advances have propelled data generation to levels previously unimaginable. Managing the generated data efficiently is paramount for the advancement of science: it helps to avoid duplication of effort, enables data sharing, improves data re-use, and supports reproducibility, which requires a detailed description of the methods used as well as the availability of the software tools that yield the results.
I collaborate with scientists, technologists, service providers, journals and communities developing standards that support data sharing, interoperability, re-use and reproducibility. I hope to contribute to innovative ways of enhancing scholarly communication and the ways in which all the outcomes of research are made available, as I believe this would accelerate science and discoveries.
What would you like to do next, funding permitting?
I would like to explore further the issues around reporting the statistical methods used in data analysis, and how these relate to the quality of the data produced. For this purpose, I want to apply the STATistical Ontology (STATO) we have built to help report, and reason over (logically connect), the statistical methods applied in different experiments.
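To illustrate the kind of reasoning an ontology like STATO enables, here is a minimal sketch using a toy in-memory triple store. The class and property names (`appliesMethod`, `subClassOf`, the `stato:` terms) are hypothetical placeholders for illustration, not actual STATO identifiers.

```python
# Toy triple store illustrating ontology-backed querying: find all studies
# that applied any method falling under a given method class.
# All identifiers below are made up for this sketch, not real STATO terms.

triples = [
    ("study:A", "appliesMethod", "stato:t-test"),
    ("study:B", "appliesMethod", "stato:anova"),
    ("stato:t-test", "subClassOf", "stato:parametric-test"),
    ("stato:anova", "subClassOf", "stato:parametric-test"),
]

def studies_using(method_class):
    """Return studies whose method is method_class or a subclass of it."""
    methods = {s for s, p, o in triples
               if p == "subClassOf" and o == method_class}
    methods.add(method_class)
    return sorted(s for s, p, o in triples
                  if p == "appliesMethod" and o in methods)

print(studies_using("stato:parametric-test"))  # ['study:A', 'study:B']
```

In practice such queries would be expressed in SPARQL against the OWL ontology, with a reasoner handling the subclass inference rather than the hand-rolled lookup above.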
Are you involved in any wider collaborations? Why are these important?
I am involved in multiple projects with collaborations in the UK, Europe, the US, China and beyond. These collaborations are crucial for the work we do, as making the data FAIR (Findable, Accessible, Interoperable and Reusable) requires the wide adoption of standards, which per definition have to be agreed by large communities of practitioners and all stakeholders.
What publication /paper are you most proud of and why?
I have published papers on the software tools I have worked on, such as Risa (a tool that bridges the information about an experiment and the data analysis using the R language) and linkedISA (a tool that converts experiments described using spreadsheets to linked data, allowing users to search and establish connections between datasets).
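The spreadsheet-to-linked-data idea behind a tool like linkedISA can be sketched as turning each tabular row into subject–predicate–object triples. The column names, `ex:` prefix, and predicate naming below are invented for illustration and do not reflect the actual linkedISA mapping.

```python
# Minimal sketch: convert a tab-delimited record into RDF-style triples.
# The sample data, "ex:" namespace and predicate names are hypothetical.
import csv
import io

TSV = "Sample Name\tOrganism\tAssay\nsample1\tHomo sapiens\tRNA-seq\n"

def rows_to_triples(tsv_text):
    """Yield (subject, predicate, object) triples, one set per row."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    triples = []
    for row in reader:
        subject = "ex:" + row["Sample Name"]
        for column, value in row.items():
            if column != "Sample Name":
                predicate = "ex:has" + column.replace(" ", "")
                triples.append((subject, predicate, value))
    return triples

for triple in rows_to_triples(TSV):
    print(triple)
```

Once the experiment metadata is expressed as triples like these, datasets from different studies can be queried and linked through shared terms, which is what enables the cross-dataset search described above.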
These papers are important, but it is also great to apply these tools to specific use cases, demonstrating how they can help in making data reproducible. So, I would highlight a paper for which we collaborated with other researchers and publishers and made recommendations on how to report results to move “From Peer-Reviewed to Peer-Reproduced in Scholarly Publishing: The Complementary Roles of Data Models and Workflows in Bioinformatics”.
Have you received any awards or fellowships?
I have recently been elected as a Junior Research Fellow at Kellogg College, a fellowship that will start in Michaelmas Term (October 2016). This is a great opportunity to strengthen the links between the Centre and Kellogg College, promote my research and potentially create new collaborations. In addition, I hope to use the fellowship to raise the profile of women in Science, Technology, Engineering and Mathematics by organising seminars and outreach activities.
I have also received a Lockey grant to fund my trip to the “Semantics, Analytics, Visualisation: Enhancing Scholarly Data” workshop, which I co-chaired, and the 25th International World Wide Web Conference, which took place in Montreal, Canada, in April 2016.
Previously, I won the ORCID codefest that took place in Oxford, UK, in May 2013 and, as a prize, was invited to participate in the ORCID and DataCite Interoperability Network (ODIN) codesprint and first-year conference at CERN, Switzerland.
And even before that, I won a best paper award for my paper and presentation on “Ontology-based queries over cancer data” at the 3rd International Workshop on Semantic Web Applications and Tools for the Life Sciences (SWAT4LS 2010).
What do you think the most important issues/challenges in your field will be in the next decade and how is the Centre placed to address them?
There are technical challenges around data stewardship: deciding what data to preserve, evaluating data quality, ensuring the accessibility and privacy of confidential data, and providing easy-to-use tools to help with all of the above. But there are also important social challenges to be tackled, which revolve around highlighting the societal benefits of making the research process more transparent, as well as revising the academic credit system so that researchers are rewarded for sharing their well-described data and methods, rather than seeing it as a disadvantage.
What do you think the Centre does best?
The Centre’s interdisciplinary nature is definitely what makes us unique compared to other departments across the University. It allows us to have a stimulating working environment with common interest groups (e.g. in linked data or machine learning) while applying these methodologies to different application domains.
Watch Alejandra’s presentation on scholarly publishing in the life sciences at the 16th Annual BioInformatics Open Source Conference.