Research Software Engineers and Data Scientists: More in Common

6 minute read

Published: April 05, 2018

This blog post was published on the Software Sustainability Institute’s website, and presents the conclusions of our discussions at the Research Software Engineers for Data Science (RSE4DataScience) meeting at the Alan Turing Institute in London.

By Matthew Archer, Stephen Dowsland, Rosa Filgueira, R. Stuart Geiger, Alejandra Gonzalez-Beltran, Robert Haines, James Hetherington, Christopher Holdgraf, Sanaz Jabbari Bayandor, David Mawdsley, Heiko Mueller, Martin O’Reilly, Tom Redfern, Valentina Staneva, Mark Turner, Jake VanderPlas, Kirstie Whitaker (authors in alphabetical order)

In our institutions, we employ multidisciplinary research staff who work with colleagues across many research fields to use and create software to understand and exploit research data. These researchers collaborate with others across the academy to create software and models that understand, predict and classify data. They do this not just as a service to advance the research of others, but also as scholars with opinions about computational research as a field, making supportive interventions to advance the practice of science.

Some of us use the term “data scientist” for our team members, others use “research software engineer” (RSE), and some use both. Where both terms are used, the difference seems to be that data scientists in an academic context focus more on using software to understand data, while research software engineers more often build software libraries for others to use. However, in some places one term or the other is used to cover both, according to local tradition.

What we have in common

Regardless of job title, we share many of the skills involved, as well as the goal of driving the use of open and reproducible research practices.

Shared skill focuses include:

Literate programming: writing code to be read by humans.
Performant programming: the time or memory used by the code really matters.
Algorithmic understanding: you need to know what the maths of the code you’re working with actually does.
Coding for a product: software and scripts need to outlive their author and be used by others.
Verification and testing: it’s important that the script does what you think it does.
Scaling beyond the laptop: because performance matters, cloud and HPC skills are important.
Data wrangling: parsing, managing, linking and cleaning research data in an arcane variety of file formats.
Interactivity: the visual display of quantitative information.

Shared attitudes and approaches to work are also important commonalities:

Multidisciplinary agility: the ability to learn what you need from a new research domain as you begin a collaboration.
Navigating the research landscape: learning the techniques, languages, libraries and algorithms you need as you need them.
Managing impostor syndrome: as generalists, we know we don’t know the detail of our methods quite as well as the focused specialists, and we know how to work with experts when we need to.

Our differences emerge from historical context

The close relationship between the two professional titles is no accident. Different places have tried different tactics to resolve a common set of frustrations that arise as scholars struggle to make effective use of information technology.

In the UK, RSE groups have tried to move computational research forward by embracing a service culture while retaining participation in the academic community, a role sometimes described as being both a “craftsperson and a scholar”, or science-as-a-service. We believe we make a real difference to computational research as a discipline by helping individual research groups use and create software more effectively, and that this creates genuine value for researchers, rather than building and publishing tools that researchers do not actually use in their research.

The Moore-Sloan Data Science Environments (MSDSE) in the US are working to establish Data Science as a new interdisciplinary academic field, bringing together researchers from domain and methodology fields to collectively develop best practices and software for academic research. While these institutes also facilitate collaboration across academia, their funding models are based less on a service model than those of UK RSE groups, and more on bringing graduate students, postdocs, research staff, and faculty from across academia together in a shared environment.

Although these approaches differ strongly, we nevertheless see that the people working to make them succeed share very similar skills, behaviours and attitudes. Both movements are tackling similar issues, but in different institutional contexts. We took diverging paths from a common starting point, but now find ourselves envisaging a shared future.

The Alan Turing Institute in the UK straddles the two models, with both a Research Engineering Group following a science-as-a-service model and comprising both Data Scientists and RSEs, and a wider collaborative academic data science engagement across eleven partner universities.

Recommendations

Observing this convergence, we recommend:

Create adverts and job descriptions that are welcoming to people who identify as one or the other title: the important thing is to attract and retain the right people.
Standardised nomenclature is important, but over-specification is harmful. Don’t try too hard to delineate the exact differences in the responsibilities of the two roles: people can and will move between projects and focuses, and this is a good thing.
These roles, titles, groups, and fields are emerging and defined differently across institutions. It is important to have clear messaging to various stakeholders about the responsibilities and expectations of people in these roles.
Be open to evolving roles for team members, and ensure that stable, long-term career paths exist to support those who have taken the risk to work in emerging roles.
Don’t restrict your recruitment drive to people who have worked with one or other of these titles: the skills you need could be found in someone whose earlier roles used the other term.
Don’t be afraid to embrace service models to allow financial and institutional sustainability, but always maintain the genuine academic collaboration needed for research to flourish.


Best Practices for Software Registries and Repositories

4 minute read

Published: August 04, 2021

(This post is cross-posted on the SciCodes website, the SSI blog, the ASCL blog, the FORCE11 blog, and the Better Scientific Software (BSSW) website.)

Evidence for the importance of research software

9 minute read

Published: June 08, 2020

(This post is cross-posted on the URSSI blog, the SSI blog and the Netherlands eScience Center blog, and is archived in Zenodo.)

The Research Software Alliance (ReSA) and the Community Landscape

5 minute read

Published: March 11, 2020

(This post is cross-posted on the UK Software Sustainability Institute blog, the Netherlands eScience Center blog and the US Research Software Sustainability Institute blog.)

ReSA’s mission is to bring research software communities together to collaborate on the advancement of research software. Its vision is to have research software recognized and valued as a fundamental and vital component of research worldwide.

Given our mission, there are multiple reasons why it’s important for us to understand the landscape of communities involved with software, in aspects such as preservation, citation, career paths, productivity, and sustainability. One reason is that ReSA seeks to be a link between these communities, which requires identifying and understanding them. We want to be sure we are not missing significant community organizations that should be involved in our work. Identifying gaps will also help us create opportunities and communities of practice where they are needed.

When thinking about these communities, it’s clear that in addition to those that focus on software, there are others for which software is only a small part of their interest. Examples include communities that focus on open science, reproducibility, roles and careers for people who are less visible in research, publishing and review, and other types of scholarly products and digital objects. ReSA also wants to define how it fits into and interacts with that broader scholarly landscape.

How was this work undertaken?

In September 2019, a ReSA taskforce, consisting of the authors of this blog post, came together to map the software community landscape. This group distributed a survey to members of the ReSA Google group to identify other groups interested in software. Other useful sources included:

Netherlands eScience Center: Awesome-research-software-registries by Jurriaan Spaaks
eResearch-meeting-list by James Hetherington
International RSE groups by the Research Software Engineering (RSE) Association
Open Science Grassroots Community Networks, a consortium of 120 networks
In which journals should I publish my software? by Neil Chue Hong

The taskforce then met to consider the results and how to analyze them. The ReSA list of research software communities is now publicly available as a living community resource, with the version of this list used by the ReSA taskforce in February 2020 and a copy of this post archived in Zenodo. Suggested additions or corrections are welcome as comments in the list. Some of the questions we faced in assembling this list were:

How much interest in software does an organization need to have to be listed?
When is an organization sufficiently research focused to be included?
What momentum/scale does an organization need to have so that we consider it relevant in the global picture?

On the other hand, once we started adding entries to the list, many of them immediately brought to mind similar organizations that should also be added. For example, some organizations have a geographic aspect, and this led us to think of similar organizations with different geographic scopes, such as all the national and regional RSE associations.

What did we learn?

The analysis produced a range of interesting outcomes:

There are many, many communities that support research software, emphasizing the need for a coordinating organization such as ReSA. The importance of community development is captured in articles such as Community Organizations: Changing the Culture in Which Research Software is Developed and Sustained by Daniel S. Katz et al., which provides an overview of key groups and discusses opportunities to leverage their synergistic activities.
The range of community initiatives is wide and increasing. For example, the Open Science Grassroots Community Networks list has evolved into the Community of Open Scholarship Grassroots Networks (COSGN), whose networks communicate and coordinate on topics of common interest. COSGN has submitted an NSF proposal to formalize governance and coordination of the networks to maximize impact and establish standard practices for sustainability.
The increasing focus on open software makes it hard to separate research and non-research initiatives. As the points above suggest, it is very hard to define which initiatives are part of the research software community and which aren’t.
Some organizations that were originally data-centric now include a software focus. For example, the Research Data Alliance now includes the Software Source Code Interest Group, which provides a forum to discuss issues on management, sharing, discovery, archiving, and provenance of software source code.

What are the next steps?

We invite readers to continue adding to or correcting the ReSA list of research software communities by making comments in the list, which will continue to be curated by ReSA. We would also like to hear from community members who are interested in working with us on a landscape paper based on further analysis. Such a paper could address questions such as: What axes define the space? Where do the currently known organizations fit within it? Are there gaps where no organization is currently working? We also invite readers to consider getting involved in other ReSA activities, including taskforces.

Conclusion

The ever-growing number of constituents of the research software community both reflects and demonstrates the increasing recognition of research software. The research software community is now a complex ecosystem composed of a wide variety of organizations and initiatives, some of which are community networks themselves. Collaboration and coordination across these initiatives are important to enable the broader community to work together to achieve bigger goals. ReSA aims to coordinate across these efforts to leverage investments and achieve the shared long-term goal of research software being valued as a fundamental and vital component of research worldwide. Join the ReSA Google group to stay up to date on our activities.

Web standards for describing datasets and profiles

6 minute read

Published: February 14, 2019

This blog post was published on the Software Sustainability Institute’s website.