2017-03-15

Better software leads to better science

Software and computing skills have become essential in most fields of research, but researchers often struggle with software that is unnecessarily complex and therefore not sustainable. Help is at hand, in the shape of workshops that promote better software development practices in scientific communities across the Nordic countries.

“The quality of scientific software is increasingly becoming critical for the advancement of knowledge. We firmly believe that developing better software leads to better science”, says Radovan Bast. He leads the CodeRefinery project and is a Senior Engineer in the High Performance Computing Group at the University of Tromsø – The Arctic University of Norway.

“This project is not only about educating the people who come to the workshops, in order to learn how to develop and use better software. We also hope that they can go back to their institutions and share what they have learnt and contribute to developing a better culture. In addition, we are planning to set up a platform for version control of public and private code, to simplify collaboration and enable code review in order to improve quality and knowledge transfer”, adds Thor Wikfeldt. He is an application expert at the Center for High Performance Computing (PDC) at KTH Royal Institute of Technology in Stockholm.

Reinventing the wheel is a bad idea

Software lies at the heart of research projects across a wide range of disciplines, but many common practices in the development and maintenance of scientific software are inefficient or outdated. In the experience of Radovan Bast and Thor Wikfeldt, many researchers struggle to reuse and adapt software modules written by others for their own projects. Researchers also struggle to make their own modules and solutions available for others to reuse. Ideally, research software should build on existing solutions in the same way research papers build on published articles.

“Today’s practice is sometimes like reinventing the wheel time and time again. We know several examples of PhD students who have written their own software, but when the student leaves the group after three or four years, nobody understands the code anymore. It would of course be very nice if the next person entering the group could build on software that had already been developed, without having to invest a lot of time understanding the software or adapting it to their own needs”, says Bast.

“The reason for these problems is that code developers have, in many cases, never received training in modern software development methodologies. Their main training and interests are of course in their respective scientific domains”, adds Wikfeldt.

Enthusiasts on a mission

According to Bast and Wikfeldt, many researchers develop software that is too complex, not reproducible, not modular, not documented, not tested, or not discoverable for other researchers. All of these problems are addressed in the workshops that will be held in each of the Nordic countries in 2017. The first workshop was held over three days in Espoo, Finland, in December 2016, and was packed, mostly with PhD students and postdoctoral researchers from various scientific disciplines, ranging from mathematics and computer science to the physical and biological sciences, engineering and psychology.

Radovan Bast took his PhD in chemistry before he moved into programming and computing, and Thor Wikfeldt took his PhD in physics, involving a lot of coding and computer simulations. They describe themselves as enthusiasts on a mission to help other researchers to become more productive when developing and using software.

“Physicists and chemists have been writing software for quite some time, in order to simulate for instance the behaviour of atoms and molecules. Biology is another field that has become very dependent on computing. We have so far had more participants from the natural sciences than from the humanities and social sciences, but we would really like to involve more researchers also from those fields. Researchers in both the humanities and the social sciences generate a lot of data that should be processed and analysed with the best tools available”, says Bast.

It is never too late

The workshops and the educational material provided in the project are suitable for people with a broad range of programming experience.

“In the first workshops, we met some researchers who had been writing programs for years, and some who were just starting. The bottom line is that even if people have been writing code for a long time, it is never too late to adopt better practices and learn to use new tools. But it is also a good idea to start early”, says Wikfeldt.

Radovan Bast adds that complete novices should start somewhere else. “We don’t teach programming languages, and we have no intention of replacing initiatives like for instance the established and influential [https://software-carpentry.org/ Software Carpentry project] and the [https://www.software.ac.uk/ Software Sustainability Institute]”, he says.

Creating Nordic added value

The two enthusiasts hope that their efforts will create more value for the Nordic research community than the cost of the project itself.

“This project grew from a course that was given for the first time at KTH in 2014. The budget is for two years, but the need for collaboration and education is going to be there also in the future. Therefore, we are planning to build a community that can sustain the ideas behind the initiative and bring good coding practices into the scientists’ curriculum”, concludes Bast.

Sessions of direct use for the daily work

Verena Kutschera is a postdoc at the Department of Ecology and Genetics at Uppsala University. She is happy to have participated in the CodeRefinery workshop in Stockholm in February 2017.

“In my research in evolutionary genomics and bioinformatics, I combine existing software with my own code to develop data analysis pipelines. I don’t have a background in computer sciences, so I developed my own system to write and document code and pipelines”, Kutschera explains.

Kutschera describes herself as a scientist who is a research software user rather than a developer, so the workshop provided a quite intense introduction to software development. “However, several sessions turned out to be of direct use for my own daily work. We learned for example how to set up automated testing of code and how to make complex code easier to understand. This knowledge will probably make it easier for other members of my research group or projects to know exactly how the data was processed, and for anyone else who would like to reproduce my research results once they are published”, she adds.

Facts about the project

  • CodeRefinery is a project within the Nordic e-Infrastructure Collaboration (NeIC). The kickoff was in September 2016
  • The project is aimed at teaching scientists to use better tools and become more productive when developing and using software
  • The CodeRefinery website: http://coderefinery.org
  • Twitter: @coderefine