2018-08-01

DeepDive - Improving Biodiversity Research e-Infrastructures by Collaboration

Research questions within biodiversity research and nature conservation often focus on geographical areas and ecosystems that extend across national borders, depending on the temporal or geographical scale of the research hypotheses. Biodiversity and ecosystem e-Infrastructures, on the other hand, are developed individually in each country, which makes access to regional data difficult. The Nordic-Baltic Collaboration on e-Infrastructures for Biodiversity Informatics (DeepDive) research project seeks solutions to this challenge and aims to enable access to data across countries by facilitating and intensifying collaboration in the Nordic-Baltic region. The project started in 2017 with partners from Norway, Denmark, Estonia, Finland, Iceland and Sweden.

The purpose of DeepDive research e-Infrastructures is to host data and make it available. In the context of biodiversity, data may comprise a variety of different things. It can be information about species, characteristic biological features, or an overview of animals and plants or habitat information that illustrates how the species live together. Often biodiversity data are genetic data that are associated with the species. DeepDive’s project manager, Matthias Obst, points to the importance of taxonomy, or the practice of naming and classifying species into groups according to their similarities. Even in scientific descriptions of species, which are written in Latin, a species often has several names.

“The names and the connections between the identities and names of the species are constantly changing, which makes collecting and associating data with a species a highly dynamic process,” Obst clarifies.

The role of the research e-Infrastructures is to keep track of this process and manage the change of the names and the identities of the species.
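As a rough illustration of what keeping track of names can involve, the sketch below maps synonyms to a currently accepted name before records are grouped. It is a minimal sketch and not DeepDive's software: the table ACCEPTED_NAME, the functions resolve_name and group_by_accepted_name, and the observation notes are assumptions made up for this example. The only real-world fact used is that the blue tit, once placed in the genus Parus, is now commonly listed as Cyanistes caeruleus.

```python
from collections import defaultdict

# Hypothetical synonym table: every known name points to the accepted name.
ACCEPTED_NAME = {
    "Parus caeruleus": "Cyanistes caeruleus",      # older genus placement
    "Cyanistes caeruleus": "Cyanistes caeruleus",  # accepted name maps to itself
}


def resolve_name(name, table=ACCEPTED_NAME):
    """Return the accepted name for a scientific name, or the name itself if unknown."""
    key = " ".join(name.split())  # collapse stray whitespace
    return table.get(key, key)


def group_by_accepted_name(records):
    """Group (name, payload) pairs under the accepted name of each species."""
    grouped = defaultdict(list)
    for name, payload in records:
        grouped[resolve_name(name)].append(payload)
    return dict(grouped)


# Made-up records that use two different names for the same species.
observations = [
    ("Parus caeruleus", "record filed under the older name"),
    ("Cyanistes caeruleus", "record filed under the accepted name"),
]
grouped = group_by_accepted_name(observations)
```

With a table like this, both records end up under one species even though they were filed under different names. When a name changes, only the table needs updating, not every stored record.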

“That is a very complex issue and it is usually taken care of in each country, but not so much across countries. If I wanted to obtain all relevant information associated with a certain bird, for example which type of tree it lives in, what it eats, when it mates and how many eggs it usually lays, I could get a great deal of information in each individual country, but not for all countries where the species occurs. One might think that querying the national databases separately takes only a little extra time, but we’ve come to a point where we don’t want to analyse single species anymore – we want to analyse thousands of them in ecoregions shared by Nordic countries, such as the Baltic Sea and the Baltic Shield,” Obst states.
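The cross-border query Obst describes could look roughly like the sketch below. It is a minimal sketch under stated assumptions: the national databases are stood in for by in-memory lists, and make_source, query_all, the species labels and the records are all hypothetical. No real national service or API is represented.

```python
def make_source(rows):
    """Wrap a list of records as a stand-in for one national database."""
    def query(species):
        return [row for row in rows if row["species"] == species]
    return query


def query_all(sources, species_list):
    """Ask every national source about every species and merge the answers."""
    merged = []
    for country, query in sources.items():
        for species in species_list:
            for record in query(species):
                # Tag each record with the country it came from.
                merged.append({**record, "country": country})
    return merged


# Made-up placeholder data; species names and traits are not real records.
sources = {
    "Country A": make_source([
        {"species": "species-1", "trait": "nest site", "value": "placeholder"},
    ]),
    "Country B": make_source([
        {"species": "species-1", "trait": "clutch size", "value": "placeholder"},
        {"species": "species-2", "trait": "diet", "value": "placeholder"},
    ]),
}
records = query_all(sources, ["species-1", "species-2"])
```

The loop itself is trivial. The difficulty lies in making each source answer the same question in the same format, which is why the analysis only scales to thousands of species once the national systems share interfaces and standards.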

Producing software and community-building

The goal of the DeepDive project is to explore synergies in research e-Infrastructure development among the Nordic and Baltic countries, and establish common services based on best practice and technical interoperability to support biodiversity and ecosystem research. In other words, the aim of the project is to produce software that enables researchers to access data about species and ecosystems outside their country of residence as well.

The project sets out to reach these goals by addressing three topics. The first and general aim of the project is to build up a community and unite the system developers and the users (scientists) of the research e-infrastructures. This is done by looking at ways to improve interoperability among services and research e-Infrastructures that up to now have been emerging individually in each country, on a national scale. Unification is achieved by making it possible for the research e-Infrastructures to communicate with each other.

“We are not physically moving them, we are just making the machines able to talk to each other. In other words, we are improving the interfaces and the standards of the information systems so that they can easily get the same type of information from different countries,” Obst explains.

In addition to his administrative and practical responsibilities, Obst is responsible for helping the system developers to see where the potential for synergies lies. Obst himself is a marine biologist.

“I have always thought that it’s a very strong combination, that if you mix technical people with scientists, the technical person gets an understanding of what is possible and what is not, and the scientist gets a really good understanding of the deeper purpose of what they are doing – the software they are developing. I think you need that to create good software,” Obst notes.

Working with big data

The second topic addresses the challenges with big data. New technology brings with it new kinds of data along with new ways of collecting them. Looking back just 10-15 years, information about species was collected manually. One person or a small group of scientists collected the data and stored it on their local drive in spreadsheet applications — or perhaps merely on paper. Thanks to new technology this is changing, and the scientists of today don’t necessarily go out in the field themselves. Instead, they use robots, drones, remote sensing devices and other kinds of new systems to scan the environment and to recognise the species and habitats. According to Obst, the new technologies both allow and force scientists to work with big data.

“But we are not trained to do that. Instead, we are trained to have our data on local hard drives and to work with our data in an Excel table format. Now we need to train scientists to move away from working with their own data and to include other people’s data, including data generated by robots and sensors. And that automatically means working with big data,” Obst says.

The incoming large amount of data puts pressure on the research e-Infrastructures. Currently, these research e-Infrastructures are databases that each country has for certain topics, for example marine data. Traditional forms of data, such as table formats with names, dates and depth are replaced by or at least accompanied by new kinds of forms such as images generated by a camera system. It is possible to recognise species and draw other conclusions based on these images. Obst explains that this means that there are different versions of data: raw data and the data already digested to make it more useful for the scientists.

“So, in addition to the ability to archive and provide data, the infrastructures are required to be able to document how the offered data were generated,” Obst summarises.
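One way to picture this documentation requirement is a record that links each processed dataset to the raw data it came from and the method that produced it. The sketch below is a minimal illustration, not a description of any existing infrastructure: the Dataset class, its fields, the lineage function and the example identifiers are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Dataset:
    """A dataset plus a pointer to its source and the method that produced it."""
    identifier: str
    description: str
    derived_from: Optional["Dataset"] = None  # None means raw data
    method: Optional[str] = None              # how it was derived, if at all


def lineage(dataset):
    """List the chain from raw data to the given dataset, oldest first."""
    steps = []
    current = dataset
    while current is not None:
        if current.method is None:
            steps.append(current.identifier)
        else:
            steps.append(f"{current.identifier} (via {current.method})")
        current = current.derived_from
    return list(reversed(steps))


# Hypothetical example: camera images processed into a species table.
raw_images = Dataset("raw-images", "images from a camera system")
species_table = Dataset(
    "species-table",
    "species recognised in the images",
    derived_from=raw_images,
    method="image classification",
)
history = lineage(species_table)
```

Here history reads from the raw images to the derived table, so a scientist who downloads the table can see which raw data and which processing step it rests on.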

Training and capacity-building - learning from each other

As its third topic the project focuses on educating the scientists.

“You cannot work on your desktop anymore. You have to be able to go into databases and identify the relevant criteria and construct the queries that will produce the relevant data for your research question,” Obst states.

The training is not only offered to scientists, but to research e-Infrastructure providers as well. The e-Infrastructures exist to serve the scientists. Hence, according to Obst, it is crucial to make sure that the research e-Infrastructures are useful for the scientists. This is achieved by learning from each other.

“In order to make sure our research e-Infrastructures are useful for the scientists we not only have to acquire technical competence - we also need to be able to understand the scientific problems that our infrastructure can help address. Therefore, we try to ‘breed hybrid people’ in DeepDive’s educational approach, i.e. people with both technical and scientific insight. So, we organise workshops where people with either science or technical backgrounds come together and learn from each other.”

Training workshops are held three times during 2018. The implementation of ideas and results is also mainly done in the workshops.

“Our main medium is workshops where people meet physically. In addition, we have established functional, stable communication channels for our group, such as e-mail and telephone conferences,” Obst says.

Research reports also form part of the implementation work. There is usually more than one country involved in each task and report. When it comes to producing reports, Obst has picked up a trend that recognises the benefits of teamwork.

“What I have found out, for example, is that the people involved in a task see the potential of working as a group from the start. Reports are produced in a much shorter time when the group has worked together from the beginning. If the group members haven’t communicated and worked together from the beginning, it is a much longer process to create and finalise a report,” he notes.

Results and future goals

During the process the goals of the project have been narrowed down and become more targeted, but the main themes remain. In its first year DeepDive has analysed the research e-Infrastructure landscape within biodiversity and found its key partners. It has established a community whose members are aware of each other’s expertise. In addition, it has identified some “sweet spots”, areas where synergies can be achieved. Obst is convinced of the benefits of working in a small group.

“The beauty of NeIC to me is that you are creating these small groups. DeepDive is a very small group, there are about ten of us. It allows us to focus on becoming a really connected community that trusts each other. The problem with building up trust is that it takes time,” Obst says.

During the next two years the DeepDive project will continue to unite users as well as solidify and consolidate the community.

“What I really want to have at the end of the project is a very small, close-knit community from what a year ago was a very loose network of people who had heard of each other. Now we are already in the stage where everyone is well informed about what the others are doing. The next stage could be that they actually work together: that they are informing and educating each other, and contacting each other when they need help,” Obst says.

After the second year Obst is expecting to see both technical and scientific results. In addition, the project will set up long term strategic goals. One long-term goal for the project is to share its expertise outside the group.

“When we have a success trail, in one or two years, we can go out and offer our network to a wider community of biodiversity informatics practitioners in the north,” Obst concludes.

FACTS

Biodiversity informatics is a methodological discipline that helps biodiversity research overcome issues related to the whole value chain of data, from data capture to analyses and data products. These issues concern vocabularies, ontologies, digitisation of collections, data sharing, data integration, data reliability (fitness for use), data quality, visualisation, analysis and long-term archival.

Text: Iiris Tarvonen, NordForsk / Photo of Matthias Obst: the interviewee’s own