2020-05-04

The unique Nordic section of the world’s largest computing grid

Written by Arne Vollertsen

“The Nordic Tier-1 may deliver the blueprint for handling the data tsunami coming from the upgraded Large Hadron Collider”

Churning out the largest amount of data ever produced in a scientific experiment, the Large Hadron Collider at CERN near Geneva is truly a mighty machine. The world’s most powerful particle accelerator is revealing secrets about the fundamental building blocks of the universe and how they interact. It has enabled the discovery of the Higgs boson, and may help reveal the nature of dark matter, so no wonder high-energy physicists are eager to tap into its unique data stream.

However, storing and processing this data stream poses a huge challenge in itself, because even after filtering out 99 per cent of the LHC data, its volume is still enormous. Thus in 2002 CERN turned to grid computing to create the most sophisticated storage and analysis system ever built for science. Enter the largest computing grid in the world: the Worldwide LHC Computing Grid (WLCG), an interconnected system of 170 computing centres in 42 countries. The network is as great a technological achievement as the collider itself, and it keeps the project from drowning in its own data.

Came as a bombshell

Farid Ould-Saada, high energy physics professor at the University of Oslo and a veteran within the LHC community, recalls:

“Around 2000 everything was clear in terms of the accelerator and the experiments, but nobody had thought in detail about computing. When we realized the cost of processing and storing the data it came as a bombshell. We the scientists were surprised, and so were the funding agencies. We had to find a way out, and that way was to ask “Who has what?”, and then try to piece it all together. That is how the concept of the WLCG was born. Just like the creation of the World Wide Web, it came out of a need to collaborate and share information. With the advent of the grid we started to share not only information, but tools, storage and computing capacity as well.”

“In the Nordic research community we looked around to see how we could contribute to the WLCG, and I remember that we considered ourselves lucky. We had good contacts with computing centres across the region, and with local scientists and engineers, so with a bottom-up approach we were able to stitch together our resources to make it work. That is how it started, and that bottom-up approach enabled us to establish a Nordic Tier-1 site, which I think is quite an achievement compared to the size of the Nordic countries.”

Nordic contribution

From early on the Nordic countries have played an important part in the collaborative effort of making LHC data accessible to the global high-energy physics community. Managed by NeIC, the Nordic part of the WLCG has held on to the initial concept of being distributed across four countries, which is a unique setup in the WLCG fabric. And precisely this may now serve as a blueprint for solving future problems, because soon the WLCG will be bursting at the seams, and even its current capacity for storing 88 petabytes of data per year, equalling 22 million HD movies, will not be sufficient. A powerful upgrade of the LHC is lurking on the horizon, and with it comes a data stream much more powerful than the current one.
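As a quick sanity check on that comparison, the short Python sketch below works out the movie size implied by the two figures quoted above. It assumes decimal (SI) units, which the article does not specify, and the function name is purely illustrative.

```python
# Assumption: decimal (SI) units; the article does not say which convention it uses.
PETABYTE = 10**15  # bytes
GIGABYTE = 10**9   # bytes


def implied_movie_size_gb(total_pb, movie_count):
    """Return the average size per movie, in gigabytes, implied by the two figures."""
    total_bytes = total_pb * PETABYTE
    return total_bytes / movie_count / GIGABYTE


# Figures quoted in the article: 88 petabytes per year, 22 million HD movies.
size_gb = implied_movie_size_gb(88, 22_000_000)
print(f"Implied size per HD movie: {size_gb:.1f} GB")
```

Under that assumption the comparison corresponds to roughly 4 GB per HD movie.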

Four layers

The WLCG is made up of four layers or “tiers”. The CERN Data Centre is Tier-0. Next in line are 13 Tier-1 sites, among them the Nordic site. The Tier-1 sites take their individual share of the data coming from CERN and distribute it further to around 170 Tier-2 sites covering most of the globe. On the edge of the WLCG are local Tier-3 computing resources, allowing individual scientists to access the network.
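The layering can be pictured as a simple ordered structure. The sketch below is only an illustration built from the numbers in this article; the class, field and function names are hypothetical and do not come from any WLCG software.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Tier:
    level: int
    role: str
    site_count: Optional[int]  # None where the article gives no number


# Site counts are the ones quoted in the article ("around 170" for Tier-2).
TIERS = [
    Tier(0, "CERN Data Centre", 1),
    Tier(1, "takes a share of the CERN data and distributes it further", 13),
    Tier(2, "receives data from the Tier-1 sites", 170),
    Tier(3, "local resources giving individual scientists access", None),
]


def downstream(level: int) -> Optional[Tier]:
    """Return the tier that sits one layer further out, or None at the edge."""
    for tier in TIERS:
        if tier.level == level + 1:
            return tier
    return None


for tier in TIERS:
    nxt = downstream(tier.level)
    target = f"Tier-{nxt.level}" if nxt else "end users"
    print(f"Tier-{tier.level} ({tier.role}) -> {target}")
```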

There are four experiments connected to the LHC, and the Nordic Tier-1 supports two of them, ALICE and ATLAS. ALICE uses seven Tier-1 centres, while the largest LHC experiment, ATLAS, uses ten Tier-1 centres across the world. On top of providing storage on disc and tape the Nordic Tier-1 offers the dCache middleware for storing and retrieving large amounts of data distributed among numerous servers.

Furthermore, a third LHC experiment, CMS, is utilizing the NT-1 infrastructure, although the Nordic CMS site is a Tier-2.

New forces of nature

“We are looking for new forces of nature”, Farid Ould-Saada explains. “We are trying to find out what the smallest building blocks of matter are. After discovering the Higgs boson the most urgent thing for us now is to understand the origin of Dark Matter. Understanding the dark side of the world, that is the priority. And also, there is another thing I would like to add, and that one is an even stronger headache: Gravity. Why is gravity so much weaker than the other fundamental forces?”

“In our experiments we are approaching smaller and smaller scales, and at some point something might happen that could tell us about the behaviour of gravity on the level of the microcosmos. At least we hope so, but you never know. When you search for something, if you’re lucky you’ll find what you are searching for. But just as important, you may get surprises, and you’ll discover something you hadn’t thought of.”

Precise measurements

“Some people might say: “They spent so much money just to find the Higgs boson”. But that is not the way it works. The way it works is that you explore a new domain of nature never seen before, and you want to understand the rules that govern that cosmos. You do that by making precision measurements. Let us for a moment look at the atom. When scientists arrived at the atom and were able to see things with their microscopes, it was important to make as precise measurements as possible. That is how nanotechnology came into being.”

“Now we are exploring nature at subatomic scale, and how we in the future will be applying that knowledge nobody knows. But measuring is the starting point. When you go from exploring new territory to using the knowledge you have gained, you do that by making as precise measurements as possible. And for that we need as much data as possible.”

Initial scepticism

As mentioned, the Nordic Tier-1 site is special. It is designed differently from the other 12 Tier-1 sites, as it is itself built as a grid, with data stored in six different locations: Bergen, Oslo, Copenhagen, Espoo, Umeå and Linköping.

In the beginning, the other countries involved in the Worldwide LHC Computing Grid met the Nordic decision to build a site distributed across four countries with scepticism. With the grid itself being distributed, why build a distributed Tier-1 as well?

But the Nordic Tier-1 soon proved stable and dependable. Moreover, it is now frequently highlighted as a prime example of how to effectively build and maintain this type of computing centre. As the WLCG prepares for the data tsunami coming from the upgraded LHC, the Nordic setup might provide efficient solutions for dealing with it.

New technology

“Soon the LHC will produce much more data than the current grid can handle”, Farid Ould-Saada explains. “At the LHC we work in terms of periods. At the end of 2018 we completed the four-year Run 2. Run 3 will go from 2021 to 2024, Run 4 is scheduled for 2027 to 2030, and Run 5 is expected to start in 2032.”

“In Run 3 we are using slightly improved detectors, but in Run 4, which is the start of the High Luminosity LHC, most of the detectors will be new and more advanced. We are putting completely new technology into the detectors and the accelerator as well, which means more particle collisions per second. This gives us a multi-dimensional challenge, and the grid needs to adjust to that.”

The High Luminosity LHC

Nordic Tier-1 project lead Mattias Wadenstein explains:

“With the High Luminosity LHC the demands for data and compute will go up by a factor of 40. In the global WLCG community we are currently discussing how to handle that steep increase. Adding to the pressure is the fact that funding will remain at the current level. Funding agencies are not willing to increase their contributions just because the accelerator produces so much more data. So we have to do more with the same amount of money, otherwise it will impact the physics programmes, and the research output will suffer.”

Copying the Nordic model

“There are many ideas on the table right now. One of the hot tracks in this discussion is copying the Nordic model of collaborative federated resources. Running large-scale storage is quite personnel-intensive and in some ways trickier than running computing. So, maybe we should have fewer, larger storage elements in the worldwide fabric. Instead of around 100 different storage endpoints there could be maybe a dozen, and we could federate many sites. This means you could to some extent centralise operations while still having distributed hardware. In this way we could possibly save money and people and buy hard drives instead to make room for more petabytes of data.”

The future WLCG

According to Mattias Wadenstein, the WLCG is looking into a future that will look more like the Nordic present.

“So, in a very good way we are a forerunner. And if the WLCG decides on doing a big re-organising into a model that is closer to ours, we’re in a good position to show our strengths. We can share our experience, both success stories and warning stories as well. So, although not everybody might want to be an exact copy of us, hopefully they can find inspiration in the Nordic model of collaborative federated resources.”

Collaboration is king

The word “collaboration” pops up time and again when you talk to people in the particle physics community. In high-energy physics, collaboration is king. Just take a look at the enormous author lists on scientific publications based on LHC data. In the case of the ATLAS experiment, one of the four main LHC experiments, you’ll find close to three thousand authors per research paper. This underlines that everyone who has been part of that particular experiment stands behind the publication, not just the people who conducted the actual analysis. So the collaboration as a whole should get credit for it.

The same goes for the data infrastructure providing access to the experimental data. The WLCG community is a tight-knit group of people, working together closely, continually exchanging ideas and experience. That includes all four components of the WLCG: networking, hardware, middleware, and physics analysis software.

Mattias Wadenstein explains:

“We are working with the other Tier-1 sites on many levels. For instance, we cooperate closely on developing the open source software underlying both our storage and our computing. We in the Nordics have developed technology that some of our partners are using, while some innovations coming from our partners are deployed by us.”

“As an example, the ARC software comes from the Nordics. We use it heavily, in particular because our data is distributed, so we need a layer of caching in order to reduce latency for data access. And due to the on-going discussions about re-organising the WLCG this is something the other sites are looking at as well, to globally meet the demands of the upgraded LHC.”

Ready for future challenges

According to Mattias Wadenstein the Nordic Tier-1 is ready to face the challenges lying ahead. Over the last few years the NT-1 has mostly looked inward, focusing on efficiency and well-oiled internal procedures as well as improving the reliability and availability of its services. This puts it in a good position to handle the unprecedented data volumes coming soon from the upgraded LHC.

Moreover, there may be other challenges on the horizon. One of them could be providing services to other large-scale scientific instruments that are coming online in the near future and that have data needs as large as those of the LHC. One of these is a radar system for the study of the Earth’s atmosphere and ionosphere, EISCAT_3D, located in the northern parts of Norway, Finland and Sweden. The facility is under construction, and the EISCAT scientific association is currently exploring the compute, data and networking infrastructure to support it, including services provided by the NT-1 team.

The Nordic WLCG Tier-1 facility is the longest-running project in the NeIC portfolio. It is set to continue until 2050, which is the entire lifetime of the LHC, plus an additional 15 years of storing the data from the experiments.