The Nordic Language Processing Laboratory
The Nordic Language Processing Laboratory (NLPL) is a Nordic collaborative research project under the Nordic e-Infrastructure Collaboration (NeIC). The project aims to train the next generation of scientists to work on the language technology on which services such as Google Translate and Siri (Apple’s voice assistant) are based.
A three-year project, the Nordic Language Processing Laboratory focuses on language technology, also known as “Natural Language Processing” (NLP).
Language technology is an interdisciplinary field that brings together computer science, artificial intelligence and linguistics to use computers to process natural languages such as Danish or English. NLPL works on the e-Infrastructure that underlies language technology research.
The project’s vision is to implement a virtual Nordic language technology laboratory by developing innovative methods for sharing High Performance Computing (HPC) resources among the Nordic countries. This will be achieved by bringing together expertise from the respective “user communities” and specialists in the field to carry out data-intensive experiments on a scale that would not be possible with ordinary computing resources.
“The individuals working on this infrastructure have a technical background, and are not linguists in the traditional sense. Instead, they are focused on developing software tools for use in linguistics,” explains Bjørn Lindi, Project Manager of NLPL and Research Software Engineer in the IT Development Section at the Norwegian University of Science and Technology.
“We use language technology whenever we use Siri or Google Translate, for example. To be effective, these tools need not only to recognise speech using signal processing but also to understand the content of what is being said. In the NLPL project we are working to improve the infrastructure used for language technology,” he says.
E-Infrastructure Needs in Language Technology
The Nordic NLPL project grew out of the need to build an integrated e-Infrastructure in the field. Its objective is to train language technology researchers to apply better tools and become more productive when developing or using software. Once the University of Oslo started using HPC, it became evident that a more cohesive infrastructure in the field would promote greater collaboration and improve language technology research throughout the Nordic region.
“The structure of the NLPL project is targeted primarily towards doctoral fellows, associate professors and professors. Our task is to ensure that those working with language technology have the tools they need readily at their disposal,” says Bjørn Lindi.
“Our work with the infrastructure revolves around corpora,” he continues. “Corpora here means collections of texts comprising words in context. These are used among other things for machine translation and parsing, or sentence analysis, where a sentence can be broken down into individual elements that are classified as words or compound segments. At the practical level, this is used, for instance, in Customer Relationship Management and customer service systems where you want to find out whether a customer is satisfied or not. The goal is to develop technical language tools that can aid in understanding what the customer is feeling or whether he or she is completely neutral.”
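To make the idea of a corpus as “words in context” a little more concrete, here is a minimal sketch of a keyword-in-context (KWIC) concordance, one of the simplest ways of inspecting a corpus. It is purely illustrative and is not part of the NLPL toolset: the function name kwic, its window parameter and the two example sentences are hypothetical, and the tokeniser is a naive regular expression rather than a proper linguistic tokeniser.

```python
import re

def kwic(texts, keyword, window=3):
    """Return (left context, keyword, right context) for each occurrence of keyword.

    Hypothetical helper: tokenises naively with a regex and lower-cases everything.
    """
    hits = []
    for text in texts:
        tokens = re.findall(r"\w+", text.lower())  # naive tokenisation, punctuation dropped
        for i, tok in enumerate(tokens):
            if tok == keyword.lower():
                left = " ".join(tokens[max(0, i - window):i])    # up to `window` tokens before
                right = " ".join(tokens[i + 1:i + 1 + window])   # up to `window` tokens after
                hits.append((left, tok, right))
    return hits

# A tiny made-up "corpus" of customer comments
corpus = [
    "The customer was satisfied with the service.",
    "Nobody was satisfied after the update.",
]

for left, kw, right in kwic(corpus, "satisfied"):
    print(f"{left:>20} [{kw}] {right}")
```

Seeing every occurrence of a word with its neighbours is what lets later tools, such as parsers or sentiment classifiers, tell “was satisfied” apart from “nobody was satisfied”.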
Creating a More Integrated Environment
One of the project’s objectives is to increase the international competitiveness of the Nordic countries. Technology giants such as Facebook, Google and Amazon have launched initiatives and projects in this field, and they have far greater resources than most national research groups.
“The Nordic countries are relatively well equipped in terms of HPC resources. What we are trying to do is to organise computer resources, software and data to make our language technology groups stronger. That will in turn make it easier for those working with this on a practical, day-to-day basis, and enable them to find even more constructive solutions to problems,” says Bjørn Lindi.
“Our goal is to create a more integrated environment where not everything has to be installed locally on a laptop or a local server. In other words, it should be possible to work flexibly whether your office is in Oslo or Helsinki, for example, by gathering together the research and making it available in a single location instead of storing it on Dropbox or sharing it via a link,” Bjørn Lindi concludes.
In conjunction with the annual NeIC All Hands Meeting in January 2018, the Nordic Language Processing Laboratory organised a “winter school” on the subject of “E-Infrastructure and Scientific Computing for Nordic Natural Language Processing Research”. The approximately 25 participants, representing six different universities, included NLPL team members, doctoral fellows and external research partners – all with a background in IT and/or computer science.
“This year the winter school involved a variety of activities, for example workshops on programming for Graphics Processing Units, with a focus on the Taito and Abel clusters, as well as other scientific programming and HPC techniques. We also gave tutorials on using the NLPL infrastructure for tasks such as translation, parsing, working with corpora and other issues relevant to the project,” Bjørn Lindi explains.
The school programme also included visits from two instructors, André Martins and Ramon Fernandez Astudillo, from the large and popular Lisbon Machine Learning School (LxMLS), which is held each summer. They taught new methods in machine learning, that is, techniques that enable computers to learn from data, for instance how to process text, which can in turn be applied in Natural Language Processing.
FACTS ABOUT THE PROJECT
- NLPL Project Manager: Bjørn Lindi. Affiliated with the Norwegian University of Science and Technology in Trondheim. Has worked with HPC since 2004.
- Project period: 1 January 2017 – 31 December 2019.
- Participating organisations:
  - University of Oslo
  - University of Copenhagen
  - Uppsala University
  - University of Helsinki
  - University of Turku
  - CSC-IT Center for Science Ltd (CSC)
  - UNINETT Sigma2
  - IT University of Copenhagen (ITU)
  - Nordic e-Infrastructure Collaboration (NeIC)