-
ChapGTP, ILLC's Attempt at Raising a BabyLM: Improving Data Efficiency by Automatic Task Formation
Authors:
Jaap Jumelet,
Michael Hanna,
Marianne de Heer Kloots,
Anna Langedijk,
Charlotte Pouw,
Oskar van der Wal
Abstract:
We present the submission of the ILLC at the University of Amsterdam to the BabyLM challenge (Warstadt et al., 2023), in the strict-small track. Our final model, ChapGTP, is a masked language model that was trained for 200 epochs, aided by a novel data augmentation technique called Automatic Task Formation. We discuss in detail the performance of this model on the three evaluation suites: BLiMP, (Super)GLUE, and MSGS. Furthermore, we present a wide range of methods that were ultimately not included in the model, but may serve as inspiration for training LMs in low-resource settings.
Submitted 17 October, 2023;
originally announced October 2023.
-
DecoderLens: Layerwise Interpretation of Encoder-Decoder Transformers
Authors:
Anna Langedijk,
Hosein Mohebbi,
Gabriele Sarti,
Willem Zuidema,
Jaap Jumelet
Abstract:
In recent years, many interpretability methods have been proposed to help interpret the internal states of Transformer models, at different levels of precision and complexity. Here, to analyze encoder-decoder Transformers, we propose a simple, new method: DecoderLens. Inspired by the LogitLens (for decoder-only Transformers), this method involves allowing the decoder to cross-attend representations of intermediate encoder layers instead of using the final encoder output, as is normally done in encoder-decoder models. The method thus maps previously uninterpretable vector representations to human-interpretable sequences of words or symbols. We report results from the DecoderLens applied to models trained on question answering, logical reasoning, speech recognition and machine translation. The DecoderLens reveals several specific subtasks that are solved at low or intermediate layers, shedding new light on the information flow inside the encoder component of this important class of models.
Submitted 3 April, 2024; v1 submitted 5 October, 2023;
originally announced October 2023.
-
Meta-Learning for Fast Cross-Lingual Adaptation in Dependency Parsing
Authors:
Anna Langedijk,
Verna Dankers,
Phillip Lippe,
Sander Bos,
Bryan Cardenas Guevara,
Helen Yannakoudakis,
Ekaterina Shutova
Abstract:
Meta-learning, or learning to learn, is a technique that can help to overcome resource scarcity in cross-lingual NLP problems, by enabling fast adaptation to new tasks. We apply model-agnostic meta-learning (MAML) to the task of cross-lingual dependency parsing. We train our model on a diverse set of languages to learn a parameter initialization that can adapt quickly to new languages. We find that meta-learning with pre-training can significantly improve upon the performance of language transfer and standard supervised learning baselines for a variety of unseen, typologically diverse, and low-resource languages, in a few-shot learning setup.
Submitted 23 March, 2022; v1 submitted 10 April, 2021;
originally announced April 2021.
-
Secure Platform for Processing Sensitive Data on Shared HPC Systems
Authors:
Michel Scheerman,
Narges Zarrabi,
Martijn Kruiten,
Maxime Mogé,
Lykle Voort,
Annette Langedijk,
Ruurd Schoonhoven,
Tom Emery
Abstract:
High-performance computing clusters operating in shared and batch mode pose challenges for processing sensitive data. Meanwhile, the need for secure processing of sensitive data on HPC systems is growing. In this work we present a novel method for creating secure computing environments on traditional multi-tenant high-performance computing clusters. Our platform-as-a-service provides a customizable, virtualized solution using PCOCC and SLURM to meet strict security requirements without modifying the existing HPC infrastructure. We show how this platform has been used in real-world research applications from different research domains. The solution is scalable by design with low performance overhead and can be generalized for processing sensitive data on shared HPC systems imposing high security criteria.
Submitted 26 March, 2021;
originally announced March 2021.