Search | arXiv e-print repository

ChapGTP, ILLC's Attempt at Raising a BabyLM: Improving Data Efficiency by Automatic Task Formation

Authors: Jaap Jumelet, Michael Hanna, Marianne de Heer Kloots, Anna Langedijk, Charlotte Pouw, Oskar van der Wal

Abstract: We present the submission of the ILLC at the University of Amsterdam to the BabyLM challenge (Warstadt et al., 2023), in the strict-small track. Our final model, ChapGTP, is a masked language model that was trained for 200 epochs, aided by a novel data augmentation technique called Automatic Task Formation. We discuss in detail the performance of this model on the three evaluation suites: BLiMP, (Super)GLUE, and MSGS. Furthermore, we present a wide range of methods that were ultimately not included in the model, but may serve as inspiration for training LMs in low-resource settings.

Submitted 17 October, 2023; originally announced October 2023.

Comments: Part of the BabyLM challenge at CoNLL

arXiv:2310.03686 [pdf, other]

DecoderLens: Layerwise Interpretation of Encoder-Decoder Transformers

Authors: Anna Langedijk, Hosein Mohebbi, Gabriele Sarti, Willem Zuidema, Jaap Jumelet

Abstract: In recent years, many interpretability methods have been proposed to help interpret the internal states of Transformer-models, at different levels of precision and complexity. Here, to analyze encoder-decoder Transformers, we propose a simple, new method: DecoderLens. Inspired by the LogitLens (for decoder-only Transformers), this method involves allowing the decoder to cross-attend representations of intermediate encoder layers instead of using the final encoder output, as is normally done in encoder-decoder models. The method thus maps previously uninterpretable vector representations to human-interpretable sequences of words or symbols. We report results from the DecoderLens applied to models trained on question answering, logical reasoning, speech recognition and machine translation. The DecoderLens reveals several specific subtasks that are solved at low or intermediate layers, shedding new light on the information flow inside the encoder component of this important class of models.

Submitted 3 April, 2024; v1 submitted 5 October, 2023; originally announced October 2023.

Comments: Accepted to Findings of NAACL 2024

arXiv:2104.04736 [pdf, other]

Meta-Learning for Fast Cross-Lingual Adaptation in Dependency Parsing

Authors: Anna Langedijk, Verna Dankers, Phillip Lippe, Sander Bos, Bryan Cardenas Guevara, Helen Yannakoudakis, Ekaterina Shutova

Abstract: Meta-learning, or learning to learn, is a technique that can help to overcome resource scarcity in cross-lingual NLP problems, by enabling fast adaptation to new tasks. We apply model-agnostic meta-learning (MAML) to the task of cross-lingual dependency parsing. We train our model on a diverse set of languages to learn a parameter initialization that can adapt quickly to new languages. We find that meta-learning with pre-training can significantly improve upon the performance of language transfer and standard supervised learning baselines for a variety of unseen, typologically diverse, and low-resource languages, in a few-shot learning setup.

Submitted 23 March, 2022; v1 submitted 10 April, 2021; originally announced April 2021.

Comments: - Add additional results (Appendix D) - Cosmetic updates for camera-ready version ACL 2022

arXiv:2103.14679 [pdf]

Secure Platform for Processing Sensitive Data on Shared HPC Systems

Authors: Michel Scheerman, Narges Zarrabi, Martijn Kruiten, Maxime Mogé, Lykle Voort, Annette Langedijk, Ruurd Schoonhoven, Tom Emery

Abstract: High performance computing clusters operating in shared and batch mode pose challenges for processing sensitive data. In the meantime, the need for secure processing of sensitive data on HPC systems is growing. In this work we present a novel method for creating secure computing environments on traditional multi-tenant high-performance computing clusters. Our platform as a service provides a customizable, virtualized solution using PCOCC and SLURM to meet strict security requirements without modifying the existing HPC infrastructure. We show how this platform has been used in real-world research applications from different research domains. The solution is scalable by design with low performance overhead and can be generalized for processing sensitive data on shared HPC systems imposing high security criteria.

Submitted 26 March, 2021; originally announced March 2021.

Showing 1–4 of 4 results for author: Langedijk, A