
Strengthening Structural Inductive Biases by Pre-training to Perform Syntactic Transformations

Authors: Matthias Lindemann, Alexander Koller, Ivan Titov

Abstract: Models need appropriate inductive biases to effectively learn from small amounts of data and generalize systematically outside of the training distribution. While Transformers are highly versatile and powerful, they can still benefit from enhanced structural inductive biases for seq2seq tasks, especially those involving syntactic transformations, such as converting active to passive voice or semantic parsing. In this paper, we propose to strengthen the structural inductive bias of a Transformer by intermediate pre-training to perform synthetically generated syntactic transformations of dependency trees given a description of the transformation. Our experiments confirm that this helps with few-shot learning of syntactic tasks such as chunking, and also improves structural generalization for semantic parsing. Our analysis shows that the intermediate pre-training leads to attention heads that keep track of which syntactic transformation needs to be applied to which token, and that the model can leverage these attention heads on downstream tasks.

Submitted 5 July, 2024; originally announced July 2024.

arXiv:2310.13561 [pdf, other]

Cache & Distil: Optimising API Calls to Large Language Models

Authors: Guillem Ramírez, Matthias Lindemann, Alexandra Birch, Ivan Titov

Abstract: Large-scale deployment of generative AI tools often depends on costly API calls to a Large Language Model (LLM) to fulfil user queries. To curtail the frequency of these calls, one can employ a smaller language model -- a student -- which is continuously trained on the responses of the LLM. This student gradually gains proficiency in independently handling an increasing number of user requests, a process we term neural caching. The crucial element in neural caching is a policy that decides which requests should be processed by the student alone and which should be redirected to the LLM, subsequently aiding the student's learning. In this study, we focus on classification tasks, and we consider a range of classic active learning-based selection criteria as the policy. Our experiments suggest that Margin Sampling and Query by Committee bring consistent benefits across tasks and budgets.

Submitted 20 October, 2023; originally announced October 2023.

arXiv:2310.00796 [pdf, other]

SIP: Injecting a Structural Inductive Bias into a Seq2Seq Model by Simulation

Authors: Matthias Lindemann, Alexander Koller, Ivan Titov

Abstract: Strong inductive biases enable learning from little data and help generalization outside of the training distribution. Popular neural architectures such as Transformers lack strong structural inductive biases for seq2seq NLP tasks on their own. Consequently, they struggle with systematic generalization beyond the training distribution, e.g. with extrapolating to longer inputs, even when pre-trained on large amounts of text. We show how a structural inductive bias can be efficiently injected into a seq2seq model by pre-training it to simulate structural transformations on synthetic data. Specifically, we inject an inductive bias towards Finite State Transducers (FSTs) into a Transformer by pre-training it to simulate FSTs given their descriptions. Our experiments show that our method imparts the desired inductive bias, resulting in improved systematic generalization and better few-shot learning for FST-like tasks. Our analysis shows that fine-tuned models accurately capture the state dynamics of the unseen underlying FSTs, suggesting that the simulation process is internalized by the fine-tuned model.

Submitted 10 July, 2024; v1 submitted 1 October, 2023; originally announced October 2023.

Comments: ACL 2024 camera-ready

arXiv:2305.16954 [pdf, other]

Compositional Generalization without Trees using Multiset Tagging and Latent Permutations

Authors: Matthias Lindemann, Alexander Koller, Ivan Titov

Abstract: Seq2seq models have been shown to struggle with compositional generalization in semantic parsing, i.e. generalizing to unseen compositions of phenomena that the model handles correctly in isolation. We phrase semantic parsing as a two-step process: we first tag each input token with a multiset of output tokens. Then we arrange the tokens into an output sequence using a new way of parameterizing and predicting permutations. We formulate predicting a permutation as solving a regularized linear program and we backpropagate through the solver. In contrast to prior work, our approach does not place a priori restrictions on possible permutations, making it very expressive. Our model outperforms pretrained seq2seq models and prior work on realistic semantic parsing tasks that require generalization to longer examples. We also outperform non-tree-based models on structural generalization on the COGS benchmark. For the first time, we show that a model without an inductive bias provided by trees achieves high accuracy on generalization to deeper recursion.

Submitted 26 May, 2023; originally announced May 2023.

Comments: ACL 2023

arXiv:2210.03183 [pdf, other]

Compositional Generalisation with Structured Reordering and Fertility Layers

Authors: Matthias Lindemann, Alexander Koller, Ivan Titov

Abstract: Seq2seq models have been shown to struggle with compositional generalisation, i.e. generalising to new and potentially more complex structures than seen during training. Taking inspiration from grammar-based models that excel at compositional generalisation, we present a flexible end-to-end differentiable neural model that composes two structural operations: a fertility step, which we introduce in this work, and a reordering step based on previous work (Wang et al., 2021). To ensure differentiability, we use the expected value of each step. Our model outperforms seq2seq models by a wide margin on challenging compositional splits of realistic semantic parsing tasks that require generalisation to longer examples. It also compares favourably to other models targeting compositional generalisation.

Submitted 15 February, 2023; v1 submitted 6 October, 2022; originally announced October 2022.

Comments: EACL 2023 camera-ready

ACM Class: I.2.7

arXiv:2202.10195 [pdf, other]

Efficient computation of oriented vertex and arc colorings of special digraphs

Authors: Frank Gurski, Dominique Komander, Marvin Lindemann

Abstract: In this paper we study the oriented vertex and arc coloring problem on edge series-parallel digraphs (esp-digraphs) which are related to the well known series-parallel graphs. Series-parallel graphs are graphs with two distinguished vertices called terminals, formed recursively by parallel and series composition. These graphs have applications in modeling series and parallel electric circuits and also play an important role in theoretical computer science. The oriented class of series-parallel digraphs is recursively defined from pairs of vertices connected by a single arc and applying the parallel and series composition, which leads to specific orientations of undirected series-parallel graphs. Further we consider the line digraphs of edge series-parallel digraphs, which are known as minimal series-parallel digraphs (msp-digraphs). We show tight upper bounds for the oriented chromatic number and the oriented chromatic index of edge series-parallel digraphs and minimal series-parallel digraphs. Furthermore, we introduce first linear time solutions for computing the oriented chromatic number of edge series-parallel digraphs and the oriented chromatic index of minimal series-parallel digraphs.

Submitted 21 February, 2022; originally announced February 2022.

Comments: 21 pages, 8 figures. arXiv admin note: text overlap with arXiv:2012.13764

arXiv:2103.09171 [pdf, other]

Interpretable Deep Learning for the Remote Characterisation of Ambulation in Multiple Sclerosis using Smartphones

Authors: Andrew P. Creagh, Florian Lipsmeier, Michael Lindemann, Maarten De Vos

Abstract: The emergence of digital technologies such as smartphones in healthcare applications has demonstrated the possibility of developing rich, continuous, and objective measures of multiple sclerosis (MS) disability that can be administered remotely and out-of-clinic. In this work, deep convolutional neural networks (DCNN) applied to smartphone inertial sensor data were shown to better distinguish healthy from MS participant ambulation, compared to standard Support Vector Machine (SVM) feature-based methodologies. To overcome the typical limitations associated with remotely generated health data, such as low subject numbers, sparsity, and heterogeneous data, a transfer learning (TL) model from similar large open-source datasets was proposed. Our TL framework utilised the ambulatory information learned on Human Activity Recognition (HAR) tasks collected from similar smartphone-based sensor data. A lack of transparency of "black-box" deep networks remains one of the largest stumbling blocks to the wider acceptance of deep learning for clinical applications. Ensuing work therefore aimed to visualise DCNN decisions attributed by relevance heatmaps using Layer-Wise Relevance Propagation (LRP). Through the LRP framework, the patterns captured from smartphone-based inertial sensor data that were reflective of those who are healthy versus persons with MS (PwMS) could begin to be established and understood.
Interpretations suggested that cadence-based measures, gait speed, and ambulation-related signal perturbations were distinct characteristics that distinguished MS disability from healthy participants. Robust and interpretable outcomes, generated from high-frequency out-of-clinic assessments, could greatly augment the current in-clinic assessment picture for PwMS, to inform better disease management techniques, and enable the development of better therapeutic interventions.

Submitted 22 June, 2021; v1 submitted 16 March, 2021; originally announced March 2021.

arXiv:2012.13764 [pdf, other]

Efficient computation of the oriented chromatic number of recursively defined digraphs

Authors: Frank Gurski, Dominique Komander, Marvin Lindemann

Abstract: In this paper we consider colorings of oriented graphs, i.e. digraphs without cycles of length 2. Given some oriented graph $G=(V,E)$, an oriented $r$-coloring for $G$ is a partition of the vertex set $V$ into $r$ independent sets, such that all the arcs between two of these sets have the same direction. The oriented chromatic number of $G$ is the smallest integer $r$ such that $G$ permits an oriented $r$-coloring. In this paper we consider the Oriented Chromatic Number problem on classes of recursively defined oriented graphs. Oriented co-graphs (short for oriented complement reducible graphs) can be recursively defined from the single vertex graph by applying the disjoint union and order composition. This recursive structure allows us to compute an optimal oriented coloring and the oriented chromatic number in linear time. We generalize this result using the concept of perfect orderable graphs. Therefore, we show that for acyclic transitive digraphs every greedy coloring along a topological ordering leads to an optimal oriented coloring. Msp-digraphs (short for minimal series-parallel digraphs) can be defined from the single vertex graph by applying the parallel composition and series composition. We prove an upper bound of $7$ for the oriented chromatic number for msp-digraphs and we give an example to show that this bound is best possible. We apply this bound and the recursive structure of msp-digraphs to obtain a linear time solution for computing the oriented chromatic number of msp-digraphs.
In order to generalize the results on computing the oriented chromatic number of special graph classes, we consider the parameterized complexity of the Oriented Chromatic Number problem by so-called structural parameters, which are measuring the difficulty of decomposing a graph into a special tree-structure.

Submitted 12 March, 2021; v1 submitted 26 December, 2020; originally announced December 2020.

Comments: 25 pages. arXiv admin note: text overlap with arXiv:2006.13911

arXiv:2009.07365 [pdf, other]

Fast semantic parsing with well-typedness guarantees

Authors: Matthias Lindemann, Jonas Groschwitz, Alexander Koller

Abstract: AM dependency parsing is a linguistically principled method for neural semantic parsing with high accuracy across multiple graphbanks. It relies on a type system that models semantic valency but makes existing parsers slow. We describe an A* parser and a transition-based parser for AM dependency parsing which guarantee well-typedness and improve parsing speed by up to 3 orders of magnitude, while maintaining or improving accuracy.

Submitted 6 October, 2020; v1 submitted 15 September, 2020; originally announced September 2020.

Comments: Accepted at EMNLP 2020, camera-ready version

arXiv:2004.14236 [pdf, other]

Normalizing Compositional Structures Across Graphbanks

Authors: Lucia Donatelli, Jonas Groschwitz, Alexander Koller, Matthias Lindemann, Pia Weißenhorn

Abstract: The emergence of a variety of graph-based meaning representations (MRs) has sparked an important conversation about how to adequately represent semantic structure. These MRs exhibit structural differences that reflect different theoretical and design considerations, presenting challenges to uniform linguistic analysis and cross-framework semantic parsing. Here, we ask the question of which design differences between MRs are meaningful and semantically-rooted, and which are superficial. We present a methodology for normalizing discrepancies between MRs at the compositional level (Lindemann et al., 2019), finding that we can normalize the majority of divergent phenomena using linguistically-grounded rules. Our work significantly increases the match in compositional structure between MRs and improves multi-task learning (MTL) in a low-resource setting, demonstrating the usefulness of careful MR design analysis and comparison.

Submitted 30 April, 2020; v1 submitted 29 April, 2020; originally announced April 2020.

Comments: 16 pages, 6 figures

arXiv:1906.11746 [pdf, other]

Compositional Semantic Parsing Across Graphbanks

Authors: Matthias Lindemann, Jonas Groschwitz, Alexander Koller

Abstract: Most semantic parsers that map sentences to graph-based meaning representations are hand-designed for specific graphbanks. We present a compositional neural semantic parser which achieves, for the first time, competitive accuracies across a diverse range of graphbanks. Incorporating BERT embeddings and multi-task learning improves the accuracy further, setting new states of the art on DM, PAS, PSD, AMR 2015 and EDS.

Submitted 13 July, 2019; v1 submitted 27 June, 2019; originally announced June 2019.

Comments: Accepted at ACL 2019

arXiv:1805.11465 [pdf, other]

doi 10.18653/v1/P18-1170

AMR Dependency Parsing with a Typed Semantic Algebra

Authors: Jonas Groschwitz, Matthias Lindemann, Meaghan Fowlie, Mark Johnson, Alexander Koller

Abstract: We present a semantic parser for Abstract Meaning Representations which learns to parse strings into tree representations of the compositional structure of an AMR graph. This allows us to use standard neural techniques for supertagging and dependency tree parsing, constrained by a linguistically principled type system. We present two approximative decoding algorithms, which achieve state-of-the-art accuracy and outperform strong baselines.

Submitted 29 May, 2018; originally announced May 2018.

Comments: This paper will be presented at ACL 2018 (see https://acl2018.org/programme/papers/)

Journal ref: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2018

Showing 1–12 of 12 results for author: Lindemann, M