Search | arXiv e-print repository

arXiv:2406.11977 [pdf, other]

Reframing linguistic bootstrapping as joint inference using visually-grounded grammar induction models

Authors: Eva Portelance, Siva Reddy, Timothy J. O'Donnell

Abstract: Semantic and syntactic bootstrapping posit that children use their prior knowledge of one linguistic domain, say syntactic relations, to help later acquire another, such as the meanings of new words. Empirical results supporting both theories may tempt us to believe that these are different learning strategies, where one may precede the other. Here, we argue that they are instead both contingent o… ▽ More Semantic and syntactic bootstrapping posit that children use their prior knowledge of one linguistic domain, say syntactic relations, to help later acquire another, such as the meanings of new words. Empirical results supporting both theories may tempt us to believe that these are different learning strategies, where one may precede the other. Here, we argue that they are instead both contingent on a more general learning strategy for language acquisition: joint learning. Using a series of neural visually-grounded grammar induction models, we demonstrate that both syntactic and semantic bootstrapping effects are strongest when syntax and semantics are learnt simultaneously. Joint learning results in better grammar induction, realistic lexical category learning, and better interpretations of novel sentence and verb meanings. Joint learning makes language acquisition easier for learners by mutually constraining the hypotheses spaces for both syntax and semantics. Studying the dynamics of joint inference over many input sources and modalities represents an important new direction for language modeling and learning research in both cognitive sciences and AI, as it may help us explain how language can be acquired in more constrained learning settings. △ Less

Submitted 17 June, 2024; originally announced June 2024.

ACM Class: I.2.7; I.2.10; I.2.6; F.4.2

arXiv:2406.05186 [pdf, other]

Correlation Does Not Imply Compensation: Complexity and Irregularity in the Lexicon

Authors: Amanda Doucette, Ryan Cotterell, Morgan Sonderegger, Timothy J. O'Donnell

Abstract: It has been claimed that within a language, morphologically irregular words are more likely to be phonotactically simple and morphologically regular words are more likely to be phonotactically complex. This inverse correlation has been demonstrated in English for a small sample of words, but has yet to be shown for a larger sample of languages. Furthermore, frequency and word length are known to i… ▽ More It has been claimed that within a language, morphologically irregular words are more likely to be phonotactically simple and morphologically regular words are more likely to be phonotactically complex. This inverse correlation has been demonstrated in English for a small sample of words, but has yet to be shown for a larger sample of languages. Furthermore, frequency and word length are known to influence both phonotactic complexity and morphological irregularity, and they may be confounding factors in this relationship. Therefore, we examine the relationships between all pairs of these four variables both to assess the robustness of previous findings using improved methodology and as a step towards understanding the underlying causal relationship. Using information-theoretic measures of phonotactic complexity and morphological irregularity (Pimentel et al., 2020; Wu et al., 2019) on 25 languages from UniMorph, we find that there is evidence of a positive relationship between morphological irregularity and phonotactic complexity within languages on average, although the direction varies within individual languages. We also find weak evidence of a negative relationship between word length and morphological irregularity that had not been previously identified, and that some existing findings about the relationships between these four variables are not as robust as previously thought. △ Less

Submitted 7 June, 2024; originally announced June 2024.

Comments: To appear in Proceedings of the Society for Computation in Linguistics 2024

arXiv:2403.01187 [pdf, ps, other]

A Compositional Typed Semantics for Universal Dependencies

Authors: Laurestine Bradford, Timothy John O'Donnell, Siva Reddy

Abstract: Languages may encode similar meanings using different sentence structures. This makes it a challenge to provide a single set of formal rules that can derive meanings from sentences in many languages at once. To overcome the challenge, we can take advantage of language-general connections between meaning and syntax, and build on cross-linguistically parallel syntactic structures. We introduce UD Ty… ▽ More Languages may encode similar meanings using different sentence structures. This makes it a challenge to provide a single set of formal rules that can derive meanings from sentences in many languages at once. To overcome the challenge, we can take advantage of language-general connections between meaning and syntax, and build on cross-linguistically parallel syntactic structures. We introduce UD Type Calculus, a compositional, principled, and language-independent system of semantic types and logical forms for lexical items which builds on a widely-used language-general dependency syntax framework. We explain the essential features of UD Type Calculus, which all involve giving dependency relations denotations just like those of words. These allow UD-TC to derive correct meanings for sentences with a wide range of syntactic structures by making use of dependency labels. Finally, we present evaluation results on a large existing corpus of sentences and their logical forms, showing that UD-TC can produce meanings comparable with our baseline. △ Less

Submitted 2 March, 2024; originally announced March 2024.

Comments: 10 pages, 6 figures, 1 table. For related code, see https://github.com/McGill-NLP/ud-to-meaning

arXiv:2302.06784 [pdf, other]

The Stable Entropy Hypothesis and Entropy-Aware Decoding: An Analysis and Algorithm for Robust Natural Language Generation

Authors: Kushal Arora, Timothy J. O'Donnell, Doina Precup, Jason Weston, Jackie C. K. Cheung

Abstract: State-of-the-art language generation models can degenerate when applied to open-ended generation problems such as text completion, story generation, or dialog modeling. This degeneration usually shows up in the form of incoherence, lack of vocabulary diversity, and self-repetition or copying from the context. In this paper, we postulate that ``human-like'' generations usually lie in a narrow and n… ▽ More State-of-the-art language generation models can degenerate when applied to open-ended generation problems such as text completion, story generation, or dialog modeling. This degeneration usually shows up in the form of incoherence, lack of vocabulary diversity, and self-repetition or copying from the context. In this paper, we postulate that ``human-like'' generations usually lie in a narrow and nearly flat entropy band, and violation of these entropy bounds correlates with degenerate behavior. Our experiments show that this stable narrow entropy zone exists across models, tasks, and domains and confirm the hypothesis that violations of this zone correlate with degeneration. We then use this insight to propose an entropy-aware decoding algorithm that respects these entropy bounds resulting in less degenerate, more contextual, and "human-like" language generation in open-ended text generation settings. △ Less

Submitted 13 February, 2023; originally announced February 2023.

arXiv:2203.12788 [pdf, other]

Evaluating Distributional Distortion in Neural Language Modeling

Authors: Benjamin LeBrun, Alessandro Sordoni, Timothy J. O'Donnell

Abstract: A fundamental characteristic of natural language is the high rate at which speakers produce novel expressions. Because of this novelty, a heavy-tail of rare events accounts for a significant amount of the total probability mass of distributions in language (Baayen, 2001). Standard language modeling metrics such as perplexity quantify the performance of language models (LM) in aggregate. As a resul… ▽ More A fundamental characteristic of natural language is the high rate at which speakers produce novel expressions. Because of this novelty, a heavy-tail of rare events accounts for a significant amount of the total probability mass of distributions in language (Baayen, 2001). Standard language modeling metrics such as perplexity quantify the performance of language models (LM) in aggregate. As a result, we have relatively little understanding of whether neural LMs accurately estimate the probability of sequences in this heavy-tail of rare events. To address this gap, we develop a controlled evaluation scheme which uses generative models trained on natural data as artificial languages from which we can exactly compute sequence probabilities. Training LMs on generations from these artificial languages, we compare the sequence-level probability estimates given by LMs to the true probabilities in the target language. Our experiments reveal that LSTM and Transformer language models (i) systematically underestimate the probability of sequences drawn from the target language, and (ii) do so more severely for less-probable sequences. Investigating where this probability mass went, (iii) we find that LMs tend to overestimate the probability of ill formed (perturbed) sequences. In addition, we find that this underestimation behaviour (iv) is weakened, but not eliminated by greater amounts of training data, and (v) is exacerbated for target distributions with lower entropy. △ Less

Submitted 23 March, 2022; originally announced March 2022.

Journal ref: International Conference on Learning Representations. 2022

arXiv:2112.00578 [pdf, other]

Systematic Generalization with Edge Transformers

Authors: Leon Bergen, Timothy J. O'Donnell, Dzmitry Bahdanau

Abstract: Recent research suggests that systematic generalization in natural language understanding remains a challenge for state-of-the-art neural models such as Transformers and Graph Neural Networks. To tackle this challenge, we propose Edge Transformer, a new model that combines inspiration from Transformers and rule-based symbolic AI. The first key idea in Edge Transformers is to associate vector state… ▽ More Recent research suggests that systematic generalization in natural language understanding remains a challenge for state-of-the-art neural models such as Transformers and Graph Neural Networks. To tackle this challenge, we propose Edge Transformer, a new model that combines inspiration from Transformers and rule-based symbolic AI. The first key idea in Edge Transformers is to associate vector states with every edge, that is, with every pair of input nodes -- as opposed to just every node, as it is done in the Transformer model. The second major innovation is a triangular attention mechanism that updates edge representations in a way that is inspired by unification from logic programming. We evaluate Edge Transformer on compositional generalization benchmarks in relational reasoning, semantic parsing, and dependency parsing. In all three settings, the Edge Transformer outperforms Relation-aware, Universal and classical Transformer baselines. △ Less

Submitted 1 December, 2021; originally announced December 2021.

Comments: Accepted as a conference paper at NeurIPS 2021

arXiv:2110.06843 [pdf, other]

Compositional Generalization in Dependency Parsing

Authors: Emily Goodwin, Siva Reddy, Timothy J. O'Donnell, Dzmitry Bahdanau

Abstract: Compositionality -- the ability to combine familiar units like words into novel phrases and sentences -- has been the focus of intense interest in artificial intelligence in recent years. To test compositional generalization in semantic parsing, Keysers et al. (2020) introduced Compositional Freebase Queries (CFQ). This dataset maximizes the similarity between the test and train distributions over… ▽ More Compositionality -- the ability to combine familiar units like words into novel phrases and sentences -- has been the focus of intense interest in artificial intelligence in recent years. To test compositional generalization in semantic parsing, Keysers et al. (2020) introduced Compositional Freebase Queries (CFQ). This dataset maximizes the similarity between the test and train distributions over primitive units, like words, while maximizing the compound divergence: the dissimilarity between test and train distributions over larger structures, like phrases. Dependency parsing, however, lacks a compositional generalization benchmark. In this work, we introduce a gold-standard set of dependency parses for CFQ, and use this to analyze the behavior of a state-of-the art dependency parser (Qi et al., 2020) on the CFQ dataset. We find that increasing compound divergence degrades dependency parsing performance, although not as dramatically as semantic parsing performance. Additionally, we find the performance of the dependency parser does not uniformly degrade relative to compound divergence, and the parser performs differently on different splits with the same compound divergence. We explore a number of hypotheses for what causes the non-uniform degradation in dependency parsing performance, and identify a number of syntactic structures that drive the dependency parser's lower performance on the most challenging splits. △ Less

Submitted 15 March, 2022; v1 submitted 13 October, 2021; originally announced October 2021.

Comments: 12 pages 7 figures

arXiv:2104.08685 [pdf, other]

doi 10.18653/v1/2021.emnlp-main.234

Linguistic Dependencies and Statistical Dependence

Authors: Jacob Louis Hoover, Alessandro Sordoni, Wenyu Du, Timothy J. O'Donnell

Abstract: Are pairs of words that tend to occur together also likely to stand in a linguistic dependency? This empirical question is motivated by a long history of literature in cognitive science, psycholinguistics, and NLP. In this work we contribute an extensive analysis of the relationship between linguistic dependencies and statistical dependence between words. Improving on previous work, we introduce t… ▽ More Are pairs of words that tend to occur together also likely to stand in a linguistic dependency? This empirical question is motivated by a long history of literature in cognitive science, psycholinguistics, and NLP. In this work we contribute an extensive analysis of the relationship between linguistic dependencies and statistical dependence between words. Improving on previous work, we introduce the use of large pretrained language models to compute contextualized estimates of the pointwise mutual information between words (CPMI). For multiple models and languages, we extract dependency trees which maximize CPMI, and compare to gold standard linguistic dependencies. Overall, we find that CPMI dependencies achieve an unlabelled undirected attachment score of at most $\approx 0.5$. While far above chance, and consistently above a non-contextualized PMI baseline, this score is generally comparable to a simple baseline formed by connecting adjacent words. We analyze which kinds of linguistic dependencies are best captured in CPMI dependencies, and also find marked differences between the estimates of the large pretrained language models, illustrating how their different training schemes affect the type of dependencies they capture. △ Less

Submitted 29 April, 2022; v1 submitted 17 April, 2021; originally announced April 2021.

Comments: EMNLP2021 camera-ready version. 9 pages, plus references and appendices

Report number: 2021.emnlp-main.234

Journal ref: Proceedings EMNLP (2021), 2941--2963

arXiv:2104.08664 [pdf, other]

Characterizing Idioms: Conventionality and Contingency

Authors: Michaela Socolof, Jackie Chi Kit Cheung, Michael Wagner, Timothy J. O'Donnell

Abstract: Idioms are unlike most phrases in two important ways. First, the words in an idiom have non-canonical meanings. Second, the non-canonical meanings of words in an idiom are contingent on the presence of other words in the idiom. Linguistic theories differ on whether these properties depend on one another, as well as whether special theoretical machinery is needed to accommodate idioms. We define tw… ▽ More Idioms are unlike most phrases in two important ways. First, the words in an idiom have non-canonical meanings. Second, the non-canonical meanings of words in an idiom are contingent on the presence of other words in the idiom. Linguistic theories differ on whether these properties depend on one another, as well as whether special theoretical machinery is needed to accommodate idioms. We define two measures that correspond to the properties above, and we implement them using BERT (Devlin et al., 2019) and XLNet(Yang et al., 2019). We show that idioms fall at the expected intersection of the two dimensions, but that the dimensions themselves are not correlated. Our results suggest that special machinery to handle idioms may not be warranted. △ Less

Submitted 14 September, 2022; v1 submitted 17 April, 2021; originally announced April 2021.

arXiv:2104.06645 [pdf, other]

Jointly Learning Truth-Conditional Denotations and Groundings using Parallel Attention

Authors: Leon Bergen, Dzmitry Bahdanau, Timothy J. O'Donnell

Abstract: We present a model that jointly learns the denotations of words together with their groundings using a truth-conditional semantics. Our model builds on the neurosymbolic approach of Mao et al. (2019), learning to ground objects in the CLEVR dataset (Johnson et al., 2017) using a novel parallel attention mechanism. The model achieves state of the art performance on visual question answering, learni… ▽ More We present a model that jointly learns the denotations of words together with their groundings using a truth-conditional semantics. Our model builds on the neurosymbolic approach of Mao et al. (2019), learning to ground objects in the CLEVR dataset (Johnson et al., 2017) using a novel parallel attention mechanism. The model achieves state of the art performance on visual question answering, learning to detect and ground objects with question performance as the only training signal. We also show that the model is able to learn flexible non-canonical groundings just by adjusting answers to questions in the training set. △ Less

Submitted 14 April, 2021; originally announced April 2021.

arXiv:2010.04704 [pdf, other]

Recursive Top-Down Production for Sentence Generation with Latent Trees

Authors: Shawn Tan, Yikang Shen, Timothy J. O'Donnell, Alessandro Sordoni, Aaron Courville

Abstract: We model the recursive production property of context-free grammars for natural and synthetic languages. To this end, we present a dynamic programming algorithm that marginalises over latent binary tree structures with $N$ leaves, allowing us to compute the likelihood of a sequence of $N$ tokens under a latent tree model, which we maximise to train a recursive neural function. We demonstrate perfo… ▽ More We model the recursive production property of context-free grammars for natural and synthetic languages. To this end, we present a dynamic programming algorithm that marginalises over latent binary tree structures with $N$ leaves, allowing us to compute the likelihood of a sequence of $N$ tokens under a latent tree model, which we maximise to train a recursive neural function. We demonstrate performance on two synthetic tasks: SCAN (Lake and Baroni, 2017), where it outperforms previous models on the LENGTH split, and English question formation (McCoy et al., 2020), where it performs comparably to decoders with the ground-truth tree structure. We also present experimental results on German-English translation on the Multi30k dataset (Elliott et al., 2016), and qualitatively analyse the induced tree structures our model learns for the SCAN tasks and the German-English translation task. △ Less

Submitted 9 October, 2020; originally announced October 2020.

arXiv:2005.05864 [pdf]

Exploiting Syntactic Structure for Better Language Modeling: A Syntactic Distance Approach

Authors: Wenyu Du, Zhouhan Lin, Yikang Shen, Timothy J. O'Donnell, Yoshua Bengio, Yue Zhang

Abstract: It is commonly believed that knowledge of syntactic structure should improve language modeling. However, effectively and computationally efficiently incorporating syntactic structure into neural language models has been a challenging topic. In this paper, we make use of a multi-task objective, i.e., the models simultaneously predict words as well as ground truth parse trees in a form called "synta… ▽ More It is commonly believed that knowledge of syntactic structure should improve language modeling. However, effectively and computationally efficiently incorporating syntactic structure into neural language models has been a challenging topic. In this paper, we make use of a multi-task objective, i.e., the models simultaneously predict words as well as ground truth parse trees in a form called "syntactic distances", where information between these two separate objectives shares the same intermediate representation. Experimental results on the Penn Treebank and Chinese Treebank datasets show that when ground truth parse trees are provided as additional training signals, the model is able to achieve lower perplexity and induce trees with better quality. △ Less

Submitted 12 May, 2020; originally announced May 2020.

Comments: ACL20

arXiv:2005.04315 [pdf, other]

Probing Linguistic Systematicity

Authors: Emily Goodwin, Koustuv Sinha, Timothy J. O'Donnell

Abstract: Recently, there has been much interest in the question of whether deep natural language understanding models exhibit systematicity; generalizing such that units like words make consistent contributions to the meaning of the sentences in which they appear. There is accumulating evidence that neural models often generalize non-systematically. We examined the notion of systematicity from a linguistic… ▽ More Recently, there has been much interest in the question of whether deep natural language understanding models exhibit systematicity; generalizing such that units like words make consistent contributions to the meaning of the sentences in which they appear. There is accumulating evidence that neural models often generalize non-systematically. We examined the notion of systematicity from a linguistic perspective, defining a set of probes and a set of metrics to measure systematic behaviour. We also identified ways in which network architectures can generalize non-systematically, and discuss why such forms of generalization may be unsatisfying. As a case study, we performed a series of experiments in the setting of natural language inference (NLI), demonstrating that some NLU systems achieve high overall performance despite being non-systematic. △ Less

Submitted 25 August, 2020; v1 submitted 8 May, 2020; originally announced May 2020.

Comments: To appear at ACL2020, 9 pages, 2 figures

arXiv:1912.05783 [pdf, other]

CLOSURE: Assessing Systematic Generalization of CLEVR Models

Authors: Dzmitry Bahdanau, Harm de Vries, Timothy J. O'Donnell, Shikhar Murty, Philippe Beaudoin, Yoshua Bengio, Aaron Courville

Abstract: The CLEVR dataset of natural-looking questions about 3D-rendered scenes has recently received much attention from the research community. A number of models have been proposed for this task, many of which achieved very high accuracies of around 97-99%. In this work, we study how systematic the generalization of such models is, that is to which extent they are capable of handling novel combinations… ▽ More The CLEVR dataset of natural-looking questions about 3D-rendered scenes has recently received much attention from the research community. A number of models have been proposed for this task, many of which achieved very high accuracies of around 97-99%. In this work, we study how systematic the generalization of such models is, that is to which extent they are capable of handling novel combinations of known linguistic constructs. To this end, we test models' understanding of referring expressions based on matching object properties (such as e.g. "another cube that is the same size as the brown cube") in novel contexts. Our experiments on the thereby constructed CLOSURE benchmark show that state-of-the-art models often do not exhibit systematicity after being trained on CLEVR. Surprisingly, we find that an explicitly compositional Neural Module Network model also generalizes badly on CLOSURE, even when it has access to the ground-truth programs at test time. We improve the NMN's systematic generalization by developing a novel Vector-NMN module architecture with vector-valued inputs and outputs. Lastly, we investigate how much few-shot transfer learning can help models that are pretrained on CLEVR to adapt to CLOSURE. Our few-shot learning experiments contrast the adaptation behavior of the models with intermediate discrete programs with that of the end-to-end continuous models. △ Less

Submitted 17 October, 2020; v1 submitted 12 December, 2019; originally announced December 2019.

Comments: Technical report

arXiv:1906.11483 [pdf, other]

Morphological Irregularity Correlates with Frequency

Authors: Shijie Wu, Ryan Cotterell, Timothy J. O'Donnell

Abstract: We present a study of morphological irregularity. Following recent work, we define an information-theoretic measure of irregularity based on the predictability of forms in a language. Using a neural transduction model, we estimate this quantity for the forms in 28 languages. We first present several validatory and exploratory analyses of irregularity. We then show that our analyses provide evidenc… ▽ More We present a study of morphological irregularity. Following recent work, we define an information-theoretic measure of irregularity based on the predictability of forms in a language. Using a neural transduction model, we estimate this quantity for the forms in 28 languages. We first present several validatory and exploratory analyses of irregularity. We then show that our analyses provide evidence for a correlation between irregularity and frequency: higher frequency items are more likely to be irregular and irregular items are more likely be highly frequent. To our knowledge, this result is the first of its breadth and confirms longstanding proposals from the linguistics literature. The correlation is more robust when aggregated at the level of whole paradigms--providing support for models of linguistic structure in which inflected forms are unified by abstract underlying stems or lexemes. Code is available at https://github.com/shijie-wu/neural-transducer. △ Less

Submitted 27 June, 2019; originally announced June 2019.

Comments: ACL 2019

arXiv:1710.11350 [pdf, other]

Grammar Induction for Minimalist Grammars using Variational Bayesian Inference : A Technical Report

Authors: Eva Portelance, Amelia Bruno, Daniel Harasim, Leon Bergen, Timothy J. O'Donnell

Abstract: The following technical report presents a formal approach to probabilistic minimalist grammar parameter estimation. We describe a formalization of a minimalist grammar. We then present an algorithm for the application of variational Bayesian inference to this formalization. The following technical report presents a formal approach to probabilistic minimalist grammar parameter estimation. We describe a formalization of a minimalist grammar. We then present an algorithm for the application of variational Bayesian inference to this formalization. △ Less

Submitted 28 August, 2019; v1 submitted 31 October, 2017; originally announced October 2017.

arXiv:1710.11301 [pdf, other]

A generalized parsing framework for Abstract Grammars

Authors: Daniel Harasim, Chris Bruno, Eva Portelance, Martin Rohrmeier, Timothy J. O'Donnell

Abstract: This technical report presents a general framework for parsing a variety of grammar formalisms. We develop a grammar formalism, called an Abstract Grammar, which is general enough to represent grammars at many levels of the hierarchy, including Context Free Grammars, Minimalist Grammars, and Generalized Context-free Grammars. We then develop a single parsing framework which is capable of parsing g… ▽ More This technical report presents a general framework for parsing a variety of grammar formalisms. We develop a grammar formalism, called an Abstract Grammar, which is general enough to represent grammars at many levels of the hierarchy, including Context Free Grammars, Minimalist Grammars, and Generalized Context-free Grammars. We then develop a single parsing framework which is capable of parsing grammars which are at least up to GCFGs on the hierarchy. Our parsing framework exposes a grammar interface, so that it can parse any particular grammar formalism that can be reduced to an Abstract Grammar. △ Less

Submitted 19 January, 2018; v1 submitted 30 October, 2017; originally announced October 2017.

Comments: Technical Report [v2: added Martin Rohrmeier as author.] [v3: fixed error stating that AGs are equivalent to GCFGS. this is in fact not known yet. other minor typos fixed.]

Showing 1–17 of 17 results for author: O'Donnell, T J