Showing 1–9 of 9 results for author: DuSell, B

  1. arXiv:2407.11606

    cs.CL cs.AI cs.LG

    The Foundations of Tokenization: Statistical and Computational Concerns

    Authors: Juan Luis Gastaldi, John Terilla, Luca Malagutti, Brian DuSell, Tim Vieira, Ryan Cotterell

    Abstract: Tokenization - the practice of converting strings of characters over an alphabet into sequences of tokens over a vocabulary - is a critical yet under-theorized step in the NLP pipeline. Notably, it remains the only major step not fully integrated into widely used end-to-end neural models. This paper aims to address this theoretical gap by laying the foundations of tokenization from a formal perspe…

    Submitted 8 August, 2024; v1 submitted 16 July, 2024; originally announced July 2024.

  2. arXiv:2404.16341

    cs.CL

    PILA: A Historical-Linguistic Dataset of Proto-Italic and Latin

    Authors: Stephen Bothwell, Brian DuSell, David Chiang, Brian Krostenko

    Abstract: Computational historical linguistics seeks to systematically understand processes of sound change, including during periods at which little to no formal recording of language is attested. At the same time, few computational resources exist which deeply explore phonological and morphological connections between proto-languages and their descendants. This is particularly true for the family of Itali…

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: 12 pages, 1 figure, 9 tables. Accepted at LREC-COLING 2024

    ACM Class: I.2.7

  3. arXiv:2310.01749

    cs.CL

    Stack Attention: Improving the Ability of Transformers to Model Hierarchical Patterns

    Authors: Brian DuSell, David Chiang

    Abstract: Attention, specifically scaled dot-product attention, has proven effective for natural language, but it does not have a mechanism for handling hierarchical patterns of arbitrary nesting depth, which limits its ability to recognize certain syntactic structures. To address this shortcoming, we propose stack attention: an attention operator that incorporates stacks, inspired by their theoretical conn…

    Submitted 24 January, 2024; v1 submitted 2 October, 2023; originally announced October 2023.

    Comments: 20 pages, 4 figures. Published as a spotlight paper at ICLR 2024

  4. Nondeterministic Stacks in Neural Networks

    Authors: Brian DuSell

    Abstract: Human language is full of compositional syntactic structures, and although neural networks have contributed to groundbreaking improvements in computer systems that process language, widely-used neural network architectures still exhibit limitations in their ability to process syntax. To address this issue, prior work has proposed adding stack data structures to neural networks, drawing inspiration…

    Submitted 17 May, 2023; v1 submitted 25 April, 2023; originally announced April 2023.

    Comments: 158 pages, 24 figures. PhD thesis

  5. arXiv:2210.06884

    cs.CL

    Algorithms for Weighted Pushdown Automata

    Authors: Alexandra Butoi, Brian DuSell, Tim Vieira, Ryan Cotterell, David Chiang

    Abstract: Weighted pushdown automata (WPDAs) are at the core of many natural language processing tasks, like syntax-based statistical machine translation and transition-based dependency parsing. As most existing dynamic programming algorithms are designed for context-free grammars (CFGs), algorithms for PDAs often resort to a PDA-to-CFG conversion. In this paper, we develop novel algorithms that operate dir…

    Submitted 18 November, 2022; v1 submitted 13 October, 2022; originally announced October 2022.

    Comments: 12 pages, 7 figures. Accepted at EMNLP 2022

  6. arXiv:2210.01343

    cs.CL

    The Surprising Computational Power of Nondeterministic Stack RNNs

    Authors: Brian DuSell, David Chiang

    Abstract: Traditional recurrent neural networks (RNNs) have a fixed, finite number of memory cells. In theory (assuming bounded range and precision), this limits their formal language recognition power to regular languages, and in practice, RNNs have been shown to be unable to learn many context-free languages (CFLs). In order to expand the class of languages RNNs recognize, prior work has augmented RNNs wi…

    Submitted 10 March, 2023; v1 submitted 3 October, 2022; originally announced October 2022.

    Comments: 21 pages, 8 figures. Published at ICLR 2023

  7. arXiv:2109.01982

    cs.CL

    Learning Hierarchical Structures with Differentiable Nondeterministic Stacks

    Authors: Brian DuSell, David Chiang

    Abstract: Learning hierarchical structures in sequential data -- from simple algorithmic patterns to natural language -- in a reliable, generalizable way remains a challenging problem for neural language models. Past work has shown that recurrent neural networks (RNNs) struggle to generalize on held-out algorithmic or syntactic patterns without supervision or some inductive bias. To remedy this, many papers…

    Submitted 29 November, 2022; v1 submitted 4 September, 2021; originally announced September 2021.

    Comments: 17 pages, 4 figures. Published as a spotlight paper at ICLR 2022. This revision fixes typos and minor errors

  8. arXiv:2010.04674

    cs.CL

    Learning Context-Free Languages with Nondeterministic Stack RNNs

    Authors: Brian DuSell, David Chiang

    Abstract: We present a differentiable stack data structure that simultaneously and tractably encodes an exponential number of stack configurations, based on Lang's algorithm for simulating nondeterministic pushdown automata. We call the combination of this data structure with a recurrent neural network (RNN) controller a Nondeterministic Stack RNN. We compare our model against existing stack RNNs on various…

    Submitted 29 November, 2022; v1 submitted 9 October, 2020; originally announced October 2020.

    Comments: 13 pages, 5 figures. Published at CoNLL 2020. This revision fixes a typo

  9. arXiv:1910.07134

    cs.CL

    Efficiency through Auto-Sizing: Notre Dame NLP's Submission to the WNGT 2019 Efficiency Task

    Authors: Kenton Murray, Brian DuSell, David Chiang

    Abstract: This paper describes the Notre Dame Natural Language Processing Group's (NDNLP) submission to the WNGT 2019 shared task (Hayashi et al., 2019). We investigated the impact of auto-sizing (Murray and Chiang, 2015; Murray et al., 2019) to the Transformer network (Vaswani et al., 2017) with the goal of substantially reducing the number of parameters in the model. Our method was able to eliminate more…

    Submitted 15 October, 2019; originally announced October 2019.

    Comments: The 3rd Workshop on Neural Generation and Translation (WNGT 2019)