Showing 1–13 of 13 results for author: Schuler, W

Searching in archive cs.
  1. arXiv:2406.10851  [pdf, other]

    cs.CL

    Leading Whitespaces of Language Models' Subword Vocabulary Poses a Confound for Calculating Word Probabilities

    Authors: Byung-Doh Oh, William Schuler

    Abstract: Word-by-word conditional probabilities from Transformer-based language models are increasingly being used to evaluate their predictions over minimal pairs or to model the incremental processing difficulty of human readers. In this paper, we argue that there is a confound posed by the subword tokenization scheme of such language models, which has gone unaddressed thus far. This is due to the fact t…

    Submitted 16 June, 2024; originally announced June 2024.

  2. arXiv:2402.02255  [pdf, other]

    cs.CL cs.LG

    Frequency Explains the Inverse Correlation of Large Language Models' Size, Training Data Amount, and Surprisal's Fit to Reading Times

    Authors: Byung-Doh Oh, Shisen Yue, William Schuler

    Abstract: Recent studies have shown that as Transformer-based language models become larger and are trained on very large amounts of data, the fit of their surprisal estimates to naturalistic human reading times degrades. The current work presents a series of analyses showing that word frequency is a key explanatory factor underlying these two trends. First, residual errors from four language model families…

    Submitted 3 February, 2024; originally announced February 2024.

    Comments: EACL 2024

  3. arXiv:2305.10614  [pdf, other]

    cs.CL cs.AI

    Token-wise Decomposition of Autoregressive Language Model Hidden States for Analyzing Model Predictions

    Authors: Byung-Doh Oh, William Schuler

    Abstract: While there is much recent interest in studying why Transformer-based large language models make predictions the way they do, the complex computations performed within each layer have made their behavior somewhat opaque. To mitigate this opacity, this work presents a linear decomposition of final hidden states from autoregressive language models based on each initial input token, which is exact fo…

    Submitted 2 June, 2023; v1 submitted 17 May, 2023; originally announced May 2023.

    Comments: ACL 2023

  4. arXiv:2304.11389  [pdf, other]

    cs.CL

    Transformer-Based Language Model Surprisal Predicts Human Reading Times Best with About Two Billion Training Tokens

    Authors: Byung-Doh Oh, William Schuler

    Abstract: Recent psycholinguistic studies have drawn conflicting conclusions about the relationship between the quality of a language model and the ability of its surprisal estimates to predict human reading times, which has been speculated to be due to the large gap in both the amount of training data and model capacity across studies. The current work aims to consolidate these findings by evaluating surpr…

    Submitted 22 October, 2023; v1 submitted 22 April, 2023; originally announced April 2023.

    Comments: Findings of the Association for Computational Linguistics: EMNLP 2023

  5. arXiv:2212.12131  [pdf, other]

    cs.CL

    Why Does Surprisal From Larger Transformer-Based Language Models Provide a Poorer Fit to Human Reading Times?

    Authors: Byung-Doh Oh, William Schuler

    Abstract: This work presents a detailed linguistic analysis into why larger Transformer-based pre-trained language models with more parameters and lower perplexity nonetheless yield surprisal estimates that are less predictive of human reading times. First, regression analyses show a strictly monotonic, positive log-linear relationship between perplexity and fit to reading times for the more recently releas…

    Submitted 22 December, 2022; originally announced December 2022.

    Comments: Transactions of the Association for Computational Linguistics (pre-MIT Press publication version)

  6. arXiv:2212.11185  [pdf, other]

    cs.CL

    Entropy- and Distance-Based Predictors From GPT-2 Attention Patterns Predict Reading Times Over and Above GPT-2 Surprisal

    Authors: Byung-Doh Oh, William Schuler

    Abstract: Transformer-based large language models are trained to make predictions about the next word by aggregating representations of previous tokens through their self-attention mechanism. In the field of cognitive modeling, such attention patterns have recently been interpreted as embodying the process of cue-based retrieval, in which attention over multiple targets is taken to generate interference and…

    Submitted 21 December, 2022; originally announced December 2022.

    Comments: EMNLP 2022

  7. arXiv:2209.12128  [pdf, other]

    cs.LG cs.NE stat.ME stat.ML

    A Deep Learning Approach to Analyzing Continuous-Time Systems

    Authors: Cory Shain, William Schuler

    Abstract: Scientists often use observational time series data to study complex natural processes, but regression analyses often assume simplistic dynamics. Recent advances in deep learning have yielded startling improvements to the performance of models of complex processes, but deep learning is generally not used for scientific analysis. Here we show that deep learning can be used to analyze complex proces…

    Submitted 19 April, 2023; v1 submitted 24 September, 2022; originally announced September 2022.

    Comments: Main article: 12 pages, 1 table, 3 figures; Supplementary Information: 54 pages, 6 tables, 30 figures

  8. arXiv:2006.11646  [pdf, other]

    cs.CL

    The Importance of Category Labels in Grammar Induction with Child-directed Utterances

    Authors: Lifeng Jin, William Schuler

    Abstract: Recent progress in grammar induction has shown that grammar induction is possible without explicit assumptions of language-specific knowledge. However, evaluation of induced grammars usually has ignored phrasal labels, an essential part of a grammar. Experiments in this work using a labeled evaluation metric, RH, show that linguistically motivated predictions about grammar sparsity and use of cate…

    Submitted 20 June, 2020; originally announced June 2020.

    Comments: The 16th International Conference on Parsing Technologies (IWPT 2020)

  9. arXiv:1809.03112  [pdf, other]

    cs.CL

    Depth-bounding is effective: Improvements and evaluation of unsupervised PCFG induction

    Authors: Lifeng Jin, Finale Doshi-Velez, Timothy Miller, William Schuler, Lane Schwartz

    Abstract: There have been several recent attempts to improve the accuracy of grammar induction systems by bounding the recursive complexity of the induction model (Ponvert et al., 2011; Noji and Johnson, 2016; Shain et al., 2016; Jin et al., 2018). Modern depth-bounded grammar inducers have been shown to be more accurate than early unbounded PCFG inducers, but this technique has never been compared against…

    Submitted 9 September, 2018; originally announced September 2018.

    Comments: EMNLP 2018

  10. arXiv:1802.08545  [pdf, ps, other]

    cs.CL cs.AI

    Unsupervised Grammar Induction with Depth-bounded PCFG

    Authors: Lifeng Jin, Finale Doshi-Velez, Timothy Miller, William Schuler, Lane Schwartz

    Abstract: There has been recent interest in applying cognitively or empirically motivated bounds on recursion depth to limit the search space of grammar induction models (Ponvert et al., 2011; Noji and Johnson, 2016; Shain et al., 2016). This work extends this depth-bounding approach to probabilistic context-free grammar induction (DB-PCFG), which has a smaller parameter space than hierarchical sequence mod…

    Submitted 25 February, 2018; v1 submitted 23 February, 2018; originally announced February 2018.

    Comments: Accepted by Transactions of the Association for Computational Linguistics

  11. arXiv:cs/0206026  [pdf, ps, other]

    cs.CL cs.HC

    Interleaved semantic interpretation in environment-based parsing

    Authors: William Schuler

    Abstract: This paper extends a polynomial-time parsing algorithm that resolves structural ambiguity in input to a speech-based user interface by calculating and comparing the denotations of rival constituents, given some model of the interfaced application environment (Schuler 2001). The algorithm is extended to incorporate a full set of logical operators, including quantifiers and conjunctions, into this…

    Submitted 18 June, 2002; originally announced June 2002.

    ACM Class: I.2.7; H.2.5

    Journal ref: Proceedings of the 19th International Conference on Computational Linguistics (COLING 2002)

  12. arXiv:cs/0106011  [pdf, ps, other]

    cs.CL cs.HC

    Computational properties of environment-based disambiguation

    Authors: William Schuler

    Abstract: The standard pipeline approach to semantic processing, in which sentences are morphologically and syntactically resolved to a single tree before they are interpreted, is a poor fit for applications such as natural language interfaces. This is because the environment information, in the form of the objects and events in the application's run-time environment, cannot be used to inform parsing deci…

    Submitted 7 June, 2001; originally announced June 2001.

    Comments: 8 pages, published in Proceedings of the 39th Annual Meeting of the ACL 2001

    ACM Class: I.2.7; H.5.2

  13. arXiv:cs/9810015  [pdf, ps, other]

    cs.CL

    Restrictions on Tree Adjoining Languages

    Authors: Giorgio Satta, William Schuler

    Abstract: Several methods are known for parsing languages generated by Tree Adjoining Grammars (TAGs) in O(n^6) worst case running time. In this paper we investigate which restrictions on TAGs and TAG derivations are needed in order to lower this O(n^6) time complexity, without introducing large runtime constants, and without losing any of the generative power needed to capture the syntactic constructions…

    Submitted 13 October, 1998; originally announced October 1998.

    Comments: 7 pages LaTeX + 5 eps figures

    ACM Class: I.2.7

    Journal ref: Proceedings of COLING-ACL'98