Showing 1–50 of 94 results for author: Sachan, M

  1. arXiv:2407.09136

    cs.CL cs.AI cs.LG

    Stepwise Verification and Remediation of Student Reasoning Errors with Large Language Model Tutors

    Authors: Nico Daheim, Jakub Macina, Manu Kapur, Iryna Gurevych, Mrinmaya Sachan

    Abstract: Large language models (LLMs) present an opportunity to scale high-quality personalized education to all. A promising approach towards this means is to build dialog tutoring models that scaffold students' problem-solving. However, even though existing LLMs perform well in solving reasoning questions, they struggle to precisely detect student's errors and tailor their feedback to these errors. Inspi…

    Submitted 12 July, 2024; originally announced July 2024.

    Comments: Preprint. Nico Daheim and Jakub Macina contributed equally. Code and dataset can be found under: https://github.com/eth-lre/verify-then-generate

  2. arXiv:2407.02273

    cs.CL

    Multilingual Trolley Problems for Language Models

    Authors: Zhijing Jin, Sydney Levine, Max Kleiman-Weiner, Giorgio Piatti, Jiarui Liu, Fernando Gonzalez Adauto, Francesco Ortu, András Strausz, Mrinmaya Sachan, Rada Mihalcea, Yejin Choi, Bernhard Schölkopf

    Abstract: As large language models (LLMs) are deployed in more and more real-world situations, it is crucial to understand their decision-making when faced with moral dilemmas. Inspired by a large-scale cross-cultural study of human moral preferences, "The Moral Machine Experiment", we set up the same set of moral choices for LLMs. We translate 1K vignettes of moral dilemmas, parametrically varied across ke…

    Submitted 2 July, 2024; originally announced July 2024.

  3. arXiv:2406.16254

    cs.LG cs.AI cs.CL

    Confidence Regulation Neurons in Language Models

    Authors: Alessandro Stolfo, Ben Wu, Wes Gurnee, Yonatan Belinkov, Xingyi Song, Mrinmaya Sachan, Neel Nanda

    Abstract: Despite their widespread use, the mechanisms by which large language models (LLMs) represent and regulate uncertainty in next-token predictions remain largely unexplored. This study investigates two critical components believed to influence this uncertainty: the recently discovered entropy neurons and a new set of components that we term token frequency neurons. Entropy neurons are characterized b…

    Submitted 23 June, 2024; originally announced June 2024.

    Comments: 25 pages, 14 figures

  4. arXiv:2406.14162

    cs.IR cs.AI cs.CL

    DIRAS: Efficient LLM-Assisted Annotation of Document Relevance in Retrieval Augmented Generation

    Authors: Jingwei Ni, Tobias Schimanski, Meihong Lin, Mrinmaya Sachan, Elliott Ash, Markus Leippold

    Abstract: Retrieval Augmented Generation (RAG) is widely employed to ground responses to queries on domain-specific documents. But do RAG implementations leave out important information or excessively include irrelevant information? To allay these concerns, it is necessary to annotate domain-specific benchmarks to evaluate information retrieval (IR) performance, as relevance definitions vary across queries…

    Submitted 20 June, 2024; originally announced June 2024.

  5. arXiv:2406.12419

    cs.CL

    AI-Assisted Human Evaluation of Machine Translation

    Authors: Vilém Zouhar, Tom Kocmi, Mrinmaya Sachan

    Abstract: Annually, research teams spend large amounts of money to evaluate the quality of machine translation systems (WMT, inter alia). This is expensive because it requires detailed human labor. The recently proposed annotation protocol, Error Span Annotation (ESA), has annotators marking erroneous parts of the translation. In our work, we help the annotators by pre-filling the span annotations with auto…

    Submitted 18 June, 2024; originally announced June 2024.

  6. arXiv:2406.11580

    cs.CL

    Error Span Annotation: A Balanced Approach for Human Evaluation of Machine Translation

    Authors: Tom Kocmi, Vilém Zouhar, Eleftherios Avramidis, Roman Grundkiewicz, Marzena Karpinska, Maja Popović, Mrinmaya Sachan, Mariya Shmatova

    Abstract: High-quality Machine Translation (MT) evaluation relies heavily on human judgments. Comprehensive error classification methods, such as Multidimensional Quality Metrics (MQM), are expensive as they are time-consuming and can only be done by experts, whose availability may be limited especially for low-resource languages. On the other hand, just assigning overall scores, like Direct Assessment (DA)…

    Submitted 17 June, 2024; originally announced June 2024.

  7. arXiv:2406.04216

    cs.CL cs.LG

    What Do Language Models Learn in Context? The Structured Task Hypothesis

    Authors: Jiaoda Li, Yifan Hou, Mrinmaya Sachan, Ryan Cotterell

    Abstract: Large language models (LLMs) exhibit an intriguing ability to learn a novel task from in-context examples presented in a demonstration, termed in-context learning (ICL). Understandably, a swath of research has been dedicated to uncovering the theories underpinning ICL. One popular hypothesis explains ICL by task selection. LLMs identify the task based on the demonstration and generalize it to the…

    Submitted 8 June, 2024; v1 submitted 6 June, 2024; originally announced June 2024.

    Comments: This work is published in ACL 2024

  8. arXiv:2406.02329

    cs.CL cs.LG

    On Affine Homotopy between Language Encoders

    Authors: Robin SM Chan, Reda Boumasmoud, Anej Svete, Yuxin Ren, Qipeng Guo, Zhijing Jin, Shauli Ravfogel, Mrinmaya Sachan, Bernhard Schölkopf, Mennatallah El-Assady, Ryan Cotterell

    Abstract: Pre-trained language encoders -- functions that represent text as vectors -- are an integral component of many NLP tasks. We tackle a natural question in language encoder analysis: What does it mean for two encoders to be similar? We contend that a faithful measure of similarity needs to be \emph{intrinsic}, that is, task-independent, yet still be informative of \emph{extrinsic} similarity -- the…

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: 10 pages

  9. arXiv:2405.20318

    cs.CL cs.AI cs.LG stat.ML

    CausalQuest: Collecting Natural Causal Questions for AI Agents

    Authors: Roberto Ceraolo, Dmitrii Kharlapenko, Amélie Reymond, Rada Mihalcea, Mrinmaya Sachan, Bernhard Schölkopf, Zhijing Jin

    Abstract: Humans have an innate drive to seek out causality. Whether fuelled by curiosity or specific goals, we constantly question why things happen, how they are interconnected, and many other related phenomena. To develop AI agents capable of addressing this natural human quest for causality, we urgently need a comprehensive dataset of natural causal questions. Unfortunately, existing datasets either con…

    Submitted 30 May, 2024; originally announced May 2024.

  10. arXiv:2405.14808

    cs.CL cs.AI cs.CY cs.HC cs.LG

    Implicit Personalization in Language Models: A Systematic Study

    Authors: Zhijing Jin, Nils Heil, Jiarui Liu, Shehzaad Dhuliawala, Yahang Qi, Bernhard Schölkopf, Rada Mihalcea, Mrinmaya Sachan

    Abstract: Implicit Personalization (IP) is a phenomenon of language models inferring a user's background from the implicit cues in the input prompts and tailoring the response based on this inference. While previous work has touched upon various instances of this problem, there lacks a unified framework to study this behavior. This work systematically studies IP through a rigorous mathematical formulation,…

    Submitted 23 May, 2024; originally announced May 2024.

  11. arXiv:2405.04515

    cs.CL

    A Transformer with Stack Attention

    Authors: Jiaoda Li, Jennifer C. White, Mrinmaya Sachan, Ryan Cotterell

    Abstract: Natural languages are believed to be (mildly) context-sensitive. Despite underpinning remarkably capable large language models, transformers are unable to model many context-free language tasks. In an attempt to address this limitation in the modeling power of transformer-based language models, we propose augmenting them with a differentiable, stack-based attention mechanism. Our stack-based atten…

    Submitted 13 May, 2024; v1 submitted 7 May, 2024; originally announced May 2024.

    Comments: NAACL 2024 Findings

  12. arXiv:2405.02318

    cs.CL cs.AI cs.LG cs.LO

    NL2FOL: Translating Natural Language to First-Order Logic for Logical Fallacy Detection

    Authors: Abhinav Lalwani, Lovish Chopra, Christopher Hahn, Caroline Trippel, Zhijing Jin, Mrinmaya Sachan

    Abstract: Logical fallacies are common errors in reasoning that undermine the logic of an argument. Automatically detecting logical fallacies has important applications in tracking misinformation and validating claims. In this paper, we design a process to reliably detect logical fallacies by translating natural language to First-order Logic (FOL) step-by-step using Large Language Models (LLMs). We then uti…

    Submitted 17 April, 2024; originally announced May 2024.

  13. arXiv:2404.16698

    cs.CL

    Cooperate or Collapse: Emergence of Sustainable Cooperation in a Society of LLM Agents

    Authors: Giorgio Piatti, Zhijing Jin, Max Kleiman-Weiner, Bernhard Schölkopf, Mrinmaya Sachan, Rada Mihalcea

    Abstract: As AI systems pervade human life, ensuring that large language models (LLMs) make safe decisions remains a significant challenge. We introduce the Governance of the Commons Simulation (GovSim), a generative simulation platform designed to study strategic interactions and cooperative decision-making in LLMs. In GovSim, a society of AI agents must collectively balance exploiting a common resource wi…

    Submitted 10 July, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

    Comments: Revised version

  14. arXiv:2404.11055

    cs.CL

    On the Causal Nature of Sentiment Analysis

    Authors: Zhiheng Lyu, Zhijing Jin, Fernando Gonzalez, Rada Mihalcea, Bernhard Schoelkopf, Mrinmaya Sachan

    Abstract: Sentiment analysis (SA) aims to identify the sentiment expressed in a text, such as a product review. Given a review and the sentiment associated with it, this paper formulates SA as a combination of two tasks: (1) a causal discovery task that distinguishes whether a review "primes" the sentiment (Causal Hypothesis C1), or the sentiment "primes" the review (Causal Hypothesis C2); and (2) the tradi…

    Submitted 17 April, 2024; originally announced April 2024.

    Comments: An enhanced version of our previous exploration in arXiv:2305.01764

  15. arXiv:2403.03307

    cs.CL

    Book2Dial: Generating Teacher-Student Interactions from Textbooks for Cost-Effective Development of Educational Chatbots

    Authors: Junling Wang, Jakub Macina, Nico Daheim, Sankalan Pal Chowdhury, Mrinmaya Sachan

    Abstract: Educational chatbots are a promising tool for assisting student learning. However, the development of effective chatbots in education has been challenging, as high-quality data is seldom available in this domain. In this paper, we propose a framework for generating synthetic teacher-student interactions grounded in a set of textbooks. Our approaches capture one aspect of learning interactions wher…

    Submitted 5 March, 2024; originally announced March 2024.

    Comments: 24 pages, 19 tables, 2 figures

  16. arXiv:2402.13904

    cs.CL

    Calibrating Large Language Models with Sample Consistency

    Authors: Qing Lyu, Kumar Shridhar, Chaitanya Malaviya, Li Zhang, Yanai Elazar, Niket Tandon, Marianna Apidianaki, Mrinmaya Sachan, Chris Callison-Burch

    Abstract: Accurately gauging the confidence level of Large Language Models' (LLMs) predictions is pivotal for their reliable application. However, LLMs are often uncalibrated inherently and elude conventional calibration techniques due to their proprietary nature and massive scale. In this work, we explore the potential of deriving confidence from the distribution of multiple randomly sampled model generati…

    Submitted 21 February, 2024; originally announced February 2024.

  17. arXiv:2402.11655

    cs.CL

    Competition of Mechanisms: Tracing How Language Models Handle Facts and Counterfactuals

    Authors: Francesco Ortu, Zhijing Jin, Diego Doimo, Mrinmaya Sachan, Alberto Cazzaniga, Bernhard Schölkopf

    Abstract: Interpretability research aims to bridge the gap between empirical success and our scientific understanding of the inner workings of large language models (LLMs). However, most existing research focuses on analyzing a single mechanism, such as how models copy or recall factual knowledge. In this work, we propose a formulation of competition of mechanisms, which focuses on the interplay of multiple…

    Submitted 6 June, 2024; v1 submitted 18 February, 2024; originally announced February 2024.

    Comments: ACL 2024

  18. arXiv:2402.11073

    cs.CL cs.AI

    AFaCTA: Assisting the Annotation of Factual Claim Detection with Reliable LLM Annotators

    Authors: Jingwei Ni, Minjing Shi, Dominik Stammbach, Mrinmaya Sachan, Elliott Ash, Markus Leippold

    Abstract: With the rise of generative AI, automated fact-checking methods to combat misinformation are becoming more and more important. However, factual claim detection, the first step in a fact-checking pipeline, suffers from two key issues that limit its scalability and generalizability: (1) inconsistency in definitions of the task and what a claim is, and (2) the high cost of manual annotation. To addre…

    Submitted 2 June, 2024; v1 submitted 16 February, 2024; originally announced February 2024.

    Comments: ACL2024 Main Conference

  19. arXiv:2402.09216

    cs.CL cs.HC

    AutoTutor meets Large Language Models: A Language Model Tutor with Rich Pedagogy and Guardrails

    Authors: Sankalan Pal Chowdhury, Vilém Zouhar, Mrinmaya Sachan

    Abstract: Large Language Models (LLMs) have found several use cases in education, ranging from automatic question generation to essay evaluation. In this paper, we explore the potential of using Large Language Models (LLMs) to author Intelligent Tutoring Systems. A common pitfall of LLMs is their straying from desired pedagogical strategies such as leaking the answer to the student, and in general, providin…

    Submitted 25 April, 2024; v1 submitted 14 February, 2024; originally announced February 2024.

    Comments: To be presented at Learning@Scale 2024

  20. arXiv:2401.18070

    cs.CL cs.AI cs.LG

    Do Language Models Exhibit the Same Cognitive Biases in Problem Solving as Human Learners?

    Authors: Andreas Opedal, Alessandro Stolfo, Haruki Shirakami, Ying Jiao, Ryan Cotterell, Bernhard Schölkopf, Abulhair Saparov, Mrinmaya Sachan

    Abstract: There is increasing interest in employing large language models (LLMs) as cognitive models. For such purposes, it is central to understand which properties of human cognition are well-modeled by LLMs, and which are not. In this work, we study the biases of LLMs in relation to those known in children when solving arithmetic word problems. Surveying the learning science literature, we posit that the…

    Submitted 17 June, 2024; v1 submitted 31 January, 2024; originally announced January 2024.

    Comments: Accepted at ICML 2024

  21. arXiv:2312.04350

    cs.CL cs.AI cs.LG

    CLadder: Assessing Causal Reasoning in Language Models

    Authors: Zhijing Jin, Yuen Chen, Felix Leeb, Luigi Gresele, Ojasv Kamal, Zhiheng Lyu, Kevin Blin, Fernando Gonzalez Adauto, Max Kleiman-Weiner, Mrinmaya Sachan, Bernhard Schölkopf

    Abstract: The ability to perform causal reasoning is widely considered a core feature of intelligence. In this work, we investigate whether large language models (LLMs) can coherently reason about causality. Much of the existing work in natural language processing (NLP) focuses on evaluating commonsense causal reasoning in LLMs, thus failing to assess whether a model can perform causal inference in accordan…

    Submitted 17 January, 2024; v1 submitted 7 December, 2023; originally announced December 2023.

    Comments: NeurIPS 2023; updated with CLadder dataset v1.5

  22. RELIC: Investigating Large Language Model Responses using Self-Consistency

    Authors: Furui Cheng, Vilém Zouhar, Simran Arora, Mrinmaya Sachan, Hendrik Strobelt, Mennatallah El-Assady

    Abstract: Large Language Models (LLMs) are notorious for blending fact with fiction and generating non-factual content, known as hallucinations. To address this challenge, we propose an interactive system that helps users gain insight into the reliability of the generated text. Our approach is based on the idea that the self-consistency of multiple samples generated by the same LLM relates to its confidence…

    Submitted 4 April, 2024; v1 submitted 28 November, 2023; originally announced November 2023.

  23. arXiv:2311.08605

    cs.CL cs.AI cs.CY cs.SI

    Exploring the Jungle of Bias: Political Bias Attribution in Language Models via Dependency Analysis

    Authors: David F. Jenny, Yann Billeter, Mrinmaya Sachan, Bernhard Schölkopf, Zhijing Jin

    Abstract: The rapid advancement of Large Language Models (LLMs) has sparked intense debate regarding the prevalence of bias in these models and its mitigation. Yet, as exemplified by both results on debiasing methods in the literature and reports of alignment-related defects from the wider community, bias remains a poorly understood topic despite its practical relevance. To enhance the understanding of the…

    Submitted 12 May, 2024; v1 submitted 14 November, 2023; originally announced November 2023.

  24. arXiv:2311.07961

    cs.CL

    The ART of LLM Refinement: Ask, Refine, and Trust

    Authors: Kumar Shridhar, Koustuv Sinha, Andrew Cohen, Tianlu Wang, Ping Yu, Ram Pasunuru, Mrinmaya Sachan, Jason Weston, Asli Celikyilmaz

    Abstract: In recent years, Large Language Models (LLMs) have demonstrated remarkable generative abilities, but can they judge the quality of their own generations? A popular concept, referred to as self-refinement, postulates that LLMs can detect and correct the errors in their generations when asked to do so. However, recent empirical evidence points in the opposite direction, suggesting that LLMs often st…

    Submitted 14 November, 2023; originally announced November 2023.

  25. arXiv:2311.02790

    cs.CL cs.AI cs.CY cs.IR cs.LG

    CausalCite: A Causal Formulation of Paper Citations

    Authors: Ishan Kumar, Zhijing Jin, Ehsan Mokhtarian, Siyuan Guo, Yuen Chen, Mrinmaya Sachan, Bernhard Schölkopf

    Abstract: Citation count of a paper is a commonly used proxy for evaluating the significance of a paper in the scientific community. Yet citation measures are widely criticized for failing to accurately reflect the true impact of a paper. Thus, we propose CausalCite, a new way to measure the significance of a paper by assessing the causal impact of the paper on its follow-up papers. CausalCite is based on a…

    Submitted 27 May, 2024; v1 submitted 5 November, 2023; originally announced November 2023.

    Comments: ACL 2024 Findings

  26. arXiv:2310.14491

    cs.CL

    Towards a Mechanistic Interpretation of Multi-Step Reasoning Capabilities of Language Models

    Authors: Yifan Hou, Jiaoda Li, Yu Fei, Alessandro Stolfo, Wangchunshu Zhou, Guangtao Zeng, Antoine Bosselut, Mrinmaya Sachan

    Abstract: Recent work has shown that language models (LMs) have strong multi-step (i.e., procedural) reasoning capabilities. However, it is unclear whether LMs perform these tasks by cheating with answers memorized from pretraining corpus, or, via a multi-step reasoning mechanism. In this paper, we try to answer this question by exploring a mechanistic interpretation of LMs for multi-step reasoning tasks. C…

    Submitted 22 October, 2023; originally announced October 2023.

    Comments: This work is published in EMNLP 2023

  27. arXiv:2310.13671

    cs.CL cs.AI

    Let's Synthesize Step by Step: Iterative Dataset Synthesis with Large Language Models by Extrapolating Errors from Small Models

    Authors: Ruida Wang, Wangchunshu Zhou, Mrinmaya Sachan

    Abstract: *Data Synthesis* is a promising way to train a small model with very little labeled data. One approach for data synthesis is to leverage the rich knowledge from large language models to synthesize pseudo training examples for small models, making it possible to achieve both data and compute efficiency at the same time. However, a key challenge in data synthesis is that the synthesized dataset ofte…

    Submitted 20 October, 2023; originally announced October 2023.

    Comments: Accepted by EMNLP 2023(Findings)

  28. arXiv:2310.13544

    cs.CL cs.HC

    A Diachronic Perspective on User Trust in AI under Uncertainty

    Authors: Shehzaad Dhuliawala, Vilém Zouhar, Mennatallah El-Assady, Mrinmaya Sachan

    Abstract: In a human-AI collaboration, users build a mental model of the AI system based on its reliability and how it presents its decision, e.g. its presentation of system confidence and an explanation of the output. Modern NLP systems are often uncalibrated, resulting in confidently incorrect predictions that undermine user trust. In order to build trustworthy AI, we must understand how user trust is dev…

    Submitted 20 October, 2023; originally announced October 2023.

    Comments: EMNLP 2023, 14 pages (8+6)

  29. arXiv:2309.07870

    cs.CL

    Agents: An Open-source Framework for Autonomous Language Agents

    Authors: Wangchunshu Zhou, Yuchen Eleanor Jiang, Long Li, Jialong Wu, Tiannan Wang, Shi Qiu, Jintian Zhang, Jing Chen, Ruipu Wu, Shuai Wang, Shiding Zhu, Jiyu Chen, Wentao Zhang, Xiangru Tang, Ningyu Zhang, Huajun Chen, Peng Cui, Mrinmaya Sachan

    Abstract: Recent advances on large language models (LLMs) enable researchers and developers to build autonomous language agents that can automatically solve various tasks and interact with environments, humans, and other agents using natural language interfaces. We consider language agents as a promising direction towards artificial general intelligence and release Agents, an open-source library with the go…

    Submitted 11 December, 2023; v1 submitted 14 September, 2023; originally announced September 2023.

    Comments: Code available at https://github.com/aiwaves-cn/agents

  30. arXiv:2306.16842

    cs.CL cs.IT

    Tokenization and the Noiseless Channel

    Authors: Vilém Zouhar, Clara Meister, Juan Luis Gastaldi, Li Du, Mrinmaya Sachan, Ryan Cotterell

    Abstract: Subword tokenization is a key part of many NLP pipelines. However, little is known about why some tokenizer and hyperparameter combinations lead to better downstream model performance than others. We propose that good tokenizers lead to \emph{efficient} channel usage, where the channel is the means by which some input is conveyed to the model and efficiency can be quantified in information-theoret…

    Submitted 29 June, 2023; originally announced June 2023.

    Comments: ACL 2023

  31. arXiv:2306.16837

    cs.CL math.OC

    A Formal Perspective on Byte-Pair Encoding

    Authors: Vilém Zouhar, Clara Meister, Juan Luis Gastaldi, Li Du, Tim Vieira, Mrinmaya Sachan, Ryan Cotterell

    Abstract: Byte-Pair Encoding (BPE) is a popular algorithm used for tokenizing data in NLP, despite being devised initially as a compression method. BPE appears to be a greedy algorithm at face value, but the underlying optimization problem that BPE seeks to solve has not yet been laid down. We formalize BPE as a combinatorial optimization problem. Via submodular functions, we prove that the iterative greedy…

    Submitted 29 June, 2023; originally announced June 2023.

    Comments: ACL 2023

  32. arXiv:2306.05836

    cs.CL cs.AI cs.LG

    Can Large Language Models Infer Causation from Correlation?

    Authors: Zhijing Jin, Jiarui Liu, Zhiheng Lyu, Spencer Poff, Mrinmaya Sachan, Rada Mihalcea, Mona Diab, Bernhard Schölkopf

    Abstract: Causal inference is one of the hallmarks of human intelligence. While the field of CausalNLP has attracted much interest in the recent years, existing causal inference datasets in NLP primarily rely on discovering causality from empirical knowledge (e.g., commonsense knowledge). In this work, we propose the first benchmark dataset to test the pure causal inference skills of large language models (…

    Submitted 17 April, 2024; v1 submitted 9 June, 2023; originally announced June 2023.

    Comments: ICLR 2024

  33. arXiv:2306.04347

    cs.CL

    World Models for Math Story Problems

    Authors: Andreas Opedal, Niklas Stoehr, Abulhair Saparov, Mrinmaya Sachan

    Abstract: Solving math story problems is a complex task for students and NLP models alike, requiring them to understand the world as described in the story and reason over it to compute an answer. Recent years have seen impressive performance on automatically solving these problems with large pre-trained language models and innovative techniques to prompt them. However, it remains unclear if these models po…

    Submitted 7 June, 2023; originally announced June 2023.

    Comments: ACL Findings 2023

  34. arXiv:2306.03175

    cs.AI cs.LG stat.ML

    Infusing Lattice Symmetry Priors in Attention Mechanisms for Sample-Efficient Abstract Geometric Reasoning

    Authors: Mattia Atzeni, Mrinmaya Sachan, Andreas Loukas

    Abstract: The Abstraction and Reasoning Corpus (ARC) (Chollet, 2019) and its most recent language-complete instantiation (LARC) has been postulated as an important step towards general AI. Yet, even state-of-the-art machine learning models struggle to achieve meaningful performance on these problems, falling behind non-learning based approaches. We argue that solving these tasks requires extreme generalizat…

    Submitted 5 June, 2023; originally announced June 2023.

    Comments: Accepted for publication at the International Conference on Machine Learning, ICML 2023

  35. arXiv:2306.02457

    cs.CL cs.AI

    Adaptive and Personalized Exercise Generation for Online Language Learning

    Authors: Peng Cui, Mrinmaya Sachan

    Abstract: Adaptive learning aims to provide customized educational activities (e.g., exercises) to address individual learning needs. However, manual construction and delivery of such activities is a laborious process. Thus, in this paper, we study a novel task of adaptive and personalized exercise generation for online language learning. To this end, we combine a knowledge tracing model that estimates each…

    Submitted 4 June, 2023; originally announced June 2023.

    Comments: To appear at ACL 2023

  36. arXiv:2305.18462

    cs.CL cs.CR cs.LG

    Membership Inference Attacks against Language Models via Neighbourhood Comparison

    Authors: Justus Mattern, Fatemehsadat Mireshghallah, Zhijing Jin, Bernhard Schölkopf, Mrinmaya Sachan, Taylor Berg-Kirkpatrick

    Abstract: Membership Inference attacks (MIAs) aim to predict whether a data sample was present in the training data of a machine learning model or not, and are widely used for assessing the privacy risks of language models. Most existing attacks rely on the observation that models tend to assign higher probabilities to their training samples than non-training points. However, simple thresholding of the mode…

    Submitted 7 August, 2023; v1 submitted 29 May, 2023; originally announced May 2023.

  37. arXiv:2305.15057

    cs.CL

    Linear-Time Modeling of Linguistic Structure: An Order-Theoretic Perspective

    Authors: Tianyu Liu, Afra Amini, Mrinmaya Sachan, Ryan Cotterell

    Abstract: Tasks that model the relation between pairs of tokens in a string are a vital part of understanding natural language. Such tasks, in general, require exhaustive pair-wise comparisons of tokens, thus having a quadratic runtime complexity in the length of the string. We show that these exhaustive comparisons can be avoided, and, moreover, the complexity of such tasks can be reduced to linear by cast…

    Submitted 12 December, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

    Comments: EMNLP 2023, 23 pages

  38. arXiv:2305.15054

    cs.CL cs.LG

    A Mechanistic Interpretation of Arithmetic Reasoning in Language Models using Causal Mediation Analysis

    Authors: Alessandro Stolfo, Yonatan Belinkov, Mrinmaya Sachan

    Abstract: Mathematical reasoning in large language models (LMs) has garnered significant attention in recent work, but there is a limited understanding of how these models process and store information related to arithmetic tasks within their architecture. In order to improve our understanding of this aspect of language models, we present a mechanistic interpretation of Transformer-based LMs on arithmetic q…

    Submitted 20 October, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

    Comments: EMNLP 2023. 18 pages, 19 figures

  39. arXiv:2305.14555

    cs.CL cs.AI cs.LG

    All Roads Lead to Rome? Exploring the Invariance of Transformers' Representations

    Authors: Yuxin Ren, Qipeng Guo, Zhijing Jin, Shauli Ravfogel, Mrinmaya Sachan, Bernhard Schölkopf, Ryan Cotterell

    Abstract: Transformer models bring propelling advances in various NLP tasks, thus inducing lots of interpretability research on the learned representations of the models. However, we raise a fundamental question regarding the reliability of the representations. Specifically, we investigate whether transformers learn essentially isomorphic representation spaces, or those that are sensitive to the random seed…

    Submitted 23 May, 2023; originally announced May 2023.

  40. arXiv:2305.14536

    cs.CL

    MathDial: A Dialogue Tutoring Dataset with Rich Pedagogical Properties Grounded in Math Reasoning Problems

    Authors: Jakub Macina, Nico Daheim, Sankalan Pal Chowdhury, Tanmay Sinha, Manu Kapur, Iryna Gurevych, Mrinmaya Sachan

    Abstract: While automatic dialogue tutors hold great potential in making education personalized and more accessible, research on such systems has been hampered by a lack of sufficiently large and high-quality datasets. Collecting such datasets remains challenging, as recording tutoring sessions raises privacy concerns and crowdsourcing leads to insufficient data quality. To address this, we propose a framew…

    Submitted 23 October, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

    Comments: Jakub Macina, Nico Daheim, and Sankalan Pal Chowdhury contributed equally to this work. Accepted at EMNLP2023 Findings. Code and dataset available: https://github.com/eth-nlped/mathdial

  41. arXiv:2305.14007

    cs.CL

    When Does Aggregating Multiple Skills with Multi-Task Learning Work? A Case Study in Financial NLP

    Authors: Jingwei Ni, Zhijing Jin, Qian Wang, Mrinmaya Sachan, Markus Leippold

    Abstract: Multi-task learning (MTL) aims at achieving a better model by leveraging data and knowledge from multiple tasks. However, MTL does not always work -- sometimes negative transfer occurs between tasks, especially when aggregating loosely related skills, leaving it an open question when MTL works. Previous studies show that MTL performance can be improved by algorithmic tricks. However, what tasks an…

    Submitted 23 May, 2023; originally announced May 2023.

    Comments: ACL 2023

  42. arXiv:2305.13304  [pdf, other]

    cs.CL cs.LG

    RecurrentGPT: Interactive Generation of (Arbitrarily) Long Text

    Authors: Wangchunshu Zhou, Yuchen Eleanor Jiang, Peng Cui, Tiannan Wang, Zhenxin Xiao, Yifan Hou, Ryan Cotterell, Mrinmaya Sachan

    Abstract: The fixed-size context of Transformer makes GPT models incapable of generating arbitrarily long text. In this paper, we introduce RecurrentGPT, a language-based simulacrum of the recurrence mechanism in RNNs. RecurrentGPT is built upon a large language model (LLM) such as ChatGPT and uses natural language to simulate the Long Short-Term Memory mechanism in an LSTM. At each timestep, RecurrentGPT g…

    Submitted 22 May, 2023; originally announced May 2023.

    Comments: Under review

  43. arXiv:2305.12152  [pdf, other]

    cs.CL

    Revisiting Automated Topic Model Evaluation with Large Language Models

    Authors: Dominik Stammbach, Vilém Zouhar, Alexander Hoyle, Mrinmaya Sachan, Elliott Ash

    Abstract: Topic models are used to make sense of large text collections. However, automatically evaluating topic model output and determining the optimal number of topics both have been longstanding challenges, with no effective automated solutions to date. This paper proposes using large language models to evaluate such output. We find that large language models appropriately assess the resulting topics, c…

    Submitted 22 October, 2023; v1 submitted 20 May, 2023; originally announced May 2023.

    Journal ref: Forthcoming in EMNLP 2023

  44. arXiv:2305.11170  [pdf, other]

    cs.CL cs.AI cs.LG

    Efficient Prompting via Dynamic In-Context Learning

    Authors: Wangchunshu Zhou, Yuchen Eleanor Jiang, Ryan Cotterell, Mrinmaya Sachan

    Abstract: The primary way of building AI applications is shifting from training specialist models to prompting generalist models. A common practice for prompting generalist models, often referred to as in-context learning, is to append a few examples (demonstrations) to the prompt to help the model better understand the task. While effective, in-context learning can be inefficient because it makes the input…

    Submitted 18 May, 2023; originally announced May 2023.

  45. arXiv:2305.11142  [pdf, other]

    cs.CL

    Discourse Centric Evaluation of Machine Translation with a Densely Annotated Parallel Corpus

    Authors: Yuchen Eleanor Jiang, Tianyu Liu, Shuming Ma, Dongdong Zhang, Mrinmaya Sachan, Ryan Cotterell

    Abstract: Several recent papers claim human parity at sentence-level Machine Translation (MT), especially in high-resource languages. Thus, in response, the MT community has, in part, shifted its focus to document-level translation. Translating documents requires a deeper understanding of the structure and meaning of text, which is often captured by various kinds of discourse phenomena such as consistency,…

    Submitted 18 May, 2023; originally announced May 2023.

    Comments: 9 pages. arXiv admin note: substantial text overlap with arXiv:2210.14667

    Journal ref: ACL 2023

  46. arXiv:2305.10406  [pdf, other]

    cs.LG cs.AI cs.CV

    Variational Classification

    Authors: Shehzaad Dhuliawala, Mrinmaya Sachan, Carl Allen

    Abstract: We present a latent variable model for classification that provides a novel probabilistic interpretation of neural network softmax classifiers. We derive a variational objective to train the model, analogous to the evidence lower bound (ELBO) used to train variational auto-encoders, that generalises the softmax cross-entropy loss. Treating inputs to the softmax layer as samples of a latent variabl…

    Submitted 9 January, 2024; v1 submitted 17 May, 2023; originally announced May 2023.

    Comments: Accepted to TMLR: https://openreview.net/forum?id=EWv9XGOpB3

  47. arXiv:2305.05471  [pdf, other]

    cs.CL

    Beyond Good Intentions: Reporting the Research Landscape of NLP for Social Good

    Authors: Fernando Gonzalez, Zhijing Jin, Bernhard Schölkopf, Tom Hope, Mrinmaya Sachan, Rada Mihalcea

    Abstract: With the recent advances in natural language processing (NLP), a vast number of applications have emerged across various use cases. Among the plethora of NLP applications, many academic researchers are motivated to do work that has a positive social impact, in line with the recent initiatives of NLP for Social Good (NLP4SG). However, it is not always obvious to researchers how their research effor…

    Submitted 21 October, 2023; v1 submitted 9 May, 2023; originally announced May 2023.

    Comments: EMNLP 2023 Findings

  48. arXiv:2305.01764  [pdf, other]

    cs.CL cs.AI cs.LG stat.ME

    Psychologically-Inspired Causal Prompts

    Authors: Zhiheng Lyu, Zhijing Jin, Justus Mattern, Rada Mihalcea, Mrinmaya Sachan, Bernhard Schoelkopf

    Abstract: NLP datasets are richer than just input-output pairs; rather, they carry causal relations between the input and output variables. In this work, we take sentiment classification as an example and look into the causal relations between the review (X) and sentiment (Y). As psychology studies show that language can affect emotion, different psychological processes are evoked when a person first makes…

    Submitted 2 May, 2023; originally announced May 2023.

  49. arXiv:2304.14293  [pdf, other]

    cs.CL cs.AI cs.LG

    Controlled Text Generation with Natural Language Instructions

    Authors: Wangchunshu Zhou, Yuchen Eleanor Jiang, Ethan Wilcox, Ryan Cotterell, Mrinmaya Sachan

    Abstract: Large language models generate fluent texts and can follow natural language instructions to solve a wide range of tasks without task-specific training. Nevertheless, it is notoriously difficult to control their generation to satisfy the various constraints required by different applications. In this work, we present InstructCTG, a controlled text generation framework that incorporates different co…

    Submitted 8 June, 2023; v1 submitted 27 April, 2023; originally announced April 2023.

    Comments: ICML 2023

  50. arXiv:2304.08931  [pdf, other]

    cs.CV cs.CL

    Enhancing Textbooks with Visuals from the Web for Improved Learning

    Authors: Janvijay Singh, Vilém Zouhar, Mrinmaya Sachan

    Abstract: Textbooks are one of the main mediums for delivering high-quality education to students. In particular, explanatory and illustrative visuals play a key role in retention, comprehension and general transfer of knowledge. However, many textbooks lack these interesting visuals to support student learning. In this paper, we investigate the effectiveness of vision-language models to automatically enhan…

    Submitted 20 October, 2023; v1 submitted 18 April, 2023; originally announced April 2023.

    Comments: EMNLP 2023; 14 pages (8+6)