
Showing 1–24 of 24 results for author: Saxon, M

Searching in archive cs.
  1. arXiv:2407.01863

    cs.CL

    VSP: Assessing the dual challenges of perception and reasoning in spatial planning tasks for VLMs

    Authors: Qiucheng Wu, Handong Zhao, Michael Saxon, Trung Bui, William Yang Wang, Yang Zhang, Shiyu Chang

    Abstract: Vision language models (VLMs) are an exciting emerging class of language models (LMs) that have merged classic LM capabilities with those of image processing systems. However, the ways that these capabilities combine are not always intuitive and warrant direct investigation. One understudied capability in VLMs is visual spatial planning -- the ability to comprehend the spatial arrangements of obje…

    Submitted 1 July, 2024; originally announced July 2024.

  2. arXiv:2406.16851

    cs.CL cs.AI cs.CV

    Losing Visual Needles in Image Haystacks: Vision Language Models are Easily Distracted in Short and Long Contexts

    Authors: Aditya Sharma, Michael Saxon, William Yang Wang

    Abstract: We present LoCoVQA, a dynamic benchmark generator for evaluating long-context extractive reasoning in vision language models (VLMs). LoCoVQA augments test examples for mathematical reasoning, VQA, and character recognition tasks with increasingly long visual contexts composed of both in-distribution and out-of-distribution distractor images. Across these tasks, a diverse set of VLMs rapidly lose…

    Submitted 2 July, 2024; v1 submitted 24 June, 2024; originally announced June 2024.

    Comments: Under review. Minor errata correction in revision

  3. arXiv:2406.08656

    cs.CV cs.AI cs.CL

    TC-Bench: Benchmarking Temporal Compositionality in Text-to-Video and Image-to-Video Generation

    Authors: Weixi Feng, Jiachen Li, Michael Saxon, Tsu-jui Fu, Wenhu Chen, William Yang Wang

    Abstract: Video generation has many unique challenges beyond those of image generation. The temporal dimension introduces extensive possible variations across frames, over which consistency and continuity may be violated. In this study, we move beyond evaluating simple actions and argue that generated videos should incorporate the emergence of new concepts and their relation transitions like in real-world v…

    Submitted 12 June, 2024; originally announced June 2024.

  4. arXiv:2404.04251

    cs.CV cs.AI cs.CL

    Who Evaluates the Evaluations? Objectively Scoring Text-to-Image Prompt Coherence Metrics with T2IScoreScore (TS2)

    Authors: Michael Saxon, Fatima Jahara, Mahsa Khoshnoodi, Yujie Lu, Aditya Sharma, William Yang Wang

    Abstract: With advances in the quality of text-to-image (T2I) models has come interest in benchmarking their prompt faithfulness, the semantic coherence of generated images to the prompts they were conditioned on. A variety of T2I faithfulness metrics have been proposed, leveraging advances in cross-modal embeddings and vision-language models (VLMs). However, these metrics are not rigorously compared and ben…

    Submitted 22 May, 2024; v1 submitted 5 April, 2024; originally announced April 2024.

    Comments: 10 pages main, 12 pages appendices, 13 figures, 3 tables

  5. arXiv:2403.11092

    cs.CL cs.AI cs.CV cs.CY eess.IV

    Lost in Translation? Translation Errors and Challenges for Fair Assessment of Text-to-Image Models on Multilingual Concepts

    Authors: Michael Saxon, Yiran Luo, Sharon Levy, Chitta Baral, Yezhou Yang, William Yang Wang

    Abstract: Benchmarks of the multilingual capabilities of text-to-image (T2I) models compare generated images prompted in a test language to an expected image distribution over a concept set. One such benchmark, "Conceptual Coverage Across Languages" (CoCo-CroLa), assesses the tangible noun inventory of T2I models by prompting them to generate pictures from a concept list translated to seven languages and co…

    Submitted 17 March, 2024; originally announced March 2024.

    Comments: NAACL 2024 Main Conference

  6. arXiv:2308.03188

    cs.CL cs.AI cs.LG

    Automatically Correcting Large Language Models: Surveying the landscape of diverse self-correction strategies

    Authors: Liangming Pan, Michael Saxon, Wenda Xu, Deepak Nathani, Xinyi Wang, William Yang Wang

    Abstract: Large language models (LLMs) have demonstrated remarkable performance across a wide array of NLP tasks. However, their efficacy is undermined by undesired and inconsistent behaviors, including hallucination, unfaithful reasoning, and toxic content. A promising approach to rectify these flaws is self-correction, where the LLM itself is prompted or guided to fix problems in its own output. Technique…

    Submitted 29 August, 2023; v1 submitted 6 August, 2023; originally announced August 2023.

    Comments: Work in Progress. Version 2

  7. arXiv:2306.01735

    cs.CL cs.AI cs.CV eess.IV

    Multilingual Conceptual Coverage in Text-to-Image Models

    Authors: Michael Saxon, William Yang Wang

    Abstract: We propose "Conceptual Coverage Across Languages" (CoCo-CroLa), a technique for benchmarking the degree to which any generative text-to-image system provides multilingual parity to its training language in terms of tangible nouns. For each model we can assess "conceptual coverage" of a given target language relative to a source language by comparing the population of images generated for a series…

    Submitted 2 June, 2023; originally announced June 2023.

    Comments: ACL 2023 main conference; 16 pages, 13 figures

  8. arXiv:2305.13903

    cs.CL cs.CV

    Let's Think Frame by Frame with VIP: A Video Infilling and Prediction Dataset for Evaluating Video Chain-of-Thought

    Authors: Vaishnavi Himakunthala, Andy Ouyang, Daniel Rose, Ryan He, Alex Mei, Yujie Lu, Chinmay Sonar, Michael Saxon, William Yang Wang

    Abstract: Despite exciting recent results showing vision-language systems' capacity to reason about images using natural language, their capacity for video reasoning remains under-explored. We motivate framing video reasoning as the sequential understanding of a small number of keyframes, thereby leveraging the power and robustness of vision-language while alleviating the computational complexities of proce…

    Submitted 9 November, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

    Comments: Accepted to the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP 2023)

  9. arXiv:2305.10684

    eess.AS cs.SD

    Data Augmentation for Diverse Voice Conversion in Noisy Environments

    Authors: Avani Tanna, Michael Saxon, Amr El Abbadi, William Yang Wang

    Abstract: Voice conversion (VC) models have demonstrated impressive few-shot conversion quality on the clean, native speech populations they're trained on. However, when source or target speech accents, background noise conditions, or microphone characteristics differ from training, quality voice conversion is not guaranteed. These problems are often left unexamined in VC research, giving rise to frustratio…

    Submitted 17 May, 2023; originally announced May 2023.

    Comments: Interspeech 2023 Show and Tell, 2 pp

  10. arXiv:2305.02317

    cs.CL cs.CV

    Visual Chain of Thought: Bridging Logical Gaps with Multimodal Infillings

    Authors: Daniel Rose, Vaishnavi Himakunthala, Andy Ouyang, Ryan He, Alex Mei, Yujie Lu, Michael Saxon, Chinmay Sonar, Diba Mirza, William Yang Wang

    Abstract: Recent advances in large language models elicit reasoning in a chain-of-thought that allows models to decompose problems in a human-like fashion. Though this paradigm improves multi-step reasoning ability in language models, it is limited by being unimodal and applied mainly to question-answering tasks. We claim that incorporating visual augmentation into reasoning is essential, especially for com…

    Submitted 22 January, 2024; v1 submitted 3 May, 2023; originally announced May 2023.

  11. arXiv:2303.05500

    cs.CY cs.AI cs.HC

    Users are the North Star for AI Transparency

    Authors: Alex Mei, Michael Saxon, Shiyu Chang, Zachary C. Lipton, William Yang Wang

    Abstract: Despite widespread calls for transparent artificial intelligence systems, the term is too overburdened with disparate meanings to express precise policy aims or to orient concrete lines of research. Consequently, stakeholders often talk past each other, with policymakers expressing vague demands and practitioners devising solutions that may not address the underlying concerns. Part of why this hap…

    Submitted 9 March, 2023; originally announced March 2023.

    Comments: 9 pages, 3 tables

  12. arXiv:2301.11916

    cs.CL cs.AI cs.LG

    Large Language Models Are Latent Variable Models: Explaining and Finding Good Demonstrations for In-Context Learning

    Authors: Xinyi Wang, Wanrong Zhu, Michael Saxon, Mark Steyvers, William Yang Wang

    Abstract: In recent years, pre-trained large language models (LLMs) have demonstrated remarkable efficiency in achieving an inference-time few-shot learning capability known as in-context learning. However, existing literature has highlighted the sensitivity of this capability to the selection of few-shot demonstrations. Current understandings of the underlying mechanisms by which this capability arises fro…

    Submitted 12 February, 2024; v1 submitted 27 January, 2023; originally announced January 2023.

    Comments: Accepted to NeurIPS 2023. Code at https://github.com/WANGXinyiLinda/concept-based-demonstration-selection

  13. arXiv:2212.10515

    cs.CL

    CausalDialogue: Modeling Utterance-level Causality in Conversations

    Authors: Yi-Lin Tuan, Alon Albalak, Wenda Xu, Michael Saxon, Connor Pryor, Lise Getoor, William Yang Wang

    Abstract: Despite their widespread adoption, neural conversation models have yet to exhibit natural chat capabilities with humans. In this research, we examine user utterances as causes and generated responses as effects, recognizing that changes in a cause should produce a different effect. To further explore this concept, we have compiled and expanded upon a new dataset called CausalDialogue through crowd…

    Submitted 8 July, 2023; v1 submitted 20 December, 2022; originally announced December 2022.

    Comments: Accepted to ACL-Findings 2023

  14. arXiv:2210.12152

    cs.CL cs.AI

    WikiWhy: Answering and Explaining Cause-and-Effect Questions

    Authors: Matthew Ho, Aditya Sharma, Justin Chang, Michael Saxon, Sharon Levy, Yujie Lu, William Yang Wang

    Abstract: As large language models (LLMs) grow larger and more sophisticated, assessing their "reasoning" capabilities in natural language grows more challenging. Recent question answering (QA) benchmarks that attempt to assess reasoning are often limited by a narrow scope of covered situations and subject matters. We introduce WikiWhy, a QA dataset built around a novel auxiliary task: explaining why an ans…

    Submitted 30 November, 2022; v1 submitted 21 October, 2022; originally announced October 2022.

  15. arXiv:2210.05035

    cs.CL cs.AI

    Not All Errors are Equal: Learning Text Generation Metrics using Stratified Error Synthesis

    Authors: Wenda Xu, Yilin Tuan, Yujie Lu, Michael Saxon, Lei Li, William Yang Wang

    Abstract: Is it possible to build a general and automatic natural language generation (NLG) evaluation metric? Existing learned metrics either perform unsatisfactorily or are restricted to tasks where large human rating data is already available. We introduce SESCORE, a model-based metric that is highly correlated with human judgements without requiring human annotation, by utilizing a novel, iterative erro…

    Submitted 25 October, 2022; v1 submitted 10 October, 2022; originally announced October 2022.

    Comments: EMNLP 2022

  16. arXiv:2206.05263

    cs.LG cs.AI cs.CV

    Causal Balancing for Domain Generalization

    Authors: Xinyi Wang, Michael Saxon, Jiachen Li, Hongyang Zhang, Kun Zhang, William Yang Wang

    Abstract: While machine learning models rapidly advance the state-of-the-art on various real-world tasks, out-of-domain (OOD) generalization remains a challenging problem given the vulnerability of these models to spurious correlations. We propose a balanced mini-batch sampling strategy to transform a biased data distribution into a spurious-free balanced distribution, based on the invariance of the underly…

    Submitted 19 February, 2023; v1 submitted 10 June, 2022; originally announced June 2022.

    Comments: Published at ICLR 2023

  17. arXiv:2112.09237

    cs.CL

    PECO: Examining Single Sentence Label Leakage in Natural Language Inference Datasets through Progressive Evaluation of Cluster Outliers

    Authors: Michael Saxon, Xinyi Wang, Wenda Xu, William Yang Wang

    Abstract: Building natural language inference (NLI) benchmarks that are both challenging for modern techniques and free from shortcut biases is difficult. Chief among these biases is "single sentence label leakage," where annotator-introduced spurious correlations yield datasets where the logical relation between (premise, hypothesis) pairs can be accurately predicted from only a single sentence, something…

    Submitted 11 February, 2023; v1 submitted 16 December, 2021; originally announced December 2021.

    Comments: EACL 2023; 14 pages, 8 figures, 5 tables

  18. arXiv:2110.02950

    cs.CL cs.CY cs.LG

    Self-Supervised Knowledge Assimilation for Expert-Layman Text Style Transfer

    Authors: Wenda Xu, Michael Saxon, Misha Sra, William Yang Wang

    Abstract: Expert-layman text style transfer technologies have the potential to improve communication between members of scientific communities and the general public. High-quality information produced by experts is often filled with difficult jargon laypeople struggle to understand. This is a particularly notable issue in the medical domain, where laypeople are often confused by medical text online. At present…

    Submitted 18 December, 2021; v1 submitted 6 October, 2021; originally announced October 2021.

    Comments: 12 pages, 8 tables, 3 figures. AAAI 2022 Conference Paper

  19. arXiv:2106.09009

    cs.CL cs.AI cs.SD eess.AS

    End-to-End Spoken Language Understanding for Generalized Voice Assistants

    Authors: Michael Saxon, Samridhi Choudhary, Joseph P. McKenna, Athanasios Mouchtaris

    Abstract: End-to-end (E2E) spoken language understanding (SLU) systems predict utterance semantics directly from speech using a single model. Previous work in this area has focused on targeted tasks in fixed domains, where the output semantic structure is assumed a priori and the input speech is of limited complexity. In this work we present our approach to developing an E2E model for generalized SLU in com…

    Submitted 19 July, 2021; v1 submitted 16 June, 2021; originally announced June 2021.

    Comments: Accepted to Interspeech 2021; 5 pages, 2 tables, 1 figure

    Journal ref: Proc. Interspeech 2021, 4738-4742

  20. arXiv:2106.03831

    cs.LG cs.CL stat.ML

    Counterfactual Maximum Likelihood Estimation for Training Deep Networks

    Authors: Xinyi Wang, Wenhu Chen, Michael Saxon, William Yang Wang

    Abstract: Although deep learning models have driven state-of-the-art performance on a wide array of tasks, they are prone to spurious correlations that should not be learned as predictive clues. To mitigate this problem, we propose a causality-based training framework to reduce the spurious correlations caused by observed confounders. We give theoretical analysis on the underlying general Structural Causal…

    Submitted 26 October, 2021; v1 submitted 7 June, 2021; originally announced June 2021.

    Comments: 10 pages, 2 figures, accepted to NeurIPS 2021

  21. Modeling Disclosive Transparency in NLP Application Descriptions

    Authors: Michael Saxon, Sharon Levy, Xinyi Wang, Alon Albalak, William Yang Wang

    Abstract: Broader disclosive transparency (truth and clarity in communication regarding the function of AI systems) is widely considered desirable. Unfortunately, it is a nebulous concept, difficult to both define and quantify. This is problematic, as previous work has demonstrated possible trade-offs and negative consequences to disclosive transparency, such as a confusion effect, where "too much informa…

    Submitted 10 September, 2021; v1 submitted 2 January, 2021; originally announced January 2021.

    Comments: To appear at EMNLP 2021. 15 pages, 10 figures, 7 tables

    Journal ref: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pp 2023-2037

  22. arXiv:2101.00379

    cs.CL cs.CY

    Investigating Memorization of Conspiracy Theories in Text Generation

    Authors: Sharon Levy, Michael Saxon, William Yang Wang

    Abstract: The adoption of natural language generation (NLG) models can leave individuals vulnerable to the generation of harmful information memorized by the models, such as conspiracy theories. While previous studies examine conspiracy theories in the context of social media, they have not evaluated their presence in the new space of generative language models. In this work, we investigate the capability o…

    Submitted 8 June, 2021; v1 submitted 2 January, 2021; originally announced January 2021.

    Comments: ACL 2021 Findings

  23. arXiv:2008.02858

    cs.CL cs.SD eess.AS

    Semantic Complexity in End-to-End Spoken Language Understanding

    Authors: Joseph P. McKenna, Samridhi Choudhary, Michael Saxon, Grant P. Strimel, Athanasios Mouchtaris

    Abstract: End-to-end spoken language understanding (SLU) models are a class of model architectures that predict semantics directly from speech. Because of their input and output types, we refer to them as speech-to-interpretation (STI) models. Previous works have successfully applied STI models to targeted use cases, such as recognizing home automation commands; however, no study has yet addressed how these…

    Submitted 6 August, 2020; originally announced August 2020.

    Comments: Accepted at Interspeech 2020

  24. arXiv:1911.11360

    eess.AS cs.SD eess.SP

    Robust Estimation of Hypernasality in Dysarthria with Acoustic Model Likelihood Features

    Authors: Michael Saxon, Ayush Tripathi, Yishan Jiao, Julie Liss, Visar Berisha

    Abstract: Hypernasality is a common characteristic symptom across many motor-speech disorders. For voiced sounds, hypernasality introduces an additional resonance in the lower frequencies and, for unvoiced sounds, there is reduced articulatory precision due to air escaping through the nasal cavity. However, the acoustic manifestation of these symptoms is highly variable, making hypernasality estimation very…

    Submitted 5 August, 2020; v1 submitted 26 November, 2019; originally announced November 2019.

    Comments: 12 pages, 9 figures, 2 tables

    Journal ref: IEEE/ACM Trans. on Audio, Speech, and Language Proc. 28 (2020) 2511-2522