Showing 1–50 of 99 results for author: Black, A

Searching in archive cs.
  1. arXiv:2407.14609  [pdf, other]

    cs.CL

    Adversarial Databases Improve Success in Retrieval-based Large Language Models

    Authors: Sean Wu, Michael Koo, Li Yo Kao, Andy Black, Lesley Blum, Fabien Scalzo, Ira Kurtz

    Abstract: Open-source LLMs have shown great potential as fine-tuned chatbots, and demonstrate robust abilities in reasoning and surpass many existing benchmarks. Retrieval-Augmented Generation (RAG) is a technique for improving the performance of LLMs on tasks that the models weren't explicitly trained on, by leveraging external knowledge databases. Numerous studies have demonstrated the effectiveness of RA…

    Submitted 19 July, 2024; originally announced July 2024.

    Comments: 24 pages, 3 figures, 11 tables

  2. arXiv:2402.19119  [pdf, other]

    cs.CV cs.CL

    VIXEN: Visual Text Comparison Network for Image Difference Captioning

    Authors: Alexander Black, Jing Shi, Yifei Fan, Tu Bui, John Collomosse

    Abstract: We present VIXEN - a technique that succinctly summarizes in text the visual differences between a pair of images in order to highlight any content manipulation present. Our proposed network linearly maps image features in a pairwise manner, constructing a soft prompt for a pretrained large language model. We address the challenge of low volume of training data and lack of manipulation variety in…

    Submitted 14 March, 2024; v1 submitted 29 February, 2024; originally announced February 2024.

    Comments: AAAI 2024

  3. arXiv:2310.12544  [pdf, other]

    stat.ML cs.LG

    Neural Likelihood Approximation for Integer Valued Time Series Data

    Authors: Luke O'Loughlin, John Maclean, Andrew Black

    Abstract: Stochastic processes defined on integer valued state spaces are popular within the physical and biological sciences. These models are necessary for capturing the dynamics of small systems where the individual nature of the populations cannot be ignored and stochastic effects are important. The inference of the parameters of such models, from time series data, is challenging due to intractability o…

    Submitted 12 April, 2024; v1 submitted 19 October, 2023; originally announced October 2023.

  4. arXiv:2310.10803  [pdf, other]

    cs.CL eess.AS

    SD-HuBERT: Sentence-Level Self-Distillation Induces Syllabic Organization in HuBERT

    Authors: Cheol Jun Cho, Abdelrahman Mohamed, Shang-Wen Li, Alan W Black, Gopala K. Anumanchipalli

    Abstract: Data-driven unit discovery in self-supervised learning (SSL) of speech has embarked on a new era of spoken language processing. Yet, the discovered units often remain in phonetic space and the units beyond phonemes are largely underexplored. Here, we demonstrate that a syllabic organization emerges in learning sentence-level representation of speech. In particular, we adopt "self-distillation" obj…

    Submitted 16 January, 2024; v1 submitted 16 October, 2023; originally announced October 2023.

  5. arXiv:2310.10788  [pdf, other]

    eess.AS cs.CL

    Self-Supervised Models of Speech Infer Universal Articulatory Kinematics

    Authors: Cheol Jun Cho, Abdelrahman Mohamed, Alan W Black, Gopala K. Anumanchipalli

    Abstract: Self-Supervised Learning (SSL) based models of speech have shown remarkable performance on a range of downstream tasks. These state-of-the-art models have remained black boxes, but many recent studies have begun "probing" models like HuBERT, to correlate their internal representations to different aspects of speech. In this paper, we show "inference of articulatory kinematics" as fundamental proper…

    Submitted 16 January, 2024; v1 submitted 16 October, 2023; originally announced October 2023.

  6. arXiv:2309.14400  [pdf, other]

    cs.CR cs.LG eess.IV

    DECORAIT -- DECentralized Opt-in/out Registry for AI Training

    Authors: Kar Balan, Alex Black, Simon Jenni, Andrew Gilbert, Andy Parsons, John Collomosse

    Abstract: We present DECORAIT, a decentralized registry through which content creators may assert their right to opt in or out of AI training as well as receive reward for their contributions. Generative AI (GenAI) enables images to be synthesized using AI models trained on vast amounts of data scraped from public sources. Model and content creators who may wish to share their work openly without sanctionin…

    Submitted 25 September, 2023; originally announced September 2023.

    Comments: Proc. of the 20th ACM SIGGRAPH European Conference on Visual Media Production

  7. arXiv:2308.04709  [pdf, other]

    cs.CL

    A Comparative Study of Open-Source Large Language Models, GPT-4 and Claude 2: Multiple-Choice Test Taking in Nephrology

    Authors: Sean Wu, Michael Koo, Lesley Blum, Andy Black, Liyo Kao, Fabien Scalzo, Ira Kurtz

    Abstract: In recent years, there have been significant breakthroughs in the field of natural language processing, particularly with the development of large language models (LLMs). These LLMs have showcased remarkable capabilities on various benchmarks. In the healthcare field, the exact role LLMs and other future AI models will play remains unclear. There is a potential for these models in the future to be…

    Submitted 9 August, 2023; originally announced August 2023.

    Comments: 7 pages, 3 figures, 1 table

  8. arXiv:2304.05755  [pdf, other]

    cs.CV

    ALADIN-NST: Self-supervised disentangled representation learning of artistic style through Neural Style Transfer

    Authors: Dan Ruta, Gemma Canet Tarres, Alexander Black, Andrew Gilbert, John Collomosse

    Abstract: Representation learning aims to discover individual salient features of a domain in a compact and descriptive form that strongly identifies the unique characteristics of a given sample respective to its domain. Existing works in visual style representation literature have tried to disentangle style from content during training explicitly. A complete separation between these has yet to be fully ach…

    Submitted 17 August, 2023; v1 submitted 12 April, 2023; originally announced April 2023.

  9. arXiv:2303.13193  [pdf, other]

    cs.CV

    VADER: Video Alignment Differencing and Retrieval

    Authors: Alexander Black, Simon Jenni, Tu Bui, Md. Mehrab Tanjim, Stefano Petrangeli, Ritwik Sinha, Viswanathan Swaminathan, John Collomosse

    Abstract: We propose VADER, a spatio-temporal matching, alignment, and change summarization method to help fight misinformation spread via manipulated videos. VADER matches and coarsely aligns partial video fragments to candidate videos using a robust visual descriptor and scalable search over adaptively chunked video content. A transformer-based alignment module then refines the temporal localization of th…

    Submitted 25 March, 2023; v1 submitted 23 March, 2023; originally announced March 2023.

  10. arXiv:2302.07702  [pdf, other]

    cs.CV

    Audio-Visual Contrastive Learning with Temporal Self-Supervision

    Authors: Simon Jenni, Alexander Black, John Collomosse

    Abstract: We propose a self-supervised learning approach for videos that learns representations of both the RGB frames and the accompanying audio without human supervision. In contrast to images that capture the static scene appearance, videos also contain sound and temporal scene dynamics. To leverage the temporal and aural dimension inherent to videos, our method extends temporal self-supervision to the a…

    Submitted 15 February, 2023; originally announced February 2023.

    Comments: AAAI-23

  11. arXiv:2302.06774  [pdf, other]

    eess.AS cs.SD

    Speaker-Independent Acoustic-to-Articulatory Speech Inversion

    Authors: Peter Wu, Li-Wei Chen, Cheol Jun Cho, Shinji Watanabe, Louis Goldstein, Alan W Black, Gopala K. Anumanchipalli

    Abstract: To build speech processing methods that can handle speech as naturally as humans, researchers have explored multiple ways of building an invertible mapping from speech to an interpretable space. The articulatory space is a promising inversion target, since this space captures the mechanics of speech production. To this end, we build an acoustic-to-articulatory inversion (AAI) model that leverages…

    Submitted 24 July, 2023; v1 submitted 13 February, 2023; originally announced February 2023.

  12. arXiv:2210.16498  [pdf, other]

    eess.AS cs.SD

    Articulatory Representation Learning Via Joint Factor Analysis and Neural Matrix Factorization

    Authors: Jiachen Lian, Alan W Black, Yijing Lu, Louis Goldstein, Shinji Watanabe, Gopala K. Anumanchipalli

    Abstract: Articulatory representation learning is the fundamental research in modeling neural speech production system. Our previous work has established a deep paradigm to decompose the articulatory kinematics data into gestures, which explicitly model the phonological and linguistic structure encoded with human speech production mechanism, and corresponding gestural scores. We continue with this line of w…

    Submitted 20 February, 2023; v1 submitted 29 October, 2022; originally announced October 2022.

    Comments: Accepted to 2023 ICASSP. Camera Ready

  13. arXiv:2210.15734  [pdf, other]

    cs.CL cs.SD eess.AS

    Token-level Sequence Labeling for Spoken Language Understanding using Compositional End-to-End Models

    Authors: Siddhant Arora, Siddharth Dalmia, Brian Yan, Florian Metze, Alan W Black, Shinji Watanabe

    Abstract: End-to-end spoken language understanding (SLU) systems are gaining popularity over cascaded approaches due to their simplicity and ability to avoid error propagation. However, these systems model sequence labeling as a sequence prediction task causing a divergence from its well-established token-level tagging formulation. We build compositional end-to-end SLU systems that explicitly separate the a…

    Submitted 27 October, 2022; originally announced October 2022.

    Comments: Accepted at EMNLP 2022 Findings. Our code and models will be publicly available as part of the ESPnet-SLU toolkit: https://github.com/espnet/espnet and the release can be followed here: https://github.com/espnet/espnet/pull/4735

  14. arXiv:2210.15272  [pdf, ps, other]

    eess.AS cs.SD eess.SP

    A Fast and Accurate Pitch Estimation Algorithm Based on the Pseudo Wigner-Ville Distribution

    Authors: Yisi Liu, Peter Wu, Alan W Black, Gopala K. Anumanchipalli

    Abstract: Estimation of fundamental frequency (F0) in voiced segments of speech signals, also known as pitch tracking, plays a crucial role in pitch synchronous speech analysis, speech synthesis, and speech manipulation. In this paper, we capitalize on the high time and frequency resolution of the pseudo Wigner-Ville distribution (PWVD) and propose a new PWVD-based pitch estimation method. We devise an effi…

    Submitted 27 October, 2022; originally announced October 2022.

  15. arXiv:2210.05200  [pdf, other]

    cs.CL cs.SD eess.AS

    CTC Alignments Improve Autoregressive Translation

    Authors: Brian Yan, Siddharth Dalmia, Yosuke Higuchi, Graham Neubig, Florian Metze, Alan W Black, Shinji Watanabe

    Abstract: Connectionist Temporal Classification (CTC) is a widely used approach for automatic speech recognition (ASR) that performs conditionally independent monotonic alignment. However, for translation, CTC exhibits clear limitations due to the contextual and non-monotonic nature of the task and thus lags behind attentional decoder approaches in terms of translation quality. In this work, we argue that CT…

    Submitted 11 October, 2022; originally announced October 2022.

  16. arXiv:2209.06337  [pdf, other]

    eess.AS cs.SD q-bio.QM

    Deep Speech Synthesis from Articulatory Representations

    Authors: Peter Wu, Shinji Watanabe, Louis Goldstein, Alan W Black, Gopala K. Anumanchipalli

    Abstract: In the articulatory synthesis task, speech is synthesized from input features containing information about the physical behavior of the human vocal tract. This task provides a promising direction for speech synthesis research, as the articulatory space is compact, smooth, and interpretable. Current works have highlighted the potential for deep learning models to perform articulatory synthesis. How…

    Submitted 13 September, 2022; originally announced September 2022.

  17. arXiv:2209.02842  [pdf, other]

    cs.CL

    ASR2K: Speech Recognition for Around 2000 Languages without Audio

    Authors: Xinjian Li, Florian Metze, David R Mortensen, Alan W Black, Shinji Watanabe

    Abstract: Most recent speech recognition models rely on large supervised datasets, which are unavailable for many low-resource languages. In this work, we present a speech recognition pipeline that does not require any audio for the target language. The only assumption is that we have access to raw text datasets or a set of n-gram statistics. Our speech pipeline consists of three components: acoustic, pronu…

    Submitted 6 September, 2022; originally announced September 2022.

    Comments: INTERSPEECH 2022

  18. arXiv:2207.06670  [pdf, other]

    cs.CL cs.SD eess.AS

    Two-Pass Low Latency End-to-End Spoken Language Understanding

    Authors: Siddhant Arora, Siddharth Dalmia, Xuankai Chang, Brian Yan, Alan Black, Shinji Watanabe

    Abstract: End-to-end (E2E) models are becoming increasingly popular for spoken language understanding (SLU) systems and are beginning to achieve performance competitive with pipeline-based approaches. However, recent work has shown that these models struggle to generalize to new phrasings for the same intent, indicating that models cannot understand the semantic content of the given utterance. In this work, we…

    Submitted 29 July, 2022; v1 submitted 14 July, 2022; originally announced July 2022.

    Comments: INTERSPEECH 2022

  19. arXiv:2207.00688  [pdf, other]

    cs.CL cs.SD eess.AS

    Building African Voices

    Authors: Perez Ogayo, Graham Neubig, Alan W Black

    Abstract: Modern speech synthesis techniques can produce natural-sounding speech given sufficient high-quality data and compute resources. However, such data is not readily available for many languages. This paper focuses on speech synthesis for low-resourced African languages, from corpus creation to sharing and deploying the Text-to-Speech (TTS) systems. We first create a set of general-purpose instructio…

    Submitted 1 July, 2022; originally announced July 2022.

  20. arXiv:2206.14245  [pdf, other]

    cs.CV

    SImProv: Scalable Image Provenance Framework for Robust Content Attribution

    Authors: Alexander Black, Tu Bui, Simon Jenni, Zhifei Zhang, Viswanathan Swaminanthan, John Collomosse

    Abstract: We present SImProv - a scalable image provenance framework to match a query image back to a trusted database of originals and identify possible manipulations on the query. SImProv consists of three stages: a scalable search stage for retrieving top-k most similar images; a re-ranking and near-duplicate detection stage for identifying the original among the candidates; and finally a manipulation d…

    Submitted 8 May, 2023; v1 submitted 28 June, 2022; originally announced June 2022.

    Comments: Under consideration at Computer Vision and Image Understanding

  21. arXiv:2205.11686  [pdf, other]

    cs.CL cs.CV

    On Advances in Text Generation from Images Beyond Captioning: A Case Study in Self-Rationalization

    Authors: Shruti Palaskar, Akshita Bhagia, Yonatan Bisk, Florian Metze, Alan W Black, Ana Marasović

    Abstract: Combining the visual modality with pretrained language models has been surprisingly effective for simple descriptive tasks such as image captioning. More general text generation however remains elusive. We take a step back and ask: How do these models work for more complex generative tasks, i.e. conditioning on both text and images? Are multimodal models simply visually adapted language models, or…

    Submitted 22 October, 2022; v1 submitted 23 May, 2022; originally announced May 2022.

    Comments: v2: EMNLP Findings 2022 accepted paper camera-ready version. 9 pages main, 2 pages appendix

  22. arXiv:2204.00465  [pdf, other]

    eess.AS cs.AI eess.SP

    Deep Neural Convolutive Matrix Factorization for Articulatory Representation Decomposition

    Authors: Jiachen Lian, Alan W Black, Louis Goldstein, Gopala Krishna Anumanchipalli

    Abstract: Most of the research on data-driven speech representation learning has focused on raw audio in an end-to-end manner, paying little attention to its internal phonological or gestural structure. This work, investigating the speech representations derived from articulatory kinematics signals, uses a neural implementation of convolutive sparse matrix factorization to decompose the articulatory data…

    Submitted 20 June, 2022; v1 submitted 1 April, 2022; originally announced April 2022.

    Comments: Accepted to 2022 Interspeech. Code is publicly available at https://github.com/Berkeley-Speech-Group/ema_gesture

  23. arXiv:2111.14706  [pdf, other]

    cs.CL cs.SD eess.AS

    ESPnet-SLU: Advancing Spoken Language Understanding through ESPnet

    Authors: Siddhant Arora, Siddharth Dalmia, Pavel Denisov, Xuankai Chang, Yushi Ueda, Yifan Peng, Yuekai Zhang, Sujay Kumar, Karthik Ganesan, Brian Yan, Ngoc Thang Vu, Alan W Black, Shinji Watanabe

    Abstract: As Automatic Speech Recognition (ASR) systems are getting better, there is an increasing interest in using the ASR output to do downstream Natural Language Processing (NLP) tasks. However, there are few open source toolkits that can be used to generate reproducible results on different Spoken Language Understanding (SLU) benchmarks. Hence, there is a need to build an open source standard that can b…

    Submitted 3 March, 2022; v1 submitted 29 November, 2021; originally announced November 2021.

    Comments: Accepted at ICASSP 2022 (5 pages)

  24. arXiv:2111.01326  [pdf, other]

    eess.AS cs.CL cs.SD

    Cross-lingual Transfer for Speech Processing using Acoustic Language Similarity

    Authors: Peter Wu, Jiatong Shi, Yifan Zhong, Shinji Watanabe, Alan W Black

    Abstract: Speech processing systems currently do not support the vast majority of languages, in part due to the lack of data in low-resource languages. Cross-lingual transfer offers a compelling way to help bridge this digital divide by incorporating high-resource data into low-resource systems. Current cross-lingual algorithms have shown success in text-based tasks and speech-related tasks over some low-re…

    Submitted 1 November, 2021; originally announced November 2021.

  25. arXiv:2111.01231  [pdf, other]

    cs.CL

    Switch Point biased Self-Training: Re-purposing Pretrained Models for Code-Switching

    Authors: Parul Chopra, Sai Krishna Rallabandi, Alan W Black, Khyathi Raghavi Chandu

    Abstract: Code-switching (CS), a ubiquitous phenomenon due to the ease of communication it offers in multilingual communities, still remains an understudied problem in language processing. The primary reasons behind this are: (1) minimal efforts in leveraging large pretrained multilingual models, and (2) the lack of annotated data. The distinguishing case of low performance of multilingual models in CS is th…

    Submitted 1 November, 2021; originally announced November 2021.

    Comments: Accepted at EMNLP Findings 2021

  26. arXiv:2111.00610  [pdf, other]

    cs.CL cs.LG cs.SD eess.AS

    Towards Language Modelling in the Speech Domain Using Sub-word Linguistic Units

    Authors: Anurag Katakkar, Alan W Black

    Abstract: Language models (LMs) for text data have been studied extensively for their usefulness in language generation and other downstream tasks. However, language modelling purely in the speech domain is still a relatively unexplored topic, with traditional speech LMs often depending on auxiliary text LMs for learning distributional aspects of the language. For the English language, these LMs treat words…

    Submitted 31 October, 2021; originally announced November 2021.

  27. arXiv:2110.09264  [pdf, other]

    cs.CL cs.SD eess.AS

    Intent Classification Using Pre-trained Language Agnostic Embeddings For Low Resource Languages

    Authors: Hemant Yadav, Akshat Gupta, Sai Krishna Rallabandi, Alan W Black, Rajiv Ratn Shah

    Abstract: Building Spoken Language Understanding (SLU) systems that do not rely on language specific Automatic Speech Recognition (ASR) is an important yet less explored problem in language processing. In this paper, we present a comparative study aimed at employing a pre-trained acoustic model to perform SLU in low resource scenarios. Specifically, we use three different embeddings extracted using Allosaur…

    Submitted 18 April, 2022; v1 submitted 18 October, 2021; originally announced October 2021.

  28. arXiv:2110.06263  [pdf, other]

    cs.CL cs.AI cs.SD eess.AS

    Speech Summarization using Restricted Self-Attention

    Authors: Roshan Sharma, Shruti Palaskar, Alan W Black, Florian Metze

    Abstract: Speech summarization is typically performed by using a cascade of speech recognition and text summarization models. End-to-end modeling of speech summarization models is challenging due to memory and compute constraints arising from long input audio sequences. Recent work in document summarization has inspired methods to reduce the complexity of self-attentions, which enables transformer models to…

    Submitted 24 January, 2022; v1 submitted 12 October, 2021; originally announced October 2021.

    Comments: Accepted at ICASSP 2022

  29. VPN: Video Provenance Network for Robust Content Attribution

    Authors: Alexander Black, Tu Bui, Simon Jenni, Vishy Swaminathan, John Collomosse

    Abstract: We present VPN - a content attribution method for recovering provenance information from videos shared online. Platforms, and users, often transform video into different quality, codecs, sizes, shapes, etc. or slightly edit its content such as adding text or emoji, as they are redistributed online. We learn a robust search embedding for matching such video, invariant to these transformations, usin…

    Submitted 21 September, 2021; originally announced September 2021.

    Comments: CVMP2021 camera-ready version

  30. arXiv:2106.15065  [pdf, ps, other]

    cs.CL cs.SD eess.AS

    Rethinking End-to-End Evaluation of Decomposable Tasks: A Case Study on Spoken Language Understanding

    Authors: Siddhant Arora, Alissa Ostapenko, Vijay Viswanathan, Siddharth Dalmia, Florian Metze, Shinji Watanabe, Alan W Black

    Abstract: Decomposable tasks are complex and comprise a hierarchy of sub-tasks. Spoken intent prediction, for example, combines automatic speech recognition and natural language understanding. Existing benchmarks, however, typically hold out examples for only the surface-level sub-task. As a result, models with similar performance on these benchmarks may have unobserved performance differences on the oth…

    Submitted 28 June, 2021; originally announced June 2021.

    Comments: INTERSPEECH 2021

  31. arXiv:2106.08009  [pdf, other]

    cs.CV

    Compositional Sketch Search

    Authors: Alexander Black, Tu Bui, Long Mai, Hailin Jin, John Collomosse

    Abstract: We present an algorithm for searching image collections using free-hand sketches that describe the appearance and relative positions of multiple objects. Sketch-based image retrieval (SBIR) methods predominantly match queries containing a single, dominant object invariant to its position within an image. Our work exploits drawings as a concise and intuitive representation for specifying entire sce…

    Submitted 15 June, 2021; originally announced June 2021.

    Comments: ICIP 2021 camera-ready version

  32. arXiv:2106.06004  [pdf, other]

    cs.CL

    CodemixedNLP: An Extensible and Open NLP Toolkit for Code-Mixing

    Authors: Sai Muralidhar Jayanthi, Kavya Nerella, Khyathi Raghavi Chandu, Alan W Black

    Abstract: The NLP community has witnessed steep progress in a variety of tasks across the realms of monolingual and multilingual language processing recently. These successes, in conjunction with the proliferating mixed language interactions on social media have boosted interest in modeling code-mixed texts. In this work, we present CodemixedNLP, an open-source library with the goals of bringing together th…

    Submitted 10 June, 2021; originally announced June 2021.

    Comments: Accepted at the Fifth Workshop on Computational Approaches to Linguistic Code-Switching-CALCS 2021

  33. arXiv:2106.02192  [pdf, other]

    cs.CL

    Grounding 'Grounding' in NLP

    Authors: Khyathi Raghavi Chandu, Yonatan Bisk, Alan W Black

    Abstract: The NLP community has seen substantial recent interest in grounding to facilitate interaction between language technologies and the world. However, as a community, we use the term broadly to reference any linking of text to data or non-textual modality. In contrast, Cognitive Science more formally defines "grounding" as the process of establishing what mutual information is required for successful…

    Submitted 3 June, 2021; originally announced June 2021.

    Comments: 24 pages

  34. arXiv:2106.00920  [pdf, other]

    cs.CL cs.AI cs.LG

    DialoGraph: Incorporating Interpretable Strategy-Graph Networks into Negotiation Dialogues

    Authors: Rishabh Joshi, Vidhisha Balachandran, Shikhar Vashishth, Alan Black, Yulia Tsvetkov

    Abstract: To successfully negotiate a deal, it is not enough to communicate fluently: pragmatic planning of persuasive negotiation strategies is essential. While modern dialogue agents excel at generating fluent sentences, they still lack pragmatic grounding and cannot reason strategically. We present DialoGraph, a negotiation system that incorporates pragmatic strategies in a negotiation dialogue using gra…

    Submitted 1 June, 2021; originally announced June 2021.

    Comments: Accepted at ICLR 2021; https://openreview.net/forum?id=kDnal_bbb-E

  35. arXiv:2104.12714  [pdf, other]

    cs.CL

    Focused Attention Improves Document-Grounded Generation

    Authors: Shrimai Prabhumoye, Kazuma Hashimoto, Yingbo Zhou, Alan W Black, Ruslan Salakhutdinov

    Abstract: Document grounded generation is the task of using the information provided in a document to improve text generation. This work focuses on two different document grounded generation tasks: Wikipedia Update Generation task and Dialogue response generation. Our work introduces two novel adaptations of large scale pre-trained encoder-decoder models focusing on building context driven representation of…

    Submitted 26 April, 2021; originally announced April 2021.

    Comments: Accepted at North American Chapter of the Association for Computational Linguistics (NAACL) 2021

  36. arXiv:2104.01287  [pdf, other]

    cs.CL

    Intent Recognition and Unsupervised Slot Identification for Low Resourced Spoken Dialog Systems

    Authors: Akshat Gupta, Olivia Deng, Akruti Kushwaha, Saloni Mittal, William Zeng, Sai Krishna Rallabandi, Alan W Black

    Abstract: Intent Recognition and Slot Identification are crucial components in spoken language understanding (SLU) systems. In this paper, we present a novel approach towards both these tasks in the context of low resourced and unwritten languages. We present an acoustic based SLU system that converts speech to its phonetic transcription using a universal phone recognition system. We build a word-free natur…

    Submitted 28 September, 2021; v1 submitted 2 April, 2021; originally announced April 2021.

  37. arXiv:2103.14797  [pdf, other]

    cs.CL cs.LG

    Unsupervised Self-Training for Sentiment Analysis of Code-Switched Data

    Authors: Akshat Gupta, Sargam Menghani, Sai Krishna Rallabandi, Alan W Black

    Abstract: Sentiment analysis is an important task in understanding social media content like customer reviews, Twitter and Facebook feeds etc. In multilingual communities around the world, a large amount of social media text is characterized by the presence of Code-Switching. Thus, it has become important to build models that can handle code-switched data. However, annotated code-switched data is scarce and…

    Submitted 1 October, 2021; v1 submitted 26 March, 2021; originally announced March 2021.

  38. arXiv:2102.12407  [pdf, ps, other]

    cs.CL

    Task-Specific Pre-Training and Cross Lingual Transfer for Code-Switched Data

    Authors: Akshat Gupta, Sai Krishna Rallabandi, Alan Black

    Abstract: Using task-specific pre-training and leveraging cross-lingual transfer are two of the most popular ways to handle code-switched data. In this paper, we aim to compare the effects of both for the task of sentiment analysis. We work with two Dravidian Code-Switched languages - Tamil-English and Malayalam-English and four different BERT based models. We compare the effects of task-specific pre-trainin…

    Submitted 24 February, 2021; originally announced February 2021.

  39. arXiv:2102.08345  [pdf, other]

    cs.CL

    NoiseQA: Challenge Set Evaluation for User-Centric Question Answering

    Authors: Abhilasha Ravichander, Siddharth Dalmia, Maria Ryskina, Florian Metze, Eduard Hovy, Alan W Black

    Abstract: When Question-Answering (QA) systems are deployed in the real world, users query them through a variety of interfaces, such as speaking to voice assistants, typing questions into a search engine, or even translating questions to languages supported by the QA system. While there has been significant community attention devoted to identifying correct answers in passages assuming a perfectly formed q…

    Submitted 16 February, 2021; originally announced February 2021.

    Comments: EACL 2021

  40. arXiv:2012.00876  [pdf, other]

    cs.CL eess.AS

    Automatically Identifying Language Family from Acoustic Examples in Low Resource Scenarios

    Authors: Peter Wu, Yifan Zhong, Alan W Black

    Abstract: Existing multilingual speech NLP works focus on a relatively small subset of languages, and thus current linguistic understanding of languages predominantly stems from classical approaches. In this work, we propose a method to analyze language similarity using deep learning. Namely, we train a model on the Wilderness dataset and investigate how its latent space compares with classical language fam…

    Submitted 1 December, 2020; originally announced December 2020.

  41. arXiv:2011.03646  [pdf, other]

    cs.CL cs.AI

    Acoustics Based Intent Recognition Using Discovered Phonetic Units for Low Resource Languages

    Authors: Akshat Gupta, Xinjian Li, Sai Krishna Rallabandi, Alan W Black

    Abstract: With recent advancements in language technologies, humans are now speaking to devices. Increasing the reach of spoken language technologies requires building systems in local languages. A major bottleneck here is the underlying data-intensive parts that make up such systems, including automatic speech recognition (ASR) systems that require large amounts of labelled data. With the aim of aiding de…

    Submitted 19 February, 2021; v1 submitted 6 November, 2020; originally announced November 2020.

  42. arXiv:2010.16411  [pdf, ps, other]

    cs.CL

    Mere account mein kitna balance hai? -- On building voice enabled Banking Services for Multilingual Communities

    Authors: Akshat Gupta, Sai Krishna Rallabandi, Alan W Black

    Abstract: Tremendous progress in speech and language processing has brought language technologies closer to daily human life. Voice technology has the potential to act as a horizontal enabling layer across all aspects of digitization. It is especially beneficial to rural communities in scenarios like a pandemic. In this work we present our initial exploratory work towards one such direction -- building voic…

    Submitted 8 October, 2020; originally announced October 2020.

  43. arXiv:2010.13944  [pdf, other]

    cs.CL

    Reading Between the Lines: Exploring Infilling in Visual Narratives

    Authors: Khyathi Raghavi Chandu, Ruo-Ping Dong, Alan Black

    Abstract: Generating long form narratives such as stories and procedures from multiple modalities has been a long standing dream for artificial intelligence. In this regard, there is often crucial subtext that is derived from the surrounding contexts. The general seq2seq training methods render the models shorthanded while attempting to bridge the gap between these neighbouring contexts. In this paper, we t…

    Submitted 26 October, 2020; originally announced October 2020.

  44. arXiv:2010.10472  [pdf, other]

    cs.CL

    Comparison of Interactive Knowledge Base Spelling Correction Models for Low-Resource Languages

    Authors: Yiyuan Li, Antonios Anastasopoulos, Alan W Black

    Abstract: Spelling normalization for low resource languages is a challenging task because the patterns are hard to predict and large corpora are usually required to collect enough examples. This work shows a comparison of a neural model and character language models with varying amounts on target language data. Our usage scenario is interactive correction with nearly zero amounts of training examples, impro…

    Submitted 20 October, 2020; originally announced October 2020.

    Comments: 9 pages

  45. arXiv:2010.07279  [pdf, other]

    cs.CL

    Positioning yourself in the maze of Neural Text Generation: A Task-Agnostic Survey

    Authors: Khyathi Raghavi Chandu, Alan W Black

    Abstract: Neural text generation metamorphosed into several critical natural language applications ranging from text completion to free form narrative generation. In order to progress research in text generation, it is critical to absorb the existing research works and position ourselves in this massively growing field. Specifically, this paper surveys the fundamental components of modeling approaches relay…

    Submitted 25 March, 2021; v1 submitted 14 October, 2020; originally announced October 2020.

    Comments: 16 pages

  46. arXiv:2010.04658  [pdf, other]

    cs.CL

    Case Study: Deontological Ethics in NLP

    Authors: Shrimai Prabhumoye, Brendon Boldt, Ruslan Salakhutdinov, Alan W Black

    Abstract: Recent work in natural language processing (NLP) has focused on ethical challenges such as understanding and mitigating bias in data and algorithms; identifying objectionable content like hate speech, stereotypes and offensive language; and building frameworks for better system design and data handling practices. However, there has been little discussion about the ethical foundations that underlie…

    Submitted 12 April, 2021; v1 submitted 9 October, 2020; originally announced October 2020.

    Comments: Accepted at North American Chapter of the Association for Computational Linguistics (NAACL) 2021

  47. arXiv:2008.04820  [pdf, other]

    cs.CL cs.IR cs.LG

    LTIatCMU at SemEval-2020 Task 11: Incorporating Multi-Level Features for Multi-Granular Propaganda Span Identification

    Authors: Sopan Khosla, Rishabh Joshi, Ritam Dutt, Alan W Black, Yulia Tsvetkov

    Abstract: In this paper we describe our submission for the task of Propaganda Span Identification in news articles. We introduce a BERT-BiLSTM based span-level propaganda classification model that identifies which token spans within the sentence are indicative of propaganda. The "multi-granular" model incorporates linguistic knowledge at various levels of text granularity, including word, sentence and docum…

    Submitted 20 August, 2020; v1 submitted 11 August, 2020; originally announced August 2020.

  48. arXiv:2007.12948  [pdf, ps, other]

    eess.AS cs.LG cs.SD stat.ML

    Nonlinear ISA with Auxiliary Variables for Learning Speech Representations

    Authors: Amrith Setlur, Barnabas Poczos, Alan W Black

    Abstract: This paper extends recent work on nonlinear Independent Component Analysis (ICA) by introducing a theoretical framework for nonlinear Independent Subspace Analysis (ISA) in the presence of auxiliary variables. Observed high dimensional acoustic features like log Mel spectrograms can be considered as surface level manifestations of nonlinear transformations over individual multivariate sources of i…

    Submitted 25 July, 2020; originally announced July 2020.

    Comments: To be presented at Interspeech 2020

  49. arXiv:2006.05986  [pdf, other]

    cs.CL cs.AI cs.LG

    ClarQ: A large-scale and diverse dataset for Clarification Question Generation

    Authors: Vaibhav Kumar, Alan W. Black

    Abstract: Question answering and conversational systems are often baffled and need help clarifying certain ambiguities. However, limitations of existing datasets hinder the development of large-scale models capable of generating and utilising clarification questions. In order to overcome these limitations, we devise a novel bootstrapping framework (based on self-supervision) that assists in the creation of…

    Submitted 11 June, 2020; v1 submitted 10 June, 2020; originally announced June 2020.

    Comments: Accepted at ACL 2020

  50. arXiv:2005.13962  [pdf, other]

    cs.CL

    A Corpus for Large-Scale Phonetic Typology

    Authors: Elizabeth Salesky, Eleanor Chodroff, Tiago Pimentel, Matthew Wiesner, Ryan Cotterell, Alan W Black, Jason Eisner

    Abstract: A major hurdle in data-driven research on typology is having sufficient data in many languages to draw meaningful conclusions. We present VoxClamantis v1.0, the first large-scale corpus for phonetic typology, with aligned segments and estimated phoneme-level labels in 690 readings spanning 635 languages, along with acoustic-phonetic measures of vowels and sibilants. Access to such data can greatly…

    Submitted 28 May, 2020; originally announced May 2020.

    Comments: Accepted to ACL2020