Showing 1–18 of 18 results for author: King, B

Searching in archive cs.
  1. arXiv:2404.15219  [pdf, other]

    cs.CL

    The Power of the Noisy Channel: Unsupervised End-to-End Task-Oriented Dialogue with LLMs

    Authors: Brendan King, Jeffrey Flanigan

    Abstract: Training task-oriented dialogue systems typically requires turn-level annotations for interacting with their APIs: e.g. a dialogue state and the system actions taken at each step. These annotations can be costly to produce, error-prone, and require both domain and annotation expertise. With advances in LLMs, we hypothesize unlabelled data and a schema definition are sufficient for building a worki…

    Submitted 23 April, 2024; originally announced April 2024.

    Comments: 16 Pages, 7 Figures

  2. arXiv:2307.01453  [pdf, other]

    cs.CL

    Diverse Retrieval-Augmented In-Context Learning for Dialogue State Tracking

    Authors: Brendan King, Jeffrey Flanigan

    Abstract: There has been significant interest in zero and few-shot learning for dialogue state tracking (DST) due to the high cost of collecting and annotating task-oriented dialogues. Recent work has demonstrated that in-context learning requires very little data and zero parameter updates, and even outperforms trained methods in the few-shot setting (Hu et al. 2022). We propose RefPyDST, which advances th…

    Submitted 3 July, 2023; originally announced July 2023.

    Comments: 14 pages, 2 figures, to appear in Findings of the ACL 2023

  3. arXiv:2302.12944  [pdf, other]

    cs.CL cs.AI

    Dependency Dialogue Acts -- Annotation Scheme and Case Study

    Authors: Jon Z. Cai, Brendan King, Margaret Perkoff, Shiran Dudy, Jie Cao, Marie Grace, Natalia Wojarnik, Ananya Ganesh, James H. Martin, Martha Palmer, Marilyn Walker, Jeffrey Flanigan

    Abstract: In this paper, we introduce Dependency Dialogue Acts (DDA), a novel framework for capturing the structure of speaker-intentions in multi-party dialogues. DDA combines and adapts features from existing dialogue annotation frameworks, and emphasizes the multi-relational response structure of dialogues in addition to the dialogue acts and rhetorical relations. It represents the functional, discourse,…

    Submitted 24 February, 2023; originally announced February 2023.

    Comments: The 13th International Workshop on Spoken Dialogue Systems Technology

    Journal ref: The 13th International Workshop on Spoken Dialogue Systems Technology 2023

  4. Toward Fairness in Speech Recognition: Discovery and mitigation of performance disparities

    Authors: Pranav Dheram, Murugesan Ramakrishnan, Anirudh Raju, I-Fan Chen, Brian King, Katherine Powell, Melissa Saboowala, Karan Shetty, Andreas Stolcke

    Abstract: As for other forms of AI, speech recognition has recently been examined with respect to performance disparities across different user cohorts. One approach to achieve fairness in speech recognition is to (1) identify speaker cohorts that suffer from subpar performance and (2) apply fairness mitigation measures targeting the cohorts discovered. In this paper, we report on initial findings with both…

    Submitted 22 July, 2022; originally announced July 2022.

    Comments: Proc. Interspeech 2022

    Journal ref: Proc. Interspeech, Sept. 2022, pp. 1268-1272

  5. Reducing Geographic Disparities in Automatic Speech Recognition via Elastic Weight Consolidation

    Authors: Viet Anh Trinh, Pegah Ghahremani, Brian King, Jasha Droppo, Andreas Stolcke, Roland Maas

    Abstract: We present an approach to reduce the performance disparity between geographic regions without degrading performance on the overall user population for ASR. A popular approach is to fine-tune the model with data from regions where the ASR model has a higher word error rate (WER). However, when the ASR model is adapted to get better performance on these high-WER regions, its parameters wander from t…

    Submitted 16 July, 2022; originally announced July 2022.

    Comments: Accepted for publication at Interspeech 2022

    Journal ref: Proc. Interspeech, Sept. 2022, pp. 1298-1302

  6. arXiv:2207.02393  [pdf, other]

    cs.CL cs.SD eess.AS

    Compute Cost Amortized Transformer for Streaming ASR

    Authors: Yi Xie, Jonathan Macoskey, Martin Radfar, Feng-Ju Chang, Brian King, Ariya Rastrow, Athanasios Mouchtaris, Grant P. Strimel

    Abstract: We present a streaming, Transformer-based end-to-end automatic speech recognition (ASR) architecture which achieves efficient neural inference through compute cost amortization. Our architecture creates sparse computation pathways dynamically at inference time, resulting in selective use of compute resources throughout decoding, enabling significant reductions in compute with minimal impact on acc…

    Submitted 4 July, 2022; originally announced July 2022.

  7. arXiv:2112.00350  [pdf, other]

    cs.CL cs.LG cs.SD eess.AS

    Investigation of Training Label Error Impact on RNN-T

    Authors: I-Fan Chen, Brian King, Jasha Droppo

    Abstract: In this paper, we propose an approach to quantitatively analyze impacts of different training label errors to RNN-T based ASR models. The result shows deletion errors are more harmful than substitution and insertion label errors in RNN-T training data. We also examined label error impact mitigation approaches on RNN-T and found that, though all the methods mitigate the label-error-caused degradati…

    Submitted 1 December, 2021; originally announced December 2021.

    Comments: 6 pages

  8. arXiv:2110.05543  [pdf, other]

    cs.DB cs.DC

    Fallout: Distributed Systems Testing as a Service

    Authors: Guy Bolton King, Sean McCarthy, Pushkala Pattabhiraman, Jake Luciani, Matt Fleming

    Abstract: All modern distributed systems list performance and scalability as their core strengths. Given that optimal performance requires carefully selecting configuration options, and typical cluster sizes can range anywhere from 2 to 300 nodes, it is rare for any two clusters to be exactly the same. Validating the behavior and performance of distributed systems in this large configuration space is challe…

    Submitted 11 October, 2021; originally announced October 2021.

    Comments: Submitted to 2021 BenchCouncil International Symposium on Benchmarking, Measuring and Optimizing (Bench'21)

  9. arXiv:2106.07734  [pdf, other]

    cs.CL cs.LG eess.AS

    CoDERT: Distilling Encoder Representations with Co-learning for Transducer-based Speech Recognition

    Authors: Rupak Vignesh Swaminathan, Brian King, Grant P. Strimel, Jasha Droppo, Athanasios Mouchtaris

    Abstract: We propose a simple yet effective method to compress an RNN-Transducer (RNN-T) through the well-known knowledge distillation paradigm. We show that the transducer's encoder outputs naturally have a high entropy and contain rich information about acoustically similar word-piece confusions. This rich information is suppressed when combined with the lower entropy decoder outputs to produce the joint…

    Submitted 14 June, 2021; originally announced June 2021.

    Comments: Accepted at InterSpeech 2021

  10. arXiv:2106.02750  [pdf, other]

    eess.AS cs.AI

    Do You Listen with One or Two Microphones? A Unified ASR Model for Single and Multi-Channel Audio

    Authors: Gokce Keskin, Minhua Wu, Brian King, Harish Mallidi, Yang Gao, Jasha Droppo, Ariya Rastrow, Roland Maas

    Abstract: Automatic speech recognition (ASR) models are typically designed to operate on a single input data type, e.g. a single or multi-channel audio streamed from a device. This design decision assumes the primary input data source does not change and if an additional (auxiliary) data source is occasionally available, it cannot be used. An ASR model that operates on both primary and auxiliary data can ac…

    Submitted 28 June, 2021; v1 submitted 4 June, 2021; originally announced June 2021.

  11. arXiv:2105.05920  [pdf, ps, other]

    eess.AS cs.LG cs.SD

    Attention-based Neural Beamforming Layers for Multi-channel Speech Recognition

    Authors: Bhargav Pulugundla, Yang Gao, Brian King, Gokce Keskin, Harish Mallidi, Minhua Wu, Jasha Droppo, Roland Maas

    Abstract: Attention-based beamformers have recently been shown to be effective for multi-channel speech recognition. However, they are less capable at capturing local information. In this work, we propose a 2D Conv-Attention module which combines convolution neural networks with attention for beamforming. We apply self- and cross-attention to explicitly model the correlations within and between the input ch…

    Submitted 14 May, 2021; v1 submitted 12 May, 2021; originally announced May 2021.

  12. arXiv:2102.03951  [pdf, other]

    eess.AS cs.CL cs.SD

    End-to-End Multi-Channel Transformer for Speech Recognition

    Authors: Feng-Ju Chang, Martin Radfar, Athanasios Mouchtaris, Brian King, Siegfried Kunzmann

    Abstract: Transformers are powerful neural architectures that allow integrating different modalities using attention mechanisms. In this paper, we leverage the neural transformer architectures for multi-channel speech recognition systems, where the spectral and spatial information collected from different microphones are integrated using attention layers. Our multi-channel transformer network mainly consist…

    Submitted 7 February, 2021; originally announced February 2021.

    Comments: Accepted by 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2021)

  13. arXiv:2102.01740  [pdf, other]

    cs.AI stat.AP

    Reliability Analysis of Artificial Intelligence Systems Using Recurrent Events Data from Autonomous Vehicles

    Authors: Yili Hong, Jie Min, Caleb B. King, William Q. Meeker

    Abstract: Artificial intelligence (AI) systems have become increasingly common and the trend will continue. Examples of AI systems include autonomous vehicles (AV), computer vision, natural language processing, and AI medical experts. To allow for safe and effective deployment of AI systems, the reliability of such systems needs to be assessed. Traditionally, reliability assessment is based on reliability t…

    Submitted 2 February, 2021; originally announced February 2021.

    Comments: 30 pages, 9 figures

  14. arXiv:2009.09671  [pdf, other]

    cs.DB cs.DC cs.IR

    Towards application-specific query processing systems

    Authors: Dimitrios Vasilas, Marc Shapiro, Bradley King, Sara Hamouda

    Abstract: Database systems use query processing subsystems for enabling efficient query-based data retrieval. An essential aspect of designing any query-intensive application is tuning the query system to fit the application's requirements and workload characteristics. However, the configuration parameters provided by traditional database systems do not cover the design decisions and trade-offs that arise f…

    Submitted 21 September, 2020; originally announced September 2020.

    Journal ref: 36ème Conférence sur la Gestion de Données -- Principes, Technologies et Applications (BDA 2020), Oct 2020, Paris, France

  15. arXiv:2007.00131  [pdf, other]

    eess.AS cs.CL cs.SD

    Multi-view Frequency LSTM: An Efficient Frontend for Automatic Speech Recognition

    Authors: Maarten Van Segbroeck, Harish Mallidi, Brian King, I-Fan Chen, Gurpreet Chadha, Roland Maas

    Abstract: Acoustic models in real-time speech recognition systems typically stack multiple unidirectional LSTM layers to process the acoustic frames over time. Performance improvements over vanilla LSTM architectures have been reported by prepending a stack of frequency-LSTM (FLSTM) layers to the time LSTM. These FLSTM layers can learn a more robust input feature to the time LSTM layers by modeling time-fre…

    Submitted 30 June, 2020; originally announced July 2020.

  16. arXiv:1803.04141  [pdf, other]

    cs.DC cs.DB cs.IR

    A Modular Design for Geo-Distributed Querying

    Authors: Dimitrios Vasilas, Marc Shapiro, Bradley King

    Abstract: Most distributed storage systems provide limited abilities for querying data by attributes other than their primary keys. Supporting efficient search on secondary attributes is challenging as applications pose varying requirements to query processing systems, and no single system design can be suitable for all needs. In this paper, we show how to overcome these challenges in order to extend distri…

    Submitted 12 March, 2018; originally announced March 2018.

    Comments: 5th Workshop on Principles and Practice of Consistency for Distributed Data, April 23--26, 2018, Porto, Portugal

  17. arXiv:1712.08348  [pdf, other]

    cs.RO cs.HC cs.SE

    Towards Software Development For Social Robotics Systems

    Authors: Chong Sun, Jiongyan Zhang, Cong Liu, Barry Chew Bao King, Yuwei Zhang, Matthew Galle, Maria Spichkova

    Abstract: In this paper we introduce the core results of the project on software development for social robotics systems. The usability of maintenance and control features is crucial for many kinds of systems, but in the case of social robotics we also have to take into account that (1) the humanoid robot physically interacts with humans, (2) the conversation with children might have different requirements…

    Submitted 22 December, 2017; originally announced December 2017.

  18. arXiv:1603.08016  [pdf, other]

    cs.CL

    Classifying Syntactic Regularities for Hundreds of Languages

    Authors: Reed Coke, Ben King, Dragomir Radev

    Abstract: This paper presents a comparison of classification methods for linguistic typology for the purpose of expanding an extensive, but sparse language resource: the World Atlas of Language Structures (WALS) (Dryer and Haspelmath, 2013). We experimented with a variety of regression and nearest-neighbor methods for use in classification over a set of 325 languages and six syntactic rules drawn from WALS.…

    Submitted 27 April, 2016; v1 submitted 25 March, 2016; originally announced March 2016.