Showing 1–32 of 32 results for author: Kirchhoff, K

  1. arXiv:2405.08317  [pdf, other]

    cs.CL cs.SD eess.AS

    SpeechGuard: Exploring the Adversarial Robustness of Multimodal Large Language Models

    Authors: Raghuveer Peri, Sai Muralidhar Jayanthi, Srikanth Ronanki, Anshu Bhatia, Karel Mundnich, Saket Dingliwal, Nilaksh Das, Zejiang Hou, Goeric Huybrechts, Srikanth Vishnubhotla, Daniel Garcia-Romero, Sundararajan Srinivasan, Kyu J Han, Katrin Kirchhoff

    Abstract: Integrated Speech and Large Language Models (SLMs) that can follow speech instructions and generate relevant text responses have gained popularity lately. However, the safety and robustness of these models remains largely unclear. In this work, we investigate the potential vulnerabilities of such instruction-following speech-language models to adversarial attacks and jailbreaking. Specifically, we…

    Submitted 14 May, 2024; originally announced May 2024.

    Comments: 9+6 pages, Submitted to ACL 2024

  2. arXiv:2405.08295  [pdf, other]

    cs.CL cs.SD eess.AS

    SpeechVerse: A Large-scale Generalizable Audio Language Model

    Authors: Nilaksh Das, Saket Dingliwal, Srikanth Ronanki, Rohit Paturi, Zhaocheng Huang, Prashant Mathur, Jie Yuan, Dhanush Bekal, Xing Niu, Sai Muralidhar Jayanthi, Xilai Li, Karel Mundnich, Monica Sunkara, Sundararajan Srinivasan, Kyu J Han, Katrin Kirchhoff

    Abstract: Large language models (LLMs) have shown incredible proficiency in performing tasks that require semantic understanding of natural language instructions. Recently, many works have further expanded this capability to perceive multimodal audio and text inputs, but their capabilities are often limited to specific fine-tuned tasks such as automatic speech recognition and translation. We therefore devel…

    Submitted 31 May, 2024; v1 submitted 13 May, 2024; originally announced May 2024.

    Comments: Single Column, 13 pages

  3. arXiv:2404.16233  [pdf, other]

    cs.LG cs.AI

    AutoGluon-Multimodal (AutoMM): Supercharging Multimodal AutoML with Foundation Models

    Authors: Zhiqiang Tang, Haoyang Fang, Su Zhou, Taojiannan Yang, Zihan Zhong, Tony Hu, Katrin Kirchhoff, George Karypis

    Abstract: AutoGluon-Multimodal (AutoMM) is introduced as an open-source AutoML library designed specifically for multimodal learning. Distinguished by its exceptional ease of use, AutoMM enables fine-tuning of foundation models with just three lines of code. Supporting various modalities including image, text, and tabular data, both independently and in combination, the library offers a comprehensive suite…

    Submitted 30 April, 2024; v1 submitted 24 April, 2024; originally announced April 2024.

    Comments: Accepted at AutoML 2024 Conference

  4. arXiv:2402.06147  [pdf, other]

    cs.AI cs.CL

    DeAL: Decoding-time Alignment for Large Language Models

    Authors: James Y. Huang, Sailik Sengupta, Daniele Bonadiman, Yi-an Lai, Arshit Gupta, Nikolaos Pappas, Saab Mansour, Katrin Kirchhoff, Dan Roth

    Abstract: Large Language Models (LLMs) are nowadays expected to generate content aligned with human preferences. Current work focuses on alignment at model training time, through techniques such as Reinforcement Learning with Human Feedback (RLHF). However, it is unclear if such methods are an effective choice to teach alignment objectives to the model. First, the inability to incorporate multiple, custom r…

    Submitted 20 February, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

    Comments: The appendix contains data that is offensive / disturbing in nature

  5. arXiv:2307.00453  [pdf, other]

    cs.CL cs.SD eess.AS

    Don't Stop Self-Supervision: Accent Adaptation of Speech Representations via Residual Adapters

    Authors: Anshu Bhatia, Sanchit Sinha, Saket Dingliwal, Karthik Gopalakrishnan, Sravan Bodapati, Katrin Kirchhoff

    Abstract: Speech representations learned in a self-supervised fashion from massive unlabeled speech corpora have been adapted successfully toward several downstream tasks. However, such representations may be skewed toward canonical data characteristics of such corpora and perform poorly on atypical, non-native accented speaker populations. With the state-of-the-art HuBERT model as a baseline, we propose an…

    Submitted 1 July, 2023; originally announced July 2023.

  6. arXiv:2306.08175  [pdf, other]

    eess.AS cs.AI cs.LG cs.SD

    DCTX-Conformer: Dynamic context carry-over for low latency unified streaming and non-streaming Conformer ASR

    Authors: Goeric Huybrechts, Srikanth Ronanki, Xilai Li, Hadis Nosrati, Sravan Bodapati, Katrin Kirchhoff

    Abstract: Conformer-based end-to-end models have become ubiquitous these days and are commonly used in both streaming and non-streaming automatic speech recognition (ASR). Techniques like dual-mode and dynamic chunk training helped unify streaming and non-streaming systems. However, there remains a performance gap between streaming with a full and limited past context. To address this issue, we propose the…

    Submitted 1 March, 2024; v1 submitted 13 June, 2023; originally announced June 2023.

  7. arXiv:2305.03837  [pdf, other]

    eess.AS cs.LG cs.SD

    Mask The Bias: Improving Domain-Adaptive Generalization of CTC-based ASR with Internal Language Model Estimation

    Authors: Nilaksh Das, Monica Sunkara, Sravan Bodapati, Jinglun Cai, Devang Kulshreshtha, Jeff Farris, Katrin Kirchhoff

    Abstract: End-to-end ASR models trained on large amount of data tend to be implicitly biased towards language semantics of the training data. Internal language model estimation (ILME) has been proposed to mitigate this bias for autoregressive models such as attention-based encoder-decoder and RNN-T. Typically, ILME is performed by modularizing the acoustic and language components of the model architecture,…

    Submitted 5 May, 2023; originally announced May 2023.

    Comments: Accepted to ICASSP 2023

  8. arXiv:2212.09095  [pdf, other]

    cs.CL cs.AI

    Rethinking the Role of Scale for In-Context Learning: An Interpretability-based Case Study at 66 Billion Scale

    Authors: Hritik Bansal, Karthik Gopalakrishnan, Saket Dingliwal, Sravan Bodapati, Katrin Kirchhoff, Dan Roth

    Abstract: Language models have been shown to perform better with an increase in scale on a wide variety of tasks via the in-context learning paradigm. In this paper, we investigate the hypothesis that the ability of a large language model to in-context learn-perform a task is not uniformly spread across all of its underlying components. Using a 66 billion parameter language model (OPT-66B) across a diverse…

    Submitted 16 August, 2023; v1 submitted 18 December, 2022; originally announced December 2022.

    Comments: Accepted at Annual Meeting of the Association for Computational Linguistics (ACL) 2023, Main Proceedings

  9. arXiv:2211.13280  [pdf, other]

    cs.CL cs.SD eess.AS

    Device Directedness with Contextual Cues for Spoken Dialog Systems

    Authors: Dhanush Bekal, Sundararajan Srinivasan, Sravan Bodapati, Srikanth Ronanki, Katrin Kirchhoff

    Abstract: In this work, we define barge-in verification as a supervised learning task where audio-only information is used to classify user spoken dialogue into true and false barge-ins. Following the success of pre-trained models, we use low-level speech representations from a self-supervised representation learning model for our downstream classification task. Further, we propose a novel technique to infu…

    Submitted 23 November, 2022; originally announced November 2022.

  10. arXiv:2210.09510  [pdf, other]

    cs.CL cs.SD eess.AS

    Towards Personalization of CTC Speech Recognition Models with Contextual Adapters and Adaptive Boosting

    Authors: Saket Dingliwal, Monica Sunkara, Sravan Bodapati, Srikanth Ronanki, Jeff Farris, Katrin Kirchhoff

    Abstract: End-to-end speech recognition models trained using joint Connectionist Temporal Classification (CTC)-Attention loss have gained popularity recently. In these models, a non-autoregressive CTC decoder is often used at inference time due to its speed and simplicity. However, such models are hard to personalize because of their conditional independence assumption that prevents output tokens from previ…

    Submitted 13 November, 2022; v1 submitted 17 October, 2022; originally announced October 2022.

    Comments: To appear in SLT 2022

  11. arXiv:2205.10643  [pdf, other]

    cs.CL cs.SD eess.AS

    Self-Supervised Speech Representation Learning: A Review

    Authors: Abdelrahman Mohamed, Hung-yi Lee, Lasse Borgholt, Jakob D. Havtorn, Joakim Edin, Christian Igel, Katrin Kirchhoff, Shang-Wen Li, Karen Livescu, Lars Maaløe, Tara N. Sainath, Shinji Watanabe

    Abstract: Although supervised deep learning has revolutionized speech and audio processing, it has necessitated the building of specialist models for individual tasks and application scenarios. It is likewise difficult to apply this to dialects and languages for which only limited labeled data is available. Self-supervised representation learning methods promise a single universal model that would benefit a…

    Submitted 27 October, 2022; v1 submitted 21 May, 2022; originally announced May 2022.

  12. arXiv:2112.08718  [pdf, other]

    cs.CL cs.LG

    Prompt Tuning GPT-2 language model for parameter-efficient domain adaptation of ASR systems

    Authors: Saket Dingliwal, Ashish Shenoy, Sravan Bodapati, Ankur Gandhe, Ravi Teja Gadde, Katrin Kirchhoff

    Abstract: Automatic Speech Recognition (ASR) systems have found their use in numerous industrial applications in very diverse domains creating a need to adapt to new domains with small memory and deployment overhead. In this work, we introduce domain-prompts, a methodology that involves training a small number of domain embedding parameters to prime a Transformer-based Language Model (LM) to a particular do…

    Submitted 21 July, 2022; v1 submitted 16 December, 2021; originally announced December 2021.

    Comments: Accepted at InterSpeech 2022

  13. arXiv:2112.05863  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD eess.SP

    Directed Speech Separation for Automatic Speech Recognition of Long Form Conversational Speech

    Authors: Rohit Paturi, Sundararajan Srinivasan, Katrin Kirchhoff, Daniel Garcia-Romero

    Abstract: Many of the recent advances in speech separation are primarily aimed at synthetic mixtures of short audio utterances with high degrees of overlap. Most of these approaches need an additional stitching step to stitch the separated speech chunks for long form audio. Since most of the approaches involve Permutation Invariant training (PIT), the order of separated speech chunks is nondeterministic and…

    Submitted 6 September, 2022; v1 submitted 10 December, 2021; originally announced December 2021.

    Comments: Accepted for publication at Interspeech 2022

  14. arXiv:2110.06502  [pdf, other]

    cs.CL

    Prompt-tuning in ASR systems for efficient domain-adaptation

    Authors: Saket Dingliwal, Ashish Shenoy, Sravan Bodapati, Ankur Gandhe, Ravi Teja Gadde, Katrin Kirchhoff

    Abstract: Automatic Speech Recognition (ASR) systems have found their use in numerous industrial applications in very diverse domains. Since domain-specific systems perform better than their generic counterparts on in-domain evaluation, the need for memory and compute-efficient domain adaptation is obvious. Particularly, adapting parameter-heavy transformer-based language models used for rescoring ASR hypot…

    Submitted 22 October, 2021; v1 submitted 13 October, 2021; originally announced October 2021.

    Comments: WeCNLP 2021 camera-ready

  15. arXiv:2109.05092  [pdf, other]

    eess.AS cs.SD

    Remember the context! ASR slot error correction through memorization

    Authors: Dhanush Bekal, Ashish Shenoy, Monica Sunkara, Sravan Bodapati, Katrin Kirchhoff

    Abstract: Accurate recognition of slot values such as domain specific words or named entities by automatic speech recognition (ASR) systems forms the core of the Goal-oriented Dialogue Systems. Although it is a critical step with direct impact on downstream tasks such as language understanding, many domain agnostic ASR systems tend to perform poorly on domain specific or long tail words. They are often supp…

    Submitted 17 September, 2021; v1 submitted 10 September, 2021; originally announced September 2021.

    Comments: 8 pages, 3 figures, 4 tables, Accepted to ASRU 2021

  16. arXiv:2106.09532  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    ASR Adaptation for E-commerce Chatbots using Cross-Utterance Context and Multi-Task Language Modeling

    Authors: Ashish Shenoy, Sravan Bodapati, Katrin Kirchhoff

    Abstract: Automatic Speech Recognition (ASR) robustness toward slot entities are critical in e-commerce voice assistants that involve monetary transactions and purchases. Along with effective domain adaptation, it is intuitive that cross utterance contextual cues play an important role in disambiguating domain specific content words from speech. In this paper, we investigate various techniques to improve co…

    Submitted 15 June, 2021; originally announced June 2021.

    Comments: Accepted at ACL-IJCNLP 2021 Workshop on e-Commerce and NLP (ECNLP)

  17. Adapting Long Context NLM for ASR Rescoring in Conversational Agents

    Authors: Ashish Shenoy, Sravan Bodapati, Monica Sunkara, Srikanth Ronanki, Katrin Kirchhoff

    Abstract: Neural Language Models (NLM), when trained and evaluated with context spanning multiple utterances, have been shown to consistently outperform both conventional n-gram language models and NLMs that use limited context. In this paper, we investigate various techniques to incorporate turn based context history into both recurrent (LSTM) and Transformer-XL based NLMs. For recurrent based NLMs, we exp…

    Submitted 4 June, 2021; v1 submitted 20 April, 2021; originally announced April 2021.

    Comments: Accepted to Interspeech 2021. arXiv admin note: text overlap with arXiv:2103.10325

  18. arXiv:2103.10325   

    cs.CL

    Contextual Biasing of Language Models for Speech Recognition in Goal-Oriented Conversational Agents

    Authors: Ashish Shenoy, Sravan Bodapati, Katrin Kirchhoff

    Abstract: Goal-oriented conversational interfaces are designed to accomplish specific tasks and typically have interactions that tend to span multiple turns adhering to a pre-defined structure and a goal. However, conventional neural language models (NLM) in Automatic Speech Recognition (ASR) systems are mostly trained sentence-wise with limited context. In this paper, we explore different ways to incorpora…

    Submitted 4 June, 2021; v1 submitted 18 March, 2021; originally announced March 2021.

    Comments: An updated version with extensions is uploaded at arXiv:2104.11070

  19. arXiv:2102.06380  [pdf, ps, other]

    cs.CL eess.AS

    Neural Inverse Text Normalization

    Authors: Monica Sunkara, Chaitanya Shivade, Sravan Bodapati, Katrin Kirchhoff

    Abstract: While there have been several contributions exploring state of the art techniques for text normalization, the problem of inverse text normalization (ITN) remains relatively unexplored. The best known approaches leverage finite state transducer (FST) based models which rely on manually curated rules and are hence not scalable. We propose an efficient and robust neural solution for ITN leveraging tr…

    Submitted 12 February, 2021; originally announced February 2021.

    Comments: 5 pages, accepted to ICASSP 2021

  20. arXiv:2011.15023  [pdf, other]

    cs.CL eess.AS

    Transformer-Transducers for Code-Switched Speech Recognition

    Authors: Siddharth Dalmia, Yuzong Liu, Srikanth Ronanki, Katrin Kirchhoff

    Abstract: We live in a world where 60% of the population can speak two or more languages fluently. Members of these communities constantly switch between languages when having a conversation. As automatic speech recognition (ASR) systems are being deployed to the real-world, there is a need for practical systems that can handle multiple languages both within an utterance or across utterances. In this paper,…

    Submitted 14 February, 2021; v1 submitted 30 November, 2020; originally announced November 2020.

    Comments: Accepted at ICASSP 2021

  21. arXiv:2010.14233  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    Align-Refine: Non-Autoregressive Speech Recognition via Iterative Realignment

    Authors: Ethan A. Chi, Julian Salazar, Katrin Kirchhoff

    Abstract: Non-autoregressive models greatly improve decoding speed over typical sequence-to-sequence models, but suffer from degraded performance. Infilling and iterative refinement models make up some of this gap by editing the outputs of a non-autoregressive model, but are constrained in the edits that they can make. We propose iterative realignment, where refinements occur over latent alignments rather t…

    Submitted 24 October, 2020; originally announced October 2020.

    ACM Class: I.2.7

  22. arXiv:2008.00702  [pdf, other]

    eess.AS cs.CL

    Multimodal Semi-supervised Learning Framework for Punctuation Prediction in Conversational Speech

    Authors: Monica Sunkara, Srikanth Ronanki, Dhanush Bekal, Sravan Bodapati, Katrin Kirchhoff

    Abstract: In this work, we explore a multimodal semi-supervised learning approach for punctuation prediction by learning representations from large amounts of unlabelled audio and text data. Conventional approaches in speech processing typically use forced alignment to encoder per frame acoustic features to word level features and perform multimodal fusion of the resulting acoustic and lexical representatio…

    Submitted 3 August, 2020; originally announced August 2020.

    Comments: Accepted for Interspeech 2020

  23. arXiv:2007.02025  [pdf, other]

    cs.CL cs.SD eess.AS

    Robust Prediction of Punctuation and Truecasing for Medical ASR

    Authors: Monica Sunkara, Srikanth Ronanki, Kalpit Dixit, Sravan Bodapati, Katrin Kirchhoff

    Abstract: Automatic speech recognition (ASR) systems in the medical domain that focus on transcribing clinical dictations and doctor-patient conversations often pose many challenges due to the complexity of the domain. ASR output typically undergoes automatic punctuation to enable users to speak naturally, without having to vocalise awkward and explicit punctuation commands, such as "period", "add comma" or…

    Submitted 11 July, 2020; v1 submitted 4 July, 2020; originally announced July 2020.

    Comments: Accepted for ACL NLPMC workshop 2020

  24. arXiv:1912.01679  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    Deep Contextualized Acoustic Representations For Semi-Supervised Speech Recognition

    Authors: Shaoshi Ling, Yuzong Liu, Julian Salazar, Katrin Kirchhoff

    Abstract: We propose a novel approach to semi-supervised automatic speech recognition (ASR). We first exploit a large amount of unlabeled audio data via representation learning, where we reconstruct a temporal slice of filterbank features from past and future context frames. The resulting deep contextualized acoustic representations (DeCoAR) are then used to train a CTC-based end-to-end ASR system using a s…

    Submitted 9 April, 2020; v1 submitted 3 December, 2019; originally announced December 2019.

    Comments: Accepted to ICASSP 2020 (oral)

  25. arXiv:1910.14659  [pdf, other]

    cs.CL cs.LG eess.AS stat.ML

    Masked Language Model Scoring

    Authors: Julian Salazar, Davis Liang, Toan Q. Nguyen, Katrin Kirchhoff

    Abstract: Pretrained masked language models (MLMs) require finetuning for most NLP tasks. Instead, we evaluate MLMs out of the box via their pseudo-log-likelihood scores (PLLs), which are computed by masking tokens one by one. We show that PLLs outperform scores from autoregressive language models like GPT-2 in a variety of tasks. By rescoring ASR and NMT hypotheses, RoBERTa reduces an end-to-end LibriSpeec…

    Submitted 31 December, 2020; v1 submitted 31 October, 2019; originally announced October 2019.

    Comments: ACL 2020 camera-ready (presented July 2020)

    Journal ref: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (2020), 2699-2712

  26. arXiv:1907.00457  [pdf, other]

    cs.CL cs.LG cs.SD eess.AS

    BERTphone: Phonetically-Aware Encoder Representations for Utterance-Level Speaker and Language Recognition

    Authors: Shaoshi Ling, Julian Salazar, Yuzong Liu, Katrin Kirchhoff

    Abstract: We introduce BERTphone, a Transformer encoder trained on large speech corpora that outputs phonetically-aware contextual representation vectors that can be used for both speaker and language recognition. This is accomplished by training on two objectives: the first, inspired by adapting BERT to the continuous domain, involves masking spans of input frames and reconstructing the whole sequence for…

    Submitted 29 December, 2021; v1 submitted 30 June, 2019; originally announced July 2019.

    Comments: Odyssey 2020 camera-ready (presented Nov. 2020)

    Journal ref: Proc. the Speaker and Language Recognition Workshop (Odyssey 2020), 9-16

  27. arXiv:1903.08268  [pdf, other]

    cs.CL cs.LG

    Simple, Fast, Accurate Intent Classification and Slot Labeling for Goal-Oriented Dialogue Systems

    Authors: Arshit Gupta, John Hewitt, Katrin Kirchhoff

    Abstract: With the advent of conversational assistants, like Amazon Alexa, Google Now, etc., dialogue systems are gaining a lot of traction, especially in industrial setting. These systems typically consist of Spoken Language understanding component which, in turn, consists of two tasks - Intent Classification (IC) and Slot Labeling (SL). Generally, these two tasks are modeled together jointly to achieve be…

    Submitted 17 July, 2019; v1 submitted 19 March, 2019; originally announced March 2019.

    Comments: SIGDIAL 2019

  28. arXiv:1901.10055  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    Self-Attention Networks for Connectionist Temporal Classification in Speech Recognition

    Authors: Julian Salazar, Katrin Kirchhoff, Zhiheng Huang

    Abstract: The success of self-attention in NLP has led to recent applications in end-to-end encoder-decoder architectures for speech recognition. Separately, connectionist temporal classification (CTC) has matured as an alignment-free, non-autoregressive approach to sequence transduction, either by itself or in various multitask and decoding frameworks. We propose SAN-CTC, a deep, fully self-attentional net…

    Submitted 19 February, 2019; v1 submitted 22 January, 2019; originally announced January 2019.

    Comments: Accepted to ICASSP 2019

  29. arXiv:1901.08608  [pdf, other]

    cs.SD cs.MM eess.AS

    Multi-stream Network With Temporal Attention For Environmental Sound Classification

    Authors: Xinyu Li, Venkata Chebiyyam, Katrin Kirchhoff

    Abstract: Environmental sound classification systems often do not perform robustly across different sound classification tasks and audio signals of varying temporal structures. We introduce a multi-stream convolutional neural network with temporal attention that addresses these problems. The network relies on three input streams consisting of raw audio and spectral features and utilizes a temporal attention…

    Submitted 24 January, 2019; originally announced January 2019.

  30. arXiv:1801.08660  [pdf, other]

    cs.CL stat.ML

    Context Models for OOV Word Translation in Low-Resource Languages

    Authors: Angli Liu, Katrin Kirchhoff

    Abstract: Out-of-vocabulary word translation is a major problem for the translation of low-resource languages that suffer from a lack of parallel training data. This paper evaluates the contributions of target-language context models towards the translation of OOV words, specifically in those cases where OOV translations are derived from external knowledge sources, such as dictionaries. We develop both neur…

    Submitted 25 January, 2018; originally announced January 2018.

    Comments: to be published at AMTA 2018

    MSC Class: 68T50

  31. Syntactic and Semantic Features For Code-Switching Factored Language Models

    Authors: Heike Adel, Ngoc Thang Vu, Katrin Kirchhoff, Dominic Telaar, Tanja Schultz

    Abstract: This paper presents our latest investigations on different features for factored language models for Code-Switching speech and their effect on automatic speech recognition (ASR) performance. We focus on syntactic and semantic features which can be extracted from Code-Switching text data and integrate them into factored language models. Different possible factors, such as words, part-of-speech tags…

    Submitted 4 October, 2017; originally announced October 2017.

    Comments: IEEE/ACM Transactions on Audio, Speech, and Language Processing (Volume: 23, Issue: 3, March 2015)

  32. arXiv:1509.01938  [pdf, other]

    cs.CL

    Exploiting Out-of-Domain Data Sources for Dialectal Arabic Statistical Machine Translation

    Authors: Katrin Kirchhoff, Bing Zhao, Wen Wang

    Abstract: Statistical machine translation for dialectal Arabic is characterized by a lack of data since data acquisition involves the transcription and translation of spoken language. In this study we develop techniques for extracting parallel data for one particular dialect of Arabic (Iraqi Arabic) from out-of-domain corpora in different dialects of Arabic or in Modern Standard Arabic. We compare two diffe…

    Submitted 7 September, 2015; originally announced September 2015.