
Showing 1–29 of 29 results for author: Chelba, C

  1. arXiv:2407.17605  [pdf, other]

    cs.CL cs.LG cs.SD eess.AS

    Coupling Speech Encoders with Downstream Text Models

    Authors: Ciprian Chelba, Johan Schalkwyk

    Abstract: We present a modular approach to building cascade speech translation (AST) models that guarantees that the resulting model performs no worse than the 1-best cascade baseline while preserving state-of-the-art speech recognition (ASR) and text translation (MT) performance for a given task. Our novel contribution is the use of an "exporter" layer that is trained under L2-loss to ensure a strong mat…

    Submitted 24 July, 2024; originally announced July 2024.

  2. arXiv:2304.00173  [pdf, other]

    cs.CL cs.AI cs.SD eess.AS

    Lego-Features: Exporting modular encoder features for streaming and deliberation ASR

    Authors: Rami Botros, Rohit Prabhavalkar, Johan Schalkwyk, Ciprian Chelba, Tara N. Sainath, Françoise Beaufays

    Abstract: In end-to-end (E2E) speech recognition models, a representational tight-coupling inevitably emerges between the encoder and the decoder. We build upon recent work that has begun to explore building encoders with modular encoded representations, such that encoders and decoders from different models can be stitched together in a zero-shot manner without further fine-tuning. While previous research o…

    Submitted 31 March, 2023; originally announced April 2023.

  3. arXiv:2211.09070  [pdf, other]

    cs.CL

    Towards Computationally Verifiable Semantic Grounding for Language Models

    Authors: Chris Alberti, Kuzman Ganchev, Michael Collins, Sebastian Gehrmann, Ciprian Chelba

    Abstract: The paper presents an approach to semantic grounding of language models (LMs) that conceptualizes the LM as a conditional model generating text given a desired semantic message formalized as a set of entity-relationship triples. It embeds the LM in an auto-encoder by feeding its output to a semantic parser whose output is in the same representation domain as the input message. Compared to a baseli…

    Submitted 16 November, 2022; originally announced November 2022.

  4. arXiv:2109.07740  [pdf, other]

    cs.LG cs.AI cs.CL

    Scaling Laws for Neural Machine Translation

    Authors: Behrooz Ghorbani, Orhan Firat, Markus Freitag, Ankur Bapna, Maxim Krikun, Xavier Garcia, Ciprian Chelba, Colin Cherry

    Abstract: We present an empirical study of scaling properties of encoder-decoder Transformer models used in neural machine translation (NMT). We show that cross-entropy loss as a function of model size follows a certain scaling law. Specifically (i) We propose a formula which describes the scaling behavior of cross-entropy loss as a bivariate function of encoder and decoder size, and show that it gives accu…

    Submitted 16 September, 2021; originally announced September 2021.

    Comments: 31 pages, 23 figures

  5. arXiv:2010.13856  [pdf, ps, other]

    cs.CL cs.LG

    Data Troubles in Sentence Level Confidence Estimation for Machine Translation

    Authors: Ciprian Chelba, Junpei Zhou, Yuezhang Li, Hideto Kazawa, Jeff Klingner, Mengmeng Niu

    Abstract: The paper investigates the feasibility of confidence estimation for neural machine translation models operating at the high end of the performance spectrum. As a side product of the data annotation process necessary for building such models we propose sentence level accuracy $SACC$ as a simple, self-explanatory evaluation metric for quality of translation. Experiments on two different annotator…

    Submitted 26 October, 2020; originally announced October 2020.

  6. arXiv:2007.09081  [pdf, other]

    cs.LG cs.CV stat.ML

    Multi-Stage Influence Function

    Authors: Hongge Chen, Si Si, Yang Li, Ciprian Chelba, Sanjiv Kumar, Duane Boning, Cho-Jui Hsieh

    Abstract: Multi-stage training and knowledge transfer, from a large-scale pretraining task to various finetuning tasks, have revolutionized natural language processing and computer vision resulting in state-of-the-art performance improvements. In this paper, we develop a multi-stage influence function score to track predictions from a finetuned model all the way back to the pretraining data. With this score…

    Submitted 17 July, 2020; originally announced July 2020.

  7. arXiv:2005.03519  [pdf, ps, other]

    cs.CL

    Practical Perspectives on Quality Estimation for Machine Translation

    Authors: Junpei Zhou, Ciprian Chelba, Yuezhang Li

    Abstract: Sentence level quality estimation (QE) for machine translation (MT) attempts to predict the translation edit rate (TER) cost of post-editing work required to correct MT output. We describe our view on sentence-level QE as dictated by several practical setups encountered in the industry. We find consumers of MT output---whether human or algorithmic ones---to be primarily interested in a binary qual…

    Submitted 1 May, 2020; originally announced May 2020.

  8. arXiv:2001.04589  [pdf, ps, other]

    cs.LG cs.CL stat.ML

    Faster Transformer Decoding: N-gram Masked Self-Attention

    Authors: Ciprian Chelba, Mia Chen, Ankur Bapna, Noam Shazeer

    Abstract: Motivated by the fact that most of the information relevant to the prediction of target tokens is drawn from the source sentence $S=s_1, \ldots, s_S$, we propose truncating the target-side window used for computing self-attention by making an $N$-gram assumption. Experiments on WMT EnDe and EnFr data sets show that the $N$-gram masked self-attention model loses very little in BLEU score for $N$ va…

    Submitted 13 January, 2020; originally announced January 2020.

  9. arXiv:1906.06442  [pdf, other]

    cs.CL cs.LG

    Tagged Back-Translation

    Authors: Isaac Caswell, Ciprian Chelba, David Grangier

    Abstract: Recent work in Neural Machine Translation (NMT) has shown significant quality gains from noised-beam decoding during back-translation, a method to generate synthetic parallel data. We show that the main role of such synthetic noise is not to diversify the source side, as previously suggested, but simply to indicate to the model that the given source is synthetic. We propose a simpler alternative t…

    Submitted 14 June, 2019; originally announced June 2019.

    Comments: Accepted as oral presentation in WMT 2019; 9 pages; 9 tables; 1 figure

  10. arXiv:1906.01130  [pdf, other]

    cs.CL cs.LG

    Dynamically Composing Domain-Data Selection with Clean-Data Selection by "Co-Curricular Learning" for Neural Machine Translation

    Authors: Wei Wang, Isaac Caswell, Ciprian Chelba

    Abstract: Noise and domain are important aspects of data quality for neural machine translation. Existing research focuses separately on domain-data selection, clean-data selection, or their static combination, leaving the dynamic interaction across them not explicitly examined. This paper introduces a "co-curricular learning" method to compose dynamic domain-data selection with dynamic clean-data selection,…

    Submitted 3 June, 2019; originally announced June 2019.

    Comments: 11 pages

    Journal ref: The 57th Annual Meeting of the Association for Computational Linguistics (ACL2019)

  11. arXiv:1902.08295  [pdf, other]

    cs.LG stat.ML

    Lingvo: a Modular and Scalable Framework for Sequence-to-Sequence Modeling

    Authors: Jonathan Shen, Patrick Nguyen, Yonghui Wu, Zhifeng Chen, Mia X. Chen, Ye Jia, Anjuli Kannan, Tara Sainath, Yuan Cao, Chung-Cheng Chiu, Yanzhang He, Jan Chorowski, Smit Hinsu, Stella Laurenzo, James Qin, Orhan Firat, Wolfgang Macherey, Suyog Gupta, Ankur Bapna, Shuyuan Zhang, Ruoming Pang, Ron J. Weiss, Rohit Prabhavalkar, Qiao Liang, Benoit Jacob, et al. (66 additional authors not shown)

    Abstract: Lingvo is a Tensorflow framework offering a complete solution for collaborative deep learning research, with a particular focus towards sequence-to-sequence models. Lingvo models are composed of modular building blocks that are flexible and easily extensible, and experiment configurations are centralized and highly customizable. Distributed training and quantized inference are supported directly w…

    Submitted 21 February, 2019; originally announced February 2019.

  12. arXiv:1809.00068  [pdf, ps, other]

    cs.CL cs.LG stat.ML

    Denoising Neural Machine Translation Training with Trusted Data and Online Data Selection

    Authors: Wei Wang, Taro Watanabe, Macduff Hughes, Tetsuji Nakagawa, Ciprian Chelba

    Abstract: Measuring domain relevance of data and identifying or selecting well-fit domain data for machine translation (MT) is a well-studied topic, but denoising is not yet. Denoising is concerned with a different type of data quality and tries to reduce the negative impact of data noise on MT training, in particular, neural MT (NMT) training. This paper generalizes methods for measuring and selecting data…

    Submitted 31 August, 2018; originally announced September 2018.

    Comments: 11 pages, 2018 Third Conference on Machine Translation (WMT18)

  13. arXiv:1806.06950  [pdf, other]

    cs.CL cs.AI cs.LG stat.ML

    GroupReduce: Block-Wise Low-Rank Approximation for Neural Language Model Shrinking

    Authors: Patrick H. Chen, Si Si, Yang Li, Ciprian Chelba, Cho-jui Hsieh

    Abstract: Model compression is essential for serving large deep neural nets on devices with limited resources or applications that require real-time responses. As a case study, a state-of-the-art neural language model usually consists of one or more recurrent layers sandwiched between an embedding layer used for representing input tokens and a softmax layer for generating output tokens. For problems with a…

    Submitted 18 June, 2018; originally announced June 2018.

  14. arXiv:1703.10724  [pdf, ps, other]

    cs.CL

    N-gram Language Modeling using Recurrent Neural Network Estimation

    Authors: Ciprian Chelba, Mohammad Norouzi, Samy Bengio

    Abstract: We investigate the effective memory depth of RNN models by using them for $n$-gram language model (LM) smoothing. Experiments on a small corpus (UPenn Treebank, one million words of training data and 10k vocabulary) have found the LSTM cell with dropout to be the best model for encoding the $n$-gram state when compared with feed-forward and vanilla RNN models. When preserving the sentence indepe…

    Submitted 19 June, 2017; v1 submitted 30 March, 2017; originally announced March 2017.

    Comments: 10 pages, including references

  15. arXiv:1511.01574  [pdf, ps, other]

    cs.CL

    Multinomial Loss on Held-out Data for the Sparse Non-negative Matrix Language Model

    Authors: Ciprian Chelba, Fernando Pereira

    Abstract: We describe Sparse Non-negative Matrix (SNM) language model estimation using multinomial loss on held-out data. Being able to train on held-out data is important in practical situations where the training data is usually mismatched from the held-out/test data. It is also less constrained than the previous training algorithm using leave-one-out on training data: it allows the use of richer meta-f…

    Submitted 22 February, 2016; v1 submitted 4 November, 2015; originally announced November 2015.

  16. arXiv:1412.1454  [pdf, ps, other]

    cs.LG cs.CL

    Skip-gram Language Modeling Using Sparse Non-negative Matrix Probability Estimation

    Authors: Noam Shazeer, Joris Pelemans, Ciprian Chelba

    Abstract: We present a novel family of language model (LM) estimation techniques named Sparse Non-negative Matrix (SNM) estimation. A first set of experiments empirically evaluating it on the One Billion Word Benchmark shows that SNM $n$-gram LMs perform almost as well as the well-established Kneser-Ney (KN) models. When using skip-gram features the models are able to match the state-of-the-art recurrent ne…

    Submitted 26 June, 2015; v1 submitted 3 December, 2014; originally announced December 2014.

    Report number: Google Research Publication Id: 43222

  17. arXiv:1312.3005  [pdf, ps, other]

    cs.CL

    One Billion Word Benchmark for Measuring Progress in Statistical Language Modeling

    Authors: Ciprian Chelba, Tomas Mikolov, Mike Schuster, Qi Ge, Thorsten Brants, Philipp Koehn, Tony Robinson

    Abstract: We propose a new benchmark corpus to be used for measuring progress in statistical language modeling. With almost one billion words of training data, we hope this benchmark will be useful to quickly evaluate novel language modeling techniques, and to compare their contribution when combined with other advanced techniques. We show performance of several well-known types of language models, with the…

    Submitted 4 March, 2014; v1 submitted 10 December, 2013; originally announced December 2013.

    Comments: Accompanied by a code.google.com project allowing anyone to generate the benchmark data, and use it to compare their language model against the ones described in the paper

  18. Large Scale Distributed Acoustic Modeling With Back-off N-grams

    Authors: Ciprian Chelba, Peng Xu, Fernando Pereira, Thomas Richardson

    Abstract: The paper revives an older approach to acoustic modeling that borrows from n-gram language modeling in an attempt to scale up both the amount of training data and model size (as measured by the number of parameters in the model), to approximately 100 times larger than current sizes used in automatic speech recognition. In such a data-rich setting, we can expand the phonetic context significantly b…

    Submitted 5 February, 2013; originally announced February 2013.

    MSC Class: 68T10

    ACM Class: I.2.7

  19. arXiv:1210.8440  [pdf, other]

    cs.CL

    Large Scale Language Modeling in Automatic Speech Recognition

    Authors: Ciprian Chelba, Dan Bikel, Maria Shugrina, Patrick Nguyen, Shankar Kumar

    Abstract: Large language models have been proven quite beneficial for a variety of automatic speech recognition tasks in Google. We summarize results on Voice Search and a few YouTube speech transcription tasks to highlight the impact that one can expect from increasing both the amount of training data, and the size of the language model estimated from such data. Depending on the task, availability and amou…

    Submitted 31 October, 2012; originally announced October 2012.

  20. arXiv:1210.8436  [pdf, ps, other]

    cs.CL cs.IR

    Optimal size, freshness and time-frame for voice search vocabulary

    Authors: Maryam Kamvar, Ciprian Chelba

    Abstract: In this paper, we investigate how to optimize the vocabulary for a voice search language model. The metric we optimize over is the out-of-vocabulary (OoV) rate since it is a strong indicator of user experience. In a departure from the usual way of measuring OoV rates, web search logs allow us to compute the per-session OoV rate and thus estimate the percentage of users that experience a given OoV…

    Submitted 31 October, 2012; originally announced October 2012.

  21. arXiv:cs/0110015  [pdf, ps, other]

    cs.CL

    Richer Syntactic Dependencies for Structured Language Modeling

    Authors: Ciprian Chelba, Peng Xu

    Abstract: The paper investigates the use of richer syntactic dependencies in the structured language model (SLM). We present two simple methods of enriching the dependencies in the syntactic parse trees used for initializing the SLM. We evaluate the impact of both methods on the perplexity (PPL) and word-error-rate (WER, N-best rescoring) performance of the SLM. We show that the new model achieves an improv…

    Submitted 3 October, 2001; originally announced October 2001.

    Comments: Proceedings of ASRU 2001, 4 pages

    ACM Class: I.2.7; G.3

  22. arXiv:cs/0108023  [pdf, ps, other]

    cs.CL cs.IR

    Information Extraction Using the Structured Language Model

    Authors: Ciprian Chelba, Milind Mahajan

    Abstract: The paper presents a data-driven approach to information extraction (viewed as template filling) using the structured language model (SLM) as a statistical parser. The task of template filling is cast as constrained parsing using the SLM. The model is automatically trained from a set of sentences annotated with frame/slot labels and spans. Training proceeds in stages: first a constrained syntact…

    Submitted 29 August, 2001; originally announced August 2001.

    Comments: EMNLP'01, Pittsburgh; 8 pages

    ACM Class: I.2.7

    Journal ref: EMNLP/NAACL 2001 Conference Proceedings

  23. arXiv:cs/0108022  [pdf, ps, other]

    cs.CL

    Portability of Syntactic Structure for Language Modeling

    Authors: Ciprian Chelba

    Abstract: The paper presents a study on the portability of statistical syntactic knowledge in the framework of the structured language model (SLM). We investigate the impact of porting SLM statistics from the Wall Street Journal (WSJ) to the Air Travel Information System (ATIS) domain. We compare this approach to applying the Microsoft rule-based parser (NLPwin) for the ATIS data and to using a small amou…

    Submitted 28 August, 2001; originally announced August 2001.

    Comments: ICASSP 2001, Salt Lake City; 4 pages

    ACM Class: I.2.7

    Journal ref: ICASSP 2001 Proceedings

  24. arXiv:cs/0001023  [pdf, ps, other]

    cs.CL

    Structured Language Modeling for Speech Recognition

    Authors: Ciprian Chelba, Frederick Jelinek

    Abstract: A new language model for speech recognition is presented. The model develops hidden hierarchical syntactic-like structure incrementally and uses it to extract meaningful information from the word history, thus complementing the locality of currently used trigram models. The structured language model (SLM) and its performance in a two-pass speech recognizer --- lattice decoding --- are presented.…

    Submitted 25 January, 2000; originally announced January 2000.

    Comments: 4 pages + 2 pages of ERRATA

    ACM Class: G.3, I.2.7, I.5.1, I.5.4

    Journal ref: Proceedings of NLDB'99, Klagenfurt, Austria

  25. arXiv:cs/0001022  [pdf, ps, other]

    cs.CL

    Recognition Performance of a Structured Language Model

    Authors: Ciprian Chelba, Frederick Jelinek

    Abstract: A new language model for speech recognition inspired by linguistic analysis is presented. The model develops hidden hierarchical structure incrementally and uses it to extract meaningful information from the word history - thus enabling the use of extended distance dependencies - in an attempt to complement the locality of currently used trigram models. The structured language model, its probabi…

    Submitted 24 January, 2000; originally announced January 2000.

    Comments: 4 pages

    ACM Class: G.3, I.2.7, I.5.1, I.5.4

    Journal ref: Proceedings of Eurospeech, 1999, pp. 1567-1570, Budapest, Hungary

  26. arXiv:cs/0001021  [pdf, ps, other]

    cs.CL

    Refinement of a Structured Language Model

    Authors: Ciprian Chelba, Frederick Jelinek

    Abstract: A new language model for speech recognition inspired by linguistic analysis is presented. The model develops hidden hierarchical structure incrementally and uses it to extract meaningful information from the word history - thus enabling the use of extended distance dependencies - in an attempt to complement the locality of currently used n-gram Markov models. The model, its probabilistic paramet…

    Submitted 24 January, 2000; originally announced January 2000.

    Comments: 10 pages

    ACM Class: G.3, I.2.7, I.5.1, I.5.4

    Journal ref: Proceedings of the International Conference on Advances in Pattern Recognition, 1998, pp. 275-284, Plymouth, UK

  27. arXiv:cs/0001020  [pdf, ps, other]

    cs.CL

    Exploiting Syntactic Structure for Natural Language Modeling

    Authors: Ciprian Chelba

    Abstract: The thesis presents an attempt at using the syntactic structure in natural language for improved language models for speech recognition. The structured language model merges techniques in automatic parsing and language modeling using an original probabilistic parameterization of a shift-reduce parser. A maximum likelihood reestimation procedure belonging to the class of expectation-maximization…

    Submitted 24 January, 2000; originally announced January 2000.

    Comments: Advisor: Frederick Jelinek, Ph.D. Thesis, 122 pages; removed unused .eps file

    ACM Class: G.3, I.2.7, I.5.1, I.5.4

  28. arXiv:cs/9811025  [pdf, ps, other]

    cs.CL

    A Structured Language Model

    Authors: Ciprian Chelba

    Abstract: The paper presents a language model that develops syntactic structure and uses it to extract meaningful information from the word history, thus enabling the use of long distance dependencies. The model assigns probability to every joint sequence of words - binary-parse-structure with headword annotation. The model, its probabilistic parametrization, and a set of experiments meant to evaluate its…

    Submitted 25 January, 2000; v1 submitted 13 November, 1998; originally announced November 1998.

    Comments: changed ACM-class membership, Proceedings of ACL-EACL'97, Student Section, Madrid, Spain

    ACM Class: G.3, I.2.7, I.5.1, I.5.4

  29. arXiv:cs/9811022  [pdf, ps, other]

    cs.CL

    Exploiting Syntactic Structure for Language Modeling

    Authors: Ciprian Chelba, Frederick Jelinek

    Abstract: The paper presents a language model that develops syntactic structure and uses it to extract meaningful information from the word history, thus enabling the use of long distance dependencies. The model assigns probability to every joint sequence of words--binary-parse-structure with headword annotation and operates in a left-to-right manner --- therefore usable for automatic speech recognition.…

    Submitted 25 January, 2000; v1 submitted 12 November, 1998; originally announced November 1998.

    Comments: changed ACM-class membership and buggy author names

    ACM Class: G.3, I.2.7, I.5.1, I.5.4

    Journal ref: Proceedings of ACL'98, Montreal, Canada