Showing 1–34 of 34 results for author: Lane, I

  1. arXiv:2207.05071  [pdf, other]

    cs.LG cs.AI cs.SD eess.AS

    Online Continual Learning of End-to-End Speech Recognition Models

    Authors: Muqiao Yang, Ian Lane, Shinji Watanabe

    Abstract: Continual Learning, also known as Lifelong Learning, aims to continually learn from new data as it becomes available. While prior research on continual learning in automatic speech recognition has focused on the adaptation of models across multiple different speech recognition tasks, in this paper we propose an experimental setting for online continual learning for automatic speech recogn…

    Submitted 11 July, 2022; originally announced July 2022.

    Comments: Accepted at InterSpeech 2022

  2. arXiv:2207.02971  [pdf, other]

    cs.CL cs.SD eess.AS

    Branchformer: Parallel MLP-Attention Architectures to Capture Local and Global Context for Speech Recognition and Understanding

    Authors: Yifan Peng, Siddharth Dalmia, Ian Lane, Shinji Watanabe

    Abstract: Conformer has proven to be effective in many speech processing tasks. It combines the benefits of extracting local dependencies using convolutions and global dependencies using self-attention. Inspired by this, we propose a more flexible, interpretable and customizable encoder alternative, Branchformer, with parallel branches for modeling various ranged dependencies in end-to-end speech processing…

    Submitted 6 July, 2022; originally announced July 2022.

    Comments: Accepted at ICML 2022

  3. arXiv:2104.12693  [pdf, other]

    cs.SD eess.AS

    Identifying Actions for Sound Event Classification

    Authors: Benjamin Elizalde, Radu Revutchi, Samarjit Das, Bhiksha Raj, Ian Lane, Laurie M. Heller

    Abstract: In Psychology, actions are paramount for humans to identify sound events. In Machine Learning (ML), action recognition achieves high accuracy; however, it has not been asked whether identifying actions can benefit Sound Event Classification (SEC), as opposed to mapping the audio directly to a sound event. Therefore, we propose a new Psychology-inspired approach for SEC that includes identification…

    Submitted 5 August, 2021; v1 submitted 26 April, 2021; originally announced April 2021.

  4. arXiv:1907.13280  [pdf, other]

    cs.CL

    Learning Question-Guided Video Representation for Multi-Turn Video Question Answering

    Authors: Guan-Lin Chao, Abhinav Rastogi, Semih Yavuz, Dilek Hakkani-Tür, Jindong Chen, Ian Lane

    Abstract: Understanding and conversing about dynamic scenes is one of the key capabilities of AI agents that navigate the environment and convey useful information to humans. Video question answering is a specific scenario of such AI-human interaction where an agent generates a natural language response to a question regarding the video of a dynamic scene. Incorporating features from multiple modalities, wh…

    Submitted 30 July, 2019; originally announced July 2019.

    Comments: Accepted at SIGDIAL 2019

  5. arXiv:1907.03040  [pdf, other]

    cs.CL

    BERT-DST: Scalable End-to-End Dialogue State Tracking with Bidirectional Encoder Representations from Transformer

    Authors: Guan-Lin Chao, Ian Lane

    Abstract: An important yet rarely tackled problem in dialogue state tracking (DST) is scalability for dynamic ontology (e.g., movie, restaurant) and unseen slot values. We focus on a specific condition, where the ontology is unknown to the state tracker, but the target slot value (except for none and dontcare), possibly unseen during training, can be found as a word segment in the dialogue context. Prior appr…

    Submitted 5 July, 2019; originally announced July 2019.

    Comments: Published in Interspeech 2019

  6. arXiv:1906.05962  [pdf, other]

    eess.AS cs.CL cs.CV cs.SD

    Speaker-Targeted Audio-Visual Models for Speech Recognition in Cocktail-Party Environments

    Authors: Guan-Lin Chao, William Chan, Ian Lane

    Abstract: Speech recognition in cocktail-party environments remains a significant challenge for state-of-the-art speech recognition systems, as it is extremely difficult to extract an acoustic signal of an individual speaker from a background of overlapping speech with similar frequency and temporal characteristics. We propose the use of speaker-targeted acoustic and audio-visual models for this task. We co…

    Submitted 13 June, 2019; originally announced June 2019.

    Comments: Published in INTERSPEECH 2016

  7. arXiv:1904.07326  [pdf, other]

    physics.atom-ph physics.chem-ph

    Assignment of excited-state bond lengths using branching-ratio measurements: The B$^2Σ^+$ state of BaH molecules

    Authors: K. Moore, I. C. Lane, R. L. McNally, T. Zelevinsky

    Abstract: Vibrational branching ratios in the B$^2Σ^+$ -- X$^2Σ^+$ and A$^2Π$ -- X$^2Σ^+$ optical-cycling transitions of BaH molecules are investigated using measurements and ab initio calculations. The experimental values are determined using fluorescence and absorption detection. The observed branching ratios have a very sensitive dependence on the difference in the equilibrium bond length between t…

    Submitted 9 August, 2019; v1 submitted 15 April, 2019; originally announced April 2019.

    Journal ref: Phys. Rev. A 100, 022506 (2019)

  8. arXiv:1811.10761   

    cs.CL

    Speaker Diarization With Lexical Information

    Authors: Tae Jin Park, Kyu Han, Ian Lane, Panayiotis Georgiou

    Abstract: This work presents a novel approach to leverage lexical information for speaker diarization. We introduce a speaker diarization system that can directly integrate lexical as well as acoustic information into a speaker clustering process. Thus, we propose an adjacency matrix integration technique to integrate word level speaker turn probabilities with speaker embeddings in a comprehensive way. Our…

    Submitted 28 November, 2018; v1 submitted 26 November, 2018; originally announced November 2018.

    Comments: This version removed by arXiv administrators because the author did not have the right to agree to our license at the time of submission

  9. arXiv:1810.04038  [pdf, other]

    cs.LG cs.AI stat.ML

    Understanding and Improving Recurrent Networks for Human Activity Recognition by Continuous Attention

    Authors: Ming Zeng, Haoxiang Gao, Tong Yu, Ole J. Mengshoel, Helge Langseth, Ian Lane, Xiaobing Liu

    Abstract: Deep neural networks, including recurrent networks, have been successfully applied to human activity recognition. Unfortunately, the final representation learned by recurrent networks might encode some noise (irrelevant signal components, unimportant sensor modalities, etc.). Besides, it is difficult to interpret the recurrent networks to gain insight into the models' behavior. To address these is…

    Submitted 7 October, 2018; originally announced October 2018.

    Comments: 8 pages. Published in The International Symposium on Wearable Computers (ISWC) 2018

    Journal ref: The International Symposium on Wearable Computers (ISWC) 2018

  10. arXiv:1805.11762  [pdf, other]

    cs.CL

    Adversarial Learning of Task-Oriented Neural Dialog Models

    Authors: Bing Liu, Ian Lane

    Abstract: In this work, we propose an adversarial learning method for reward estimation in reinforcement learning (RL) based task-oriented dialog models. Most of the current RL based task-oriented dialog systems require access to a reward signal from either user feedback or user ratings. Such user ratings, however, may not always be consistent or available in practice. Furthermore, online dialog policy…

    Submitted 29 May, 2018; originally announced May 2018.

    Comments: To appear at SIGDIAL 2018

  11. arXiv:1803.04849  [pdf, other]

    physics.chem-ph physics.atom-ph

    Quantitative theoretical analysis of lifetimes and decay rates relevant in laser cooling BaH

    Authors: Keith Moore, Ian C Lane

    Abstract: Tiny radiative losses below the 0.1% level can prove ruinous to the effective laser cooling of a molecule. In this paper the laser cooling of a hydride is studied with rovibronic detail using ab initio quantum chemistry in order to document the decays to all possible electronic states (not just the vibrational branching within a single electronic transition) and to identify the most populated fina…

    Submitted 15 March, 2018; v1 submitted 13 March, 2018; originally announced March 2018.

  12. arXiv:1801.07827  [pdf, other]

    cs.LG stat.ML

    Semi-Supervised Convolutional Neural Networks for Human Activity Recognition

    Authors: Ming Zeng, Tong Yu, Xiao Wang, Le T. Nguyen, Ole J. Mengshoel, Ian Lane

    Abstract: Labeled data used for training activity recognition classifiers are usually limited in terms of size and diversity. Thus, the learned model may not generalize well when used in real-world use cases. Semi-supervised learning augments labeled examples with unlabeled examples, often resulting in improved performance. However, the semi-supervised methods studied in the activity recognition literature…

    Submitted 22 January, 2018; originally announced January 2018.

    Comments: Accepted by BigData2017

  13. arXiv:1801.00059  [pdf, other]

    cs.CL

    The CAPIO 2017 Conversational Speech Recognition System

    Authors: Kyu J. Han, Akshay Chandrashekaran, Jungsuk Kim, Ian Lane

    Abstract: In this paper we show how we have achieved state-of-the-art performance on the industry-standard NIST 2000 Hub5 English evaluation set. We explore densely connected LSTMs, inspired by the densely connected convolutional networks recently introduced for image classification tasks. We also propose an acoustic model adaptation scheme that simply averages the parameters of a seed neural network ac…

    Submitted 9 April, 2018; v1 submitted 29 December, 2017; originally announced January 2018.

    Comments: 8 pages, 3 figures, 8 tables; extra experimental results added

  14. arXiv:1711.11310  [pdf, other]

    cs.CL

    Multi-Domain Adversarial Learning for Slot Filling in Spoken Language Understanding

    Authors: Bing Liu, Ian Lane

    Abstract: The goal of this paper is to learn cross-domain representations for the slot filling task in spoken language understanding (SLU). Most of the recently published SLU models are domain-specific ones that work on individual task domains. Annotating data for each individual task domain is both financially costly and non-scalable. In this work, we propose an adversarial training method in learning common f…

    Submitted 30 November, 2017; originally announced November 2017.

  15. arXiv:1711.08493  [pdf, other]

    cs.CL

    Customized Nonlinear Bandits for Online Response Selection in Neural Conversation Models

    Authors: Bing Liu, Tong Yu, Ian Lane, Ole J. Mengshoel

    Abstract: Dialog response selection is an important step towards natural response generation in conversational agents. Existing work on neural conversational models mainly focuses on offline supervised learning using a large set of context-response pairs. In this paper, we focus on online learning of response selection in retrieval-based dialog systems. We propose a contextual multi-armed bandit model with…

    Submitted 22 November, 2017; originally announced November 2017.

    Comments: Accepted at AAAI 2018

  16. arXiv:1709.06136  [pdf, other]

    cs.CL

    Iterative Policy Learning in End-to-End Trainable Task-Oriented Neural Dialog Models

    Authors: Bing Liu, Ian Lane

    Abstract: In this paper, we present a deep reinforcement learning (RL) framework for iterative dialog policy optimization in end-to-end task-oriented dialog systems. Popular approaches in learning dialog policy with RL include letting a dialog agent learn against a user simulator. Building a reliable user simulator, however, is not trivial, often as difficult as building a good dialog agent. We address t…

    Submitted 18 September, 2017; originally announced September 2017.

    Comments: Accepted at ASRU 2017

  17. An End-to-End Trainable Neural Network Model with Belief Tracking for Task-Oriented Dialog

    Authors: Bing Liu, Ian Lane

    Abstract: We present a novel end-to-end trainable neural network model for task-oriented dialog systems. The model is able to track dialog state, issue API calls to a knowledge base (KB), and incorporate structured KB query results into system responses to successfully complete task-oriented dialogs. The proposed model produces well-structured system responses by jointly learning belief tracking and KB result…

    Submitted 20 August, 2017; originally announced August 2017.

    Comments: Published at Interspeech 2017

  18. arXiv:1701.04056  [pdf, other]

    cs.CL

    Dialog Context Language Modeling with Recurrent Neural Networks

    Authors: Bing Liu, Ian Lane

    Abstract: In this work, we propose contextual language models that incorporate dialog level discourse information into language modeling. Previous work on contextual language models treats preceding utterances as a sequence of inputs, without considering dialog interactions. We design recurrent neural network (RNN) based contextual language models that specially track the interactions between speakers in a d…

    Submitted 15 January, 2017; originally announced January 2017.

    Comments: Accepted for publication at ICASSP 2017

  19. arXiv:1609.06026  [pdf, other]

    cs.SD cs.LG cs.MM

    An Approach for Self-Training Audio Event Detectors Using Web Data

    Authors: Benjamin Elizalde, Ankit Shah, Siddharth Dalmia, Min Hun Lee, Rohan Badlani, Anurag Kumar, Bhiksha Raj, Ian Lane

    Abstract: Audio Event Detection (AED) aims to recognize sounds within audio and video recordings. AED employs machine learning algorithms commonly trained and tested on annotated datasets. However, available datasets are limited in number of samples and hence it is difficult to model acoustic diversity. Therefore, we propose combining labeled audio from a dataset and unlabeled audio from the web to improve…

    Submitted 27 June, 2017; v1 submitted 20 September, 2016; originally announced September 2016.

    Comments: 5 pages

  20. arXiv:1609.01462  [pdf, other]

    cs.CL

    Joint Online Spoken Language Understanding and Language Modeling with Recurrent Neural Networks

    Authors: Bing Liu, Ian Lane

    Abstract: Speaker intent detection and semantic slot filling are two critical tasks in spoken language understanding (SLU) for dialogue systems. In this paper, we describe a recurrent neural network (RNN) model that jointly performs intent detection, slot filling, and language modeling. The neural network model keeps updating the intent estimation as each word in the transcribed utterance arrives and uses it as…

    Submitted 6 September, 2016; originally announced September 2016.

    Comments: Accepted at SIGDIAL 2016

  21. arXiv:1609.01454  [pdf, other]

    cs.CL

    Attention-Based Recurrent Neural Network Models for Joint Intent Detection and Slot Filling

    Authors: Bing Liu, Ian Lane

    Abstract: Attention-based encoder-decoder neural network models have recently shown promising results in machine translation and speech recognition. In this work, we propose an attention-based neural network model for joint intent detection and slot filling, both of which are critical steps for many speech understanding and dialog systems. Unlike in machine translation and speech recognition, alignment is e…

    Submitted 6 September, 2016; originally announced September 2016.

    Comments: Accepted at Interspeech 2016

  22. arXiv:1607.06706  [pdf, other]

    cs.SD

    Experiments on the DCASE Challenge 2016: Acoustic Scene Classification and Sound Event Detection in Real Life Recording

    Authors: Benjamin Elizalde, Anurag Kumar, Ankit Shah, Rohan Badlani, Emmanuel Vincent, Bhiksha Raj, Ian Lane

    Abstract: In this paper we present our work on Task 1 Acoustic Scene Classification and Task 3 Sound Event Detection in Real Life Recordings. Among our experiments we have low-level and high-level features, classifier optimization and other heuristics specific to each task. Our performance for both tasks improved the baseline from DCASE: for Task 1 we achieved an overall accuracy of 78.9% compared to the…

    Submitted 25 August, 2016; v1 submitted 22 July, 2016; originally announced July 2016.

  23. arXiv:1607.03766  [pdf, other]

    cs.SD cs.CL

    AudioPairBank: Towards A Large-Scale Tag-Pair-Based Audio Content Analysis

    Authors: Sebastian Sager, Benjamin Elizalde, Damian Borth, Christian Schulze, Bhiksha Raj, Ian Lane

    Abstract: Recently, sound recognition has been used to identify sounds, such as car and river. However, sounds have nuances that may be better described by adjective-noun pairs such as slow car, and verb-noun pairs such as flying insects, which are underexplored. Therefore, in this work we investigate the relation between audio content and both adjective-noun pairs and verb-noun pairs. Due to the lack of d…

    Submitted 8 January, 2018; v1 submitted 13 July, 2016; originally announced July 2016.

    Comments: This paper is a revised version of "AudioSentibank: Large-scale Semantic Ontology of Acoustic Concepts for Audio Content Analysis"

  24. arXiv:1607.03257  [pdf, other]

    cs.MM cs.CV cs.SD

    City-Identification of Flickr Videos Using Semantic Acoustic Features

    Authors: Benjamin Elizalde, Guan-Lin Chao, Ming Zeng, Ian Lane

    Abstract: City-identification of videos aims to determine the likelihood of a video belonging to a set of cities. In this paper, we present an approach using only audio, thus we do not use any additional modality such as images, user-tags or geo-tags. In this manner, we show to what extent the city-location of videos correlates to their acoustic information. Success in this task suggests improvements can be…

    Submitted 12 July, 2016; originally announced July 2016.

  25. arXiv:1601.02553  [pdf, other]

    cs.CL

    Environmental Noise Embeddings for Robust Speech Recognition

    Authors: Suyoun Kim, Bhiksha Raj, Ian Lane

    Abstract: We propose a novel deep neural network architecture for speech recognition that explicitly employs knowledge of the background environmental noise within a deep neural network acoustic model. A deep neural network is used to predict the acoustic environment in which the system is being used. The discriminative embedding generated at the bottleneck layer of this network is then concatenated with tr…

    Submitted 29 September, 2016; v1 submitted 11 January, 2016; originally announced January 2016.

  26. arXiv:1511.06407  [pdf, other]

    cs.LG cs.CL

    Recurrent Models for Auditory Attention in Multi-Microphone Distance Speech Recognition

    Authors: Suyoun Kim, Ian Lane

    Abstract: Integration of multiple microphone data is one of the key ways to achieve robust speech recognition in noisy environments or when the speaker is located at some distance from the input device. Signal processing techniques such as beamforming are widely used to extract a speech signal of interest from background noise. These techniques, however, are highly dependent on prior spatial information abo…

    Submitted 7 January, 2016; v1 submitted 19 November, 2015; originally announced November 2015.

    Comments: Under review as a conference paper at ICLR 2016

  27. arXiv:1509.06657  [pdf, other]

    physics.atom-ph physics.chem-ph

    Towards a spectroscopically accurate set of potentials for heavy hydride laser cooling candidates: effective core potential calculations of BaH

    Authors: Keith Moore, Brendan M. McLaughlin, Ian C. Lane

    Abstract: BaH (and its isotopomers) is an attractive molecular candidate for laser cooling to ultracold temperatures and a potential precursor for the production of ultracold gases of hydrogen and deuterium. The theoretical challenge is to simulate the laser cooling cycle as reliably as possible and this paper addresses the generation of a highly accurate ab initio $^{2}Σ^+$ potential for such studies. The…

    Submitted 28 March, 2016; v1 submitted 22 September, 2015; originally announced September 2015.

    Comments: 14 pages, 9 figures: final accepted version

  28. arXiv:1504.01483  [pdf, other]

    cs.LG cs.CL cs.NE stat.ML

    Transferring Knowledge from a RNN to a DNN

    Authors: William Chan, Nan Rosemary Ke, Ian Lane

    Abstract: Deep Neural Network (DNN) acoustic models have yielded many state-of-the-art results in Automatic Speech Recognition (ASR) tasks. More recently, Recurrent Neural Network (RNN) models have been shown to outperform DNN counterparts. However, state-of-the-art DNN and RNN models tend to be impractical to deploy on embedded systems with limited computational capacity. Traditionally, the approach for e…

    Submitted 7 April, 2015; originally announced April 2015.

  29. arXiv:1504.01482  [pdf, other]

    cs.LG cs.CL cs.NE stat.ML

    Deep Recurrent Neural Networks for Acoustic Modelling

    Authors: William Chan, Ian Lane

    Abstract: We present a novel deep Recurrent Neural Network (RNN) model for acoustic modelling in Automatic Speech Recognition (ASR). We term our contribution a TC-DNN-BLSTM-DNN model; the model combines a Deep Neural Network (DNN) with Time Convolution (TC), followed by a Bidirectional Long Short-Term Memory (BLSTM), and a final DNN. The first DNN acts as a feature processor to our model, the BLSTM then…

    Submitted 7 April, 2015; originally announced April 2015.

  30. Ultracold, radiative charge transfer in hybrid Yb ion - Rb atom traps

    Authors: B. M. McLaughlin, H. D. L. Lamb, I. C. Lane, J. F. McCann

    Abstract: Ultracold hybrid ion-atom traps offer the possibility of microscopic manipulation of quantum coherences in the gas using the ion as a probe. However, inelastic processes, particularly charge transfer can be a significant process of ion loss and has been measured experimentally for the Yb$^{+}$ ion immersed in a Rb vapour. We use first-principles quantum chemistry codes to obtain the potential ener…

    Submitted 25 April, 2014; originally announced April 2014.

    Comments: 7 figures, 1 table accepted for publication in J. Phys. B: At. Mol. Opt. Phys. arXiv admin note: text overlap with arXiv:1107.1141

  31. Ultracold hydrogen and deuterium production via Doppler-cooled Feshbach molecules

    Authors: Ian Lane

    Abstract: A counterintuitive scheme to produce ultracold hydrogen via fragmentation of laser cooled diatomic hydrides is presented where the final atomic H temperature is inversely proportional to the mass of the molecular parent. In addition, the critical density for formation of a Bose-Einstein Condensate (BEC) at a fixed temperature is reduced by a factor ratio hydrogen mass: parent mass raised to power…

    Submitted 27 November, 2013; originally announced November 2013.

    Comments: 9 pages, 4 figures

    Journal ref: Phys. Rev. A 92, 022511 (2015)

  32. arXiv:1107.1141  [pdf, ps, other]

    physics.atom-ph cond-mat.quant-gas quant-ph

    Structure and interactions of ultracold Yb ions and Rb atoms

    Authors: H. D. L. Lamb, J. F. McCann, B. M. McLaughlin, J. Goold, N. Wells, I. Lane

    Abstract: In order to study ultracold charge-transfer processes in hybrid atom-ion traps, we have mapped out the potential energy curves and molecular parameters for several low lying states of the Rb, Yb$^+$ system. We employ both a multi-reference configuration interaction (MRCI) and a full configuration interaction (FCI) approach. Turning points, crossing points, potential minima and spectroscopic molecu…

    Submitted 23 July, 2012; v1 submitted 6 July, 2011; originally announced July 2011.

    Comments: 8 pages, 3 figures, 5 tables

  33. Doppler cooling of gallium atoms: 2. Simulation in complex multilevel systems

    Authors: L Rutherford, I C Lane, J F McCann

    Abstract: This paper derives a general procedure for the numerical solution of the Lindblad equations that govern the coherences arising from multicoloured light interacting with a multilevel system. A systematic approach to finding the conservative and dissipative terms is derived and applied to the laser cooling of gallium. An improved numerical method is developed to solve the time-dependent master equat…

    Submitted 3 June, 2010; originally announced June 2010.

    Comments: 15 pages, 8 figures

  34. Measurement of the 1s-2s energy interval in muonium

    Authors: V. Meyer, S. N. Bagayev, P. E. G. Baird, P. Bakule, M. G. Boshier, A. Breitrueck, S. L. Cornish, S. Dychkov, G. H. Eaton, A. Grossmann, D. Huebl, V. W. Hughes, K. Jungmann, I. C. Lane, Y. W. Liu, D. Lucas, Y. Matyugin, J. Merkel, G. zu Putlitz, I. Reinhard, P. G. H. Sandars, R. Santra, P. Schmidt, C. A. Scott, W. T. Toner, et al. (4 additional authors not shown)

    Abstract: The 1s-2s interval has been measured in the muonium ($μ^+e^-$) atom by Doppler-free two-photon laser spectroscopy. The frequency separation of the states was determined to be 2 455 528 941.0(9.8) MHz in good agreement with quantum electrodynamics. The muon-electron mass ratio can be extracted and is found to be 206.768 38(17). The result may be interpreted as a measurement of the muon-electron c…

    Submitted 12 July, 1999; originally announced July 1999.

    Comments: 12 pages, 4 figures

    Report number: UHD-PI-MY-9908

    Journal ref: Phys. Rev. Lett. 84, 1136 (2000)