
Online Continual Learning of End-to-End Speech Recognition Models

Authors: Muqiao Yang, Ian Lane, Shinji Watanabe

Abstract: Continual Learning, also known as Lifelong Learning, aims to continually learn from new data as it becomes available. While prior research on continual learning in automatic speech recognition has focused on the adaptation of models across multiple different speech recognition tasks, in this paper we propose an experimental setting for online continual learning for automatic speech recognition of a single task. Specifically focusing on the case where additional training data for the same task becomes available incrementally over time, we demonstrate the effectiveness of performing incremental model updates to end-to-end speech recognition models with an online Gradient Episodic Memory (GEM) method. Moreover, we show that with online continual learning and a selective sampling strategy, we can maintain an accuracy that is similar to retraining a model from scratch while requiring significantly lower computation costs. We have also verified our method with self-supervised learning (SSL) features.

Submitted 11 July, 2022; originally announced July 2022.

Comments: Accepted at InterSpeech 2022

arXiv:2207.02971 [pdf, other]

Branchformer: Parallel MLP-Attention Architectures to Capture Local and Global Context for Speech Recognition and Understanding

Authors: Yifan Peng, Siddharth Dalmia, Ian Lane, Shinji Watanabe

Abstract: Conformer has proven to be effective in many speech processing tasks. It combines the benefits of extracting local dependencies using convolutions and global dependencies using self-attention. Inspired by this, we propose a more flexible, interpretable and customizable encoder alternative, Branchformer, with parallel branches for modeling various ranged dependencies in end-to-end speech processing. In each encoder layer, one branch employs self-attention or its variant to capture long-range dependencies, while the other branch utilizes an MLP module with convolutional gating (cgMLP) to extract local relationships. We conduct experiments on several speech recognition and spoken language understanding benchmarks. Results show that our model outperforms both Transformer and cgMLP. It also matches or outperforms state-of-the-art results achieved by Conformer. Furthermore, we show various strategies to reduce computation thanks to the two-branch architecture, including the ability to have variable inference complexity in a single trained model. The weights learned for merging branches indicate how local and global dependencies are utilized in different layers, which benefits model design.

Submitted 6 July, 2022; originally announced July 2022.

Comments: Accepted at ICML 2022

arXiv:2104.12693 [pdf, other]

Identifying Actions for Sound Event Classification

Authors: Benjamin Elizalde, Radu Revutchi, Samarjit Das, Bhiksha Raj, Ian Lane, Laurie M. Heller

Abstract: In Psychology, actions are paramount for humans to identify sound events. In Machine Learning (ML), action recognition achieves high accuracy; however, it has not been asked whether identifying actions can benefit Sound Event Classification (SEC), as opposed to mapping the audio directly to a sound event. Therefore, we propose a new Psychology-inspired approach for SEC that includes identification of actions via human listeners. To achieve this goal, we used crowdsourcing to have listeners identify 20 actions that in isolation or in combination may have produced any of the 50 sound events in the well-studied dataset ESC-50. The resulting annotations for each audio recording relate actions to a database of sound events for the first time. The annotations were used to create semantic representations called Action Vectors (AVs). We evaluated SEC by comparing the AVs with two types of audio features -- log-mel spectrograms and state-of-the-art audio embeddings. Because audio features and AVs capture different abstractions of the acoustic content, we combined them and achieved one of the highest reported accuracies (88%).

Submitted 5 August, 2021; v1 submitted 26 April, 2021; originally announced April 2021.

arXiv:1907.13280 [pdf, other]

Learning Question-Guided Video Representation for Multi-Turn Video Question Answering

Authors: Guan-Lin Chao, Abhinav Rastogi, Semih Yavuz, Dilek Hakkani-Tür, Jindong Chen, Ian Lane

Abstract: Understanding and conversing about dynamic scenes is one of the key capabilities of AI agents that navigate the environment and convey useful information to humans. Video question answering is a specific scenario of such AI-human interaction where an agent generates a natural language response to a question regarding the video of a dynamic scene. Incorporating features from multiple modalities, which often provide supplementary information, is one of the challenging aspects of video question answering. Furthermore, a question often concerns only a small segment of the video, hence encoding the entire video sequence using a recurrent neural network is not computationally efficient. Our proposed question-guided video representation module efficiently generates the token-level video summary guided by each word in the question. The learned representations are then fused with the question to generate the answer. Through empirical evaluation on the Audio Visual Scene-aware Dialog (AVSD) dataset, our proposed models in single-turn and multi-turn question answering achieve state-of-the-art performance on several automatic natural language generation evaluation metrics.

Submitted 30 July, 2019; originally announced July 2019.

Comments: Accepted at SIGDIAL 2019

arXiv:1907.03040 [pdf, other]

BERT-DST: Scalable End-to-End Dialogue State Tracking with Bidirectional Encoder Representations from Transformer

Authors: Guan-Lin Chao, Ian Lane

Abstract: An important yet rarely tackled problem in dialogue state tracking (DST) is scalability for dynamic ontology (e.g., movie, restaurant) and unseen slot values. We focus on a specific condition, where the ontology is unknown to the state tracker, but the target slot value (except for none and dontcare), possibly unseen during training, can be found as a word segment in the dialogue context. Prior approaches often rely on candidate generation from n-gram enumeration or slot tagger outputs, which can be inefficient or suffer from error propagation. We propose BERT-DST, an end-to-end dialogue state tracker which directly extracts slot values from the dialogue context. We use BERT as the dialogue context encoder, whose contextualized language representations are suitable for scalable DST to identify slot values from their semantic context. Furthermore, we employ encoder parameter sharing across all slots, with two advantages: (1) the number of parameters does not grow linearly with the ontology, and (2) language representation knowledge can be transferred among slots. Empirical evaluation shows BERT-DST with cross-slot parameter sharing outperforms prior work on the benchmark scalable DST datasets Sim-M and Sim-R, and achieves competitive performance on the standard DSTC2 and WOZ 2.0 datasets.

Submitted 5 July, 2019; originally announced July 2019.

Comments: Published in Interspeech 2019

arXiv:1906.05962 [pdf, other]

Speaker-Targeted Audio-Visual Models for Speech Recognition in Cocktail-Party Environments

Authors: Guan-Lin Chao, William Chan, Ian Lane

Abstract: Speech recognition in cocktail-party environments remains a significant challenge for state-of-the-art speech recognition systems, as it is extremely difficult to extract an acoustic signal of an individual speaker from a background of overlapping speech with similar frequency and temporal characteristics. We propose the use of speaker-targeted acoustic and audio-visual models for this task. We complement the acoustic features in a hybrid DNN-HMM model with information about the target speaker's identity as well as visual features from the mouth region of the target speaker. Experimentation was performed using simulated cocktail-party data generated from the GRID audio-visual corpus by overlapping two speakers' speech on a single acoustic channel. Our audio-only baseline achieved a WER of 26.3%. The audio-visual model improved the WER to 4.4%. Introducing speaker identity information had an even more pronounced effect, improving the WER to 3.6%. Combining both approaches, however, did not significantly improve performance further. Our work demonstrates that speaker-targeted models can significantly improve speech recognition in cocktail-party environments.

Submitted 13 June, 2019; originally announced June 2019.

Comments: Published in INTERSPEECH 2016
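
Note: the WER figures quoted in this abstract (26.3%, 4.4%, 3.6%) can be restated as relative reductions over the audio-only baseline. The sketch below is only an arithmetic illustration; the function name `relative_wer_reduction` is hypothetical and does not come from the paper.

```python
def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Return the WER reduction as a fraction of the baseline WER."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer


# WER values (in percent) as quoted in the abstract.
baseline = 26.3          # audio-only baseline
audio_visual = 4.4       # audio-visual model
speaker_identity = 3.6   # model with speaker identity information

print(f"audio-visual:     {relative_wer_reduction(baseline, audio_visual):.1%}")
print(f"speaker identity: {relative_wer_reduction(baseline, speaker_identity):.1%}")
```

Under this definition, both models remove well over four fifths of the baseline errors, roughly 83% and 86% respectively.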

arXiv:1904.07326 [pdf, other]

doi 10.1103/PhysRevA.100.022506

Assignment of excited-state bond lengths using branching-ratio measurements: The B$^2Σ^+$ state of BaH molecules

Authors: K. Moore, I. C. Lane, R. L. McNally, T. Zelevinsky

Abstract: Vibrational branching ratios in the B$^2Σ^+$ -- X$^2Σ^+$ and A$^2Π$ -- X$^2Σ^+$ optical-cycling transitions of BaH molecules are investigated using measurements and ab initio calculations. The experimental values are determined using fluorescence and absorption detection. The observed branching ratios have a very sensitive dependence on the difference in the equilibrium bond length between the excited and ground state, $Δr_e$: a 1 pm (0.5%) displacement can have a 25% effect on the branching ratios but only a 1% effect on the lifetime. The measurements are combined with theoretical calculations to reveal a preference for a particular set of published spectroscopic values for the B$^2Σ^+$ state ($Δr_e^{B-X}$ = +5.733 pm), while a larger bond-length difference ($Δr_e^{B-X} = 6.3-6.7$ pm) would match the branching-ratio data even better. By contrast, the observed branching ratio for the A$^2Π_{3/2}$ -- X$^2Σ^+$ transition is in excellent agreement with both the ab initio result and the spectroscopically measured bond lengths. This shows that care must be taken when estimating branching ratios for molecular laser cooling candidates, as small errors in bond-length measurements can have outsize effects on the suitability for laser cooling. Additionally, our calculations agree more closely with experimental values of the B$^2Σ^+$ state lifetime and spin-rotation constant, and revise the predicted lifetime of the H$^2Δ$ state to 9.5 $μ$s.

Submitted 9 August, 2019; v1 submitted 15 April, 2019; originally announced April 2019.

Journal ref: Phys. Rev. A 100, 022506 (2019)

arXiv:1811.10761

Speaker Diarization With Lexical Information

Authors: Tae Jin Park, Kyu Han, Ian Lane, Panayiotis Georgiou

Abstract: This work presents a novel approach to leverage lexical information for speaker diarization. We introduce a speaker diarization system that can directly integrate lexical as well as acoustic information into a speaker clustering process. Thus, we propose an adjacency matrix integration technique to integrate word-level speaker turn probabilities with speaker embeddings in a comprehensive way. Our proposed method works without any reference transcript. Words and word boundary information are provided by an ASR system. We show that our proposed method improves a baseline speaker diarization system solely based on speaker embeddings, achieving a meaningful improvement on the CALLHOME American English Speech dataset.

Submitted 28 November, 2018; v1 submitted 26 November, 2018; originally announced November 2018.

Comments: This version removed by arXiv administrators because the author did not have the right to agree to our license at the time of submission

arXiv:1810.04038 [pdf, other]

Understanding and Improving Recurrent Networks for Human Activity Recognition by Continuous Attention

Authors: Ming Zeng, Haoxiang Gao, Tong Yu, Ole J. Mengshoel, Helge Langseth, Ian Lane, Xiaobing Liu

Abstract: Deep neural networks, including recurrent networks, have been successfully applied to human activity recognition. Unfortunately, the final representation learned by recurrent networks might encode some noise (irrelevant signal components, unimportant sensor modalities, etc.). Besides, it is difficult to interpret the recurrent networks to gain insight into the models' behavior. To address these issues, we propose two attention models for human activity recognition: temporal attention and sensor attention. These two mechanisms adaptively focus on important signals and sensor modalities. To further improve the understandability and mean F1 score, we add continuity constraints, considering that continuous sensor signals are more robust than discrete ones. We evaluate the approaches on three datasets and obtain state-of-the-art results. Furthermore, qualitative analysis shows that the attention learned by the models agrees well with human intuition.

Submitted 7 October, 2018; originally announced October 2018.

Comments: 8 pages. published in The International Symposium on Wearable Computers (ISWC) 2018

Journal ref: The International Symposium on Wearable Computers (ISWC) 2018

arXiv:1805.11762 [pdf, other]

Adversarial Learning of Task-Oriented Neural Dialog Models

Authors: Bing Liu, Ian Lane

Abstract: In this work, we propose an adversarial learning method for reward estimation in reinforcement learning (RL) based task-oriented dialog models. Most of the current RL based task-oriented dialog systems require access to a reward signal from either user feedback or user ratings. Such user ratings, however, may not always be consistent or available in practice. Furthermore, online dialog policy learning with RL typically requires a large number of queries to users, suffering from a sample efficiency problem. To address these challenges, we propose an adversarial learning method to learn dialog rewards directly from dialog samples. Such rewards are further used to optimize the dialog policy with policy gradient based RL. In the evaluation in a restaurant search domain, we show that the proposed adversarial dialog learning method achieves a higher dialog success rate than strong baseline methods. We further discuss the covariate shift problem in online adversarial dialog learning and show how we can address that with partial access to user feedback.

Submitted 29 May, 2018; originally announced May 2018.

Comments: To appear at SIGDIAL 2018

arXiv:1803.04849 [pdf, other]

doi 10.1016/j.jqsrt.2018.03.003

Quantitative theoretical analysis of lifetimes and decay rates relevant in laser cooling BaH

Authors: Keith Moore, Ian C Lane

Abstract: Tiny radiative losses below the 0.1% level can prove ruinous to the effective laser cooling of a molecule. In this paper the laser cooling of a hydride is studied with rovibronic detail using ab initio quantum chemistry in order to document the decays to all possible electronic states (not just the vibrational branching within a single electronic transition) and to identify the most populated final quantum states. The effects of spin-orbit and associated couplings on the properties of the lowest excited states of BaH are analysed in detail. The lifetimes of the A$^2Π_{1/2}$, H$^2Δ_{3/2}$ and E$^2Π_{1/2}$ states are calculated (136 ns, 5.8 μs and 46 ns respectively) for the first time, while the theoretical value for B$^2Σ^+_{1/2}$ is in good agreement with experiments. Using a simple rate model the numbers of absorption-emission cycles possible for both one- and two-colour cooling on the competing electronic transitions are determined, and it is clearly demonstrated that the A$^2Π$ - X$^2Σ^+$ transition is superior to B$^2Σ^+$ - X$^2Σ^+$, where multiple tiny decay channels degrade its efficiency. Further possible improvements to the cooling method are proposed.

Submitted 15 March, 2018; v1 submitted 13 March, 2018; originally announced March 2018.

arXiv:1801.07827 [pdf, other]

Semi-Supervised Convolutional Neural Networks for Human Activity Recognition

Authors: Ming Zeng, Tong Yu, Xiao Wang, Le T. Nguyen, Ole J. Mengshoel, Ian Lane

Abstract: Labeled data used for training activity recognition classifiers are usually limited in terms of size and diversity. Thus, the learned model may not generalize well when used in real-world use cases. Semi-supervised learning augments labeled examples with unlabeled examples, often resulting in improved performance. However, the semi-supervised methods studied in the activity recognition literature assume that feature engineering is already done. In this paper, we lift this assumption and present two semi-supervised methods based on convolutional neural networks (CNNs) to learn discriminative hidden features. Our semi-supervised CNNs learn from both labeled and unlabeled data while also performing feature learning on raw sensor data. In experiments on three real-world datasets, we show that our CNNs outperform supervised methods and traditional semi-supervised learning methods by up to 18% in mean F1-score (Fm).

Submitted 22 January, 2018; originally announced January 2018.

Comments: Accepted by BigData2017

arXiv:1801.00059 [pdf, other]

The CAPIO 2017 Conversational Speech Recognition System

Authors: Kyu J. Han, Akshay Chandrashekaran, Jungsuk Kim, Ian Lane

Abstract: In this paper we show how we have achieved state-of-the-art performance on the industry-standard NIST 2000 Hub5 English evaluation set. We explore densely connected LSTMs, inspired by the densely connected convolutional networks recently introduced for image classification tasks. We also propose an acoustic model adaptation scheme that simply averages the parameters of a seed neural network acoustic model and its adapted version. This method was applied with the CallHome training corpus and improved individual system performances by 6.1% (relative) on average against the CallHome portion of the evaluation set, with no performance loss on the Switchboard portion. With RNN-LM rescoring and lattice combination on the 5 systems trained across three different phone sets, our 2017 speech recognition system has obtained 5.0% and 9.1% on Switchboard and CallHome, respectively, both of which are the best word error rates reported thus far. According to IBM in their latest work to compare human and machine transcriptions, our reported Switchboard word error rate can be considered to surpass human parity (5.1%) of transcribing conversational telephone speech.

Submitted 9 April, 2018; v1 submitted 29 December, 2017; originally announced January 2018.

Comments: 8 pages, 3 figures, 8 tables; extra experimental results added
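
Note: the adaptation scheme described in this abstract, averaging the parameters of a seed acoustic model and its adapted version, can be sketched as an element-wise mean over matching parameters. This is a minimal illustration under assumed conventions, not the authors' implementation: the `Params` type, the flat per-parameter lists, and the name `average_parameters` are all hypothetical.

```python
from typing import Dict, List

# Hypothetical representation: parameter name -> flattened list of weights.
Params = Dict[str, List[float]]


def average_parameters(seed: Params, adapted: Params) -> Params:
    """Element-wise average of two models with identical parameter layouts."""
    if seed.keys() != adapted.keys():
        raise ValueError("models must have the same parameter names")
    averaged: Params = {}
    for name, seed_values in seed.items():
        adapted_values = adapted[name]
        if len(seed_values) != len(adapted_values):
            raise ValueError(f"size mismatch for parameter {name!r}")
        # Plain mean of the seed weight and the adapted weight.
        averaged[name] = [(s + a) / 2.0 for s, a in zip(seed_values, adapted_values)]
    return averaged


# Toy values for illustration only; real acoustic models have millions of weights.
seed_model = {"layer1.weight": [1.0, 3.0], "layer1.bias": [0.0]}
adapted_model = {"layer1.weight": [3.0, 5.0], "layer1.bias": [1.0]}
print(average_parameters(seed_model, adapted_model))
```

Averaging keeps the result between the seed and adapted weights, which is one plausible reading of why the abstract reports gains on CallHome without a loss on Switchboard.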

arXiv:1711.11310 [pdf, other]

Multi-Domain Adversarial Learning for Slot Filling in Spoken Language Understanding

Authors: Bing Liu, Ian Lane

Abstract: The goal of this paper is to learn cross-domain representations for the slot filling task in spoken language understanding (SLU). Most of the recently published SLU models are domain-specific ones that work on individual task domains. Annotating data for each individual task domain is both financially costly and non-scalable. In this work, we propose an adversarial training method for learning common features and representations that can be shared across multiple domains. A model that produces such shared representations can be combined with models trained on individual domain SLU data to reduce the number of training samples required for developing a new domain. In our experiments using data sets from multiple domains, we show that adversarial training helps in learning better domain-general SLU models, leading to improved slot filling F1 scores. We further show that applying adversarial learning to the domain-general model also helps in achieving higher slot filling performance when the model is jointly optimized with domain-specific models.

Submitted 30 November, 2017; originally announced November 2017.

arXiv:1711.08493 [pdf, other]

Customized Nonlinear Bandits for Online Response Selection in Neural Conversation Models

Authors: Bing Liu, Tong Yu, Ian Lane, Ole J. Mengshoel

Abstract: Dialog response selection is an important step towards natural response generation in conversational agents. Existing work on neural conversational models mainly focuses on offline supervised learning using a large set of context-response pairs. In this paper, we focus on online learning of response selection in retrieval-based dialog systems. We propose a contextual multi-armed bandit model with a nonlinear reward function that uses distributed representation of text for online response selection. A bidirectional LSTM is used to produce the distributed representations of dialog context and responses, which serve as the input to a contextual bandit. In learning the bandit, we propose a customized Thompson sampling method that is applied to a polynomial feature space in approximating the reward. Experimental results on the Ubuntu Dialogue Corpus demonstrate significant performance gains of the proposed method over conventional linear contextual bandits. Moreover, we report encouraging response selection performance of the proposed neural bandit model using the Recall@k metric for a small set of online training samples.

Submitted 22 November, 2017; originally announced November 2017.

Comments: Accepted at AAAI 2018

arXiv:1709.06136 [pdf, other]

Iterative Policy Learning in End-to-End Trainable Task-Oriented Neural Dialog Models

Authors: Bing Liu, Ian Lane

Abstract: In this paper, we present a deep reinforcement learning (RL) framework for iterative dialog policy optimization in end-to-end task-oriented dialog systems. Popular approaches in learning dialog policy with RL include letting a dialog agent learn against a user simulator. Building a reliable user simulator, however, is not trivial, often as difficult as building a good dialog agent. We address this challenge by jointly optimizing the dialog agent and the user simulator with deep RL by simulating dialogs between the two agents. We first bootstrap a basic dialog agent and a basic user simulator by learning directly from dialog corpora with supervised training. We then improve them further by letting the two agents conduct task-oriented dialogs and iteratively optimizing their policies with deep RL. Both the dialog agent and the user simulator are designed with neural network models that can be trained end-to-end. Our experimental results show that the proposed method leads to promising improvements on task success rate and total task reward compared to supervised training and single-agent RL training baseline models.

Submitted 18 September, 2017; originally announced September 2017.

Comments: Accepted at ASRU 2017

arXiv:1708.05956 [pdf, other]

doi 10.21437/Interspeech.2017-1326

An End-to-End Trainable Neural Network Model with Belief Tracking for Task-Oriented Dialog

Authors: Bing Liu, Ian Lane

Abstract: We present a novel end-to-end trainable neural network model for task-oriented dialog systems. The model is able to track dialog state, issue API calls to a knowledge base (KB), and incorporate structured KB query results into system responses to successfully complete task-oriented dialogs. The proposed model produces well-structured system responses by jointly learning belief tracking and KB result processing conditioned on the dialog history. We evaluate the model in a restaurant search domain using a dataset that is converted from the second Dialog State Tracking Challenge (DSTC2) corpus. Experimental results show that the proposed model can robustly track dialog state given the dialog history. Moreover, our model demonstrates promising results in producing appropriate system responses, outperforming prior end-to-end trainable neural network models using per-response accuracy evaluation metrics.

Submitted 20 August, 2017; originally announced August 2017.

Comments: Published at Interspeech 2017

arXiv:1701.04056 [pdf, other]

Dialog Context Language Modeling with Recurrent Neural Networks

Authors: Bing Liu, Ian Lane

Abstract: In this work, we propose contextual language models that incorporate dialog-level discourse information into language modeling. Previous work on contextual language models treats preceding utterances as a sequence of inputs, without considering dialog interactions. We design recurrent neural network (RNN) based contextual language models that specially track the interactions between speakers in a dialog. Experimental results on the Switchboard Dialog Act Corpus show that the proposed model outperforms a conventional single-turn RNN language model by 3.3% on perplexity. The proposed models also demonstrate advantageous performance over other competitive contextual language models.

Submitted 15 January, 2017; originally announced January 2017.

Comments: Accepted for publication at ICASSP 2017

arXiv:1609.06026 [pdf, other]

An Approach for Self-Training Audio Event Detectors Using Web Data

Authors: Benjamin Elizalde, Ankit Shah, Siddharth Dalmia, Min Hun Lee, Rohan Badlani, Anurag Kumar, Bhiksha Raj, Ian Lane

Abstract: Audio Event Detection (AED) aims to recognize sounds within audio and video recordings. AED employs machine learning algorithms commonly trained and tested on annotated datasets. However, available datasets are limited in number of samples and hence it is difficult to model acoustic diversity. Therefore, we propose combining labeled audio from a dataset and unlabeled audio from the web to improve the sound models. The audio event detectors are trained on the labeled audio and run on the unlabeled audio downloaded from YouTube. Whenever the detectors recognized any of the known sounds with high confidence, the unlabeled audio was used to re-train the detectors. The performance of the re-trained detectors is compared to that of the original detectors using the annotated test set. Results showed an improvement of the AED, and uncovered challenges of using web audio from videos.

Submitted 27 June, 2017; v1 submitted 20 September, 2016; originally announced September 2016.

Comments: 5 pages

arXiv:1609.01462 [pdf, other]

Joint Online Spoken Language Understanding and Language Modeling with Recurrent Neural Networks

Authors: Bing Liu, Ian Lane

Abstract: Speaker intent detection and semantic slot filling are two critical tasks in spoken language understanding (SLU) for dialogue systems. In this paper, we describe a recurrent neural network (RNN) model that jointly performs intent detection, slot filling, and language modeling. The neural network model keeps updating the intent estimation as each word in the transcribed utterance arrives and uses it as contextual features in the joint model. Evaluation of the language model and online SLU model is made on the ATIS benchmarking data set. On the language modeling task, our joint model achieves 11.8% relative reduction on perplexity compared to the independent training language model. On SLU tasks, our joint model outperforms the independent task training model by 22.3% on intent detection error rate, with slight degradation on slot filling F1 score. The joint model also shows advantageous performance in the realistic ASR settings with noisy speech input.

Submitted 6 September, 2016; originally announced September 2016.

Comments: Accepted at SIGDIAL 2016

arXiv:1609.01454 [pdf, other]

Attention-Based Recurrent Neural Network Models for Joint Intent Detection and Slot Filling

Authors: Bing Liu, Ian Lane

Abstract: Attention-based encoder-decoder neural network models have recently shown promising results in machine translation and speech recognition. In this work, we propose an attention-based neural network model for joint intent detection and slot filling, both of which are critical steps for many speech understanding and dialog systems. Unlike in machine translation and speech recognition, alignment is explicit in slot filling. We explore different strategies in incorporating this alignment information to the encoder-decoder framework. Learning from the attention mechanism in encoder-decoder model, we further propose introducing attention to the alignment-based RNN models. Such attentions provide additional information to the intent classification and slot label prediction. Our independent task models achieve state-of-the-art intent detection error rate and slot filling F1 score on the benchmark ATIS task. Our joint training model further obtains 0.56% absolute (23.8% relative) error reduction on intent detection and 0.23% absolute gain on slot filling over the independent task models.

Submitted 6 September, 2016; originally announced September 2016.

Comments: Accepted at Interspeech 2016

arXiv:1607.06706 [pdf, other]

Experiments on the DCASE Challenge 2016: Acoustic Scene Classification and Sound Event Detection in Real Life Recording

Authors: Benjamin Elizalde, Anurag Kumar, Ankit Shah, Rohan Badlani, Emmanuel Vincent, Bhiksha Raj, Ian Lane

Abstract: In this paper we present our work on Task 1 Acoustic Scene Classification and Task 3 Sound Event Detection in Real Life Recordings. Among our experiments we have low-level and high-level features, classifier optimization and other heuristics specific to each task. Our performance for both tasks improved the baseline from DCASE: for Task 1 we achieved an overall accuracy of 78.9% compared to the baseline of 72.6% and for Task 3 we achieved a Segment-Based Error Rate of 0.76 compared to the baseline of 0.91.

Submitted 25 August, 2016; v1 submitted 22 July, 2016; originally announced July 2016.

arXiv:1607.03766 [pdf, other]

AudioPairBank: Towards A Large-Scale Tag-Pair-Based Audio Content Analysis

Authors: Sebastian Sager, Benjamin Elizalde, Damian Borth, Christian Schulze, Bhiksha Raj, Ian Lane

Abstract: Recently, sound recognition has been used to identify sounds, such as car and river. However, sounds have nuances that may be better described by adjective-noun pairs such as slow car, and verb-noun pairs such as flying insects, which are underexplored. Therefore, in this work we investigate the relation between audio content and both adjective-noun pairs and verb-noun pairs. Due to the lack of datasets with these kinds of annotations, we collected and processed the AudioPairBank corpus consisting of a combined total of 1,123 pairs and over 33,000 audio files. One contribution is the previously unavailable documentation of the challenges and implications of collecting audio recordings with these types of labels. A second contribution is to show the degree of correlation between the audio content and the labels through sound recognition experiments, which yielded results of 70% accuracy, hence also providing a performance benchmark. The results and study in this paper encourage further exploration of the nuances in audio and are meant to complement similar research performed on images and text in multimedia analysis.

Submitted 8 January, 2018; v1 submitted 13 July, 2016; originally announced July 2016.

Comments: This paper is a revised version of "AudioSentibank: Large-scale Semantic Ontology of Acoustic Concepts for Audio Content Analysis"

arXiv:1607.03257 [pdf, other]

City-Identification of Flickr Videos Using Semantic Acoustic Features

Authors: Benjamin Elizalde, Guan-Lin Chao, Ming Zeng, Ian Lane

Abstract: City-identification of videos aims to determine the likelihood of a video belonging to a set of cities. In this paper, we present an approach using only audio, thus we do not use any additional modality such as images, user-tags or geo-tags. In this manner, we show to what extent the city-location of videos correlates to their acoustic information. Success in this task suggests improvements can be made to complement the other modalities. In particular, we present a method to compute and use semantic acoustic features to perform city-identification and the features show semantic evidence of the identification. The semantic evidence is given by a taxonomy of urban sounds and expresses the potential presence of these sounds in the city-soundtracks. We used the MediaEval Placing Task set, which contains Flickr videos labeled by city. In addition, we used the UrbanSound8K set containing audio clips labeled by sound-type. Our method improved the state-of-the-art performance and provides a novel semantic approach to this task.

Submitted 12 July, 2016; originally announced July 2016.

arXiv:1601.02553 [pdf, other]

Environmental Noise Embeddings for Robust Speech Recognition

Authors: Suyoun Kim, Bhiksha Raj, Ian Lane

Abstract: We propose a novel deep neural network architecture for speech recognition that explicitly employs knowledge of the background environmental noise within a deep neural network acoustic model. A deep neural network is used to predict the acoustic environment in which the system is being used. The discriminative embedding generated at the bottleneck layer of this network is then concatenated with traditional acoustic features as input to a deep neural network acoustic model. Through a series of experiments on Resource Management, CHiME-3 task, and Aurora4, we show that the proposed approach significantly improves speech recognition accuracy in noisy and highly reverberant environments, outperforming multi-condition training, noise-aware training, i-vector framework, and multi-task learning on both in-domain noise and unseen noise.

Submitted 29 September, 2016; v1 submitted 11 January, 2016; originally announced January 2016.

arXiv:1511.06407 [pdf, other]

Recurrent Models for Auditory Attention in Multi-Microphone Distance Speech Recognition

Authors: Suyoun Kim, Ian Lane

Abstract: Integration of multiple microphone data is one of the key ways to achieve robust speech recognition in noisy environments or when the speaker is located at some distance from the input device. Signal processing techniques such as beamforming are widely used to extract a speech signal of interest from background noise. These techniques, however, are highly dependent on prior spatial information about the microphones and the environment in which the system is being used. In this work, we present a neural attention network that directly combines multi-channel audio to generate phonetic states without requiring any prior knowledge of the microphone layout or any explicit signal preprocessing for speech enhancement. We embed an attention mechanism within a Recurrent Neural Network (RNN) based acoustic model to automatically tune its attention to a more reliable input source. Unlike traditional multi-channel preprocessing, our system can be optimized towards the desired output in one step. Although attention-based models have recently achieved impressive results on sequence-to-sequence learning, no attention mechanisms have previously been applied to learn potentially asynchronous and non-stationary multiple inputs. We evaluate our neural attention model on the CHiME-3 challenge task, and show that the model achieves comparable performance to beamforming using a purely data-driven method.

Submitted 7 January, 2016; v1 submitted 19 November, 2015; originally announced November 2015.

Comments: Under review as a conference paper at ICLR 2016

arXiv:1509.06657 [pdf, other]

doi 10.1063/1.4945623

Towards a spectroscopically accurate set of potentials for heavy hydride laser cooling candidates: effective core potential calculations of BaH

Authors: Keith Moore, Brendan M. McLaughlin, Ian C. Lane

Abstract: BaH (and its isotopomers) is an attractive molecular candidate for laser cooling to ultracold temperatures and a potential precursor for the production of ultracold gases of hydrogen and deuterium. The theoretical challenge is to simulate the laser cooling cycle as reliably as possible and this paper addresses the generation of a highly accurate ab initio $^{2}Σ^+$ potential for such studies. The performance of various basis sets within the multi-reference configuration-interaction (MRCI) approximation with the Davidson correction (MRCI+Q) is tested and taken to the complete basis set limit. It is shown that the calculated molecular constants using a 46 electron Effective Core-Potential (ECP), the augmented polarized core-valence quintuplet basis set (aug-pCV5Z-PP) but only including three active electrons in the MRCI calculation are in close agreement with the available experimental values. The predicted dissociation energy D$_e$ for the X$^2Σ^+$ state (extrapolated to the complete basis set (CBS) limit) is 16895.12 cm$^{-1}$ (2.094 eV), which agrees within 0.1$\%$ of a revised experimental value of $<$16910.6 cm$^{-1}$, while the calculated r$_e$ is within 0.03 pm of the experimental result.

Submitted 28 March, 2016; v1 submitted 22 September, 2015; originally announced September 2015.

Comments: 14 pages, 9 figures: final accepted version

arXiv:1504.01483 [pdf, other]

Transferring Knowledge from a RNN to a DNN

Authors: William Chan, Nan Rosemary Ke, Ian Lane

Abstract: Deep Neural Network (DNN) acoustic models have yielded many state-of-the-art results in Automatic Speech Recognition (ASR) tasks. More recently, Recurrent Neural Network (RNN) models have been shown to outperform their DNN counterparts. However, state-of-the-art DNN and RNN models tend to be impractical to deploy on embedded systems with limited computational capacity. Traditionally, the approach for embedded platforms is to either train a small DNN directly, or to train a small DNN that learns the output distribution of a large DNN. In this paper, we utilize a state-of-the-art RNN to transfer knowledge to a small DNN. We use the RNN model to generate soft alignments and minimize the Kullback-Leibler divergence against the small DNN. The small DNN trained on the soft RNN alignments achieved a 3.93 WER on the Wall Street Journal (WSJ) eval92 task compared to a baseline 4.54 WER or more than 13% relative improvement.

Submitted 7 April, 2015; originally announced April 2015.

arXiv:1504.01482 [pdf, other]

Deep Recurrent Neural Networks for Acoustic Modelling

Authors: William Chan, Ian Lane

Abstract: We present a novel deep Recurrent Neural Network (RNN) model for acoustic modelling in Automatic Speech Recognition (ASR). We term our contribution as a TC-DNN-BLSTM-DNN model, the model combines a Deep Neural Network (DNN) with Time Convolution (TC), followed by a Bidirectional Long Short-Term Memory (BLSTM), and a final DNN. The first DNN acts as a feature processor to our model, the BLSTM then generates a context from the sequence acoustic signal, and the final DNN takes the context and models the posterior probabilities of the acoustic states. We achieve a 3.47 WER on the Wall Street Journal (WSJ) eval92 task or more than 8% relative improvement over the baseline DNN models.

Submitted 7 April, 2015; originally announced April 2015.

arXiv:1404.6579 [pdf, other]

doi 10.1088/0953-4075/47/14/145201

Ultracold, radiative charge transfer in hybrid Yb ion - Rb atom traps

Authors: B. M. McLaughlin, H. D. L. Lamb, I. C. Lane, J. F. McCann

Abstract: Ultracold hybrid ion-atom traps offer the possibility of microscopic manipulation of quantum coherences in the gas using the ion as a probe. However, inelastic processes, particularly charge transfer, can be a significant source of ion loss, and this has been measured experimentally for the Yb$^{+}$ ion immersed in a Rb vapour. We use first-principles quantum chemistry codes to obtain the potential energy curves and dipole moments for the lowest-lying energy states of this complex. Calculations for the radiative decay processes cross sections and rate coefficients are presented for the total decay processes. Comparing the semi-classical Langevin approximation with the quantum approach, we find it provides a very good estimate of the background at higher energies. The results demonstrate that radiative decay mechanisms are important over the energy and temperature region considered. In fact, the Langevin process of ion-atom collisions dominates cold ion-atom collisions. For spin dependent processes \cite{kohl13} the anisotropic magnetic dipole-dipole interaction and the second-order spin-orbit coupling can play important roles, inducing coupling between the spin and the orbital motion. They measured the spin-relaxing collision rate to be approximately 5 orders of magnitude higher than the charge-exchange collision rate \cite{kohl13}. Regarding the measured radiative charge transfer collision rate, we find that our calculation is in very good agreement with experiment and with previous calculations. Nonetheless, we find no broad resonance features that might underlie a strong isotope effect. In conclusion, we find, in agreement with previous theory, that the isotope anomaly observed in experiment remains an open question.

Submitted 25 April, 2014; originally announced April 2014.

Comments: 7 figures, 1 table accepted for publication in J. Phys. B: At. Mol. Opt. Phys. arXiv admin note: text overlap with arXiv:1107.1141

arXiv:1311.7081 [pdf]

doi 10.1103/PhysRevA.92.022511

Ultracold hydrogen and deuterium production via Doppler-cooled Feshbach molecules

Authors: Ian Lane

Abstract: A counterintuitive scheme to produce ultracold hydrogen via fragmentation of laser cooled diatomic hydrides is presented where the final atomic H temperature is inversely proportional to the mass of the molecular parent. In addition, the critical density for formation of a Bose-Einstein Condensate (BEC) at a fixed temperature is reduced by a factor ratio hydrogen mass: parent mass raised to power 3/2 over directly cooled hydrogen atoms. The narrow Feshbach resonances between a singlet S atom and hydrogen are well suited to a tiny center of mass energy release necessary during fragmentation. With the support of ab initio quantum chemistry, it is demonstrated that BaH is an ideal diatomic precursor that can be laser cooled to a Doppler temperature of ~37 microKelvin with just two rovibronic transitions, the simplest molecular cooling scheme identified to date. Preparation of a hydrogen atom gas below the critical BEC temperature Tc is feasible with present cooling technology, with optical pulse control of the condensation process.

Submitted 27 November, 2013; originally announced November 2013.

Comments: 9 pages, 4 figures

Journal ref: Phys. Rev. A 92, 022511 (2015)

arXiv:1107.1141 [pdf, ps, other]

doi 10.1103/PhysRevA.86.022716

Structure and interactions of ultracold Yb ions and Rb atoms

Authors: H. D. L. Lamb, J. F. McCann, B. M. McLaughlin, J. Goold, N. Wells, I. Lane

Abstract: In order to study ultracold charge-transfer processes in hybrid atom-ion traps, we have mapped out the potential energy curves and molecular parameters for several low lying states of the Rb, Yb$^+$ system. We employ both a multi-reference configuration interaction (MRCI) and a full configuration interaction (FCI) approach. Turning points, crossing points, potential minima and spectroscopic molecular constants are obtained for the lowest five molecular states. Long-range parameters, including the dispersion coefficients are estimated from our {\it ab initio} data. The separated-atom ionization potentials and atomic polarizability of the ytterbium atom ($α_d=128.4$ atomic units) are in good agreement with experiment and previous calculations. We present some dynamical calculations for (adiabatic) scattering lengths for the two lowest (Yb,Rb$^+$) channels that were carried out in our work. However, we find that the pseudo potential approximation is rather limited in validity, and only applies to nK temperatures. The adiabatic scattering lengths for both the triplet and singlet channels indicate that both are large and negative in the FCI approximation.

Submitted 23 July, 2012; v1 submitted 6 July, 2011; originally announced July 2011.

Comments: 8 pages, 3 figures, 5 tables

arXiv:1006.0596 [pdf, other]

doi 10.1088/0953-4075/43/18/185504

Doppler cooling of gallium atoms: 2. Simulation in complex multilevel systems

Authors: L Rutherford, I C Lane, J F McCann

Abstract: This paper derives a general procedure for the numerical solution of the Lindblad equations that govern the coherences arising from multicoloured light interacting with a multilevel system. A systematic approach to finding the conservative and dissipative terms is derived and applied to the laser cooling of gallium. An improved numerical method is developed to solve the time-dependent master equation and results are presented for transient cooling processes. The method is significantly more robust, efficient and accurate than the standard method and can be applied to a broad range of atomic and molecular systems. Radiation pressure forces and the formation of dynamic dark-states are studied in the gallium isotope 66Ga.

Submitted 3 June, 2010; originally announced June 2010.

Comments: 15 pages, 8 figures

arXiv:hep-ex/9907013 [pdf, ps, other]

doi 10.1103/PhysRevLett.84.1136

Measurement of the 1s-2s energy interval in muonium

Authors: V. Meyer, S. N. Bagayev, P. E. G. Baird, P. Bakule, M. G. Boshier, A. Breitrueck, S. L. Cornish, S. Dychkov, G. H. Eaton, A. Grossmann, D. Huebl, V. W. Hughes, K. Jungmann, I. C. Lane, Y. W. Liu, D. Lucas, Y. Matyugin, J. Merkel, G. zu Putlitz, I. Reinhard, P. G. H. Sandars, R. Santra, P. Schmidt, C. A. Scott, W. T. Toner, et al. (4 additional authors not shown)

Abstract: The 1s-2s interval has been measured in the muonium ({$μ^+e^-$}) atom by Doppler-free two-photon laser spectroscopy. The frequency separation of the states was determined to be 2 455 528 941.0(9.8) MHz in good agreement with quantum electrodynamics. The muon-electron mass ratio can be extracted and is found to be 206.768 38(17). The result may be interpreted as measurement of the muon-electron charge ratio as $-1- 1.1(2.1)\cdot 10^{-9}$.

Submitted 12 July, 1999; originally announced July 1999.

Comments: 12 Pages, 4 figures

Report number: UHD-PI-MY-9908

Journal ref: Phys.Rev.Lett.84:1136,2000

Showing 1–34 of 34 results for author: Lane, I