Search | arXiv e-print repository

Speaker- and Text-Independent Estimation of Articulatory Movements and Phoneme Alignments from Speech

Authors: Tobias Weise, Philipp Klumpp, Kubilay Can Demir, Paula Andrea Pérez-Toro, Maria Schuster, Elmar Noeth, Bjoern Heismann, Andreas Maier, Seung Hee Yang

Abstract: This paper introduces a novel combination of two tasks, previously treated separately: acoustic-to-articulatory speech inversion (AAI) and phoneme-to-articulatory (PTA) motion estimation. We refer to this joint task as acoustic phoneme-to-articulatory speech inversion (APTAI) and explore two different approaches, both working speaker- and text-independently during inference. We use a multi-task learning setup, with the end-to-end goal of taking raw speech as input and estimating the corresponding articulatory movements, phoneme sequence, and phoneme alignment. While both proposed approaches share these same requirements, they differ in their way of achieving phoneme-related predictions: one is based on frame classification, the other on a two-staged training procedure and forced alignment. We reach competitive performance of 0.73 mean correlation for the AAI task and achieve up to approximately 87% frame overlap compared to a state-of-the-art text-dependent phoneme force aligner.

Submitted 3 July, 2024; originally announced July 2024.

Comments: to be published in Interspeech 2024 proceedings

arXiv:2404.08064 [pdf]

The Impact of Speech Anonymization on Pathology and Its Limits

Authors: Soroosh Tayebi Arasteh, Tomas Arias-Vergara, Paula Andrea Perez-Toro, Tobias Weise, Kai Packhaeuser, Maria Schuster, Elmar Noeth, Andreas Maier, Seung Hee Yang

Abstract: Integration of speech into healthcare has intensified privacy concerns due to its potential as a non-invasive biomarker containing individual biometric information. In response, speaker anonymization aims to conceal personally identifiable information while retaining crucial linguistic content. However, the application of anonymization techniques to pathological speech, a critical area where privacy is especially vital, has not been extensively examined. This study investigates anonymization's impact on pathological speech across over 2,700 speakers from multiple German institutions, focusing on privacy, pathological utility, and demographic fairness. We explore both deep-learning-based and signal-processing-based anonymization methods, and document substantial privacy improvements across disorders, evidenced by equal error rate increases of up to 1933%, with minimal overall impact on utility. Specific disorders such as Dysarthria, Dysphonia, and Cleft Lip and Palate experienced minimal utility changes, while Dysglossia showed slight improvements. Our findings underscore that the impact of anonymization varies substantially across different disorders. This necessitates disorder-specific anonymization strategies to optimally balance privacy with diagnostic utility. Additionally, our fairness analysis revealed consistent anonymization effects across most of the demographics. This study demonstrates the effectiveness of anonymization in pathological speech for enhancing privacy, while also highlighting the importance of customized and disorder-specific approaches to account for inversion attacks.

Submitted 22 June, 2024; v1 submitted 11 April, 2024; originally announced April 2024.

arXiv:2312.14571 [pdf, other]

Data is Moody: Discovering Data Modification Rules from Process Event Logs

Authors: Marco Bjarne Schuster, Boris Wiegand, Jilles Vreeken

Abstract: Although event logs are a powerful source of insight into the behavior of the underlying business process, existing work primarily focuses on finding patterns in the activity sequences of an event log, while ignoring event attribute data. Event attribute data has mostly been used to predict event occurrences and process outcome, but the state of the art neglects to mine succinct and interpretable rules for how event attribute data changes during process execution. Subgroup discovery and rule-based classification approaches lack the ability to capture the sequential dependencies present in event logs, and thus lead to unsatisfactory results with limited insight into the process behavior. Given an event log, we are interested in finding accurate yet succinct and interpretable if-then rules describing how the process modifies data. We formalize the problem in terms of the Minimum Description Length (MDL) principle, by which we choose the model with the best lossless description of the data. Additionally, we propose the greedy Moody algorithm to efficiently search for rules. By extensive experiments on both synthetic and real-world data, we show Moody indeed finds compact and interpretable rules, needs little data for accurate discovery, and is robust to noise.

Submitted 22 December, 2023; originally announced December 2023.

arXiv:2204.06450 [pdf, other]

doi 10.1038/s41598-023-47711-7

The effect of speech pathology on automatic speaker verification -- a large-scale study

Authors: Soroosh Tayebi Arasteh, Tobias Weise, Maria Schuster, Elmar Noeth, Andreas Maier, Seung Hee Yang

Abstract: Navigating the challenges of data-driven speech processing, one of the primary hurdles is accessing reliable pathological speech data. While public datasets appear to offer solutions, they come with inherent risks of potential unintended exposure of patient health information via re-identification attacks. Using a comprehensive real-world pathological speech corpus, with over n=3,800 test subjects spanning various age groups and speech disorders, we employed a deep-learning-driven automatic speaker verification (ASV) approach. This resulted in a notable mean equal error rate (EER) of 0.89% with a standard deviation of 0.06%, outstripping traditional benchmarks. Our comprehensive assessments demonstrate that pathological speech overall faces heightened privacy breach risks compared to healthy speech. Specifically, adults with dysphonia are at heightened re-identification risks, whereas conditions like dysarthria yield results comparable to those of healthy speakers. Crucially, speech intelligibility does not influence the ASV system's performance metrics. In pediatric cases, particularly those with cleft lip and palate, the recording environment plays a decisive role in re-identification. Merging data across pathological types led to a marked EER decrease, suggesting the potential benefits of pathological diversity in ASV, accompanied by a logarithmic boost in ASV effectiveness. In essence, this research sheds light on the dynamics between pathological speech and speaker verification, emphasizing its crucial role in safeguarding patient confidentiality in our increasingly digitized healthcare era.

Submitted 22 November, 2023; v1 submitted 13 April, 2022; originally announced April 2022.

Comments: Published in Scientific Reports

Journal ref: Sci Rep 13, 20476 (2023)

arXiv:2204.04016 [pdf, other]

Disentangled Latent Speech Representation for Automatic Pathological Intelligibility Assessment

Authors: Tobias Weise, Philipp Klumpp, Kubilay Can Demir, Andreas Maier, Elmar Noeth, Bjoern Heismann, Maria Schuster, Seung Hee Yang

Abstract: Speech intelligibility assessment plays an important role in the therapy of patients suffering from pathological speech disorders. Automatic and objective measures are desirable to assist therapists in their traditionally subjective and labor-intensive assessments. In this work, we investigate a novel approach for obtaining such a measure using the divergence in disentangled latent speech representations of a parallel utterance pair, obtained from a healthy reference and a pathological speaker. Experiments on an English database of Cerebral Palsy patients, using all available utterances per speaker, show high and significant correlation values (R = -0.9) with subjective intelligibility measures, while having only minimal deviation (±0.01) across four different reference speaker pairs. We also demonstrate the robustness of the proposed method (R = -0.89, deviating ±0.02 over 1000 iterations) by considering a significantly smaller number of utterances per speaker. Our results are among the first to show that disentangled speech representations can be used for automatic pathological speech intelligibility assessment, resulting in a reference speaker pair invariant method, applicable in scenarios with only a few utterances available.

Submitted 27 June, 2022; v1 submitted 8 April, 2022; originally announced April 2022.

Comments: Submitted and Accepted at INTERSPEECH2022

arXiv:2109.06596 [pdf, other]

doi 10.55417/fr.2022053

GPGM-SLAM: a Robust SLAM System for Unstructured Planetary Environments with Gaussian Process Gradient Maps

Authors: Riccardo Giubilato, Cedric Le Gentil, Mallikarjuna Vayugundla, Martin J. Schuster, Teresa Vidal-Calleja, Rudolph Triebel

Abstract: Simultaneous Localization and Mapping (SLAM) techniques play a key role towards long-term autonomy of mobile robots due to the ability to correct localization errors and produce consistent maps of an environment over time. In contrast to urban or man-made environments, where the presence of unique objects and structures offers unique cues for localization, the appearance of unstructured natural environments is often ambiguous and self-similar, hindering the performance of loop closure detection. In this paper, we present an approach to improve the robustness of place recognition in the context of a submap-based stereo SLAM based on Gaussian Process Gradient Maps (GPGMaps). GPGMaps embed a continuous representation of the gradients of the local terrain elevation by means of Gaussian Process regression and Structured Kernel Interpolation, given solely noisy elevation measurements. We leverage the image-like structure of GPGMaps to detect loop closures using traditional visual features and Bag of Words. GPGMap matching is performed as an SE(2) alignment to establish loop closure constraints within a pose graph. We evaluate the proposed pipeline on a variety of datasets recorded on Mt. Etna, Sicily and in the Morocco desert, respectively Moon- and Mars-like environments, and we compare the localization performance with state-of-the-art approaches for visual SLAM and visual loop closure detection.

Submitted 14 September, 2021; originally announced September 2021.

Comments: Submission to Field Robotics (www.journalfieldrobotics.org), under review

Journal ref: Field Robotics, Vol. 2, 2022

arXiv:2105.02020 [pdf, other]

Multi-Modal Loop Closing in Unstructured Planetary Environments with Visually Enriched Submaps

Authors: Riccardo Giubilato, Mallikarjuna Vayugundla, Wolfgang Stürzl, Martin J. Schuster, Armin Wedler, Rudolph Triebel

Abstract: Future planetary missions will rely on rovers that can autonomously explore and navigate in unstructured environments. An essential element is the ability to recognize places that were already visited or mapped. In this work, we leverage the ability of stereo cameras to provide both visual and depth information, guiding the search and validation of loop closures from a multi-modal perspective. We propose to augment submaps that are created by aggregating stereo point clouds, with visual keyframes. Point cloud matches are found by comparing CSHOT descriptors and validated by clustering, while visual matches are established by comparing keyframes using Bag-of-Words (BoW) and ORB descriptors. The relative transformations resulting from both keyframe and point cloud matches are then fused to provide pose constraints between submaps in our graph-based SLAM framework. Using the LRU rover, we performed several tests in both an indoor laboratory environment as well as a challenging planetary analog environment on Mount Etna, Italy. These environments consist of areas where either keyframes or point clouds alone failed to provide adequate matches, demonstrating the benefit of the proposed multi-modal approach.

Submitted 14 September, 2021; v1 submitted 5 May, 2021; originally announced May 2021.

Comments: Accepted at the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2021)

arXiv:2008.06341 [pdf, other]

Probabilistic Cellular Automata for Granular Media in Video Games

Authors: Jonathan Devlin, Micah D. Schuster

Abstract: Granular materials are very common in the everyday world. Media such as sand, soil, gravel, foodstuffs, pharmaceuticals, etc. all have similar irregular flow since they are composed of numerous small solid particles. In video games, simulating these materials increases immersion and can be used for various game mechanics. Computationally, full-scale simulation is not typically feasible except on the most powerful hardware and tends to be reduced in priority to favor other, more integral, gameplay features. Here we study the computational and qualitative aspects of side profile flow of sand-like particles using cellular automata (CA). Our CA uses a standard square lattice that updates via a custom, modified Margolus neighborhood. Each update occurs using a set of probabilistic transitions that can be tuned to simulate friction between particles. We focus on the look of the sandpile structure created from an hourglass shape over time using different transition probabilities and the computational impact of such a simulation.

Submitted 13 August, 2020; originally announced August 2020.

Comments: Cellular Automata, Sandpile
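
As a rough illustration of the idea in the abstract above, the following is a minimal sketch of a probabilistic cellular automaton on a square lattice with 2x2 Margolus-style blocks. It is not the authors' modified neighborhood: the function name `margolus_step`, the parameters `p_fall` and `p_slide`, and the specific fall and slide rules are assumptions made for illustration only.

```python
import random


def margolus_step(grid, offset, p_fall, p_slide, rng):
    """Apply one probabilistic update over 2x2 blocks starting at (offset, offset).

    grid holds 1 for a sand grain and 0 for an empty cell. The fall and slide
    rules here are hypothetical, not the transitions used in the paper.
    """
    rows, cols = len(grid), len(grid[0])
    new = [row[:] for row in grid]
    for r in range(offset, rows - 1, 2):
        for c in range(offset, cols - 1, 2):
            # Visit both top cells of the block; other_c is the opposite column.
            for top_c, other_c in ((c, c + 1), (c + 1, c)):
                if new[r][top_c] != 1:
                    continue
                if new[r + 1][top_c] == 0:
                    # Free space directly below: fall with probability p_fall.
                    if rng.random() < p_fall:
                        new[r][top_c], new[r + 1][top_c] = 0, 1
                elif new[r + 1][other_c] == 0 and new[r][other_c] == 0:
                    # Blocked below, diagonal free: slide with probability p_slide.
                    # A lower p_slide acts like higher friction between grains.
                    if rng.random() < p_slide:
                        new[r][top_c], new[r + 1][other_c] = 0, 1
    return new


def count_grains(grid):
    """Total number of sand grains; every transition conserves this value."""
    return sum(sum(row) for row in grid)


if __name__ == "__main__":
    rng = random.Random(0)
    grid = [[0] * 8 for _ in range(8)]
    for col in (3, 4):
        for row in range(3):
            grid[row][col] = 1
    # Alternate the block offset between 0 and 1, as in a Margolus scheme.
    for step in range(20):
        grid = margolus_step(grid, step % 2, p_fall=0.9, p_slide=0.5, rng=rng)
    for row in grid:
        print("".join("#" if cell else "." for cell in row))
```

Alternating the block offset between steps is what lets grains cross block boundaries, and the two probabilities are the tuning knobs that would stand in for friction in this simplified picture.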

arXiv:2002.04374 [pdf, other]

doi 10.1007/978-3-030-33904-3_66

Convolutional Neural Networks and a Transfer Learning Strategy to Classify Parkinson's Disease from Speech in Three Different Languages

Authors: J. C. Vásquez-Correa, T. Arias-Vergara, C. D. Rios-Urrego, M. Schuster, J. Rusz, J. R. Orozco-Arroyave, E. Nöth

Abstract: Parkinson's disease patients develop different speech impairments that affect their communication capabilities. The automatic assessment of the speech of the patients allows the development of computer aided tools to support the diagnosis and the evaluation of the disease severity. This paper introduces a methodology to classify Parkinson's disease from speech in three different languages: Spanish, German, and Czech. The proposed approach considers convolutional neural networks trained with time frequency representations and a transfer learning strategy among the three languages. The transfer learning scheme aims to improve the accuracy of the models when the weights of the neural network are initialized with utterances from a different language than the one used for the test set. The results suggest that the proposed strategy improves the accuracy of the models by up to 8% when the base model used to initialize the weights of the classifier is robust enough. In addition, the results obtained after the transfer learning are in most cases more balanced in terms of specificity-sensitivity than those trained without the transfer learning strategy.

Submitted 11 February, 2020; originally announced February 2020.

Journal ref: In Iberoamerican Congress on Pattern Recognition (pp. 697-706) 2019

arXiv:1903.03070 [pdf, ps, other]

An algorithmic approach to the existence of ideal objects in commutative algebra

Authors: Thomas Powell, Peter M Schuster, Franziskus Wiesnet

Abstract: The existence of ideal objects, such as maximal ideals in nonzero rings, plays a crucial role in commutative algebra. These are typically justified using Zorn's lemma, and thus pose a challenge from a computational point of view. Giving a constructive meaning to ideal objects is a problem which dates back to Hilbert's program, and today is still a central theme in the area of dynamical algebra, which focuses on the elimination of ideal objects via syntactic methods. In this paper, we take an alternative approach based on Kreisel's no counterexample interpretation and sequential algorithms. We first give a computational interpretation to an abstract maximality principle in the countable setting via an intuitive, state-based algorithm. We then carry out a concrete case study, in which we give an algorithmic account of the result that in any commutative ring, the intersection of all prime ideals is contained in its nilradical.

Submitted 7 March, 2019; originally announced March 2019.

arXiv:1903.01462 [pdf, other]

doi 10.1140/epjc/s10052-019-6869-2

Deep learning based pulse shape discrimination for germanium detectors

Authors: P. Holl, L. Hauertmann, B. Majorovits, O. Schulz, M. Schuster, A. J. Zsigmond

Abstract: Experiments searching for rare processes like neutrinoless double beta decay heavily rely on the identification of background events to reduce their background level and increase their sensitivity. We present a novel machine learning based method to recognize one of the most abundant classes of background events in these experiments. By combining a neural network for feature extraction with a smaller classification network, our method can be trained with only a small number of labeled events. To validate our method, we use signals from a broad-energy germanium detector irradiated with a $^{228}$Th gamma source. We find that it matches the performance of state-of-the-art algorithms commonly used for this detector type. However, it requires less tuning and calibration and shows potential to identify certain types of background events missed by other methods.

Submitted 2 June, 2019; v1 submitted 4 March, 2019; originally announced March 2019.

Comments: Published in Eur. Phys. J. C. 9 pages, 10 figures, 3 tables

Journal ref: Eur. Phys. J. C (2019) 79: 450

arXiv:1902.08295 [pdf, other]

Lingvo: a Modular and Scalable Framework for Sequence-to-Sequence Modeling

Authors: Jonathan Shen, Patrick Nguyen, Yonghui Wu, Zhifeng Chen, Mia X. Chen, Ye Jia, Anjuli Kannan, Tara Sainath, Yuan Cao, Chung-Cheng Chiu, Yanzhang He, Jan Chorowski, Smit Hinsu, Stella Laurenzo, James Qin, Orhan Firat, Wolfgang Macherey, Suyog Gupta, Ankur Bapna, Shuyuan Zhang, Ruoming Pang, Ron J. Weiss, Rohit Prabhavalkar, Qiao Liang, Benoit Jacob, et al. (66 additional authors not shown)

Abstract: Lingvo is a TensorFlow framework offering a complete solution for collaborative deep learning research, with a particular focus on sequence-to-sequence models. Lingvo models are composed of modular building blocks that are flexible and easily extensible, and experiment configurations are centralized and highly customizable. Distributed training and quantized inference are supported directly within the framework, and it contains existing implementations of a large number of utilities, helper functions, and the newest research ideas. Lingvo has been used in collaboration by dozens of researchers in more than 20 papers over the last two years. This document outlines the underlying design of Lingvo and serves as an introduction to the various pieces of the framework, while also offering examples of advanced features that showcase the capabilities of the framework.

Submitted 21 February, 2019; originally announced February 2019.

arXiv:1804.10292 [pdf, other]

Streaming Rewriting Games: Winning Strategies and Complexity

Authors: Christian Coester, Thomas Schwentick, Martin Schuster

Abstract: Context-free games on strings are two-player rewriting games based on a set of production rules and a regular target language. In each round, the first player selects a position of the current string; then the second player replaces the symbol at that position according to one of the production rules. The first player wins as soon as the current string belongs to the target language. In this paper the one-pass setting for context-free games is studied, where the knowledge of the first player is incomplete: she selects positions in a left-to-right fashion and only sees the current symbol and the symbols from previous rounds. The paper studies conditions under which dominant or undominated strategies for the first player exist and investigates the complexity of some related algorithmic problems.

Submitted 26 April, 2018; originally announced April 2018.

arXiv:1804.09849 [pdf, other]

The Best of Both Worlds: Combining Recent Advances in Neural Machine Translation

Authors: Mia Xu Chen, Orhan Firat, Ankur Bapna, Melvin Johnson, Wolfgang Macherey, George Foster, Llion Jones, Niki Parmar, Mike Schuster, Zhifeng Chen, Yonghui Wu, Macduff Hughes

Abstract: The past year has witnessed rapid advances in sequence-to-sequence (seq2seq) modeling for Machine Translation (MT). The classic RNN-based approaches to MT were first out-performed by the convolutional seq2seq model, which was then out-performed by the more recent Transformer model. Each of these new approaches consists of a fundamental architecture accompanied by a set of modeling and training techniques that are in principle applicable to other seq2seq architectures. In this paper, we tease apart the new architectures and their accompanying techniques in two ways. First, we identify several key modeling and training techniques, and apply them to the RNN architecture, yielding a new RNMT+ model that outperforms all of the three fundamental architectures on the benchmark WMT'14 English to French and English to German tasks. Second, we analyze the properties of each fundamental seq2seq architecture and devise new hybrid architectures intended to combine their strengths. Our hybrid models obtain further improvements, outperforming the RNMT+ model on both benchmark datasets.

Submitted 26 April, 2018; v1 submitted 25 April, 2018; originally announced April 2018.

arXiv:1802.09984 [pdf, ps, other]

Formal Semantics of the Language Cypher

Authors: Nadime Francis, Alastair Green, Paolo Guagliardo, Leonid Libkin, Tobias Lindaaker, Victor Marsault, Stefan Plantikow, Mats Rydberg, Martin Schuster, Petra Selmer, Andrés Taylor

Abstract: Cypher is a query language for property graphs. It was originally designed and implemented as part of the Neo4j graph database, and it is currently used in a growing number of commercial systems, industrial applications and research projects. In this work, we provide denotational semantics of the core fragment of the read-only part of Cypher, which features in particular pattern matching, filtering, and most relational operations on tables.

Submitted 20 March, 2018; v1 submitted 27 February, 2018; originally announced February 2018.

Comments: 22 pages

arXiv:1712.05884 [pdf, other]

Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions

Authors: Jonathan Shen, Ruoming Pang, Ron J. Weiss, Mike Schuster, Navdeep Jaitly, Zongheng Yang, Zhifeng Chen, Yu Zhang, Yuxuan Wang, RJ Skerry-Ryan, Rif A. Saurous, Yannis Agiomyrgiannakis, Yonghui Wu

Abstract: This paper describes Tacotron 2, a neural network architecture for speech synthesis directly from text. The system is composed of a recurrent sequence-to-sequence feature prediction network that maps character embeddings to mel-scale spectrograms, followed by a modified WaveNet model acting as a vocoder to synthesize time-domain waveforms from those spectrograms. Our model achieves a mean opinion score (MOS) of $4.53$ comparable to a MOS of $4.58$ for professionally recorded speech. To validate our design choices, we present ablation studies of key components of our system and evaluate the impact of using mel spectrograms as the input to WaveNet instead of linguistic, duration, and $F_0$ features. We further demonstrate that using a compact acoustic intermediate representation enables significant simplification of the WaveNet architecture.

Submitted 15 February, 2018; v1 submitted 15 December, 2017; originally announced December 2017.

Comments: Accepted to ICASSP 2018

arXiv:1701.01337 [pdf, ps, other]

New Abilities and Limitations of Spectral Graph Bisection

Authors: Martin R. Schuster, Maciej Liskiewicz

Abstract: Spectral-based heuristics belong to the well-known, commonly used methods which determine a provably minimal graph bisection or output "fail" when the optimality cannot be certified. In this paper we focus on Boppana's algorithm, which is one of the most prominent methods of this type. It is well known that the algorithm works well in the random \emph{planted bisection model} -- the standard class of graphs for analyzing minimum bisection and related problems. In 2001 Feige and Kilian posed the question of whether Boppana's algorithm works well in the semirandom model by Blum and Spencer. In our paper we answer this question affirmatively. We also show that the algorithm achieves similar performance on graph classes which extend the semirandom model. Since the behavior of Boppana's algorithm on the semirandom graphs remained unknown, Feige and Kilian proposed a new semidefinite programming (SDP) based approach and proved that it works on this model. The relationship between the performance of the SDP based algorithm and Boppana's approach was left as an open problem. In this paper we solve the problem in a complete way by proving that the bisection algorithm of Feige and Kilian provides exactly the same results as Boppana's algorithm. As a consequence we get that Boppana's algorithm achieves the optimal threshold for exact cluster recovery in the \emph{stochastic block model}. On the other hand we prove some limitations of Boppana's approach: we show that if the density difference on the parameters of the planted bisection model is too small then the algorithm fails with high probability in the model.

Submitted 28 April, 2017; v1 submitted 5 January, 2017; originally announced January 2017.

arXiv:1611.04558 [pdf, other]

Google's Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation

Authors: Melvin Johnson, Mike Schuster, Quoc V. Le, Maxim Krikun, Yonghui Wu, Zhifeng Chen, Nikhil Thorat, Fernanda Viégas, Martin Wattenberg, Greg Corrado, Macduff Hughes, Jeffrey Dean

Abstract: We propose a simple solution to use a single Neural Machine Translation (NMT) model to translate between multiple languages. Our solution requires no change in the model architecture from our base system but instead introduces an artificial token at the beginning of the input sentence to specify the required target language. The rest of the model, which includes encoder, decoder and attention, remains unchanged and is shared across all languages. Using a shared wordpiece vocabulary, our approach enables Multilingual NMT using a single model without any increase in parameters, which is significantly simpler than previous proposals for Multilingual NMT. Our method often improves the translation quality of all involved language pairs, even while keeping the total number of model parameters constant. On the WMT'14 benchmarks, a single multilingual model achieves comparable performance for English$\rightarrow$French and surpasses state-of-the-art results for English$\rightarrow$German. Similarly, a single multilingual model surpasses state-of-the-art results for French$\rightarrow$English and German$\rightarrow$English on WMT'14 and WMT'15 benchmarks respectively. On production corpora, multilingual models of up to twelve language pairs allow for better translation of many individual pairs.
In addition to improving the translation quality of language pairs that the model was trained with, our models can also learn to perform implicit bridging between language pairs never seen explicitly during training, showing that transfer learning and zero-shot translation is possible for neural translation. Finally, we show analyses that hints at a universal interlingua representation in our models and show some interesting examples when mixing languages.

Submitted 21 August, 2017; v1 submitted 14 November, 2016; originally announced November 2016.
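
The abstract above describes steering a single multilingual model by prepending an artificial token that names the target language. A minimal sketch of that preprocessing step follows; the helper name `add_target_token` and the `<2xx>` token spelling are assumptions made for illustration, since the abstract does not specify the exact format.

```python
def add_target_token(source_sentence: str, target_lang: str) -> str:
    """Prepend an artificial target-language token to the source sentence.

    The "<2xx>" spelling is an assumed convention for this sketch; the
    abstract only says that a token is placed at the start of the input.
    """
    token = f"<2{target_lang}>"
    return f"{token} {source_sentence}"


# The same source sentence can be routed to different target languages
# without changing the model, only the leading token.
for lang in ("es", "de"):
    print(add_target_token("How are you?", lang))
```

Because the token is part of the input sequence, no architectural change is needed to select the output language, which matches the abstract's claim that the encoder, decoder and attention are shared across all languages.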

arXiv:1609.08144 [pdf, other]

Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation

Authors: Yonghui Wu, Mike Schuster, Zhifeng Chen, Quoc V. Le, Mohammad Norouzi, Wolfgang Macherey, Maxim Krikun, Yuan Cao, Qin Gao, Klaus Macherey, Jeff Klingner, Apurva Shah, Melvin Johnson, Xiaobing Liu, Łukasz Kaiser, Stephan Gouws, Yoshikiyo Kato, Taku Kudo, Hideto Kazawa, Keith Stevens, George Kurian, Nishant Patil, Wei Wang, Cliff Young, Jason Smith, et al. (6 additional authors not shown)

Abstract: Neural Machine Translation (NMT) is an end-to-end learning approach for automated translation, with the potential to overcome many of the weaknesses of conventional phrase-based translation systems. Unfortunately, NMT systems are known to be computationally expensive both in training and in translation inference. Also, most NMT systems have difficulty with rare words. These issues have hindered NMT's use in practical deployments and services, where both accuracy and speed are essential. In this work, we present GNMT, Google's Neural Machine Translation system, which attempts to address many of these issues. Our model consists of a deep LSTM network with 8 encoder and 8 decoder layers using attention and residual connections. To improve parallelism and therefore decrease training time, our attention mechanism connects the bottom layer of the decoder to the top layer of the encoder. To accelerate the final translation speed, we employ low-precision arithmetic during inference computations. To improve handling of rare words, we divide words into a limited set of common sub-word units ("wordpieces") for both input and output. This method provides a good balance between the flexibility of "character"-delimited models and the efficiency of "word"-delimited models, naturally handles translation of rare words, and ultimately improves the overall accuracy of the system.
Our beam search technique employs a length-normalization procedure and uses a coverage penalty, which encourages generation of an output sentence that is most likely to cover all the words in the source sentence. On the WMT'14 English-to-French and English-to-German benchmarks, GNMT achieves competitive results to state-of-the-art. Using a human side-by-side evaluation on a set of isolated simple sentences, it reduces translation errors by an average of 60% compared to Google's phrase-based production system.

Submitted 8 October, 2016; v1 submitted 26 September, 2016; originally announced September 2016.

arXiv:1609.00150 [pdf, ps, other]

Reward Augmented Maximum Likelihood for Neural Structured Prediction

Authors: Mohammad Norouzi, Samy Bengio, Zhifeng Chen, Navdeep Jaitly, Mike Schuster, Yonghui Wu, Dale Schuurmans

Abstract: A key problem in structured output prediction is direct optimization of the task reward function that matters for test evaluation. This paper presents a simple and computationally efficient approach to incorporate task reward into a maximum likelihood framework. By establishing a link between the log-likelihood and expected reward objectives, we show that an optimal regularized expected reward is achieved when the conditional distribution of the outputs given the inputs is proportional to their exponentiated scaled rewards. Accordingly, we present a framework to smooth the predictive probability of the outputs using their corresponding rewards. We optimize the conditional log-probability of augmented outputs that are sampled proportionally to their exponentiated scaled rewards. Experiments on neural sequence to sequence models for speech recognition and machine translation show notable improvements over a maximum likelihood baseline by using reward augmented maximum likelihood (RAML), where the rewards are defined as the negative edit distance between the outputs and the ground truth labels.

Submitted 4 January, 2017; v1 submitted 1 September, 2016; originally announced September 2016.

Comments: NIPS 2016

arXiv:1606.02879 [pdf, ps, other]

Transducer-based Rewriting Games for Active XML

Authors: Martin Schuster

Abstract: Context-free games are two-player rewriting games that are played on nested strings representing XML documents with embedded function symbols. These games were introduced to model rewriting processes for intensional documents in the Active XML framework, where input documents are to be rewritten into a given target schema by calls to external services. This paper studies the setting where dependencies between inputs and outputs of service calls are modelled by transducers, which has not been examined previously. It defines transducer models operating on nested words and studies their properties, as well as the computational complexity of the winning problem for transducer-based context-free games in several scenarios. While the complexity of this problem is quite high in most settings (ranging from NP-complete to undecidable), some tractable restrictions are also identified.

Submitted 9 June, 2016; originally announced June 2016.

Comments: Extended version of MFCS 2016 conference paper

ACM Class: F.2.m; F.4.2; H.3.5

arXiv:1603.04467 [pdf, other]

TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems

Authors: Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Ian Goodfellow, Andrew Harp, Geoffrey Irving, Michael Isard, Yangqing Jia, Rafal Jozefowicz, Lukasz Kaiser, Manjunath Kudlur, Josh Levenberg, Dan Mane, Rajat Monga, Sherry Moore, Derek Murray, Chris Olah, et al. (15 additional authors not shown)

Abstract: TensorFlow is an interface for expressing machine learning algorithms, and an implementation for executing such algorithms. A computation expressed using TensorFlow can be executed with little or no change on a wide variety of heterogeneous systems, ranging from mobile devices such as phones and tablets up to large-scale distributed systems of hundreds of machines and thousands of computational devices such as GPU cards. The system is flexible and can be used to express a wide variety of algorithms, including training and inference algorithms for deep neural network models, and it has been used for conducting research and for deploying machine learning systems into production across more than a dozen areas of computer science and other fields, including speech recognition, computer vision, robotics, information retrieval, natural language processing, geographic information extraction, and computational drug discovery. This paper describes the TensorFlow interface and an implementation of that interface that we have built at Google. The TensorFlow API and a reference implementation were released as an open-source package under the Apache 2.0 license in November, 2015 and are available at www.tensorflow.org.

Submitted 16 March, 2016; v1 submitted 14 March, 2016; originally announced March 2016.

Comments: Version 2 updates only the metadata, to correct the formatting of Martín Abadi's name

arXiv:1602.02410 [pdf, other]

Exploring the Limits of Language Modeling

Authors: Rafal Jozefowicz, Oriol Vinyals, Mike Schuster, Noam Shazeer, Yonghui Wu

Abstract: In this work we explore recent advances in Recurrent Neural Networks for large scale Language Modeling, a task central to language understanding. We extend current models to deal with two key challenges present in this task: corpora and vocabulary sizes, and complex, long term structure of language. We perform an exhaustive study on techniques such as character Convolutional Neural Networks or Long-Short Term Memory, on the One Billion Word Benchmark. Our best single model significantly improves state-of-the-art perplexity from 51.3 down to 30.0 (whilst reducing the number of parameters by a factor of 20), while an ensemble of models sets a new record by improving perplexity from 41.0 down to 23.7. We also release these models for the NLP and ML community to study and improve upon.

Submitted 11 February, 2016; v1 submitted 7 February, 2016; originally announced February 2016.

arXiv:1412.5910 [pdf, ps, other]

Games for Active XML Revisited

Authors: Martin Schuster, Thomas Schwentick

Abstract: The paper studies the rewriting mechanisms for intensional documents in the Active XML framework, abstracted in the form of active context-free games. The safe rewriting problem studied in this paper is to decide whether the first player, Juliet, has a winning strategy for a given game and (nested) word; this corresponds to a successful rewriting strategy for a given intensional document. The paper examines several extensions to active context-free games. The primary extension allows more expressive schemas (namely XML schemas and regular nested word languages) for both target and replacement languages and has the effect that games are played on nested words instead of (flat) words as in previous studies. Other extensions consider validation of input parameters of web services, and an alternative semantics based on insertion of service call results. In general, the complexity of the safe rewriting problem is highly intractable (doubly exponential time), but the paper identifies interesting tractable cases.

Submitted 18 December, 2014; originally announced December 2014.

Comments: To be published in ICDT 2015

ACM Class: F.2.m; F.4.2; H.3.5

arXiv:1312.3005 [pdf, ps, other]

One Billion Word Benchmark for Measuring Progress in Statistical Language Modeling

Authors: Ciprian Chelba, Tomas Mikolov, Mike Schuster, Qi Ge, Thorsten Brants, Phillipp Koehn, Tony Robinson

Abstract: We propose a new benchmark corpus to be used for measuring progress in statistical language modeling. With almost one billion words of training data, we hope this benchmark will be useful to quickly evaluate novel language modeling techniques, and to compare their contribution when combined with other advanced techniques. We show performance of several well-known types of language models, with the best results achieved with a recurrent neural network based language model. The baseline unpruned Kneser-Ney 5-gram model achieves perplexity 67.6; a combination of techniques leads to 35% reduction in perplexity, or 10% reduction in cross-entropy (bits), over that baseline. The benchmark is available as a code.google.com project; besides the scripts needed to rebuild the training/held-out data, it also makes available log-probability values for each word in each of ten held-out data sets, for each of the baseline n-gram models.

Submitted 4 March, 2014; v1 submitted 10 December, 2013; originally announced December 2013.

Comments: Accompanied by a code.google.com project allowing anyone to generate the benchmark data, and use it to compare their language model against the ones described in the paper
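
The abstract above quotes a 35% reduction in perplexity alongside a 10% reduction in cross-entropy for the same improvement. The two figures are linked by the standard relation perplexity = 2^H, where H is the per-word cross-entropy in bits. The short check below is only an illustrative sketch: the function names are hypothetical and the inputs are the rounded numbers from the abstract, not the paper's underlying data.

```python
import math


def cross_entropy_bits(perplexity: float) -> float:
    """Per-word cross-entropy in bits, using perplexity = 2 ** H."""
    return math.log2(perplexity)


def relative_entropy_reduction(baseline_ppl: float, ppl_reduction: float) -> float:
    """Fractional drop in cross-entropy when perplexity falls by ppl_reduction."""
    improved_ppl = baseline_ppl * (1.0 - ppl_reduction)
    h_base = cross_entropy_bits(baseline_ppl)
    h_new = cross_entropy_bits(improved_ppl)
    return (h_base - h_new) / h_base


# Baseline Kneser-Ney 5-gram perplexity and the quoted 35% reduction.
reduction = relative_entropy_reduction(67.6, 0.35)
print(f"cross-entropy reduction: {reduction:.1%}")
```

With these inputs the relative cross-entropy reduction comes out at roughly 10%, consistent with the figure in the abstract.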

arXiv:1308.2690 [pdf, ps, other]

doi 10.2168/LMCS-9(3:20)2013

Induction in Algebra: a First Case Study

Authors: Peter M Schuster

Abstract: Many a concrete theorem of abstract algebra admits a short and elegant proof by contradiction but with Zorn's Lemma (ZL). A few of these theorems have recently turned out to follow in a direct and elementary way from the Principle of Open Induction distinguished by Raoult. The ideal objects characteristic of any invocation of ZL are eliminated, and it is made possible to pass from classical to intuitionistic logic. If the theorem has finite input data, then a finite partial order carries the required instance of induction, which thus is constructively provable. A typical example is the well-known theorem "every nonconstant coefficient of an invertible polynomial is nilpotent".

Submitted 20 September, 2013; v1 submitted 12 August, 2013; originally announced August 2013.

Journal ref: Logical Methods in Computer Science, Volume 9, Issue 3 (September 17, 2013) lmcs:959

arXiv:1212.3501 [pdf, ps, other]

On optimum left-to-right strategies for active context-free games

Authors: Henrik Björklund, Martin Schuster, Thomas Schwentick, Joscha Kulbatzki

Abstract: Active context-free games are two-player games on strings over finite alphabets with one player trying to rewrite the input string to match a target specification. These games have been investigated in the context of exchanging Active XML (AXML) data. While it was known that the rewriting problem is undecidable in general, it is shown here that it is EXPSPACE-complete to decide for a given context-free game, whether all safely rewritable strings can be safely rewritten in a left-to-right manner, a problem that was previously considered by Abiteboul et al. Furthermore, it is shown that the corresponding problem for games with finite replacement languages is EXPTIME-complete.

Submitted 14 December, 2012; originally announced December 2012.

Comments: To appear in ICDT 2013

arXiv:1207.4694 [pdf, ps, other]

A New Upper Bound for the Traveling Salesman Problem in Cubic Graphs

Authors: Maciej Liskiewicz, Martin R. Schuster

Abstract: We provide a new upper bound for traveling salesman problem (TSP) in cubic graphs, i.e. graphs with maximum vertex degree three, and prove that the problem for an $n$-vertex graph can be solved in $O(1.2553^n)$ time and in linear space. We show that the exact TSP algorithm of Eppstein, with some minor modifications, yields the stated result. The previous best known upper bound $O(1.251^n)$ was claimed by Iwama and Nakashima [Proc. COCOON 2007]. Unfortunately, their analysis contains several mistakes that render the proof for the upper bound invalid.

Submitted 30 November, 2012; v1 submitted 19 July, 2012; originally announced July 2012.

arXiv:1203.6536 [pdf, other]

Computing the Ramsey Number $R(K_5-P_3,K_5)$

Authors: Jesse A. Calvert, Michael J. Schuster, Stanisław P. Radziszowski

Abstract: We give a computer-assisted proof of the fact that $R(K_5-P_3, K_5)=25$. This solves one of the three remaining open cases in Hendry's table, which listed the Ramsey numbers for pairs of graphs on 5 vertices. We find that there exist no $(K_5-P_3,K_5)$-good graphs containing a $K_4$ on 23 or 24 vertices, where a graph $F$ is $(G,H)$-good if $F$ does not contain $G$ and the complement of $F$ does not contain $H$. The unique $(K_5-P_3,K_5)$-good graph containing a $K_4$ on 22 vertices is presented.

Submitted 29 March, 2012; originally announced March 2012.

MSC Class: 05C55

Journal ref: Journal of Combinatorial Mathematics and Combinatorial Computing, 82 (2012) 131-140

Showing 1–29 of 29 results for author: Schuster, M