
Showing 1–19 of 19 results for author: Schwarz, A

Searching in archive cs.
  1. arXiv:2405.07104  [pdf, other]

    cs.RO

    Uncertainty-Aware Shape Estimation of a Surgical Continuum Manipulator in Constrained Environments using Fiber Bragg Grating Sensors

    Authors: Alexander Schwarz, Arian Mehrfard, Golchehr Amirkhani, Henry Phalen, Justin H. Ma, Robert B. Grupp, Alejandro Martin-Gomez, Mehran Armand

    Abstract: Continuum Dexterous Manipulators (CDMs) are well-suited tools for minimally invasive surgery due to their inherent dexterity and reachability. Nonetheless, their flexible structure and non-linear curvature pose significant challenges for shape-based feedback control. The use of Fiber Bragg Grating (FBG) sensors for shape sensing has shown great potential in estimating the CDM's tip position and su…

    Submitted 11 May, 2024; originally announced May 2024.

  2. arXiv:2404.12703  [pdf, other]

    cs.MS cs.CE

    GALÆXI: Solving complex compressible flows with high-order discontinuous Galerkin methods on accelerator-based systems

    Authors: Daniel Kempf, Marius Kurz, Marcel Blind, Patrick Kopper, Philipp Offenhäuser, Anna Schwarz, Spencer Starr, Jens Keim, Andrea Beck

    Abstract: This work presents GALÆXI as a novel, energy-efficient flow solver for the simulation of compressible flows on unstructured meshes leveraging the parallel computing power of modern Graphics Processing Units (GPUs). GALÆXI implements the high-order Discontinuous Galerkin Spectral Element Method (DGSEM) using shock capturing with a finite-volume subcell approach to ensure the stability of the high-o…

    Submitted 19 April, 2024; originally announced April 2024.

    Comments: 19 pages, 12 figures, 3 tables. Code available at: https://github.com/flexi-framework/galaexi

  3. arXiv:2403.16736  [pdf, other]

    cs.CV

    Creating a Digital Twin of Spinal Surgery: A Proof of Concept

    Authors: Jonas Hein, Frédéric Giraud, Lilian Calvet, Alexander Schwarz, Nicola Alessandro Cavalcanti, Sergey Prokudin, Mazda Farshad, Siyu Tang, Marc Pollefeys, Fabio Carrillo, Philipp Fürnstahl

    Abstract: Surgery digitalization is the process of creating a virtual replica of real-world surgery, also referred to as a surgical digital twin (SDT). It has significant applications in various fields such as education and training, surgical planning, and automation of surgical tasks. In addition, SDTs are an ideal foundation for machine learning methods, enabling the automatic generation of training data.…

    Submitted 22 May, 2024; v1 submitted 25 March, 2024; originally announced March 2024.

    Comments: Accepted for the DCA in MI Workshop @ CVPR 2024. Project page: https://jonashein.github.io/surgerydigitization/

  4. arXiv:2403.03326  [pdf, other]

    eess.IV cs.CV

    AnatoMix: Anatomy-aware Data Augmentation for Multi-organ Segmentation

    Authors: Chang Liu, Fuxin Fan, Annette Schwarz, Andreas Maier

    Abstract: Multi-organ segmentation in medical images is a widely researched task and can save clinicians much manual effort in daily routines. Automating the organ segmentation process using deep learning (DL) is a promising solution and state-of-the-art segmentation models are achieving promising accuracy. In this work, we propose a novel data augmentation strategy for increasing the generalizability…

    Submitted 5 March, 2024; originally announced March 2024.

  5. arXiv:2401.07360  [pdf, other]

    cs.CL cs.SD eess.AS

    Promptformer: Prompted Conformer Transducer for ASR

    Authors: Sergio Duarte-Torres, Arunasish Sen, Aman Rana, Lukas Drude, Alejandro Gomez-Alanis, Andreas Schwarz, Leif Rädel, Volker Leutnant

    Abstract: Context cues carry information which can improve multi-turn interactions in automatic speech recognition (ASR) systems. In this paper, we introduce a novel mechanism inspired by hyper-prompting to fuse textual context with acoustic representations in the attention mechanism. Results on a test set with multi-turn interactions show that our method achieves 5.9% relative word error rate reduction (rW…

    Submitted 14 January, 2024; originally announced January 2024.

  6. arXiv:2306.12891  [pdf, other]

    cs.DC

    Towards Exascale CFD Simulations Using the Discontinuous Galerkin Solver FLEXI

    Authors: Marcel Blind, Min Gao, Daniel Kempf, Patrick Kopper, Marius Kurz, Anna Schwarz, Andrea Beck

    Abstract: Modern high-order discretizations bear considerable potential for the exascale era due to their high fidelity and the high, local computational load that allows for computational efficiency in massively parallel simulations. To this end, the discontinuous Galerkin (DG) framework FLEXI was selected to demonstrate exascale readiness within the Center of Excellence for Exascale CFD (CEEC) by simulati…

    Submitted 22 June, 2023; originally announced June 2023.

    Comments: 15 pages, 5 figures

  7. arXiv:2305.13794  [pdf, other]

    cs.CL eess.AS

    Personalized Predictive ASR for Latency Reduction in Voice Assistants

    Authors: Andreas Schwarz, Di He, Maarten Van Segbroeck, Mohammed Hethnawi, Ariya Rastrow

    Abstract: Streaming Automatic Speech Recognition (ASR) in voice assistants can utilize prefetching to partially hide the latency of response generation. Prefetching involves passing a preliminary ASR hypothesis to downstream systems in order to prefetch and cache a response. If the final ASR hypothesis after endpoint detection matches the preliminary one, the cached response can be delivered to the user, th…

    Submitted 23 May, 2023; originally announced May 2023.

    Comments: Accepted for Interspeech 2023

  8. arXiv:2210.16238  [pdf, ps, other]

    eess.AS cs.LG cs.SD eess.SP

    Contextual-Utterance Training for Automatic Speech Recognition

    Authors: Alejandro Gomez-Alanis, Lukas Drude, Andreas Schwarz, Rupak Vignesh Swaminathan, Simon Wiesler

    Abstract: Recent studies of streaming automatic speech recognition (ASR) recurrent neural network transducer (RNN-T)-based systems have fed the encoder with past contextual information in order to improve its word error rate (WER) performance. In this paper, we first propose a contextual-utterance training technique which makes use of the previous and future contextual utterances in order to do an implicit…

    Submitted 27 October, 2022; originally announced October 2022.

  9. arXiv:2106.07994  [pdf, other]

    eess.AS cs.SD

    Multi-channel Opus compression for far-field automatic speech recognition with a fixed bitrate budget

    Authors: Lukas Drude, Jahn Heymann, Andreas Schwarz, Jean-Marc Valin

    Abstract: Automatic speech recognition (ASR) in the cloud allows the use of larger models and more powerful multi-channel signal processing front-ends compared to on-device processing. However, it also adds an inherent latency due to the transmission of the audio signal, especially when transmitting multiple channels of a microphone array. One way to reduce the network bandwidth requirements is client-side…

    Submitted 15 June, 2021; originally announced June 2021.

    Comments: Accepted at Interspeech 2021

  10. arXiv:2101.05063  [pdf, other]

    cs.MS

    Robust level-3 BLAS Inverse Iteration from the Hessenberg Matrix

    Authors: Angelika Schwarz

    Abstract: Inverse iteration is known to be an effective method for computing eigenvectors corresponding to simple and well-separated eigenvalues. In the non-symmetric case, the solution of shifted Hessenberg systems is a central step. Existing inverse iteration solvers approach the solution of the shifted Hessenberg systems with either RQ or LU factorizations and, once factored, solve the corresponding syst…

    Submitted 13 January, 2021; originally announced January 2021.

  11. arXiv:2011.10538  [pdf, other]

    eess.AS cs.SD

    Improving RNN-T ASR Accuracy Using Context Audio

    Authors: Andreas Schwarz, Ilya Sklyar, Simon Wiesler

    Abstract: We present a training scheme for streaming automatic speech recognition (ASR) based on recurrent neural network transducers (RNN-T) which allows the encoder network to learn to exploit context audio from a stream, using segmented or partially labeled sequences of the stream during training. We show that the use of context audio during training and inference can lead to word error rate reductions o…

    Submitted 15 June, 2021; v1 submitted 20 November, 2020; originally announced November 2020.

  12. A Neural Network based Shock Detection and Localization Approach for Discontinuous Galerkin Methods

    Authors: Andrea D. Beck, Jonas Zeifang, Anna Schwarz, David G. Flad

    Abstract: The stable and accurate approximation of discontinuities such as shocks on a finite computational mesh is a challenging task. Detection of shocks or strong discontinuities in the flow solution is typically achieved through a priori troubled cell indicators, which guide the subsequent action of an appropriate shock capturing mechanism. Arriving at a stable and accurate solution often requires empir…

    Submitted 20 January, 2020; originally announced January 2020.

  13. arXiv:1905.10574  [pdf, other]

    cs.MS math.NA

    Robust Task-Parallel Solution of the Triangular Sylvester Equation

    Authors: Angelika Schwarz, Carl Christian Kjelgaard Mikkelsen

    Abstract: The Bartels-Stewart algorithm is a standard approach to solving the dense Sylvester equation. It reduces the problem to the solution of the triangular Sylvester equation. The triangular Sylvester equation is solved with a variant of backward substitution. Backward substitution is prone to overflow. Overflow can be avoided by dynamic scaling of the solution matrix. An algorithm which prevents overf…

    Submitted 25 May, 2019; originally announced May 2019.

    Comments: 10 pages, 7 figures

  14. Robust coherence-based spectral enhancement for speech recognition in adverse real-world environments

    Authors: Hendrik Barfuss, Christian Huemmer, Andreas Schwarz, Walter Kellermann

    Abstract: Speech recognition in adverse real-world environments is highly affected by reverberation and nonstationary background noise. A well-known strategy to reduce such undesired signal components in multi-microphone scenarios is spatial filtering of the microphone signals. In this article, we demonstrate that an additional coherence-based postfilter, which is applied to the beamformer output signal to…

    Submitted 7 August, 2017; v1 submitted 12 April, 2016; originally announced April 2016.

    Comments: 21 pages, 5 figures. arXiv admin note: substantial text overlap with arXiv:1509.06882, Elsevier Computer Speech & Language (CSL), 2017

  15. arXiv:1509.06882  [pdf, ps, other]

    cs.SD

    Robust coherence-based spectral enhancement for distant speech recognition

    Authors: Hendrik Barfuss, Christian Huemmer, Andreas Schwarz, Walter Kellermann

    Abstract: In this contribution to the 3rd CHiME Speech Separation and Recognition Challenge (CHiME-3) we extend the acoustic front-end of the CHiME-3 baseline speech recognition system by a coherence-based Wiener filter which is applied to the output signal of the baseline beamformer. To compute the time- and frequency-dependent postfilter gains the ratio between direct and diffuse signal components at the…

    Submitted 23 September, 2015; originally announced September 2015.

  16. A model for the temporal evolution of the spatial coherence in decaying reverberant sound fields

    Authors: Sam Nees, Andreas Schwarz, Walter Kellermann

    Abstract: Reverberant sound fields are often modeled as isotropic. However, it has been observed that spatial properties change during the decay of the sound field energy, due to non-isotropic attenuation in non-ideal rooms. In this letter, a model for the spatial coherence between two sensors in a decaying reverberant sound field is developed for rectangular rooms. The modeled coherence function depends on…

    Submitted 27 July, 2015; originally announced July 2015.

    Comments: Accepted for JASA Express Letters

  17. arXiv:1506.03604  [pdf, other]

    cs.SD

    Binaural coherent-to-diffuse-ratio estimation for dereverberation using an ITD model

    Authors: Chengshi Zheng, Andreas Schwarz, Walter Kellermann, Xiaodong Li

    Abstract: Most previously proposed dual-channel coherent-to-diffuse-ratio (CDR) estimators are based on a free-field model. When used for binaural signals, e.g., for dereverberation in binaural hearing aids, their performance may degrade due to the influence of the head, even when the direction-of-arrival of the desired speaker is exactly known. In this paper, the head shadowing effect is taken into account…

    Submitted 11 June, 2015; originally announced June 2015.

    Comments: accepted for EUSIPCO 2015

  18. Coherent-to-Diffuse Power Ratio Estimation for Dereverberation

    Authors: Andreas Schwarz, Walter Kellermann

    Abstract: The estimation of the time- and frequency-dependent coherent-to-diffuse power ratio (CDR) from the measured spatial coherence between two omnidirectional microphones is investigated. Known CDR estimators are formulated in a common framework, illustrated using a geometric interpretation in the complex plane, and investigated with respect to bias and robustness towards model errors. Several novel un…

    Submitted 13 February, 2015; v1 submitted 12 February, 2015; originally announced February 2015.

    Comments: submitted to IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2015

  19. arXiv:1410.2479  [pdf, other]

    cs.CL cs.NE cs.SD stat.ML

    Spatial Diffuseness Features for DNN-Based Speech Recognition in Noisy and Reverberant Environments

    Authors: Andreas Schwarz, Christian Huemmer, Roland Maas, Walter Kellermann

    Abstract: We propose a spatial diffuseness feature for deep neural network (DNN)-based automatic speech recognition to improve recognition accuracy in reverberant and noisy environments. The feature is computed in real-time from multiple microphone signals without requiring knowledge or estimation of the direction of arrival, and represents the relative amount of diffuse noise in each time and frequency bin…

    Submitted 16 February, 2015; v1 submitted 9 October, 2014; originally announced October 2014.

    Comments: accepted for ICASSP 2015