Zum Hauptinhalt springen

Showing 1–4 of 4 results for author: Jukic, A

Searching in archive cs. Search in all archives.
.
  1. arXiv:2407.03495  [pdf, other

    eess.AS cs.CL cs.LG

    Codec-ASR: Training Performant Automatic Speech Recognition Systems with Discrete Speech Representations

    Authors: Kunal Dhawan, Nithin Rao Koluguri, Ante Jukić, Ryan Langman, Jagadeesh Balam, Boris Ginsburg

    Abstract: Discrete speech representations have garnered recent attention for their efficacy in training transformer-based models for various speech-related tasks such as automatic speech recognition (ASR), translation, speaker verification, and joint speech-text foundational models. In this work, we present a comprehensive analysis on building ASR systems with discrete codes. We investigate different method… ▽ More

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: Accepted at Interspeech 2024

  2. arXiv:2405.06473  [pdf, other

    cs.RO cs.CV

    Autonomous Driving with a Deep Dual-Model Solution for Steering and Braking Control

    Authors: Ana Petra Jukić, Ana Šelek, Marija Seder, Ivana Podnar Žarko

    Abstract: The technology of autonomous driving is currently attracting a great deal of interest in both research and industry. In this paper, we present a deep learning dual-model solution that uses two deep neural networks for combined braking and steering in autonomous vehicles. Steering control is achieved by applying the NVIDIA's PilotNet model to predict the steering wheel angle, while braking control… ▽ More

    Submitted 10 May, 2024; originally announced May 2024.

    Comments: 6 pages, 2 figures, accepted for publication in Proceedings of International Conference on Smart and Sustainable Technologies (SpliTech 2024)

  3. arXiv:2310.12378  [pdf, other

    eess.AS cs.SD

    The CHiME-7 Challenge: System Description and Performance of NeMo Team's DASR System

    Authors: Tae Jin Park, He Huang, Ante Jukic, Kunal Dhawan, Krishna C. Puvvada, Nithin Koluguri, Nikolay Karpov, Aleksandr Laptev, Jagadeesh Balam, Boris Ginsburg

    Abstract: We present the NVIDIA NeMo team's multi-channel speech recognition system for the 7th CHiME Challenge Distant Automatic Speech Recognition (DASR) Task, focusing on the development of a multi-channel, multi-speaker speech recognition system tailored to transcribe speech from distributed microphones and microphone arrays. The system predominantly comprises of the following integral modules: the Spea… ▽ More

    Submitted 18 October, 2023; originally announced October 2023.

    Journal ref: CHiME-7 Workshop 2023

  4. arXiv:2310.12371  [pdf, other

    eess.AS cs.SD

    Property-Aware Multi-Speaker Data Simulation: A Probabilistic Modelling Technique for Synthetic Data Generation

    Authors: Tae Jin Park, He Huang, Coleman Hooper, Nithin Koluguri, Kunal Dhawan, Ante Jukic, Jagadeesh Balam, Boris Ginsburg

    Abstract: We introduce a sophisticated multi-speaker speech data simulator, specifically engineered to generate multi-speaker speech recordings. A notable feature of this simulator is its capacity to modulate the distribution of silence and overlap via the adjustment of statistical parameters. This capability offers a tailored training environment for developing neural models suited for speaker diarization… ▽ More

    Submitted 18 October, 2023; originally announced October 2023.

    Journal ref: CHiME-7 Workshop 2023