Showing 1–12 of 12 results for author: Park, T J

Searching in archive cs.
  1. arXiv:2310.12378

    eess.AS cs.SD

    The CHiME-7 Challenge: System Description and Performance of NeMo Team's DASR System

    Authors: Tae Jin Park, He Huang, Ante Jukic, Kunal Dhawan, Krishna C. Puvvada, Nithin Koluguri, Nikolay Karpov, Aleksandr Laptev, Jagadeesh Balam, Boris Ginsburg

    Abstract: We present the NVIDIA NeMo team's multi-channel speech recognition system for the 7th CHiME Challenge Distant Automatic Speech Recognition (DASR) Task, focusing on the development of a multi-channel, multi-speaker speech recognition system tailored to transcribe speech from distributed microphones and microphone arrays. The system predominantly comprises of the following integral modules: the Spea…

    Submitted 18 October, 2023; originally announced October 2023.

    Journal ref: CHiME-7 Workshop 2023

  2. arXiv:2310.12371

    eess.AS cs.SD

    Property-Aware Multi-Speaker Data Simulation: A Probabilistic Modelling Technique for Synthetic Data Generation

    Authors: Tae Jin Park, He Huang, Coleman Hooper, Nithin Koluguri, Kunal Dhawan, Ante Jukic, Jagadeesh Balam, Boris Ginsburg

    Abstract: We introduce a sophisticated multi-speaker speech data simulator, specifically engineered to generate multi-speaker speech recordings. A notable feature of this simulator is its capacity to modulate the distribution of silence and overlap via the adjustment of statistical parameters. This capability offers a tailored training environment for developing neural models suited for speaker diarization…

    Submitted 18 October, 2023; originally announced October 2023.

    Journal ref: CHiME-7 Workshop 2023

  3. arXiv:2309.05248

    eess.AS cs.SD

    Enhancing Speaker Diarization with Large Language Models: A Contextual Beam Search Approach

    Authors: Tae Jin Park, Kunal Dhawan, Nithin Koluguri, Jagadeesh Balam

    Abstract: Large language models (LLMs) have shown great promise for capturing contextual information in natural language processing tasks. We propose a novel approach to speaker diarization that incorporates the prowess of LLMs to exploit contextual cues in human dialogues. Our method builds upon an acoustic-based speaker diarization system by adding lexical information from an LLM in the inference stage. W…

    Submitted 13 September, 2023; v1 submitted 11 September, 2023; originally announced September 2023.

    Comments: 4 pages plus 1 reference page, ICASSP format

  4. arXiv:2203.15974

    eess.AS cs.CL

    Multi-scale Speaker Diarization with Dynamic Scale Weighting

    Authors: Tae Jin Park, Nithin Rao Koluguri, Jagadeesh Balam, Boris Ginsburg

    Abstract: Speaker diarization systems are challenged by a trade-off between the temporal resolution and the fidelity of the speaker representation. By obtaining a superior temporal resolution with an enhanced accuracy, a multi-scale approach is a way to cope with such a trade-off. In this paper, we propose a more advanced multi-scale diarization system based on a multi-scale diarization decoder. There are t…

    Submitted 29 March, 2022; originally announced March 2022.

    Comments: Submitted to Interspeech 2022

  5. arXiv:2110.09695

    cs.LG

    Tackling Dynamics in Federated Incremental Learning with Variational Embedding Rehearsal

    Authors: Tae Jin Park, Kenichi Kumatani, Dimitrios Dimitriadis

    Abstract: Federated Learning is a fast growing area of ML where the training datasets are extremely distributed, all while dynamically changing over time. Models need to be trained on clients' devices without any guarantees for either homogeneity or stationarity of the local private data. The need for continual training has also risen, due to the ever-increasing production of in-task data. However, pursuing…

    Submitted 18 October, 2021; originally announced October 2021.

  6. arXiv:2101.09624

    eess.AS cs.CL cs.SD

    A Review of Speaker Diarization: Recent Advances with Deep Learning

    Authors: Tae Jin Park, Naoyuki Kanda, Dimitrios Dimitriadis, Kyu J. Han, Shinji Watanabe, Shrikanth Narayanan

    Abstract: Speaker diarization is a task to label audio or video recordings with classes that correspond to speaker identity, or in short, a task to identify "who spoke when". In the early years, speaker diarization algorithms were developed for speech recognition on multispeaker audio recordings to enable speaker adaptive processing. These algorithms also gained their own value as a standalone application o…

    Submitted 26 November, 2021; v1 submitted 23 January, 2021; originally announced January 2021.

    Comments: This article is a preprint version of the article published in Computer Speech & Language, Volume 72, March 2022, 101317

  7. arXiv:2007.09635

    eess.AS cs.SD

    Meta-learning with Latent Space Clustering in Generative Adversarial Network for Speaker Diarization

    Authors: Monisankha Pal, Manoj Kumar, Raghuveer Peri, Tae Jin Park, So Hyun Kim, Catherine Lord, Somer Bishop, Shrikanth Narayanan

    Abstract: The performance of most speaker diarization systems with x-vector embeddings is both vulnerable to noisy environments and lacks domain robustness. Earlier work on speaker diarization using generative adversarial network (GAN) with an encoder network (ClusterGAN) to project input x-vectors into a latent space has shown promising performance on meeting data. In this paper, we extend the ClusterGAN n…

    Submitted 19 July, 2020; originally announced July 2020.

    Comments: Submitted to IEEE/ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING

  8. Speaker Diarization with Lexical Information

    Authors: Tae Jin Park, Kyu J. Han, Jing Huang, Xiaodong He, Bowen Zhou, Panayiotis Georgiou, Shrikanth Narayanan

    Abstract: This work presents a novel approach for speaker diarization to leverage lexical information provided by automatic speech recognition. We propose a speaker diarization system that can incorporate word-level speaker turn probabilities with speaker embeddings into a speaker clustering process to improve the overall diarization accuracy. To integrate lexical and acoustic information in a comprehensive…

    Submitted 13 April, 2020; originally announced April 2020.

    Journal ref: Interspeech 2019, 391-395

  9. Auto-Tuning Spectral Clustering for Speaker Diarization Using Normalized Maximum Eigengap

    Authors: Tae Jin Park, Kyu J. Han, Manoj Kumar, Shrikanth Narayanan

    Abstract: In this study, we propose a new spectral clustering framework that can auto-tune the parameters of the clustering algorithm in the context of speaker diarization. The proposed framework uses normalized maximum eigengap (NME) values to estimate the number of clusters and the parameters for the threshold of the elements of each row in an affinity matrix during spectral clustering, without the use of…

    Submitted 4 March, 2020; originally announced March 2020.

    Comments: in IEEE Signal Processing Letters, 2020

  10. arXiv:1910.11398

    eess.AS cs.SD

    Speaker diarization using latent space clustering in generative adversarial network

    Authors: Monisankha Pal, Manoj Kumar, Raghuveer Peri, Tae Jin Park, So Hyun Kim, Catherine Lord, Somer Bishop, Shrikanth Narayanan

    Abstract: In this work, we propose deep latent space clustering for speaker diarization using generative adversarial network (GAN) backprojection with the help of an encoder network. The proposed diarization system is trained jointly with GAN loss, latent variable recovery loss, and a clustering-specific loss. It uses x-vector speaker embeddings at the input, while the latent variables are sampled from a co…

    Submitted 24 October, 2019; originally announced October 2019.

    Comments: Submitted to ICASSP 2020

  11. arXiv:1811.10761   

    cs.CL

    Speaker Diarization With Lexical Information

    Authors: Tae Jin Park, Kyu Han, Ian Lane, Panayiotis Georgiou

    Abstract: This work presents a novel approach to leverage lexical information for speaker diarization. We introduce a speaker diarization system that can directly integrate lexical as well as acoustic information into a speaker clustering process. Thus, we propose an adjacency matrix integration technique to integrate word level speaker turn probabilities with speaker embeddings in a comprehensive way. Our…

    Submitted 28 November, 2018; v1 submitted 26 November, 2018; originally announced November 2018.

    Comments: This version removed by arXiv administrators because the author did not have the right to agree to our license at the time of submission

  12. arXiv:1805.10731

    eess.AS cs.SD

    Multimodal Speaker Segmentation and Diarization using Lexical and Acoustic Cues via Sequence to Sequence Neural Networks

    Authors: Tae Jin Park, Panayiotis Georgiou

    Abstract: While there has been substantial amount of work in speaker diarization recently, there are few efforts in jointly employing lexical and acoustic information for speaker segmentation. Towards that, we investigate a speaker diarization system using a sequence-to-sequence neural network trained on both lexical and acoustic features. We also propose a loss function that allows for selecting not only t…

    Submitted 27 May, 2018; originally announced May 2018.