Showing 1–6 of 6 results for author: Heo, H S

Searching in archive cs.
  1. arXiv:2211.00437  [pdf, other]

    eess.AS cs.SD

    Disentangled representation learning for multilingual speaker recognition

    Authors: Kihyun Nam, Youkyum Kim, Jaesung Huh, Hee Soo Heo, Jee-weon Jung, Joon Son Chung

    Abstract: The goal of this paper is to learn robust speaker representation for bilingual speaking scenario. The majority of the world's population speak at least two languages; however, most speaker recognition systems fail to recognise the same speaker when speaking in different languages. Popular speaker recognition evaluation sets do not consider the bilingual scenario, making it difficult to analyse t…

    Submitted 6 June, 2023; v1 submitted 1 November, 2022; originally announced November 2022.

    Comments: Interspeech 2023

  2. arXiv:2011.14885  [pdf, ps, other]

    cs.SD eess.AS

    Look who's not talking

    Authors: Youngki Kwon, Hee Soo Heo, Jaesung Huh, Bong-Jin Lee, Joon Son Chung

    Abstract: The objective of this work is speaker diarisation of speech recordings 'in the wild'. The ability to determine speech segments is a crucial part of diarisation systems, accounting for a large proportion of errors. In this paper, we present a simple but effective solution for speech activity detection based on the speaker embeddings. In particular, we discover that the norm of the speaker embedding…

    Submitted 30 November, 2020; originally announced November 2020.

    Comments: SLT 2021

  3. arXiv:2009.14153  [pdf, other]

    eess.AS cs.SD

    Clova Baseline System for the VoxCeleb Speaker Recognition Challenge 2020

    Authors: Hee Soo Heo, Bong-Jin Lee, Jaesung Huh, Joon Son Chung

    Abstract: This report describes our submission to the VoxCeleb Speaker Recognition Challenge (VoxSRC) at Interspeech 2020. We perform a careful analysis of speaker recognition models based on the popular ResNet architecture, and train a number of variants using a range of loss functions. Our results show significant improvements over most existing works without the use of model ensemble or post-processing.…

    Submitted 29 September, 2020; originally announced September 2020.

  4. arXiv:2007.12085  [pdf, other]

    cs.SD cs.LG eess.AS

    Augmentation adversarial training for self-supervised speaker recognition

    Authors: Jaesung Huh, Hee Soo Heo, Jingu Kang, Shinji Watanabe, Joon Son Chung

    Abstract: The goal of this work is to train robust speaker recognition models without speaker labels. Recent works on unsupervised speaker representations are based on contrastive learning in which they encourage within-utterance embeddings to be similar and across-utterance embeddings to be dissimilar. However, since the within-utterance segments share the same acoustic characteristics, it is difficult to…

    Submitted 30 October, 2020; v1 submitted 23 July, 2020; originally announced July 2020.

    Comments: Workshop on Self-Supervised Learning for Speech and Audio Processing, NeurIPS

  5. arXiv:2005.08606  [pdf, other]

    cs.CV cs.MM cs.SD eess.AS

    End-to-End Lip Synchronisation Based on Pattern Classification

    Authors: You Jin Kim, Hee Soo Heo, Soo-Whan Chung, Bong-Jin Lee

    Abstract: The goal of this work is to synchronise audio and video of a talking face using deep neural network models. Existing works have trained networks on proxy tasks such as cross-modal similarity learning, and then computed similarities between audio and video frames using a sliding window approach. While these methods demonstrate satisfactory performance, the networks are not trained directly on the t…

    Submitted 19 March, 2021; v1 submitted 18 May, 2020; originally announced May 2020.

    Comments: Accepted at SLT 2021

  6. In defence of metric learning for speaker recognition

    Authors: Joon Son Chung, Jaesung Huh, Seongkyu Mun, Minjae Lee, Hee Soo Heo, Soyeon Choe, Chiheon Ham, Sunghwan Jung, Bong-Jin Lee, Icksang Han

    Abstract: The objective of this paper is 'open-set' speaker recognition of unseen speakers, where ideal embeddings should be able to condense information into a compact utterance-level representation that has small intra-speaker and large inter-speaker distance. A popular belief in speaker recognition is that networks trained with classification objectives outperform metric learning methods. In this paper…

    Submitted 24 April, 2020; v1 submitted 26 March, 2020; originally announced March 2020.

    Comments: The code can be found at https://github.com/clovaai/voxceleb_trainer