Showing 1–2 of 2 results for author: Kawano, H

Search v0.5.6 released 2020-02-24

arXiv:2312.13891 [pdf, other]

cs.RO

A Summarized History-based Dialogue System for Amnesia-Free Prompt Updates

Authors: Hyejin Hong, Hibiki Kawano, Takuto Maekawa, Naoki Yoshimaru, Takamasa Iio, Kenji Hatano

Abstract: In today's society, information overload presents challenges in providing optimal recommendations. Consequently, the importance of dialogue systems that can discern and provide the necessary information through dialogue is increasingly recognized. However, existing dialogue systems rely on pre-trained models and struggle to cope with real-time or insufficient information. To address these concerns, models that allow the addition of missing information to dialogue robots are being proposed. Yet, maintaining the integrity of previous conversation history while integrating new data remains a formidable challenge. This paper presents a novel system for dialogue robots designed to remember user-specific characteristics by retaining past conversation history even as new information is added.

Submitted 21 December, 2023; originally announced December 2023.

Comments: This paper is part of the proceedings of the Dialogue Robot Competition 2023

arXiv:2308.11241 [pdf]

cs.SD cs.AI cs.LG eess.AS

An Effective Transformer-based Contextual Model and Temporal Gate Pooling for Speaker Identification

Authors: Harunori Kawano, Sota Shimizu

Abstract: Wav2vec2 has achieved success in applying the Transformer architecture and self-supervised learning to speech recognition. Recently, these techniques have come to be used not only for speech recognition but for speech processing as a whole. This paper introduces an effective end-to-end speaker identification model that applies a Transformer-based contextual model. We explored the relationship between the hyper-parameters and performance in order to discern the structure of an effective model. Furthermore, we propose a pooling method, Temporal Gate Pooling, with powerful learning ability for speaker identification. We used Conformer as the encoder and BEST-RQ for pre-training, and evaluated the model on VoxCeleb1 speaker identification. The proposed method achieved an accuracy of 87.1% with 28.5M parameters, demonstrating comparable precision to wav2vec2 with 317.7M parameters. Code is available at https://github.com/HarunoriKawano/speaker-identification-with-tgp.

Submitted 10 September, 2023; v1 submitted 22 August, 2023; originally announced August 2023.

Comments: 5 pages, 3 figures
