Showing 1–15 of 15 results for author: Bando, Y

  1. arXiv:2406.08396  [pdf, other]

    eess.AS cs.AI

    Neural Blind Source Separation and Diarization for Distant Speech Recognition

    Authors: Yoshiaki Bando, Tomohiko Nakamura, Shinji Watanabe

    Abstract: This paper presents a neural method for distant speech recognition (DSR) that jointly separates and diarizes speech mixtures without supervision by isolated signals. A standard separation method for multi-talker DSR is a statistical multichannel method called guided source separation (GSS). While GSS does not require signal-level supervision, it relies on speaker diarization results to handle unkn…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: 5 pages, 3 figures, accepted to INTERSPEECH 2024

  2. Infrastructure-less Localization from Indoor Environmental Sounds Based on Spectral Decomposition and Spatial Likelihood Model

    Authors: Satoki Ogiso, Yoshiaki Bando, Takeshi Kurata, Takashi Okuma

    Abstract: Human and/or asset tracking using attached sensor units helps understand their activities. Most common indoor localization methods for human tracking technologies require expensive infrastructure, deployment, and maintenance. To overcome this problem, environmental sounds have been used for infrastructure-free localization. While they achieve room-level classification, they suffer from two prob…

    Submitted 26 March, 2024; originally announced March 2024.

    Comments: 6 pages, 6 figures, accepted to IEEE/SICE SII 2023

  3. Real-time Neuron Segmentation for Voltage Imaging

    Authors: Yosuke Bando, Ramdas Pillai, Atsushi Kajita, Farhan Abdul Hakeem, Yves Quemener, Hua-an Tseng, Kiryl D. Piatkevich, Changyang Linghu, Xue Han, Edward S. Boyden

    Abstract: In voltage imaging, where the membrane potentials of individual neurons are recorded at hundreds to thousands of frames per second using fluorescence microscopy, data processing presents a challenge. Even a fraction of a minute of recording with a limited image size yields gigabytes of video data consisting of tens of thousands of frames, which can be time-consuming to process. Moreover, millisec…

    Submitted 25 March, 2024; originally announced March 2024.

    Journal ref: IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 813-818, 2023

  4. Implementing and Evaluating E2LSH on Storage

    Authors: Yu Nakanishi, Kazuhiro Hiwada, Yosuke Bando, Tomoya Suzuki, Hirotsugu Kajihara, Shintaro Sano, Tatsuro Endo, Tatsuo Shiozawa

    Abstract: Locality sensitive hashing (LSH) is one of the widely-used approaches to approximate nearest neighbor search (ANNS) in high-dimensional spaces. The first work on LSH for the Euclidean distance, E2LSH, showed how ANNS can be solved efficiently at a sublinear query time in the database size with theoretically-guaranteed accuracy, although it required a large hash index size. Since then, several LSH…

    Submitted 24 March, 2024; originally announced March 2024.

    Journal ref: 26th International Conference on Extending Database Technology (EDBT), 437-449, 2023

  5. GPU Graph Processing on CXL-Based Microsecond-Latency External Memory

    Authors: Shintaro Sano, Yosuke Bando, Kazuhiro Hiwada, Hirotsugu Kajihara, Tomoya Suzuki, Yu Nakanishi, Daisuke Taki, Akiyuki Kaneko, Tatsuo Shiozawa

    Abstract: In GPU graph analytics, the use of external memory such as the host DRAM and solid-state drives is a cost-effective approach to processing large graphs beyond the capacity of the GPU onboard memory. This paper studies the use of Compute Express Link (CXL) memory as alternative external memory for GPU graph processing in order to see if this emerging memory expansion technology enables graph proces…

    Submitted 5 December, 2023; originally announced December 2023.

    Journal ref: Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis (SC-W '23), pp. 962-972, November 2023

  6. arXiv:2306.10240  [pdf, other]

    cs.SD cs.LG eess.AS

    Neural Fast Full-Rank Spatial Covariance Analysis for Blind Source Separation

    Authors: Yoshiaki Bando, Yoshiki Masuyama, Aditya Arie Nugraha, Kazuyoshi Yoshii

    Abstract: This paper describes an efficient unsupervised learning method for a neural source separation model that utilizes a probabilistic generative model of observed multichannel mixtures proposed for blind source separation (BSS). For this purpose, amortized variational inference (AVI) has been used for directly solving the inverse problem of BSS with full-rank spatial covariance analysis (FCA). Althoug…

    Submitted 16 June, 2023; originally announced June 2023.

    Comments: 5 pages, 2 figures, accepted to EUSIPCO 2023

  7. arXiv:2207.10934  [pdf, other]

    eess.AS cs.SD

    DNN-Free Low-Latency Adaptive Speech Enhancement Based on Frame-Online Beamforming Powered by Block-Online FastMNMF

    Authors: Aditya Arie Nugraha, Kouhei Sekiguchi, Mathieu Fontaine, Yoshiaki Bando, Kazuyoshi Yoshii

    Abstract: This paper describes a practical dual-process speech enhancement system that adapts environment-sensitive frame-online beamforming (front-end) with help from environment-free block-online source separation (back-end). To use minimum variance distortionless response (MVDR) beamforming, one may train a deep neural network (DNN) that estimates time-frequency masks used for computing the covariance ma…

    Submitted 22 July, 2022; originally announced July 2022.

    Comments: IWAENC 2022

  8. arXiv:2207.07296  [pdf, other]

    eess.AS cs.LG cs.SD

    Direction-Aware Adaptive Online Neural Speech Enhancement with an Augmented Reality Headset in Real Noisy Conversational Environments

    Authors: Kouhei Sekiguchi, Aditya Arie Nugraha, Yicheng Du, Yoshiaki Bando, Mathieu Fontaine, Kazuyoshi Yoshii

    Abstract: This paper describes the practical response- and performance-aware development of online speech enhancement for an augmented reality (AR) headset that helps a user understand conversations made in real noisy echoic environments (e.g., cocktail party). One may use a state-of-the-art blind source separation method called fast multichannel nonnegative matrix factorization (FastMNMF) that works well i…

    Submitted 15 July, 2022; originally announced July 2022.

    Comments: IEEE/RSJ IROS 2022

  9. arXiv:2207.07273  [pdf, other]

    eess.AS cs.LG cs.SD

    Direction-Aware Joint Adaptation of Neural Speech Enhancement and Recognition in Real Multiparty Conversational Environments

    Authors: Yicheng Du, Aditya Arie Nugraha, Kouhei Sekiguchi, Yoshiaki Bando, Mathieu Fontaine, Kazuyoshi Yoshii

    Abstract: This paper describes noisy speech recognition for an augmented reality headset that helps verbal communication within real multiparty conversational environments. A major approach that has actively been studied in simulated environments is to sequentially perform speech enhancement and automatic speech recognition (ASR) based on deep neural networks (DNNs) trained in a supervised manner. In our ta…

    Submitted 14 July, 2022; originally announced July 2022.

    Comments: INTERSPEECH 2022

  10. arXiv:2205.05330  [pdf, other]

    cs.SD eess.AS eess.SP stat.ML

    Generalized Fast Multichannel Nonnegative Matrix Factorization Based on Gaussian Scale Mixtures for Blind Source Separation

    Authors: Mathieu Fontaine, Kouhei Sekiguchi, Aditya Nugraha, Yoshiaki Bando, Kazuyoshi Yoshii

    Abstract: This paper describes heavy-tailed extensions of a state-of-the-art versatile blind source separation method called fast multichannel nonnegative matrix factorization (FastMNMF) from a unified point of view. The common way of deriving such an extension is to replace the multivariate complex Gaussian distribution in the likelihood function with its heavy-tailed generalization, e.g., the multivariate…

    Submitted 11 May, 2022; originally announced May 2022.

    Journal ref: IEEE/ACM Transactions on Audio, Speech and Language Processing, Institute of Electrical and Electronics Engineers, 2022, pp.1-1

  11. arXiv:2007.13976  [pdf, other]

    cs.SD cs.CV eess.AS

    Self-supervised Neural Audio-Visual Sound Source Localization via Probabilistic Spatial Modeling

    Authors: Yoshiki Masuyama, Yoshiaki Bando, Kohei Yatabe, Yoko Sasaki, Masaki Onishi, Yasuhiro Oikawa

    Abstract: Detecting sound source objects within visual observation is important for autonomous robots to comprehend surrounding environments. Since sounding objects have a large variety with different appearances in our living environments, labeling all sounding objects is impossible in practice. This calls for self-supervised learning which does not require manual labeling. Most of conventional self-superv…

    Submitted 27 July, 2020; originally announced July 2020.

    Comments: Accepted for publication in 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)

  12. arXiv:1908.11307  [pdf, other]

    cs.SD cs.LG eess.AS stat.ML

    Deep Bayesian Unsupervised Source Separation Based on a Complex Gaussian Mixture Model

    Authors: Yoshiaki Bando, Yoko Sasaki, Kazuyoshi Yoshii

    Abstract: This paper presents an unsupervised method that trains neural source separation by using only multichannel mixture signals. Conventional neural separation methods require a lot of supervised data to achieve excellent performance. Although multichannel methods based on spatial information can work without such training data, they are often sensitive to parameter initialization and degraded with the…

    Submitted 29 August, 2019; originally announced August 2019.

    Comments: 6 pages, 2 figures, accepted for publication in 2019 IEEE International Workshop on Machine Learning for Signal Processing (MLSP)

  13. arXiv:1903.09341  [pdf, other]

    cs.SD cs.LG eess.AS stat.ML

    Unsupervised Speech Enhancement Based on Multichannel NMF-Informed Beamforming for Noise-Robust Automatic Speech Recognition

    Authors: Kazuki Shimada, Yoshiaki Bando, Masato Mimura, Katsutoshi Itoyama, Kazuyoshi Yoshii, Tatsuya Kawahara

    Abstract: This paper describes multichannel speech enhancement for improving automatic speech recognition (ASR) in noisy environments. Recently, minimum variance distortionless response (MVDR) beamforming has been widely used because it works well if the steering vector of speech and the spatial covariance matrix (SCM) of noise are given. To estimate such spatial information, conventional studies take…

    Submitted 31 March, 2019; v1 submitted 21 March, 2019; originally announced March 2019.

  14. arXiv:1903.03237  [pdf, ps, other]

    cs.SD cs.LG eess.AS stat.ML

    Fast Multichannel Source Separation Based on Jointly Diagonalizable Spatial Covariance Matrices

    Authors: Kouhei Sekiguchi, Aditya Arie Nugraha, Yoshiaki Bando, Kazuyoshi Yoshii

    Abstract: This paper describes a versatile method that accelerates multichannel source separation methods based on full-rank spatial modeling. A popular approach to multichannel source separation is to integrate a spatial model with a source model for estimating the spatial covariance matrices (SCMs) and power spectral densities (PSDs) of each sound source in the time-frequency domain. One of the most succe…

    Submitted 7 March, 2019; originally announced March 2019.

  15. arXiv:1710.11439  [pdf, other]

    cs.SD cs.LG eess.AS stat.ML

    Statistical Speech Enhancement Based on Probabilistic Integration of Variational Autoencoder and Non-Negative Matrix Factorization

    Authors: Yoshiaki Bando, Masato Mimura, Katsutoshi Itoyama, Kazuyoshi Yoshii, Tatsuya Kawahara

    Abstract: This paper presents a statistical method of single-channel speech enhancement that uses a variational autoencoder (VAE) as a prior distribution on clean speech. A standard approach to speech enhancement is to train a deep neural network (DNN) to take noisy speech as input and output clean speech. Although this supervised approach requires a very large amount of pair data for training, it is not ro…

    Submitted 19 March, 2018; v1 submitted 31 October, 2017; originally announced October 2017.

    Comments: 5 pages, 3 figures; version in which Eqs. (9), (19), and (20) of v2 (submitted to ICASSP 2018) are corrected. Samples available here: http://sap.ist.i.kyoto-u.ac.jp/members/yoshiaki/demo/vae-nmf/