Showing 1–5 of 5 results for author: Hiroe, A

  1. arXiv:2407.15310

    eess.SP cs.SD eess.AS

    Can all variations within the unified mask-based beamformer framework achieve identical peak extraction performance?

    Authors: Atsuo Hiroe, Katsutoshi Itoyama, Kazuhiro Nakadai

    Abstract: This study investigates mask-based beamformers (BFs), which estimate filters for target sound extraction (TSE) using time-frequency masks. Although multiple mask-based BFs have been proposed, no consensus has been established on the best one for target-extracting performance. Previously, we found that maximum signal-to-noise ratio and minimum mean square error (MSE) BFs can achieve the same extrac…

    Submitted 21 July, 2024; originally announced July 2024.

    Comments: Submitted to EURASIP Journal on Audio, Speech, and Music Processing

  2. arXiv:2312.16449

    eess.AS cs.SD eess.SP

    Online Similarity-and-Independence-Aware Beamformer for Low-latency Target Sound Extraction

    Authors: Atsuo Hiroe

    Abstract: This study describes an online target sound extraction (TSE) process, derived from the iterative batch algorithm using the similarity-and-independence-aware beamformer (SIBF), to achieve both latency reduction and extraction accuracy maintenance. The SIBF is a linear method that estimates the target more accurately compared with a reference, an approximate magnitude spectrogram of the target. Evid…

    Submitted 27 December, 2023; originally announced December 2023.

    Comments: Submitted to IEEE Open Journal of Signal Processing

  3. arXiv:2309.12065

    eess.AS cs.SD eess.SP

    Is the Ideal Ratio Mask Really the Best? -- Exploring the Best Extraction Performance and Optimal Mask of Mask-based Beamformers

    Authors: Atsuo Hiroe, Katsutoshi Itoyama, Kazuhiro Nakadai

    Abstract: This study investigates mask-based beamformers (BFs), which estimate filters to extract target speech using time-frequency masks. Although several BF methods have been proposed, the following aspects are yet to be comprehensively investigated. 1) Which BF can provide the best extraction performance in terms of the closeness of the BF output to the target speech? 2) Is the optimal mask for the best…

    Submitted 21 September, 2023; originally announced September 2023.

    Comments: Accepted in APSIPA 2023

  4. Similarity-and-Independence-Aware Beamformer with Iterative Casting and Boost Start for Target Source Extraction Using Reference

    Authors: Atsuo Hiroe

    Abstract: Target source extraction is significant for improving human speech intelligibility and the speech recognition performance of computers. This study describes a method for target source extraction, called the similarity-and-independence-aware beamformer (SIBF). The SIBF extracts the target source using a rough magnitude spectrogram as the reference signal. The advantage of the SIBF is that it can ob…

    Submitted 18 October, 2021; originally announced October 2021.

    Comments: Accepted for publication as a regular paper in the IEEE Open Journal of Signal Processing (2021)

    Journal ref: A. Hiroe, "Similarity-and-Independence-Aware Beamformer with Iterative Casting and Boost Start for Target Source Extraction Using Reference," in IEEE Open Journal of Signal Processing, 2021

  5. arXiv:2006.00772

    eess.AS cs.SD eess.SP

    Similarity-and-Independence-Aware Beamformer: Method for Target Source Extraction using Magnitude Spectrogram as Reference

    Authors: Atsuo Hiroe

    Abstract: This study presents a novel method for source extraction, referred to as the similarity-and-independence-aware beamformer (SIBF). The SIBF extracts the target signal using a rough magnitude spectrogram as the reference signal. The advantage of the SIBF is that it can obtain an accurate target signal, compared to the spectrogram generated by target-enhancing methods such as the speech enhancement b…

    Submitted 24 August, 2020; v1 submitted 1 June, 2020; originally announced June 2020.

    Comments: Accepted in INTERSPEECH 2020