Showing 1–8 of 8 results for author: Itoyama, K

  1. arXiv:2405.19813  [pdf, other]

    cs.RO

    SLAM-based Joint Calibration of Multiple Asynchronous Microphone Arrays and Sound Source Localization

    Authors: Jiang Wang, Yuanzheng He, Daobilige Su, Katsutoshi Itoyama, Kazuhiro Nakadai, Junfeng Wu, Shoudong Huang, Youfu Li, He Kong

    Abstract: Robot audition systems with multiple microphone arrays have many applications in practice. However, accurate calibration of multiple microphone arrays remains challenging because there are many unknown parameters to be identified, including the relative transforms (i.e., orientation, translation) and asynchronous factors (i.e., initial time offset and sampling clock difference) between microphone…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: This paper has been accepted to and will appear in the IEEE Transactions on Robotics

  2. arXiv:2401.14661  [pdf, other]

    cs.CV cs.LG

    From Blurry to Brilliant Detection: YOLOv5-Based Aerial Object Detection with Super Resolution

    Authors: Ragib Amin Nihal, Benjamin Yen, Katsutoshi Itoyama, Kazuhiro Nakadai

    Abstract: The demand for accurate object detection in aerial imagery has surged with the widespread use of drones and satellite technology. Traditional object detection models, trained on datasets biased towards large objects, struggle to perform optimally in aerial scenarios where small, densely clustered objects are prevalent. To address this challenge, we present an innovative approach that combines supe…

    Submitted 26 January, 2024; originally announced January 2024.

  3. arXiv:2309.12065  [pdf, other]

    eess.AS cs.SD eess.SP

    Is the Ideal Ratio Mask Really the Best? -- Exploring the Best Extraction Performance and Optimal Mask of Mask-based Beamformers

    Authors: Atsuo Hiroe, Katsutoshi Itoyama, Kazuhiro Nakadai

    Abstract: This study investigates mask-based beamformers (BFs), which estimate filters to extract target speech using time-frequency masks. Although several BF methods have been proposed, the following aspects are yet to be comprehensively investigated. 1) Which BF can provide the best extraction performance in terms of the closeness of the BF output to the target speech? 2) Is the optimal mask for the best…

    Submitted 21 September, 2023; originally announced September 2023.

    Comments: Accepted in APSIPA 2023

  4. arXiv:2111.07979  [pdf, other]

    cs.SD cs.AI cs.LG eess.AS eess.SY q-bio.NC

    Metric-based multimodal meta-learning for human movement identification via footstep recognition

    Authors: Muhammad Shakeel, Katsutoshi Itoyama, Kenji Nishida, Kazuhiro Nakadai

    Abstract: We describe a novel metric-based learning approach that introduces a multimodal framework and uses deep audio and geophone encoders in siamese configuration to design an adaptable and lightweight supervised model. This framework eliminates the need for expensive data labeling procedures and learns general-purpose representations from low multisensory data obtained from omnipresent sensing systems.…

    Submitted 15 November, 2021; originally announced November 2021.

  5. arXiv:1903.09341  [pdf, other]

    cs.SD cs.LG eess.AS stat.ML

    Unsupervised Speech Enhancement Based on Multichannel NMF-Informed Beamforming for Noise-Robust Automatic Speech Recognition

    Authors: Kazuki Shimada, Yoshiaki Bando, Masato Mimura, Katsutoshi Itoyama, Kazuyoshi Yoshii, Tatsuya Kawahara

    Abstract: This paper describes multichannel speech enhancement for improving automatic speech recognition (ASR) in noisy environments. Recently, the minimum variance distortionless response (MVDR) beamforming has widely been used because it works well if the steering vector of speech and the spatial covariance matrix (SCM) of noise are given. To estimate such spatial information, conventional studies take…

    Submitted 31 March, 2019; v1 submitted 21 March, 2019; originally announced March 2019.

  6. arXiv:1710.11439  [pdf, other]

    cs.SD cs.LG eess.AS stat.ML

    Statistical Speech Enhancement Based on Probabilistic Integration of Variational Autoencoder and Non-Negative Matrix Factorization

    Authors: Yoshiaki Bando, Masato Mimura, Katsutoshi Itoyama, Kazuyoshi Yoshii, Tatsuya Kawahara

    Abstract: This paper presents a statistical method of single-channel speech enhancement that uses a variational autoencoder (VAE) as a prior distribution on clean speech. A standard approach to speech enhancement is to train a deep neural network (DNN) to take noisy speech as input and output clean speech. Although this supervised approach requires a very large amount of pair data for training, it is not ro…

    Submitted 19 March, 2018; v1 submitted 31 October, 2017; originally announced October 2017.

    Comments: 5 pages, 3 figures, version in which Eqs. (9), (19), and (20) of v2 (submitted to ICASSP 2018) are corrected. Samples available here: http://sap.ist.i.kyoto-u.ac.jp/members/yoshiaki/demo/vae-nmf/

  7. arXiv:1708.02255  [pdf, other]

    cs.AI cs.CL cs.SD

    Generative Statistical Models with Self-Emergent Grammar of Chord Sequences

    Authors: Hiroaki Tsushima, Eita Nakamura, Katsutoshi Itoyama, Kazuyoshi Yoshii

    Abstract: Generative statistical models of chord sequences play crucial roles in music processing. To capture syntactic similarities among certain chords (e.g. in C major key, between G and G7 and between F and Dm), we study hidden Markov models and probabilistic context-free grammar models with latent variables describing syntactic categories of chord symbols and their unsupervised learning techniques for…

    Submitted 2 March, 2018; v1 submitted 7 August, 2017; originally announced August 2017.

    Comments: 22 pages, 14 figures, version accepted to JNMR, minor revision

  8. Singing Voice Separation and Vocal F0 Estimation based on Mutual Combination of Robust Principal Component Analysis and Subharmonic Summation

    Authors: Yukara Ikemiya, Katsutoshi Itoyama, Kazuyoshi Yoshii

    Abstract: This paper presents a new method of singing voice analysis that performs mutually-dependent singing voice separation and vocal fundamental frequency (F0) estimation. Vocal F0 estimation is considered to become easier if singing voices can be separated from a music audio signal, and vocal F0 contours are useful for singing voice separation. This calls for an approach that improves the performance o…

    Submitted 1 April, 2016; originally announced April 2016.

    Comments: 11 pages