Showing 1–25 of 25 results for author: Kobayashi, K

Searching in archive eess.
  1. arXiv:2403.09920  [pdf]

    eess.IV cs.AI cs.CV cs.CY

    Predicting Generalization of AI Colonoscopy Models to Unseen Data

    Authors: Joel Shor, Carson McNeil, Yotam Intrator, Joseph R Ledsam, Hiro-o Yamano, Daisuke Tsurumaru, Hiroki Kayama, Atsushi Hamabe, Koji Ando, Mitsuhiko Ota, Haruei Ogino, Hiroshi Nakase, Kaho Kobayashi, Masaaki Miyo, Eiji Oki, Ichiro Takemasa, Ehud Rivlin, Roman Goldenberg

    Abstract: Background: Generalizability of AI colonoscopy algorithms is important for wider adoption in clinical practice. However, current techniques for evaluating performance on unseen data require expensive and time-intensive labels. Methods…

    Submitted 22 March, 2024; v1 submitted 14 March, 2024; originally announced March 2024.

  2. arXiv:2312.09529  [pdf, other]

    eess.IV cs.CV

    Can Physician Judgment Enhance Model Trustworthiness? A Case Study on Predicting Pathological Lymph Nodes in Rectal Cancer

    Authors: Kazuma Kobayashi, Yasuyuki Takamizawa, Mototaka Miyake, Sono Ito, Lin Gu, Tatsuya Nakatsuka, Yu Akagi, Tatsuya Harada, Yukihide Kanemitsu, Ryuji Hamamoto

    Abstract: Explainability is key to enhancing artificial intelligence's trustworthiness in medicine. However, several issues remain concerning the actual benefit of explainable models for clinical decision-making. Firstly, there is a lack of consensus on an evaluation framework for quantitatively assessing the practical benefits that effective explainability should provide to practitioners. Secondly, physici…

    Submitted 14 December, 2023; originally announced December 2023.

  3. arXiv:2309.09627  [pdf, other]

    cs.SD eess.AS

    Electrolaryngeal Speech Intelligibility Enhancement Through Robust Linguistic Encoders

    Authors: Lester Phillip Violeta, Wen-Chin Huang, Ding Ma, Ryuichi Yamamoto, Kazuhiro Kobayashi, Tomoki Toda

    Abstract: We propose a novel framework for electrolaryngeal speech intelligibility enhancement through the use of robust linguistic encoders. Pretraining and fine-tuning approaches have proven to work well in this task, but in most cases, various mismatches, such as the speech type mismatch (electrolaryngeal vs. typical) or a speaker mismatch between the datasets used in each stage, can deteriorate the conv…

    Submitted 20 January, 2024; v1 submitted 18 September, 2023; originally announced September 2023.

    Comments: Accepted to ICASSP 2024. Demo page: lesterphillip.github.io/icassp2024_el_sie

  4. arXiv:2309.07598  [pdf, other]

    cs.SD eess.AS

    AAS-VC: On the Generalization Ability of Automatic Alignment Search based Non-autoregressive Sequence-to-sequence Voice Conversion

    Authors: Wen-Chin Huang, Kazuhiro Kobayashi, Tomoki Toda

    Abstract: Non-autoregressive (non-AR) sequence-to-sequence (seq2seq) models for voice conversion (VC) are attractive in their ability to effectively model the temporal structure while enjoying boosted intelligibility and fast inference thanks to non-AR modeling. However, the dependency of current non-AR seq2seq VC models on ground truth durations extracted from an external AR model greatly limits its generaliz…

    Submitted 15 September, 2023; v1 submitted 14 September, 2023; originally announced September 2023.

    Comments: Submitted to ICASSP 2024. Demo: https://unilight.github.io/Publication-Demos/publications/aas-vc/index.html. Code: https://github.com/unilight/seq2seq-vc

  5. arXiv:2309.03331  [pdf, other]

    cs.CV eess.IV

    Expert Uncertainty and Severity Aware Chest X-Ray Classification by Multi-Relationship Graph Learning

    Authors: Mengliang Zhang, Xinyue Hu, Lin Gu, Liangchen Liu, Kazuma Kobayashi, Tatsuya Harada, Ronald M. Summers, Yingying Zhu

    Abstract: Patients undergoing chest X-rays (CXR) often endure multiple lung diseases. When evaluating a patient's condition, due to the complex pathologies, subtle texture changes of different lung lesions in images, and patient condition differences, radiologists may be uncertain even when they have experienced long-term clinical training and professional guidance, which makes much noise in extracting di…

    Submitted 6 September, 2023; originally announced September 2023.

  6. arXiv:2210.10314  [pdf, other]

    cs.SD eess.AS

    Two-stage training method for Japanese electrolaryngeal speech enhancement based on sequence-to-sequence voice conversion

    Authors: Ding Ma, Lester Phillip Violeta, Kazuhiro Kobayashi, Tomoki Toda

    Abstract: Sequence-to-sequence (seq2seq) voice conversion (VC) models have greater potential in converting electrolaryngeal (EL) speech to normal speech (EL2SP) compared to conventional VC models. However, EL2SP based on seq2seq VC requires a sufficiently large amount of parallel data for the model training and it suffers from significant performance degradation when the amount of training data is insuffici…

    Submitted 19 October, 2022; originally announced October 2022.

    Comments: Accepted to SLT 2022

  7. arXiv:2210.09055  [pdf, other]

    cs.CY eess.SY

    Data-driven multi-scale modeling and robust optimization of composite structure with uncertainty quantification

    Authors: Kazuma Kobayashi, Shoaib Usman, Carlos Castano, Dinesh Kumar, Syed Alam

    Abstract: Accurately modeling materials' properties at lower length scales (micro-level) while translating the effects to the component and/or system level (macro-level) can significantly reduce the amount of experimentation required to develop new technologies. Robustness analysis of fuel and structural performance for harsh environments (such as power uprated reactor systems or aerospace…

    Submitted 4 November, 2022; v1 submitted 13 October, 2022; originally announced October 2022.

    Journal ref: Handbook of Smart Energy Systems, 2022

  8. arXiv:2209.12146  [pdf]

    eess.SY cs.LG stat.ML

    Machine Learning and Artificial Intelligence-Driven Multi-Scale Modeling for High Burnup Accident-Tolerant Fuels for Light Water-Based SMR Applications

    Authors: Md. Shamim Hassan, Abid Hossain Khan, Richa Verma, Dinesh Kumar, Kazuma Kobayashi, Shoaib Usman, Syed Alam

    Abstract: The concept of small modular reactor has changed the outlook for tackling future energy crises. This new reactor technology is very promising considering its lower investment requirements, modularity, design simplicity, and enhanced safety features. The application of artificial intelligence-driven multi-scale modeling (neutronics, thermal hydraulics, fuel performance, etc.) incorporating Digital…

    Submitted 25 September, 2022; originally announced September 2022.

    Journal ref: Handbook of Smart Energy Systems, 2022

  9. arXiv:2106.01415  [pdf, other]

    cs.SD cs.CL eess.AS

    A Preliminary Study of a Two-Stage Paradigm for Preserving Speaker Identity in Dysarthric Voice Conversion

    Authors: Wen-Chin Huang, Kazuhiro Kobayashi, Yu-Huai Peng, Ching-Feng Liu, Yu Tsao, Hsin-Min Wang, Tomoki Toda

    Abstract: We propose a new paradigm for maintaining speaker identity in dysarthric voice conversion (DVC). The poor quality of dysarthric speech can be greatly improved by statistical VC, but as the normal speech utterances of a dysarthria patient are nearly impossible to collect, previous work failed to recover the individuality of the patient. In light of this, we suggest a novel, two-stage approach for D…

    Submitted 2 June, 2021; originally announced June 2021.

    Comments: Accepted to Interspeech 2021. 5 pages, 3 figures, 1 table

  10. arXiv:2104.06793  [pdf, other]

    cs.SD cs.CL eess.AS

    Non-autoregressive sequence-to-sequence voice conversion

    Authors: Tomoki Hayashi, Wen-Chin Huang, Kazuhiro Kobayashi, Tomoki Toda

    Abstract: This paper proposes a novel voice conversion (VC) method based on non-autoregressive sequence-to-sequence (NAR-S2S) models. Inspired by the great success of NAR-S2S models such as FastSpeech in text-to-speech (TTS), we extend the FastSpeech2 model for the VC problem. We introduce the convolution-augmented Transformer (Conformer) instead of the Transformer, making it possible to capture both local…

    Submitted 14 April, 2021; originally announced April 2021.

    Comments: Accepted to ICASSP 2021. Demo HP: https://kan-bayashi.github.io/NonARSeq2SeqVC/

  11. arXiv:2103.02858  [pdf, ps, other]

    eess.AS cs.SD

    crank: An Open-Source Software for Nonparallel Voice Conversion Based on Vector-Quantized Variational Autoencoder

    Authors: Kazuhiro Kobayashi, Wen-Chin Huang, Yi-Chiao Wu, Patrick Lumban Tobing, Tomoki Hayashi, Tomoki Toda

    Abstract: In this paper, we present an open-source software for developing a nonparallel voice conversion (VC) system named crank. Although we have released an open-source VC software based on the Gaussian mixture model named sprocket in the last VC Challenge, it is not straightforward to apply any speech corpus because it is necessary to prepare parallel utterances of source and target speakers to model a…

    Submitted 4 March, 2021; originally announced March 2021.

    Comments: Accepted to ICASSP 2021

  12. arXiv:2011.10196  [pdf, other]

    eess.SY

    Deep unfolding-based output feedback control design for linear systems with input saturation

    Authors: Koki Kobayashi, Masaki Ogura, Taisuke Kobayashi, Kenji Sugimoto

    Abstract: In this paper, we propose a deep unfolding-based framework for the output feedback control of systems with input saturation. Although saturation commonly arises in several practical control systems, there is still a scarcity of effective design methodologies that can directly deal with the severe non-linearity of the saturation operator. In this paper, we aim to design an anti-windup controller for…

    Submitted 27 January, 2021; v1 submitted 19 November, 2020; originally announced November 2020.

    Comments: 7 pages, 5 figures

  13. arXiv:2011.06224  [pdf, other]

    eess.IV cs.CV

    Decomposing Normal and Abnormal Features of Medical Images for Content-based Image Retrieval

    Authors: Kazuma Kobayashi, Ryuichiro Hataya, Yusuke Kurose, Tatsuya Harada, Ryuji Hamamoto

    Abstract: Medical images can be decomposed into normal and abnormal features, which is considered as the compositionality. Based on this idea, we propose an encoder-decoder network to decompose a medical image into two discrete latent codes: a normal anatomy code and an abnormal anatomy code. Using these latent codes, we demonstrate a similarity retrieval by focusing on either normal or abnormal features of…

    Submitted 12 November, 2020; originally announced November 2020.

    Comments: Machine Learning for Health (ML4H) at NeurIPS 2020 - Extended Abstract

  14. arXiv:2010.04446  [pdf, other]

    eess.AS cs.CL cs.SD

    The NU Voice Conversion System for the Voice Conversion Challenge 2020: On the Effectiveness of Sequence-to-sequence Models and Autoregressive Neural Vocoders

    Authors: Wen-Chin Huang, Patrick Lumban Tobing, Yi-Chiao Wu, Kazuhiro Kobayashi, Tomoki Toda

    Abstract: In this paper, we present the voice conversion (VC) systems developed at Nagoya University (NU) for the Voice Conversion Challenge 2020 (VCC2020). We aim to determine the effectiveness of two recent significant technologies in VC: sequence-to-sequence (seq2seq) models and autoregressive (AR) neural vocoders. Two respective systems were developed for the two tasks in the challenge: for task 1, we a…

    Submitted 9 October, 2020; originally announced October 2020.

    Comments: Accepted to the ISCA Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020

  15. Quasi-Periodic WaveNet: An Autoregressive Raw Waveform Generative Model with Pitch-dependent Dilated Convolution Neural Network

    Authors: Yi-Chiao Wu, Tomoki Hayashi, Patrick Lumban Tobing, Kazuhiro Kobayashi, Tomoki Toda

    Abstract: In this paper, a pitch-adaptive waveform generative model named Quasi-Periodic WaveNet (QPNet) is proposed to improve the limited pitch controllability of vanilla WaveNet (WN) using pitch-dependent dilated convolution neural networks (PDCNNs). Specifically, as a probabilistic autoregressive generation model with stacked dilated convolution layers, WN achieves high-fidelity audio waveform generatio…

    Submitted 27 March, 2021; v1 submitted 10 July, 2020; originally announced July 2020.

    Comments: 15 pages, 12 figures, 11 tables

    Journal ref: IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 29, pp. 1134-1148, 2021

  16. arXiv:2005.12573  [pdf, other]

    eess.IV cs.CV

    Learning Global and Local Features of Normal Brain Anatomy for Unsupervised Abnormality Detection

    Authors: Kazuma Kobayashi, Ryuichiro Hataya, Yusuke Kurose, Amina Bolatkan, Mototaka Miyake, Hirokazu Watanabe, Masamichi Takahashi, Jun Itami, Tatsuya Harada, Ryuji Hamamoto

    Abstract: In real-world clinical practice, overlooking unanticipated findings can result in serious consequences. However, supervised learning, which is the foundation for the current success of deep learning, only encourages models to identify abnormalities that are defined in datasets in advance. Therefore, abnormality detection must be implemented in medical images that are not limited to a specific dise…

    Submitted 8 May, 2021; v1 submitted 26 May, 2020; originally announced May 2020.

    Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  17. Non-parallel Voice Conversion System with WaveNet Vocoder and Collapsed Speech Suppression

    Authors: Yi-Chiao Wu, Patrick Lumban Tobing, Kazuhiro Kobayashi, Tomoki Hayashi, Tomoki Toda

    Abstract: In this paper, we integrate a simple non-parallel voice conversion (VC) system with a WaveNet (WN) vocoder and a proposed collapsed speech suppression technique. The effectiveness of WN as a vocoder for generating high-fidelity speech waveforms on the basis of acoustic features has been confirmed in recent works. However, when combining the WN vocoder with a VC system, the distorted acoustic featu…

    Submitted 6 April, 2020; v1 submitted 26 March, 2020; originally announced March 2020.

    Comments: 13 pages, 13 figures, 1 table, accepted to publish in IEEE Access

  18. arXiv:1907.11898  [pdf, other]

    eess.AS eess.SP

    Generalization of Spectrum Differential based Direct Waveform Modification for Voice Conversion

    Authors: Wen-Chin Huang, Yi-Chiao Wu, Kazuhiro Kobayashi, Yu-Huai Peng, Hsin-Te Hwang, Patrick Lumban Tobing, Yu Tsao, Hsin-Min Wang, Tomoki Toda

    Abstract: We present a modification to the spectrum differential based direct waveform modification for voice conversion (DIFFVC) so that it can be directly applied as a waveform generation module to voice conversion models. The recently proposed DIFFVC avoids the use of a vocoder while preserving rich spectral details, and is hence capable of generating high-quality converted voice. To apply the DIFFVC framewo…

    Submitted 27 July, 2019; originally announced July 2019.

    Comments: 6 pages, 4 figures, 1 table; accepted to the 10th ISCA speech synthesis workshop (SSW10)

  19. arXiv:1907.10185  [pdf, ps, other]

    eess.AS cs.CL cs.SD

    Non-Parallel Voice Conversion with Cyclic Variational Autoencoder

    Authors: Patrick Lumban Tobing, Yi-Chiao Wu, Tomoki Hayashi, Kazuhiro Kobayashi, Tomoki Toda

    Abstract: In this paper, we present a novel technique for a non-parallel voice conversion (VC) with the use of cyclic variational autoencoder (CycleVAE)-based spectral modeling. In a variational autoencoder (VAE) framework, a latent space, usually with a Gaussian prior, is used to encode a set of input features. In a VAE-based VC, the encoded latent features are fed into a decoder, along with speaker-coding…

    Submitted 23 July, 2019; originally announced July 2019.

    Comments: Accepted to INTERSPEECH 2019

  20. arXiv:1907.08940  [pdf]

    eess.AS cs.SD

    Statistical Voice Conversion with Quasi-Periodic WaveNet Vocoder

    Authors: Yi-Chiao Wu, Patrick Lumban Tobing, Tomoki Hayashi, Kazuhiro Kobayashi, Tomoki Toda

    Abstract: In this paper, we investigate the effectiveness of a quasi-periodic WaveNet (QPNet) vocoder combined with a statistical spectral conversion technique for a voice conversion task. The WaveNet (WN) vocoder has been applied as the waveform generation module in many different voice conversion frameworks and achieves significant improvement over conventional vocoders. However, because of the fixed dila…

    Submitted 22 March, 2020; v1 submitted 21 July, 2019; originally announced July 2019.

    Comments: 6 pages, 7 figures, Proc. SSW10, 2019

  21. arXiv:1907.00797  [pdf]

    eess.AS cs.SD

    Quasi-Periodic WaveNet Vocoder: A Pitch Dependent Dilated Convolution Model for Parametric Speech Generation

    Authors: Yi-Chiao Wu, Tomoki Hayashi, Patrick Lumban Tobing, Kazuhiro Kobayashi, Tomoki Toda

    Abstract: In this paper, we propose a quasi-periodic neural network (QPNet) vocoder with a novel network architecture named pitch-dependent dilated convolution (PDCNN) to improve the pitch controllability of WaveNet (WN) vocoder. The effectiveness of the WN vocoder to generate high-fidelity speech samples from given acoustic features has been proved recently. However, because of the fixed dilated convolutio…

    Submitted 22 March, 2020; v1 submitted 1 July, 2019; originally announced July 2019.

    Comments: 5 pages, 4 figures, Proc. Interspeech, 2019

  22. arXiv:1905.00615  [pdf, other]

    eess.AS cs.SD

    Investigation of F0 conditioning and Fully Convolutional Networks in Variational Autoencoder based Voice Conversion

    Authors: Wen-Chin Huang, Yi-Chiao Wu, Chen-Chou Lo, Patrick Lumban Tobing, Tomoki Hayashi, Kazuhiro Kobayashi, Tomoki Toda, Yu Tsao, Hsin-Min Wang

    Abstract: In this work, we investigate the effectiveness of two techniques for improving variational autoencoder (VAE) based voice conversion (VC). First, we reconsider the relationship between vocoder features extracted using the high quality vocoders adopted in conventional VC systems, and hypothesize that the spectral features are in fact F0 dependent. Such hypothesis implies that during the conversion p…

    Submitted 8 July, 2019; v1 submitted 2 May, 2019; originally announced May 2019.

    Comments: 5 pages, 6 figures, 3 tables; Accepted to Interspeech 2019

  23. Refined WaveNet Vocoder for Variational Autoencoder Based Voice Conversion

    Authors: Wen-Chin Huang, Yi-Chiao Wu, Hsin-Te Hwang, Patrick Lumban Tobing, Tomoki Hayashi, Kazuhiro Kobayashi, Tomoki Toda, Yu Tsao, Hsin-Min Wang

    Abstract: This paper presents a refinement framework of WaveNet vocoders for variational autoencoder (VAE) based voice conversion (VC), which reduces the quality distortion caused by the mismatch between the training data and testing data. Conventional WaveNet vocoders are trained with natural acoustic features but conditioned on the converted features in the conversion stage for VC, and such a mismatch oft…

    Submitted 8 July, 2019; v1 submitted 27 November, 2018; originally announced November 2018.

    Comments: 5 pages, 7 figures, 1 table. Accepted to EUSIPCO 2019

  24. arXiv:1810.09137  [pdf, other]

    stat.ML cs.LG cs.SD eess.AS

    DNN-based Source Enhancement to Increase Objective Sound Quality Assessment Score

    Authors: Yuma Koizumi, Kenta Niwa, Yusuke Hioka, Kazunori Kobayashi, Yoichi Haneda

    Abstract: We propose a training method for deep neural network (DNN)-based source enhancement to increase objective sound quality assessment (OSQA) scores such as the perceptual evaluation of speech quality (PESQ). In many conventional studies, DNNs have been used as a mapping function to estimate time-frequency masks and trained to minimize an analytically tractable objective function such as the mean squa…

    Submitted 22 October, 2018; originally announced October 2018.

    Journal ref: IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 26, issue 10, 2018

  25. arXiv:1804.11055  [pdf]

    eess.AS cs.SD

    Collapsed speech segment detection and suppression for WaveNet vocoder

    Authors: Yi-Chiao Wu, Kazuhiro Kobayashi, Tomoki Hayashi, Patrick Lumban Tobing, Tomoki Toda

    Abstract: In this paper, we propose a technique to alleviate the quality degradation caused by collapsed speech segments sometimes generated by the WaveNet vocoder. The effectiveness of the WaveNet vocoder for generating natural speech from acoustic features has been proved in recent works. However, it sometimes generates very noisy speech with collapsed speech segments when only a limited amount of trainin…

    Submitted 9 August, 2018; v1 submitted 30 April, 2018; originally announced April 2018.

    Comments: 5 pages, 6 figures. Proc. Interspeech, 2018