Showing 1–15 of 15 results for author: Ishikawa, H

  1. arXiv:2408.06227  [pdf]

    cs.CL cs.AI cs.SD eess.AS

    FLEURS-R: A Restored Multilingual Speech Corpus for Generation Tasks

    Authors: Min Ma, Yuma Koizumi, Shigeki Karita, Heiga Zen, Jason Riesa, Haruko Ishikawa, Michiel Bacchiani

    Abstract: This paper introduces FLEURS-R, a speech restoration applied version of the Few-shot Learning Evaluation of Universal Representations of Speech (FLEURS) corpus. FLEURS-R maintains an N-way parallel speech corpus in 102 languages as FLEURS, with improved audio quality and fidelity by applying the speech restoration model Miipher. The aim of FLEURS-R is to advance speech technology in more languages…

    Submitted 12 August, 2024; originally announced August 2024.

    Journal ref: INTERSPEECH 2024

  2. arXiv:2403.12530  [pdf, other]

    cs.CV

    PCT: Perspective Cue Training Framework for Multi-Camera BEV Segmentation

    Authors: Haruya Ishikawa, Takumi Iida, Yoshinori Konishi, Yoshimitsu Aoki

    Abstract: Generating annotations for bird's-eye-view (BEV) segmentation presents significant challenges due to the scenes' complexity and the high manual annotation cost. In this work, we address these challenges by leveraging the abundance of unlabeled data available. We propose the Perspective Cue Training (PCT) framework, a novel training framework that utilizes pseudo-labels generated from unlabeled per…

    Submitted 15 July, 2024; v1 submitted 19 March, 2024; originally announced March 2024.

    Comments: 13 pages, 5 figures; Accepted to IROS 2024

  3. Data-Dependent Higher-Order Clique Selection for Artery-Vein Segmentation by Energy Minimization

    Authors: Yoshiro Kitamura, Yuanzhong Li, Wataru Ito, Hiroshi Ishikawa

    Abstract: We propose a novel segmentation method based on energy minimization of higher-order potentials. We introduce higher-order terms into the energy to incorporate prior knowledge on the shape of the segments. The terms encourage certain sets of pixels to be entirely in one segment or the other. The sets can for instance be smooth curves in order to help delineate pulmonary vessels, which are known to…

    Submitted 12 December, 2023; originally announced December 2023.

    Journal ref: International Journal of Computer Vision 117, 142-158 (2016)

  4. arXiv:2306.04530  [pdf]

    cs.CL

    Lenient Evaluation of Japanese Speech Recognition: Modeling Naturally Occurring Spelling Inconsistency

    Authors: Shigeki Karita, Richard Sproat, Haruko Ishikawa

    Abstract: Word error rate (WER) and character error rate (CER) are standard metrics in Speech Recognition (ASR), but one problem has always been alternative spellings: If one's system transcribes adviser whereas the ground truth has advisor, this will count as an error even though the two spellings really represent the same word. Japanese is notorious for "lacking orthography": most words can be spelled…

    Submitted 7 June, 2023; originally announced June 2023.

    Comments: ACL Workshop on Computation and Written Language (CAWL) 2023

  5. arXiv:2304.09427  [pdf, other]

    cs.CV

    Boosting Semantic Segmentation with Semantic Boundaries

    Authors: Haruya Ishikawa, Yoshimitsu Aoki

    Abstract: In this paper, we present the Semantic Boundary Conditioned Backbone (SBCB) framework, a simple yet effective training framework that is model-agnostic and boosts segmentation performance, especially around the boundaries. Motivated by the recent development in improving semantic segmentation by incorporating boundaries as auxiliary tasks, we propose a multi-task framework that uses semantic bound…

    Submitted 19 April, 2023; originally announced April 2023.

    Comments: 28 pages, Code available at https://github.com/haruishi43/boundary_boost_mmseg

  6. arXiv:2303.09054  [pdf, other]

    cs.CV cs.RO

    FindView: Precise Target View Localization Task for Look Around Agents

    Authors: Haruya Ishikawa, Yoshimitsu Aoki

    Abstract: With the increase in demands for service robots and automated inspection, agents need to localize in its surrounding environment to achieve more natural communication with humans by shared contexts. In this work, we propose a novel but straightforward task of precise target view localization for look around agents called the FindView task. This task imitates the movements of PTZ cameras or user in…

    Submitted 15 March, 2023; originally announced March 2023.

    Comments: 19 pages, 7 figures, preprint, code available in https://github.com/haruishi43/look_around

  7. arXiv:2212.00567  [pdf, other]

    cs.CV cs.RO

    P2Net: A Post-Processing Network for Refining Semantic Segmentation of LiDAR Point Cloud based on Consistency of Consecutive Frames

    Authors: Yutaka Momma, Weimin Wang, Edgar Simo-Serra, Satoshi Iizuka, Ryosuke Nakamura, Hiroshi Ishikawa

    Abstract: We present a lightweight post-processing method to refine the semantic segmentation results of point cloud sequences. Most existing methods usually segment frame by frame and encounter the inherent ambiguity of the problem: based on a measurement in a single frame, labels are sometimes difficult to predict even for humans. To remedy this problem, we propose to explicitly train a network to refine…

    Submitted 1 December, 2022; originally announced December 2022.

    Comments: 2020 IEEE International Conference on Systems, Man, and Cybernetics (SMC)

  8. Helpful Neighbors: Leveraging Neighbors in Geographic Feature Pronunciation

    Authors: Llion Jones, Richard Sproat, Haruko Ishikawa, Alexander Gutkin

    Abstract: If one sees the place name Houston Mercer Dog Run in New York, how does one know how to pronounce it? Assuming one knows that Houston in New York is pronounced "how-ston" and not like the Texas city, then one can probably guess that "how-ston" is also used in the name of the dog park. We present a novel architecture that learns to use the pronunciations of neighboring names in order to guess the p…

    Submitted 18 October, 2022; originally announced October 2022.

    Comments: 16 pages, to appear Transactions of the Association for Computational Linguistics

  9. arXiv:2103.02083  [pdf, other]

    cs.CV

    Uncertainty guided semi-supervised segmentation of retinal layers in OCT images

    Authors: Suman Sedai, Bhavna Antony, Ravneet Rai, Katie Jones, Hiroshi Ishikawa, Joel Schuman, Wollstein Gadi, Rahil Garnavi

    Abstract: Deep convolutional neural networks have shown outstanding performance in medical image segmentation tasks. The usual problem when training supervised deep learning methods is the lack of labeled data which is time-consuming and costly to obtain. In this paper, we propose a novel uncertainty-guided semi-supervised learning based on a student-teacher approach for training the segmentation network us…

    Submitted 2 March, 2021; originally announced March 2021.

    Comments: MICCAI,19

    Journal ref: MICCAI 2019 pp 282-290

  10. arXiv:2008.08024  [pdf, ps, other]

    eess.IV cs.CV cs.LG

    Self-supervised Denoising via Diffeomorphic Template Estimation: Application to Optical Coherence Tomography

    Authors: Guillaume Gisbert, Neel Dey, Hiroshi Ishikawa, Joel Schuman, James Fishbaugh, Guido Gerig

    Abstract: Optical Coherence Tomography (OCT) is pervasive in both the research and clinical practice of Ophthalmology. However, OCT images are strongly corrupted by noise, limiting their interpretation. Current OCT denoisers leverage assumptions on noise distributions or generate targets for training deep supervised denoisers via averaging of repeat acquisitions. However, recent self-supervised advances all…

    Submitted 18 August, 2020; originally announced August 2020.

    Comments: To be published in MICCAI Ophthalmic Medical Image Analysis 2020. 11 pages, 4 figures, 1 table

  11. arXiv:2007.01522  [pdf, other]

    cs.LG stat.ML

    Dueling Deep Q-Network for Unsupervised Inter-frame Eye Movement Correction in Optical Coherence Tomography Volumes

    Authors: Yasmeen M. George, Suman Sedai, Bhavna J. Antony, Hiroshi Ishikawa, Gadi Wollstein, Joel S. Schuman, Rahil Garnavi

    Abstract: In optical coherence tomography (OCT) volumes of retina, the sequential acquisition of the individual slices makes this modality prone to motion artifacts, misalignments between adjacent slices being the most noticeable. Any distortion in OCT volumes can bias structural analysis and influence the outcome of longitudinal studies. On the other hand, presence of speckle noise that is characteristic o…

    Submitted 3 July, 2020; originally announced July 2020.

  12. arXiv:1908.01428  [pdf, other]

    cs.CV

    Inference of visual field test performance from OCT volumes using deep learning

    Authors: Stefan Maetschke, Bhavna Antony, Hiroshi Ishikawa, Gadi Wollstein, Joel Schuman, Rahil Garnavi

    Abstract: Visual field tests (VFT) are pivotal for glaucoma diagnosis and conducted regularly to monitor disease progression. Here we address the question to what degree aggregate VFT measurements such as Visual Field Index (VFI) and Mean Deviation (MD) can be inferred from Optical Coherence Tomography (OCT) scans of the Optic Nerve Head (ONH) or the macula. Accurate inference of VFT measurements from OCT c…

    Submitted 10 October, 2019; v1 submitted 4 August, 2019; originally announced August 2019.

    Comments: 12 pages, 3 figures

  13. A feature agnostic approach for glaucoma detection in OCT volumes

    Authors: Stefan Maetschke, Bhavna Antony, Hiroshi Ishikawa, Gadi Wollstein, Joel S. Schuman, Rahil Garnavi

    Abstract: Optical coherence tomography (OCT) based measurements of retinal layer thickness, such as the retinal nerve fibre layer (RNFL) and the ganglion cell with inner plexiform layer (GCIPL) are commonly used for the diagnosis and monitoring of glaucoma. Previously, machine learning techniques have utilized segmentation-based imaging features such as the peripapillary RNFL thickness and the cup-to-disc r…

    Submitted 23 October, 2019; v1 submitted 12 July, 2018; originally announced July 2018.

    Comments: 13 pages, 3 figures

  14. arXiv:1703.08966  [pdf, other]

    cs.CV

    Mastering Sketching: Adversarial Augmentation for Structured Prediction

    Authors: Edgar Simo-Serra, Satoshi Iizuka, Hiroshi Ishikawa

    Abstract: We present an integral framework for training sketch simplification networks that convert challenging rough sketches into clean line drawings. Our approach augments a simplification network with a discriminator network, training both networks jointly so that the discriminator network discerns whether a line drawing is a real training data or the output of the simplification network, which in turn…

    Submitted 27 March, 2017; originally announced March 2017.

    Comments: 12 pages, 14 figures

  15. arXiv:0711.4508  [pdf, ps, other]

    cs.CC cs.CV cs.IT

    Representation and Measure of Structural Information

    Authors: Hiroshi Ishikawa

    Abstract: We introduce a uniform representation of general objects that captures the regularities with respect to their structure. It allows a representation of a general class of objects including geometric patterns and images in a sparse, modular, hierarchical, and recursive manner. The representation can exploit any computable regularity in objects to compactly describe them, while also being capable o…

    Submitted 11 June, 2008; v1 submitted 28 November, 2007; originally announced November 2007.

    Comments: Second version. Revised the Introduction and added more discussion in the last section. The technical content is mostly unchanged. 51 pages, 4 figures

    ACM Class: F.1.1; H.1.1; I.2.10