Showing 1–14 of 14 results for author: Saito, D

Searching in archive cs.

  1. arXiv:2407.11436

    cs.RO

    APriCoT: Action Primitives based on Contact-state Transition for In-Hand Tool Manipulation

    Authors: Daichi Saito, Atsushi Kanehira, Kazuhiro Sasabuchi, Naoki Wake, Jun Takamatsu, Hideki Koike, Katsushi Ikeuchi

    Abstract: In-hand tool manipulation is an operation that not only manipulates a tool within the hand (i.e., in-hand manipulation) but also achieves a grasp suitable for a task after the manipulation. This study aims to achieve an in-hand tool manipulation skill through deep reinforcement learning. The difficulty of learning the skill arises because this manipulation requires (A) exploring long-term contact-…

    Submitted 16 July, 2024; originally announced July 2024.

  2. arXiv:2407.11370

    cs.SD cs.CL eess.AS

    A Pilot Study of GSLM-based Simulation of Foreign Accentuation Only Using Native Speech Corpora

    Authors: Kentaro Onda, Joonyong Park, Nobuaki Minematsu, Daisuke Saito

    Abstract: We propose a method of simulating the human process of foreign accentuation using a Generative Spoken Language Model (GSLM) trained only on native speech corpora. When one listens to spoken words of a foreign language and repeats them, the repeated speech often carries the accent of the listener's L1. This is said to be because the spoken words are mentally represented as a sequence of phonological units…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: Accepted to INTERSPEECH 2024

  3. arXiv:2403.02316

    cs.RO

    Designing Library of Skill-Agents for Hardware-Level Reusability

    Authors: Jun Takamatsu, Daichi Saito, Katsushi Ikeuchi, Atsushi Kanehira, Kazuhiro Sasabuchi, Naoki Wake

    Abstract: To use new robot hardware in a new environment, it is necessary to develop a control program tailored to that specific robot in that environment. Considering the reusability of software among robots is crucial to minimize the effort involved in this process and maximize software reuse across different robots in different environments. This paper proposes a method to remedy this process by consider…

    Submitted 20 March, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

  4. arXiv:2402.18091

    cs.CV cs.AI cs.CL

    Polos: Multimodal Metric Learning from Human Feedback for Image Captioning

    Authors: Yuiga Wada, Kanta Kaneda, Daichi Saito, Komei Sugiura

    Abstract: Establishing an automatic evaluation metric that closely aligns with human judgments is essential for effectively developing image captioning models. Recent data-driven metrics have demonstrated a stronger correlation with human judgments than classic metrics such as CIDEr; however, they lack sufficient capabilities to handle hallucinations and generalize across diverse images and texts partially b…

    Submitted 28 February, 2024; originally announced February 2024.

    Comments: CVPR 2024

  5. arXiv:2311.11007

    cs.RO

    Constraint-aware Policy for Compliant Manipulation

    Authors: Daichi Saito, Kazuhiro Sasabuchi, Naoki Wake, Atsushi Kanehira, Jun Takamatsu, Hideki Koike, Katsushi Ikeuchi

    Abstract: Robot manipulation in a physically constrained environment requires compliant manipulation. Compliant manipulation is a manipulation skill to adjust hand motion based on the force imposed by the environment. Recently, reinforcement learning (RL) has been applied to solve household operations involving compliant manipulation. However, previous RL methods have primarily focused on designing a policy…

    Submitted 18 November, 2023; originally announced November 2023.

  6. arXiv:2309.09690

    cs.CL cs.SD eess.AS

    Do learned speech symbols follow Zipf's law?

    Authors: Shinnosuke Takamichi, Hiroki Maeda, Joonyong Park, Daisuke Saito, Hiroshi Saruwatari

    Abstract: In this study, we investigate whether speech symbols, learned through deep learning, follow Zipf's law, akin to natural language symbols. Zipf's law is an empirical law that delineates the frequency distribution of words, forming fundamentals for statistical analysis in natural language processing. Natural language symbols, which are invented by humans to symbolize speech content, are recognized t…

    Submitted 18 September, 2023; originally announced September 2023.

    Comments: Submitted to ICASSP 2024

  7. arXiv:2304.00227

    cs.RO

    Tracker: Model-based Reinforcement Learning for Tracking Control of Human Finger Attached with Thin McKibben Muscles

    Authors: Daichi Saito, Eri Nagatomo, Jefferson Pardomuan, Hideki Koike

    Abstract: To use a soft hand exoskeleton to support activities of daily living, it is necessary to control finger joints precisely with the exoskeleton. The problem of controlling joints to follow a given trajectory is called the tracking control problem. In this study, we focus on the tracking control problem of a human finger attached with thin McKibben muscles. To achieve precise control with thin M…

    Submitted 1 April, 2023; originally announced April 2023.

    Comments: 8 pages, 7 figures

  8. arXiv:2301.01382

    cs.RO

    Task-sequencing Simulator: Integrated Machine Learning to Execution Simulation for Robot Manipulation

    Authors: Kazuhiro Sasabuchi, Daichi Saito, Atsushi Kanehira, Naoki Wake, Jun Takamatsu, Katsushi Ikeuchi

    Abstract: A task-sequencing simulator for robotic manipulation that integrates simulation-for-learning and simulation-for-execution is introduced. Unlike existing machine-learning simulations, where a non-decomposed simulation is used to simulate a training scenario, the task-sequencing simulator runs a composed simulation using building blocks. This way, the simulation-for-learning is structured similarly to a…

    Submitted 3 January, 2023; originally announced January 2023.

    Comments: 7 pages, 6 figures

  9. arXiv:2203.00733

    cs.RO

    Task-grasping from human demonstration

    Authors: Daichi Saito, Kazuhiro Sasabuchi, Naoki Wake, Jun Takamatsu, Hideki Koike, Katsushi Ikeuchi

    Abstract: A challenge in robot grasping is to achieve task-grasping, that is, to select a grasp that is advantageous to the success of tasks both before and after grasping. One of the frameworks to address this difficulty is Learning-from-Observation (LfO), which obtains various hints from human demonstrations. This paper solves three issues in the grasping skills in the LfO framework: 1) how to functionally mimic…

    Submitted 1 March, 2022; originally announced March 2022.

    Comments: 7 pages, 8 figures

  10. Text-driven object affordance for guiding grasp-type recognition in multimodal robot teaching

    Authors: Naoki Wake, Daichi Saito, Kazuhiro Sasabuchi, Hideki Koike, Katsushi Ikeuchi

    Abstract: This study investigates how text-driven object affordance, which provides prior knowledge about grasp types for each object, affects image-based grasp-type recognition in robot teaching. The researchers created labeled datasets of first-person hand images to examine the impact of object affordance on recognition performance. They evaluated scenarios with real and illusory objects, considering mixe…

    Submitted 12 May, 2023; v1 submitted 27 February, 2021; originally announced March 2021.

    Comments: 8 pages, 11 figures. Last updated March 12, 2023. Accepted for publication in Machine Vision and Applications

  11. arXiv:1910.05528

    cs.LG

    Preliminary Systematic Literature Review of Machine Learning System Development Process

    Authors: Yasuhiro Watanabe, Hironori Washizaki, Kazunori Sakamoto, Daisuke Saito, Kiyoshi Honda, Naohiko Tsuda, Yoshiaki Fukazawa, Nobukazu Yoshioka

    Abstract: Previous machine learning (ML) system development research suggests that emerging software quality attributes are a concern due to the probabilistic behavior of ML systems. We assume that detailed development processes depend on individual developers and are not discussed in detail. To help developers standardize their ML system development processes, we conduct a preliminary systematic literatu…

    Submitted 12 October, 2019; originally announced October 2019.

  12. arXiv:1807.11679

    eess.AS cs.CL cs.SD stat.ML

    Wasserstein GAN and Waveform Loss-based Acoustic Model Training for Multi-speaker Text-to-Speech Synthesis Systems Using a WaveNet Vocoder

    Authors: Yi Zhao, Shinji Takaki, Hieu-Thi Luong, Junichi Yamagishi, Daisuke Saito, Nobuaki Minematsu

    Abstract: Recent neural networks such as WaveNet and SampleRNN that learn directly from speech waveform samples have achieved very high-quality synthetic speech in terms of both naturalness and speaker similarity even in multi-speaker text-to-speech synthesis systems. Such neural networks are being used as an alternative to vocoders and hence they are often called neural vocoders. The neural vocoder uses ac…

    Submitted 31 July, 2018; originally announced July 2018.

  13. arXiv:1804.08438

    eess.AS cs.CL cs.SD stat.ML

    A Spoofing Benchmark for the 2018 Voice Conversion Challenge: Leveraging from Spoofing Countermeasures for Speech Artifact Assessment

    Authors: Tomi Kinnunen, Jaime Lorenzo-Trueba, Junichi Yamagishi, Tomoki Toda, Daisuke Saito, Fernando Villavicencio, Zhenhua Ling

    Abstract: Voice conversion (VC) aims at converting speaker characteristics without altering content. Due to training data limitations and modeling imperfections, it is difficult to achieve believable speaker mimicry without introducing processing artifacts; performance assessment of VC, therefore, usually involves both speaker similarity and quality evaluation by a human panel. As a time-consuming, expens…

    Submitted 4 September, 2018; v1 submitted 23 April, 2018; originally announced April 2018.

    Comments: Correction (bug fix) of a published ODYSSEY 2018 publication with the same title and author list; more details in the footnote on page 1

  14. arXiv:1804.04262

    eess.AS cs.CL cs.SD stat.ML

    The Voice Conversion Challenge 2018: Promoting Development of Parallel and Nonparallel Methods

    Authors: Jaime Lorenzo-Trueba, Junichi Yamagishi, Tomoki Toda, Daisuke Saito, Fernando Villavicencio, Tomi Kinnunen, Zhenhua Ling

    Abstract: We present the Voice Conversion Challenge 2018, designed as a follow-up to the 2016 edition with the aim of providing a common framework for evaluating and comparing different state-of-the-art voice conversion (VC) systems. The objective of the challenge was to perform speaker conversion (i.e., transform the vocal identity) of a source speaker to a target speaker while maintaining linguistic inform…

    Submitted 11 April, 2018; originally announced April 2018.

    Comments: Accepted for Speaker Odyssey 2018