Search | arXiv e-print repository

World-Model-Based Control for Industrial box-packing of Multiple Objects using NewtonianVAE

Authors: Yusuke Kato, Ryo Okumura, Tadahiro Taniguchi

Abstract: The process of industrial box-packing, which involves the accurate placement of multiple objects, requires high-accuracy positioning and sequential actions. When a robot is tasked with placing an object at a specific location with high accuracy, it needs information not only about the location of the object to be placed but also about the posture of the object grasped by the robotic hand. Industrial box-packing often requires the sequential placement of identically shaped objects into a single box, and the robot's actions should be determined by the same learned model throughout. In factories, new kinds of products often appear, so a model that can easily adapt to them is needed; it should therefore be easy to collect the data used to train the model. In this study, we designed a robotic system to automate real-world industrial tasks, employing a vision-based learning control model. We propose the in-hand-view-sensitive Newtonian variational autoencoder (ihVS-NVAE), which employs an RGB camera to obtain the in-hand postures of objects. We demonstrate that our model, trained for a single object-placement task, can handle sequential tasks without additional training. To evaluate the efficacy of the proposed model, we employed a real robot to perform sequential industrial box-packing of multiple objects. Results showed that the proposed model achieved a 100% success rate in industrial box-packing tasks, outperforming the state-of-the-art and conventional approaches and underscoring its effectiveness and potential in industrial tasks.

Submitted 3 April, 2024; v1 submitted 4 August, 2023; originally announced August 2023.

Comments: 7 pages, 8 figures

arXiv:2307.15345

Learning Compliant Stiffness by Impedance Control-Aware Task Segmentation and Multi-objective Bayesian Optimization with Priors

Authors: Masashi Okada, Mayumi Komatsu, Ryo Okumura, Tadahiro Taniguchi

Abstract: Impedance control, rather than traditional position control, is preferred to ensure the safe operation of industrial robots programmed from demonstrations. However, variable stiffness learning studies have focused on task performance rather than safety (or compliance). Thus, this paper proposes a novel stiffness learning method that satisfies both task performance and compliance requirements. The proposed method optimizes the task and compliance objectives (T/C objectives) simultaneously via multi-objective Bayesian optimization. We define the stiffness search space by segmenting a demonstration into task phases, each with a constant responsible stiffness. The segmentation is performed by identifying impedance control-aware switching linear dynamics (IC-SLD) from the demonstration. We also use the stiffness obtained by the proposed IC-SLD as priors for efficient optimization. Experiments on simulated tasks and a real robot demonstrate that IC-SLD-based segmentation and the use of priors improve optimization efficiency compared with existing baseline methods.

Submitted 28 July, 2023; originally announced July 2023.

Comments: Accepted to IROS2023
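Because the task and compliance objectives are optimized jointly, a multi-objective optimizer reports a set of trade-offs (a Pareto front) rather than a single best stiffness. The sketch below shows only the non-dominated filter that defines such a front; it is not the authors' implementation, and the candidate stiffness settings, the (task_cost, compliance_cost) pairs, and the convention that both costs are minimized are illustrative assumptions.

```python
from typing import List, Tuple

# Hypothetical evaluation of one stiffness setting: (task_cost, compliance_cost),
# where lower is better for both objectives.
Candidate = Tuple[float, float]


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b in every objective and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep the candidates that no other candidate dominates (a point never dominates itself)."""
    return [c for c in candidates if not any(dominates(o, c) for o in candidates)]


# Hypothetical results for four stiffness settings.
evals = [(0.2, 0.9), (0.5, 0.4), (0.6, 0.5), (0.9, 0.1)]
print(pareto_front(evals))  # [(0.2, 0.9), (0.5, 0.4), (0.9, 0.1)]
```

Here (0.6, 0.5) is dropped because (0.5, 0.4) is better in both objectives; the remaining points are genuine trade-offs between task performance and compliance.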

arXiv:2305.19936

Metropolis-Hastings algorithm in joint-attention naming game: Experimental semiotics study

Authors: Ryota Okumura, Tadahiro Taniguchi, Yosinobu Hagiwara, Akira Taniguchi

Abstract: In this study, we explore the emergence of symbols during interactions between individuals through an experimental semiotic study. Previous studies have investigated how humans organize symbol systems through communication using artificially designed subjective experiments. In this study, we focus on a joint attention-naming game (JA-NG) in which participants independently categorize objects and assign names while assuming joint attention. In the theory of the Metropolis-Hastings naming game (MHNG), listeners accept provided names according to an acceptance probability computed using the Metropolis-Hastings (MH) algorithm. The theory of MHNG suggests that, if the conditions of MHNG are satisfied, symbols emerge as an approximate decentralized Bayesian inference of signs, which are represented as a shared prior variable. This study examines whether human participants exhibit behavior consistent with MHNG theory when playing JA-NG by comparing their decisions to accept a partner's naming with the acceptance probabilities computed in the MHNG. The main contributions of this study are twofold. First, we reject the null hypothesis that humans make acceptance judgments with a constant probability, regardless of the acceptance probability calculated by the MH algorithm. This result suggests that people followed the acceptance probability computed by the MH algorithm to some extent.
Second, the MH-based model predicted human acceptance/rejection behavior more accurately than the other four models: Constant, Numerator, Subtraction, and Binary. This result indicates that symbol emergence in JA-NG can be explained using MHNG and can be regarded as an approximate decentralized Bayesian inference.

Submitted 31 May, 2023; originally announced May 2023.
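To make the listener's decision concrete, the sketch below computes a Metropolis-Hastings style acceptance probability and samples an accept/reject decision. As a simplification, it assumes the probability is the capped ratio min(1, p(proposed) / p(current)), where p scores how well each name fits the object under the listener's own categorization. The likelihood table and the names "kaki" and "toma" are hypothetical; the exact quantities used in MHNG are defined in the paper, not here.

```python
import random
from typing import Dict


def mh_acceptance(listener_likelihood: Dict[str, float], proposed: str, current: str) -> float:
    """Acceptance probability min(1, p(proposed) / p(current)) from the listener's view.

    `listener_likelihood` is a hypothetical table: for each name, how well it explains
    the attended object under the listener's own categorization.
    """
    p_new = listener_likelihood[proposed]
    p_old = listener_likelihood[current]
    if p_old == 0.0:
        return 1.0  # the current name is impossible, so any proposal is accepted
    return min(1.0, p_new / p_old)


def listener_accepts(listener_likelihood: Dict[str, float], proposed: str, current: str,
                     rng: random.Random) -> bool:
    """Sample the accept/reject decision with the MH acceptance probability."""
    return rng.random() < mh_acceptance(listener_likelihood, proposed, current)


lik = {"kaki": 0.6, "toma": 0.2}  # hypothetical listener likelihoods
print(mh_acceptance(lik, "toma", "kaki"))  # about 0.33: a worse-fitting name is sometimes accepted
print(mh_acceptance(lik, "kaki", "toma"))  # 1.0: a better-fitting name is always accepted
print(listener_accepts(lik, "toma", "kaki", random.Random(0)))
```

By contrast, the Constant model named in the abstract would accept with a fixed probability regardless of this ratio.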

arXiv:2206.04003

Patch-based Object-centric Transformers for Efficient Video Generation

Authors: Wilson Yan, Ryo Okumura, Stephen James, Pieter Abbeel

Abstract: In this work, we present Patch-based Object-centric Video Transformer (POVT), a novel region-based video generation architecture that leverages object-centric information to efficiently model temporal dynamics in videos. We build upon prior work in video prediction via an autoregressive transformer over the discrete latent space of compressed videos, adding a modification that models object-centric information via bounding boxes. Because object-centric representations are more compressible, we can improve training efficiency by allowing the model to access only object information for longer-horizon temporal information. When evaluated on various difficult object-centric datasets, our method achieves performance better than or equal to that of other video generation models while remaining more computationally efficient and scalable. In addition, we show that our method supports object-centric controllability through bounding box manipulation, which may aid downstream tasks such as video editing or visual planning. Samples are available at https://sites.google.com/view/povt-public

Submitted 18 June, 2022; v1 submitted 8 June, 2022; originally announced June 2022.

Comments: Project Website: https://sites.google.com/view/povt-public

arXiv:2203.12952

Feasibility Study of Magnetism-based Indoor Positioning Methods in an Incineration Plant

Authors: Rei Okumura, Ismail Arai, Atarashi Yutaro, Kawabata Kaoru, Kazutoshi Fujikawa

Abstract: In an incineration plant, remote operation from a centralized control room is now possible, but inspection and cleaning of equipment still require a worker to visit the site. When plant owners reduce the number of workers to cut operating costs, it will become standard for a single worker to visit the site. Therefore, the locations of workers must be monitored in real time so that unexpected accidents can be detected quickly. Conventional methods use radio waves, such as Wi-Fi and Bluetooth, but there is little demand for communication facilities in an incineration plant, and the cost of installing wireless access points and Bluetooth Low Energy (BLE) beacons just for positioning is too high. Therefore, we focus on magnetism as the basis for indoor positioning. In addition, the incineration plant contains many types of equipment, including a wide range of magnetized metals, large motors, and generators, so a distinctive magnetic signature could be observed at each point. Based on these assumptions, we have developed a new indoor positioning method for the incineration plant. This paper describes the development of an indoor positioning system for an incineration plant and proposes three fingerprint-matching methods: Point matching, Path matching, and DTW matching. The average positioning errors of these methods are 6.89 m, 0.05 m, and 0.06 m, respectively.

Submitted 24 March, 2022; originally announced March 2022.

Comments: 6 pages
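Of the three fingerprint-matching methods, DTW matching compares a sequence of magnetic readings rather than a single point, which tolerates a worker walking faster or slower than when the fingerprints were recorded. The sketch below is a textbook dynamic time warping distance plus a nearest-fingerprint lookup; it is not the authors' implementation, and the scalar field-magnitude readings, the path names, and the absolute-difference cost are hypothetical.

```python
import math
from typing import Dict, Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic O(len(a) * len(b)) dynamic time warping with absolute-difference cost."""
    n, m = len(a), len(b)
    # dp[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    dp = [[math.inf] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest admissible predecessor: repeat a[i-1], repeat b[j-1], or advance both.
            dp[i][j] = cost + min(dp[i - 1][j], dp[i][j - 1], dp[i - 1][j - 1])
    return dp[n][m]


def best_match(observed: Sequence[float], fingerprints: Dict[str, Sequence[float]]) -> str:
    """Return the reference path whose recorded magnetic profile is closest under DTW."""
    return min(fingerprints, key=lambda name: dtw_distance(observed, fingerprints[name]))


# Hypothetical magnetic-field magnitudes (microtesla) recorded along two paths.
fingerprints = {
    "corridor_A": [48.0, 52.0, 60.0, 55.0],
    "corridor_B": [30.0, 31.0, 29.0, 33.0],
}
observed = [48.5, 52.0, 52.5, 59.0, 55.5]  # same route walked more slowly: one extra sample
print(best_match(observed, fingerprints))  # corridor_A
```

A point-matching approach would instead compare one reading against every stored point, which is far more ambiguous when many locations share similar field magnitudes.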

arXiv:2203.11024

Multi-View Dreaming: Multi-View World Model with Contrastive Learning

Authors: Akira Kinose, Masashi Okada, Ryo Okumura, Tadahiro Taniguchi

Abstract: In this paper, we propose Multi-View Dreaming, a novel reinforcement learning agent for integrated recognition and control from multi-view observations, which extends Dreaming. Most current reinforcement learning methods assume a single-view observation space, which imposes limitations on the observed data, such as a lack of spatial information and occlusions. This makes obtaining ideal observational information from the environment difficult and is a bottleneck for real-world robotics applications. In this paper, we use contrastive learning to train a shared latent space between different viewpoints and show how the Product of Experts approach can be used to integrate and control the probability distributions of latent states for multiple viewpoints. We also propose Multi-View DreamingV2, a variant of Multi-View Dreaming that uses a categorical distribution instead of a Gaussian distribution to model the latent state. Experiments show that the proposed method outperforms simple extensions of existing methods in a realistic robot control task.

Submitted 14 March, 2022; originally announced March 2022.

Comments: 7 pages, 8 figures
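For Gaussian latent states, a Product of Experts has a closed form: precisions add, and the fused mean is the precision-weighted average of the per-view means. The sketch below fuses diagonal Gaussians from several viewpoints; the (mean, variance) list representation is an assumption for illustration, and details such as whether the paper also multiplies in a prior expert are not stated in the abstract.

```python
from typing import List, Tuple

Gaussian = Tuple[List[float], List[float]]  # (mean, variance) of a diagonal Gaussian


def product_of_gaussian_experts(experts: List[Gaussian]) -> Gaussian:
    """Fuse per-view diagonal Gaussians; the (renormalized) product is again Gaussian."""
    dim = len(experts[0][0])
    mean, var = [], []
    for d in range(dim):
        precision = sum(1.0 / v[d] for _, v in experts)   # precisions add
        weighted = sum(m[d] / v[d] for m, v in experts)   # precision-weighted means
        var.append(1.0 / precision)
        mean.append(weighted / precision)
    return mean, var


# Hypothetical 1-D latents: a confident view (variance 0.25) and an uncertain one (variance 1.0).
view_a = ([0.0], [0.25])
view_b = ([2.0], [1.0])
mean, var = product_of_gaussian_experts([view_a, view_b])
print(mean, var)  # [0.4] [0.2]
```

The fused estimate leans toward the more confident view, so a viewpoint whose encoder reports high variance (for example, because of occlusion) contributes little to the integrated latent state.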

arXiv:2203.05955

Tactile-Sensitive NewtonianVAE for High-Accuracy Industrial Connector Insertion

Authors: Ryo Okumura, Nobuki Nishio, Tadahiro Taniguchi

Abstract: An industrial connector insertion task requires submillimeter positioning and grasp pose compensation for a plug. Thus, highly accurate estimation of the relative pose between a plug and a socket is fundamental for achieving the task. World models are promising technologies for visuomotor control because they obtain an appropriate state representation by jointly optimizing feature extraction and the latent dynamics model. Recent studies show that the NewtonianVAE, a type of world model, acquires a latent space equivalent to a mapping from images to physical coordinates, and proportional control can be achieved in the latent space of NewtonianVAE. However, applying NewtonianVAE to high-accuracy industrial tasks in physical environments remains an open problem. Moreover, the existing framework does not consider grasp pose compensation in the obtained latent space. In this work, we propose tactile-sensitive NewtonianVAE and apply it to USB connector insertion with grasp pose variation in physical environments. We adopt a GelSight-type tactile sensor and estimate the insertion position compensated by the grasp pose of the plug. Our method trains the latent space in an end-to-end manner, and no additional engineering or annotation is required. Simple proportional control is available in the obtained latent space. Moreover, we show that the original NewtonianVAE fails in some situations and demonstrate that domain knowledge induction improves model accuracy.
This domain knowledge can be easily obtained from the robot specification and grasp pose error measurement. We demonstrate that our proposed method achieves a 100% success rate and 0.3 mm positioning accuracy in the USB connector insertion task in the physical environment. It outperforms SOTA CNN-based two-stage goal pose regression with grasp pose compensation using coordinate transformation.

Submitted 2 August, 2022; v1 submitted 10 March, 2022; originally announced March 2022.

Comments: 7 pages, 4 figures
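Because the latent space behaves like physical coordinates, "simple proportional control in the latent space" amounts to commanding u = K (z_goal - z). The sketch below illustrates this on a toy plant; it assumes the latent position integrates the action as a velocity (z_{t+1} = z_t + dt * u_t) and uses hypothetical gain and time-step values. In practice z and z_goal would come from the trained encoder applied to the current and goal observations, which is not shown.

```python
from typing import List


def p_control_step(z: List[float], z_goal: List[float], gain: float) -> List[float]:
    """Proportional action u = K (z_goal - z), computed directly in latent coordinates."""
    return [gain * (g - x) for x, g in zip(z, z_goal)]


def run_to_goal(z0: List[float], z_goal: List[float],
                gain: float = 2.0, dt: float = 0.25, steps: int = 50) -> List[float]:
    """Roll out a toy plant standing in for the robot (an assumption, not the paper's setup).

    With z_{t+1} = z_t + dt * u_t, the error shrinks by a factor (1 - gain * dt) each step,
    so the loop converges whenever 0 < gain * dt < 2.
    """
    z = list(z0)
    for _ in range(steps):
        u = p_control_step(z, z_goal, gain)
        z = [x + dt * ui for x, ui in zip(z, u)]
    return z


# Hypothetical 2-D latent start and goal (e.g., encoded from current and goal images).
print(run_to_goal([0.0, 0.0], [1.0, -0.5]))  # approaches [1.0, -0.5]; error halves every step here
```

The point of a physically structured latent space is that this controller needs no learned policy: the same gain works for any goal the encoder can represent.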

arXiv:2001.11628

Domain-Adversarial and Conditional State Space Model for Imitation Learning

Authors: Ryo Okumura, Masashi Okada, Tadahiro Taniguchi

Abstract: State representation learning (SRL) in partially observable Markov decision processes has been studied to learn abstract features of data useful for robot control tasks. For SRL, acquiring domain-agnostic states is essential for achieving efficient imitation learning. Without these states, imitation learning is hampered by domain-dependent information that is useless for control. However, existing methods fail to remove such disturbances from the states when the data from experts and agents show large domain shifts. To overcome this issue, we propose a domain-adversarial and conditional state space model (DAC-SSM) that enables control systems to obtain domain-agnostic and task- and dynamics-aware states. DAC-SSM jointly optimizes the state inference, observation reconstruction, forward dynamics, and reward models. To remove domain-dependent information from the states, the model is trained with domain discriminators in an adversarial manner, and the reconstruction is conditioned on domain labels. We experimentally evaluated the model predictive control performance via imitation learning for continuous control of sparse reward tasks in simulators and compared it with the performance of the existing SRL method. The agents from DAC-SSM achieved performance comparable to experts and more than twice that of the baselines. We conclude that domain-agnostic states are essential for imitation learning with large domain shifts and can be obtained using DAC-SSM.

Submitted 4 June, 2021; v1 submitted 30 January, 2020; originally announced January 2020.

Comments: Published at IROS 2020

Showing 1–8 of 8 results for author: Okumura, R