You Only Look at One: Category-Level Object Representations for Pose Estimation From a Single Example

Authors: Walter Goodwin, Ioannis Havoutis, Ingmar Posner

Abstract: In order to meaningfully interact with the world, robot manipulators must be able to interpret objects they encounter. A critical aspect of this interpretation is pose estimation: inferring quantities that describe the position and orientation of an object in 3D space. Most existing approaches to pose estimation make limiting assumptions, often working only for specific, known object instances, or at best generalising to an object category using large pose-labelled datasets. In this work, we present a method for achieving category-level pose estimation by inspection of just a single object from a desired category. We show that we can subsequently perform accurate pose estimation for unseen objects from an inspected category, and considerably outperform prior work by exploiting multi-view correspondences. We demonstrate that our method runs in real-time, enabling a robot manipulator equipped with an RGBD sensor to perform online 6D pose estimation for novel objects. Finally, we showcase our method in a continual learning setting, with a robot able to determine whether objects belong to known categories, and if not, use active perception to produce a one-shot category representation for subsequent pose estimation.

Submitted 21 May, 2023; originally announced May 2023.

Comments: 16 pages, 6 figures, CoRL 2022

arXiv:2210.11779 [pdf, other]

doi 10.1109/LRA.2022.3152697

Reaching Through Latent Space: From Joint Statistics to Path Planning in Manipulation

Authors: Chia-Man Hung, Shaohong Zhong, Walter Goodwin, Oiwi Parker Jones, Martin Engelcke, Ioannis Havoutis, Ingmar Posner

Abstract: We present a novel approach to path planning for robotic manipulators, in which paths are produced via iterative optimisation in the latent space of a generative model of robot poses. Constraints are incorporated through the use of constraint satisfaction classifiers operating on the same space. Optimisation leverages gradients through our learned models that provide a simple way to combine goal reaching objectives with constraint satisfaction, even in the presence of otherwise non-differentiable constraints. Our models are trained in a task-agnostic manner on randomly sampled robot poses. In baseline comparisons against a number of widely used planners, we achieve commensurate performance in terms of task success, planning time and path length, performing successful path planning with obstacle avoidance on a real 7-DoF robot arm.

Submitted 21 October, 2022; originally announced October 2022.

Comments: 10 pages, 6 figures, 4 tables

ACM Class: I.2.6; I.2.9; I.2.10

Journal ref: IEEE Robotics and Automation Letters 7.2 (2022): 5334-5341

arXiv:2204.03635 [pdf, other]

Zero-Shot Category-Level Object Pose Estimation

Authors: Walter Goodwin, Sagar Vaze, Ioannis Havoutis, Ingmar Posner

Abstract: Object pose estimation is an important component of most vision pipelines for embodied agents, as well as in 3D vision more generally. In this paper we tackle the problem of estimating the pose of novel object categories in a zero-shot manner. This extends much of the existing literature by removing the need for pose-labelled datasets or category-specific CAD models for training or inference. Specifically, we make the following contributions. First, we formalise the zero-shot, category-level pose estimation problem and frame it in a way that is most applicable to real-world embodied agents. Secondly, we propose a novel method based on semantic correspondences from a self-supervised vision transformer to solve the pose estimation problem. We further re-purpose the recent CO3D dataset to present a controlled and realistic test setting. Finally, we demonstrate that all baselines for our proposed task perform poorly, and show that our method provides a six-fold improvement in average rotation accuracy at 30 degrees. Our code is available at https://github.com/applied-ai-lab/zero-shot-pose.

Submitted 2 October, 2022; v1 submitted 7 April, 2022; originally announced April 2022.

Comments: 28 pages, 6 figures

Journal ref: ECCV 2022

arXiv:2201.08115 [pdf, other]

Priors, Hierarchy, and Information Asymmetry for Skill Transfer in Reinforcement Learning

Authors: Sasha Salter, Kristian Hartikainen, Walter Goodwin, Ingmar Posner

Abstract: The ability to discover behaviours from past experience and transfer them to new tasks is a hallmark of intelligent agents acting sample-efficiently in the real world. Equipping embodied reinforcement learners with the same ability may be crucial for their successful deployment in robotics. While hierarchical and KL-regularized reinforcement learning individually hold promise here, arguably a hybrid approach could combine their respective benefits. Key to these fields is the use of information asymmetry across architectural modules to bias which skills are learnt. While asymmetry choice has a large influence on transferability, existing methods base their choice primarily on intuition in a domain-independent, potentially sub-optimal, manner. In this paper, we theoretically and empirically show the crucial expressivity-transferability trade-off of skills across sequential tasks, controlled by information asymmetry. Given this insight, we introduce Attentive Priors for Expressive and Transferable Skills (APES), a hierarchical KL-regularized method, heavily benefiting from both priors and hierarchy. Unlike existing approaches, APES automates the choice of asymmetry by learning it in a data-driven, domain-dependent, way based on our expressivity-transferability theorems. Experiments over complex transfer domains of varying levels of extrapolation and sparsity, such as robot block stacking, demonstrate the criticality of the correct asymmetric choice, with APES drastically outperforming previous methods.

Submitted 24 April, 2023; v1 submitted 20 January, 2022; originally announced January 2022.

Journal ref: Published at the International Conference on Learning Representations, 2023

arXiv:2111.07975 [pdf, other]

Semantically Grounded Object Matching for Robust Robotic Scene Rearrangement

Authors: Walter Goodwin, Sagar Vaze, Ioannis Havoutis, Ingmar Posner

Abstract: Object rearrangement has recently emerged as a key competency in robot manipulation, with practical solutions generally involving object detection, recognition, grasping and high-level planning. Goal-images describing a desired scene configuration are a promising and increasingly used mode of instruction. A key outstanding challenge is the accurate inference of matches between objects in front of a robot, and those seen in a provided goal image, where recent works have struggled in the absence of object-specific training data. In this work, we explore the deterioration of existing methods' ability to infer matches between objects as the visual shift between observed and goal scenes increases. We find that a fundamental limitation of the current setting is that source and target images must contain the same $\textit{instance}$ of every object, which restricts practical deployment. We present a novel approach to object matching that uses a large pre-trained vision-language model to match objects in a cross-instance setting by leveraging semantics together with visual features as a more robust, and much more general, measure of similarity. We demonstrate that this provides considerably improved matching performance in cross-instance settings, and can be used to guide multi-object rearrangement with a robot manipulator from an image that shares no object $\textit{instances}$ with the robot's scene.

Submitted 15 November, 2021; originally announced November 2021.

Comments: 8 pages, 5 figures

arXiv:1304.1495 [pdf]

Uncertainty and Incompleteness

Authors: Piero P. Bonissone, David A. Cyrluk, James W. Goodwin, Jonathan Stillman

Abstract: Two major difficulties in using default logics are their intractability and the problem of selecting among multiple extensions. We propose an approach to these problems based on integrating nonmonotonic reasoning with plausible reasoning based on triangular norms. A previously proposed system for reasoning with uncertainty (RUM) performs uncertain monotonic inferences on an acyclic graph. We have extended RUM to allow nonmonotonic inferences and cycles within nonmonotonic rules. By restricting the size and complexity of the nonmonotonic cycles we can still perform efficient inferences. Uncertainty measures provide a basis for deciding among multiple defaults. Different algorithms and heuristics for finding the optimal defaults are discussed.

Submitted 27 March, 2013; originally announced April 2013.

Comments: Appears in Proceedings of the Fifth Conference on Uncertainty in Artificial Intelligence (UAI1989)

Report number: UAI-P-1989-PG-34-45

Showing 1–6 of 6 results for author: Goodwin, W