Search | arXiv e-print repository

Tilde: Teleoperation for Dexterous In-Hand Manipulation Learning with a DeltaHand

Authors: Zilin Si, Kevin Lee Zhang, Zeynep Temel, Oliver Kroemer

Abstract: Dexterous robotic manipulation remains a challenging domain due to its strict demands for precision and robustness on both hardware and software. While dexterous robotic hands have demonstrated remarkable capabilities in complex tasks, efficiently learning adaptive control policies for hands still presents a significant hurdle given the high dimensionalities of hands and tasks. To bridge this gap,… ▽ More Dexterous robotic manipulation remains a challenging domain due to its strict demands for precision and robustness on both hardware and software. While dexterous robotic hands have demonstrated remarkable capabilities in complex tasks, efficiently learning adaptive control policies for hands still presents a significant hurdle given the high dimensionalities of hands and tasks. To bridge this gap, we propose Tilde, an imitation learning-based in-hand manipulation system on a dexterous DeltaHand. It leverages 1) a low-cost, configurable, simple-to-control, soft dexterous robotic hand, DeltaHand, 2) a user-friendly, precise, real-time teleoperation interface, TeleHand, and 3) an efficient and generalizable imitation learning approach with diffusion policies. Our proposed TeleHand has a kinematic twin design to the DeltaHand that enables precise one-to-one joint control of the DeltaHand during teleoperation. This facilitates efficient high-quality data collection of human demonstrations in the real world. To evaluate the effectiveness of our system, we demonstrate the fully autonomous closed-loop deployment of diffusion policies learned from demonstrations across seven dexterous manipulation tasks with an average 90% success rate. △ Less

Submitted 21 August, 2024; v1 submitted 29 May, 2024; originally announced May 2024.

arXiv:2403.11313 [pdf, other]

Leveraging Simulation-Based Model Preconditions for Fast Action Parameter Optimization with Multiple Models

Authors: M. Yunus Seker, Oliver Kroemer

Abstract: Optimizing robotic action parameters is a significant challenge for manipulation tasks that demand high levels of precision and generalization. Using a model-based approach, the robot must quickly reason about the outcomes of different actions using a predictive model to find a set of parameters that will have the desired effect. The model may need to capture the behaviors of rigid and deformable… ▽ More Optimizing robotic action parameters is a significant challenge for manipulation tasks that demand high levels of precision and generalization. Using a model-based approach, the robot must quickly reason about the outcomes of different actions using a predictive model to find a set of parameters that will have the desired effect. The model may need to capture the behaviors of rigid and deformable objects, as well as objects of various shapes and sizes. Predictive models often need to trade-off speed for prediction accuracy and generalization. This paper proposes a framework that leverages the strengths of multiple predictive models, including analytical, learned, and simulation-based models, to enhance the efficiency and accuracy of action parameter optimization. Our approach uses Model Deviation Estimators (MDEs) to determine the most suitable predictive model for any given state-action parameters, allowing the robot to select models to make fast and precise predictions. We extend the MDE framework by not only learning sim-to-real MDEs, but also sim-to-sim MDEs. Our experiments show that these sim-to-sim MDEs provide significantly faster parameter optimization as well as a basis for efficiently learning sim-to-real MDEs through finetuning. The ease of collecting sim-to-sim training data also allows the robot to learn MDEs based directly on visual inputs and local material properties. △ Less

Submitted 17 March, 2024; originally announced March 2024.

arXiv:2402.18710 [pdf, other]

Hefty: A Modular Reconfigurable Robot for Advancing Robot Manipulation in Agriculture

Authors: Dominic Guri, Moonyoung Lee, Oliver Kroemer, George Kantor

Abstract: This paper presents a modular, reconfigurable robot platform for robot manipulation in agriculture. While robot manipulation promises great advancements in automating challenging, complex tasks that are currently best left to humans, it is also an expensive capital investment for researchers and users because it demands significantly varying robot configurations depending on the task. Modular robo… ▽ More This paper presents a modular, reconfigurable robot platform for robot manipulation in agriculture. While robot manipulation promises great advancements in automating challenging, complex tasks that are currently best left to humans, it is also an expensive capital investment for researchers and users because it demands significantly varying robot configurations depending on the task. Modular robots provide a way to obtain multiple configurations and reduce costs by enabling incremental acquisition of only the necessary modules. The robot we present, Hefty, is designed to be modular and reconfigurable. It is designed for both researchers and end-users as a means to improve technology transfer from research to real-world application. This paper provides a detailed design and integration process, outlining the critical design decisions that enable modularity in the mobility of the robot as well as its sensor payload, power systems, computing, and fixture mounting. We demonstrate the utility of the robot by presenting five configurations used in multiple real-world agricultural robotics applications. △ Less

Submitted 28 February, 2024; originally announced February 2024.

Comments: 8 pages, 11 figures

arXiv:2401.14502 [pdf, other]

MResT: Multi-Resolution Sensing for Real-Time Control with Vision-Language Models

Authors: Saumya Saxena, Mohit Sharma, Oliver Kroemer

Abstract: Leveraging sensing modalities across diverse spatial and temporal resolutions can improve performance of robotic manipulation tasks. Multi-spatial resolution sensing provides hierarchical information captured at different spatial scales and enables both coarse and precise motions. Simultaneously multi-temporal resolution sensing enables the agent to exhibit high reactivity and real-time control. I… ▽ More Leveraging sensing modalities across diverse spatial and temporal resolutions can improve performance of robotic manipulation tasks. Multi-spatial resolution sensing provides hierarchical information captured at different spatial scales and enables both coarse and precise motions. Simultaneously multi-temporal resolution sensing enables the agent to exhibit high reactivity and real-time control. In this work, we propose a framework, MResT (Multi-Resolution Transformer), for learning generalizable language-conditioned multi-task policies that utilize sensing at different spatial and temporal resolutions using networks of varying capacities to effectively perform real time control of precise and reactive tasks. We leverage off-the-shelf pretrained vision-language models to operate on low-frequency global features along with small non-pretrained models to adapt to high frequency local feedback. Through extensive experiments in 3 domains (coarse, precise and dynamic manipulation tasks), we show that our approach significantly improves (2X on average) over recent multi-task baselines. Further, our approach generalizes well to visual and geometric variations in target objects and to varying interaction forces. △ Less

Submitted 25 January, 2024; originally announced January 2024.

Comments: CoRL'23, Project website: http://tinyurl.com/multi-res-realtime-control

arXiv:2401.04007 [pdf, other]

Task-Oriented Active Learning of Model Preconditions for Inaccurate Dynamics Models

Authors: Alex LaGrassa, Moonyoung Lee, Oliver Kroemer

Abstract: When planning with an inaccurate dynamics model, a practical strategy is to restrict planning to regions of state-action space where the model is accurate: also known as a \textit{model precondition}. Empirical real-world trajectory data is valuable for defining data-driven model preconditions regardless of the model form (analytical, simulator, learned, etc...). However, real-world data is often… ▽ More When planning with an inaccurate dynamics model, a practical strategy is to restrict planning to regions of state-action space where the model is accurate: also known as a \textit{model precondition}. Empirical real-world trajectory data is valuable for defining data-driven model preconditions regardless of the model form (analytical, simulator, learned, etc...). However, real-world data is often expensive and dangerous to collect. In order to achieve data efficiency, this paper presents an algorithm for actively selecting trajectories to learn a model precondition for an inaccurate pre-specified dynamics model. Our proposed techniques address challenges arising from the sequential nature of trajectories, and potential benefit of prioritizing task-relevant data. The experimental analysis shows how algorithmic properties affect performance in three planning scenarios: icy gridworld, simulated plant watering, and real-world plant watering. Results demonstrate an improvement of approximately 80% after only four real-world trajectories when using our proposed techniques. △ Less

Submitted 23 April, 2024; v1 submitted 8 January, 2024; originally announced January 2024.

Comments: Accepted to International Conference on Robotics and Automation 2024. Will be presented May 2024

arXiv:2311.07479 [pdf, other]

Towards Robotic Tree Manipulation: Leveraging Graph Representations

Authors: Chung Hee Kim, Moonyoung Lee, Oliver Kroemer, George Kantor

Abstract: There is growing interest in automating agricultural tasks that require intricate and precise interaction with specialty crops, such as trees and vines. However, developing robotic solutions for crop manipulation remains a difficult challenge due to complexities involved in modeling their deformable behavior. In this study, we present a framework for learning the deformation behavior of tree-like… ▽ More There is growing interest in automating agricultural tasks that require intricate and precise interaction with specialty crops, such as trees and vines. However, developing robotic solutions for crop manipulation remains a difficult challenge due to complexities involved in modeling their deformable behavior. In this study, we present a framework for learning the deformation behavior of tree-like crops under contact interaction. Our proposed method involves encoding the state of a spring-damper modeled tree crop as a graph. This representation allows us to employ graph networks to learn both a forward model for predicting resulting deformations, and a contact policy for inferring actions to manipulate tree crops. We conduct a comprehensive set of experiments in a simulated environment and demonstrate generalizability of our method on previously unseen trees. Videos can be found on the project website: https://kantor-lab.github.io/tree_gnn △ Less

Submitted 13 November, 2023; originally announced November 2023.

Comments: 7 pages, 10 figures

arXiv:2311.03697 [pdf, other]

Towards Autonomous Crop Monitoring: Inserting Sensors in Cluttered Environments

Authors: Moonyoung Lee, Aaron Berger, Dominic Guri, Kevin Zhang, Lisa Coffee, George Kantor, Oliver Kroemer

Abstract: We present a contact-based phenotyping robot platform that can autonomously insert nitrate sensors into cornstalks to proactively monitor macronutrient levels in crops. This task is challenging because inserting such sensors requires sub-centimeter precision in an environment which contains high levels of clutter, lighting variation, and occlusion. To address these challenges, we develop a robust… ▽ More We present a contact-based phenotyping robot platform that can autonomously insert nitrate sensors into cornstalks to proactively monitor macronutrient levels in crops. This task is challenging because inserting such sensors requires sub-centimeter precision in an environment which contains high levels of clutter, lighting variation, and occlusion. To address these challenges, we develop a robust perception-action pipeline to detect and grasp stalks, and create a custom robot gripper which mechanically aligns the sensor before inserting it into the stalk. Through experimental validation on 48 unique stalks in a cornfield in Iowa, we demonstrate our platform's capability of detecting a stalk with 94% success, grasping a stalk with 90% success, and inserting a sensor with 60% success. In addition to developing an autonomous phenotyping research platform, we share key challenges and insights obtained from deployment in the field. Our research platform is open-sourced, with additional information available at https://kantor-lab.github.io/cornbot. △ Less

Submitted 6 November, 2023; originally announced November 2023.

arXiv:2310.11749 [pdf, other]

Estimating Material Properties of Interacting Objects Using Sum-GP-UCB

Authors: M. Yunus Seker, Oliver Kroemer

Abstract: Robots need to estimate the material and dynamic properties of objects from observations in order to simulate them accurately. We present a Bayesian optimization approach to identifying the material property parameters of objects based on a set of observations. Our focus is on estimating these properties based on observations of scenes with different sets of interacting objects. We propose an appr… ▽ More Robots need to estimate the material and dynamic properties of objects from observations in order to simulate them accurately. We present a Bayesian optimization approach to identifying the material property parameters of objects based on a set of observations. Our focus is on estimating these properties based on observations of scenes with different sets of interacting objects. We propose an approach that exploits the structure of the reward function by modeling the reward for each observation separately and using only the parameters of the objects in that scene as inputs. The resulting lower-dimensional models generalize better over the parameter space, which in turn results in a faster optimization. To speed up the optimization process further, and reduce the number of simulation runs needed to find good parameter values, we also propose partial evaluations of the reward function, wherein the selected parameters are only evaluated on a subset of real world evaluations. The approach was successfully evaluated on a set of scenes with a wide range of object interactions, and we showed that our method can effectively perform incremental learning without resetting the rewards of the gathered observations. △ Less

Submitted 18 October, 2023; originally announced October 2023.

arXiv:2310.08864 [pdf, other]

Open X-Embodiment: Robotic Learning Datasets and RT-X Models

Authors: Open X-Embodiment Collaboration, Abby O'Neill, Abdul Rehman, Abhinav Gupta, Abhiram Maddukuri, Abhishek Gupta, Abhishek Padalkar, Abraham Lee, Acorn Pooley, Agrim Gupta, Ajay Mandlekar, Ajinkya Jain, Albert Tung, Alex Bewley, Alex Herzog, Alex Irpan, Alexander Khazatsky, Anant Rai, Anchit Gupta, Andrew Wang, Andrey Kolobov, Anikait Singh, Animesh Garg, Aniruddha Kembhavi, Annie Xie , et al. (267 additional authors not shown)

Abstract: Large, high-capacity models trained on diverse datasets have shown remarkable successes on efficiently tackling downstream applications. In domains from NLP to Computer Vision, this has led to a consolidation of pretrained models, with general pretrained backbones serving as a starting point for many applications. Can such a consolidation happen in robotics? Conventionally, robotic learning method… ▽ More Large, high-capacity models trained on diverse datasets have shown remarkable successes on efficiently tackling downstream applications. In domains from NLP to Computer Vision, this has led to a consolidation of pretrained models, with general pretrained backbones serving as a starting point for many applications. Can such a consolidation happen in robotics? Conventionally, robotic learning methods train a separate model for every application, every robot, and even every environment. Can we instead train generalist X-robot policy that can be adapted efficiently to new robots, tasks, and environments? In this paper, we provide datasets in standardized data formats and models to make it possible to explore this possibility in the context of robotic manipulation, alongside experimental results that provide an example of effective X-robot policies. We assemble a dataset from 22 different robots collected through a collaboration between 21 institutions, demonstrating 527 skills (160266 tasks). We show that a high-capacity model trained on this data, which we call RT-X, exhibits positive transfer and improves the capabilities of multiple robots by leveraging experience from other platforms. More details can be found on the project website https://robotics-transformer-x.github.io. △ Less

Submitted 1 June, 2024; v1 submitted 13 October, 2023; originally announced October 2023.

Comments: Project website: https://robotics-transformer-x.github.io

arXiv:2310.05266 [pdf, other]

DELTAHANDS: A Synergistic Dexterous Hand Framework Based on Delta Robots

Authors: Zilin Si, Kevin Zhang, Oliver Kroemer, F. Zeynep Temel

Abstract: Dexterous robotic manipulation in unstructured environments can aid in everyday tasks such as cleaning and caretaking. Anthropomorphic robotic hands are highly dexterous and theoretically well-suited for working in human domains, but their complex designs and dynamics often make them difficult to control. By contrast, parallel-jaw grippers are easy to control and are used extensively in industrial… ▽ More Dexterous robotic manipulation in unstructured environments can aid in everyday tasks such as cleaning and caretaking. Anthropomorphic robotic hands are highly dexterous and theoretically well-suited for working in human domains, but their complex designs and dynamics often make them difficult to control. By contrast, parallel-jaw grippers are easy to control and are used extensively in industrial applications, but they lack the dexterity for various kinds of grasps and in-hand manipulations. In this work, we present DELTAHANDS, a synergistic dexterous hand framework with Delta robots. The DELTAHANDS are soft, easy to reconfigure, simple to manufacture with low-cost off-the-shelf materials, and possess high degrees of freedom that can be easily controlled. DELTAHANDS' dexterity can be adjusted for different applications by leveraging actuation synergies, which can further reduce the control complexity, overall cost, and energy consumption. We characterize the Delta robots' kinematics accuracy, force profiles, and workspace range to assist with hand design. Finally, we evaluate the versatility of DELTAHANDS by grasping a diverse set of objects and by using teleoperation to complete three dexterous manipulation tasks: cloth folding, cap opening, and cable arrangement. We open-source our hand framework at https://sites.google.com/view/deltahands/. △ Less

Submitted 24 December, 2023; v1 submitted 8 October, 2023; originally announced October 2023.

arXiv:2307.00383 [pdf, other]

Enhancing Dexterity in Robotic Manipulation via Hierarchical Contact Exploration

Authors: Xianyi Cheng, Sarvesh Patil, Zeynep Temel, Oliver Kroemer, Matthew T. Mason

Abstract: Planning robot dexterity is challenging due to the non-smoothness introduced by contacts, intricate fine motions, and ever-changing scenarios. We present a hierarchical planning framework for dexterous robotic manipulation (HiDex). This framework explores in-hand and extrinsic dexterity by leveraging contacts. It generates rigid-body motions and complex contact sequences. Our framework is based on… ▽ More Planning robot dexterity is challenging due to the non-smoothness introduced by contacts, intricate fine motions, and ever-changing scenarios. We present a hierarchical planning framework for dexterous robotic manipulation (HiDex). This framework explores in-hand and extrinsic dexterity by leveraging contacts. It generates rigid-body motions and complex contact sequences. Our framework is based on Monte-Carlo Tree Search and has three levels: 1) planning object motions and environment contact modes; 2) planning robot contacts; 3) path evaluation and control optimization. This framework offers two main advantages. First, it allows efficient global reasoning over high-dimensional complex space created by contacts. It solves a diverse set of manipulation tasks that require dexterity, both intrinsic (using the fingers) and extrinsic (also using the environment), mostly in seconds. Second, our framework allows the incorporation of expert knowledge and customizable setups in task mechanics and models. It requires minor modifications to accommodate different scenarios and robots. Hence, it provides a flexible and generalizable solution for various manipulation tasks. As examples, we analyze the results on 7 hand configurations and 15 scenarios. We demonstrate 8 tasks on two robot platforms. △ Less

Submitted 8 November, 2023; v1 submitted 1 July, 2023; originally announced July 2023.

Comments: The IEEE Robotics and Automation Letters

arXiv:2209.15132 [pdf, other]

Dynamic Inference on Graphs using Structured Transition Models

Authors: Saumya Saxena, Oliver Kroemer

Abstract: Enabling robots to perform complex dynamic tasks such as picking up an object in one sweeping motion or pushing off a wall to quickly turn a corner is a challenging problem. The dynamic interactions implicit in these tasks are critical towards the successful execution of such tasks. Graph neural networks (GNNs) provide a principled way of learning the dynamics of interactive systems but can suffer… ▽ More Enabling robots to perform complex dynamic tasks such as picking up an object in one sweeping motion or pushing off a wall to quickly turn a corner is a challenging problem. The dynamic interactions implicit in these tasks are critical towards the successful execution of such tasks. Graph neural networks (GNNs) provide a principled way of learning the dynamics of interactive systems but can suffer from scaling issues as the number of interactions increases. Furthermore, the problem of using learned GNN-based models for optimal control is insufficiently explored. In this work, we present a method for efficiently learning the dynamics of interacting systems by simultaneously learning a dynamic graph structure and a stable and locally linear forward model of the system. The dynamic graph structure encodes evolving contact modes along a trajectory by making probabilistic predictions over the edges of the graph. Additionally, we introduce a temporal dependence in the learned graph structure which allows us to incorporate contact measurement updates during execution thus enabling more accurate forward predictions. The learned stable and locally linear dynamics enable the use of optimal control algorithms such as iLQR for long-horizon planning and control for complex interactive tasks. Through experiments in simulation and in the real world, we evaluate the performance of our method by using the learned interaction dynamics for control and demonstrate generalization to more objects and interactions not seen during training. We introduce a control scheme that takes advantage of contact measurement updates and hence is robust to prediction inaccuracies during execution. △ Less

Submitted 29 September, 2022; originally announced September 2022.

arXiv:2209.14261 [pdf, other]

Focused Adaptation of Dynamics Models for Deformable Object Manipulation

Authors: Peter Mitrano, Alex LaGrassa, Oliver Kroemer, Dmitry Berenson

Abstract: In order to efficiently learn a dynamics model for a task in a new environment, one can adapt a model learned in a similar source environment. However, existing adaptation methods can fail when the target dataset contains transitions where the dynamics are very different from the source environment. For example, the source environment dynamics could be of a rope manipulated in free-space, whereas… ▽ More In order to efficiently learn a dynamics model for a task in a new environment, one can adapt a model learned in a similar source environment. However, existing adaptation methods can fail when the target dataset contains transitions where the dynamics are very different from the source environment. For example, the source environment dynamics could be of a rope manipulated in free-space, whereas the target dynamics could involve collisions and deformation on obstacles. Our key insight is to improve data efficiency by focusing model adaptation on only the regions where the source and target dynamics are similar. In the rope example, adapting the free-space dynamics requires significantly fewer data than adapting the free-space dynamics while also learning collision dynamics. We propose a new method for adaptation that is effective in adapting to regions of similar dynamics. Additionally, we combine this adaptation method with prior work on planning with unreliable dynamics to make a method for data-efficient online adaptation, called FOCUS. We first demonstrate that the proposed adaptation method achieves statistically significantly lower prediction error in regions of similar dynamics on simulated rope manipulation and plant watering tasks. We then show on a bimanual rope manipulation task that FOCUS achieves data-efficient online learning, in simulation and in the real world. △ Less

Submitted 15 March, 2023; v1 submitted 28 September, 2022; originally announced September 2022.

Comments: Project Website: https://sites.google.com/view/focused-adaptation-dynamics/home

arXiv:2209.13605 [pdf, other]

Efficient Recovery Learning using Model Predictive Meta-Reasoning

Authors: Shivam Vats, Maxim Likhachev, Oliver Kroemer

Abstract: Operating under real world conditions is challenging due to the possibility of a wide range of failures induced by execution errors and state uncertainty. In relatively benign settings, such failures can be overcome by retrying or executing one of a small number of hand-engineered recovery strategies. By contrast, contact-rich sequential manipulation tasks, like opening doors and assembling furnit… ▽ More Operating under real world conditions is challenging due to the possibility of a wide range of failures induced by execution errors and state uncertainty. In relatively benign settings, such failures can be overcome by retrying or executing one of a small number of hand-engineered recovery strategies. By contrast, contact-rich sequential manipulation tasks, like opening doors and assembling furniture, are not amenable to exhaustive hand-engineering. To address this issue, we present a general approach for robustifying manipulation strategies in a sample-efficient manner. Our approach incrementally improves robustness by first discovering the failure modes of the current strategy via exploration in simulation and then learning additional recovery skills to handle these failures. To ensure efficient learning, we propose an online algorithm called Meta-Reasoning for Skill Learning (MetaReSkill) that monitors the progress of all recovery policies during training and allocates training resources to recoveries that are likely to improve the task performance the most. We use our approach to learn recovery skills for door-opening and evaluate them both in simulation and on a real robot with little fine-tuning. Compared to open-loop execution, our experiments show that even a limited amount of recovery learning improves task success substantially from 71% to 92.4% in simulation and from 75% to 90% on a real robot. △ Less

Submitted 9 March, 2023; v1 submitted 27 September, 2022; originally announced September 2022.

Comments: To appear in the International Conference on Robotics and Automation (ICRA) 2023

arXiv:2207.11196 [pdf, other]

Learning to Singulate Layers of Cloth using Tactile Feedback

Authors: Sashank Tirumala, Thomas Weng, Daniel Seita, Oliver Kroemer, Zeynep Temel, David Held

Abstract: Robotic manipulation of cloth has applications ranging from fabrics manufacturing to handling blankets and laundry. Cloth manipulation is challenging for robots largely due to their high degrees of freedom, complex dynamics, and severe self-occlusions when in folded or crumpled configurations. Prior work on robotic manipulation of cloth relies primarily on vision sensors alone, which may pose chal… ▽ More Robotic manipulation of cloth has applications ranging from fabrics manufacturing to handling blankets and laundry. Cloth manipulation is challenging for robots largely due to their high degrees of freedom, complex dynamics, and severe self-occlusions when in folded or crumpled configurations. Prior work on robotic manipulation of cloth relies primarily on vision sensors alone, which may pose challenges for fine-grained manipulation tasks such as grasping a desired number of cloth layers from a stack of cloth. In this paper, we propose to use tactile sensing for cloth manipulation; we attach a tactile sensor (ReSkin) to one of the two fingertips of a Franka robot and train a classifier to determine whether the robot is grasping a specific number of cloth layers. During test-time experiments, the robot uses this classifier as part of its policy to grasp one or two cloth layers using tactile feedback to determine suitable grasping points. Experimental results over 180 physical trials suggest that the proposed method outperforms baselines that do not use tactile feedback and has better generalization to unseen cloth compared to methods that use image classifiers. Code, data, and videos are available at https://sites.google.com/view/reskin-cloth. △ Less

Submitted 22 July, 2022; originally announced July 2022.

Comments: IROS 2022. See https://sites.google.com/view/reskin-cloth for supplementary material

arXiv:2207.00721 [pdf, other]

doi 10.1109/IROS47612.2022.9981257

DeltaZ: An Accessible Compliant Delta Robot Manipulator for Research and Education

Authors: Sarvesh Patil, Samuel C. Alvares, Pragna Mannam, Oliver Kroemer, F. Zeynep Temel

Abstract: This paper presents the DeltaZ robot, a centimeter-scale, low-cost, delta-style robot that allows for a broad range of capabilities and robust functionalities. Current technologies allow DeltaZ to be 3D-printed from soft and rigid materials so that it is easy to assemble and maintain, and lowers the barriers to utilize. Functionality of the robot stems from its three translational degrees of freed… ▽ More This paper presents the DeltaZ robot, a centimeter-scale, low-cost, delta-style robot that allows for a broad range of capabilities and robust functionalities. Current technologies allow DeltaZ to be 3D-printed from soft and rigid materials so that it is easy to assemble and maintain, and lowers the barriers to utilize. Functionality of the robot stems from its three translational degrees of freedom and a closed form kinematic solution which makes manipulation problems more intuitive compared to other manipulators. Moreover, the low cost of the robot presents an opportunity to democratize manipulators for a research setting. We also describe how the robot can be used as a reinforcement learning benchmark. Open-source 3D-printable designs and code are available to the public. △ Less

Submitted 1 July, 2022; originally announced July 2022.

Comments: IROS 2022, first two authors contributed equally

Journal ref: IEEE International Conference on Robotics and Automation (ICRA), 2023, 10324-10330

arXiv:2206.12728 [pdf, other]

Learning Preconditions of Hybrid Force-Velocity Controllers for Contact-Rich Manipulation

Authors: Jacky Liang, Xianyi Cheng, Oliver Kroemer

Abstract: Robots need to manipulate objects in constrained environments like shelves and cabinets when assisting humans in everyday settings like homes and offices. These constraints make manipulation difficult by reducing grasp accessibility, so robots need to use non-prehensile strategies that leverage object-environment contacts to perform manipulation tasks. To tackle the challenge of planning and contr… ▽ More Robots need to manipulate objects in constrained environments like shelves and cabinets when assisting humans in everyday settings like homes and offices. These constraints make manipulation difficult by reducing grasp accessibility, so robots need to use non-prehensile strategies that leverage object-environment contacts to perform manipulation tasks. To tackle the challenge of planning and controlling contact-rich behaviors in such settings, this work uses Hybrid Force-Velocity Controllers (HFVCs) as the skill representation and plans skill sequences with learned preconditions. While HFVCs naturally enable robust and compliant contact-rich behaviors, solvers that synthesize them have traditionally relied on precise object models and closed-loop feedback on object pose, which are difficult to obtain in constrained environments due to occlusions. We first relax HFVCs' need for precise models and feedback with our HFVC synthesis framework, then learn a point-cloud-based precondition function to classify where HFVC executions will still be successful despite modeling inaccuracies. Finally, we use the learned precondition in a search-based task planner to complete contact-rich manipulation tasks in a shelf domain. Our method achieves a task success rate of $73.2\%$, outperforming the $51.5\%$ achieved by a baseline without the learned precondition. While the precondition function is trained in simulation, it can also transfer to a real-world setup without further fine-tuning. See supplementary materials and videos at https://sites.google.com/view/constrained-manipulation/ △ Less

Submitted 28 October, 2022; v1 submitted 25 June, 2022; originally announced June 2022.

arXiv:2206.05573 [pdf, other]

Learning Model Preconditions for Planning with Multiple Models

Authors: Alex LaGrassa, Oliver Kroemer

Abstract: Different models can provide differing levels of fidelity when a robot is planning. Analytical models are often fast to evaluate but only work in limited ranges of conditions. Meanwhile, physics simulators are effective at modeling complex interactions between objects but are typically more computationally expensive. Learning when to switch between the various models can greatly improve the speed… ▽ More Different models can provide differing levels of fidelity when a robot is planning. Analytical models are often fast to evaluate but only work in limited ranges of conditions. Meanwhile, physics simulators are effective at modeling complex interactions between objects but are typically more computationally expensive. Learning when to switch between the various models can greatly improve the speed of planning and task success reliability. In this work, we learn model deviation estimators (MDEs) to predict the error between real-world states and the states outputted by transition models. MDEs can be used to define a model precondition that describes which transitions are accurately modeled. We then propose a planner that uses the learned model preconditions to switch between various models in order to use models in conditions where they are accurate, prioritizing faster models when possible. We evaluate our method on two real-world tasks: placing a rod into a box and placing a rod into a closed drawer. △ Less

Submitted 11 June, 2022; originally announced June 2022.

Comments: Presented at Conference on Robot Learning (CoRL 2021). Revised for clarity

Journal ref: Proceedings of the 5th Conference on Robot Learning, PMLR 164 (2022) 491-500

arXiv:2206.04596 [pdf, other]

doi 10.1109/ICRA48891.2023.10160578

Linear Delta Arrays for Compliant Dexterous Distributed Manipulation

Authors: Sarvesh Patil, Tony Tao, Tess Hellebrekers, Oliver Kroemer, F. Zeynep Temel

Abstract: This paper presents a new type of distributed dexterous manipulator: delta arrays. Our delta array setup consists of 64 linearly-actuated delta robots with 3D-printed compliant linkages. Through the design of the individual delta robots, the modular array structure, and distributed communication and control, we study a wide range of in-plane and out-of-plane manipulations, as well as prehensile ma… ▽ More This paper presents a new type of distributed dexterous manipulator: delta arrays. Our delta array setup consists of 64 linearly-actuated delta robots with 3D-printed compliant linkages. Through the design of the individual delta robots, the modular array structure, and distributed communication and control, we study a wide range of in-plane and out-of-plane manipulations, as well as prehensile manipulations among subsets of neighboring delta robots. We also demonstrate dexterous manipulation capabilities of the delta array using reinforcement learning while leveraging the compliance to not break the end-effectors. Our evaluations show that the resulting 192 DoF compliant robot is capable of performing various coordinated distributed manipulations of a variety of objects, including translation, alignment, prehensile squeezing, lifting, and grasping. △ Less

Submitted 14 August, 2023; v1 submitted 9 June, 2022; originally announced June 2022.

Comments: ICRA 2023

arXiv:2203.07478 [pdf, other]

Synergistic Scheduling of Learning and Allocation of Tasks in Human-Robot Teams

Authors: Shivam Vats, Oliver Kroemer, Maxim Likhachev

Abstract: We consider the problem of completing a set of $n$ tasks with a human-robot team using minimum effort. In many domains, teaching a robot to be fully autonomous can be counterproductive if there are finitely many tasks to be done. Rather, the optimal strategy is to weigh the cost of teaching a robot and its benefit -- how many new tasks it allows the robot to solve autonomously. We formulate this a… ▽ More We consider the problem of completing a set of $n$ tasks with a human-robot team using minimum effort. In many domains, teaching a robot to be fully autonomous can be counterproductive if there are finitely many tasks to be done. Rather, the optimal strategy is to weigh the cost of teaching a robot and its benefit -- how many new tasks it allows the robot to solve autonomously. We formulate this as a planning problem where the goal is to decide what tasks the robot should do autonomously (act), what tasks should be delegated to a human (delegate) and what tasks the robot should be taught (learn) so as to complete all the given tasks with minimum effort. This planning problem results in a search tree that grows exponentially with $n$ -- making standard graph search algorithms intractable. We address this by converting the problem into a mixed integer program that can be solved efficiently using off-the-shelf solvers with bounds on solution quality. To predict the benefit of learning, we propose a precondition prediction classifier. Given two tasks, this classifier predicts whether a skill trained on one will transfer to the other. Finally, we evaluate our approach on peg insertion and Lego stacking tasks, both in simulation and real-world, showing substantial savings in human effort. △ Less

Submitted 6 July, 2022; v1 submitted 14 March, 2022; originally announced March 2022.

Comments: Camera ready version for ICRA, 2022

arXiv:2202.06052 [pdf, other]

Learning by Doing: Controlling a Dynamical System using Causality, Control, and Reinforcement Learning

Authors: Sebastian Weichwald, Søren Wengel Mogensen, Tabitha Edith Lee, Dominik Baumann, Oliver Kroemer, Isabelle Guyon, Sebastian Trimpe, Jonas Peters, Niklas Pfister

Abstract: Questions in causality, control, and reinforcement learning go beyond the classical machine learning task of prediction under i.i.d. observations. Instead, these fields consider the problem of learning how to actively perturb a system to achieve a certain effect on a response variable. Arguably, they have complementary views on the problem: In control, one usually aims to first identify the system… ▽ More Questions in causality, control, and reinforcement learning go beyond the classical machine learning task of prediction under i.i.d. observations. Instead, these fields consider the problem of learning how to actively perturb a system to achieve a certain effect on a response variable. Arguably, they have complementary views on the problem: In control, one usually aims to first identify the system by excitation strategies to then apply model-based design techniques to control the system. In (non-model-based) reinforcement learning, one directly optimizes a reward. In causality, one focus is on identifiability of causal structure. We believe that combining the different views might create synergies and this competition is meant as a first step toward such synergies. The participants had access to observational and (offline) interventional data generated by dynamical systems. Track CHEM considers an open-loop problem in which a single impulse at the beginning of the dynamics can be set, while Track ROBO considers a closed-loop problem in which control variables can be set at each time step. The goal in both tracks is to infer controls that drive the system to a desired state. Code is open-sourced ( https://github.com/LearningByDoingCompetition/learningbydoing-comp ) to reproduce the winning solutions of the competition and to facilitate trying out new methods on the competition tasks. △ Less

Submitted 12 February, 2022; originally announced February 2022.

Comments: https://learningbydoingcompetition.github.io/

arXiv:2109.08771 [pdf, other]

Search-Based Task Planning with Learned Skill Effect Models for Lifelong Robotic Manipulation

Authors: Jacky Liang, Mohit Sharma, Alex LaGrassa, Shivam Vats, Saumya Saxena, Oliver Kroemer

Abstract: Robots deployed in many real-world settings need to be able to acquire new skills and solve new tasks over time. Prior works on planning with skills often make assumptions on the structure of skills and tasks, such as subgoal skills, shared skill implementations, or task-specific plan skeletons, which limit adaptation to new skills and tasks. By contrast, we propose doing task planning by jointly… ▽ More Robots deployed in many real-world settings need to be able to acquire new skills and solve new tasks over time. Prior works on planning with skills often make assumptions on the structure of skills and tasks, such as subgoal skills, shared skill implementations, or task-specific plan skeletons, which limit adaptation to new skills and tasks. By contrast, we propose doing task planning by jointly searching in the space of parameterized skills using high-level skill effect models learned in simulation. We use an iterative training procedure to efficiently generate relevant data to train such models. Our approach allows flexible skill parameterizations and task specifications to facilitate lifelong learning in general-purpose domains. Experiments demonstrate the ability of our planner to integrate new skills in a lifelong manner, finding new task strategies with lower costs in both train and test tasks. We additionally show that our method can transfer to the real world without further fine-tuning. △ Less

Submitted 13 April, 2022; v1 submitted 17 September, 2021; originally announced September 2021.

Comments: To appear in the International Conference on Robotics and Automation (ICRA) 2022

arXiv:2107.01507 [pdf]

doi 10.55417/fr.2022007

Mission-level Robustness with Rapidly-deployed, Autonomous Aerial Vehicles by Carnegie Mellon Team Tartan at MBZIRC 2020

Authors: Anish Bhattacharya, Akshit Gandhi, Lukas Merkle, Rohan Tiwari, Karun Warrior, Stanley Winata, Andrew Saba, Kevin Zhang, Oliver Kroemer, Sebastian Scherer

Abstract: For robotic systems to succeed in high risk, real-world situations, they have to be quickly deployable and robust to environmental changes, under-performing hardware, and mission subtask failures. These robots are often designed to consider a single sequence of mission events, with complex algorithms lowering individual subtask failure rates under some critical constraints. Our approach utilizes c… ▽ More For robotic systems to succeed in high risk, real-world situations, they have to be quickly deployable and robust to environmental changes, under-performing hardware, and mission subtask failures. These robots are often designed to consider a single sequence of mission events, with complex algorithms lowering individual subtask failure rates under some critical constraints. Our approach utilizes common techniques in vision and control, and encodes robustness into mission structure through outcome monitoring and recovery strategies. In addition, our system infrastructure enables rapid deployment and requires no central communication. This report also includes lessons in rapid field robotic development and testing. We developed and evaluated our systems through real-robot experiments at an outdoor test site in Pittsburgh, Pennsylvania, USA, as well as in the 2020 Mohamed Bin Zayed International Robotics Challenge. All competition trials were completed in fully autonomous mode without RTK-GPS. Our system placed fourth in Challenge 2 and seventh in the Grand Challenge, with notable achievements such as popping five balloons (Challenge 1), successfully picking and placing a block (Challenge 2), and dispensing the most water onto an outdoor, real fire with an autonomous UAV (Challenge 3). △ Less

Submitted 13 September, 2022; v1 submitted 3 July, 2021; originally announced July 2021.

Comments: 29 pages, 26 figures. Published in Field Robotics, Special Issue on MBZIRC 2020

Journal ref: Field Robotics, 2, 172-200 (2022)

arXiv:2104.11962 [pdf, other]

Adaptive Sampling: Algorithmic vs. Human Waypoint Selection

Authors: Stephanie Kemna, Sara Kangaslahti, Oliver Kroemer, Gaurav S. Sukhatme

Abstract: Robots are used for collecting samples from natural environments to create models of, for example, temperature or algae fields in the ocean. Adaptive informative sampling is a proven technique for this kind of spatial field modeling. This paper compares the performance of humans versus adaptive informative sampling algorithms for selecting informative waypoints. The humans and simulated robot are… ▽ More Robots are used for collecting samples from natural environments to create models of, for example, temperature or algae fields in the ocean. Adaptive informative sampling is a proven technique for this kind of spatial field modeling. This paper compares the performance of humans versus adaptive informative sampling algorithms for selecting informative waypoints. The humans and simulated robot are given the same information for selecting waypoints, and both are evaluated on the accuracy of the resulting model. We developed a graphical user interface for selecting waypoints and visualizing samples. Eleven participants iteratively picked waypoints for twelve scenarios. Our simulated robot used Gaussian Process regression with two entropy-based optimization criteria to iteratively choose waypoints. Our results show that the robot can on average perform better than the average human, and approximately as good as the best human, when the model assumptions correspond to the actual field. However, when the model assumptions do not correspond as well to the characteristics of the field, both human and robot performance are no better than random sampling. △ Less

Submitted 24 April, 2021; originally announced April 2021.

Comments: 12 pages, 7 figures, not yet published

arXiv:2103.16772 [pdf, other]

Causal Reasoning in Simulation for Structure and Transfer Learning of Robot Manipulation Policies

Authors: Tabitha Edith Lee, Jialiang Zhao, Amrita S. Sawhney, Siddharth Girdhar, Oliver Kroemer

Abstract: We present CREST, an approach for causal reasoning in simulation to learn the relevant state space for a robot manipulation policy. Our approach conducts interventions using internal models, which are simulations with approximate dynamics and simplified assumptions. These interventions elicit the structure between the state and action spaces, enabling construction of neural network policies with o… ▽ More We present CREST, an approach for causal reasoning in simulation to learn the relevant state space for a robot manipulation policy. Our approach conducts interventions using internal models, which are simulations with approximate dynamics and simplified assumptions. These interventions elicit the structure between the state and action spaces, enabling construction of neural network policies with only relevant states as input. These policies are pretrained using the internal model with domain randomization over the relevant states. The policy network weights are then transferred to the target domain (e.g., the real world) for fine tuning. We perform extensive policy transfer experiments in simulation for two representative manipulation tasks: block stacking and crate opening. Our policies are shown to be more robust to domain shifts, more sample efficient to learn, and scale to more complex settings with larger state spaces. We also show improved zero-shot sim-to-real transfer of our policies for the block stacking task. △ Less

Submitted 13 March, 2022; v1 submitted 30 March, 2021; originally announced March 2021.

Comments: Presented at the 2021 IEEE International Conference on Robotics and Automation (ICRA 2021). This version contains the camera-ready paper with supplementary materials and minor typo fixes. Project page: https://sites.google.com/view/crest-causal-struct-xfer-manip

arXiv:2103.14256 [pdf, other]

Learning Reactive and Predictive Differentiable Controllers for Switching Linear Dynamical Models

Authors: Saumya Saxena, Alex LaGrassa, Oliver Kroemer

Abstract: Humans leverage the dynamics of the environment and their own bodies to accomplish challenging tasks such as grasping an object while walking past it or pushing off a wall to turn a corner. Such tasks often involve switching dynamics as the robot makes and breaks contact. Learning these dynamics is a challenging problem and prone to model inaccuracies, especially near contact regions. In this work… ▽ More Humans leverage the dynamics of the environment and their own bodies to accomplish challenging tasks such as grasping an object while walking past it or pushing off a wall to turn a corner. Such tasks often involve switching dynamics as the robot makes and breaks contact. Learning these dynamics is a challenging problem and prone to model inaccuracies, especially near contact regions. In this work, we present a framework for learning composite dynamical behaviors from expert demonstrations. We learn a switching linear dynamical model with contacts encoded in switching conditions as a close approximation of our system dynamics. We then use discrete-time LQR as the differentiable policy class for data-efficient learning of control to develop a control strategy that operates over multiple dynamical modes and takes into account discontinuities due to contact. In addition to predicting interactions with the environment, our policy effectively reacts to inaccurate predictions such as unanticipated contacts. Through simulation and real world experiments, we demonstrate generalization of learned behaviors to different scenarios and robustness to model inaccuracies during execution. △ Less

Submitted 26 March, 2021; originally announced March 2021.

arXiv:2103.10524 [pdf, other]

Generalizing Object-Centric Task-Axes Controllers using Keypoints

Authors: Mohit Sharma, Oliver Kroemer

Abstract: To perform manipulation tasks in the real world, robots need to operate on objects with various shapes, sizes and without access to geometric models. It is often unfeasible to train monolithic neural network policies across such large variance in object properties. Towards this generalization challenge, we propose to learn modular task policies which compose object-centric task-axes controllers. T… ▽ More To perform manipulation tasks in the real world, robots need to operate on objects with various shapes, sizes and without access to geometric models. It is often unfeasible to train monolithic neural network policies across such large variance in object properties. Towards this generalization challenge, we propose to learn modular task policies which compose object-centric task-axes controllers. These task-axes controllers are parameterized by properties associated with underlying objects in the scene. We infer these controller parameters directly from visual input using multi-view dense correspondence learning. Our overall approach provides a simple, modular and yet powerful framework for learning manipulation tasks. We empirically evaluate our approach on multiple different manipulation tasks and show its ability to generalize to large variance in object size, shape and geometry. △ Less

Submitted 18 March, 2021; originally announced March 2021.

Comments: International Conference on Robotics and Automation (ICRA'21). For results see https://sites.google.com/view/robotic-manip-task-axes-ctrlrs

arXiv:2101.02252 [pdf, other]

Playing with Food: Learning Food Item Representations through Interactive Exploration

Authors: Amrita Sawhney, Steven Lee, Kevin Zhang, Manuela Veloso, Oliver Kroemer

Abstract: A key challenge in robotic food manipulation is modeling the material properties of diverse and deformable food items. We propose using a multimodal sensory approach to interact and play with food that facilitates the ability to distinguish these properties across food items. First, we use a robotic arm and an array of sensors, which are synchronized using ROS, to collect a diverse dataset consist… ▽ More A key challenge in robotic food manipulation is modeling the material properties of diverse and deformable food items. We propose using a multimodal sensory approach to interact and play with food that facilitates the ability to distinguish these properties across food items. First, we use a robotic arm and an array of sensors, which are synchronized using ROS, to collect a diverse dataset consisting of 21 unique food items with varying slices and properties. Afterwards, we learn visual embedding networks that utilize a combination of proprioceptive, audio, and visual data to encode similarities among food items using a triplet loss formulation. Our evaluations show that embeddings learned through interactions can successfully increase performance in a wide range of material and shape classification tasks. We envision that these learned embeddings can be utilized as a basis for planning and selecting optimal parameters for more material-aware robotic food manipulation skills. Furthermore, we hope to stimulate further innovations in the field of food robotics by sharing this food playing dataset with the research community. △ Less

Submitted 6 January, 2021; originally announced January 2021.

Comments: 12 pages, 8 figures, 2 tables, to be published in Proceedings of International Symposium on Experimental Robotics (ISER) 2020, project website located here: https://sites.google.com/view/playing-with-food/home

arXiv:2012.01693 [pdf, other]

Relational Learning for Skill Preconditions

Authors: Mohit Sharma, Oliver Kroemer

Abstract: To determine if a skill can be executed in any given environment, a robot needs to learn the preconditions for the skill. As robots begin to operate in dynamic and unstructured environments, precondition models will need to generalize to variable number of objects with different shapes and sizes. In this work, we focus on learning precondition models for manipulation skills in unconstrained enviro… ▽ More To determine if a skill can be executed in any given environment, a robot needs to learn the preconditions for the skill. As robots begin to operate in dynamic and unstructured environments, precondition models will need to generalize to variable number of objects with different shapes and sizes. In this work, we focus on learning precondition models for manipulation skills in unconstrained environments. Our work is motivated by the intuition that many complex manipulation tasks, with multiple objects, can be simplified by focusing on less complex pairwise object relations. We propose an object-relation model that learns continuous representations for these pairwise object relations. Our object-relation model is trained completely in simulation, and once learned, is used by a separate precondition model to predict skill preconditions for real world tasks. We evaluate our precondition model on $3$ different manipulation tasks: sweeping, cutting, and unstacking. We show that our approach leads to significant improvements in predicting preconditions for all 3 tasks, across objects of different shapes and sizes. △ Less

Submitted 2 December, 2020; originally announced December 2020.

Comments: CoRL'20

arXiv:2012.00284 [pdf, other]

Visual Identification of Articulated Object Parts

Authors: Vicky Zeng, Tabitha Edith Lee, Jacky Liang, Oliver Kroemer

Abstract: As autonomous robots interact and navigate around real-world environments such as homes, it is useful to reliably identify and manipulate articulated objects, such as doors and cabinets. Many prior works in object articulation identification require manipulation of the object, either by the robot or a human. While recent works have addressed predicting articulation types from visual observations… ▽ More As autonomous robots interact and navigate around real-world environments such as homes, it is useful to reliably identify and manipulate articulated objects, such as doors and cabinets. Many prior works in object articulation identification require manipulation of the object, either by the robot or a human. While recent works have addressed predicting articulation types from visual observations alone, they often assume prior knowledge of category-level kinematic motion models or sequence of observations where the articulated parts are moving according to their kinematic constraints. In this work, we propose FormNet, a neural network that identifies the articulation mechanisms between pairs of object parts from a single frame of an RGB-D image and segmentation masks. The network is trained on 100k synthetic images of 149 articulated objects from 6 categories. Synthetic images are rendered via a photorealistic simulator with domain randomization. Our proposed model predicts motion residual flows of object parts, and these flows are used to determine the articulation type and parameters. The network achieves an articulation type classification accuracy of 82.5% on novel object instances in trained categories. Experiments also show how this method enables generalization to novel categories and be applied to real-world images without fine-tuning. △ Less

Submitted 9 December, 2021; v1 submitted 1 December, 2020; originally announced December 2020.

Comments: Presented at the 2021 IEEE International Conference on Intelligent Robots and Systems (IROS 2021). This version contains the camera-ready paper with an appendix. Project page: https://sites.google.com/view/articulated-objects/home

arXiv:2011.04627 [pdf, other]

Learning to Compose Hierarchical Object-Centric Controllers for Robotic Manipulation

Authors: Mohit Sharma, Jacky Liang, Jialiang Zhao, Alex LaGrassa, Oliver Kroemer

Abstract: Manipulation tasks can often be decomposed into multiple subtasks performed in parallel, e.g., sliding an object to a goal pose while maintaining contact with a table. Individual subtasks can be achieved by task-axis controllers defined relative to the objects being manipulated, and a set of object-centric controllers can be combined in an hierarchy. In prior works, such combinations are defined m… ▽ More Manipulation tasks can often be decomposed into multiple subtasks performed in parallel, e.g., sliding an object to a goal pose while maintaining contact with a table. Individual subtasks can be achieved by task-axis controllers defined relative to the objects being manipulated, and a set of object-centric controllers can be combined in an hierarchy. In prior works, such combinations are defined manually or learned from demonstrations. By contrast, we propose using reinforcement learning to dynamically compose hierarchical object-centric controllers for manipulation tasks. Experiments in both simulation and real world show how the proposed approach leads to improved sample efficiency, zero-shot generalization to novel test environments, and simulation-to-reality transfer without fine-tuning. △ Less

Submitted 13 November, 2020; v1 submitted 9 November, 2020; originally announced November 2020.

Comments: Accepted as Plenary Talk at CoRL'20. First two authors contributed equally. For results see https://sites.google.com/view/compositional-object-control/

arXiv:2011.03142 [pdf, other]

Contact Localization for Robot Arms in Motion without Torque Sensing

Authors: Jacky Liang, Oliver Kroemer

Abstract: Detecting and localizing contacts is essential for robot manipulators to perform contact-rich tasks in unstructured environments. While robot skins can localize contacts on the surface of robot arms, these sensors are not yet robust or easily accessible. As such, prior works have explored using proprioceptive observations, such as joint velocities and torques, to perform contact localization. Many… ▽ More Detecting and localizing contacts is essential for robot manipulators to perform contact-rich tasks in unstructured environments. While robot skins can localize contacts on the surface of robot arms, these sensors are not yet robust or easily accessible. As such, prior works have explored using proprioceptive observations, such as joint velocities and torques, to perform contact localization. Many past approaches assume the robot is static during contact incident, a single contact is made at a time, or having access to accurate dynamics models and joint torque sensing. In this work, we relax these assumptions and propose using Domain Randomization to train a neural network to localize contacts of robot arms in motion without joint torque observations. Our method uses a novel cylindrical projection encoding of the robot arm surface, which allows the network to use convolution layers to process input features and transposed convolution layers to predict contacts. The trained network achieves a contact detection accuracy of 91.5% and a mean contact localization error of 3.0cm. We further demonstrate an application of the contact localization model in an obstacle mapping task, evaluated in both simulation and the real world. △ Less

Submitted 25 March, 2021; v1 submitted 5 November, 2020; originally announced November 2020.

Comments: International Conference on Robotics and Automation (ICRA). May 2021

arXiv:2011.02462 [pdf, other]

Towards Robotic Assembly by Predicting Robust, Precise and Task-oriented Grasps

Authors: Jialiang Zhao, Daniel Troniak, Oliver Kroemer

Abstract: Robust task-oriented grasp planning is vital for autonomous robotic precision assembly tasks. Knowledge of the objects' geometry and preconditions of the target task should be incorporated when determining the proper grasp to execute. However, several factors contribute to the challenges of realizing these grasps such as noise when controlling the robot, unknown object properties, and difficulties… ▽ More Robust task-oriented grasp planning is vital for autonomous robotic precision assembly tasks. Knowledge of the objects' geometry and preconditions of the target task should be incorporated when determining the proper grasp to execute. However, several factors contribute to the challenges of realizing these grasps such as noise when controlling the robot, unknown object properties, and difficulties modeling complex object-object interactions. We propose a method that decomposes this problem and optimizes for grasp robustness, precision, and task performance by learning three cascaded networks. We evaluate our method in simulation on three common assembly tasks: inserting gears onto pegs, aligning brackets into corners, and inserting shapes into slots. Our policies are trained using a curriculum based on large-scale self-supervised grasp simulations with procedurally generated objects. Finally, we evaluate the performance of the first two tasks with a real robot where our method achieves 4.28mm error for bracket insertion and 1.44mm error for gear insertion. △ Less

Submitted 4 November, 2020; originally announced November 2020.

Comments: Submitted and accepted to 4th Conference on Robot Learning (CoRL2020)

arXiv:2011.02398 [pdf, other]

A Modular Robotic Arm Control Stack for Research: Franka-Interface and FrankaPy

Authors: Kevin Zhang, Mohit Sharma, Jacky Liang, Oliver Kroemer

Abstract: We designed a modular robotic control stack that provides a customizable and accessible interface to the Franka Emika Panda Research robot. This framework abstracts high-level robot control commands as skills, which are decomposed into combinations of trajectory generators, feedback controllers, and termination handlers. Low-level control is implemented in C++ and runs at $1$kHz, and high-level co… ▽ More We designed a modular robotic control stack that provides a customizable and accessible interface to the Franka Emika Panda Research robot. This framework abstracts high-level robot control commands as skills, which are decomposed into combinations of trajectory generators, feedback controllers, and termination handlers. Low-level control is implemented in C++ and runs at $1$kHz, and high-level commands are exposed in Python. In addition, external sensor feedback, like estimated object poses, can be streamed to the low-level controllers in real time. This modular approach allows us to quickly prototype new control methods, which is essential for research applications. We have applied this framework across a variety of real-world robot tasks in more than $5$ published research papers. The framework is currently shared internally with other robotics labs at Carnegie Mellon University, and we plan for a public release in the near future. △ Less

Submitted 4 November, 2020; originally announced November 2020.

Comments: 7 pages, 4 figures

arXiv:2009.13732 [pdf, other]

Learning Skills to Patch Plans Based on Inaccurate Models

Authors: Alex LaGrassa, Steven Lee, Oliver Kroemer

Abstract: Planners using accurate models can be effective for accomplishing manipulation tasks in the real world, but are typically highly specialized and require significant fine-tuning to be reliable. Meanwhile, learning is useful for adaptation, but can require a substantial amount of data collection. In this paper, we propose a method that improves the efficiency of sub-optimal planners with approximate… ▽ More Planners using accurate models can be effective for accomplishing manipulation tasks in the real world, but are typically highly specialized and require significant fine-tuning to be reliable. Meanwhile, learning is useful for adaptation, but can require a substantial amount of data collection. In this paper, we propose a method that improves the efficiency of sub-optimal planners with approximate but simple and fast models by switching to a model-free policy when unexpected transitions are observed. Unlike previous work, our method specifically addresses when the planner fails due to transition model error by patching with a local policy only where needed. First, we use a sub-optimal model-based planner to perform a task until model failure is detected. Next, we learn a local model-free policy from expert demonstrations to complete the task in regions where the model failed. To show the efficacy of our method, we perform experiments with a shape insertion puzzle and compare our results to both pure planning and imitation learning approaches. We then apply our method to a door opening task. Our experiments demonstrate that our patch-enhanced planner performs more reliably than pure planning and with lower overall sample complexity than pure imitation learning. △ Less

Submitted 28 September, 2020; originally announced September 2020.

Comments: 8 pages, 10 figures, accepted to Intelligent Robots and Systems (IROS) 2020

Journal ref: International Conference on Intelligent Robots and Systems (IROS). 9441-9448. Sept. 2020

arXiv:2006.01952 [pdf, other]

Learning Active Task-Oriented Exploration Policies for Bridging the Sim-to-Real Gap

Authors: Jacky Liang, Saumya Saxena, Oliver Kroemer

Abstract: Training robotic policies in simulation suffers from the sim-to-real gap, as simulated dynamics can be different from real-world dynamics. Past works tackled this problem through domain randomization and online system-identification. The former is sensitive to the manually-specified training distribution of dynamics parameters and can result in behaviors that are overly conservative. The latter re… ▽ More Training robotic policies in simulation suffers from the sim-to-real gap, as simulated dynamics can be different from real-world dynamics. Past works tackled this problem through domain randomization and online system-identification. The former is sensitive to the manually-specified training distribution of dynamics parameters and can result in behaviors that are overly conservative. The latter requires learning policies that concurrently perform the task and generate useful trajectories for system identification. In this work, we propose and analyze a framework for learning exploration policies that explicitly perform task-oriented exploration actions to identify task-relevant system parameters. These parameters are then used by model-based trajectory optimization algorithms to perform the task in the real world. We instantiate the framework in simulation with the Linear Quadratic Regulator as well as in the real world with pouring and object dragging tasks. Experiments show that task-oriented exploration helps model-based policies adapt to systems with initially unknown parameters, and it leads to better task performance than task-agnostic exploration. △ Less

Submitted 5 November, 2020; v1 submitted 2 June, 2020; originally announced June 2020.

Comments: Published at Robotics: Science and Systems 2020

arXiv:2006.00028 [pdf, other]

doi 10.1109/LRA.2020.2974686

Multi-modal Transfer Learning for Grasping Transparent and Specular Objects

Authors: Thomas Weng, Amith Pallankize, Yimin Tang, Oliver Kroemer, David Held

Abstract: State-of-the-art object grasping methods rely on depth sensing to plan robust grasps, but commercially available depth sensors fail to detect transparent and specular objects. To improve grasping performance on such objects, we introduce a method for learning a multi-modal perception model by bootstrapping from an existing uni-modal model. This transfer learning approach requires only a pre-existi… ▽ More State-of-the-art object grasping methods rely on depth sensing to plan robust grasps, but commercially available depth sensors fail to detect transparent and specular objects. To improve grasping performance on such objects, we introduce a method for learning a multi-modal perception model by bootstrapping from an existing uni-modal model. This transfer learning approach requires only a pre-existing uni-modal grasping model and paired multi-modal image data for training, foregoing the need for ground-truth grasp success labels nor real grasp attempts. Our experiments demonstrate that our approach is able to reliably grasp transparent and reflective objects. Video and supplementary material are available at https://sites.google.com/view/transparent-specular-grasping. △ Less

Submitted 29 May, 2020; originally announced June 2020.

Comments: RA-L with presentation at ICRA 2020

Journal ref: IEEE ROBOTICS AND AUTOMATION LETTERS, VOL. 5, NO. 3, JULY 2020. 3791-3798

arXiv:2002.12160 [pdf, other]

In-Hand Object Pose Tracking via Contact Feedback and GPU-Accelerated Robotic Simulation

Authors: Jacky Liang, Ankur Handa, Karl Van Wyk, Viktor Makoviychuk, Oliver Kroemer, Dieter Fox

Abstract: Tracking the pose of an object while it is being held and manipulated by a robot hand is difficult for vision-based methods due to significant occlusions. Prior works have explored using contact feedback and particle filters to localize in-hand objects. However, they have mostly focused on the static grasp setting and not when the object is in motion, as doing so requires modeling of complex conta… ▽ More Tracking the pose of an object while it is being held and manipulated by a robot hand is difficult for vision-based methods due to significant occlusions. Prior works have explored using contact feedback and particle filters to localize in-hand objects. However, they have mostly focused on the static grasp setting and not when the object is in motion, as doing so requires modeling of complex contact dynamics. In this work, we propose using GPU-accelerated parallel robot simulations and derivative-free, sample-based optimizers to track in-hand object poses with contact feedback during manipulation. We use physics simulation as the forward model for robot-object interactions, and the algorithm jointly optimizes for the state and the parameters of the simulations, so they better match with those of the real world. Our method runs in real-time (30Hz) on a single GPU, and it achieves an average point cloud distance error of 6mm in simulation experiments and 13mm in the real-world ones. View experiment videos at https://sites.google.com/view/in-hand-object-pose-tracking/ △ Less

Submitted 5 November, 2020; v1 submitted 27 February, 2020; originally announced February 2020.

Comments: Accepted to the International Conference on Robotics and Automation (ICRA) 2020

arXiv:1911.09231 [pdf, other]

Camera-to-Robot Pose Estimation from a Single Image

Authors: Timothy E. Lee, Jonathan Tremblay, Thang To, Jia Cheng, Terry Mosier, Oliver Kroemer, Dieter Fox, Stan Birchfield

Abstract: We present an approach for estimating the pose of an external camera with respect to a robot using a single RGB image of the robot. The image is processed by a deep neural network to detect 2D projections of keypoints (such as joints) associated with the robot. The network is trained entirely on simulated data using domain randomization to bridge the reality gap. Perspective-n-point (PnP) is then… ▽ More We present an approach for estimating the pose of an external camera with respect to a robot using a single RGB image of the robot. The image is processed by a deep neural network to detect 2D projections of keypoints (such as joints) associated with the robot. The network is trained entirely on simulated data using domain randomization to bridge the reality gap. Perspective-n-point (PnP) is then used to recover the camera extrinsics, assuming that the camera intrinsics and joint configuration of the robot manipulator are known. Unlike classic hand-eye calibration systems, our method does not require an off-line calibration step. Rather, it is capable of computing the camera extrinsics from a single frame, thus opening the possibility of on-line calibration. We show experimental results for three different robots and camera sensors, demonstrating that our approach is able to achieve accuracy with a single frame that is comparable to that of classic off-line hand-eye calibration using multiple frames. With additional frames from a static pose, accuracy improves even further. Code, datasets, and pretrained models for three widely-used robot manipulators are made available. △ Less

Submitted 23 April, 2020; v1 submitted 20 November, 2019; originally announced November 2019.

Comments: ICRA 2020. Project page is at https://research.nvidia.com/publication/2020-03_DREAM

arXiv:1909.12460 [pdf, other]

Leveraging Multimodal Haptic Sensory Data for Robust Cutting

Authors: Kevin Zhang, Mohit Sharma, Manuela Veloso, Oliver Kroemer

Abstract: Cutting is a common form of manipulation when working with divisible objects such as food, rope, or clay. Cooking in particular relies heavily on cutting to divide food items into desired shapes. However, cutting food is a challenging task due to the wide range of material properties exhibited by food items. Due to this variability, the same cutting motions cannot be used for all food items. Sensa… ▽ More Cutting is a common form of manipulation when working with divisible objects such as food, rope, or clay. Cooking in particular relies heavily on cutting to divide food items into desired shapes. However, cutting food is a challenging task due to the wide range of material properties exhibited by food items. Due to this variability, the same cutting motions cannot be used for all food items. Sensations from contact events, e.g., when placing the knife on the food item, will also vary depending on the material properties, and the robot will need to adapt accordingly. In this paper, we propose using vibrations and force-torque feedback from the interactions to adapt the slicing motions and monitor for contact events. The robot learns neural networks for performing each of these tasks and generalizing across different material properties. By adapting and monitoring the skill executions, the robot is able to reliably cut through more than 20 different types of food items and even detect whether certain food items are fresh or old. △ Less

Submitted 26 September, 2019; originally announced September 2019.

Comments: Accepted as conference paper at Humanoids'19

arXiv:1909.02129 [pdf, other]

Towards Precise Robotic Grasping by Probabilistic Post-grasp Displacement Estimation

Authors: Jialiang Zhao, Jacky Liang, Oliver Kroemer

Abstract: Precise robotic grasping is important for many industrial applications, such as assembly and palletizing, where the location of the object needs to be controlled and known. However, achieving precise grasps is challenging due to noise in sensing and control, as well as unknown object properties. We propose a method to plan robotic grasps that are both robust and precise by training two convolution… ▽ More Precise robotic grasping is important for many industrial applications, such as assembly and palletizing, where the location of the object needs to be controlled and known. However, achieving precise grasps is challenging due to noise in sensing and control, as well as unknown object properties. We propose a method to plan robotic grasps that are both robust and precise by training two convolutional neural networks - one to predict the robustness of a grasp and another to predict a distribution of post-grasp object displacements. Our networks are trained with depth images in simulation on a dataset of over 1000 industrial parts and were successfully deployed on a real robot without having to be further fine-tuned. The proposed displacement estimator achieves a mean prediction errors of 0.68cm and 3.42deg on novel objects in real world experiments. △ Less

Submitted 4 September, 2019; originally announced September 2019.

Comments: Submitted and accepted to 12th Conference on Field and Service Robotics (FSR 2019)

arXiv:1907.05518 [pdf, other]

Graph-Structured Visual Imitation

Authors: Maximilian Sieb, Zhou Xian, Audrey Huang, Oliver Kroemer, Katerina Fragkiadaki

Abstract: We cast visual imitation as a visual correspondence problem. Our robotic agent is rewarded when its actions result in better matching of relative spatial configurations for corresponding visual entities detected in its workspace and teacher's demonstration. We build upon recent advances in Computer Vision,such as human finger keypoint detectors, object detectors trained on-the-fly with synthetic a… ▽ More We cast visual imitation as a visual correspondence problem. Our robotic agent is rewarded when its actions result in better matching of relative spatial configurations for corresponding visual entities detected in its workspace and teacher's demonstration. We build upon recent advances in Computer Vision,such as human finger keypoint detectors, object detectors trained on-the-fly with synthetic augmentations, and point detectors supervised by viewpoint changes and learn multiple visual entity detectors for each demonstration without human annotations or robot interactions. We empirically show the proposed factorized visual representations of entities and their spatial arrangements drive successful imitation of a variety of manipulation skills within minutes, using a single demonstration and without any environment instrumentation. It is robust to background clutter and can effectively generalize across environment variations between demonstrator and imitator, greatly outperforming unstructured non-factorized full-frame CNN encodings of previous works. △ Less

Submitted 4 March, 2020; v1 submitted 11 July, 2019; originally announced July 2019.

Comments: 8 pages, 3 figures, 1 table

arXiv:1907.03146 [pdf, other]

A Review of Robot Learning for Manipulation: Challenges, Representations, and Algorithms

Authors: Oliver Kroemer, Scott Niekum, George Konidaris

Abstract: A key challenge in intelligent robotics is creating robots that are capable of directly interacting with the world around them to achieve their goals. The last decade has seen substantial growth in research on the problem of robot manipulation, which aims to exploit the increasing availability of affordable robot arms and grippers to create robots capable of directly interacting with the world to… ▽ More A key challenge in intelligent robotics is creating robots that are capable of directly interacting with the world around them to achieve their goals. The last decade has seen substantial growth in research on the problem of robot manipulation, which aims to exploit the increasing availability of affordable robot arms and grippers to create robots capable of directly interacting with the world to achieve their goals. Learning will be central to such autonomous systems, as the real world contains too much variation for a robot to expect to have an accurate model of its environment, the objects in it, or the skills required to manipulate them, in advance. We aim to survey a representative subset of that research which uses machine learning for manipulation. We describe a formalization of the robot manipulation learning problem that synthesizes existing research into a single coherent framework and highlight the many remaining research opportunities and challenges. △ Less

Submitted 6 November, 2020; v1 submitted 6 July, 2019; originally announced July 2019.

arXiv:1904.00303 [pdf, other]

Learning Semantic Embedding Spaces for Slicing Vegetables

Authors: Mohit Sharma, Kevin Zhang, Oliver Kroemer

Abstract: In this work, we present an interaction-based approach to learn semantically rich representations for the task of slicing vegetables. Unlike previous approaches, we focus on object-centric representations and use auxiliary tasks to learn rich representations using a two-step process. First, we use simple auxiliary tasks, such as predicting the thickness of a cut slice, to learn an embedding space… ▽ More In this work, we present an interaction-based approach to learn semantically rich representations for the task of slicing vegetables. Unlike previous approaches, we focus on object-centric representations and use auxiliary tasks to learn rich representations using a two-step process. First, we use simple auxiliary tasks, such as predicting the thickness of a cut slice, to learn an embedding space which captures object properties that are important for the task of slicing vegetables. In the second step, we use these learned latent embeddings to learn a forward model. Learning a forward model affords us to plan online in the latent embedding space and forces our model to improve its representations while performing the slicing task. To show the efficacy of our approach we perform experiments on two different vegetables: cucumbers and tomatoes. Our experimental evaluation shows that our method is able to capture important semantic properties for the slicing task, such as the thickness of the vegetable being cut. We further show that by using our learned forward model, we can plan for the task of vegetable slicing. △ Less

Submitted 30 March, 2019; originally announced April 2019.

arXiv:1605.04439 [pdf, other]

Learning Relevant Features for Manipulation Skills using Meta-Level Priors

Authors: Oliver Kroemer, Gaurav S. Sukhatme

Abstract: Robots can generalize manipulation skills between different scenarios by adapting to the features of the objects being manipulated. Selecting the set of relevant features for generalizing skills has usually been performed manually by a human. Alternatively, a robot can learn to select relevant features autonomously. However, feature selection usually requires a large amount of training data, which… ▽ More Robots can generalize manipulation skills between different scenarios by adapting to the features of the objects being manipulated. Selecting the set of relevant features for generalizing skills has usually been performed manually by a human. Alternatively, a robot can learn to select relevant features autonomously. However, feature selection usually requires a large amount of training data, which would require many demonstrations. In order to learn the relevant features more efficiently, we propose using a meta-level prior to transfer the relevance of features from previously learned skills. The experiments show that the meta-level prior more than doubles the average precision and recall of the feature selection when compared to a standard uniform prior. The proposed approach was used to learn a variety of manipulation skills, including pushing, cutting, and pouring. △ Less

Submitted 14 May, 2016; originally announced May 2016.

Comments: 13 pages, 13 figures

Showing 1–45 of 45 results for author: Kroemer, O