-
Dynamics Models in the Aggressive Off-Road Driving Regime
Authors:
Tyler Han,
Sidharth Talia,
Rohan Panicker,
Preet Shah,
Neel Jawale,
Byron Boots
Abstract:
Current developments in autonomous off-road driving are steadily increasing performance through higher speeds and more challenging, unstructured environments. However, this operating regime subjects the vehicle to larger inertial effects, where consideration of higher-order states is necessary to avoid failures such as rollovers or excessive impact forces. Aggressive driving through Model Predicti…
▽ More
Current developments in autonomous off-road driving are steadily increasing performance through higher speeds and more challenging, unstructured environments. However, this operating regime subjects the vehicle to larger inertial effects, where consideration of higher-order states is necessary to avoid failures such as rollovers or excessive impact forces. Aggressive driving through Model Predictive Control (MPC) in these conditions requires dynamics models that accurately predict safety-critical information. This work aims to empirically quantify this aggressive operating regime and its effects on the performance of current models. We evaluate three dynamics models of varying complexity on two distinct off-road driving datasets: one simulated and the other real-world. By conditioning trajectory data on higher-order states, we show that model accuracy degrades with aggressiveness and simpler models degrade faster. These models are also validated across datasets, where accuracies over safety-critical states are reported and provide benchmarks for future work.
△ Less
Submitted 26 May, 2024;
originally announced May 2024.
-
LocoMan: Advancing Versatile Quadrupedal Dexterity with Lightweight Loco-Manipulators
Authors:
Changyi Lin,
Xingyu Liu,
Yuxiang Yang,
Yaru Niu,
Wenhao Yu,
Tingnan Zhang,
Jie Tan,
Byron Boots,
Ding Zhao
Abstract:
Quadrupedal robots have emerged as versatile agents capable of locomoting and manipulating in complex environments. Traditional designs typically rely on the robot's inherent body parts or incorporate top-mounted arms for manipulation tasks. However, these configurations may limit the robot's operational dexterity, efficiency and adaptability, particularly in cluttered or constrained spaces. In th…
▽ More
Quadrupedal robots have emerged as versatile agents capable of locomoting and manipulating in complex environments. Traditional designs typically rely on the robot's inherent body parts or incorporate top-mounted arms for manipulation tasks. However, these configurations may limit the robot's operational dexterity, efficiency and adaptability, particularly in cluttered or constrained spaces. In this work, we present LocoMan, a dexterous quadrupedal robot with a novel morphology to perform versatile manipulation in diverse constrained environments. By equipping a Unitree Go1 robot with two low-cost and lightweight modular 3-DoF loco-manipulators on its front calves, LocoMan leverages the combined mobility and functionality of the legs and grippers for complex manipulation tasks that require precise 6D positioning of the end effector in a wide workspace. To harness the loco-manipulation capabilities of LocoMan, we introduce a unified control framework that extends the whole-body controller (WBC) to integrate the dynamics of loco-manipulators. Through experiments, we validate that the proposed whole-body controller can accurately and stably follow desired 6D trajectories of the end effector and torso, which, when combined with the large workspace from our design, facilitates a diverse set of challenging dexterous loco-manipulation tasks in confined spaces, such as opening doors, plugging into sockets, picking objects in narrow and low-lying spaces, and bimanual manipulation.
△ Less
Submitted 26 March, 2024;
originally announced March 2024.
-
Multi-Sample Long Range Path Planning under Sensing Uncertainty for Off-Road Autonomous Driving
Authors:
Matt Schmittle,
Rohan Baijal,
Brian Hou,
Siddhartha Srinivasa,
Byron Boots
Abstract:
We focus on the problem of long-range dynamic replanning for off-road autonomous vehicles, where a robot plans paths through a previously unobserved environment while continuously receiving noisy local observations. An effective approach for planning under sensing uncertainty is determinization, where one converts a stochastic world into a deterministic one and plans under this simplification. Thi…
▽ More
We focus on the problem of long-range dynamic replanning for off-road autonomous vehicles, where a robot plans paths through a previously unobserved environment while continuously receiving noisy local observations. An effective approach for planning under sensing uncertainty is determinization, where one converts a stochastic world into a deterministic one and plans under this simplification. This makes the planning problem tractable, but the cost of following the planned path in the real world may be different than in the determinized world. This causes collisions if the determinized world optimistically ignores obstacles, or causes unnecessarily long routes if the determinized world pessimistically imagines more obstacles. We aim to be robust to uncertainty over potential worlds while still achieving the efficiency benefits of determinization. We evaluate algorithms for dynamic replanning on a large real-world dataset of challenging long-range planning problems from the DARPA RACER program. Our method, Dynamic Replanning via Evaluating and Aggregating Multiple Samples (DREAMS), outperforms other determinization-based approaches in terms of combined traversal time and collision cost. https://sites.google.com/cs.washington.edu/dreams/
△ Less
Submitted 17 March, 2024;
originally announced March 2024.
-
V-STRONG: Visual Self-Supervised Traversability Learning for Off-road Navigation
Authors:
Sanghun Jung,
JoonHo Lee,
Xiangyun Meng,
Byron Boots,
Alexander Lambert
Abstract:
Reliable estimation of terrain traversability is critical for the successful deployment of autonomous systems in wild, outdoor environments. Given the lack of large-scale annotated datasets for off-road navigation, strictly-supervised learning approaches remain limited in their generalization ability. To this end, we introduce a novel, image-based self-supervised learning method for traversability…
▽ More
Reliable estimation of terrain traversability is critical for the successful deployment of autonomous systems in wild, outdoor environments. Given the lack of large-scale annotated datasets for off-road navigation, strictly-supervised learning approaches remain limited in their generalization ability. To this end, we introduce a novel, image-based self-supervised learning method for traversability prediction, leveraging a state-of-the-art vision foundation model for improved out-of-distribution performance. Our method employs contrastive representation learning using both human driving data and instance-based segmentation masks during training. We show that this simple, yet effective, technique drastically outperforms recent methods in predicting traversability for both on- and off-trail driving scenarios. We compare our method with recent baselines on both a common benchmark as well as our own datasets, covering a diverse range of outdoor environments and varied terrain types. We also demonstrate the compatibility of resulting costmap predictions with a model-predictive controller. Finally, we evaluate our approach on zero- and few-shot tasks, demonstrating unprecedented performance for generalization to new environments. Videos and additional material can be found here: https://sites.google.com/view/visual-traversability-learning.
△ Less
Submitted 15 March, 2024; v1 submitted 26 December, 2023;
originally announced December 2023.
-
Model Predictive Control for Aggressive Driving Over Uneven Terrain
Authors:
Tyler Han,
Alex Liu,
Anqi Li,
Alex Spitzer,
Guanya Shi,
Byron Boots
Abstract:
Terrain traversability in unstructured off-road autonomy has traditionally relied on semantic classification, resource-intensive dynamics models, or purely geometry-based methods to predict vehicle-terrain interactions. While inconsequential at low speeds, uneven terrain subjects our full-scale system to safety-critical challenges at operating speeds of 7--10 m/s. This study focuses particularly o…
▽ More
Terrain traversability in unstructured off-road autonomy has traditionally relied on semantic classification, resource-intensive dynamics models, or purely geometry-based methods to predict vehicle-terrain interactions. While inconsequential at low speeds, uneven terrain subjects our full-scale system to safety-critical challenges at operating speeds of 7--10 m/s. This study focuses particularly on uneven terrain such as hills, banks, and ditches. These common high-risk geometries are capable of disabling the vehicle and causing severe passenger injuries if poorly traversed. We introduce a physics-based framework for identifying traversability constraints on terrain dynamics. Using this framework, we derive two fundamental constraints, each with a focus on mitigating rollover and ditch-crossing failures while being fully parallelizable in the sample-based Model Predictive Control (MPC) framework. In addition, we present the design of our planning and control system, which implements our parallelized constraints in MPC and utilizes a low-level controller to meet the demands of our aggressive driving without prior information about the environment and its dynamics. Through real-world experimentation and traversal of hills and ditches, we demonstrate that our approach captures fundamental elements of safe and aggressive autonomy over uneven terrain. Our approach improves upon geometry-based methods by completing comprehensive off-road courses up to 22% faster while maintaining safe operation.
△ Less
Submitted 7 June, 2024; v1 submitted 20 November, 2023;
originally announced November 2023.
-
DATT: Deep Adaptive Trajectory Tracking for Quadrotor Control
Authors:
Kevin Huang,
Rwik Rana,
Alexander Spitzer,
Guanya Shi,
Byron Boots
Abstract:
Precise arbitrary trajectory tracking for quadrotors is challenging due to unknown nonlinear dynamics, trajectory infeasibility, and actuation limits. To tackle these challenges, we present Deep Adaptive Trajectory Tracking (DATT), a learning-based approach that can precisely track arbitrary, potentially infeasible trajectories in the presence of large disturbances in the real world. DATT builds o…
▽ More
Precise arbitrary trajectory tracking for quadrotors is challenging due to unknown nonlinear dynamics, trajectory infeasibility, and actuation limits. To tackle these challenges, we present Deep Adaptive Trajectory Tracking (DATT), a learning-based approach that can precisely track arbitrary, potentially infeasible trajectories in the presence of large disturbances in the real world. DATT builds on a novel feedforward-feedback-adaptive control structure trained in simulation using reinforcement learning. When deployed on real hardware, DATT is augmented with a disturbance estimator using L1 adaptive control in closed-loop, without any fine-tuning. DATT significantly outperforms competitive adaptive nonlinear and model predictive controllers for both feasible smooth and infeasible trajectories in unsteady wind fields, including challenging scenarios where baselines completely fail. Moreover, DATT can efficiently run online with an inference time less than 3.2 ms, less than 1/4 of the adaptive nonlinear model predictive control baseline
△ Less
Submitted 13 December, 2023; v1 submitted 13 October, 2023;
originally announced October 2023.
-
Deep Model Predictive Optimization
Authors:
Jacob Sacks,
Rwik Rana,
Kevin Huang,
Alex Spitzer,
Guanya Shi,
Byron Boots
Abstract:
A major challenge in robotics is to design robust policies which enable complex and agile behaviors in the real world. On one end of the spectrum, we have model-free reinforcement learning (MFRL), which is incredibly flexible and general but often results in brittle policies. In contrast, model predictive control (MPC) continually re-plans at each time step to remain robust to perturbations and mo…
▽ More
A major challenge in robotics is to design robust policies which enable complex and agile behaviors in the real world. On one end of the spectrum, we have model-free reinforcement learning (MFRL), which is incredibly flexible and general but often results in brittle policies. In contrast, model predictive control (MPC) continually re-plans at each time step to remain robust to perturbations and model inaccuracies. However, despite its real-world successes, MPC often under-performs the optimal strategy. This is due to model quality, myopic behavior from short planning horizons, and approximations due to computational constraints. And even with a perfect model and enough compute, MPC can get stuck in bad local optima, depending heavily on the quality of the optimization algorithm. To this end, we propose Deep Model Predictive Optimization (DMPO), which learns the inner-loop of an MPC optimization algorithm directly via experience, specifically tailored to the needs of the control problem. We evaluate DMPO on a real quadrotor agile trajectory tracking task, on which it improves performance over a baseline MPC algorithm for a given computational budget. It can outperform the best MPC algorithm by up to 27% with fewer samples and an end-to-end policy trained with MFRL by 19%. Moreover, because DMPO requires fewer samples, it can also achieve these benefits with 4.3X less memory. When we subject the quadrotor to turbulent wind fields with an attached drag plate, DMPO can adapt zero-shot while still outperforming all baselines. Additional results can be found at https://tinyurl.com/mr2ywmnw.
△ Less
Submitted 6 October, 2023;
originally announced October 2023.
-
Perceiving Extrinsic Contacts from Touch Improves Learning Insertion Policies
Authors:
Carolina Higuera,
Joseph Ortiz,
Haozhi Qi,
Luis Pineda,
Byron Boots,
Mustafa Mukadam
Abstract:
Robotic manipulation tasks such as object insertion typically involve interactions between object and environment, namely extrinsic contacts. Prior work on Neural Contact Fields (NCF) use intrinsic tactile sensing between gripper and object to estimate extrinsic contacts in simulation. However, its effectiveness and utility in real-world tasks remains unknown.
In this work, we improve NCF to ena…
▽ More
Robotic manipulation tasks such as object insertion typically involve interactions between object and environment, namely extrinsic contacts. Prior work on Neural Contact Fields (NCF) use intrinsic tactile sensing between gripper and object to estimate extrinsic contacts in simulation. However, its effectiveness and utility in real-world tasks remains unknown.
In this work, we improve NCF to enable sim-to-real transfer and use it to train policies for mug-in-cupholder and bowl-in-dishrack insertion tasks. We find our model NCF-v2, is capable of estimating extrinsic contacts in the real-world. Furthermore, our insertion policy with NCF-v2 outperforms policies without it, achieving 33% higher success and 1.36x faster execution on mug-in-cupholder, and 13% higher success and 1.27x faster execution on bowl-in-dishrack.
△ Less
Submitted 28 September, 2023;
originally announced September 2023.
-
LiDAR-UDA: Self-ensembling Through Time for Unsupervised LiDAR Domain Adaptation
Authors:
Amirreza Shaban,
JoonHo Lee,
Sanghun Jung,
Xiangyun Meng,
Byron Boots
Abstract:
We introduce LiDAR-UDA, a novel two-stage self-training-based Unsupervised Domain Adaptation (UDA) method for LiDAR segmentation. Existing self-training methods use a model trained on labeled source data to generate pseudo labels for target data and refine the predictions via fine-tuning the network on the pseudo labels. These methods suffer from domain shifts caused by different LiDAR sensor conf…
▽ More
We introduce LiDAR-UDA, a novel two-stage self-training-based Unsupervised Domain Adaptation (UDA) method for LiDAR segmentation. Existing self-training methods use a model trained on labeled source data to generate pseudo labels for target data and refine the predictions via fine-tuning the network on the pseudo labels. These methods suffer from domain shifts caused by different LiDAR sensor configurations in the source and target domains. We propose two techniques to reduce sensor discrepancy and improve pseudo label quality: 1) LiDAR beam subsampling, which simulates different LiDAR scanning patterns by randomly dropping beams; 2) cross-frame ensembling, which exploits temporal consistency of consecutive frames to generate more reliable pseudo labels. Our method is simple, generalizable, and does not incur any extra inference cost. We evaluate our method on several public LiDAR datasets and show that it outperforms the state-of-the-art methods by more than $3.9\%$ mIoU on average for all scenarios. Code will be available at https://github.com/JHLee0513/LiDARUDA.
△ Less
Submitted 23 September, 2023;
originally announced September 2023.
-
CAJun: Continuous Adaptive Jumping using a Learned Centroidal Controller
Authors:
Yuxiang Yang,
Guanya Shi,
Xiangyun Meng,
Wenhao Yu,
Tingnan Zhang,
Jie Tan,
Byron Boots
Abstract:
We present CAJun, a novel hierarchical learning and control framework that enables legged robots to jump continuously with adaptive jumping distances. CAJun consists of a high-level centroidal policy and a low-level leg controller. In particular, we use reinforcement learning (RL) to train the centroidal policy, which specifies the gait timing, base velocity, and swing foot position for the leg co…
▽ More
We present CAJun, a novel hierarchical learning and control framework that enables legged robots to jump continuously with adaptive jumping distances. CAJun consists of a high-level centroidal policy and a low-level leg controller. In particular, we use reinforcement learning (RL) to train the centroidal policy, which specifies the gait timing, base velocity, and swing foot position for the leg controller. The leg controller optimizes motor commands for the swing and stance legs according to the gait timing to track the swing foot target and base velocity commands using optimal control. Additionally, we reformulate the stance leg optimizer in the leg controller to speed up policy training by an order of magnitude. Our system combines the versatility of learning with the robustness of optimal control. By combining RL with optimal control methods, our system achieves the versatility of learning while enjoys the robustness from control methods, making it easily transferable to real robots. We show that after 20 minutes of training on a single GPU, CAJun can achieve continuous, long jumps with adaptive distances on a Go1 robot with small sim-to-real gaps. Moreover, the robot can jump across gaps with a maximum width of 70cm, which is over 40% wider than existing methods.
△ Less
Submitted 27 October, 2023; v1 submitted 15 June, 2023;
originally announced June 2023.
-
Stackelberg Games for Learning Emergent Behaviors During Competitive Autocurricula
Authors:
Boling Yang,
Liyuan Zheng,
Lillian J. Ratliff,
Byron Boots,
Joshua R. Smith
Abstract:
Autocurricular training is an important sub-area of multi-agent reinforcement learning~(MARL) that allows multiple agents to learn emergent skills in an unsupervised co-evolving scheme. The robotics community has experimented autocurricular training with physically grounded problems, such as robust control and interactive manipulation tasks. However, the asymmetric nature of these tasks makes the…
▽ More
Autocurricular training is an important sub-area of multi-agent reinforcement learning~(MARL) that allows multiple agents to learn emergent skills in an unsupervised co-evolving scheme. The robotics community has experimented autocurricular training with physically grounded problems, such as robust control and interactive manipulation tasks. However, the asymmetric nature of these tasks makes the generation of sophisticated policies challenging. Indeed, the asymmetry in the environment may implicitly or explicitly provide an advantage to a subset of agents which could, in turn, lead to a low-quality equilibrium. This paper proposes a novel game-theoretic algorithm, Stackelberg Multi-Agent Deep Deterministic Policy Gradient (ST-MADDPG), which formulates a two-player MARL problem as a Stackelberg game with one player as the `leader' and the other as the `follower' in a hierarchical interaction structure wherein the leader has an advantage. We first demonstrate that the leader's advantage from ST-MADDPG can be used to alleviate the inherent asymmetry in the environment. By exploiting the leader's advantage, ST-MADDPG improves the quality of a co-evolution process and results in more sophisticated and complex strategies that work well even against an unseen strong opponent.
△ Less
Submitted 4 May, 2023;
originally announced May 2023.
-
Continuous Versatile Jumping Using Learned Action Residuals
Authors:
Yuxiang Yang,
Xiangyun Meng,
Wenhao Yu,
Tingnan Zhang,
Jie Tan,
Byron Boots
Abstract:
Jumping is essential for legged robots to traverse through difficult terrains. In this work, we propose a hierarchical framework that combines optimal control and reinforcement learning to learn continuous jumping motions for quadrupedal robots. The core of our framework is a stance controller, which combines a manually designed acceleration controller with a learned residual policy. As the accele…
▽ More
Jumping is essential for legged robots to traverse through difficult terrains. In this work, we propose a hierarchical framework that combines optimal control and reinforcement learning to learn continuous jumping motions for quadrupedal robots. The core of our framework is a stance controller, which combines a manually designed acceleration controller with a learned residual policy. As the acceleration controller warm starts policy for efficient training, the trained policy overcomes the limitation of the acceleration controller and improves the jumping stability. In addition, a low-level whole-body controller converts the body pose command from the stance controller to motor commands. After training in simulation, our framework can be deployed directly to the real robot, and perform versatile, continuous jumping motions, including omni-directional jumps at up to 50cm high, 60cm forward, and jump-turning at up to 90 degrees. Please visit our website for more results: https://sites.google.com/view/learning-to-jump.
△ Less
Submitted 17 April, 2023;
originally announced April 2023.
-
Learning to Read Braille: Bridging the Tactile Reality Gap with Diffusion Models
Authors:
Carolina Higuera,
Byron Boots,
Mustafa Mukadam
Abstract:
Simulating vision-based tactile sensors enables learning models for contact-rich tasks when collecting real world data at scale can be prohibitive. However, modeling the optical response of the gel deformation as well as incorporating the dynamics of the contact makes sim2real challenging. Prior works have explored data augmentation, fine-tuning, or learning generative models to reduce the sim2rea…
▽ More
Simulating vision-based tactile sensors enables learning models for contact-rich tasks when collecting real world data at scale can be prohibitive. However, modeling the optical response of the gel deformation as well as incorporating the dynamics of the contact makes sim2real challenging. Prior works have explored data augmentation, fine-tuning, or learning generative models to reduce the sim2real gap. In this work, we present the first method to leverage probabilistic diffusion models for capturing complex illumination changes from gel deformations. Our tactile diffusion model is able to generate realistic tactile images from simulated contact depth bridging the reality gap for vision-based tactile sensing. On real braille reading task with a DIGIT sensor, a classifier trained with our diffusion model achieves 75.74% accuracy outperforming classifiers trained with simulation and other approaches. Project page: https://github.com/carolinahiguera/Tactile-Diffusion
△ Less
Submitted 3 April, 2023;
originally announced April 2023.
-
MAHALO: Unifying Offline Reinforcement Learning and Imitation Learning from Observations
Authors:
Anqi Li,
Byron Boots,
Ching-An Cheng
Abstract:
We study a new paradigm for sequential decision making, called offline policy learning from observations (PLfO). Offline PLfO aims to learn policies using datasets with substandard qualities: 1) only a subset of trajectories is labeled with rewards, 2) labeled trajectories may not contain actions, 3) labeled trajectories may not be of high quality, and 4) the data may not have full coverage. Such…
▽ More
We study a new paradigm for sequential decision making, called offline policy learning from observations (PLfO). Offline PLfO aims to learn policies using datasets with substandard qualities: 1) only a subset of trajectories is labeled with rewards, 2) labeled trajectories may not contain actions, 3) labeled trajectories may not be of high quality, and 4) the data may not have full coverage. Such imperfection is common in real-world learning scenarios, and offline PLfO encompasses many existing offline learning setups, including offline imitation learning (IL), offline IL from observations (ILfO), and offline reinforcement learning (RL). In this work, we present a generic approach to offline PLfO, called $\textbf{M}$odality-agnostic $\textbf{A}$dversarial $\textbf{H}$ypothesis $\textbf{A}$daptation for $\textbf{L}$earning from $\textbf{O}$bservations (MAHALO). Built upon the pessimism concept in offline RL, MAHALO optimizes the policy using a performance lower bound that accounts for uncertainty due to the dataset's insufficient coverage. We implement this idea by adversarially training data-consistent critic and reward functions, which forces the learned policy to be robust to data deficiency. We show that MAHALO consistently outperforms or matches specialized algorithms across a variety of offline PLfO tasks in theory and experiments. Our code is available at https://github.com/AnqiLi/mahalo.
△ Less
Submitted 6 August, 2023; v1 submitted 30 March, 2023;
originally announced March 2023.
-
TerrainNet: Visual Modeling of Complex Terrain for High-speed, Off-road Navigation
Authors:
Xiangyun Meng,
Nathan Hatch,
Alexander Lambert,
Anqi Li,
Nolan Wagener,
Matthew Schmittle,
JoonHo Lee,
Wentao Yuan,
Zoey Chen,
Samuel Deng,
Greg Okopal,
Dieter Fox,
Byron Boots,
Amirreza Shaban
Abstract:
Effective use of camera-based vision systems is essential for robust performance in autonomous off-road driving, particularly in the high-speed regime. Despite success in structured, on-road settings, current end-to-end approaches for scene prediction have yet to be successfully adapted for complex outdoor terrain. To this end, we present TerrainNet, a vision-based terrain perception system for se…
▽ More
Effective use of camera-based vision systems is essential for robust performance in autonomous off-road driving, particularly in the high-speed regime. Despite success in structured, on-road settings, current end-to-end approaches for scene prediction have yet to be successfully adapted for complex outdoor terrain. To this end, we present TerrainNet, a vision-based terrain perception system for semantic and geometric terrain prediction for aggressive, off-road navigation. The approach relies on several key insights and practical considerations for achieving reliable terrain modeling. The network includes a multi-headed output representation to capture fine- and coarse-grained terrain features necessary for estimating traversability. Accurate depth estimation is achieved using self-supervised depth completion with multi-view RGB and stereo inputs. Requirements for real-time performance and fast inference speeds are met using efficient, learned image feature projections. Furthermore, the model is trained on a large-scale, real-world off-road dataset collected across a variety of diverse outdoor environments. We show how TerrainNet can also be used for costmap prediction and provide a detailed framework for integration into a planning module. We demonstrate the performance of TerrainNet through extensive comparison to current state-of-the-art baselines for camera-only scene prediction. Finally, we showcase the effectiveness of integrating TerrainNet within a complete autonomous-driving stack by conducting a real-world vehicle test in a challenging off-road scenario.
△ Less
Submitted 29 May, 2023; v1 submitted 28 March, 2023;
originally announced March 2023.
-
Adversarial Model for Offline Reinforcement Learning
Authors:
Mohak Bhardwaj,
Tengyang Xie,
Byron Boots,
Nan Jiang,
Ching-An Cheng
Abstract:
We propose a novel model-based offline Reinforcement Learning (RL) framework, called Adversarial Model for Offline Reinforcement Learning (ARMOR), which can robustly learn policies to improve upon an arbitrary reference policy regardless of data coverage. ARMOR is designed to optimize policies for the worst-case performance relative to the reference policy through adversarially training a Markov d…
▽ More
We propose a novel model-based offline Reinforcement Learning (RL) framework, called Adversarial Model for Offline Reinforcement Learning (ARMOR), which can robustly learn policies to improve upon an arbitrary reference policy regardless of data coverage. ARMOR is designed to optimize policies for the worst-case performance relative to the reference policy through adversarially training a Markov decision process model. In theory, we prove that ARMOR, with a well-tuned hyperparameter, can compete with the best policy within data coverage when the reference policy is supported by the data. At the same time, ARMOR is robust to hyperparameter choices: the policy learned by ARMOR, with "any" admissible hyperparameter, would never degrade the performance of the reference policy, even when the reference policy is not covered by the dataset. To validate these properties in practice, we design a scalable implementation of ARMOR, which by adversarial training, can optimize policies without using model ensembles in contrast to typical model-based methods. We show that ARMOR achieves competent performance with both state-of-the-art offline model-free and model-based RL algorithms and can robustly improve the reference policy over various hyperparameter choices.
△ Less
Submitted 24 December, 2023; v1 submitted 21 February, 2023;
originally announced February 2023.
-
Learning to Optimize in Model Predictive Control
Authors:
Jacob Sacks,
Byron Boots
Abstract:
Sampling-based Model Predictive Control (MPC) is a flexible control framework that can reason about non-smooth dynamics and cost functions. Recently, significant work has focused on the use of machine learning to improve the performance of MPC, often through learning or fine-tuning the dynamics or cost function. In contrast, we focus on learning to optimize more effectively. In other words, to imp…
▽ More
Sampling-based Model Predictive Control (MPC) is a flexible control framework that can reason about non-smooth dynamics and cost functions. Recently, significant work has focused on the use of machine learning to improve the performance of MPC, often through learning or fine-tuning the dynamics or cost function. In contrast, we focus on learning to optimize more effectively. In other words, to improve the update rule within MPC. We show that this can be particularly useful in sampling-based MPC, where we often wish to minimize the number of samples for computational reasons. Unfortunately, the cost of computational efficiency is a reduction in performance; fewer samples results in noisier updates. We show that we can contend with this noise by learning how to update the control distribution more effectively and make better use of the few samples that we have. Our learned controllers are trained via imitation learning to mimic an expert which has access to substantially more samples. We test the efficacy of our approach on multiple simulated robotics tasks in sample-constrained regimes and demonstrate that our approach can outperform a MPC controller with the same number of samples.
△ Less
Submitted 5 December, 2022;
originally announced December 2022.
-
Learning Sampling Distributions for Model Predictive Control
Authors:
Jacob Sacks,
Byron Boots
Abstract:
Sampling-based methods have become a cornerstone of contemporary approaches to Model Predictive Control (MPC), as they make no restrictions on the differentiability of the dynamics or cost function and are straightforward to parallelize. However, their efficacy is highly dependent on the quality of the sampling distribution itself, which is often assumed to be simple, like a Gaussian. This restric…
▽ More
Sampling-based methods have become a cornerstone of contemporary approaches to Model Predictive Control (MPC), as they make no restrictions on the differentiability of the dynamics or cost function and are straightforward to parallelize. However, their efficacy is highly dependent on the quality of the sampling distribution itself, which is often assumed to be simple, like a Gaussian. This restriction can result in samples which are far from optimal, leading to poor performance. Recent work has explored improving the performance of MPC by sampling in a learned latent space of controls. However, these methods ultimately perform all MPC parameter updates and warm-starting between time steps in the control space. This requires us to rely on a number of heuristics for generating samples and updating the distribution and may lead to sub-optimal performance. Instead, we propose to carry out all operations in the latent space, allowing us to take full advantage of the learned distribution. Specifically, we frame the learning problem as bi-level optimization and show how to train the controller with backpropagation-through-time. By using a normalizing flow parameterization of the distribution, we can leverage its tractable density to avoid requiring differentiability of the dynamics and cost function. Finally, we evaluate the proposed approach on simulated robotics tasks and demonstrate its ability to surpass the performance of prior methods and scale better with a reduced number of samples.
△ Less
Submitted 5 December, 2022;
originally announced December 2022.
-
Scalable Measurement Error Mitigation via Iterative Bayesian Unfolding
Authors:
Siddarth Srinivasan,
Bibek Pokharel,
Gregory Quiroz,
Byron Boots
Abstract:
Measurement error mitigation (MEM) techniques are postprocessing strategies to counteract systematic read-out errors on quantum computers (QC). Currently used MEM strategies face a tradeoff: methods that scale well with the number of qubits return negative probabilities, while those that guarantee a valid probability distribution are not scalable. Here, we present a scheme that addresses both of t…
▽ More
Measurement error mitigation (MEM) techniques are postprocessing strategies to counteract systematic read-out errors on quantum computers (QC). Currently used MEM strategies face a tradeoff: methods that scale well with the number of qubits return negative probabilities, while those that guarantee a valid probability distribution are not scalable. Here, we present a scheme that addresses both of these issues. In particular, we present a scalable implementation of iterative Bayesian unfolding, a standard mitigation technique used in high-energy physics experiments. We demonstrate our method by mitigating QC data from experimental preparation of Greenberger-Horne-Zeilinger (GHZ) states up to 127 qubits and implementation of the Bernstein-Vazirani algorithm on up to 26 qubits.
△ Less
Submitted 21 October, 2022;
originally announced October 2022.
-
Motion Policy Networks
Authors:
Adam Fishman,
Adithyavairan Murali,
Clemens Eppner,
Bryan Peele,
Byron Boots,
Dieter Fox
Abstract:
Collision-free motion generation in unknown environments is a core building block for robot manipulation. Generating such motions is challenging due to multiple objectives; not only should the solutions be optimal, the motion generator itself must be fast enough for real-time performance and reliable enough for practical deployment. A wide variety of methods have been proposed ranging from local c…
▽ More
Collision-free motion generation in unknown environments is a core building block for robot manipulation. Generating such motions is challenging due to multiple objectives; not only should the solutions be optimal, the motion generator itself must be fast enough for real-time performance and reliable enough for practical deployment. A wide variety of methods have been proposed ranging from local controllers to global planners, often being combined to offset their shortcomings. We present an end-to-end neural model called Motion Policy Networks (M$π$Nets) to generate collision-free, smooth motion from just a single depth camera observation. M$π$Nets are trained on over 3 million motion planning problems in over 500,000 environments. Our experiments show that M$π$Nets are significantly faster than global planners while exhibiting the reactivity needed to deal with dynamic scenes. They are 46% better than prior neural planners and more robust than local control policies. Despite being only trained in simulation, M$π$Nets transfer well to the real robot with noisy partial point clouds. Code and data are publicly available at https://mpinets.github.io.
△ Less
Submitted 21 October, 2022;
originally announced October 2022.
-
Neural Contact Fields: Tracking Extrinsic Contact with Tactile Sensing
Authors:
Carolina Higuera,
Siyuan Dong,
Byron Boots,
Mustafa Mukadam
Abstract:
We present Neural Contact Fields, a method that brings together neural fields and tactile sensing to address the problem of tracking extrinsic contact between object and environment. Knowing where the external contact occurs is a first step towards methods that can actively control it in facilitating downstream manipulation tasks. Prior work for localizing environmental contacts typically assume a…
▽ More
We present Neural Contact Fields, a method that brings together neural fields and tactile sensing to address the problem of tracking extrinsic contact between object and environment. Knowing where the external contact occurs is a first step towards methods that can actively control it in facilitating downstream manipulation tasks. Prior work for localizing environmental contacts typically assume a contact type (e.g. point or line), does not capture contact/no-contact transitions, and only works with basic geometric-shaped objects. Neural Contact Fields are the first method that can track arbitrary multi-modal extrinsic contacts without making any assumptions about the contact type. Our key insight is to estimate the probability of contact for any 3D point in the latent space of object shapes, given vision-based tactile inputs that sense the local motion resulting from the external contact. In experiments, we find that Neural Contact Fields are able to localize multiple contact patches without making any assumptions about the geometry of the contact, and capture contact/no-contact transitions for known categories of objects with unseen shapes in unseen environment configurations. In addition to Neural Contact Fields, we also release our YCB-Extrinsic-Contact dataset of simulated extrinsic contact interactions to enable further research in this area. Project page: https://github.com/carolinahiguera/NCF
△ Less
Submitted 13 March, 2023; v1 submitted 17 October, 2022;
originally announced October 2022.
-
Learning Semantics-Aware Locomotion Skills from Human Demonstration
Authors:
Yuxiang Yang,
Xiangyun Meng,
Wenhao Yu,
Tingnan Zhang,
Jie Tan,
Byron Boots
Abstract:
The semantics of the environment, such as the terrain type and property, reveals important information for legged robots to adjust their behaviors. In this work, we present a framework that learns semantics-aware locomotion skills from perception for quadrupedal robots, such that the robot can traverse through complex offroad terrains with appropriate speeds and gaits using perception information.…
▽ More
The semantics of the environment, such as the terrain type and property, reveals important information for legged robots to adjust their behaviors. In this work, we present a framework that learns semantics-aware locomotion skills from perception for quadrupedal robots, such that the robot can traverse through complex offroad terrains with appropriate speeds and gaits using perception information. Due to the lack of high-fidelity outdoor simulation, our framework needs to be trained directly in the real world, which brings unique challenges in data efficiency and safety. To ensure sample efficiency, we pre-train the perception model with an off-road driving dataset. To avoid the risks of real-world policy exploration, we leverage human demonstration to train a speed policy that selects a desired forward speed from camera image. For maximum traversability, we pair the speed policy with a gait selector, which selects a robust locomotion gait for each forward speed. Using only 40 minutes of human demonstration data, our framework learns to adjust the speed and gait of the robot based on perceived terrain semantics, and enables the robot to walk over 6km without failure at close-to-optimal speed.
△ Less
Submitted 10 October, 2022; v1 submitted 27 June, 2022;
originally announced June 2022.
-
CAFA: Class-Aware Feature Alignment for Test-Time Adaptation
Authors:
Sanghun Jung,
Jungsoo Lee,
Nanhee Kim,
Amirreza Shaban,
Byron Boots,
Jaegul Choo
Abstract:
Despite recent advancements in deep learning, deep neural networks continue to suffer from performance degradation when applied to new data that differs from training data. Test-time adaptation (TTA) aims to address this challenge by adapting a model to unlabeled data at test time. TTA can be applied to pretrained networks without modifying their training procedures, enabling them to utilize a wel…
▽ More
Despite recent advancements in deep learning, deep neural networks continue to suffer from performance degradation when applied to new data that differs from training data. Test-time adaptation (TTA) aims to address this challenge by adapting a model to unlabeled data at test time. TTA can be applied to pretrained networks without modifying their training procedures, enabling them to utilize a well-formed source distribution for adaptation. One possible approach is to align the representation space of test samples to the source distribution (\textit{i.e.,} feature alignment). However, performing feature alignment in TTA is especially challenging in that access to labeled source data is restricted during adaptation. That is, a model does not have a chance to learn test data in a class-discriminative manner, which was feasible in other adaptation tasks (\textit{e.g.,} unsupervised domain adaptation) via supervised losses on the source data. Based on this observation, we propose a simple yet effective feature alignment loss, termed as Class-Aware Feature Alignment (CAFA), which simultaneously 1) encourages a model to learn target representations in a class-discriminative manner and 2) effectively mitigates the distribution shifts at test time. Our method does not require any hyper-parameters or additional losses, which are required in previous approaches. We conduct extensive experiments on 6 different datasets and show our proposed method consistently outperforms existing baselines.
△ Less
Submitted 3 September, 2023; v1 submitted 31 May, 2022;
originally announced June 2022.
-
Learning Implicit Priors for Motion Optimization
Authors:
Julen Urain,
An T. Le,
Alexander Lambert,
Georgia Chalvatzaki,
Byron Boots,
Jan Peters
Abstract:
In this paper, we focus on the problem of integrating Energy-based Models (EBM) as guiding priors for motion optimization. EBMs are a set of neural networks that can represent expressive probability density distributions in terms of a Gibbs distribution parameterized by a suitable energy function. Due to their implicit nature, they can easily be integrated as optimization factors or as initial sam…
▽ More
In this paper, we focus on the problem of integrating Energy-based Models (EBM) as guiding priors for motion optimization. EBMs are a set of neural networks that can represent expressive probability density distributions in terms of a Gibbs distribution parameterized by a suitable energy function. Due to their implicit nature, they can easily be integrated as optimization factors or as initial sampling distributions in the motion optimization problem, making them good candidates to integrate data-driven priors in the motion optimization problem. In this work, we present a set of required modeling and algorithmic choices to adapt EBMs into motion optimization. We investigate the benefit of including additional regularizers in the learning of the EBMs to use them with gradient-based optimizers and we present a set of EBM architectures to learn generalizable distributions for manipulation tasks. We present multiple cases in which the EBM could be integrated for motion optimization and evaluate the performance of learned EBMs as guiding priors for both simulated and real robot experiments.
△ Less
Submitted 11 January, 2023; v1 submitted 11 April, 2022;
originally announced April 2022.
-
Motivating Physical Activity via Competitive Human-Robot Interaction
Authors:
Boling Yang,
Golnaz Habibi,
Patrick E. Lancaster,
Byron Boots,
Joshua R. Smith
Abstract:
This project aims to motivate research in competitive human-robot interaction by creating a robot competitor that can challenge human users in certain scenarios such as physical exercise and games. With this goal in mind, we introduce the Fencing Game, a human-robot competition used to evaluate both the capabilities of the robot competitor and user experience. We develop the robot competitor throu…
▽ More
This project aims to motivate research in competitive human-robot interaction by creating a robot competitor that can challenge human users in certain scenarios such as physical exercise and games. With this goal in mind, we introduce the Fencing Game, a human-robot competition used to evaluate both the capabilities of the robot competitor and user experience. We develop the robot competitor through iterative multi-agent reinforcement learning and show that it can perform well against human competitors. Our user study additionally found that our system was able to continuously create challenging and enjoyable interactions that significantly increased human subjects' heart rates. The majority of human subjects considered the system to be entertaining and desirable for improving the quality of their exercise.
△ Less
Submitted 14 February, 2022;
originally announced February 2022.
-
Nonprehensile Riemannian Motion Predictive Control
Authors:
Hamid Izadinia,
Byron Boots,
Steven M. Seitz
Abstract:
Nonprehensile manipulation involves long horizon underactuated object interactions and physical contact with different objects that can inherently introduce a high degree of uncertainty. In this work, we introduce a novel Real-to-Sim reward analysis technique, called Riemannian Motion Predictive Control (RMPC), to reliably imagine and predict the outcome of taking possible actions for a real robot…
▽ More
Nonprehensile manipulation involves long horizon underactuated object interactions and physical contact with different objects that can inherently introduce a high degree of uncertainty. In this work, we introduce a novel Real-to-Sim reward analysis technique, called Riemannian Motion Predictive Control (RMPC), to reliably imagine and predict the outcome of taking possible actions for a real robotic platform. Our proposed RMPC benefits from Riemannian motion policy and second order dynamic model to compute the acceleration command and control the robot at every location on the surface. Our approach creates a 3D object-level recomposed model of the real scene where we can simulate the effect of different trajectories. We produce a closed-loop controller to reactively push objects in a continuous action space. We evaluate the performance of our RMPC approach by conducting experiments on a real robot platform as well as simulation and compare against several baselines. We observe that RMPC is robust in cluttered as well as occluded environments and outperforms the baselines.
△ Less
Submitted 15 November, 2021;
originally announced November 2021.
-
Stein Variational Probabilistic Roadmaps
Authors:
Alexander Lambert,
Brian Hou,
Rosario Scalise,
Siddhartha S. Srinivasa,
Byron Boots
Abstract:
Efficient and reliable generation of global path plans are necessary for safe execution and deployment of autonomous systems. In order to generate planning graphs which adequately resolve the topology of a given environment, many sampling-based motion planners resort to coarse, heuristically-driven strategies which often fail to generalize to new and varied surroundings. Further, many of these app…
▽ More
Efficient and reliable generation of global path plans are necessary for safe execution and deployment of autonomous systems. In order to generate planning graphs which adequately resolve the topology of a given environment, many sampling-based motion planners resort to coarse, heuristically-driven strategies which often fail to generalize to new and varied surroundings. Further, many of these approaches are not designed to contend with partial-observability. We posit that such uncertainty in environment geometry can, in fact, help drive the sampling process in generating feasible, and probabilistically-safe planning graphs. We propose a method for Probabilistic Roadmaps which relies on particle-based Variational Inference to efficiently cover the posterior distribution over feasible regions in configuration space. Our approach, Stein Variational Probabilistic Roadmap (SV-PRM), results in sample-efficient generation of planning-graphs and large improvements over traditional sampling approaches. We demonstrate the approach on a variety of challenging planning problems, including real-world probabilistic occupancy maps and high-dof manipulation problems common in robotics.
△ Less
Submitted 20 May, 2022; v1 submitted 4 November, 2021;
originally announced November 2021.
-
Leveraging Experience in Lazy Search
Authors:
Mohak Bhardwaj,
Sanjiban Choudhury,
Byron Boots,
Siddhartha Srinivasa
Abstract:
Lazy graph search algorithms are efficient at solving motion planning problems where edge evaluation is the computational bottleneck. These algorithms work by lazily computing the shortest potentially feasible path, evaluating edges along that path, and repeating until a feasible path is found. The order in which edges are selected is critical to minimizing the total number of edge evaluations: a…
▽ More
Lazy graph search algorithms are efficient at solving motion planning problems where edge evaluation is the computational bottleneck. These algorithms work by lazily computing the shortest potentially feasible path, evaluating edges along that path, and repeating until a feasible path is found. The order in which edges are selected is critical to minimizing the total number of edge evaluations: a good edge selector chooses edges that are not only likely to be invalid, but also eliminates future paths from consideration. We wish to learn such a selector by leveraging prior experience. We formulate this problem as a Markov Decision Process (MDP) on the state of the search problem. While solving this large MDP is generally intractable, we show that we can compute oracular selectors that can solve the MDP during training. With access to such oracles, we use imitation learning to find effective policies. If new search problems are sufficiently similar to problems solved during training, the learned policy will choose a good edge evaluation ordering and solve the motion planning problem quickly. We evaluate our algorithms on a wide range of 2D and 7D problems and show that the learned selector outperforms baseline commonly used heuristics. We further provide a novel theoretical analysis of lazy search in a Bayesian framework as well as regret guarantees on our imitation learning based approach to motion planning.
△ Less
Submitted 9 October, 2021;
originally announced October 2021.
-
Geometric Fabrics: Generalizing Classical Mechanics to Capture the Physics of Behavior
Authors:
Karl Van Wyk,
Mandy Xie,
Anqi Li,
Muhammad Asif Rana,
Buck Babich,
Bryan Peele,
Qian Wan,
Iretiayo Akinola,
Balakumar Sundaralingam,
Dieter Fox,
Byron Boots,
Nathan D. Ratliff
Abstract:
Classical mechanical systems are central to controller design in energy shaping methods of geometric control. However, their expressivity is limited by position-only metrics and the intimate link between metric and geometry. Recent work on Riemannian Motion Policies (RMPs) has shown that shedding these restrictions results in powerful design tools, but at the expense of theoretical stability guara…
▽ More
Classical mechanical systems are central to controller design in energy shaping methods of geometric control. However, their expressivity is limited by position-only metrics and the intimate link between metric and geometry. Recent work on Riemannian Motion Policies (RMPs) has shown that shedding these restrictions results in powerful design tools, but at the expense of theoretical stability guarantees. In this work, we generalize classical mechanics to what we call geometric fabrics, whose expressivity and theory enable the design of systems that outperform RMPs in practice. Geometric fabrics strictly generalize classical mechanics forming a new physics of behavior by first generalizing them to Finsler geometries and then explicitly bending them to shape their behavior while maintaining stability. We develop the theory of fabrics and present both a collection of controlled experiments examining their theoretical properties and a set of robot system experiments showing improved performance over a well-engineered and hardened implementation of RMPs, our current state-of-the-art in controller design.
△ Less
Submitted 18 January, 2022; v1 submitted 21 September, 2021;
originally announced September 2021.
-
Entropy Regularized Motion Planning via Stein Variational Inference
Authors:
Alexander Lambert,
Byron Boots
Abstract:
Many Imitation and Reinforcement Learning approaches rely on the availability of expert-generated demonstrations for learning policies or value functions from data. Obtaining a reliable distribution of trajectories from motion planners is non-trivial, since it must broadly cover the space of states likely to be encountered during execution while also satisfying task-based constraints. We propose a…
▽ More
Many Imitation and Reinforcement Learning approaches rely on the availability of expert-generated demonstrations for learning policies or value functions from data. Obtaining a reliable distribution of trajectories from motion planners is non-trivial, since it must broadly cover the space of states likely to be encountered during execution while also satisfying task-based constraints. We propose a sampling strategy based on variational inference to generate distributions of feasible, low-cost trajectories for high-dof motion planning tasks. This includes a distributed, particle-based motion planning algorithm which leverages a structured graphical representations for inference over multi-modal posterior distributions. We also make explicit connections to both approximate inference for trajectory optimization and entropy-regularized reinforcement learning.
△ Less
Submitted 11 July, 2021;
originally announced July 2021.
-
Safe Reinforcement Learning Using Advantage-Based Intervention
Authors:
Nolan Wagener,
Byron Boots,
Ching-An Cheng
Abstract:
Many sequential decision problems involve finding a policy that maximizes total reward while obeying safety constraints. Although much recent research has focused on the development of safe reinforcement learning (RL) algorithms that produce a safe policy after training, ensuring safety during training as well remains an open problem. A fundamental challenge is performing exploration while still s…
▽ More
Many sequential decision problems involve finding a policy that maximizes total reward while obeying safety constraints. Although much recent research has focused on the development of safe reinforcement learning (RL) algorithms that produce a safe policy after training, ensuring safety during training as well remains an open problem. A fundamental challenge is performing exploration while still satisfying constraints in an unknown Markov decision process (MDP). In this work, we address this problem for the chance-constrained setting. We propose a new algorithm, SAILR, that uses an intervention mechanism based on advantage functions to keep the agent safe throughout training and optimizes the agent's policy using off-the-shelf RL algorithms designed for unconstrained MDPs. Our method comes with strong guarantees on safety during both training and deployment (i.e., after training and without the intervention mechanism) and policy performance compared to the optimal safety-constrained policy. In our experiments, we show that SAILR violates constraints far less during training than standard safe RL and constrained MDP approaches and converges to a well-performing policy that can be deployed safely without intervention. Our code is available at https://github.com/nolanwagener/safe_rl.
△ Less
Submitted 19 July, 2021; v1 submitted 16 June, 2021;
originally announced June 2021.
-
Imitation Learning via Simultaneous Optimization of Policies and Auxiliary Trajectories
Authors:
Mandy Xie,
Anqi Li,
Karl Van Wyk,
Frank Dellaert,
Byron Boots,
Nathan Ratliff
Abstract:
Imitation learning (IL) is a frequently used approach for data-efficient policy learning. Many IL methods, such as Dataset Aggregation (DAgger), combat challenges like distributional shift by interacting with oracular experts. Unfortunately, assuming access to oracular experts is often unrealistic in practice; data used in IL frequently comes from offline processes such as lead-through or teleoper…
▽ More
Imitation learning (IL) is a frequently used approach for data-efficient policy learning. Many IL methods, such as Dataset Aggregation (DAgger), combat challenges like distributional shift by interacting with oracular experts. Unfortunately, assuming access to oracular experts is often unrealistic in practice; data used in IL frequently comes from offline processes such as lead-through or teleoperation. In this paper, we present a novel imitation learning technique called Collocation for Demonstration Encoding (CoDE) that operates on only a fixed set of trajectory demonstrations. We circumvent challenges with methods like back-propagation-through-time by introducing an auxiliary trajectory network, which takes inspiration from collocation techniques in optimal control. Our method generalizes well and more accurately reproduces the demonstrated behavior with fewer guiding trajectories when compared to standard behavioral cloning methods. We present simulation results on a 7-degree-of-freedom (DoF) robotic manipulator that learns to exhibit lifting, target-reaching, and obstacle avoidance behaviors.
△ Less
Submitted 5 June, 2021; v1 submitted 6 May, 2021;
originally announced May 2021.
-
STORM: An Integrated Framework for Fast Joint-Space Model-Predictive Control for Reactive Manipulation
Authors:
Mohak Bhardwaj,
Balakumar Sundaralingam,
Arsalan Mousavian,
Nathan Ratliff,
Dieter Fox,
Fabio Ramos,
Byron Boots
Abstract:
Sampling-based model-predictive control (MPC) is a promising tool for feedback control of robots with complex, non-smooth dynamics, and cost functions. However, the computationally demanding nature of sampling-based MPC algorithms has been a key bottleneck in their application to high-dimensional robotic manipulation problems in the real world. Previous methods have addressed this issue by running…
▽ More
Sampling-based model-predictive control (MPC) is a promising tool for feedback control of robots with complex, non-smooth dynamics, and cost functions. However, the computationally demanding nature of sampling-based MPC algorithms has been a key bottleneck in their application to high-dimensional robotic manipulation problems in the real world. Previous methods have addressed this issue by running MPC in the task space while relying on a low-level operational space controller for joint control. However, by not using the joint space of the robot in the MPC formulation, existing methods cannot directly account for non-task space related constraints such as avoiding joint limits, singular configurations, and link collisions. In this paper, we develop a system for fast, joint space sampling-based MPC for manipulators that is efficiently parallelized using GPUs. Our approach can handle task and joint space constraints while taking less than 8ms~(125Hz) to compute the next control command. Further, our method can tightly integrate perception into the control problem by utilizing learned cost functions from raw sensor data. We validate our approach by deploying it on a Franka Panda robot for a variety of dynamic manipulation tasks. We study the effect of different cost formulations and MPC parameters on the synthesized behavior and provide key insights that pave the way for the application of sampling-based MPC for manipulators in a principled manner. We also provide highly optimized, open-source code to be used by the wider robot learning and control community. Videos of experiments can be found at: https://sites.google.com/view/manipulation-mpc
△ Less
Submitted 14 September, 2021; v1 submitted 27 April, 2021;
originally announced April 2021.
-
Fast and Efficient Locomotion via Learned Gait Transitions
Authors:
Yuxiang Yang,
Tingnan Zhang,
Erwin Coumans,
Jie Tan,
Byron Boots
Abstract:
We focus on the problem of developing energy efficient controllers for quadrupedal robots. Animals can actively switch gaits at different speeds to lower their energy consumption. In this paper, we devise a hierarchical learning framework, in which distinctive locomotion gaits and natural gait transitions emerge automatically with a simple reward of energy minimization. We use evolutionary strateg…
▽ More
We focus on the problem of developing energy efficient controllers for quadrupedal robots. Animals can actively switch gaits at different speeds to lower their energy consumption. In this paper, we devise a hierarchical learning framework, in which distinctive locomotion gaits and natural gait transitions emerge automatically with a simple reward of energy minimization. We use evolutionary strategies (ES) to train a high-level gait policy that specifies gait patterns of each foot, while the low-level convex MPC controller optimizes the motor commands so that the robot can walk at a desired velocity using that gait pattern. We test our learning framework on a quadruped robot and demonstrate automatic gait transitions, from walking to trotting and to fly-trotting, as the robot increases its speed. We show that the learned hierarchical controller consumes much less energy across a wide range of locomotion speed than baseline controllers.
△ Less
Submitted 22 November, 2021; v1 submitted 9 April, 2021;
originally announced April 2021.
-
The Value of Planning for Infinite-Horizon Model Predictive Control
Authors:
Nathan Hatch,
Byron Boots
Abstract:
Model Predictive Control (MPC) is a classic tool for optimal control of complex, real-world systems. Although it has been successfully applied to a wide range of challenging tasks in robotics, it is fundamentally limited by the prediction horizon, which, if too short, will result in myopic decisions. Recently, several papers have suggested using a learned value function as the terminal cost for MP…
▽ More
Model Predictive Control (MPC) is a classic tool for optimal control of complex, real-world systems. Although it has been successfully applied to a wide range of challenging tasks in robotics, it is fundamentally limited by the prediction horizon, which, if too short, will result in myopic decisions. Recently, several papers have suggested using a learned value function as the terminal cost for MPC. If the value function is accurate, it effectively allows MPC to reason over an infinite horizon. Unfortunately, Reinforcement Learning (RL) solutions to value function approximation can be difficult to realize for robotics tasks. In this paper, we suggest a more efficient method for value function approximation that applies to goal-directed problems, like reaching and navigation. In these problems, MPC is often formulated to track a path or trajectory returned by a planner. However, this strategy is brittle in that unexpected perturbations to the robot will require replanning, which can be costly at runtime. Instead, we show how the intermediate data structures used by modern planners can be interpreted as an approximate value function. We show that that this value function can be used by MPC directly, resulting in more efficient and resilient behavior at runtime.
△ Less
Submitted 6 April, 2021;
originally announced April 2021.
-
Few-shot Weakly-Supervised Object Detection via Directional Statistics
Authors:
Amirreza Shaban,
Amir Rahimi,
Thalaiyasingam Ajanthan,
Byron Boots,
Richard Hartley
Abstract:
Detecting novel objects from few examples has become an emerging topic in computer vision recently. However, these methods need fully annotated training images to learn new object categories which limits their applicability in real world scenarios such as field robotics. In this work, we propose a probabilistic multiple instance learning approach for few-shot Common Object Localization (COL) and f…
▽ More
Detecting novel objects from few examples has become an emerging topic in computer vision recently. However, these methods need fully annotated training images to learn new object categories which limits their applicability in real world scenarios such as field robotics. In this work, we propose a probabilistic multiple instance learning approach for few-shot Common Object Localization (COL) and few-shot Weakly Supervised Object Detection (WSOD). In these tasks, only image-level labels, which are much cheaper to acquire, are available. We find that operating on features extracted from the last layer of a pre-trained Faster-RCNN is more effective compared to previous episodic learning based few-shot COL methods. Our model simultaneously learns the distribution of the novel objects and localizes them via expectation-maximization steps. As a probabilistic model, we employ von Mises-Fisher (vMF) distribution which captures the semantic information better than Gaussian distribution when applied to the pre-trained embedding space. When the novel objects are localized, we utilize them to learn a linear appearance model to detect novel classes in new images. Our extensive experiments show that the proposed method, despite being simple, outperforms strong baselines in few-shot COL and WSOD, as well as large-scale WSOD tasks.
△ Less
Submitted 25 March, 2021;
originally announced March 2021.
-
Dual Online Stein Variational Inference for Control and Dynamics
Authors:
Lucas Barcelos,
Alexander Lambert,
Rafael Oliveira,
Paulo Borges,
Byron Boots,
Fabio Ramos
Abstract:
Model predictive control (MPC) schemes have a proven track record for delivering aggressive and robust performance in many challenging control tasks, coping with nonlinear system dynamics, constraints, and observational noise. Despite their success, these methods often rely on simple control distributions, which can limit their performance in highly uncertain and complex environments. MPC framewor…
▽ More
Model predictive control (MPC) schemes have a proven track record for delivering aggressive and robust performance in many challenging control tasks, coping with nonlinear system dynamics, constraints, and observational noise. Despite their success, these methods often rely on simple control distributions, which can limit their performance in highly uncertain and complex environments. MPC frameworks must be able to accommodate changing distributions over system parameters, based on the most recent measurements. In this paper, we devise an implicit variational inference algorithm able to estimate distributions over model parameters and control inputs on-the-fly. The method incorporates Stein Variational gradient descent to approximate the target distributions as a collection of particles, and performs updates based on a Bayesian formulation. This enables the approximation of complex multi-modal posterior distributions, typically occurring in challenging and realistic robot navigation tasks. We demonstrate our approach on both simulated and real-world experiments requiring real-time execution in the face of dynamically changing environments.
△ Less
Submitted 23 March, 2021;
originally announced March 2021.
-
RMP2: A Structured Composable Policy Class for Robot Learning
Authors:
Anqi Li,
Ching-An Cheng,
M. Asif Rana,
Man Xie,
Karl Van Wyk,
Nathan Ratliff,
Byron Boots
Abstract:
We consider the problem of learning motion policies for acceleration-based robotics systems with a structured policy class specified by RMPflow. RMPflow is a multi-task control framework that has been successfully applied in many robotics problems. Using RMPflow as a structured policy class in learning has several benefits, such as sufficient expressiveness, the flexibility to inject different lev…
▽ More
We consider the problem of learning motion policies for acceleration-based robotics systems with a structured policy class specified by RMPflow. RMPflow is a multi-task control framework that has been successfully applied in many robotics problems. Using RMPflow as a structured policy class in learning has several benefits, such as sufficient expressiveness, the flexibility to inject different levels of prior knowledge as well as the ability to transfer policies between robots. However, implementing a system for end-to-end learning RMPflow policies faces several computational challenges. In this work, we re-examine the message passing algorithm of RMPflow and propose a more efficient alternate algorithm, called RMP2, that uses modern automatic differentiation tools (such as TensorFlow and PyTorch) to compute RMPflow policies. Our new design retains the strengths of RMPflow while bringing in advantages from automatic differentiation, including 1) easy programming interfaces to designing complex transformations; 2) support of general directed acyclic graph (DAG) transformation structures; 3) end-to-end differentiability for policy learning; 4) improved computational efficiency. Because of these features, RMP2 can be treated as a structured policy class for efficient robot learning which is suitable encoding domain knowledge. Our experiments show that using structured policy class given by RMP2 can improve policy performance and safety in reinforcement learning tasks for goal reaching in cluttered space.
△ Less
Submitted 10 March, 2021;
originally announced March 2021.
-
Combining pretrained CNN feature extractors to enhance clustering of complex natural images
Authors:
Joris Guerin,
Stephane Thiery,
Eric Nyiri,
Olivier Gibaru,
Byron Boots
Abstract:
Recently, a common starting point for solving complex unsupervised image classification tasks is to use generic features, extracted with deep Convolutional Neural Networks (CNN) pretrained on a large and versatile dataset (ImageNet). However, in most research, the CNN architecture for feature extraction is chosen arbitrarily, without justification. This paper aims at providing insight on the use o…
▽ More
Recently, a common starting point for solving complex unsupervised image classification tasks is to use generic features, extracted with deep Convolutional Neural Networks (CNN) pretrained on a large and versatile dataset (ImageNet). However, in most research, the CNN architecture for feature extraction is chosen arbitrarily, without justification. This paper aims at providing insight on the use of pretrained CNN features for image clustering (IC). First, extensive experiments are conducted and show that, for a given dataset, the choice of the CNN architecture for feature extraction has a huge impact on the final clustering. These experiments also demonstrate that proper extractor selection for a given IC task is difficult. To solve this issue, we propose to rephrase the IC problem as a multi-view clustering (MVC) problem that considers features extracted from different architectures as different "views" of the same data. This approach is based on the assumption that information contained in the different CNN may be complementary, even when pretrained on the same data. We then propose a multi-input neural network architecture that is trained end-to-end to solve the MVC problem effectively. This approach is tested on nine natural image datasets, and produces state-of-the-art results for IC.
△ Less
Submitted 7 January, 2021;
originally announced January 2021.
-
Towards Coordinated Robot Motions: End-to-End Learning of Motion Policies on Transform Trees
Authors:
M. Asif Rana,
Anqi Li,
Dieter Fox,
Sonia Chernova,
Byron Boots,
Nathan Ratliff
Abstract:
Generating robot motion that fulfills multiple tasks simultaneously is challenging due to the geometric constraints imposed by the robot. In this paper, we propose to solve multi-task problems through learning structured policies from human demonstrations. Our structured policy is inspired by RMPflow, a framework for combining subtask policies on different spaces. The policy structure provides the…
▽ More
Generating robot motion that fulfills multiple tasks simultaneously is challenging due to the geometric constraints imposed by the robot. In this paper, we propose to solve multi-task problems through learning structured policies from human demonstrations. Our structured policy is inspired by RMPflow, a framework for combining subtask policies on different spaces. The policy structure provides the user an interface to 1) specifying the spaces that are directly relevant to the completion of the tasks, and 2) designing policies for certain tasks that do not need to be learned. We derive an end-to-end learning objective function that is suitable for the multi-task problem, emphasizing the deviation of motions on task spaces. Furthermore, the motion generated from the learned policy class is guaranteed to be stable. We validate the effectiveness of our proposed learning framework through qualitative and quantitative evaluations on three robotic tasks on a 7-DOF Rethink Sawyer robot.
△ Less
Submitted 10 March, 2021; v1 submitted 24 December, 2020;
originally announced December 2020.
-
Blending MPC & Value Function Approximation for Efficient Reinforcement Learning
Authors:
Mohak Bhardwaj,
Sanjiban Choudhury,
Byron Boots
Abstract:
Model-Predictive Control (MPC) is a powerful tool for controlling complex, real-world systems that uses a model to make predictions about future behavior. For each state encountered, MPC solves an online optimization problem to choose a control action that will minimize future cost. This is a surprisingly effective strategy, but real-time performance requirements warrant the use of simple models.…
▽ More
Model-Predictive Control (MPC) is a powerful tool for controlling complex, real-world systems that uses a model to make predictions about future behavior. For each state encountered, MPC solves an online optimization problem to choose a control action that will minimize future cost. This is a surprisingly effective strategy, but real-time performance requirements warrant the use of simple models. If the model is not sufficiently accurate, then the resulting controller can be biased, limiting performance. We present a framework for improving on MPC with model-free reinforcement learning (RL). The key insight is to view MPC as constructing a series of local Q-function approximations. We show that by using a parameter $λ$, similar to the trace decay parameter in TD($λ$), we can systematically trade-off learned value estimates against the local Q-function approximations. We present a theoretical analysis that shows how error from inaccurate models in MPC and value function estimation in RL can be balanced. We further propose an algorithm that changes $λ$ over time to reduce the dependence on MPC as our estimates of the value function improve, and test the efficacy our approach on challenging high-dimensional manipulation tasks with biased models in simulation. We demonstrate that our approach can obtain performance comparable with MPC with access to true dynamics even under severe model bias and is more sample efficient as compared to model-free RL.
△ Less
Submitted 13 April, 2021; v1 submitted 10 December, 2020;
originally announced December 2020.
-
Stein Variational Model Predictive Control
Authors:
Alexander Lambert,
Adam Fishman,
Dieter Fox,
Byron Boots,
Fabio Ramos
Abstract:
Decision making under uncertainty is critical to real-world, autonomous systems. Model Predictive Control (MPC) methods have demonstrated favorable performance in practice, but remain limited when dealing with complex probability distributions. In this paper, we propose a generalization of MPC that represents a multitude of solutions as posterior distributions. By casting MPC as a Bayesian inferen…
▽ More
Decision making under uncertainty is critical to real-world, autonomous systems. Model Predictive Control (MPC) methods have demonstrated favorable performance in practice, but remain limited when dealing with complex probability distributions. In this paper, we propose a generalization of MPC that represents a multitude of solutions as posterior distributions. By casting MPC as a Bayesian inference problem, we employ variational methods for posterior computation, naturally encoding the complexity and multi-modality of the decision making problem. We present a Stein variational gradient descent method to estimate the posterior directly over control parameters, given a cost function and observed state trajectories. We show that this framework leads to successful planning in challenging, non-convex optimal control problems.
△ Less
Submitted 12 April, 2021; v1 submitted 15 November, 2020;
originally announced November 2020.
-
Grasping with Chopsticks: Combating Covariate Shift in Model-free Imitation Learning for Fine Manipulation
Authors:
Liyiming Ke,
Jingqiang Wang,
Tapomayukh Bhattacharjee,
Byron Boots,
Siddhartha Srinivasa
Abstract:
Billions of people use chopsticks, a simple yet versatile tool, for fine manipulation of everyday objects. The small, curved, and slippery tips of chopsticks pose a challenge for picking up small objects, making them a suitably complex test case. This paper leverages human demonstrations to develop an autonomous chopsticks-equipped robotic manipulator. Due to the lack of accurate models for fine m…
▽ More
Billions of people use chopsticks, a simple yet versatile tool, for fine manipulation of everyday objects. The small, curved, and slippery tips of chopsticks pose a challenge for picking up small objects, making them a suitably complex test case. This paper leverages human demonstrations to develop an autonomous chopsticks-equipped robotic manipulator. Due to the lack of accurate models for fine manipulation, we explore model-free imitation learning, which traditionally suffers from the covariate shift phenomenon that causes poor generalization. We propose two approaches to reduce covariate shift, neither of which requires access to an interactive expert or a model, unlike previous approaches. First, we alleviate single-step prediction errors by applying an invariant operator to increase the data support at critical steps for grasping. Second, we generate synthetic corrective labels by adding bounded noise and combining parametric and non-parametric methods to prevent error accumulation. We demonstrate our methods on a real chopstick-equipped robot that we built, and observe the agent's success rate increase from 37.3% to 80%, which is comparable to the human expert performance of 82.6%.
△ Less
Submitted 12 November, 2020;
originally announced November 2020.
-
Geometric Fabrics for the Acceleration-based Design of Robotic Motion
Authors:
Mandy Xie,
Karl Van Wyk,
Anqi Li,
Muhammad Asif Rana,
Qian Wan,
Dieter Fox,
Byron Boots,
Nathan Ratliff
Abstract:
This paper describes the pragmatic design and construction of geometric fabrics for shaping a robot's task-independent nominal behavior, capturing behavioral components such as obstacle avoidance, joint limit avoidance, redundancy resolution, global navigation heuristics, etc. Geometric fabrics constitute the most concrete incarnation of a new mathematical formulation for reactive behavior called…
▽ More
This paper describes the pragmatic design and construction of geometric fabrics for shaping a robot's task-independent nominal behavior, capturing behavioral components such as obstacle avoidance, joint limit avoidance, redundancy resolution, global navigation heuristics, etc. Geometric fabrics constitute the most concrete incarnation of a new mathematical formulation for reactive behavior called optimization fabrics. Fabrics generalize recent work on Riemannian Motion Policies (RMPs); they add provable stability guarantees and improve design consistency while promoting the intuitive acceleration-based principles of modular design that make RMPs successful. We describe a suite of mathematical modeling tools that practitioners can employ in practice and demonstrate both how to mitigate system complexity by constructing behaviors layer-wise and how to employ these tools to design robust, strongly-generalizing, policies that solve practical problems one would expect to find in industry applications. Our system exhibits intelligent global navigation behaviors expressed entirely as provably stable fabrics with zero planning or state machine governance.
△ Less
Submitted 25 June, 2021; v1 submitted 28 October, 2020;
originally announced October 2020.
-
Quantum Tensor Networks, Stochastic Processes, and Weighted Automata
Authors:
Siddarth Srinivasan,
Sandesh Adhikary,
Jacob Miller,
Guillaume Rabusseau,
Byron Boots
Abstract:
Modeling joint probability distributions over sequences has been studied from many perspectives. The physics community developed matrix product states, a tensor-train decomposition for probabilistic modeling, motivated by the need to tractably model many-body systems. But similar models have also been studied in the stochastic processes and weighted automata literature, with little work on how the…
▽ More
Modeling joint probability distributions over sequences has been studied from many perspectives. The physics community developed matrix product states, a tensor-train decomposition for probabilistic modeling, motivated by the need to tractably model many-body systems. But similar models have also been studied in the stochastic processes and weighted automata literature, with little work on how these bodies of work relate to each other. We address this gap by showing how stationary or uniform versions of popular quantum tensor network models have equivalent representations in the stochastic processes and weighted automata literature, in the limit of infinitely long sequences. We demonstrate several equivalence results between models used in these three communities: (i) uniform variants of matrix product states, Born machines and locally purified states from the quantum tensor networks literature, (ii) predictive state representations, hidden Markov models, norm-observable operator models and hidden quantum Markov models from the stochastic process literature,and (iii) stochastic weighted automata, probabilistic automata and quadratic automata from the formal languages literature. Such connections may open the door for results and methods developed in one area to be applied in another.
△ Less
Submitted 20 October, 2020;
originally announced October 2020.
-
Learning a Contact-Adaptive Controller for Robust, Efficient Legged Locomotion
Authors:
Xingye Da,
Zhaoming Xie,
David Hoeller,
Byron Boots,
Animashree Anandkumar,
Yuke Zhu,
Buck Babich,
Animesh Garg
Abstract:
We present a hierarchical framework that combines model-based control and reinforcement learning (RL) to synthesize robust controllers for a quadruped (the Unitree Laikago). The system consists of a high-level controller that learns to choose from a set of primitives in response to changes in the environment and a low-level controller that utilizes an established control method to robustly execute…
▽ More
We present a hierarchical framework that combines model-based control and reinforcement learning (RL) to synthesize robust controllers for a quadruped (the Unitree Laikago). The system consists of a high-level controller that learns to choose from a set of primitives in response to changes in the environment and a low-level controller that utilizes an established control method to robustly execute the primitives. Our framework learns a controller that can adapt to challenging environmental changes on the fly, including novel scenarios not seen during training. The learned controller is up to 85~percent more energy efficient and is more robust compared to baseline methods. We also deploy the controller on a physical robot without any randomization or adaptation scheme.
△ Less
Submitted 23 November, 2020; v1 submitted 21 September, 2020;
originally announced September 2020.
-
RMPflow: A Geometric Framework for Generation of Multi-Task Motion Policies
Authors:
Ching-An Cheng,
Mustafa Mukadam,
Jan Issac,
Stan Birchfield,
Dieter Fox,
Byron Boots,
Nathan Ratliff
Abstract:
Generating robot motion for multiple tasks in dynamic environments is challenging, requiring an algorithm to respond reactively while accounting for complex nonlinear relationships between tasks. In this paper, we develop a novel policy synthesis algorithm, RMPflow, based on geometrically consistent transformations of Riemannian Motion Policies (RMPs). RMPs are a class of reactive motion policies…
▽ More
Generating robot motion for multiple tasks in dynamic environments is challenging, requiring an algorithm to respond reactively while accounting for complex nonlinear relationships between tasks. In this paper, we develop a novel policy synthesis algorithm, RMPflow, based on geometrically consistent transformations of Riemannian Motion Policies (RMPs). RMPs are a class of reactive motion policies that parameterize non-Euclidean behaviors as dynamical systems in intrinsically nonlinear task spaces. Given a set of RMPs designed for individual tasks, RMPflow can combine these policies to generate an expressive global policy, while simultaneously exploiting sparse structure for computational efficiency. We study the geometric properties of RMPflow and provide sufficient conditions for stability. Finally, we experimentally demonstrate that accounting for the natural Riemannian geometry of task policies can simplify classically difficult problems, such as planning through clutter on high-DOF manipulation systems.
△ Less
Submitted 24 July, 2020;
originally announced July 2020.
-
Explaining Fast Improvement in Online Imitation Learning
Authors:
Xinyan Yan,
Byron Boots,
Ching-An Cheng
Abstract:
Online imitation learning (IL) is an algorithmic framework that leverages interactions with expert policies for efficient policy optimization. Here policies are optimized by performing online learning on a sequence of loss functions that encourage the learner to mimic expert actions, and if the online learning has no regret, the agent can provably learn an expert-like policy. Online IL has demonst…
▽ More
Online imitation learning (IL) is an algorithmic framework that leverages interactions with expert policies for efficient policy optimization. Here policies are optimized by performing online learning on a sequence of loss functions that encourage the learner to mimic expert actions, and if the online learning has no regret, the agent can provably learn an expert-like policy. Online IL has demonstrated empirical successes in many applications and interestingly, its policy improvement speed observed in practice is usually much faster than existing theory suggests. In this work, we provide an explanation of this phenomenon. Let $ξ$ denote the policy class bias and assume the online IL loss functions are convex, smooth, and non-negative. We prove that, after $N$ rounds of online IL with stochastic feedback, the policy improves in $\tilde{O}(1/N + \sqrt{ξ/N})$ in both expectation and high probability. In other words, we show that adopting a sufficiently expressive policy class in online IL has two benefits: both the policy improvement speed increases and the performance bias decreases.
△ Less
Submitted 21 February, 2021; v1 submitted 6 July, 2020;
originally announced July 2020.
-
Euclideanizing Flows: Diffeomorphic Reduction for Learning Stable Dynamical Systems
Authors:
Muhammad Asif Rana,
Anqi Li,
Dieter Fox,
Byron Boots,
Fabio Ramos,
Nathan Ratliff
Abstract:
Robotic tasks often require motions with complex geometric structures. We present an approach to learn such motions from a limited number of human demonstrations by exploiting the regularity properties of human motions e.g. stability, smoothness, and boundedness. The complex motions are encoded as rollouts of a stable dynamical system, which, under a change of coordinates defined by a diffeomorphi…
▽ More
Robotic tasks often require motions with complex geometric structures. We present an approach to learn such motions from a limited number of human demonstrations by exploiting the regularity properties of human motions e.g. stability, smoothness, and boundedness. The complex motions are encoded as rollouts of a stable dynamical system, which, under a change of coordinates defined by a diffeomorphism, is equivalent to a simple, hand-specified dynamical system. As an immediate result of using diffeomorphisms, the stability property of the hand-specified dynamical system directly carry over to the learned dynamical system. Inspired by recent works in density estimation, we propose to represent the diffeomorphism as a composition of simple parameterized diffeomorphisms. Additional structure is imposed to provide guarantees on the smoothness of the generated motions. The efficacy of this approach is demonstrated through validation on an established benchmark as well demonstrations collected on a real-world robotic system.
△ Less
Submitted 21 September, 2020; v1 submitted 26 May, 2020;
originally announced May 2020.
-
Pairwise Similarity Knowledge Transfer for Weakly Supervised Object Localization
Authors:
Amir Rahimi,
Amirreza Shaban,
Thalaiyasingam Ajanthan,
Richard Hartley,
Byron Boots
Abstract:
Weakly Supervised Object Localization (WSOL) methods only require image level labels as opposed to expensive bounding box annotations required by fully supervised algorithms. We study the problem of learning localization model on target classes with weakly supervised image labels, helped by a fully annotated source dataset. Typically, a WSOL model is first trained to predict class generic objectne…
▽ More
Weakly Supervised Object Localization (WSOL) methods only require image level labels as opposed to expensive bounding box annotations required by fully supervised algorithms. We study the problem of learning localization model on target classes with weakly supervised image labels, helped by a fully annotated source dataset. Typically, a WSOL model is first trained to predict class generic objectness scores on an off-the-shelf fully supervised source dataset and then it is progressively adapted to learn the objects in the weakly supervised target dataset. In this work, we argue that learning only an objectness function is a weak form of knowledge transfer and propose to learn a classwise pairwise similarity function that directly compares two input proposals as well. The combined localization model and the estimated object annotations are jointly learned in an alternating optimization paradigm as is typically done in standard WSOL methods. In contrast to the existing work that learns pairwise similarities, our approach optimizes a unified objective with convergence guarantee and it is computationally efficient for large-scale applications. Experiments on the COCO and ILSVRC 2013 detection datasets show that the performance of the localization model improves significantly with the inclusion of pairwise similarity function. For instance, in the ILSVRC dataset, the Correct Localization (CorLoc) performance improves from 72.8% to 78.2% which is a new state-of-the-art for WSOL task in the context of knowledge transfer.
△ Less
Submitted 19 July, 2020; v1 submitted 18 March, 2020;
originally announced March 2020.