Search | arXiv e-print repository

Mobile Robot Sensory Coverage in 2-D Environments: An Optimization Approach with Efficiency Bounds

Authors: E. Fourney, J. W. Burdick, E. D. Rimon

Abstract: This paper considers three related mobile robot multi-target sensory coverage and inspection planning problems in 2-D environments. In the first problem, a mobile robot must find the shortest path to observe multiple targets with a limited range sensor in an obstacle free environment. In the second problem, the mobile robot must efficiently observe multiple targets while taking advantage of multi-… ▽ More This paper considers three related mobile robot multi-target sensory coverage and inspection planning problems in 2-D environments. In the first problem, a mobile robot must find the shortest path to observe multiple targets with a limited range sensor in an obstacle free environment. In the second problem, the mobile robot must efficiently observe multiple targets while taking advantage of multi-target views in an obstacle free environment. The third problem considers multi-target sensory coverage in the presence of obstacles that obstruct sensor views of the targets. We show how all three problems can be formulated in a MINLP optimization framework. Because exact solutions to these problems are NP-hard, we introduce polynomial time approximation algorithms for each problem. These algorithms combine polynomial-time methods to approximate the optimal target sensing order, combined with efficient convex optimization methods that incorporate the constraints posed by the robot sensor footprint and obstacles in the environment. Importantly, we develop bounds that limit the gap between the exact and approximate solutions. Algorithms for all problems are fully implemented and illustrated with examples. Beyond the utility of our algorithms, the bounds derived in the paper contribute to the theory of optimal coverage planning algorithms. △ Less

Submitted 23 May, 2024; originally announced May 2024.

arXiv:2403.18972 [pdf, other]

Risk-Aware Robotics: Tail Risk Measures in Planning, Control, and Verification

Authors: Prithvi Akella, Anushri Dixit, Mohamadreza Ahmadi, Lars Lindemann, Margaret P. Chapman, George J. Pappas, Aaron D. Ames, Joel W. Burdick

Abstract: The need for a systematic approach to risk assessment has increased in recent years due to the ubiquity of autonomous systems that alter our day-to-day experiences and their need for safety, e.g., for self-driving vehicles, mobile service robots, and bipedal robots. These systems are expected to function safely in unpredictable environments and interact seamlessly with humans, whose behavior is no… ▽ More The need for a systematic approach to risk assessment has increased in recent years due to the ubiquity of autonomous systems that alter our day-to-day experiences and their need for safety, e.g., for self-driving vehicles, mobile service robots, and bipedal robots. These systems are expected to function safely in unpredictable environments and interact seamlessly with humans, whose behavior is notably challenging to forecast. We present a survey of risk-aware methodologies for autonomous systems. We adopt a contemporary risk-aware approach to mitigate rare and detrimental outcomes by advocating the use of tail risk measures, a concept borrowed from financial literature. This survey will introduce these measures and explain their relevance in the context of robotic systems for planning, control, and verification applications. △ Less

Submitted 27 March, 2024; originally announced March 2024.

arXiv:2403.08916 [pdf, ps, other]

doi 10.1109/LCSYS.2024.3416239

Rollover Prevention for Mobile Robots with Control Barrier Functions: Differentiator-Based Adaptation and Projection-to-State Safety

Authors: Ersin Das, Aaron D. Ames, Joel W. Burdick

Abstract: This paper develops rollover prevention guarantees for mobile robots using control barrier function (CBF) theory, and demonstrates the method experimentally. We consider a safety measure based on a zero moment point condition through the lens of CBFs. However, these conditions depend on time-varying and noisy parameters. To address this issue, we present a differentiator-based safety-critical cont… ▽ More This paper develops rollover prevention guarantees for mobile robots using control barrier function (CBF) theory, and demonstrates the method experimentally. We consider a safety measure based on a zero moment point condition through the lens of CBFs. However, these conditions depend on time-varying and noisy parameters. To address this issue, we present a differentiator-based safety-critical controller that estimates these parameters and pairs Input-to-State Stable (ISS) differentiator dynamics with CBFs to achieve rigorous safety guarantees. Additionally, to ensure safety in the presence of disturbances, we utilize a time-varying extension of Projection-to-State Safety (PSSf). The effectiveness of the proposed method is demonstrated via experiments on a tracked robot with a rollover potential on steep slopes. △ Less

Submitted 15 June, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

arXiv:2403.03215 [pdf, other]

A Safety-Critical Framework for UGVs in Complex Environments: A Data-Driven Discrepancy-Aware Approach

Authors: Skylar X. Wei, Lu Gan, Joel W. Burdick

Abstract: This work presents a novel data-driven multi-layered planning and control framework for the safe navigation of a class of unmanned ground vehicles (UGVs) in the presence of unknown stationary obstacles and additive modeling uncertainties. The foundation of this framework is a novel robust model predictive planner, designed to generate optimal collision-free trajectories given an occupancy grid map… ▽ More This work presents a novel data-driven multi-layered planning and control framework for the safe navigation of a class of unmanned ground vehicles (UGVs) in the presence of unknown stationary obstacles and additive modeling uncertainties. The foundation of this framework is a novel robust model predictive planner, designed to generate optimal collision-free trajectories given an occupancy grid map, and a paired ancillary controller, augmented to provide robustness against model uncertainties extracted from learning data. To tackle modeling discrepancies, we identify both matched (input discrepancies) and unmatched model residuals between the true and the nominal reduced-order models using closed-loop tracking errors as training data. Utilizing conformal prediction, we extract probabilistic upper bounds for the unknown model residuals, which serve to construct a robustifying ancillary controller. Further, we also determine maximum tracking discrepancies, also known as the robust control invariance tube, under the augmented policy, formulating them as collision buffers. Employing a LiDAR-based occupancy map to characterize the environment, we construct a discrepancy-aware cost map that incorporates these collision buffers. This map is then integrated into a sampling-based model predictive path planner that generates optimal and safe trajectories that can be robustly tracked by the augmented ancillary controller in the presence of model mismatches. The effectiveness of the framework is experimentally validated for autonomous high-speed trajectory tracking in a cluttered environment with four different vehicle-terrain configurations. We also showcase the framework's versatility by reformulating it as a driver-assist program, providing collision avoidance corrections based on user joystick commands. △ Less

Submitted 5 March, 2024; originally announced March 2024.

arXiv:2401.01881 [pdf, other]

Robust Control Barrier Functions using Uncertainty Estimation with Application to Mobile Robots

Authors: Ersin Das, Joel W. Burdick

Abstract: Model uncertainty poses a significant challenge to the implementation of safety-critical control systems. With this as motivation, this paper proposes a safe control design approach that guarantees the robustness of nonlinear feedback systems in the presence of matched or unmatched unmodelled system dynamics and external disturbances. Our approach couples control barrier functions (CBFs) with a ne… ▽ More Model uncertainty poses a significant challenge to the implementation of safety-critical control systems. With this as motivation, this paper proposes a safe control design approach that guarantees the robustness of nonlinear feedback systems in the presence of matched or unmatched unmodelled system dynamics and external disturbances. Our approach couples control barrier functions (CBFs) with a new uncertainty/disturbance estimator to ensure robust safety against input and state-dependent model uncertainties. We prove upper bounds on the estimator's error and estimated outputs. We use an uncertainty estimator-based composite feedback control law to adaptively improve robust control performance under hard safety constraints by compensating for the matched uncertainty. Then, we robustify existing CBF constraints with this uncertainty estimate and the estimation error bounds to ensure robust safety via a quadratic program (CBF-QP). We also extend our method to higher-order CBFs (HOCBFs) to achieve safety under unmatched uncertainty, which causes relative degree differences with respect to control input and disturbance. We assume the relative degree difference is at most one, resulting in a second-order cone (SOC) condition. The proposed robust HOCBFs method is demonstrated in a simulation of an uncertain elastic actuator control problem. Finally, the efficacy of our method is experimentally demonstrated on a tracked robot with slope-induced matched and unmatched perturbations. △ Less

Submitted 3 January, 2024; originally announced January 2024.

arXiv:2310.05865 [pdf, other]

A Learning-Based Framework for Safe Human-Robot Collaboration with Multiple Backup Control Barrier Functions

Authors: Neil C. Janwani, Ersin Daş, Thomas Touma, Skylar X. Wei, Tamas G. Molnar, Joel W. Burdick

Abstract: Ensuring robot safety in complex environments is a difficult task due to actuation limits, such as torque bounds. This paper presents a safety-critical control framework that leverages learning-based switching between multiple backup controllers to formally guarantee safety under bounded control inputs while satisfying driver intention. By leveraging backup controllers designed to uphold safety an… ▽ More Ensuring robot safety in complex environments is a difficult task due to actuation limits, such as torque bounds. This paper presents a safety-critical control framework that leverages learning-based switching between multiple backup controllers to formally guarantee safety under bounded control inputs while satisfying driver intention. By leveraging backup controllers designed to uphold safety and input constraints, backup control barrier functions (BCBFs) construct implicitly defined control invariance sets via a feasible quadratic program (QP). However, BCBF performance largely depends on the design and conservativeness of the chosen backup controller, especially in our setting of human-driven vehicles in complex, e.g, off-road, conditions. While conservativeness can be reduced by using multiple backup controllers, determining when to switch is an open problem. Consequently, we develop a broadcast scheme that estimates driver intention and integrates BCBFs with multiple backup strategies for human-robot interaction. An LSTM classifier uses data inputs from the robot, human, and safety algorithms to continually choose a backup controller in real-time. We demonstrate our method's efficacy on a dual-track robot in obstacle avoidance scenarios. Our framework guarantees robot safety while adhering to driver intention. △ Less

Submitted 7 March, 2024; v1 submitted 9 October, 2023; originally announced October 2023.

Comments: Accepted to the International Conference on Robotics and Automation 2024

arXiv:2309.08766 [pdf, other]

The Fractal Hand-II: Reviving a Classic Mechanism for Contemporary Grasping Challenges

Authors: Malcolm G. A. Tisdale, Joel W. Burdick

Abstract: This paper, and its companion, propose a new fractal robotic gripper, drawing inspiration from the century-old Fractal Vise. The unusual synergistic properties allow it to passively conform to diverse objects using only one actuator. Designed to be easily integrated with prevailing parallel jaw grippers, it alleviates the complexities tied to perception and grasp planning, especially when dealing… ▽ More This paper, and its companion, propose a new fractal robotic gripper, drawing inspiration from the century-old Fractal Vise. The unusual synergistic properties allow it to passively conform to diverse objects using only one actuator. Designed to be easily integrated with prevailing parallel jaw grippers, it alleviates the complexities tied to perception and grasp planning, especially when dealing with unpredictable object poses and geometries. We build on the foundational principles of the Fractal Vise to a broader class of gripping mechanisms, and also address the limitations that had led to its obscurity. Two Fractal Fingers, coupled by a closing actuator, can form an adaptive and synergistic Fractal Hand. We articulate a design methodology for low cost, easy to fabricate, large workspace, and compliant Fractal Fingers. The companion paper delves into the kinematics and grasping properties of a specific class of Fractal Fingers and Hands. △ Less

Submitted 15 September, 2023; originally announced September 2023.

Comments: This paper is prepared for ICRA 2024

arXiv:2303.03658 [pdf, ps, other]

An Active Learning Based Robot Kinematic Calibration Framework Using Gaussian Processes

Authors: Ersin Daş, Joel W. Burdick

Abstract: Future NASA lander missions to icy moons will require completely automated, accurate, and data efficient calibration methods for the robot manipulator arms that sample icy terrains in the lander's vicinity. To support this need, this paper presents a Gaussian Process (GP) approach to the classical manipulator kinematic calibration process. Instead of identifying a corrected set of Denavit-Hartenbe… ▽ More Future NASA lander missions to icy moons will require completely automated, accurate, and data efficient calibration methods for the robot manipulator arms that sample icy terrains in the lander's vicinity. To support this need, this paper presents a Gaussian Process (GP) approach to the classical manipulator kinematic calibration process. Instead of identifying a corrected set of Denavit-Hartenberg kinematic parameters, a set of GPs models the residual kinematic error of the arm over the workspace. More importantly, this modeling framework allows a Gaussian Process Upper Confident Bound (GP-UCB) algorithm to efficiently and adaptively select the calibration's measurement points so as to minimize the number of experiments, and therefore minimize the time needed for recalibration. The method is demonstrated in simulation on a simple 2-DOF arm, a 6 DOF arm whose geometry is a candidate for a future NASA mission, and a 7 DOF Barrett WAM arm. △ Less

Submitted 7 March, 2023; originally announced March 2023.

arXiv:2303.01614 [pdf, other]

doi 10.55417/fr.2024006

STEP: Stochastic Traversability Evaluation and Planning for Risk-Aware Off-road Navigation; Results from the DARPA Subterranean Challenge

Authors: Anushri Dixit, David D. Fan, Kyohei Otsu, Sharmita Dey, Ali-Akbar Agha-Mohammadi, Joel W. Burdick

Abstract: Although autonomy has gained widespread usage in structured and controlled environments, robotic autonomy in unknown and off-road terrain remains a difficult problem. Extreme, off-road, and unstructured environments such as undeveloped wilderness, caves, rubble, and other post-disaster sites pose unique and challenging problems for autonomous navigation. Based on our participation in the DARPA Sub… ▽ More Although autonomy has gained widespread usage in structured and controlled environments, robotic autonomy in unknown and off-road terrain remains a difficult problem. Extreme, off-road, and unstructured environments such as undeveloped wilderness, caves, rubble, and other post-disaster sites pose unique and challenging problems for autonomous navigation. Based on our participation in the DARPA Subterranean Challenge, we propose an approach to improve autonomous traversal of robots in subterranean environments that are perceptually degraded and completely unknown through a traversability and planning framework called STEP (Stochastic Traversability Evaluation and Planning). We present 1) rapid uncertainty-aware mapping and traversability evaluation, 2) tail risk assessment using the Conditional Value-at-Risk (CVaR), 3) efficient risk and constraint-aware kinodynamic motion planning using sequential quadratic programming-based (SQP) model predictive control (MPC), 4) fast recovery behaviors to account for unexpected scenarios that may cause failure, and 5) risk-based gait adaptation for quadrupedal robots. We illustrate and validate extensive results from our experiments on wheeled and legged robotic platforms in field studies at the Valentine Cave, CA (cave environment), Kentucky Underground, KY (mine environment), and Louisville Mega Cavern, KY (final competition site for the DARPA Subterranean Challenge with tunnel, urban, and cave environments). △ Less

Submitted 2 March, 2023; originally announced March 2023.

Comments: arXiv admin note: substantial text overlap with arXiv:2103.02828

Journal ref: Field Robotics, 4, 2024, 182-210

arXiv:2302.13687 [pdf, other]

FRoGGeR: Fast Robust Grasp Generation via the Min-Weight Metric

Authors: Albert H. Li, Preston Culbertson, Joel W. Burdick, Aaron D. Ames

Abstract: Many approaches to grasp synthesis optimize analytic quality metrics that measure grasp robustness based on finger placements and local surface geometry. However, generating feasible dexterous grasps by optimizing these metrics is slow, often taking minutes. To address this issue, this paper presents FRoGGeR: a method that quickly generates robust precision grasps using the min-weight metric, a no… ▽ More Many approaches to grasp synthesis optimize analytic quality metrics that measure grasp robustness based on finger placements and local surface geometry. However, generating feasible dexterous grasps by optimizing these metrics is slow, often taking minutes. To address this issue, this paper presents FRoGGeR: a method that quickly generates robust precision grasps using the min-weight metric, a novel, almost-everywhere differentiable approximation of the classical epsilon grasp metric. The min-weight metric is simple and interpretable, provides a reasonable measure of grasp robustness, and admits numerically efficient gradients for smooth optimization. We leverage these properties to rapidly synthesize collision-free robust grasps - typically in less than a second. FRoGGeR can refine the candidate grasps generated by other methods (heuristic, data-driven, etc.) and is compatible with many object representations (SDFs, meshes, etc.). We study FRoGGeR's performance on over 40 objects drawn from the YCB dataset, outperforming a competitive baseline in computation time, feasibility rate of grasp synthesis, and picking success in simulation. We conclude that FRoGGeR is fast: it has a median synthesis time of 0.834s over hundreds of experiments. △ Less

Submitted 24 July, 2023; v1 submitted 27 February, 2023; originally announced February 2023.

Comments: Accepted at IROS 2023. The arXiv version contains the appendix, which does not appear in the conference version

arXiv:2212.06253 [pdf, other]

Learning Disturbances Online for Risk-Aware Control: Risk-Aware Flight with Less Than One Minute of Data

Authors: Prithvi Akella, Skylar X. Wei, Joel W. Burdick, Aaron D. Ames

Abstract: Recent advances in safety-critical risk-aware control are predicated on apriori knowledge of the disturbances a system might face. This paper proposes a method to efficiently learn these disturbances online, in a risk-aware context. First, we introduce the concept of a Surface-at-Risk, a risk measure for stochastic processes that extends Value-at-Risk -- a commonly utilized risk measure in the ris… ▽ More Recent advances in safety-critical risk-aware control are predicated on apriori knowledge of the disturbances a system might face. This paper proposes a method to efficiently learn these disturbances online, in a risk-aware context. First, we introduce the concept of a Surface-at-Risk, a risk measure for stochastic processes that extends Value-at-Risk -- a commonly utilized risk measure in the risk-aware controls community. Second, we model the norm of the state discrepancy between the model and the true system evolution as a scalar-valued stochastic process and determine an upper bound to its Surface-at-Risk via Gaussian Process Regression. Third, we provide theoretical results on the accuracy of our fitted surface subject to mild assumptions that are verifiable with respect to the data sets collected during system operation. Finally, we experimentally verify our procedure by augmenting a drone's controller and highlight performance increases achieved via our risk-aware approach after collecting less than a minute of operating data. △ Less

Submitted 12 December, 2022; originally announced December 2022.

arXiv:2212.00278 [pdf, other]

Adaptive Conformal Prediction for Motion Planning among Dynamic Agents

Authors: Anushri Dixit, Lars Lindemann, Skylar Wei, Matthew Cleaveland, George J. Pappas, Joel W. Burdick

Abstract: This paper proposes an algorithm for motion planning among dynamic agents using adaptive conformal prediction. We consider a deterministic control system and use trajectory predictors to predict the dynamic agents' future motion, which is assumed to follow an unknown distribution. We then leverage ideas from adaptive conformal prediction to dynamically quantify prediction uncertainty from an onlin… ▽ More This paper proposes an algorithm for motion planning among dynamic agents using adaptive conformal prediction. We consider a deterministic control system and use trajectory predictors to predict the dynamic agents' future motion, which is assumed to follow an unknown distribution. We then leverage ideas from adaptive conformal prediction to dynamically quantify prediction uncertainty from an online data stream. Particularly, we provide an online algorithm uses delayed agent observations to obtain uncertainty sets for multistep-ahead predictions with probabilistic coverage. These uncertainty sets are used within a model predictive controller to safely navigate among dynamic agents. While most existing data-driven prediction approached quantify prediction uncertainty heuristically, we quantify the true prediction uncertainty in a distribution-free, adaptive manner that even allows to capture changes in prediction quality and the agents' motion. We empirically evaluate of our algorithm on a simulation case studies where a drone avoids a flying frisbee. △ Less

Submitted 30 November, 2022; originally announced December 2022.

arXiv:2204.09833 [pdf, other]

Sample-Based Bounds for Coherent Risk Measures: Applications to Policy Synthesis and Verification

Authors: Prithvi Akella, Anushri Dixit, Mohamadreza Ahmadi, Joel W. Burdick, Aaron D. Ames

Abstract: The dramatic increase of autonomous systems subject to variable environments has given rise to the pressing need to consider risk in both the synthesis and verification of policies for these systems. This paper aims to address a few problems regarding risk-aware verification and policy synthesis, by first developing a sample-based method to bound the risk measure evaluation of a random variable wh… ▽ More The dramatic increase of autonomous systems subject to variable environments has given rise to the pressing need to consider risk in both the synthesis and verification of policies for these systems. This paper aims to address a few problems regarding risk-aware verification and policy synthesis, by first developing a sample-based method to bound the risk measure evaluation of a random variable whose distribution is unknown. These bounds permit us to generate high-confidence verification statements for a large class of robotic systems. Second, we develop a sample-based method to determine solutions to non-convex optimization problems that outperform a large fraction of the decision space of possible solutions. Both sample-based approaches then permit us to rapidly synthesize risk-aware policies that are guaranteed to achieve a minimum level of system performance. To showcase our approach in simulation, we verify a cooperative multi-agent system and develop a risk-aware controller that outperforms the system's baseline controller. We also mention how our approach can be extended to account for any $g$-entropic risk measure - the subset of coherent risk measures on which we focus. △ Less

Submitted 20 April, 2022; originally announced April 2022.

arXiv:2204.09596 [pdf, other]

doi 10.1016/j.artint.2023.104018

Risk-Averse Receding Horizon Motion Planning for Obstacle Avoidance using Coherent Risk Measures

Authors: Anushri Dixit, Mohamadreza Ahmadi, Joel W. Burdick

Abstract: This paper studies the problem of risk-averse receding horizon motion planning for agents with uncertain dynamics, in the presence of stochastic, dynamic obstacles. We propose a model predictive control (MPC) scheme that formulates the obstacle avoidance constraint using coherent risk measures. To handle disturbances, or process noise, in the state dynamics, the state constraints are tightened in… ▽ More This paper studies the problem of risk-averse receding horizon motion planning for agents with uncertain dynamics, in the presence of stochastic, dynamic obstacles. We propose a model predictive control (MPC) scheme that formulates the obstacle avoidance constraint using coherent risk measures. To handle disturbances, or process noise, in the state dynamics, the state constraints are tightened in a risk-aware manner to provide a disturbance feedback policy. We also propose a waypoint following algorithm that uses the proposed MPC scheme for discrete distributions and prove its risk-sensitive recursive feasibility while guaranteeing finite-time task completion. We further investigate some commonly used coherent risk metrics, namely, conditional value-at-risk (CVaR), entropic value-at-risk (EVaR), and g-entropic risk measures, and propose a tractable incorporation within MPC. We illustrate our framework via simulation studies. △ Less

Submitted 28 September, 2023; v1 submitted 20 April, 2022; originally announced April 2022.

Comments: Accepted to Artificial Intelligence Journal, Special Issue on Risk-aware Autonomous Systems: Theory and Practice. arXiv admin note: text overlap with arXiv:2011.11211

Journal ref: Artificial Intelligence, 325, 2023, 104018

arXiv:2203.14913 [pdf, other]

Moving Obstacle Avoidance: a Data-Driven Risk-Aware Approach

Authors: Skylar X. Wei, Anushri Dixit, Shashank Tomar, Joel W. Burdick

Abstract: This paper proposes a new structured method for a moving agent to predict the paths of dynamically moving obstacles and avoid them using a risk-aware model predictive control (MPC) scheme. Given noisy measurements of the a priori unknown obstacle trajectory, a bootstrapping technique predicts a set of obstacle trajectories. The bootstrapped predictions are incorporated in the MPC optimization usin… ▽ More This paper proposes a new structured method for a moving agent to predict the paths of dynamically moving obstacles and avoid them using a risk-aware model predictive control (MPC) scheme. Given noisy measurements of the a priori unknown obstacle trajectory, a bootstrapping technique predicts a set of obstacle trajectories. The bootstrapped predictions are incorporated in the MPC optimization using a risk-aware methodology so as to provide probabilistic guarantees on obstacle avoidance. We validate our methods using simulations of a 3-dimensional multi-rotor drone that avoids various moving obstacles, such as a thrown ball and a frisbee with air drag. △ Less

Submitted 25 March, 2022; originally announced March 2022.

Comments: This is prepared for IEEE Control Systems Letters (L-CSS) 2022

arXiv:2203.12062 [pdf, other]

Distributionally Robust Model Predictive Control with Total Variation Distance

Authors: Anushri Dixit, Mohamadreza Ahmadi, Joel W. Burdick

Abstract: This paper studies the problem of distributionally robust model predictive control (MPC) using total variation distance ambiguity sets. For a discrete-time linear system with additive disturbances, we provide a conditional value-at-risk reformulation of the MPC optimization problem that is distributionally robust in the expected cost and chance constraints. The distributionally robust chance const… ▽ More This paper studies the problem of distributionally robust model predictive control (MPC) using total variation distance ambiguity sets. For a discrete-time linear system with additive disturbances, we provide a conditional value-at-risk reformulation of the MPC optimization problem that is distributionally robust in the expected cost and chance constraints. The distributionally robust chance constraint is over-approximated as a simpler, tightened chance constraint that reduces the computational burden. Numerical experiments support our results on probabilistic guarantees and computational efficiency. △ Less

Submitted 24 June, 2022; v1 submitted 22 March, 2022; originally announced March 2022.

Comments: Accepted to LCSS

arXiv:2110.10341 [pdf, other]

Quadrotor Trajectory Tracking with Learned Dynamics: Joint Koopman-based Learning of System Models and Function Dictionaries

Authors: Carl Folkestad, Skylar X. Wei, Joel W. Burdick

Abstract: Nonlinear dynamical effects are crucial to the operation of many agile robotic systems. Koopman-based model learning methods can capture these nonlinear dynamical system effects in higher dimensional lifted bilinear models that are amenable to optimal control. However, standard methods that lift the system state using a fixed function dictionary before model learning result in high dimensional mod… ▽ More Nonlinear dynamical effects are crucial to the operation of many agile robotic systems. Koopman-based model learning methods can capture these nonlinear dynamical system effects in higher dimensional lifted bilinear models that are amenable to optimal control. However, standard methods that lift the system state using a fixed function dictionary before model learning result in high dimensional models that are intractable for real time control. This paper presents a novel method that jointly learns a function dictionary and lifted bilinear model purely from data by incorporating the Koopman model in a neural network architecture. Nonlinear MPC design utilizing the learned model can be performed readily. We experimentally realized this method on a multirotor drone for agile trajectory tracking at low altitudes where the aerodynamic ground effect influences the system's behavior. Experimental results demonstrate that the learning-based controller achieves similar performance as a nonlinear MPC based on a nominal dynamics model in medium altitude. However, our learning-based system can reliably track trajectories in near-ground flight regimes while the nominal controller crashes due to unmodeled dynamical effects that are captured by our method. △ Less

Submitted 19 October, 2021; originally announced October 2021.

Comments: arXiv admin note: text overlap with arXiv:2105.08036

arXiv:2105.08036 [pdf, other]

Koopman NMPC: Koopman-based Learning and Nonlinear Model Predictive Control of Control-affine Systems

Authors: Carl Folkestad, Joel W. Burdick

Abstract: Koopman-based learning methods can potentially be practical and powerful tools for dynamical robotic systems. However, common methods to construct Koopman representations seek to learn lifted linear models that cannot capture nonlinear actuation effects inherent in many robotic systems. This paper presents a learning and control methodology that is a first step towards overcoming this limitation.… ▽ More Koopman-based learning methods can potentially be practical and powerful tools for dynamical robotic systems. However, common methods to construct Koopman representations seek to learn lifted linear models that cannot capture nonlinear actuation effects inherent in many robotic systems. This paper presents a learning and control methodology that is a first step towards overcoming this limitation. Using the Koopman canonical transform, control-affine dynamics can be expressed by a lifted bilinear model. The learned model is used for nonlinear model predictive control (NMPC) design where the bilinear structure can be exploited to improve computational efficiency. The benefits for control-affine dynamics compared to existing Koopman-based methods are highlighted through an example of a simulated planar quadrotor. Prediction error is greatly reduced and closed loop performance similar to NMPC with full model knowledge is achieved. △ Less

Submitted 17 May, 2021; originally announced May 2021.

arXiv:2103.14727 [pdf, other]

Risk-Averse Stochastic Shortest Path Planning

Authors: Mohamadreza Ahmadi, Anushri Dixit, Joel W. Burdick, Aaron D. Ames

Abstract: We consider the stochastic shortest path planning problem in MDPs, i.e., the problem of designing policies that ensure reaching a goal state from a given initial state with minimum accrued cost. In order to account for rare but important realizations of the system, we consider a nested dynamic coherent risk total cost functional rather than the conventional risk-neutral total expected cost. Under… ▽ More We consider the stochastic shortest path planning problem in MDPs, i.e., the problem of designing policies that ensure reaching a goal state from a given initial state with minimum accrued cost. In order to account for rare but important realizations of the system, we consider a nested dynamic coherent risk total cost functional rather than the conventional risk-neutral total expected cost. Under some assumptions, we show that optimal, stationary, Markovian policies exist and can be found via a special Bellman's equation. We propose a computational technique based on difference convex programs (DCPs) to find the associated value functions and therefore the risk-averse policies. A rover navigation MDP is used to illustrate the proposed methodology with conditional-value-at-risk (CVaR) and entropic-value-at-risk (EVaR) coherent risk measures. △ Less

Submitted 26 March, 2021; originally announced March 2021.

arXiv:2103.03388 [pdf, other]

Limits of Probabilistic Safety Guarantees when Considering Human Uncertainty

Authors: Richard Cheng, Richard M. Murray, Joel W. Burdick

Abstract: When autonomous robots interact with humans, such as during autonomous driving, explicit safety guarantees are crucial in order to avoid potentially life-threatening accidents. Many data-driven methods have explored learning probabilistic bounds over human agents' trajectories (i.e. confidence tubes that contain trajectories with probability $δ$), which can then be used to guarantee safety with pr… ▽ More When autonomous robots interact with humans, such as during autonomous driving, explicit safety guarantees are crucial in order to avoid potentially life-threatening accidents. Many data-driven methods have explored learning probabilistic bounds over human agents' trajectories (i.e. confidence tubes that contain trajectories with probability $δ$), which can then be used to guarantee safety with probability $1-δ$. However, almost all existing works consider $δ\geq 0.001$. The purpose of this paper is to argue that (1) in safety-critical applications, it is necessary to provide safety guarantees with $δ< 10^{-8}$, and (2) current learning-based methods are ill-equipped to compute accurate confidence bounds at such low $δ$. Using human driving data (from the highD dataset), as well as synthetically generated data, we show that current uncertainty models use inaccurate distributional assumptions to describe human behavior and/or require infeasible amounts of data to accurately learn confidence bounds for $δ\leq 10^{-8}$. These two issues result in unreliable confidence bounds, which can have dangerous implications if deployed on safety-critical systems. △ Less

Submitted 24 March, 2021; v1 submitted 4 March, 2021; originally announced March 2021.

Comments: ICRA 2021

arXiv:2102.09119 [pdf, ps, other]

Learning Invariant Representation of Tasks for Robust Surgical State Estimation

Authors: Yidan Qin, Max Allan, Yisong Yue, Joel W. Burdick, Mahdi Azizian

Abstract: Surgical state estimators in robot-assisted surgery (RAS) - especially those trained via learning techniques - rely heavily on datasets that capture surgeon actions in laboratory or real-world surgical tasks. Real-world RAS datasets are costly to acquire, are obtained from multiple surgeons who may use different surgical strategies, and are recorded under uncontrolled conditions in highly complex… ▽ More Surgical state estimators in robot-assisted surgery (RAS) - especially those trained via learning techniques - rely heavily on datasets that capture surgeon actions in laboratory or real-world surgical tasks. Real-world RAS datasets are costly to acquire, are obtained from multiple surgeons who may use different surgical strategies, and are recorded under uncontrolled conditions in highly complex environments. The combination of high diversity and limited data calls for new learning methods that are robust and invariant to operating conditions and surgical techniques. We propose StiseNet, a Surgical Task Invariance State Estimation Network with an invariance induction framework that minimizes the effects of variations in surgical technique and operating environments inherent to RAS datasets. StiseNet's adversarial architecture learns to separate nuisance factors from information needed for surgical state estimation. StiseNet is shown to outperform state-of-the-art state estimation methods on three datasets (including a new real-world RAS dataset: HERNIA-20). △ Less

Submitted 17 February, 2021; originally announced February 2021.

Comments: Accepted to IEEE Robotics & Automation Letters

arXiv:2011.11211 [pdf, other]

Risk-Sensitive Motion Planning using Entropic Value-at-Risk

Authors: Anushri Dixit, Mohamadreza Ahmadi, Joel W. Burdick

Abstract: We consider the problem of risk-sensitive motion planning in the presence of randomly moving obstacles. To this end, we adopt a model predictive control (MPC) scheme and pose the obstacle avoidance constraint in the MPC problem as a distributionally robust constraint with a KL divergence ambiguity set. This constraint is the dual representation of the Entropic Value-at-Risk (EVaR). Building upon t… ▽ More We consider the problem of risk-sensitive motion planning in the presence of randomly moving obstacles. To this end, we adopt a model predictive control (MPC) scheme and pose the obstacle avoidance constraint in the MPC problem as a distributionally robust constraint with a KL divergence ambiguity set. This constraint is the dual representation of the Entropic Value-at-Risk (EVaR). Building upon this viewpoint, we propose an algorithm to follow waypoints and discuss its feasibility and completion in finite time. We compare the policies obtained using EVaR with those obtained using another common coherent risk measure, Conditional Value-at-Risk (CVaR), via numerical experiments for a 2D system. We also implement the waypoint following algorithm on a 3D quadcopter simulation. △ Less

Submitted 10 April, 2021; v1 submitted 23 November, 2020; originally announced November 2020.

Comments: Accepted to 2021 European Control Conference (ECC)

Journal ref: European Control Conference (ECC) 2021

arXiv:2011.04812 [pdf, other]

doi 10.1109/ICRA48506.2021.9560840

ROIAL: Region of Interest Active Learning for Characterizing Exoskeleton Gait Preference Landscapes

Authors: Kejun Li, Maegan Tucker, Erdem Bıyık, Ellen Novoseller, Joel W. Burdick, Yanan Sui, Dorsa Sadigh, Yisong Yue, Aaron D. Ames

Abstract: Characterizing what types of exoskeleton gaits are comfortable for users, and understanding the science of walking more generally, require recovering a user's utility landscape. Learning these landscapes is challenging, as walking trajectories are defined by numerous gait parameters, data collection from human trials is expensive, and user safety and comfort must be ensured. This work proposes the… ▽ More Characterizing what types of exoskeleton gaits are comfortable for users, and understanding the science of walking more generally, require recovering a user's utility landscape. Learning these landscapes is challenging, as walking trajectories are defined by numerous gait parameters, data collection from human trials is expensive, and user safety and comfort must be ensured. This work proposes the Region of Interest Active Learning (ROIAL) framework, which actively learns each user's underlying utility function over a region of interest that ensures safety and comfort. ROIAL learns from ordinal and preference feedback, which are more reliable feedback mechanisms than absolute numerical scores. The algorithm's performance is evaluated both in simulation and experimentally for three non-disabled subjects walking inside of a lower-body exoskeleton. ROIAL learns Bayesian posteriors that predict each exoskeleton user's utility landscape across four exoskeleton gait parameters. The algorithm discovers both commonalities and discrepancies across users' gait preferences and identifies the gait parameters that most influenced user feedback. These results demonstrate the feasibility of recovering gait utility landscapes from limited human trials. △ Less

Submitted 30 March, 2021; v1 submitted 9 November, 2020; originally announced November 2020.

Comments: 6 pages + 1 page of references; 7 figures; To Appear at ICRA 2021

arXiv:2009.11937 [pdf, ps, other]

daVinciNet: Joint Prediction of Motion and Surgical State in Robot-Assisted Surgery

Authors: Yidan Qin, Seyedshams Feyzabadi, Max Allan, Joel W. Burdick, Mahdi Azizian

Abstract: This paper presents a technique to concurrently and jointly predict the future trajectories of surgical instruments and the future state(s) of surgical subtasks in robot-assisted surgeries (RAS) using multiple input sources. Such predictions are a necessary first step towards shared control and supervised autonomy of surgical subtasks. Minute-long surgical subtasks, such as suturing or ultrasound… ▽ More This paper presents a technique to concurrently and jointly predict the future trajectories of surgical instruments and the future state(s) of surgical subtasks in robot-assisted surgeries (RAS) using multiple input sources. Such predictions are a necessary first step towards shared control and supervised autonomy of surgical subtasks. Minute-long surgical subtasks, such as suturing or ultrasound scanning, often have distinguishable tool kinematics and visual features, and can be described as a series of fine-grained states with transition schematics. We propose daVinciNet - an end-to-end dual-task model for robot motion and surgical state predictions. daVinciNet performs concurrent end-effector trajectory and surgical state predictions using features extracted from multiple data streams, including robot kinematics, endoscopic vision, and system events. We evaluate our proposed model on an extended Robotic Intra-Operative Ultrasound (RIOUS+) imaging dataset collected on a da Vinci Xi surgical system and the JHU-ISI Gesture and Skill Assessment Working Set (JIGSAWS). Our model achieves up to 93.85% short-term (0.5s) and 82.11% long-term (2s) state prediction accuracy, as well as 1.07mm short-term and 5.62mm long-term trajectory prediction error. △ Less

Submitted 24 September, 2020; originally announced September 2020.

Comments: Accepted to IROS 2020

arXiv:2004.05273 [pdf, other]

Safe Multi-Agent Interaction through Robust Control Barrier Functions with Learned Uncertainties

Authors: Richard Cheng, Mohammad Javad Khojasteh, Aaron D. Ames, Joel W. Burdick

Abstract: Robots operating in real world settings must navigate and maintain safety while interacting with many heterogeneous agents and obstacles. Multi-Agent Control Barrier Functions (CBF) have emerged as a computationally efficient tool to guarantee safety in multi-agent environments, but they assume perfect knowledge of both the robot dynamics and other agents' dynamics. While knowledge of the robot's… ▽ More Robots operating in real world settings must navigate and maintain safety while interacting with many heterogeneous agents and obstacles. Multi-Agent Control Barrier Functions (CBF) have emerged as a computationally efficient tool to guarantee safety in multi-agent environments, but they assume perfect knowledge of both the robot dynamics and other agents' dynamics. While knowledge of the robot's dynamics might be reasonably well known, the heterogeneity of agents in real-world environments means there will always be considerable uncertainty in our prediction of other agents' dynamics. This work aims to learn high-confidence bounds for these dynamic uncertainties using Matrix-Variate Gaussian Process models, and incorporates them into a robust multi-agent CBF framework. We transform the resulting min-max robust CBF into a quadratic program, which can be efficiently solved in real time. We verify via simulation results that the nominal multi-agent CBF is often violated during agent interactions, whereas our robust formulation maintains safety with a much higher probability and adapts to learned uncertainties △ Less

Submitted 22 September, 2020; v1 submitted 10 April, 2020; originally announced April 2020.

Journal ref: 59th IEEE Conference on Decision and Control (CDC 2020)

arXiv:2003.09267 [pdf, other]

Barrier Functions for Multiagent-POMDPs with DTL Specifications

Authors: Mohamadreza Ahmadi, Andrew Singletary, Joel W. Burdick, Aaron D. Ames

Abstract: Multi-agent partially observable Markov decision processes (MPOMDPs) provide a framework to represent heterogeneous autonomous agents subject to uncertainty and partial observation. In this paper, given a nominal policy provided by a human operator or a conventional planning method, we propose a technique based on barrier functions to design a minimally interfering safety-shield ensuring satisfact… ▽ More Multi-agent partially observable Markov decision processes (MPOMDPs) provide a framework to represent heterogeneous autonomous agents subject to uncertainty and partial observation. In this paper, given a nominal policy provided by a human operator or a conventional planning method, we propose a technique based on barrier functions to design a minimally interfering safety-shield ensuring satisfaction of high-level specifications in terms of linear distribution temporal logic (LDTL). To this end, we use sufficient and necessary conditions for the invariance of a given set based on discrete-time barrier functions (DTBFs) and formulate sufficient conditions for finite time DTBF to study finite time convergence to a set. We then show that different LDTL mission/safety specifications can be cast as a set of invariance or finite time reachability problems. We demonstrate that the proposed method for safety-shield synthesis can be implemented online by a sequence of one-step greedy algorithms. We demonstrate the efficacy of the proposed method using experiments involving a team of robots. △ Less

Submitted 18 March, 2020; originally announced March 2020.

Comments: arXiv admin note: text overlap with arXiv:1903.07823

arXiv:2003.06495 [pdf, other]

Human Preference-Based Learning for High-dimensional Optimization of Exoskeleton Walking Gaits

Authors: Maegan Tucker, Myra Cheng, Ellen Novoseller, Richard Cheng, Yisong Yue, Joel W. Burdick, Aaron D. Ames

Abstract: Optimizing lower-body exoskeleton walking gaits for user comfort requires understanding users' preferences over a high-dimensional gait parameter space. However, existing preference-based learning methods have only explored low-dimensional domains due to computational limitations. To learn user preferences in high dimensions, this work presents LineCoSpar, a human-in-the-loop preference-based fram… ▽ More Optimizing lower-body exoskeleton walking gaits for user comfort requires understanding users' preferences over a high-dimensional gait parameter space. However, existing preference-based learning methods have only explored low-dimensional domains due to computational limitations. To learn user preferences in high dimensions, this work presents LineCoSpar, a human-in-the-loop preference-based framework that enables optimization over many parameters by iteratively exploring one-dimensional subspaces. Additionally, this work identifies gait attributes that characterize broader preferences across users. In simulations and human trials, we empirically verify that LineCoSpar is a sample-efficient approach for high-dimensional preference optimization. Our analysis of the experimental data reveals a correspondence between human preferences and objective measures of dynamicity, while also highlighting differences in the utility functions underlying individual users' gait preferences. This result has implications for exoskeleton gait synthesis, an active field with applications to clinical use and patient rehabilitation. △ Less

Submitted 8 August, 2020; v1 submitted 13 March, 2020; originally announced March 2020.

Comments: 8 pages, 9 figures, 2 tables, to appear at IROS 2020

arXiv:2002.02921 [pdf, other]

Temporal Segmentation of Surgical Sub-tasks through Deep Learning with Multiple Data Sources

Authors: Yidan Qin, Sahba Aghajani Pedram, Seyedshams Feyzabadi, Max Allan, A. Jonathan McLeod, Joel W. Burdick, Mahdi Azizian

Abstract: Many tasks in robot-assisted surgeries (RAS) can be represented by finite-state machines (FSMs), where each state represents either an action (such as picking up a needle) or an observation (such as bleeding). A crucial step towards the automation of such surgical tasks is the temporal perception of the current surgical scene, which requires a real-time estimation of the states in the FSMs. The ob… ▽ More Many tasks in robot-assisted surgeries (RAS) can be represented by finite-state machines (FSMs), where each state represents either an action (such as picking up a needle) or an observation (such as bleeding). A crucial step towards the automation of such surgical tasks is the temporal perception of the current surgical scene, which requires a real-time estimation of the states in the FSMs. The objective of this work is to estimate the current state of the surgical task based on the actions performed or events occurred as the task progresses. We propose Fusion-KVE, a unified surgical state estimation model that incorporates multiple data sources including the Kinematics, Vision, and system Events. Additionally, we examine the strengths and weaknesses of different state estimation models in segmenting states with different representative features or levels of granularity. We evaluate our model on the JHU-ISI Gesture and Skill Assessment Working Set (JIGSAWS), as well as a more complex dataset involving robotic intra-operative ultrasound (RIOUS) imaging, created using the da Vinci Xi surgical system. Our model achieves a superior frame-wise state estimation accuracy up to 89.4%, which improves the state-of-the-art surgical state estimation models in both JIGSAWS suturing dataset and our RIOUS dataset. △ Less

Submitted 7 February, 2020; originally announced February 2020.

Comments: Accepted to ICRA 2020

arXiv:2001.07679 [pdf, other]

Stochastic Finite State Control of POMDPs with LTL Specifications

Authors: Mohamadreza Ahmadi, Rangoli Sharan, Joel W. Burdick

Abstract: Partially observable Markov decision processes (POMDPs) provide a modeling framework for autonomous decision making under uncertainty and imperfect sensing, e.g. robot manipulation and self-driving cars. However, optimal control of POMDPs is notoriously intractable. This paper considers the quantitative problem of synthesizing sub-optimal stochastic finite state controllers (sFSCs) for POMDPs such… ▽ More Partially observable Markov decision processes (POMDPs) provide a modeling framework for autonomous decision making under uncertainty and imperfect sensing, e.g. robot manipulation and self-driving cars. However, optimal control of POMDPs is notoriously intractable. This paper considers the quantitative problem of synthesizing sub-optimal stochastic finite state controllers (sFSCs) for POMDPs such that the probability of satisfying a set of high-level specifications in terms of linear temporal logic (LTL) formulae is maximized. We begin by casting the latter problem into an optimization and use relaxations based on the Poisson equation and McCormick envelopes. Then, we propose an stochastic bounded policy iteration algorithm, leading to a controlled growth in sFSC size and an any time algorithm, where the performance of the controller improves with successive iterations, but can be stopped by the user based on time or memory considerations. We illustrate the proposed method by a robot navigation case study. △ Less

Submitted 21 January, 2020; originally announced January 2020.

arXiv:1909.10209 [pdf, other]

Energy-Efficient Motion Planning for Multi-Modal Hybrid Locomotion

Authors: H. J. Terry Suh, Xiaobin Xiong, Andrew Singletary, Aaron D. Ames, Joel W. Burdick

Abstract: Hybrid locomotion, which combines multiple modalities of locomotion within a single robot, enables robots to carry out complex tasks in diverse environments. This paper presents a novel method for planning multi-modal locomotion trajectories using approximate dynamic programming. We formulate this problem as a shortest-path search through a state-space graph, where the edge cost is assigned as opt… ▽ More Hybrid locomotion, which combines multiple modalities of locomotion within a single robot, enables robots to carry out complex tasks in diverse environments. This paper presents a novel method for planning multi-modal locomotion trajectories using approximate dynamic programming. We formulate this problem as a shortest-path search through a state-space graph, where the edge cost is assigned as optimal transport cost along each segment. This cost is approximated from batches of offline trajectory optimizations, which allows the complex effects of vehicle under-actuation and dynamic constraints to be approximately captured in a tractable way. Our method is illustrated on a hybrid double-integrator, an amphibious robot, and a flying-driving drone, showing the practicality of the approach. △ Less

Submitted 4 August, 2020; v1 submitted 23 September, 2019; originally announced September 2019.

Comments: Accepted to International Conference on Intelligent Robots and Systems (IROS) 2020

arXiv:1908.01289 [pdf, other]

Dueling Posterior Sampling for Preference-Based Reinforcement Learning

Authors: Ellen R. Novoseller, Yibing Wei, Yanan Sui, Yisong Yue, Joel W. Burdick

Abstract: In preference-based reinforcement learning (RL), an agent interacts with the environment while receiving preferences instead of absolute feedback. While there is increasing research activity in preference-based RL, the design of formal frameworks that admit tractable theoretical analysis remains an open challenge. Building upon ideas from preference-based bandit learning and posterior sampling in… ▽ More In preference-based reinforcement learning (RL), an agent interacts with the environment while receiving preferences instead of absolute feedback. While there is increasing research activity in preference-based RL, the design of formal frameworks that admit tractable theoretical analysis remains an open challenge. Building upon ideas from preference-based bandit learning and posterior sampling in RL, we present DUELING POSTERIOR SAMPLING (DPS), which employs preference-based posterior sampling to learn both the system dynamics and the underlying utility function that governs the preference feedback. As preference feedback is provided on trajectories rather than individual state-action pairs, we develop a Bayesian approach for the credit assignment problem, translating preferences to a posterior distribution over state-action reward models. We prove an asymptotic Bayesian no-regret rate for DPS with a Bayesian linear regression credit assignment model. This is the first regret guarantee for preference-based RL to our knowledge. We also discuss possible avenues for extending the proof methodology to other credit assignment models. Finally, we evaluate the approach empirically, showing competitive performance against existing baselines. △ Less

Submitted 29 June, 2020; v1 submitted 4 August, 2019; originally announced August 2019.

Comments: To appear in Conference on Uncertainty in Artificial Intelligence (UAI), 2020. 9 pages before references and appendix; 51 pages total; 7 figures; 4 tables. This replacement incorporates reviewer comments, and in comparison to version 1, extends the theoretical and empirical analyses and adds mathematical detail. Code: https://github.com/ernovoseller/DuelingPosteriorSampling

arXiv:1905.05380 [pdf, other]

Control Regularization for Reduced Variance Reinforcement Learning

Authors: Richard Cheng, Abhinav Verma, Gabor Orosz, Swarat Chaudhuri, Yisong Yue, Joel W. Burdick

Abstract: Dealing with high variance is a significant challenge in model-free reinforcement learning (RL). Existing methods are unreliable, exhibiting high variance in performance from run to run using different initializations/seeds. Focusing on problems arising in continuous control, we propose a functional regularization approach to augmenting model-free RL. In particular, we regularize the behavior of t… ▽ More Dealing with high variance is a significant challenge in model-free reinforcement learning (RL). Existing methods are unreliable, exhibiting high variance in performance from run to run using different initializations/seeds. Focusing on problems arising in continuous control, we propose a functional regularization approach to augmenting model-free RL. In particular, we regularize the behavior of the deep policy to be similar to a policy prior, i.e., we regularize in function space. We show that functional regularization yields a bias-variance trade-off, and propose an adaptive tuning strategy to optimize this trade-off. When the policy prior has control-theoretic stability guarantees, we further show that this regularization approximately preserves those stability guarantees throughout learning. We validate our approach empirically on a range of settings, and demonstrate significantly reduced variance, guaranteed dynamic stability, and more efficient learning than deep RL alone. △ Less

Submitted 13 May, 2019; originally announced May 2019.

Comments: Appearing in ICML 2019

arXiv:1903.08792 [pdf, other]

End-to-End Safe Reinforcement Learning through Barrier Functions for Safety-Critical Continuous Control Tasks

Authors: Richard Cheng, Gabor Orosz, Richard M. Murray, Joel W. Burdick

Abstract: Reinforcement Learning (RL) algorithms have found limited success beyond simulated applications, and one main reason is the absence of safety guarantees during the learning process. Real world systems would realistically fail or break before an optimal controller can be learned. To address this issue, we propose a controller architecture that combines (1) a model-free RL-based controller with (2)… ▽ More Reinforcement Learning (RL) algorithms have found limited success beyond simulated applications, and one main reason is the absence of safety guarantees during the learning process. Real world systems would realistically fail or break before an optimal controller can be learned. To address this issue, we propose a controller architecture that combines (1) a model-free RL-based controller with (2) model-based controllers utilizing control barrier functions (CBFs) and (3) on-line learning of the unknown system dynamics, in order to ensure safety during learning. Our general framework leverages the success of RL algorithms to learn high-performance controllers, while the CBF-based controllers both guarantee safety and guide the learning process by constraining the set of explorable polices. We utilize Gaussian Processes (GPs) to model the system dynamics and its uncertainties. Our novel controller synthesis algorithm, RL-CBF, guarantees safety with high probability during the learning process, regardless of the RL algorithm used, and demonstrates greater policy exploration efficiency. We test our algorithm on (1) control of an inverted pendulum and (2) autonomous car-following with wireless vehicle-to-vehicle communication, and show that our algorithm attains much greater sample efficiency in learning than other state-of-the-art algorithms and maintains safety during the entire learning process. △ Less

Submitted 20 March, 2019; originally announced March 2019.

Comments: Published in AAAI 2019

arXiv:1903.07823 [pdf, other]

Safe Policy Synthesis in Multi-Agent POMDPs via Discrete-Time Barrier Functions

Authors: Mohamadreza Ahmadi, Andrew Singletary, Joel W. Burdick, Aaron D. Ames

Abstract: A multi-agent partially observable Markov decision process (MPOMDP) is a modeling paradigm used for high-level planning of heterogeneous autonomous agents subject to uncertainty and partial observation. Despite their modeling efficiency, MPOMDPs have not received significant attention in safety-critical settings. In this paper, we use barrier functions to design policies for MPOMDPs that ensure sa… ▽ More A multi-agent partially observable Markov decision process (MPOMDP) is a modeling paradigm used for high-level planning of heterogeneous autonomous agents subject to uncertainty and partial observation. Despite their modeling efficiency, MPOMDPs have not received significant attention in safety-critical settings. In this paper, we use barrier functions to design policies for MPOMDPs that ensure safety. Notably, our method does not rely on discretization of the belief space, or finite memory. To this end, we formulate sufficient and necessary conditions for the safety of a given set based on discrete-time barrier functions (DTBFs) and we demonstrate that our formulation also allows for Boolean compositions of DTBFs for representing more complicated safe sets. We show that the proposed method can be implemented online by a sequence of one-step greedy algorithms as a standalone safe controller or as a safety-filter given a nominal planning policy. We illustrate the efficiency of the proposed methodology based on DTBFs using a high-fidelity simulation of heterogeneous robots. △ Less

Submitted 12 September, 2019; v1 submitted 19 March, 2019; originally announced March 2019.

Comments: 8 pages and 4 figures

arXiv:1806.07555 [pdf, other]

Stagewise Safe Bayesian Optimization with Gaussian Processes

Authors: Yanan Sui, Vincent Zhuang, Joel W. Burdick, Yisong Yue

Abstract: Enforcing safety is a key aspect of many problems pertaining to sequential decision making under uncertainty, which require the decisions made at every step to be both informative of the optimal decision and also safe. For example, we value both efficacy and comfort in medical therapy, and efficiency and safety in robotic control. We consider this problem of optimizing an unknown utility function… ▽ More Enforcing safety is a key aspect of many problems pertaining to sequential decision making under uncertainty, which require the decisions made at every step to be both informative of the optimal decision and also safe. For example, we value both efficacy and comfort in medical therapy, and efficiency and safety in robotic control. We consider this problem of optimizing an unknown utility function with absolute feedback or preference feedback subject to unknown safety constraints. We develop an efficient safe Bayesian optimization algorithm, StageOpt, that separates safe region expansion and utility function maximization into two distinct stages. Compared to existing approaches which interleave between expansion and optimization, we show that StageOpt is more efficient and naturally applicable to a broader class of problems. We provide theoretical guarantees for both the satisfaction of safety constraints as well as convergence to the optimal utility value. We evaluate StageOpt on both a variety of synthetic experiments, as well as in clinical practice. We demonstrate that StageOpt is more effective than existing safe optimization approaches, and is able to safely and effectively optimize spinal cord stimulation therapy in our clinical experiments. △ Less

Submitted 26 January, 2020; v1 submitted 20 June, 2018; originally announced June 2018.

Comments: International Conference on Machine Learning (ICML) 2018

arXiv:1711.07894 [pdf, other]

Quantifying Performance of Bipedal Standing with Multi-channel EMG

Authors: Yanan Sui, Kun ho Kim, Joel W. Burdick

Abstract: Spinal cord stimulation has enabled humans with motor complete spinal cord injury (SCI) to independently stand and recover some lost autonomic function. Quantifying the quality of bipedal standing under spinal stimulation is important for spinal rehabilitation therapies and for new strategies that seek to combine spinal stimulation and rehabilitative robots (such as exoskeletons) in real time feed… ▽ More Spinal cord stimulation has enabled humans with motor complete spinal cord injury (SCI) to independently stand and recover some lost autonomic function. Quantifying the quality of bipedal standing under spinal stimulation is important for spinal rehabilitation therapies and for new strategies that seek to combine spinal stimulation and rehabilitative robots (such as exoskeletons) in real time feedback. To study the potential for automated electromyography (EMG) analysis in SCI, we evaluated the standing quality of paralyzed patients undergoing electrical spinal cord stimulation using both video and multi-channel surface EMG recordings during spinal stimulation therapy sessions. The quality of standing under different stimulation settings was quantified manually by experienced clinicians. By correlating features of the recorded EMG activity with the expert evaluations, we show that multi-channel EMG recording can provide accurate, fast, and robust estimation for the quality of bipedal standing in spinally stimulated SCI patients. Moreover, our analysis shows that the total number of EMG channels needed to effectively predict standing quality can be reduced while maintaining high estimation accuracy, which provides more flexibility for rehabilitation robotic systems to incorporate EMG recordings. △ Less

Submitted 21 November, 2017; originally announced November 2017.

Journal ref: IROS 2017

arXiv:1710.03592 [pdf, ps, other]

Meta Inverse Reinforcement Learning via Maximum Reward Sharing for Human Motion Analysis

Authors: Kun Li, Joel W. Burdick

Abstract: This work handles the inverse reinforcement learning (IRL) problem where only a small number of demonstrations are available from a demonstrator for each high-dimensional task, insufficient to estimate an accurate reward function. Observing that each demonstrator has an inherent reward for each state and the task-specific behaviors mainly depend on a small number of key states, we propose a meta I… ▽ More This work handles the inverse reinforcement learning (IRL) problem where only a small number of demonstrations are available from a demonstrator for each high-dimensional task, insufficient to estimate an accurate reward function. Observing that each demonstrator has an inherent reward for each state and the task-specific behaviors mainly depend on a small number of key states, we propose a meta IRL algorithm that first models the reward function for each task as a distribution conditioned on a baseline reward function shared by all tasks and dependent only on the demonstrator, and then finds the most likely reward function in the distribution that explains the task-specific behaviors. We test the method in a simulated environment on path planning tasks with limited demonstrations, and show that the accuracy of the learned reward function is significantly improved. We also apply the method to analyze the motion of a patient under rehabilitation. △ Less

Submitted 12 October, 2017; v1 submitted 7 October, 2017; originally announced October 2017.

Comments: arXiv admin note: text overlap with arXiv:1707.09394

arXiv:1708.07738 [pdf, ps, other]

A Function Approximation Method for Model-based High-Dimensional Inverse Reinforcement Learning

Authors: Kun Li, Joel W. Burdick

Abstract: This works handles the inverse reinforcement learning problem in high-dimensional state spaces, which relies on an efficient solution of model-based high-dimensional reinforcement learning problems. To solve the computationally expensive reinforcement learning problems, we propose a function approximation method to ensure that the Bellman Optimality Equation always holds, and then estimate a funct… ▽ More This works handles the inverse reinforcement learning problem in high-dimensional state spaces, which relies on an efficient solution of model-based high-dimensional reinforcement learning problems. To solve the computationally expensive reinforcement learning problems, we propose a function approximation method to ensure that the Bellman Optimality Equation always holds, and then estimate a function based on the observed human actions for inverse reinforcement learning problems. The time complexity of the proposed method is linearly proportional to the cardinality of the action set, thus it can handle high-dimensional even continuous state spaces efficiently. We test the proposed method in a simulated environment to show its accuracy, and three clinical tasks to show how it can be used to evaluate a doctor's proficiency. △ Less

Submitted 23 August, 2017; originally announced August 2017.

Comments: arXiv admin note: substantial text overlap with arXiv:1707.09394

arXiv:1707.09394 [pdf, ps, other]

Inverse Reinforcement Learning in Large State Spaces via Function Approximation

Authors: Kun Li, Joel W. Burdick

Abstract: This paper introduces a new method for inverse reinforcement learning in large-scale and high-dimensional state spaces. To avoid solving the computationally expensive reinforcement learning problems in reward learning, we propose a function approximation method to ensure that the Bellman Optimality Equation always holds, and then estimate a function to maximize the likelihood of the observed motio… ▽ More This paper introduces a new method for inverse reinforcement learning in large-scale and high-dimensional state spaces. To avoid solving the computationally expensive reinforcement learning problems in reward learning, we propose a function approximation method to ensure that the Bellman Optimality Equation always holds, and then estimate a function to maximize the likelihood of the observed motion. The time complexity of the proposed method is linearly proportional to the cardinality of the action set, thus it can handle large state spaces efficiently. We test the proposed method in a simulated environment, and show that it is more accurate than existing methods and significantly better in scalability. We also show that the proposed method can extend many existing methods to high-dimensional state spaces. We then apply the method to evaluating the effect of rehabilitative stimulations on patients with spinal cord injuries based on the observed patient motions. △ Less

Submitted 13 August, 2017; v1 submitted 28 July, 2017; originally announced July 2017.

Comments: Experiment updated

arXiv:1707.09393 [pdf, ps, other]

Online Inverse Reinforcement Learning via Bellman Gradient Iteration

Authors: Kun Li, Joel W. Burdick

Abstract: This paper develops an online inverse reinforcement learning algorithm aimed at efficiently recovering a reward function from ongoing observations of an agent's actions. To reduce the computation time and storage space in reward estimation, this work assumes that each observed action implies a change of the Q-value distribution, and relates the change to the reward function via the gradient of Q-v… ▽ More This paper develops an online inverse reinforcement learning algorithm aimed at efficiently recovering a reward function from ongoing observations of an agent's actions. To reduce the computation time and storage space in reward estimation, this work assumes that each observed action implies a change of the Q-value distribution, and relates the change to the reward function via the gradient of Q-value with respect to reward function parameter. The gradients are computed with a novel Bellman Gradient Iteration method that allows the reward function to be updated whenever a new observation is available. The method's convergence to a local optimum is proved. This work tests the proposed method in two simulated environments, and evaluates the algorithm's performance under a linear reward function and a non-linear reward function. The results show that the proposed algorithm only requires a limited computation time and storage space, but achieves an increasing accuracy as the number of observations grows. We also present a potential application to robot cleaners at home. △ Less

Submitted 28 July, 2017; originally announced July 2017.

Comments: The code and video are available at https://github.com/mestoking/BellmanGradientIteration/ . arXiv admin note: substantial text overlap with arXiv:1707.07767

arXiv:1707.07767 [pdf, ps, other]

Bellman Gradient Iteration for Inverse Reinforcement Learning

Authors: Kun Li, Yanan Sui, Joel W. Burdick

Abstract: This paper develops an inverse reinforcement learning algorithm aimed at recovering a reward function from the observed actions of an agent. We introduce a strategy to flexibly handle different types of actions with two approximations of the Bellman Optimality Equation, and a Bellman Gradient Iteration method to compute the gradient of the Q-value with respect to the reward function. These methods… ▽ More This paper develops an inverse reinforcement learning algorithm aimed at recovering a reward function from the observed actions of an agent. We introduce a strategy to flexibly handle different types of actions with two approximations of the Bellman Optimality Equation, and a Bellman Gradient Iteration method to compute the gradient of the Q-value with respect to the reward function. These methods allow us to build a differentiable relation between the Q-value and the reward function and learn an approximately optimal reward function with gradient methods. We test the proposed method in two simulated environments by evaluating the accuracy of different approximations and comparing the proposed method with existing solutions. The results show that even with a linear reward function, the proposed method has a comparable accuracy with the state-of-the-art method adopting a non-linear reward function, and the proposed method is more flexible because it is defined on observed actions instead of trajectories. △ Less

Submitted 24 July, 2017; originally announced July 2017.

arXiv:1707.07139 [pdf, ps, other]

Clinical Patient Tracking in the Presence of Transient and Permanent Occlusions via Geodesic Feature

Authors: Kun Li, Joel W. Burdick

Abstract: This paper develops a method to use RGB-D cameras to track the motions of a human spinal cord injury patient undergoing spinal stimulation and physical rehabilitation. Because clinicians must remain close to the patient during training sessions, the patient is usually under permanent and transient occlusions due to the training equipment and the movements of the attending clinicians. These occlusi… ▽ More This paper develops a method to use RGB-D cameras to track the motions of a human spinal cord injury patient undergoing spinal stimulation and physical rehabilitation. Because clinicians must remain close to the patient during training sessions, the patient is usually under permanent and transient occlusions due to the training equipment and the movements of the attending clinicians. These occlusions can significantly degrade the accuracy of existing human tracking methods. To improve the data association problem in these circumstances, we present a new global feature based on the geodesic distances of surface mesh points to a set of anchor points. Transient occlusions are handled via a multi-hypothesis tracking framework. To evaluate the method, we simulated different occlusion sizes on a data set captured from a human in varying movement patterns, and compared the proposed feature with other tracking methods. The results show that the proposed method achieves robustness to both surface deformations and transient occlusions. △ Less

Submitted 24 July, 2017; v1 submitted 22 July, 2017; originally announced July 2017.

arXiv:1707.02375 [pdf, ps, other]

Correlational Dueling Bandits with Application to Clinical Treatment in Large Decision Spaces

Authors: Yanan Sui, Yisong Yue, Joel W. Burdick

Abstract: We consider sequential decision making under uncertainty, where the goal is to optimize over a large decision space using noisy comparative feedback. This problem can be formulated as a $K$-armed Dueling Bandits problem where $K$ is the total number of decisions. When $K$ is very large, existing dueling bandits algorithms suffer huge cumulative regret before converging on the optimal arm. This pap… ▽ More We consider sequential decision making under uncertainty, where the goal is to optimize over a large decision space using noisy comparative feedback. This problem can be formulated as a $K$-armed Dueling Bandits problem where $K$ is the total number of decisions. When $K$ is very large, existing dueling bandits algorithms suffer huge cumulative regret before converging on the optimal arm. This paper studies the dueling bandits problem with a large number of arms that exhibit a low-dimensional correlation structure. Our problem is motivated by a clinical decision making process in large decision space. We propose an efficient algorithm CorrDuel which optimizes the exploration/exploitation tradeoff in this large decision space of clinical treatments. More broadly, our approach can be applied to other sequential decision problems with large and structured decision spaces. We derive regret bounds, and evaluate performance in simulation experiments as well as on a live clinical trial of therapeutic spinal cord stimulation. To our knowledge, this marks the first time an online learning algorithm was applied towards spinal cord injury treatments. Our experimental results show the effectiveness and efficiency of our approach. △ Less

Submitted 7 July, 2017; originally announced July 2017.

arXiv:1705.00253 [pdf, ps, other]

Multi-dueling Bandits with Dependent Arms

Authors: Yanan Sui, Vincent Zhuang, Joel W. Burdick, Yisong Yue

Abstract: The dueling bandits problem is an online learning framework for learning from pairwise preference feedback, and is particularly well-suited for modeling settings that elicit subjective or implicit human feedback. In this paper, we study the problem of multi-dueling bandits with dependent arms, which extends the original dueling bandits setting by simultaneously dueling multiple arms as well as mod… ▽ More The dueling bandits problem is an online learning framework for learning from pairwise preference feedback, and is particularly well-suited for modeling settings that elicit subjective or implicit human feedback. In this paper, we study the problem of multi-dueling bandits with dependent arms, which extends the original dueling bandits setting by simultaneously dueling multiple arms as well as modeling dependencies between arms. These extensions capture key characteristics found in many real-world applications, and allow for the opportunity to develop significantly more efficient algorithms than were possible in the original setting. We propose the \selfsparring algorithm, which reduces the multi-dueling bandits problem to a conventional bandit setting that can be solved using a stochastic bandit algorithm such as Thompson Sampling, and can naturally model dependencies using a Gaussian process prior. We present a no-regret analysis for multi-dueling setting, and demonstrate the effectiveness of our algorithm empirically on a wide range of simulation settings. △ Less

Submitted 29 April, 2017; originally announced May 2017.

arXiv:1410.2792 [pdf, other]

Convex Model Predictive Control for Vehicular Systems

Authors: Tiffany A. Huang, Matanya B. Horowitz, Joel W. Burdick

Abstract: In this work, we present a method to perform Model Predictive Control (MPC) over systems whose state is an element of $SO(n)$ for $n=2,3$. This is done without charts or any local linearization, and instead is performed by operating over the orbitope of rotation matrices. This results in a novel MPC scheme without the drawbacks associated with conventional linearization techniques. Instead, second… ▽ More In this work, we present a method to perform Model Predictive Control (MPC) over systems whose state is an element of $SO(n)$ for $n=2,3$. This is done without charts or any local linearization, and instead is performed by operating over the orbitope of rotation matrices. This results in a novel MPC scheme without the drawbacks associated with conventional linearization techniques. Instead, second order cone- or semidefinite-constraints on state variables are the only requirement beyond those of a QP-scheme typical for MPC of linear systems. Of particular emphasis is the application to aeronautical and vehicular systems, wherein the method removes many of the transcendental trigonometric terms associated with these systems' state space equations. Furthermore, the method is shown to be compatible with many existing variants of MPC, including obstacle avoidance via Mixed Integer Linear Programming (MILP). △ Less

Submitted 10 October, 2014; originally announced October 2014.

arXiv:1409.5993 [pdf, other]

Optimal Navigation Functions for Nonlinear Stochastic Systems

Authors: Matanya B. Horowitz, Joel W. Burdick

Abstract: This paper presents a new methodology to craft navigation functions for nonlinear systems with stochastic uncertainty. The method relies on the transformation of the Hamilton-Jacobi-Bellman (HJB) equation into a linear partial differential equation. This approach allows for optimality criteria to be incorporated into the navigation function, and generalizes several existing results in navigation f… ▽ More This paper presents a new methodology to craft navigation functions for nonlinear systems with stochastic uncertainty. The method relies on the transformation of the Hamilton-Jacobi-Bellman (HJB) equation into a linear partial differential equation. This approach allows for optimality criteria to be incorporated into the navigation function, and generalizes several existing results in navigation functions. It is shown that the HJB and that existing navigation functions in the literature sit on ends of a spectrum of optimization problems, upon which tradeoffs may be made in problem complexity. In particular, it is shown that under certain criteria the optimal navigation function is related to Laplace's equation, previously used in the literature, through an exponential transform. Further, analytical solutions to the HJB are available in simplified domains, yielding guidance towards optimality for approximation schemes. Examples are used to illustrate the role that noise, and optimality can potentially play in navigation system design. △ Less

Submitted 21 September, 2014; originally announced September 2014.

Comments: Accepted to IROS 2014. 8 Pages

arXiv:1401.3700 [pdf, other]

Convex Relaxations of SE(2) and SE(3) for Visual Pose Estimation

Authors: Matanya B. Horowitz, Nikolai Matni, Joel W. Burdick

Abstract: This paper proposes a new method for rigid body pose estimation based on spectrahedral representations of the tautological orbitopes of $SE(2)$ and $SE(3)$. The approach can use dense point cloud data from stereo vision or an RGB-D sensor (such as the Microsoft Kinect), as well as visual appearance data. The method is a convex relaxation of the classical pose estimation problem, and is based on ex… ▽ More This paper proposes a new method for rigid body pose estimation based on spectrahedral representations of the tautological orbitopes of $SE(2)$ and $SE(3)$. The approach can use dense point cloud data from stereo vision or an RGB-D sensor (such as the Microsoft Kinect), as well as visual appearance data. The method is a convex relaxation of the classical pose estimation problem, and is based on explicit linear matrix inequality (LMI) representations for the convex hulls of $SE(2)$ and $SE(3)$. Given these representations, the relaxed pose estimation problem can be framed as a robust least squares problem with the optimization variable constrained to these convex sets. Although this formulation is a relaxation of the original problem, numerical experiments indicate that it is indeed exact - i.e. its solution is a member of $SE(2)$ or $SE(3)$ - in many interesting settings. We additionally show that this method is guaranteed to be exact for a large class of pose estimation problems. △ Less

Submitted 6 April, 2014; v1 submitted 15 January, 2014; originally announced January 2014.

Comments: ICRA 2014 Preprint

Showing 1–47 of 47 results for author: Burdick, J W