Search | arXiv e-print repository

Negotiating Control: Neurosymbolic Variable Autonomy

Authors: Georgios Bakirtzis, Manolis Chiou, Andreas Theodorou

Abstract: Variable autonomy equips a system, such as a robot, with mixed initiatives such that it can adjust its independence level based on the task's complexity and the surrounding environment. Variable autonomy solves two main problems in robotic planning: the first is the problem of humans being unable to keep focus in monitoring and intervening during robotic tasks without appropriate human factor indicators, and the second is achieving mission success in unforeseen and uncertain environments in the face of static reward structures. An open problem in variable autonomy is developing robust methods to dynamically balance autonomy and human intervention in real time, ensuring optimal performance and safety in unpredictable and evolving environments. We posit that addressing unpredictable and evolving environments through the addition of rule-based symbolic logic has the potential to make autonomy adjustments more contextually reliable, and that adding feedback to reinforcement learning through data from mixed-initiative control further increases the efficacy and safety of autonomous behaviour.

Submitted 23 July, 2024; originally announced July 2024.

arXiv:2405.16381

Trivialized Momentum Facilitates Diffusion Generative Modeling on Lie Groups

Authors: Yuchen Zhu, Tianrong Chen, Lingkai Kong, Evangelos A. Theodorou, Molei Tao

Abstract: The generative modeling of data on manifolds is an important task, for which diffusion models in flat spaces typically need nontrivial adaptations. This article demonstrates how a technique called 'trivialization' can transfer the effectiveness of diffusion models in Euclidean spaces to Lie groups. In particular, an auxiliary momentum variable was algorithmically introduced to help transport the position variable between the data distribution and a fixed, easy-to-sample distribution. Normally, this would incur further difficulty for manifold data because momentum lives in a space that changes with the position. However, our trivialization technique creates a new momentum variable that stays in a simple fixed vector space. This design, together with a manifold-preserving integrator, simplifies implementation and avoids inaccuracies created by approximations such as projections to tangent space and manifold, which were typically used in prior work, hence facilitating generation with high fidelity and efficiency. The resulting method achieves state-of-the-art performance on protein and RNA torsion angle generation and sophisticated torus datasets. We also, arguably for the first time, tackle the generation of data on high-dimensional Special Orthogonal and Unitary groups, the latter essential for quantum problems.

Submitted 25 May, 2024; originally announced May 2024.

arXiv:2404.13430

React-OT: Optimal Transport for Generating Transition State in Chemical Reactions

Authors: Chenru Duan, Guan-Horng Liu, Yuanqi Du, Tianrong Chen, Qiyuan Zhao, Haojun Jia, Carla P. Gomes, Evangelos A. Theodorou, Heather J. Kulik

Abstract: Transition states (TSs) are transient structures that are key to understanding reaction mechanisms and designing catalysts but are challenging to capture in experiments. Alternatively, many optimization algorithms have been developed to search for TSs computationally. Yet the cost of these algorithms driven by quantum chemistry methods (usually density functional theory) is still high, posing challenges for their application in building large reaction networks for reaction exploration. Here we developed React-OT, an optimal transport approach for generating unique TS structures from reactants and products. React-OT generates highly accurate TS structures with a median structural root mean square deviation (RMSD) of 0.053 Å and a median barrier height error of 1.06 kcal/mol, requiring only 0.4 seconds per reaction. The RMSD and barrier height error are further improved by roughly 25% through pretraining React-OT on a large reaction dataset obtained with a lower level of theory, GFN2-xTB. We envision that the high accuracy and fast inference of React-OT will be useful in targeting TSs when exploring chemical reactions with unknown mechanisms.

Submitted 20 April, 2024; originally announced April 2024.

Comments: 5 figures, 1 table

arXiv:2404.06336

Quantum State Generation with Structure-Preserving Diffusion Model

Authors: Yuchen Zhu, Tianrong Chen, Evangelos A. Theodorou, Xie Chen, Molei Tao

Abstract: This article considers the generative modeling of the (mixed) states of quantum systems, and an approach based on a denoising diffusion model is proposed. The key contribution is an algorithmic innovation that respects the physical nature of quantum states. More precisely, the commonly used density matrix representation of a mixed state has to be complex-valued Hermitian, positive semi-definite, and trace one. Generic diffusion models, or other generative methods, may not be able to generate data that strictly satisfy these structural constraints, even if all training data do. To develop a machine learning algorithm that has physics hard-wired in, we leverage mirror diffusion and borrow the physical notion of von Neumann entropy to design a new map that enables strict structure-preserving generation. Both unconditional generation and conditional generation via classifier-free guidance are experimentally demonstrated to be efficacious, the latter enabling the design of new quantum states when generated on unseen labels.

Submitted 25 May, 2024; v1 submitted 9 April, 2024; originally announced April 2024.

arXiv:2404.03094

doi 10.1109/LRA.2024.3382530

Low Frequency Sampling in Model Predictive Path Integral Control

Authors: Bogdan Vlahov, Jason Gibson, David D. Fan, Patrick Spieler, Ali-akbar Agha-mohammadi, Evangelos A. Theodorou

Abstract: Sampling-based model-predictive controllers have become a powerful optimization tool for planning and control problems in various challenging environments. In this paper, we show how the default choice of uncorrelated Gaussian distributions can be improved upon with the use of a colored noise distribution. Our choice of distribution allows for the emphasis on low frequency control signals, which can result in smoother and more exploratory samples. We use this frequency-based sampling distribution with Model Predictive Path Integral (MPPI) in both hardware and simulation experiments to show better or equal performance on systems with various speeds of input response.

Submitted 18 April, 2024; v1 submitted 3 April, 2024; originally announced April 2024.

Comments: Published to RA-L

Journal ref: IEEE Robotics and Automation Letters, vol. 9, no. 5, pp. 4543-4550, 2024

arXiv:2403.18130

Generalized Maximum Entropy Differential Dynamic Programming

Authors: Yuichiro Aoyama, Evangelos A. Theodorou

Abstract: We present a sampling-based trajectory optimization method derived from the maximum entropy formulation of Differential Dynamic Programming with Tsallis entropy. This method can be seen as a generalization of the legacy work with Shannon entropy, which leads to a Gaussian optimal control policy for exploration during optimization. With the Tsallis entropy, the optimal control policy takes the form of a $q$-Gaussian, which further encourages exploration with its heavy-tailed shape. Moreover, in our formulation, the exploration variance, which was scaled by a fixed constant inverse temperature in the original formulation with Shannon entropy, is automatically scaled based on the value function of the trajectory. Due to this property, our algorithms can promote exploration when necessary, that is, when the cost of the trajectory is high, rather than using the same scaling factor. The simulation results demonstrate the properties of the proposed algorithm described above.

Submitted 26 March, 2024; originally announced March 2024.

Comments: 7 pages, 5 figures, This paper is for CDC 2024

MSC Class: 34H05

arXiv:2402.16227

Scaling Robust Optimization for Multi-Agent Robotic Systems: A Distributed Perspective

Authors: Arshiya Taj Abdul, Augustinos D. Saravanos, Evangelos A. Theodorou

Abstract: This paper presents a novel distributed robust optimization scheme for steering distributions of multi-agent systems under stochastic and deterministic uncertainty. Robust optimization is a subfield of optimization which aims at discovering an optimal solution that remains robustly feasible for all possible realizations of the problem parameters within a given uncertainty set. Such approaches would naturally constitute an ideal candidate for multi-robot control, where in addition to stochastic noise, there might be exogenous deterministic disturbances. Nevertheless, as these methods are usually associated with significantly high computational demands, their application to multi-agent robotics has remained limited. The scope of this work is to propose a scalable robust optimization framework that effectively addresses both types of uncertainties, while retaining computational efficiency and scalability. In this direction, we provide tractable approximations for robust constraints that are relevant in multi-robot settings. Subsequently, we demonstrate how computations can be distributed through an Alternating Direction Method of Multipliers (ADMM) approach towards achieving scalability and communication efficiency. Simulation results highlight the performance of the proposed algorithm in effectively handling both stochastic and deterministic uncertainty in multi-robot systems. The scalability of the method is also emphasized by showcasing tasks with up to 100 agents. The results of this work indicate the promise of blending robust optimization, distribution steering and distributed optimization towards achieving scalable, safe and robust multi-robot control.

Submitted 25 February, 2024; originally announced February 2024.

arXiv:2312.07635

Clash of the Explainers: Argumentation for Context-Appropriate Explanations

Authors: Leila Methnani, Virginia Dignum, Andreas Theodorou

Abstract: Understanding when and why to apply any given eXplainable Artificial Intelligence (XAI) technique is not a straightforward task. There is no single approach that is best suited for a given context. This paper aims to address the challenge of selecting the most appropriate explainer given the context in which an explanation is required. For AI explainability to be effective, explanations and how they are presented need to be oriented towards the stakeholder receiving the explanation. If -- in general -- no single explanation technique surpasses the rest, then reasoning over the available methods is required in order to select one that is context-appropriate. Due to the transparency they afford, we propose employing argumentation techniques to reach an agreement over the most suitable explainers from a given set of possible explainers. In this paper, we propose a modular reasoning system consisting of a given mental model of the relevant stakeholder, a reasoner component that solves the argumentation problem generated by a multi-explainer component, and an AI model that is to be explained suitably to the stakeholder of interest. By formalising supporting premises -- and inferences -- we can map stakeholder characteristics to those of explanation techniques. This allows us to reason over the techniques and prioritise the best one for the given context, while also offering transparency into the selection decision.

Submitted 12 December, 2023; originally announced December 2023.

Comments: 17 pages, 3 figures, Accepted at XAI^3 Workshop at ECAI 2023

arXiv:2311.06978

Augmented Bridge Matching

Authors: Valentin De Bortoli, Guan-Horng Liu, Tianrong Chen, Evangelos A. Theodorou, Weilie Nie

Abstract: Flow and bridge matching are a novel class of processes which encompass diffusion models. One of the main aspects of their increased flexibility is that these models can interpolate between arbitrary data distributions, i.e., they generalize beyond generative modeling and can be applied to learning stochastic (and deterministic) processes of arbitrary transfer tasks between two given distributions. In this paper, we highlight that while flow and bridge matching processes preserve the information of the marginal distributions, they do not necessarily preserve the coupling information unless additional, stronger optimality conditions are met. This can be problematic if one aims at preserving the original empirical pairing. We show that a simple modification of the matching process recovers this coupling by augmenting the velocity field (or drift) with the information of the initial sample point. Doing so, we lose the Markovian property of the process but preserve the coupling information between distributions. We illustrate the efficiency of our augmentation in learning a mixture of image translation tasks.

Submitted 12 November, 2023; originally announced November 2023.

arXiv:2310.07805

Generative Modeling with Phase Stochastic Bridges

Authors: Tianrong Chen, Jiatao Gu, Laurent Dinh, Evangelos A. Theodorou, Joshua Susskind, Shuangfei Zhai

Abstract: Diffusion models (DMs) represent state-of-the-art generative models for continuous inputs. DMs work by constructing a Stochastic Differential Equation (SDE) in the input space (i.e., position space), and using a neural network to reverse it. In this work, we introduce a novel generative modeling framework grounded in phase space dynamics, where a phase space is defined as an augmented space encompassing both position and velocity. Leveraging insights from Stochastic Optimal Control, we construct a path measure in the phase space that enables efficient sampling. In contrast to DMs, our framework demonstrates the capability to generate realistic data points at an early stage of dynamics propagation. This early prediction sets the stage for efficient data generation by leveraging additional velocity information along the trajectory. On standard image generation benchmarks, our model yields favorable performance over baselines in the regime of small Number of Function Evaluations (NFEs). Furthermore, our approach rivals the performance of diffusion models equipped with efficient sampling techniques, underscoring its potential as a new tool for generative modeling.

Submitted 12 May, 2024; v1 submitted 11 October, 2023; originally announced October 2023.

arXiv:2310.02233

Generalized Schrödinger Bridge Matching

Authors: Guan-Horng Liu, Yaron Lipman, Maximilian Nickel, Brian Karrer, Evangelos A. Theodorou, Ricky T. Q. Chen

Abstract: Modern distribution matching algorithms for training diffusion or flow models directly prescribe the time evolution of the marginal distributions between two boundary distributions. In this work, we consider a generalized distribution matching setup, where these marginals are only implicitly described as a solution to some task-specific objective function. The problem setup, known as the Generalized Schrödinger Bridge (GSB), appears prevalently in many scientific areas both within and without machine learning. We propose Generalized Schrödinger Bridge Matching (GSBM), a new matching algorithm inspired by recent advances, generalizing them beyond kinetic energy minimization and to account for task-specific state costs. We show that such a generalization can be cast as solving conditional stochastic optimal control, for which efficient variational approximations can be used, and further debiased with the aid of path integral theory. Compared to prior methods for solving GSB problems, our GSBM algorithm better preserves a feasible transport map between the boundary distributions throughout training, thereby enabling stable convergence and significantly improved scalability. We empirically validate our claims on an extensive suite of experimental setups, including crowd navigation, opinion depolarization, LiDAR manifolds, and image domain transfer. Our work brings new algorithmic opportunities for training diffusion models enhanced with task-specific optimality structures. Code available at https://github.com/facebookresearch/generalized-schrodinger-bridge-matching

Submitted 18 April, 2024; v1 submitted 3 October, 2023; originally announced October 2023.

Comments: ICLR 2024 Camera Ready

arXiv:2310.01236

Mirror Diffusion Models for Constrained and Watermarked Generation

Authors: Guan-Horng Liu, Tianrong Chen, Evangelos A. Theodorou, Molei Tao

Abstract: Modern successes of diffusion models in learning complex, high-dimensional data distributions are attributed, in part, to their capability to construct diffusion processes with analytic transition kernels and score functions. The tractability results in a simulation-free framework with stable regression losses, from which reversed, generative processes can be learned at scale. However, when data is confined to a constrained set as opposed to a standard Euclidean space, these desirable characteristics appear to be lost based on prior attempts. In this work, we propose Mirror Diffusion Models (MDM), a new class of diffusion models that generate data on convex constrained sets without losing any tractability. This is achieved by learning diffusion processes in a dual space constructed from a mirror map, which, crucially, is a standard Euclidean space. We derive efficient computation of mirror maps for popular constrained sets, such as simplices and $\ell_2$-balls, showing significantly improved performance of MDM over existing methods. For safety and privacy purposes, we also explore constrained sets as a new mechanism to embed invisible but quantitative information (i.e., watermarks) in generated data, for which MDM serves as a compelling approach. Our work brings new algorithmic opportunities for learning tractable diffusion on complex domains. Our code is available at https://github.com/ghliu/mdm

Submitted 29 February, 2024; v1 submitted 2 October, 2023; originally announced October 2023.

Comments: submitted to NeurIPS on 5/18 but did not arxiv per NeurIPS policy, accepted on 9/22

arXiv:2309.12756

Towards an MLOps Architecture for XAI in Industrial Applications

Authors: Leonhard Faubel, Thomas Woudsma, Leila Methnani, Amir Ghorbani Ghezeljhemeidan, Fabian Buelow, Klaus Schmid, Willem D. van Driel, Benjamin Kloepper, Andreas Theodorou, Mohsen Nosratinia, Magnus Bång

Abstract: Machine learning (ML) has become a popular tool in the industrial sector as it helps to improve operations, increase efficiency, and reduce costs. However, deploying and managing ML models in production environments can be complex. This is where Machine Learning Operations (MLOps) comes in. MLOps aims to streamline this deployment and management process. One of the remaining MLOps challenges is the need for explanations. These explanations are essential for understanding how ML models reason, which is key to trust and acceptance. Better identification of errors and improved model accuracy are only two resulting advantages. An often neglected fact is that deployed models are bypassed in practice when accuracy and especially explainability do not meet user expectations. We developed a novel MLOps software architecture to address the challenge of integrating explanations and feedback capabilities into the ML development and deployment processes. In the project EXPLAIN, our architecture is implemented in a series of industrial use cases. The proposed MLOps software architecture has several advantages. It provides an efficient way to manage ML models in production environments. Further, it allows for integrating explanations into the development and deployment processes.

Submitted 20 October, 2023; v1 submitted 22 September, 2023; originally announced September 2023.

arXiv:2308.08426

Differentiable Robust Model Predictive Control

Authors: Alex Oshin, Hassan Almubarak, Evangelos A. Theodorou

Abstract: Deterministic model predictive control (MPC), while powerful, is often insufficient for effectively controlling autonomous systems in the real world. Factors such as environmental noise and model error can cause deviations from the expected nominal performance. Robust MPC algorithms aim to bridge this gap between deterministic and uncertain control. However, these methods are often excessively difficult to tune for robustness due to the nonlinear and non-intuitive effects that controller parameters have on performance. To address this challenge, we first present a unifying perspective on differentiable optimization for control using the implicit function theorem (IFT), from which existing state-of-the-art methods can be derived. Drawing parallels with differential dynamic programming, the IFT enables the derivation of an efficient differentiable optimal control framework. The derived scheme is subsequently paired with a tube-based MPC architecture to facilitate the automatic and real-time tuning of robust controllers in the presence of large uncertainties and disturbances. The proposed algorithm is benchmarked on multiple nonlinear robotic systems, including two systems in the MuJoCo simulator environment and one hardware experiment on the Robotarium testbed, to demonstrate its efficacy.

Submitted 26 July, 2024; v1 submitted 16 August, 2023; originally announced August 2023.

Comments: Accepted to Robotics: Science and Systems 2024

arXiv:2305.18718

Distributed Hierarchical Distribution Control for Very-Large-Scale Clustered Multi-Agent Systems

Authors: Augustinos D. Saravanos, Yihui Li, Evangelos A. Theodorou

Abstract: As the scale and complexity of multi-agent robotic systems are subject to a continuous increase, this paper considers a class of systems labeled as Very-Large-Scale Multi-Agent Systems (VLMAS) with dimensionality that can scale up to the order of millions of agents. In particular, we consider the problem of steering the state distributions of all agents of a VLMAS to prescribed target distributions while satisfying probabilistic safety guarantees. Based on the key assumption that such systems often admit a multi-level hierarchical clustered structure - where the agents are organized into cliques of different levels - we associate the control of such cliques with the control of distributions, and introduce the Distributed Hierarchical Distribution Control (DHDC) framework. The proposed approach consists of two sub-frameworks. The first one, Distributed Hierarchical Distribution Estimation (DHDE), is a bottom-up hierarchical decentralized algorithm which links the initial and target configurations of the cliques of all levels with suitable Gaussian distributions. The second part, Distributed Hierarchical Distribution Steering (DHDS), is a top-down hierarchical distributed method that steers the distributions of all cliques and agents from the initial to the target ones assigned by DHDE. Simulation results that scale up to two million agents demonstrate the effectiveness and scalability of the proposed framework. The increased computational efficiency and safety performance of DHDC against related methods is also illustrated. The results of this work indicate the importance of hierarchical distribution control approaches towards achieving safe and scalable solutions for the control of VLMAS. A video with all results is available at https://youtu.be/0QPyR4bD2q0.

Submitted 29 May, 2023; originally announced May 2023.

Comments: Accepted at Robotics: Science and Systems 2023

arXiv:2305.02241

A Multi-step Dynamics Modeling Framework For Autonomous Driving In Multiple Environments

Authors: Jason Gibson, Bogdan Vlahov, David Fan, Patrick Spieler, Daniel Pastor, Ali-akbar Agha-mohammadi, Evangelos A. Theodorou

Abstract: Modeling dynamics is often the first step to making a vehicle autonomous. While on-road autonomous vehicles have been extensively studied, off-road vehicles pose many challenging modeling problems. An off-road vehicle encounters highly complex and difficult-to-model terrain/vehicle interactions, as well as having complex vehicle dynamics of its own. These complexities can create challenges for effective high-speed control and planning. In this paper, we introduce a framework for multistep dynamics prediction that explicitly handles the accumulation of modeling error and remains scalable for sampling-based controllers. Our method uses a specially-initialized Long Short-Term Memory (LSTM) over a limited time horizon as the learned component in a hybrid model to predict the dynamics of a four-seat all-terrain vehicle (Polaris S4 1000 RZR) in two distinct environments. By only having the LSTM predict over a fixed time horizon, we negate the need for long-term stability that is often a challenge when training recurrent neural networks. Our framework is flexible as it only requires odometry information for labels. Through extensive experimentation, we show that our method is able to predict millions of possible trajectories in real time, with a time horizon of five seconds in challenging off-road driving scenarios.

Submitted 3 May, 2023; originally announced May 2023.

arXiv:2303.06776 [pdf, other]

Robot Health Indicator: A Visual Cue to Improve Level of Autonomy Switching Systems

Authors: Aniketh Ramesh, Madeleine Englund, Andreas Theodorou, Rustam Stolkin, Manolis Chiou

Abstract: Using different Levels of Autonomy (LoA), a human operator can vary the extent of control they have over a robot's actions. LoAs enable operators to mitigate a robot's performance degradation or limitations in its autonomous capabilities. However, LoA regulation and other tasks may often overload an operator's cognitive abilities. Inspired by video game user interfaces, we study if adding a 'Robot Health Bar' to the robot control UI can reduce the cognitive demand and perceptual effort required for LoA regulation while promoting trust and transparency. This Health Bar uses the robot vitals and robot health framework to quantify and present runtime performance degradation in robots. Results from our pilot study indicate that when using a health bar, operators used manual control more to minimise the risk of robot failure during high performance degradation. It also gave us insights and lessons to inform subsequent experiments on human-robot teaming.

Submitted 12 March, 2023; originally announced March 2023.

Comments: Accepted for Variable Autonomy for human-robot Teaming (VAT) workshop at ACM/IEEE HRI 2023

ACM Class: I.2.9

arXiv:2303.03360 [pdf, other]

Improved Exploration for Safety-Embedded Differential Dynamic Programming Using Tolerant Barrier States

Authors: Joshua E. Kuperman, Hassan Almubarak, Augustinos D. Saravanos, Evangelos A. Theodorou

Abstract: In this paper, we introduce Tolerant Discrete Barrier States (T-DBaS), a novel safety-embedding technique for trajectory optimization with enhanced exploratory capabilities. The proposed approach generalizes the standard discrete barrier state (DBaS) method by accommodating temporary constraint violation during the optimization process while still approximating its safety guarantees. Consequently, the proposed approach eliminates the DBaS's safe nominal trajectories assumption, while enhancing its exploration effectiveness for escaping local minima. Towards applying T-DBaS to safety-critical autonomous robotics, we combine it with Differential Dynamic Programming (DDP), leading to the proposed safe trajectory optimization method T-DBaS-DDP, which inherits the convergence and scalability properties of the solver. The effectiveness of the T-DBaS algorithm is verified on differential drive robot and quadrotor simulations. In addition, we compare against the classical DBaS-DDP as well as Augmented-Lagrangian DDP (AL-DDP) in extensive numerical comparisons that demonstrate the proposed method's competitive advantages. Finally, the applicability of the proposed approach is verified through hardware experiments on the Georgia Tech Robotarium platform.

Submitted 11 March, 2024; v1 submitted 6 March, 2023; originally announced March 2023.

arXiv:2303.01751 [pdf, other]

Deep Momentum Multi-Marginal Schrödinger Bridge

Authors: Tianrong Chen, Guan-Horng Liu, Molei Tao, Evangelos A. Theodorou

Abstract: It is a crucial challenge to reconstruct population dynamics using unlabeled samples from distributions at coarse time intervals. Recent approaches such as flow-based models or Schrödinger Bridge (SB) models have demonstrated appealing performance, yet the inferred sample trajectories either fail to account for the underlying stochasticity or are $\underline{D}$eep $\underline{M}$omentum Multi-Marginal $\underline{S}$chrödinger $\underline{B}$ridge (DMSB), a novel computational framework that learns the smooth measure-valued spline for stochastic systems that satisfy position marginal constraints across time. By tailoring the celebrated Bregman Iteration and extending the Iterative Proportional Fitting to phase space, we manage to handle high-dimensional multi-marginal trajectory inference tasks efficiently. Our algorithm outperforms baselines significantly, as evidenced by experiments for synthetic datasets and a real-world single-cell RNA sequence dataset. Additionally, the proposed approach can reasonably reconstruct the evolution of velocity distribution, from position snapshots only, when there is a ground truth velocity that is nevertheless inaccessible.

Submitted 5 October, 2023; v1 submitted 3 March, 2023; originally announced March 2023.
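
For readers unfamiliar with Iterative Proportional Fitting, which the DMSB abstract above says it extends to phase space, the following is a minimal sketch of its classical discrete form (often called the Sinkhorn iteration) for entropy-regularized optimal transport between two fixed marginals. It is not the paper's phase-space, multi-marginal method. The function name `sinkhorn`, the three-point grid, the squared-distance cost, and the values of `eps` and `iters` are all illustrative assumptions.

```python
import math


def sinkhorn(mu, nu, cost, eps=1.0, iters=500):
    """Classical IPF / Sinkhorn iteration for discrete entropic OT (toy sketch)."""
    n, m = len(mu), len(nu)
    # Gibbs kernel K_ij = exp(-C_ij / eps)
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(iters):
        # Rescale rows so the plan's row sums match mu
        u = [mu[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        # Rescale columns so the plan's column sums match nu
        v = [nu[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    # Transport plan P_ij = u_i * K_ij * v_j
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]


# Hypothetical toy problem: three points on a line, squared-distance cost
points = [0.0, 1.0, 2.0]
cost = [[(x - y) ** 2 for y in points] for x in points]
mu = [0.5, 0.3, 0.2]
nu = [0.2, 0.3, 0.5]

plan = sinkhorn(mu, nu, cost)
row_sums = [sum(row) for row in plan]
col_sums = [sum(plan[i][j] for i in range(len(mu))) for j in range(len(nu))]
```

Each half-step projects the current plan onto one marginal constraint; alternating the two projections is what makes the procedure "proportional fitting". After enough iterations `row_sums` and `col_sums` should agree with `mu` and `nu` up to numerical tolerance.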

arXiv:2302.05872 [pdf, other]

I$^2$SB: Image-to-Image Schrödinger Bridge

Authors: Guan-Horng Liu, Arash Vahdat, De-An Huang, Evangelos A. Theodorou, Weili Nie, Anima Anandkumar

Abstract: We propose Image-to-Image Schrödinger Bridge (I$^2$SB), a new class of conditional diffusion models that directly learn the nonlinear diffusion processes between two given distributions. These diffusion bridges are particularly useful for image restoration, as the degraded images are structurally informative priors for reconstructing the clean images. I$^2$SB belongs to a tractable class of Schrödinger bridge, the nonlinear extension to score-based models, whose marginal distributions can be computed analytically given boundary pairs. This results in a simulation-free framework for nonlinear diffusions, where the I$^2$SB training becomes scalable by adopting practical techniques used in standard diffusion models. We validate I$^2$SB in solving various image restoration tasks, including inpainting, super-resolution, deblurring, and JPEG restoration on ImageNet 256x256 and show that I$^2$SB surpasses standard conditional diffusion models with more interpretable generative processes. Moreover, I$^2$SB matches the performance of inverse methods that additionally require the knowledge of the corruption operators. Our work opens up new algorithmic opportunities for developing efficient nonlinear diffusion models on a large scale. Project page and code: https://i2sb.github.io/

Submitted 25 May, 2023; v1 submitted 12 February, 2023; originally announced February 2023.

Comments: ICML camera ready (high-resolution figures)

arXiv:2212.00398 [pdf, other]

Distributed Model Predictive Covariance Steering

Authors: Augustinos D. Saravanos, Isin M. Balci, Efstathios Bakolas, Evangelos A. Theodorou

Abstract: This paper proposes Distributed Model Predictive Covariance Steering (DMPCS), a novel method for safe multi-robot control under uncertainty. The scope of our approach is to blend covariance steering theory, distributed optimization and model predictive control (MPC) into a single methodology that is safe, scalable and decentralized. Initially, we pose a problem formulation that uses the Wasserstein distance to steer the state distributions of a multi-robot team to desired targets, and probabilistic constraints to ensure safety. We then transform this problem into a finite-dimensional optimization one by utilizing a disturbance feedback policy parametrization for covariance steering and a tractable approximation of the safety constraints. To solve the latter problem, we derive a decentralized consensus-based algorithm using the Alternating Direction Method of Multipliers (ADMM). This method is then extended to a receding horizon form, which yields the proposed DMPCS algorithm. Simulation experiments on large-scale problems with up to hundreds of robots successfully demonstrate the effectiveness and scalability of DMPCS. Its superior capability in achieving safety is also highlighted through a comparison against a standard stochastic MPC approach. A video with all simulation experiments is available at https://youtu.be/Hks-0BRozxA.

Submitted 1 December, 2022; originally announced December 2022.
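
The consensus-based ADMM step named in the DMPCS abstract above can be illustrated, in its most generic textbook form, on a toy problem. The sketch below is not the DMPCS algorithm: it assumes each agent `i` holds a hypothetical scalar cost `0.5 * (x - a_i)^2`, so the agreed value should be the mean of the `a_i`. The function name `consensus_admm`, the penalty `rho`, and the iteration count are illustrative assumptions.

```python
def consensus_admm(targets, rho=1.0, iters=100):
    """Global-consensus ADMM (scaled form) for scalar quadratic local costs."""
    n = len(targets)
    x = [0.0] * n  # local copies, one per agent
    u = [0.0] * n  # scaled dual variables, one per agent
    z = 0.0        # shared consensus variable
    for _ in range(iters):
        # Local step: each agent minimizes its own cost plus a proximity
        # penalty; for 0.5*(x - a)^2 this has the closed form below.
        x = [(a + rho * (z - ui)) / (1.0 + rho) for a, ui in zip(targets, u)]
        # Consensus step: average the local copies plus their duals.
        z = sum(xi + ui for xi, ui in zip(x, u)) / n
        # Dual step: accumulate each agent's disagreement with the consensus.
        u = [ui + xi - z for xi, ui in zip(x, u)]
    return x, z


# Hypothetical local targets for three agents; their mean is 3.0
local_x, consensus = consensus_admm([1.0, 2.0, 6.0])
```

Only the consensus step needs information from more than one agent, which is why this pattern lends itself to decentralized implementations. In the toy problem both `consensus` and every entry of `local_x` approach 3.0.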

arXiv:2212.00268 [pdf, other]

Gaussian Process Barrier States for Safe Trajectory Optimization and Control

Authors: Hassan Almubarak, Manan Gandhi, Yuichiro Aoyama, Nader Sadegh, Evangelos A. Theodorou

Abstract: This paper proposes embedded Gaussian Process Barrier States (GP-BaS), a methodology to safely control unmodeled dynamics of nonlinear systems using Bayesian learning. Gaussian Processes (GPs) are used to model the dynamics of the safety-critical system, which is subsequently used in the GP-BaS model. We derive the barrier state dynamics utilizing the GP posterior, which is used to construct a safety embedded Gaussian process dynamical model (GPDM). We show that the safety-critical system can be controlled to remain inside the safe region as long as we can design a controller that renders the BaS-GPDM's trajectories bounded (or asymptotically stable). The proposed approach overcomes various limitations in early attempts at combining GPs with barrier functions by avoiding restrictive assumptions such as linearity of the system with respect to control, relative degree of the constraints and number or nature of constraints. This work is implemented on various examples for trajectory optimization and control including optimal stabilization of an unstable linear system and safe trajectory optimization of a Dubins vehicle navigating through an obstacle course and on a quadrotor in an obstacle avoidance task using GP differentiable dynamic programming (GP-DDP). The proposed framework is capable of maintaining safe optimization and control of unmodeled dynamics and is purely data driven.

Submitted 30 November, 2022; originally announced December 2022.

arXiv:2210.10814 [pdf, other]

MPOGames: Efficient Multimodal Partially Observable Dynamic Games

Authors: Oswin So, Paul Drews, Thomas Balch, Velin Dimitrov, Guy Rosman, Evangelos A. Theodorou

Abstract: Game theoretic methods have become popular for planning and prediction in situations involving rich multi-agent interactions. However, these methods often assume the existence of a single local Nash equilibrium and are hence unable to handle uncertainty in the intentions of different agents. While maximum entropy (MaxEnt) dynamic games try to address this issue, practical approaches solve for MaxEnt Nash equilibria using linear-quadratic approximations which are restricted to unimodal responses and unsuitable for scenarios with multiple local Nash equilibria. By reformulating the problem as a POMDP, we propose MPOGames, a method for efficiently solving MaxEnt dynamic games that captures the interactions between local Nash equilibria. We show the importance of uncertainty-aware game theoretic methods via a two-agent merge case study. Finally, we demonstrate the real-time capabilities of our approach with hardware experiments on a 1/10th scale car platform.

Submitted 23 May, 2023; v1 submitted 19 October, 2022; originally announced October 2022.

Comments: Accepted to ICRA 2023

arXiv:2210.09010 [pdf, other]

Good AI for Good: How AI Strategies of the Nordic Countries Address the Sustainable Development Goals

Authors: Andreas Theodorou, Juan Carlos Nieves, Virginia Dignum

Abstract: Developed and used responsibly, Artificial Intelligence (AI) is a force for global sustainable development. Given this opportunity, we expect that many of the existing guidelines and recommendations for trustworthy or responsible AI will provide explicit guidance on how AI can contribute to the achievement of United Nations' Sustainable Development Goals (SDGs). This would in particular be the case for the AI strategies of the Nordic countries, at least given their high ranking and overall political focus when it comes to the achievement of the SDGs. In this paper, we present an analysis of existing AI recommendations from 10 different countries or organisations based on topic modelling techniques to identify how much these strategy documents refer to the SDGs. The analysis shows no significant difference in how much these documents refer to SDGs. Moreover, the Nordic countries are not different from the others despite their long-term commitment to SDGs. More importantly, references to gender equality (SDG 5) and inequality (SDG 10), as well as references to environmental impact of AI development and use, and in particular the consequences for life on earth, are notably missing from the guidelines.

Submitted 8 October, 2022; originally announced October 2022.

Comments: IJCAI-AIofAI 2022 : 2nd Workshop on Adverse Impacts and Collateral Effects of AI Technologies

arXiv:2210.00090 [pdf, other]

Data-driven discovery of non-Newtonian astronomy via learning non-Euclidean Hamiltonian

Authors: Oswin So, Gongjie Li, Evangelos A. Theodorou, Molei Tao

Abstract: Incorporating the Hamiltonian structure of physical dynamics into deep learning models provides a powerful way to improve the interpretability and prediction accuracy. While previous works are mostly limited to the Euclidean spaces, their extension to the Lie group manifold is needed when rotations form a key component of the dynamics, such as the higher-order physics beyond simple point-mass dynamics for N-body celestial interactions. Moreover, the multiscale nature of these processes presents a challenge to existing methods as a long time horizon is required. By leveraging a symplectic Lie-group manifold preserving integrator, we present a method for data-driven discovery of non-Newtonian astronomy. Preliminary results show the importance of both these properties in training stability and prediction accuracy.

Submitted 30 September, 2022; originally announced October 2022.

arXiv:2209.09893 [pdf, other]

Deep Generalized Schrödinger Bridge

Authors: Guan-Horng Liu, Tianrong Chen, Oswin So, Evangelos A. Theodorou

Abstract: Mean-Field Game (MFG) serves as a crucial mathematical framework in modeling the collective behavior of individual agents interacting stochastically with a large population. In this work, we aim at solving a challenging class of MFGs in which the differentiability of these interacting preferences may not be available to the solver, and the population is urged to converge exactly to some desired distribution. These setups are, despite being well-motivated for practical purposes, complicated enough to paralyze most (deep) numerical solvers. Nevertheless, we show that Schrödinger Bridge - as an entropy-regularized optimal transport model - can be generalized to accept mean-field structures, hence solving these MFGs. This is achieved via the application of Forward-Backward Stochastic Differential Equations theory, which, intriguingly, leads to a computational framework with a similar structure to Temporal Difference learning. As such, it opens up novel algorithmic connections to Deep Reinforcement Learning that we leverage to facilitate practical training. We show that our proposed objective function provides necessary and sufficient conditions to the mean-field problem. Our method, named Deep Generalized Schrödinger Bridge (DeepGSB), not only outperforms prior methods in solving classical population navigation MFGs, but is also capable of solving 1000-dimensional opinion depolarization, setting a new state-of-the-art numerical solver for high-dimensional MFGs. Our code will be made available at https://github.com/ghliu/DeepGSB.

Submitted 20 September, 2022; originally announced September 2022.

Comments: NeurIPS 2022

arXiv:2208.04697 [pdf, other]

Let it RAIN for Social Good

Authors: Mattias Brännström, Andreas Theodorou, Virginia Dignum

Abstract: Artificial Intelligence (AI) as a highly transformative technology takes on a special role as both an enabler and a threat to UN Sustainable Development Goals (SDGs). AI Ethics and emerging high-level policy efforts stand at the pivot point between these outcomes but are barred from effect due to the abstraction gap between high-level values and responsible action. In this paper, the Responsible Norms (RAIN) framework is presented, bridging this gap and thereby enabling effective high-level control of AI impact. With effective and operationalized AI Ethics, AI technologies can be directed towards global sustainable development.

Submitted 26 July, 2022; originally announced August 2022.

arXiv:2204.10740 [pdf, other]

Embracing AWKWARD! Real-time Adjustment of Reactive Plans Using Social Norms

Authors: Leila Methnani, Andreas Antoniades, Andreas Theodorou

Abstract: This paper presents the AWKWARD architecture for the development of hybrid agents in Multi-Agent Systems. AWKWARD agents can have their plans re-configured in real time to align with social role requirements under changing environmental and social circumstances. The proposed hybrid architecture makes use of Behaviour Oriented Design (BOD) to develop agents with reactive planning and of the well-established OperA framework to provide organisational, social, and interaction definitions in order to validate and adjust agents' behaviours. Together, OperA and BOD can achieve real-time adjustment of agent plans for evolving social roles, while providing the additional benefit of transparency into the interactions that drive this behavioural change in individual agents. We present this architecture to motivate the bridging between traditional symbolic- and behaviour-based AI communities, where such combined solutions can help MAS researchers in their pursuit of building stronger, more robust intelligent agent teams. We use DOTA2, a game where success is heavily dependent on social interactions, as a medium to demonstrate a sample implementation of our proposed hybrid architecture.

Submitted 21 July, 2022; v1 submitted 22 April, 2022; originally announced April 2022.

Comments: 18 pages, 2 figures, 3 Tables, 4 Formalisms, Accepted at COINE 2022 Workshop

arXiv:2204.03727 [pdf, other]

Parameterized Differential Dynamic Programming

Authors: Alex Oshin, Matthew D. Houghton, Michael J. Acheson, Irene M. Gregory, Evangelos A. Theodorou

Abstract: Differential Dynamic Programming (DDP) is an efficient trajectory optimization algorithm relying on second-order approximations of a system's dynamics and cost function, and has recently been applied to optimize systems with time-invariant parameters. Prior works include system parameter estimation and identifying the optimal switching time between modes of hybrid dynamical systems. This paper generalizes previous work by proposing a general parameterized optimal control objective and deriving a parametric version of DDP, titled Parameterized Differential Dynamic Programming (PDDP). A rigorous convergence analysis of the algorithm is provided, and PDDP is shown to converge to a minimum of the cost regardless of initialization. The effects of varying the optimization to more effectively escape local minima are analyzed. Experiments are presented applying PDDP on multiple robotics systems to solve model predictive control (MPC) and moving horizon estimation (MHE) tasks simultaneously. Finally, PDDP is used to determine the optimal transition point between flight regimes of a complex urban air mobility (UAM) class vehicle exhibiting multiple phases of flight.

Submitted 7 April, 2022; originally announced April 2022.

Comments: Submitted to RSS 2022

arXiv:2204.02506 [pdf, other]

Deep Graphic FBSDEs for Opinion Dynamics Stochastic Control

Authors: Tianrong Chen, Ziyi Wang, Evangelos A. Theodorou

Abstract: In this paper, we present a scalable deep learning approach to solve opinion dynamics stochastic optimal control problems with mean field term coupling in the dynamics and cost function. Our approach relies on the probabilistic representation of the solution of the Hamilton-Jacobi-Bellman partial differential equation. Grounded on the nonlinear version of the Feynman-Kac lemma, the solutions of the Hamilton-Jacobi-Bellman partial differential equation are linked to the solution of Forward-Backward Stochastic Differential Equations. These equations can be solved numerically using a novel deep neural network with architecture tailored to the problem under consideration. The resulting algorithm is tested on a polarized opinion consensus experiment. The large-scale (10K-agent) experiment validates the scalability and generalizability of our algorithm. The proposed framework opens up the possibility for future applications on extremely large-scale problems.

Submitted 17 April, 2022; v1 submitted 5 April, 2022; originally announced April 2022.

arXiv:2202.10658 [pdf, other]

Decentralized Safe Multi-agent Stochastic Optimal Control using Deep FBSDEs and ADMM

Authors: Marcus A. Pereira, Augustinos D. Saravanos, Oswin So, Evangelos A. Theodorou

Abstract: In this work, we propose a novel safe and scalable decentralized solution for multi-agent control in the presence of stochastic disturbances. Safety is mathematically encoded using stochastic control barrier functions and safe controls are computed by solving quadratic programs. Decentralization is achieved by augmenting each agent's optimization variables with copy variables for its neighbors. This allows us to decouple the centralized multi-agent optimization problem. However, to ensure safety, neighboring agents must agree on "what is safe for both of us" and this creates a need for consensus. To enable safe consensus solutions, we incorporate an ADMM-based approach. Specifically, we propose a Merged CADMM-OSQP implicit neural network layer that solves a mini-batch of both the local quadratic programs and the overall consensus problem as a single optimization problem. This layer is embedded within a Deep FBSDEs network architecture at every time step, to facilitate end-to-end differentiable, safe and decentralized stochastic optimal control. The efficacy of the proposed approach is demonstrated on several challenging multi-robot tasks in simulation. By imposing requirements on safety specified by collision avoidance constraints, the safe operation of all agents is ensured during the entire training process. We also demonstrate superior scalability in terms of computational and memory savings as compared to a centralized approach.

Submitted 7 June, 2022; v1 submitted 21 February, 2022; originally announced February 2022.

Journal ref: Robotics: Science and Systems (RSS), 2022

arXiv:2201.12925 [pdf, other]

Multimodal Maximum Entropy Dynamic Games

Authors: Oswin So, Kyle Stachowicz, Evangelos A. Theodorou

Abstract: Environments with multi-agent interactions often result in a rich set of modalities of behavior between agents due to the inherent suboptimality of decision making processes when agents settle for satisfactory decisions. However, existing algorithms for solving these dynamic games are strictly unimodal and fail to capture the intricate multimodal behaviors of the agents. In this paper, we propose MMELQGames (Multimodal Maximum-Entropy Linear Quadratic Games), a novel constrained multimodal maximum entropy formulation of the Differential Dynamic Programming algorithm for solving generalized Nash equilibria. By formulating the problem as a certain dynamic game with incomplete and asymmetric information where agents are uncertain about the cost and dynamics of the game itself, the proposed method is able to reason about multiple local generalized Nash equilibria, enforce constraints with the Augmented Lagrangian framework and also perform Bayesian inference on the latent mode from past observations. We assess the efficacy of the proposed algorithm on two illustrative examples: multi-agent collision avoidance and autonomous racing. In particular, we show that only MMELQGames is able to effectively block a rear vehicle when given a speed disadvantage and the rear vehicle can overtake from multiple positions.

Submitted 2 February, 2022; v1 submitted 30 January, 2022; originally announced January 2022.

Comments: Under review for RSS 2022. Supplementary Video: https://youtu.be/7molN_Q38dk

arXiv:2201.06539 [pdf, other]

Spatiotemporal Costmap Inference for MPC via Deep Inverse Reinforcement Learning

Authors: Keuntaek Lee, David Isele, Evangelos A. Theodorou, Sangjae Bae

Abstract: It can be difficult to autonomously produce driver behavior so that it appears natural to other traffic participants. Through Inverse Reinforcement Learning (IRL), we can automate this process by learning the underlying reward function from human demonstrations. We propose a new IRL algorithm that learns a goal-conditioned spatiotemporal reward function. The resulting costmap is used by Model Predictive Controllers (MPCs) to perform a task without any hand-designing or hand-tuning of the cost function. We evaluate our proposed Goal-conditioned SpatioTemporal Zeroing Maximum Entropy Deep IRL (GSTZ)-MEDIRL framework together with MPC in the CARLA simulator for autonomous driving, lane keeping, and lane changing tasks in a challenging dense traffic highway scenario. Our proposed methods show higher success rates compared to other baseline methods including behavior cloning, state-of-the-art RL policies, and MPC with a learning-based behavior prediction model.

Submitted 17 January, 2022; originally announced January 2022.

Comments: IEEE Robotics and Automation Letters (RA-L)

arXiv:2111.09207 [pdf, other]

Optimal-Horizon Model-Predictive Control with Differential Dynamic Programming

Authors: Kyle Stachowicz, Evangelos A. Theodorou

Abstract: We present an algorithm, based on the Differential Dynamic Programming framework, to handle trajectory optimization problems in which the horizon is determined online rather than fixed a priori. This algorithm exhibits exact one-step convergence for linear, quadratic, time-invariant problems and is fast enough for real-time nonlinear model-predictive control. We show derivations for the nonlinear algorithm in the discrete-time case, and apply this algorithm to a variety of nonlinear problems. Finally, we show the efficacy of the optimal-horizon model-predictive control scheme compared to a standard MPC controller, on an obstacle-avoidance problem with planar robots.

Submitted 17 November, 2021; originally announced November 2021.

Comments: Submitted to ICRA 2022

arXiv:2110.11291 [pdf, other]

Likelihood Training of Schrödinger Bridge using Forward-Backward SDEs Theory

Authors: Tianrong Chen, Guan-Horng Liu, Evangelos A. Theodorou

Abstract: Schrödinger Bridge (SB) is an entropy-regularized optimal transport problem that has received increasing attention in deep generative modeling for its mathematical flexibility compared to the Score-based Generative Model (SGM). However, it remains unclear whether the optimization principle of SB relates to the modern training of deep generative models, which often rely on constructing log-likelihood objectives. This raises questions on the suitability of SB models as a principled alternative for generative applications. In this work, we present a novel computational framework for likelihood training of SB models grounded on Forward-Backward Stochastic Differential Equations Theory - a mathematical methodology that appeared in stochastic optimal control and transforms the optimality condition of SB into a set of SDEs. Crucially, these SDEs can be used to construct the likelihood objectives for SB that, surprisingly, generalize the ones for SGM as special cases. This leads to a new optimization principle that inherits the same SB optimality yet without losing applications of modern generative training techniques, and we show that the resulting training algorithm achieves comparable results on generating realistic images on MNIST, CelebA, and CIFAR10. Our code is available at https://github.com/ghliu/SB-FBSDE.

Submitted 3 April, 2023; v1 submitted 21 October, 2021; originally announced October 2021.

Comments: fix appendix net arh error

arXiv:2110.06451 [pdf, other]

Maximum Entropy Differential Dynamic Programming

Authors: Oswin So, Ziyi Wang, Evangelos A. Theodorou

Abstract: In this paper, we present a novel maximum entropy formulation of the Differential Dynamic Programming algorithm and derive two variants using unimodal and multimodal value function parameterizations. By combining the maximum entropy Bellman equations with a particular approximation of the cost function, we are able to obtain a new formulation of Differential Dynamic Programming which is able to escape from local minima via exploration with a multimodal policy. To demonstrate the efficacy of the proposed algorithm, we provide experimental results using four systems on tasks that are represented by cost functions with multiple local minima and compare them against vanilla Differential Dynamic Programming. Furthermore, we discuss connections with previous work on the linearly solvable stochastic control framework and its extensions in relation to compositionality.

Submitted 28 February, 2022; v1 submitted 12 October, 2021; originally announced October 2021.

Comments: Accepted to ICRA 2022. Supplementary video available at https://youtu.be/NHr9Kj_jnAI

arXiv:2109.14158 [pdf, other]

Second-Order Neural ODE Optimizer

Authors: Guan-Horng Liu, Tianrong Chen, Evangelos A. Theodorou

Abstract: We propose a novel second-order optimization framework for training the emerging deep continuous-time models, specifically the Neural Ordinary Differential Equations (Neural ODEs). Since their training already involves expensive gradient computation by solving a backward ODE, deriving efficient second-order methods becomes highly nontrivial. Nevertheless, inspired by the recent Optimal Control (OC) interpretation of training deep networks, we show that a specific continuous-time OC methodology, called Differential Programming, can be adopted to derive backward ODEs for higher-order derivatives at the same O(1) memory cost. We further explore a low-rank representation of the second-order derivatives and show that it leads to efficient preconditioned updates with the aid of Kronecker-based factorization. The resulting method -- named SNOpt -- converges much faster than first-order baselines in wall-clock time, and the improvement remains consistent across various applications, e.g. image classification, generative flow, and time-series prediction. Our framework also enables direct architecture optimization, such as the integration time of Neural ODEs, with second-order feedback policies, strengthening the OC perspective as a principled tool of analyzing optimization in deep learning. Our code is available at https://github.com/ghliu/snopt.

Submitted 5 November, 2021; v1 submitted 28 September, 2021; originally announced September 2021.

Comments: Accepted to Advances in Neural Information Processing Systems (NeurIPS) 2021 as Spotlight

arXiv:2109.00183 [pdf, other]

Deep $\mathcal{L}^1$ Stochastic Optimal Control Policies for Planetary Soft-landing

Authors: Marcus A. Pereira, Camilo A. Duarte, Ioannis Exarchos, Evangelos A. Theodorou

Abstract: In this paper, we introduce a novel deep-learning-based solution to the Powered-Descent Guidance (PDG) problem, grounded in principles of nonlinear Stochastic Optimal Control (SOC) and Feynman-Kac theory. Our algorithm solves the PDG problem by framing it as an $\mathcal{L}^1$ SOC problem for minimum fuel consumption. Additionally, it can handle practically useful control constraints and nonlinear dynamics, and it enforces state constraints as soft constraints. This is achieved by building on recent work on deep Forward-Backward Stochastic Differential Equations (FBSDEs) and differentiable non-convex optimization neural-network layers based on stochastic search. In contrast to previous approaches, our algorithm does not require convexification of the constraints or linearization of the dynamics and is empirically shown to be robust to stochastic disturbances and the initial position of the spacecraft. After training offline, our controller can be activated once the spacecraft is within a pre-specified radius of the landing zone and at a pre-specified altitude, i.e., the base of an inverted cone with the tip at the landing zone. We demonstrate empirically that our controller can successfully and safely land all trajectories initialized at the base of this cone while minimizing fuel consumption.

Submitted 1 September, 2021; originally announced September 2021.

arXiv:2107.11722 [pdf, other]

doi 10.1109/LRA.2021.3125047

Learning Risk-aware Costmaps for Traversability in Challenging Environments

Authors: David D. Fan, Sharmita Dey, Ali-akbar Agha-mohammadi, Evangelos A. Theodorou

Abstract: One of the main challenges in autonomous robotic exploration and navigation in unknown and unstructured environments is determining where the robot can or cannot safely move. A significant source of difficulty in this determination arises from stochasticity and uncertainty, coming from localization error, sensor sparsity and noise, difficult-to-model robot-ground interactions, and disturbances to the motion of the vehicle. Classical approaches to this problem rely on geometric analysis of the surrounding terrain, which can be prone to modeling errors and can be computationally expensive. Moreover, modeling the distribution of uncertain traversability costs is a difficult task, compounded by the various error sources mentioned above. In this work, we take a principled learning approach to this problem. We introduce a neural network architecture for robustly learning the distribution of traversability costs. Because we are motivated by preserving the life of the robot, we tackle this learning problem from the perspective of learning tail-risks, i.e. the Conditional Value-at-Risk (CVaR). We show that this approach reliably learns the expected tail risk given a desired probability risk threshold between 0 and 1, producing a traversability costmap which is more robust to outliers, more accurately captures tail risks, and is more computationally efficient, when compared against baselines. We validate our method on data collected by a legged robot navigating challenging, unstructured environments including an abandoned subway, limestone caves, and lava tube caves.

Submitted 4 September, 2022; v1 submitted 25 July, 2021; originally announced July 2021.

Comments: Published in RA-L with ICRA presentation option (IEEE International Conference on Robotics and Automation, 2022)

Journal ref: IEEE Robotics and Automation Letters (Volume: 7, Issue: 1, January 2022)

arXiv:2105.14608 [pdf, other]

doi 10.1109/LRA.2022.3143301

Safety Embedded Differential Dynamic Programming Using Discrete Barrier States

Authors: Hassan Almubarak, Kyle Stachowicz, Nader Sadegh, Evangelos A. Theodorou

Abstract: Certified safe control is a growing challenge in robotics, especially when performance and safety objectives must be concurrently achieved. In this work, we extend the barrier state (BaS) concept, recently proposed for safe stabilization of continuous time systems, to safety embedded trajectory optimization for discrete time systems using discrete barrier states (DBaS). The constructed DBaS is embedded into the discrete model of the safety-critical system, integrating safety objectives into the system's dynamics and performance objectives. Thereby, the control policy is directly supplied with safety-critical information through the barrier state. This allows us to employ the DBaS with differential dynamic programming (DDP) to plan and execute safe optimal trajectories. The proposed algorithm is applied to various safety-critical control and planning problems, including safe navigation of a differential wheeled robot in randomized and complex environments and a quadrotor safely performing reaching and tracking tasks. The DBaS-based DDP (DBaS-DDP) is shown to consistently outperform penalty methods commonly used to approximate constrained DDP problems as well as CBF-based safety filters.

Submitted 2 February, 2022; v1 submitted 30 May, 2021; originally announced May 2021.

Comments: Added extensive quantitative comparisons and analysis in the implementation examples, and revised discussions and illustrations

Journal ref: IEEE ROBOTICS AND AUTOMATION LETTERS, VOL. 7, NO. 2, APRIL 2022

arXiv:2105.03788 [pdf, other]

Dynamic Game Theoretic Neural Optimizer

Authors: Guan-Horng Liu, Tianrong Chen, Evangelos A. Theodorou

Abstract: The connection between training deep neural networks (DNNs) and optimal control theory (OCT) has attracted considerable attention as a principled tool of algorithmic design. Although a few attempts have been made, they have been limited to architectures where the layer propagation resembles a Markovian dynamical system. This casts doubt on their applicability to modern networks that heavily rely on non-Markovian dependencies between layers (e.g. skip connections in residual networks). In this work, we propose a novel dynamic game perspective by viewing each layer as a player in a dynamic game characterized by the DNN itself. Through this lens, different classes of optimizers can be seen as matching different types of Nash equilibria, depending on the implicit information structure of each (p)layer. The resulting method, called Dynamic Game Theoretic Neural Optimizer (DGNOpt), not only generalizes OCT-inspired optimizers to a richer network class; it also motivates a new training principle by solving a multi-player cooperative game. DGNOpt shows convergence improvements over existing methods on image classification datasets with residual and inception networks. Our work marries strengths from both OCT and game theory, paving the way to new algorithmic opportunities from robust optimal control and bandit-based optimization.

Submitted 11 June, 2021; v1 submitted 8 May, 2021; originally announced May 2021.

Comments: Accepted in International Conference on Machine Learning (ICML) 2021 as Oral

arXiv:2104.00241 [pdf, other]

Variational Inference MPC using Tsallis Divergence

Authors: Ziyi Wang, Oswin So, Jason Gibson, Bogdan Vlahov, Manan S. Gandhi, Guan-Horng Liu, Evangelos A. Theodorou

Abstract: In this paper, we provide a generalized framework for Variational Inference-Stochastic Optimal Control by using the non-extensive Tsallis divergence. By incorporating the deformed exponential function into the optimality likelihood function, a novel Tsallis Variational Inference-Model Predictive Control algorithm is derived, which includes prior works such as Variational Inference-Model Predictive Control, Model Predictive Path Integral Control, Cross Entropy Method, and Stein Variational Inference Model Predictive Control as special cases. The proposed algorithm allows for effective control of the cost/reward transform and is characterized by superior performance in terms of mean and variance reduction of the associated cost. The aforementioned features are supported by a theoretical and numerical analysis on the level of risk sensitivity of the proposed algorithm as well as simulation experiments on 5 different robotic systems with 3 different policy parameterizations.

Submitted 1 April, 2021; originally announced April 2021.

arXiv:2102.09144 [pdf, other]

Stochastic Spatio-Temporal Optimization for Control and Co-Design of Systems in Robotics and Applied Physics

Authors: Ethan N. Evans, Andrew P. Kendall, Evangelos A. Theodorou

Abstract: Correlated with the trend of increasing degrees of freedom in robotic systems is a similar trend of rising interest in Spatio-Temporal systems described by Partial Differential Equations (PDEs) among the robotics and control communities. These systems often exhibit dramatic under-actuation, high dimensionality, bifurcations, and multimodal instabilities. Their control represents many of the current-day challenges facing the robotics and automation communities. Not only are these systems challenging to control, but the design of their actuation is an NP-hard problem on its own. Recent methods either discretize the space before optimization, or apply tools from linear systems theory under restrictive linearity assumptions in order to arrive at a control solution. This manuscript provides a novel sampling-based stochastic optimization framework based entirely in Hilbert spaces suitable for the general class of \textit{semi-linear} SPDEs which describes many systems in robotics and applied physics. This framework is utilized for simultaneous policy optimization and actuator co-design optimization. The resulting algorithm is based on variational optimization, and performs joint optimization of the feedback control law and the actuation design over episodes. We study first and second order systems, and in doing so, extend several results to the case of second order SPDEs. Finally, we demonstrate the efficacy of the proposed approach with several simulated experiments on a variety of SPDEs in robotics and applied physics including an infinite degree-of-freedom soft robotic manipulator.

Submitted 17 February, 2021; originally announced February 2021.

Comments: 34 pages, 10 figures. Submitted to Autonomous Robots special issue of RSS 2020. arXiv admin note: text overlap with arXiv:2002.01397

arXiv:2102.09104 [pdf, other]

Distributed Algorithms for Linearly-Solvable Optimal Control in Networked Multi-Agent Systems

Authors: Neng Wan, Aditya Gahlawat, Naira Hovakimyan, Evangelos A. Theodorou, Petros G. Voulgaris

Abstract: Distributed algorithms for both discrete-time and continuous-time linearly solvable optimal control (LSOC) problems of networked multi-agent systems (MASs) are investigated in this paper. A distributed framework is proposed to partition the optimal control problem of a networked MAS into several local optimal control problems in factorial subsystems, such that each (central) agent behaves optimally to minimize the joint cost function of a subsystem that comprises a central agent and its neighboring agents, and the local control actions (policies) only rely on the knowledge of local observations. Under this framework, we not only preserve the correlations between neighboring agents, but also moderate the communication and computational complexities by decentralizing the sampling and computational processes over the network. For discrete-time systems modeled by Markov decision processes, the joint Bellman equation of each subsystem is transformed into a system of linear equations and solved using parallel programming. For continuous-time systems modeled by Itô diffusion processes, the joint optimality equation of each subsystem is converted into a linear partial differential equation, whose solution is approximated by a path integral formulation and a sample-efficient relative entropy policy search algorithm, respectively. The learned control policies are generalized to solve the unlearned tasks by resorting to the compositionality principle, and illustrative examples of cooperative UAV teams are provided to verify the effectiveness and advantages of these algorithms.

Submitted 17 February, 2021; originally announced February 2021.

arXiv:2102.04714 [pdf, other]

Interrogating the Black Box: Transparency through Information-Seeking Dialogues

Authors: Andrea Aler Tubella, Andreas Theodorou, Juan Carlos Nieves

Abstract: This paper is preoccupied with the following question: given a (possibly opaque) learning system, how can we understand whether its behaviour adheres to governance constraints? The answer can be quite simple: we just need to "ask" the system about it. We propose to construct an investigator agent to query a learning agent -- the suspect agent -- to investigate its adherence to a given ethical policy in the context of an information-seeking dialogue, modeled in formal argumentation settings. This formal dialogue framework is the main contribution of this paper. Through it, we break down compliance checking mechanisms into three modular components, each of which can be tailored to various needs in a vast number of ways: an investigator agent, a suspect agent, and an acceptance protocol determining whether the responses of the suspect agent comply with the policy. This acceptance protocol presents a fundamentally different approach to aggregation: rather than using quantitative methods to deal with the non-determinism of a learning system, we leverage the use of argumentation semantics to investigate the notion of properties holding consistently. Overall, we argue that the introduced formal dialogue framework opens many avenues both in the area of compliance checking and in the analysis of properties of opaque systems.

Submitted 9 February, 2021; originally announced February 2021.

Comments: Accepted at AAMAS 2021

arXiv:2011.10890 [pdf, other]

Large-Scale Multi-Agent Deep FBSDEs

Authors: Tianrong Chen, Ziyi Wang, Ioannis Exarchos, Evangelos A. Theodorou

Abstract: In this paper we present a scalable deep learning framework for finding Markovian Nash Equilibria in multi-agent stochastic games using fictitious play. The motivation is inspired by theoretical analysis of Forward Backward Stochastic Differential Equations (FBSDE) and their implementation in a deep learning setting, which is the source of our algorithm's sample efficiency improvement. By taking advantage of the permutation-invariant property of agents in symmetric games, the scalability and performance are further enhanced significantly. We showcase superior performance of our framework over the state-of-the-art deep fictitious play algorithm on an inter-bank lending/borrowing problem in terms of multiple metrics. More importantly, our approach scales up to 3000 agents in simulation, a scale which, to the best of our knowledge, represents a new state-of-the-art. We also demonstrate the applicability of our framework in robotics on a belief space autonomous racing problem.

Submitted 21 May, 2021; v1 submitted 21 November, 2020; originally announced November 2020.

arXiv:2009.14775 [pdf, other]

Cooperative Path Integral Control for Stochastic Multi-Agent Systems

Authors: Neng Wan, Aditya Gahlawat, Naira Hovakimyan, Evangelos A. Theodorou, Petros G. Voulgaris

Abstract: A distributed stochastic optimal control solution is presented for cooperative multi-agent systems. The network of agents is partitioned into multiple factorial subsystems, each of which consists of a central agent and neighboring agents. Local control actions that rely only on agents' local observations are designed to optimize the joint cost functions of subsystems. When solving for the local control actions, the joint optimality equation for each subsystem is cast as a linear partial differential equation and solved using the Feynman-Kac formula. The solution and the optimal control action are then formulated as path integrals and approximated by a Monte-Carlo method. Numerical verification is provided through a simulation example consisting of a team of cooperative UAVs.

Submitted 20 March, 2021; v1 submitted 30 September, 2020; originally announced September 2020.

Comments: To appear in American Control Conference 2021, New Orleans, LA, USA

arXiv:2009.13609 [pdf, other]

Compositionality of Linearly Solvable Optimal Control in Networked Multi-Agent Systems

Authors: Lin Song, Neng Wan, Aditya Gahlawat, Naira Hovakimyan, Evangelos A. Theodorou

Abstract: In this paper, we discuss the methodology of generalizing the optimal control law from learned component tasks to unlearned composite tasks on Multi-Agent Systems (MASs), by using the linearity composition principle of linearly solvable optimal control (LSOC) problems. The proposed approach achieves both the compositionality and optimality of control actions simultaneously within the cooperative MAS framework in both discrete and continuous time in a sample-efficient manner, which reduces the burden of re-computing the optimal control solutions for a new task on the MASs. We investigate the application of the proposed approach on the MAS with coordination between agents. The experiments show feasible results in the investigated scenarios, including both discrete and continuous dynamical systems for task generalization without resampling.

Submitted 22 March, 2021; v1 submitted 28 September, 2020; originally announced September 2020.

Comments: Accepted to the 2021 American Control Conference (ACC)

arXiv:2009.01196 [pdf, other]

Safe Optimal Control Using Stochastic Barrier Functions and Deep Forward-Backward SDEs

Authors: Marcus Aloysius Pereira, Ziyi Wang, Ioannis Exarchos, Evangelos A. Theodorou

Abstract: This paper introduces a new formulation for stochastic optimal control and stochastic dynamic optimization that ensures safety with respect to state and control constraints. The proposed methodology brings together concepts such as Forward-Backward Stochastic Differential Equations, Stochastic Barrier Functions, Differentiable Convex Optimization and Deep Learning. Using the aforementioned concepts, a Neural Network architecture is designed for safe trajectory optimization in which learning can be performed in an end-to-end fashion. Simulations are performed on three systems to show the efficacy of the proposed methodology.

Submitted 2 September, 2020; originally announced September 2020.

Journal ref: Conference on Robot Learning 2020

arXiv:2009.01090 [pdf, other]

Adaptive Risk Sensitive Model Predictive Control with Stochastic Search

Authors: Ziyi Wang, Oswin So, Keuntaek Lee, Camilo A. Duarte, Evangelos A. Theodorou

Abstract: We present a general framework for optimizing the Conditional Value-at-Risk for dynamical systems using stochastic search. The framework is capable of handling the uncertainty from the initial condition, stochastic dynamics, and uncertain parameters in the model. The algorithm is compared against a risk-sensitive distributional reinforcement learning framework and outperforms it on a pendulum and cartpole with stochastic dynamics. We also showcase the applicability of the framework to robotics as an adaptive risk-sensitive controller by optimizing with respect to the fully nonlinear belief provided by a particle filter on a pendulum, cartpole, and quadcopter in simulation.

Submitted 12 February, 2021; v1 submitted 2 September, 2020; originally announced September 2020.

Showing 1–50 of 75 results for author: Theodorou, A