Search | arXiv e-print repository

arXiv:2301.01997 [pdf, ps, other]

Data-Driven Inverse Reinforcement Learning for Expert-Learner Zero-Sum Games

Authors: Wenqian Xue, Bosen Lian, Jialu Fan, Tianyou Chai, Frank L. Lewis

Abstract: In this paper, we formulate inverse reinforcement learning (IRL) as an expert-learner interaction whereby the optimal performance intent of an expert or target agent is unknown to a learner agent. The learner observes the states and controls of the expert and hence seeks to reconstruct the expert's cost function intent and thus mimic the expert's optimal response. Next, we add non-cooperative disturbances that seek to disrupt the learning and stability of the learner agent. This leads to the formulation of a new interaction we call zero-sum game IRL. We develop a framework to solve the zero-sum game IRL problem that is a modified extension of RL policy iteration (PI), allowing unknown expert performance intentions to be computed and non-cooperative disturbances to be rejected. The framework has two parts: a value function and control action update based on an extension of PI, and a cost function update based on standard inverse optimal control. We then develop an off-policy IRL algorithm that does not require knowledge of the expert and learner agent dynamics and performs single-loop learning. Rigorous proofs and analyses are given. Finally, simulation experiments are presented to show the effectiveness of the new approach.

Submitted 5 January, 2023; originally announced January 2023.

Comments: 9 pages, 3 figures
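As a rough illustration of the framework's two parts (a PI-based value and policy update, and a cost update from inverse optimal control), the sketch below uses a hypothetical scalar linear game dx/dt = a*x + b*u + d*w with cost integrand q*x^2 + r*u^2 - gamma^2*w^2, where u = -k*x minimizes and w = l*x maximizes. It is model-based, assumes a, b, d, r and gamma are known, and updates both players simultaneously from a stabilizing initial gain; it is not the paper's data-driven off-policy algorithm. The function names and numbers are hypothetical.

```python
import math


def zero_sum_pi(a, b, d, q, r, gamma, k0, l0=0.0, iters=50, tol=1e-12):
    """Model-based policy iteration for a hypothetical scalar LQ zero-sum game.

    Control u = -k*x minimizes, disturbance w = l*x maximizes the integral of
    q*x^2 + r*u^2 - gamma^2*w^2. The initial gains (k0, l0) must be stabilizing.
    """
    k, l, p = k0, l0, 0.0
    for _ in range(iters):
        a_cl = a - b * k + d * l
        if a_cl >= 0:
            raise ValueError("current policies are not stabilizing")
        # Policy evaluation: solve 2*a_cl*p + q + r*k^2 - gamma^2*l^2 = 0.
        p_new = (q + r * k * k - gamma ** 2 * l * l) / (-2.0 * a_cl)
        # Policy improvement for both players from the value p_new*x^2.
        k, l = b * p_new / r, d * p_new / gamma ** 2
        converged = abs(p_new - p) < tol
        p = p_new
        if converged:
            break
    return p, k, l


def infer_state_weight(a, b, d, r, gamma, k_expert):
    """Hypothetical inverse step: recover q from an observed expert gain.

    Assumes r and gamma are known. The expert value is p = r*k/b, and q then
    follows from the scalar game Riccati equation
    2*a*p + q - (b^2/r - d^2/gamma^2)*p^2 = 0.
    """
    p = r * k_expert / b
    s = b * b / r - d * d / gamma ** 2
    return s * p * p - 2.0 * a * p


p, k, l = zero_sum_pi(a=1.0, b=1.0, d=0.5, q=1.0, r=1.0, gamma=1.0, k0=2.0)
q_hat = infer_state_weight(a=1.0, b=1.0, d=0.5, r=1.0, gamma=1.0, k_expert=k)
print(p, k, l, q_hat)
```

In this scalar case the simultaneous update coincides with Newton's method on the game Riccati equation, so it converges quickly to its positive root, and the inverse step recovers q = 1 from the resulting gain.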

arXiv:2112.14676 [pdf, other]

Learning nonlinear dynamics in synchronization of knowledge-based leader-following networks

Authors: Shimin Wang, Xiangyu Meng, Hongwei Zhang, Frank L. Lewis

Abstract: Knowledge-based leader-following synchronization of heterogeneous nonlinear multi-agent systems is a challenging problem since the leader's dynamic information is unknown to any follower node. This paper proposes a learning-based fully distributed observer for a class of nonlinear leader systems, which can simultaneously learn the leader's dynamics and states. This class of leader dynamics is rather general and does not require a bounded Jacobian matrix. Based on this learning-based distributed observer, we further synthesize an adaptive distributed control law for solving the leader-following synchronization problem of multiple Euler-Lagrange systems subject to an uncertain nonlinear leader system. The results are illustrated by a simulation example.

Submitted 18 July, 2022; v1 submitted 29 December, 2021; originally announced December 2021.
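For context, the classical linear distributed observer that learning-based designs of this kind extend can be sketched as follows. This sketch assumes the leader dynamics dv0/dt = S*v0 are known to every follower (here a hypothetical harmonic oscillator), which is precisely the assumption the paper removes; it does not learn S. Each follower i integrates dv_i/dt = S*v_i + mu*(sum_j a_ij*(v_j - v_i) + a_i0*(v0 - v_i)) with forward Euler on a hypothetical three-follower chain in which only follower 0 measures the leader.

```python
def simulate_distributed_observer(mu=5.0, dt=1e-3, t_end=10.0):
    """Forward-Euler simulation of a linear distributed observer (hypothetical setup).

    Returns the largest absolute estimation error over all followers at t_end.
    """
    S = [[0.0, 1.0], [-1.0, 0.0]]  # leader dynamics, assumed known to all followers
    adj = [[0, 0, 0],              # adj[i][j] = 1: follower i receives follower j's estimate
           [1, 0, 0],
           [0, 1, 0]]
    pinned = [1, 0, 0]             # a_i0: only follower 0 measures the leader state
    n = len(pinned)

    def mat_vec(M, x):
        return [M[0][0] * x[0] + M[0][1] * x[1], M[1][0] * x[0] + M[1][1] * x[1]]

    v0 = [1.0, 0.0]                      # leader state
    v = [[0.0, 0.0] for _ in range(n)]   # followers' estimates of v0
    for _ in range(int(round(t_end / dt))):
        new_v = []
        for i in range(n):
            drift = mat_vec(S, v[i])
            coupling = [
                pinned[i] * (v0[c] - v[i][c])
                + sum(adj[i][j] * (v[j][c] - v[i][c]) for j in range(n))
                for c in range(2)
            ]
            new_v.append([v[i][c] + dt * (drift[c] + mu * coupling[c]) for c in range(2)])
        # Advance the leader after all followers have used its current state.
        v0 = [v0[c] + dt * f for c, f in enumerate(mat_vec(S, v0))]
        v = new_v
    return max(abs(v[i][c] - v0[c]) for i in range(n) for c in range(2))


print(simulate_distributed_observer())
```

Because the chain contains a directed path from the leader to every follower, the estimation error decays at a rate set by mu even though only one follower sees the leader directly.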

arXiv:2101.00202

Sequential Convex Programming for Collaboration of Connected and Automated Vehicles

Authors: Xiaoxue Zhang, Jun Ma, Zilong Cheng, Frank L. Lewis, Tong Heng Lee

Abstract: This paper investigates the collaboration of multiple connected and automated vehicles (CAVs) in different scenarios. In general, the collaboration of CAVs can be formulated as a nonlinear and nonconvex model predictive control (MPC) problem. Most existing approaches to solving such an optimization problem carry a considerable computational burden, which hinders practical real-time implementation. This paper proposes the use of sequential convex programming (SCP), a powerful approach to solving the nonlinear and nonconvex MPC problem in real time. As a first stage, SCP requires linearization and discretization to adequately address the nonlinear dynamics of the system model. After linearization and discretization, the original MPC problem can be transformed into a quadratically constrained quadratic programming (QCQP) problem. SCP also involves convexification to handle the associated nonconvex constraints, so the nonconvex QCQP can be reduced to a quadratic programming (QP) problem that can be solved quickly. Therefore, computational efficiency is improved despite the nonlinear and nonconvex characteristics, enabling real-time implementation. Furthermore, simulation results in three different scenarios of autonomous driving are presented to validate the effectiveness and efficiency of the proposed approach.

Submitted 24 July, 2022; v1 submitted 1 January, 2021; originally announced January 2021.

Comments: After internal discussion and with the agreement of all co-authors, we would like to withdraw this preprint
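The convexification step can be illustrated on a hypothetical single-point problem: place a point as close as possible to a goal while staying outside a circular keep-out zone ||x - c|| >= rho, a nonconvex constraint of the kind that arises in collision avoidance. Each SCP iteration replaces the constraint by its first-order linearization n_k . (x - c) >= rho, with n_k = (x_k - c)/||x_k - c||, and solves the resulting QP, which here reduces to a closed-form projection onto a half-plane. This is only a sketch under these assumptions; the linearization and discretization of vehicle dynamics in the MPC formulation are omitted, and the function name is hypothetical.

```python
import math


def scp_avoid_disk(goal, center, radius, x0, iters=100, tol=1e-12):
    """SCP sketch: min ||x - goal||^2  s.t.  ||x - center|| >= radius (nonconvex).

    x0 must differ from center so the linearization is defined.
    """
    x = list(x0)
    for _ in range(iters):
        dx, dy = x[0] - center[0], x[1] - center[1]
        dist = math.hypot(dx, dy)
        n = (dx / dist, dy / dist)  # outward normal at the current iterate
        # Convex QP: min ||x - goal||^2 s.t. n . (x - center) >= radius.
        # Its solution is the projection of goal onto that half-plane.
        slack = n[0] * (goal[0] - center[0]) + n[1] * (goal[1] - center[1]) - radius
        if slack >= 0:
            x_new = list(goal)
        else:
            x_new = [goal[0] - slack * n[0], goal[1] - slack * n[1]]
        if math.hypot(x_new[0] - x[0], x_new[1] - x[1]) < tol:
            return x_new
        x = x_new
    return x


print(scp_avoid_disk(goal=(0.5, 0.2), center=(0.0, 0.0), radius=1.0, x0=(1.0, 1.0)))
```

Each linearized half-plane lies entirely outside the disk, so every iterate is feasible for the original nonconvex constraint; for a goal inside the unit disk, the iterates converge to the nearest point on the circle.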

arXiv:2101.00201 [pdf, other]

Semi-Definite Relaxation Based ADMM for Cooperative Planning and Control of Connected Autonomous Vehicles

Authors: Xiaoxue Zhang, Zilong Cheng, Jun Ma, Sunan Huang, Frank L. Lewis, Tong Heng Lee

Abstract: This paper investigates the cooperative planning and control problem for multiple connected autonomous vehicles (CAVs) in different scenarios. In the existing literature, most methods suffer from significant computational inefficiency. Moreover, because the optimization problem is nonlinear and nonconvex, determining the optimal solution is typically very difficult. To address this issue, this work proposes a novel, fully parallel computation framework that leverages the alternating direction method of multipliers (ADMM). The nonlinear and nonconvex optimization problem in autonomous driving can be divided into two manageable subproblems, which can be solved by effective optimization methods in a parallel framework. Here, the differential dynamic programming (DDP) algorithm addresses the nonlinearity of the system dynamics effectively, and the nonconvex coupling constraints, which have small dimensions, can be approximated via semi-definite relaxation (SDR), which can also be solved in a very short time. Owing to the parallel computation and efficient relaxation of nonconvex constraints, the proposed approach achieves real-time implementation and thus provides additional assurance of driving safety. In addition, two transportation scenarios for multiple CAVs are used to illustrate the effectiveness and efficiency of the proposed method.

Submitted 1 January, 2021; originally announced January 2021.

Comments: 11 pages, 8 figures
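The two-subproblem splitting can be sketched with scaled-form ADMM on a deliberately simple stand-in problem: minimize 0.5*||x - target||^2 subject to x = z, with z restricted to a box. Here the x-update stands in for the dynamics subproblem (handled by DDP in the paper) and the z-update for the constraint subproblem (handled by SDR in the paper); both are replaced by closed-form steps, and the box is convex, so no relaxation is needed. The function name and problem data are hypothetical.

```python
def admm_box_tracking(target, lower, upper, rho=1.0, iters=500, tol=1e-10):
    """Scaled-form ADMM for min 0.5*||x - target||^2  s.t.  x = z, lower <= z <= upper."""
    n = len(target)
    x = [0.0] * n
    z = [0.0] * n
    u = [0.0] * n  # scaled dual variable for the constraint x = z
    for _ in range(iters):
        # x-update: closed-form minimizer of 0.5||x - target||^2 + (rho/2)||x - z + u||^2.
        x = [(target[i] + rho * (z[i] - u[i])) / (1.0 + rho) for i in range(n)]
        # z-update: Euclidean projection of x + u onto the box.
        z_old = z
        z = [min(max(x[i] + u[i], lower), upper) for i in range(n)]
        # Dual update accumulates the consensus residual.
        u = [u[i] + x[i] - z[i] for i in range(n)]
        primal = max(abs(x[i] - z[i]) for i in range(n))
        dual = rho * max(abs(z[i] - z_old[i]) for i in range(n))
        if primal < tol and dual < tol:
            break
    return z


print(admm_box_tracking([2.0, -3.0, 0.5], lower=-1.0, upper=1.0))
```

Both updates act component by component, so each could be evaluated in parallel; with this convex box constraint the iterates converge to the clipped target [1.0, -1.0, 0.5].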

arXiv:2001.08092 [pdf, other]

Local Policy Optimization for Trajectory-Centric Reinforcement Learning

Authors: Patrik Kolaric, Devesh K. Jha, Arvind U. Raghunathan, Frank L. Lewis, Mouhacine Benosman, Diego Romeres, Daniel Nikovski

Abstract: The goal of this paper is to present a method for simultaneous trajectory and local stabilizing policy optimization to generate local policies for trajectory-centric model-based reinforcement learning (MBRL). This is motivated by the fact that global policy optimization for nonlinear systems can be a very challenging problem, both algorithmically and numerically. However, many robotic manipulation tasks are trajectory-centric and thus do not require a global model or policy. Due to inaccuracies in the learned model estimates, an open-loop trajectory optimization process in most cases results in very poor performance when used on the real system. Motivated by these problems, we formulate trajectory optimization and local policy synthesis as a single optimization problem, which is then solved as an instance of nonlinear programming. We provide some analysis results as well as the achieved performance of the proposed technique under some simplifying assumptions.

Submitted 22 January, 2020; originally announced January 2020.

Journal ref: ICRA 2020
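A common way to obtain a local stabilizing policy around a trajectory is a time-varying LQR (TVLQR) controller computed by a backward Riccati recursion on the linearized dynamics. The sketch below uses that swapped-in technique on a hypothetical scalar plant x_{t+1} = x_t + dt*(sin(x_t) + u_t); unlike the paper, it fixes the nominal trajectory instead of optimizing trajectory and gains jointly as one nonlinear program. It compares open-loop replay of the nominal controls with the feedback policy u_t = u_bar_t - K_t*(x_t - x_bar_t) from a perturbed initial state. All names, weights, and the plant are assumptions.

```python
import math

DT = 0.01  # hypothetical integration step


def step(x, u):
    """Hypothetical scalar plant x' = sin(x) + u, forward-Euler discretized."""
    return x + DT * (math.sin(x) + u)


def nominal_trajectory(horizon=100):
    """Open-loop nominal controls that move the state from 0 at unit speed."""
    x_nom, u_nom = [0.0], []
    for _ in range(horizon):
        u_nom.append(1.0 - math.sin(x_nom[-1]))
        x_nom.append(step(x_nom[-1], u_nom[-1]))
    return x_nom, u_nom


def tvlqr_gains(x_nom, q=1.0, r=0.01, q_final=10.0):
    """Backward Riccati recursion on the dynamics linearized about x_nom."""
    b = DT  # d(step)/du
    p = q_final
    gains = [0.0] * (len(x_nom) - 1)
    for t in reversed(range(len(gains))):
        a = 1.0 + DT * math.cos(x_nom[t])  # d(step)/dx at the nominal state
        k = b * p * a / (r + b * b * p)
        gains[t] = k
        p = q + a * a * p - a * p * b * k
    return gains


def final_deviation(x0, x_nom, u_nom, gains=None):
    """Roll out from x0; with gains, apply u = u_bar - K*(x - x_bar)."""
    x = x0
    for t, u_bar in enumerate(u_nom):
        u = u_bar if gains is None else u_bar - gains[t] * (x - x_nom[t])
        x = step(x, u)
    return abs(x - x_nom[-1])


x_nom, u_nom = nominal_trajectory()
gains = tvlqr_gains(x_nom)
print("open loop:", final_deviation(0.3, x_nom, u_nom))
print("TVLQR:   ", final_deviation(0.3, x_nom, u_nom, gains))
```

Because sin(x) has positive slope along this nominal path, the open-loop deviation grows, while the time-varying gains drive it toward zero; this mirrors, in miniature, the motivation for pairing a trajectory with a local stabilizing policy.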

arXiv:1810.11548

On the Identifiability of the Influence Model for Stochastic Spatiotemporal Spread Processes

Authors: Chenyuan He, Yan Wan, Frank L. Lewis

Abstract: The influence model is a discrete-time stochastic model that succinctly captures the interactions of a network of Markov chains. The model produces a reduced-order representation of the stochastic network and can be used to describe and tractably analyze probabilistic spatiotemporal spread dynamics; hence it has found broad usage in network applications such as social networks, traffic management, and failure cascades in power systems. This paper provides necessary and sufficient conditions for the identifiability of the influence model and also develops estimators for the model structure by exploiting the model's special properties. In addition, we analyze conditions for the identifiability of the partially observed influence model (POIM), in which not all of the sites can be measured.

Submitted 6 November, 2018; v1 submitted 26 October, 2018; originally announced October 2018.

Comments: This temporary draft version of this paper has caused a conflict of interest, and we request that this paper be withdrawn from arXiv
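To make the model concrete, the sketch below simulates one common formulation of the influence model: n sites, each with m statuses, a row-stochastic network influence matrix D, and row-stochastic local transition matrices A_ij. The next-status distribution of site i is p_i[k+1] = sum_j d_ij * (row s_j[k] of A_ij), and the new status is sampled from it. The cascade parameters (NEIGHBOUR, SELF, D) are hypothetical, and the identifiability conditions and estimators described in the abstract are not implemented.

```python
import random


def next_status_probs(D, A, statuses, m):
    """Next-status distribution of every site given the current statuses.

    D[i][j]: weight of site j's influence on site i (each row sums to 1).
    A[i][j]: m x m row-stochastic matrix; row s is the distribution that
             site j in status s induces on site i.
    """
    n = len(D)
    probs = []
    for i in range(n):
        p = [0.0] * m
        for j in range(n):
            row = A[i][j][statuses[j]]  # s_j^T A_ij for a one-hot status s_j
            for c in range(m):
                p[c] += D[i][j] * row[c]
        probs.append(p)
    return probs


def simulate(D, A, statuses, m, steps, rng):
    """Sample a status trajectory; returns the list of status vectors."""
    history = [list(statuses)]
    for _ in range(steps):
        probs = next_status_probs(D, A, statuses, m)
        statuses = [rng.choices(range(m), weights=p)[0] for p in probs]
        history.append(statuses)
    return history


# Hypothetical 3-site failure cascade: status 0 = normal, 1 = failed.
NEIGHBOUR = [[1.0, 0.0], [0.6, 0.4]]  # a failed neighbour pushes toward failure
SELF = [[0.95, 0.05], [0.0, 1.0]]     # a site's own status persists, failure is sticky
D = [[0.5, 0.5, 0.0], [0.25, 0.5, 0.25], [0.0, 0.5, 0.5]]
A = [[SELF if i == j else NEIGHBOUR for j in range(3)] for i in range(3)]

history = simulate(D, A, [1, 0, 0], 2, 20, random.Random(0))
print(history[-1])
```

Because each site's next-status distribution depends linearly on the one-hot statuses of its neighbours, the model stays tractable even though the joint state space of the network grows exponentially with the number of sites.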

Showing 1–6 of 6 results for author: Lewis, F L