-
Multi-agent deep reinforcement learning with centralized training and decentralized execution for transportation infrastructure management
Authors:
M. Saifullah,
K. G. Papakonstantinou,
C. P. Andriotis,
S. M. Stoffels
Abstract:
We present a multi-agent Deep Reinforcement Learning (DRL) framework for managing large transportation infrastructure systems over their life-cycle. Life-cycle management of such engineering systems is a computationally intensive task, requiring appropriate sequential inspection and maintenance decisions able to reduce long-term risks and costs, while dealing with different uncertainties and const…
▽ More
We present a multi-agent Deep Reinforcement Learning (DRL) framework for managing large transportation infrastructure systems over their life-cycle. Life-cycle management of such engineering systems is a computationally intensive task, requiring appropriate sequential inspection and maintenance decisions able to reduce long-term risks and costs, while dealing with different uncertainties and constraints that lie in high-dimensional spaces. To date, static age- or condition-based maintenance methods and risk-based or periodic inspection plans have mostly addressed this class of optimization problems. However, optimality, scalability, and uncertainty limitations are often manifested under such approaches. The optimization problem in this work is cast in the framework of constrained Partially Observable Markov Decision Processes (POMDPs), which provides a comprehensive mathematical basis for stochastic sequential decision settings with observation uncertainties, risk considerations, and limited resources. To address significantly large state and action spaces, a Deep Decentralized Multi-agent Actor-Critic (DDMAC) DRL method with Centralized Training and Decentralized Execution (CTDE), termed as DDMAC-CTDE is developed. The performance strengths of the DDMAC-CTDE method are demonstrated in a generally representative and realistic example application of an existing transportation network in Virginia, USA. The network includes several bridge and pavement components with nonstationary degradation, agency-imposed constraints, and traffic delay and risk considerations. Compared to traditional management policies for transportation networks, the proposed DDMAC-CTDE method vastly outperforms its counterparts. Overall, the proposed algorithmic framework provides near optimal solutions for transportation infrastructure management under real-world constraints and complexities.
△ Less
Submitted 22 January, 2024;
originally announced January 2024.
-
Inference and dynamic decision-making for deteriorating systems with probabilistic dependencies through Bayesian networks and deep reinforcement learning
Authors:
Pablo G. Morato,
Charalampos P. Andriotis,
Konstantinos G. Papakonstantinou,
Philippe Rigo
Abstract:
In the context of modern environmental and societal concerns, there is an increasing demand for methods able to identify management strategies for civil engineering systems, minimizing structural failure risks while optimally planning inspection and maintenance (I&M) processes. Most available methods simplify the I&M decision problem to the component level due to the computational complexity assoc…
▽ More
In the context of modern environmental and societal concerns, there is an increasing demand for methods able to identify management strategies for civil engineering systems, minimizing structural failure risks while optimally planning inspection and maintenance (I&M) processes. Most available methods simplify the I&M decision problem to the component level due to the computational complexity associated with global optimization methodologies under joint system-level state descriptions. In this paper, we propose an efficient algorithmic framework for inference and decision-making under uncertainty for engineering systems exposed to deteriorating environments, providing optimal management strategies directly at the system level. In our approach, the decision problem is formulated as a factored partially observable Markov decision process, whose dynamics are encoded in Bayesian network conditional structures. The methodology can handle environments under equal or general, unequal deterioration correlations among components, through Gaussian hierarchical structures and dynamic Bayesian networks. In terms of policy optimization, we adopt a deep decentralized multi-agent actor-critic (DDMAC) reinforcement learning approach, in which the policies are approximated by actor neural networks guided by a critic network. By including deterioration dependence in the simulated environment, and by formulating the cost model at the system level, DDMAC policies intrinsically consider the underlying system-effects. This is demonstrated through numerical experiments conducted for both a 9-out-of-10 system and a steel frame under fatigue deterioration. Results demonstrate that DDMAC policies offer substantial benefits when compared to state-of-the-art heuristic approaches. The inherent consideration of system-effects by DDMAC strategies is also interpreted based on the learned policies.
△ Less
Submitted 2 September, 2022;
originally announced September 2022.
-
Optimal Inspection and Maintenance Planning for Deteriorating Structural Components through Dynamic Bayesian Networks and Markov Decision Processes
Authors:
P. G. Morato,
K. G. Papakonstantinou,
C. P. Andriotis,
J. S. Nielsen,
P. Rigo
Abstract:
Civil and maritime engineering systems, among others, from bridges to offshore platforms and wind turbines, must be efficiently managed as they are exposed to deterioration mechanisms throughout their operational life, such as fatigue or corrosion. Identifying optimal inspection and maintenance policies demands the solution of a complex sequential decision-making problem under uncertainty, with th…
▽ More
Civil and maritime engineering systems, among others, from bridges to offshore platforms and wind turbines, must be efficiently managed as they are exposed to deterioration mechanisms throughout their operational life, such as fatigue or corrosion. Identifying optimal inspection and maintenance policies demands the solution of a complex sequential decision-making problem under uncertainty, with the main objective of efficiently controlling the risk associated with structural failures. Addressing this complexity, risk-based inspection planning methodologies, supported often by dynamic Bayesian networks, evaluate a set of pre-defined heuristic decision rules to reasonably simplify the decision problem. However, the resulting policies may be compromised by the limited space considered in the definition of the decision rules. Avoiding this limitation, Partially Observable Markov Decision Processes (POMDPs) provide a principled mathematical methodology for stochastic optimal control under uncertain action outcomes and observations, in which the optimal actions are prescribed as a function of the entire, dynamically updated, state probability distribution. In this paper, we combine dynamic Bayesian networks with POMDPs in a joint framework for optimal inspection and maintenance planning, and we provide the formulation for developing both infinite and finite horizon POMDPs in a structural reliability context. The proposed methodology is implemented and tested for the case of a structural component subject to fatigue deterioration, demonstrating the capability of state-of-the-art point-based POMDP solvers for solving the underlying planning optimization problem. Within the numerical experiments, POMDP and heuristic-based policies are thoroughly compared, and results showcase that POMDPs achieve substantially lower costs as compared to their counterparts, even for traditional problem settings.
△ Less
Submitted 28 November, 2021; v1 submitted 9 September, 2020;
originally announced September 2020.
-
Deep reinforcement learning driven inspection and maintenance planning under incomplete information and constraints
Authors:
C. P. Andriotis,
K. G. Papakonstantinou
Abstract:
Determination of inspection and maintenance policies for minimizing long-term risks and costs in deteriorating engineering environments constitutes a complex optimization problem. Major computational challenges include the (i) curse of dimensionality, due to exponential scaling of state/action set cardinalities with the number of components; (ii) curse of history, related to exponentially growing…
▽ More
Determination of inspection and maintenance policies for minimizing long-term risks and costs in deteriorating engineering environments constitutes a complex optimization problem. Major computational challenges include the (i) curse of dimensionality, due to exponential scaling of state/action set cardinalities with the number of components; (ii) curse of history, related to exponentially growing decision-trees with the number of decision-steps; (iii) presence of state uncertainties, induced by inherent environment stochasticity and variability of inspection/monitoring measurements; (iv) presence of constraints, pertaining to stochastic long-term limitations, due to resource scarcity and other infeasible/undesirable system responses. In this work, these challenges are addressed within a joint framework of constrained Partially Observable Markov Decision Processes (POMDP) and multi-agent Deep Reinforcement Learning (DRL). POMDPs optimally tackle (ii)-(iii), combining stochastic dynamic programming with Bayesian inference principles. Multi-agent DRL addresses (i), through deep function parametrizations and decentralized control assumptions. Challenge (iv) is herein handled through proper state augmentation and Lagrangian relaxation, with emphasis on life-cycle risk-based constraints and budget limitations. The underlying algorithmic steps are provided, and the proposed framework is found to outperform well-established policy baselines and facilitate adept prescription of inspection and intervention actions, in cases where decisions must be made in the most resource- and risk-aware manner.
△ Less
Submitted 2 July, 2020;
originally announced July 2020.
-
Value of structural health information in partially observable stochastic environments
Authors:
C. P. Andriotis,
K. G. Papakonstantinou,
E. N. Chatzi
Abstract:
Efficient integration of uncertain observations with decision-making optimization is key for prescribing informed intervention actions, able to preserve structural safety of deteriorating engineering systems. To this end, it is necessary that scheduling of inspection and monitoring strategies be objectively performed on the basis of their expected value-based gains that, among others, reflect quan…
▽ More
Efficient integration of uncertain observations with decision-making optimization is key for prescribing informed intervention actions, able to preserve structural safety of deteriorating engineering systems. To this end, it is necessary that scheduling of inspection and monitoring strategies be objectively performed on the basis of their expected value-based gains that, among others, reflect quantitative metrics such as the Value of Information (VoI) and the Value of Structural Health Monitoring (VoSHM). In this work, we introduce and study the theoretical and computational foundations of the above metrics within the context of Partially Observable Markov Decision Processes (POMDPs), thus alluding to a broad class of decision-making problems of partially observable stochastic deteriorating environments that can be modeled as POMDPs. Step-wise and life-cycle VoI and VoSHM definitions are devised and their bounds are analyzed as per the properties stemming from the Bellman equation and the resulting optimal value function. It is shown that a POMDP policy inherently leverages the notion of VoI to guide observational actions in an optimal way at every decision step, and that the permanent or intermittent information provided by SHM or inspection visits, respectively, can only improve the cost of this policy in the long-term, something that is not necessarily true under locally optimal policies, typically adopted in decision-making of structures and infrastructure. POMDP solutions are derived based on point-based value iteration methods, and the various definitions are quantified in stationary and non-stationary deteriorating environments, with both infinite and finite planning horizons, featuring single- or multi-component engineering systems.
△ Less
Submitted 20 July, 2020; v1 submitted 28 December, 2019;
originally announced December 2019.
-
Managing engineering systems with large state and action spaces through deep reinforcement learning
Authors:
C. P. Andriotis,
K. G. Papakonstantinou
Abstract:
Decision-making for engineering systems can be efficiently formulated as a Markov Decision Process (MDP) or a Partially Observable MDP (POMDP). Typical MDP and POMDP solution procedures utilize offline knowledge about the environment and provide detailed policies for relatively small systems with tractable state and action spaces. However, in large multi-component systems the sizes of these spaces…
▽ More
Decision-making for engineering systems can be efficiently formulated as a Markov Decision Process (MDP) or a Partially Observable MDP (POMDP). Typical MDP and POMDP solution procedures utilize offline knowledge about the environment and provide detailed policies for relatively small systems with tractable state and action spaces. However, in large multi-component systems the sizes of these spaces easily explode, as system states and actions scale exponentially with the number of components, whereas environment dynamics are difficult to be described in explicit forms for the entire system and may only be accessible through numerical simulators. In this work, to address these issues, an integrated Deep Reinforcement Learning (DRL) framework is introduced. The Deep Centralized Multi-agent Actor Critic (DCMAC) is developed, an off-policy actor-critic DRL approach, providing efficient life-cycle policies for large multi-component systems operating in high-dimensional spaces. Apart from deep function approximations that parametrize large state spaces, DCMAC also adopts a factorized representation of the system actions, being able to designate individualized component- and subsystem-level decisions, while maintaining a centralized value function for the entire system. DCMAC compares well against Deep Q-Network (DQN) solutions and exact policies, where applicable, and outperforms optimized baselines that are based on time-based, condition-based and periodic policies.
△ Less
Submitted 5 November, 2018;
originally announced November 2018.