Personalized Dynamic Pricing Policy for Electric Vehicles: Reinforcement learning approach

Authors: Sangjun Bae, Balazs Kulcsar, Sebastien Gros

Abstract: With the increasing number of fast-electric vehicle charging stations (fast-EVCSs) and the popularization of information technology, electricity price competition between fast-EVCSs is highly expected, in which the utilization of public and/or privacy-preserved information will play a crucial role. Self-interested electric vehicle (EV) users, on the other hand, try to select a fast-EVCS for charging in a way that maximizes their utilities based on electricity price, estimated waiting time, and their state of charge. While existing studies have largely focused on finding equilibrium prices, this study proposes a personalized dynamic pricing policy (PeDP) for a fast-EVCS to maximize revenue using a reinforcement learning (RL) approach. We first propose a simulation environment with multiple competing fast-EVCSs to model the selfish behavior of EV users using a game-based charging station selection model with a monetary utility function. In the environment, we propose a Q-learning-based PeDP to maximize the fast-EVCS's revenue. Through numerical simulations based on the environment: (1) we identify the importance of waiting time in the EV charging market by comparing the classic Bertrand competition model with the proposed PeDP for fast-EVCSs (from the system perspective); (2) we evaluate the performance of the proposed PeDP and analyze the effects of the information on the policy (from the service provider perspective); and (3) we show that privacy-preserved information sharing can be misused by an artificial intelligence-based PeDP in certain situations in the EV charging market (from the customer perspective).

Submitted 31 December, 2023; originally announced January 2024.

arXiv:2302.12667 [pdf, other]

Deep active learning for nonlinear system identification

Authors: Erlend Torje Berg Lundby, Adil Rasheed, Ivar Johan Halvorsen, Dirk Reinhardt, Sebastien Gros, Jan Tommy Gravdahl

Abstract: The exploding research interest in neural networks for modeling nonlinear dynamical systems is largely explained by the networks' capacity to model complex input-output relations directly from data. However, they typically need vast training data before they can be put to any good use. The data generation process for dynamical systems can be an expensive endeavor both in terms of time and resources. Active learning addresses this shortcoming by acquiring the most informative data, thereby reducing the need to collect enormous datasets. What makes the current work unique is integrating the deep active learning framework into nonlinear system identification. We formulate a general static deep active learning acquisition problem for nonlinear system identification. This is enabled by exploring system dynamics locally in different regions of the input space to obtain a simulated dataset covering the broader input space. This simulated dataset can be used in a static deep active learning acquisition scheme referred to as global exploration. The global exploration acquires a batch of initial states corresponding to the most informative state-action trajectories according to a batch acquisition function. The local exploration solves an optimal control problem, finding the control trajectory that maximizes some measure of information. After a batch of informative initial states is acquired, a new round of local explorations from the initial states in the batch is conducted to obtain a set of corresponding control trajectories that are to be applied on the system dynamics to get data from the system. Information measures used in the acquisition scheme are derived from the predictive variance of an ensemble of neural networks. The novel method outperforms standard data acquisition methods used for system identification of nonlinear dynamical systems in the case study performed on simulated data.

Submitted 24 February, 2023; originally announced February 2023.

arXiv:2301.01667 [pdf, other]

Learning-based MPC from Big Data Using Reinforcement Learning

Authors: Shambhuraj Sawant, Akhil S Anand, Dirk Reinhardt, Sebastien Gros

Abstract: This paper presents an approach for learning Model Predictive Control (MPC) schemes directly from data using Reinforcement Learning (RL) methods. The state-of-the-art learning methods use RL to improve the performance of parameterized MPC schemes. However, these learning algorithms are often gradient-based methods that require frequent evaluations of computationally expensive MPC schemes, thereby restricting their use on big datasets. We propose to tackle this issue by using tools from RL to learn a parameterized MPC scheme directly from data in an offline fashion. Our approach derives an MPC scheme without having to solve it over the collected dataset, thereby eliminating the computational complexity of existing techniques for big data. We evaluate the proposed method on three simulated experiments of varying complexity.

Submitted 4 January, 2023; originally announced January 2023.

arXiv:2212.03645 [pdf, ps, other]

doi 10.23919/MIPRO55190.2022.9803570

Systematic review of automatic translation of high-level security policy into firewall rules

Authors: Ivan Kovačević, Bruno Štengl, Stjepan Groš

Abstract: Firewalls are security devices that perform network traffic filtering. They are ubiquitous in the industry and are a common method used to enforce organizational security policy. Security policy is specified on a high level of abstraction, with statements such as "web browsing is allowed only on workstations inside the office network", and needs to be translated into low-level firewall rules to be enforceable. There has been a lot of work regarding optimization, analysis and platform independence of firewall rules, but an area that has seen much less success is automatic translation of high-level security policies into firewall rules. In addition to improving rules' readability, such translation would make it easier to detect errors. This paper surveys over twenty papers that aim to generate firewall rules according to a security policy specified on a higher level of abstraction. It also presents an overview of similar features in modern firewall systems. Most approaches define specialized domain languages that get compiled into firewall rule sets, with some of them relying on formal specification, ontology, or graphical models. The approaches have improved over time, but there are still many drawbacks that need to be solved before wider application.

Submitted 7 December, 2022; originally announced December 2022.

Comments: 6 pages, 1 figure; Published in the 2022 45th Jubilee International Convention on Information, Communication and Electronic Technology (MIPRO)

arXiv:2205.08856 [pdf, other]

Bridging the gap between QP-based and MPC-based RL

Authors: Shambhuraj Sawant, Sebastien Gros

Abstract: Reinforcement learning methods typically use Deep Neural Networks to approximate the value functions and policies underlying a Markov Decision Process. Unfortunately, DNN-based RL suffers from a lack of explainability of the resulting policy. In this paper, we instead approximate the policy and value functions using an optimization problem, taking the form of Quadratic Programs (QPs). We propose simple tools to promote structures in the QP, pushing it to resemble a linear MPC scheme. A generic unstructured QP offers high flexibility for learning, while a QP having the structure of an MPC scheme promotes the explainability of the resulting policy and additionally provides ways to analyze it. The tools we propose allow for continuously adjusting the trade-off between the former and the latter during learning. We illustrate the workings of our proposed method with the resulting structure using a point-mass task.

Submitted 18 May, 2022; originally announced May 2022.

arXiv:2204.12420 [pdf, other]

doi 10.1109/TTE.2022.3226683

Interpretable Battery Cycle Life Range Prediction Using Early Degradation Data at Cell Level

Authors: Huang Zhang, Yang Su, Faisal Altaf, Torsten Wik, Sebastien Gros

Abstract: Battery cycle life prediction using early degradation data has many potential applications throughout the battery product life cycle. For that reason, various data-driven methods have been proposed for point prediction of battery cycle life with minimum knowledge of the battery degradation mechanisms. However, managing the rapidly increasing number of batteries at end-of-life with lower economic and technical risk requires prediction of cycle life with quantified uncertainty, which is still lacking. The interpretability (i.e., the reason for high prediction accuracy) of these advanced data-driven methods is also worthy of investigation. Here, a Quantile Regression Forest (QRF) model, having the advantage of not assuming any specific distribution of cycle life, is introduced to make cycle life range prediction with uncertainty quantified as the width of the prediction interval, in addition to point predictions with high accuracy. The hyperparameters of the QRF model are optimized with a proposed alpha-logistic-weighted criterion so that the coverage probabilities associated with the prediction intervals are calibrated. The interpretability of the final QRF model is explored with two global model-agnostic methods, namely permutation importance and partial dependence plots.

Submitted 23 April, 2023; v1 submitted 26 April, 2022; originally announced April 2022.

arXiv:2203.13854 [pdf, other]

Quasi-Newton Iteration in Deterministic Policy Gradient

Authors: Arash Bahari Kordabad, Hossein Nejatbakhsh Esfahani, Wenqi Cai, Sebastien Gros

Abstract: This paper presents a model-free approximation for the Hessian of the performance of deterministic policies to use in the context of Reinforcement Learning based on Quasi-Newton steps in the policy parameters. We show that the approximate Hessian converges to the exact Hessian at the optimal policy, and allows for a superlinear convergence in the learning, provided that the policy parametrization is rich. The natural policy gradient method can be interpreted as a particular case of the proposed method. We analytically verify the formulation in a simple linear case and compare the convergence of the proposed method with the natural policy gradient in a nonlinear example.

Submitted 25 March, 2022; originally announced March 2022.

Comments: This paper has been accepted to 2022 American Control Conference (ACC). 6 pages

arXiv:2111.04146 [pdf, other]

Optimization of the Model Predictive Control Meta-Parameters Through Reinforcement Learning

Authors: Eivind Bøhn, Sebastien Gros, Signe Moe, Tor Arne Johansen

Abstract: Model predictive control (MPC) is increasingly being considered for control of fast systems and embedded applications. However, MPC poses some significant challenges for such systems. Its high computational complexity results in high power consumption from the control algorithm, which could account for a significant share of the energy resources in battery-powered embedded systems. The MPC parameters must be tuned, which is largely a trial-and-error process that affects the control performance, the robustness and the computational complexity of the controller to a high degree. In this paper, we propose a novel framework in which any parameter of the control algorithm can be jointly tuned using reinforcement learning (RL), with the goal of simultaneously optimizing the control performance and the power usage of the control algorithm. We propose the novel idea of optimizing the meta-parameters of MPC with RL, i.e. parameters affecting the structure of the MPC problem as opposed to the solution to a given problem. Our control algorithm is based on an event-triggered MPC where we learn when the MPC should be re-computed, and a dual mode MPC and linear state feedback control law applied in between MPC computations. We formulate a novel mixture-distribution policy and show that with joint optimization we achieve improvements that do not present themselves when optimizing the same parameters in isolation. We demonstrate our framework on the inverted pendulum control task, reducing the total computation time of the control system by 36% while also improving the control performance by 18.4% over the best-performing MPC baseline.

Submitted 7 November, 2021; originally announced November 2021.

Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2107.11102 [pdf, other]

doi 10.1109/ACCESS.2022.3147312

Automatically generating models of IT systems

Authors: Ivan Kovačević, Stjepan Groš, Ante Đerek

Abstract: An information technology system (ITS), informally, consists of hardware and software infrastructure (e.g., workstations, servers, laptops, installed software packages, databases, LANs, firewalls, etc.), along with physical and logical connections and inter-dependencies between various items. Nowadays, every company owns and operates an ITS, but detailed information about the system is rarely publicly available. However, there are many situations where the availability of such data would be beneficial. For example, cyber ranges need descriptions of complex realistic IT systems in order to provide an effective training and education platform. Furthermore, various algorithms in cybersecurity, in particular attack tree generation, need to be validated on realistic models of IT systems. In this paper, we describe a system we call the Generator that, based on high-level requirements such as the number of employees and the business area the target company belongs to, generates a model of an ITS that satisfies the given requirements. We put special emphasis on the following two criteria: the generated ITS models a large amount of detail, and ideally resembles a real system. Our survey of related literature found no sufficiently similar prior works, so we believe that this is the first attempt at building something like this. We created a proof-of-concept implementation of the Generator, validated it by generating ITS models for a simplified fictional financial institution, and analyzed the Generator's performance with respect to the problem size. The research was done in an iterative manner, with coauthors continuously providing feedback on intermediate results. (...) We intend to extend this prototype to allow probabilistic generation of IT systems when only a subset of parameters is explicitly defined, and further develop and validate our approach with the help of domain experts.

Submitted 31 January, 2022; v1 submitted 23 July, 2021; originally announced July 2021.

Comments: 20 pages, 16 figures

Journal ref: IEEE Access (2022)

arXiv:2106.06000 [pdf, ps, other]

Use of a non-peer reviewed sources in cyber-security scientific research

Authors: Dalibor Gernhardt, Stjepan Groš

Abstract: Most publicly available data on cyber incidents comes from private companies and non-academic sources. Common sources of information include various security bulletins, white papers, reports, court cases, and blog posts describing specific events, often from a single point of view, followed by occasional academic sources, usually conference proceedings. The main characteristics of the available data sources are: lack of peer review and unavailability of confidential data. In this paper, we use an indirect approach to identify trusted sources used in scientific work. We analyze how top-rated peer reviewed literature relies on the use of non-peer reviewed sources on cybersecurity incidents. To identify current non-peer reviewed sources on cybersecurity we analyze references in top-rated peer reviewed computer security conferences. We also analyze how non-peer reviewed sources are used, to motivate or support research. We examined 808 articles from top conferences in the field of computer security. The results of this work are a list of the most commonly used non-peer reviewed data sources and information about the context in which this data is used. Since these sources are accepted in top conferences, other researchers can consider them in their future research. To the best of our knowledge, analysis of how non-peer reviewed sources are used in cyber-security scientific research has not been done before.

Submitted 10 June, 2021; originally announced June 2021.

Comments: 9 pages, 6 tables

arXiv:2106.05702 [pdf, ps, other]

Myths and Misconceptions about Attackers and Attacks

Authors: Stjepan Groš

Abstract: This paper is based on a three-year project during which we studied attackers' behavior, reading military planning literature, and thinking about how we would do the same things they do, and what problems we, as attackers, would face. This research is still ongoing, but while participating in applications for other projects and talking to cyber security experts we constantly face the same issues, namely attackers' behavior is not well understood, and consequently, there are a number of misconceptions floating around that are simply not true, or are only partially true. This is actually expected, as someone who casually follows news about incidents easily gets the impression that attackers and attacks are everywhere and everyone is under attack. Our goal in this paper is to debunk these myths, to show what attackers really can and cannot do, what dilemmas they face, what we don't know about attackers and attacks, etc. The conclusion is that, while attackers do have the upper hand, they don't have an absolute advantage, i.e. they also operate in an uncertain environment. Knowing this means that defenses can be well established.

Submitted 10 June, 2021; originally announced June 2021.

Comments: 8 pages, 27 references. This paper is work in progress and as such may contain inaccuracies, missing or unfinished sentences and paragraphs

arXiv:2106.01154 [pdf, other]

Controlled Update of Software Components using Concurrent Exection of Patched and Unpatched Versions

Authors: Stjepan Groš, Ivan Kovačević, Ivan Dujmić, Matej Petrinović

Abstract: Software patching is a common method of removing vulnerabilities in software components to make IT systems more secure. However, there are many cases where software patching is not possible due to the critical nature of the application, especially when the vendor providing the application guarantees correct operation only in a specific configuration. In this paper, we propose a method to solve this problem. The idea is to run unpatched and patched application instances concurrently, with the unpatched one having complete control and the output of the patched one being used only for comparison, to watch for differences that are consequences of introduced bugs. To test this idea, we developed a system that allows us to run web applications in parallel and tested three web applications. The experiments have shown that the idea is promising for web applications from the technical side. Furthermore, we discuss the potential limitations of this system and the idea in general, how long two instances should run in order to be able to claim with some probability that the patched version has not introduced any new bugs, other potential use cases of the proposed system where two application instances run concurrently, and finally the potential uses of this system with different types of applications, such as SCADA systems.

Submitted 2 June, 2021; originally announced June 2021.

Comments: 9 pages, 4 figures

arXiv:2104.02743 [pdf, other]

Approximate Robust NMPC using Reinforcement Learning

Authors: Hossein Nejatbakhsh Esfahani, Arash Bahari Kordabad, Sebastien Gros

Abstract: We present a Reinforcement Learning-based Robust Nonlinear Model Predictive Control (RL-RNMPC) framework for controlling nonlinear systems in the presence of disturbances and uncertainties. An approximate Robust Nonlinear Model Predictive Control (RNMPC) of low computational complexity is used in which the state trajectory uncertainty is modelled via ellipsoids. Reinforcement Learning is then used in order to handle the ellipsoidal approximation and improve the closed-loop performance of the scheme by adjusting the MPC parameters generating the ellipsoids. The approach is tested on a simulated Wheeled Mobile Robot (WMR) tracking a desired trajectory while avoiding static obstacles.

Submitted 6 April, 2021; originally announced April 2021.

Comments: This paper has been accepted to 2021 European Control Conference (ECC)

arXiv:2104.02411 [pdf, other]

MPC-based Reinforcement Learning for Economic Problems with Application to Battery Storage

Authors: Arash Bahari Kordabad, Wenqi Cai, Sebastien Gros

Abstract: In this paper, we are interested in optimal control problems with purely economic costs, which often yield optimal policies having a (nearly) bang-bang structure. We focus on policy approximations based on Model Predictive Control (MPC) and the use of the deterministic policy gradient method to optimize the MPC closed-loop performance in the presence of unmodelled stochasticity or model error. When the policy has a (nearly) bang-bang structure, we observe that the policy gradient method can struggle to produce meaningful steps in the policy parameters. To tackle this issue, we propose a homotopy strategy based on the interior-point method, providing a relaxation of the policy during the learning. We investigate a specific well-known battery storage problem, and show that the proposed method delivers more homogeneous and faster learning than a classical policy gradient approach.

Submitted 6 April, 2021; originally announced April 2021.

Comments: This paper has been accepted to ECC2021. 6 pages

arXiv:2102.11122 [pdf, other]

Reinforcement Learning of the Prediction Horizon in Model Predictive Control

Authors: Eivind Bøhn, Sebastien Gros, Signe Moe, Tor Arne Johansen

Abstract: Model predictive control (MPC) is a powerful trajectory optimization control technique capable of controlling complex nonlinear systems while respecting system constraints and ensuring safe operation. The MPC's capabilities come at the cost of a high online computational complexity, the requirement of an accurate model of the system dynamics, and the necessity of tuning its parameters to the specific control application. The main tunable parameter affecting the computational complexity is the prediction horizon length, controlling how far into the future the MPC predicts the system response and thus evaluates the optimality of its computed trajectory. A longer horizon generally increases the control performance, but requires an increasingly powerful computing platform, excluding certain control applications. The performance sensitivity to the prediction horizon length varies over the state space, and this motivated the adaptive horizon model predictive control (AHMPC), which adapts the prediction horizon according to some criteria. In this paper we propose to learn the optimal prediction horizon as a function of the state using reinforcement learning (RL). We show how the RL learning problem can be formulated and test our method on two control tasks, showing clear improvements over the fixed horizon MPC scheme, while requiring only minutes of learning.

Submitted 22 February, 2021; originally announced February 2021.

Comments: This work has been submitted to IFAC NMPC 2021 for possible publication

arXiv:2102.01383 [pdf, other]

Stability-Constrained Markov Decision Processes Using MPC

Authors: Mario Zanon, Sébastien Gros, Michele Palladino

Abstract: In this paper, we consider solving discounted Markov Decision Processes (MDPs) under the constraint that the resulting policy is stabilizing. In practice MDPs are solved based on some form of policy approximation. We will leverage recent results proposing to use Model Predictive Control (MPC) as a structured policy in the context of Reinforcement Learning to make it possible to introduce stability requirements directly inside the MPC-based policy. This will restrict the solution of the MDP to stabilizing policies by construction. The stability theory for MPC is most mature for the undiscounted MPC case. Hence, we will first show in this paper that stable discounted MDPs can be reformulated as undiscounted ones. This observation will entail that the MPC-based policy with stability requirements will produce the optimal policy for the discounted MDP if it is stable, and the best stabilizing policy otherwise.

Submitted 2 February, 2021; originally announced February 2021.

arXiv:2012.07369 [pdf, other]

Learning for MPC with Stability & Safety Guarantees

Authors: Sébastien Gros, Mario Zanon

Abstract: The combination of learning methods with Model Predictive Control (MPC) has attracted a significant amount of attention in the recent literature. The hope of this combination is to reduce the reliance of MPC schemes on accurate models, and to tap into the fast developing machine learning and reinforcement learning tools to exploit the growing amount of data available for many systems. In particular, the combination of reinforcement learning and MPC has been proposed as a viable and theoretically justified approach to introduce explainable, safe and stable policies in reinforcement learning. However, a formal theory detailing how the safety and stability of an MPC-based policy can be maintained through the parameter updates delivered by the learning tools is still lacking. This paper addresses this gap. The theory is developed for the generic Robust MPC case, and applied in simulation in the robust tube-based linear MPC case, where the theory is fairly easy to deploy in practice. The paper focuses on Reinforcement Learning as a learning tool, but it applies to any learning method that updates the MPC parameters online.

Submitted 22 July, 2022; v1 submitted 14 December, 2020; originally announced December 2020.

arXiv:2011.13365 [pdf, other]

Optimization of the Model Predictive Control Update Interval Using Reinforcement Learning

Authors: Eivind Bøhn, Sebastien Gros, Signe Moe, Tor Arne Johansen

Abstract: In control applications there is often a compromise that needs to be made with regard to the complexity and performance of the controller and the computational resources that are available. For instance, the typical hardware platform in embedded control applications is a microcontroller with limited memory and processing power, and for battery-powered applications the control system can account for a significant portion of the energy consumption. We propose a controller architecture in which the computational cost is explicitly optimized along with the control objective. This is achieved by a three-part architecture where a high-level, computationally expensive controller generates plans, which a computationally simpler controller executes by compensating for prediction errors, while a recomputation policy decides when the plan should be recomputed. In this paper, we employ model predictive control (MPC) as the high-level plan-generating controller, a linear state feedback controller as the simpler compensating controller, and reinforcement learning (RL) to learn the recomputation policy. Simulation results for two examples showcase the architecture's ability to improve upon the MPC approach and find reasonable compromises weighing the performance on the control objective and the computational resources expended.

Submitted 26 November, 2020; originally announced November 2020.

Comments: Submitted to 3rd Annual Learning for Dynamics and Control Conference (L4DC 2021)

arXiv:2004.01430 [pdf, ps, other]

Reinforcement Learning for Mixed-Integer Problems Based on MPC

Authors: Sebastien Gros, Mario Zanon

Abstract: Model Predictive Control has been recently proposed as a policy approximation for Reinforcement Learning, offering a path towards safe and explainable Reinforcement Learning. This approach has been investigated for Q-learning and actor-critic methods, both in the context of nominal Economic MPC and Robust (N)MPC, showing very promising results. In that context, actor-critic methods seem to be the most reliable approach. Many applications include a mixture of continuous and integer inputs, for which the classical actor-critic methods need to be adapted. In this paper, we present a policy approximation based on mixed-integer MPC schemes, and propose a computationally inexpensive technique to generate exploration in the mixed-integer input space that ensures satisfaction of the constraints. We then propose a simple compatible advantage function approximation for the proposed policy, which allows one to build the gradient of the mixed-integer MPC-based policy.

Submitted 3 April, 2020; originally announced April 2020.

Comments: Accepted at IFAC 2020

arXiv:2004.00915 [pdf, ps, other]

Safe Reinforcement Learning via Projection on a Safe Set: How to Achieve Optimality?

Authors: Sebastien Gros, Mario Zanon, Alberto Bemporad

Abstract: For all its successes, Reinforcement Learning (RL) still struggles to deliver formal guarantees on the closed-loop behavior of the learned policy. Among other things, guaranteeing the safety of RL with respect to safety-critical systems is a very active research topic. Some recent contributions propose to rely on projections of the inputs delivered by the learned policy into a safe set, ensuring that the system safety is never jeopardized. Unfortunately, it is unclear whether this operation can be performed without disrupting the learning process. This paper addresses this issue. The problem is analysed in the context of $Q$-learning and policy gradient techniques. We show that the projection approach is generally disruptive in the context of $Q$-learning though a simple alternative solves the issue, while simple corrections can be used in the context of policy gradient methods in order to ensure that the policy gradients are unbiased. The proposed results extend to safe projections based on robust MPC techniques.

Submitted 2 April, 2020; originally announced April 2020.

Comments: Accepted at IFAC 2020

arXiv:2001.06616 [pdf, ps, other]

Research Directions in Cyber Threat Intelligence

Authors: Stjepan Groš

Abstract: Cyber threat intelligence is a relatively new field that has grown from two distinct fields, cyber security and intelligence. As such, it draws knowledge from and mixes the two fields. Yet current scientific research on cyber threat intelligence is relatively scarce, which opens up a lot of opportunities. In this paper we define what cyber threat intelligence is and briefly review some aspects of cyber threat intelligence. Then, we analyze existing research fields that are much older than cyber threat intelligence but related to it. This opens up an opportunity to draw knowledge and methods from those older fields, and in that way advance cyber threat intelligence much faster than it would by following its own path. With such an approach we effectively give research directions for CTI.

Submitted 18 January, 2020; originally announced January 2020.

Comments: 6 pages

arXiv:1910.01721 [pdf, ps, other]

A Critical View on CIS Controls

Authors: Stjepan Groš

Abstract: CIS Controls is a set of 20 controls and 171 sub-controls that were created with the idea of having a list of things to implement so that organizations can increase their security. While good in theory, it is a big question how viable this approach is in practice, and whether it really helps. There is only a small number of critical views of CIS Controls, and since CIS Controls are marketed by two very influential organizations they are very popular. Yet, there are alternatives published by ISO, NIST and even the PCI consortium. In this paper we critically assess CIS Controls, the assumptions on which they are based, as well as the validity of the approach and the claims made in its favor. The conclusion is that the scientific community should be more active regarding this topic, but also that more material is necessary. This is something that CIS and SANS should support if they want to make CIS Controls a viable alternative to other approaches.

Submitted 2 May, 2020; v1 submitted 3 October, 2019; originally announced October 2019.

Comments: 7 pages

Showing 1–22 of 22 results for author: Gros, S