Search | arXiv e-print repository

Structured Reinforcement Learning for Media Streaming at the Wireless Edge

Authors: Archana Bura, Sarat Chandra Bobbili, Shreyas Rameshkumar, Desik Rengarajan, Dileep Kalathil, Srinivas Shakkottai

Abstract: Media streaming is the dominant application over wireless edge (access) networks. The increasing softwarization of such networks has led to efforts at intelligent control, wherein application-specific actions may be dynamically taken to enhance the user experience. The goal of this work is to develop and demonstrate learning-based policies for optimal decision making to determine which clients to dynamically prioritize in a video streaming setting. We formulate the policy design question as a constrained Markov decision problem (CMDP), and observe that by using a Lagrangian relaxation we can decompose it into single-client problems. Further, the optimal policy takes a threshold form in the video buffer length, which enables us to design an efficient constrained reinforcement learning (CRL) algorithm to learn it. Specifically, we show that a natural policy gradient (NPG) based algorithm that is derived using the structure of our problem converges to the globally optimal policy. We then develop a simulation environment for training, and a real-world intelligent controller attached to a WiFi access point for evaluation. We empirically show that the structured learning approach enables fast learning. Furthermore, such a structured policy can be easily deployed due to low computational complexity, leading to policy execution taking only about 15 $\mu$s. Using YouTube streaming experiments in a resource-constrained scenario, we demonstrate that the CRL approach can increase quality of experience (QoE) by over 30%.

Submitted 16 April, 2024; v1 submitted 10 April, 2024; originally announced April 2024.

Comments: 15 pages, 14 figures

arXiv:2312.15340 [pdf, other]

Meta-Learning-Based Adaptive Stability Certificates for Dynamical Systems

Authors: Amit Jena, Dileep Kalathil, Le Xie

Abstract: This paper addresses the problem of Neural Network (NN) based adaptive stability certification in a dynamical system. The state-of-the-art methods, such as Neural Lyapunov Functions (NLFs), use NN-based formulations to assess the stability of a non-linear dynamical system and compute a Region of Attraction (ROA) in the state space. However, under parametric uncertainty, if the values of system parameters vary over time, the NLF methods fail to adapt to such changes and may lead to conservative stability assessment performance. We circumvent this issue by integrating Model Agnostic Meta-learning (MAML) with NLFs and propose meta-NLFs. In this process, we train a meta-function that adapts to any parametric shifts and updates into an NLF for the system with new test-time parameter values. We demonstrate the stability assessment performance of meta-NLFs on some standard benchmark autonomous dynamical systems.

Submitted 23 December, 2023; originally announced December 2023.

Comments: This article has been accepted for AAAI-24 (The 38th Annual AAAI Conference on Artificial Intelligence)

arXiv:2311.00226 [pdf, other]

Transformers are Provably Optimal In-context Estimators for Wireless Communications

Authors: Vishnu Teja Kunde, Vicram Rajagopalan, Chandra Shekhara Kaushik Valmeekam, Krishna Narayanan, Srinivas Shakkottai, Dileep Kalathil, Jean-Francois Chamberland

Abstract: Pre-trained transformers exhibit the capability of adapting to new tasks through in-context learning (ICL), where they efficiently utilize a limited set of prompts without explicit model optimization. The canonical communication problem of estimating transmitted symbols from received observations can be modelled as an in-context learning problem: received observations are essentially a noisy function of transmitted symbols, and this function can be represented by an unknown parameter whose statistics depend on an (also unknown) latent context. This problem, which we term in-context estimation (ICE), has significantly greater complexity than the extensively studied linear regression problem. The optimal solution to the ICE problem is a non-linear function of the underlying context. In this paper, we prove that, for a subclass of such problems, a single-layer softmax attention transformer (SAT) computes the optimal solution of the above estimation problem in the limit of large prompt length. We also prove that the optimal configuration of such a transformer is indeed the minimizer of the corresponding training loss. Further, we empirically demonstrate the proficiency of multi-layer transformers in efficiently solving broader in-context estimation problems. Through extensive simulations, we show that solving ICE problems using transformers significantly outperforms standard approaches. Moreover, with just a few context examples, it achieves the same performance as an estimator with perfect knowledge of the latent context.

Submitted 14 June, 2024; v1 submitted 31 October, 2023; originally announced November 2023.

Comments: 13 pages, 2 figures, 2 tables, preprint; abstract, references, theory updated

arXiv:2302.12320 [pdf, other]

Dynamic Regret Analysis of Safe Distributed Online Optimization for Convex and Non-convex Problems

Authors: Ting-Jui Chang, Sapana Chaudhary, Dileep Kalathil, Shahin Shahrampour

Abstract: This paper addresses safe distributed online optimization over an unknown set of linear safety constraints. A network of agents aims at jointly minimizing a global, time-varying function, which is only partially observable to each individual agent. Therefore, agents must engage in local communications to generate a safe sequence of actions competitive with the best minimizer sequence in hindsight, and the gap between the two sequences is quantified via dynamic regret. We propose distributed safe online gradient descent (D-Safe-OGD) with an exploration phase, where all agents estimate the constraint parameters collaboratively to build estimated feasible sets, ensuring the action selection safety during the optimization phase. We prove that for convex functions, D-Safe-OGD achieves a dynamic regret bound of $O(T^{2/3} \sqrt{\log T} + T^{1/3}C_T^*)$, where $C_T^*$ denotes the path-length of the best minimizer sequence. We further prove a dynamic regret bound of $O(T^{2/3} \sqrt{\log T} + T^{2/3}C_T^*)$ for certain non-convex problems, which establishes the first dynamic regret bound for a safe distributed algorithm in the non-convex setting.

Submitted 23 February, 2023; originally announced February 2023.

arXiv:2210.06734 [pdf, other]

Optimal Control of Material Micro-Structures

Authors: Aayushman Sharma, Zirui Mao, Haiying Yang, Suman Chakravorty, Michael J Demkowicz, Dileep Kalathil

Abstract: In this paper, we consider the optimal control of material micro-structures. Such material micro-structures are modeled by the so-called phase field model. We study the underlying physical structure of the model and propose a data-based approach for its optimal control, along with a comparison to control using a state-of-the-art Reinforcement Learning (RL) algorithm. Simulation results show the feasibility of optimally controlling such micro-structures to attain desired material properties and complex target micro-structures.

Submitted 13 October, 2022; originally announced October 2022.

arXiv:2208.10259 [pdf, ps, other]

Meta-Learning Online Control for Linear Dynamical Systems

Authors: Deepan Muthirayan, Dileep Kalathil, Pramod P. Khargonekar

Abstract: In this paper, we consider the problem of finding a meta-learning online control algorithm that can learn across the tasks when faced with a sequence of $N$ (similar) control tasks. Each task involves controlling a linear dynamical system for a finite horizon of $T$ time steps. The cost function and system noise at each time step are adversarial and unknown to the controller before taking the control action. Meta-learning is a broad approach where the goal is to prescribe an online policy for any new unseen task by exploiting the information from other tasks and the similarity between the tasks. We propose a meta-learning online control algorithm for the control setting and characterize its performance by meta-regret, the average cumulative regret across the tasks. We show that when the number of tasks is sufficiently large, our proposed approach achieves a meta-regret that is smaller by a factor $D/D^{*}$ compared to an independent-learning online control algorithm which does not perform learning across the tasks, where $D$ is a problem constant and $D^{*}$ is a scalar that decreases as the similarity between tasks increases. Thus, when the tasks in the sequence are similar, the regret of the proposed meta-learning online control is significantly lower than that of the naive approaches without meta-learning. We also present experimental results to demonstrate the superior performance achieved by our meta-learning algorithm.

Submitted 18 August, 2022; originally announced August 2022.

arXiv:2207.07731 [pdf, other]

Distributed Learning of Neural Lyapunov Functions for Large-Scale Networked Dissipative Systems

Authors: Amit Jena, Tong Huang, S. Sivaranjani, Dileep Kalathil, Le Xie

Abstract: This paper considers the problem of characterizing the stability region of a large-scale networked system comprised of dissipative nonlinear subsystems, in a distributed and computationally tractable way. One standard approach to estimate the stability region of a general nonlinear system is to first find a Lyapunov function for the system and characterize its region of attraction as the stability region. However, classical approaches for finding a Lyapunov function, such as sum-of-squares methods and quadratic approximation, either do not scale to large systems or give very conservative estimates for the stability region. In this context, we propose a new distributed learning-based approach by exploiting the dissipativity structure of the subsystems. Our approach has two parts: the first part is a distributed approach to learn the storage functions (similar to the Lyapunov functions) for all the subsystems, and the second part is a distributed optimization approach to find the Lyapunov function for the networked system using the learned storage functions of the subsystems. We demonstrate the superior performance of our proposed approach through extensive case studies in microgrid networks.

Submitted 15 July, 2022; originally announced July 2022.

arXiv:2203.04430 [pdf, other]

The Impact of Heavy-Duty Vehicle Electrification on Large Power Grids: a Synthetic Texas Case Study

Authors: Rayan El Helou, S. Sivaranjani, Dileep Kalathil, Andrew Schaper, Le Xie

Abstract: The electrification of heavy-duty vehicles (HDEVs) is a nascent and rapidly emerging avenue for decarbonization of the transportation sector. In this paper, we examine the impacts of increased vehicle electrification on the power grid infrastructure, with particular focus on HDEVs. We utilize a synthetic representation of the 2000-bus Texas transmission grid, and realistic representations of multiple distribution grids in Travis County, Texas, as well as transit data pertaining to HDEVs, to uncover the consequences of HDEV electrification, and expose the limitations imposed by existing electric grid infrastructure. Our analysis reveals that grid-wide voltage problems that are spatiotemporally correlated with the mobility of HDEVs may occur even at modest penetration levels. In fact, we find that as little as 11% of heavy-duty vehicles in Texas charging simultaneously can lead to significant voltage violations on the transmission network that compromise grid reliability. Furthermore, we find that just a few dozen EVs charging simultaneously can lead to voltage violations at the distribution level.

Submitted 8 March, 2022; originally announced March 2022.

arXiv:2111.15063 [pdf, ps, other]

Online Robust Control of Linear Dynamical Systems with Limited Prediction

Authors: Deepan Muthirayan, Dileep Kalathil, Pramod P. Khargonekar

Abstract: We study the online robust control problem for linear dynamical systems with disturbances and uncertainties in the cost functions, with a limited preview, $N$, of the future disturbances and cost functions. Our goal is to find an online control policy that can minimize the disturbance gain, defined as the ratio of the cumulative cost and the cumulative energy in the disturbances over a period of time, in the face of the uncertainties, and characterize its achievable gain in terms of the relevant system parameters. Our goals contrast with prior online control works for the same problem, which either focus on minimizing the static regret, a weaker performance metric, or assume a very large preview of the future uncertainties. Specifically, we consider a class of cost functions characterized by $\beta$ ($\beta < 1$), a number whose inverse bounds the variation of the cost functions. We propose a novel variation of Receding Horizon Control (RHC) as the online control policy. We show that, under standard system assumptions, when $N > 4/\beta^3$, the proposed algorithm can achieve a disturbance gain $(2/\beta + \rho(N)) \overline{\gamma}^2$, where $\overline{\gamma}^2$ is the best (minimum) possible disturbance gain for an oracle policy with full knowledge of the cost functions and disturbances, with $\rho(N) = O(1/N)$. We also demonstrate through simulations that the proposed policy satisfies the derived bounds and is consistently better than the standard RHC approach.

Submitted 30 October, 2022; v1 submitted 29 November, 2021; originally announced November 2021.

arXiv:2111.15041 [pdf, ps, other]

Online Learning for Predictive Control with Provable Regret Guarantees

Authors: Deepan Muthirayan, Jianjun Yuan, Dileep Kalathil, Pramod P. Khargonekar

Abstract: We study the problem of online learning in predictive control of an unknown linear dynamical system with time-varying cost functions which are unknown a priori. Specifically, we study the online learning problem where the control algorithm does not know the true system model and has access only to a fixed-length (that does not grow with the control horizon) preview of the future cost functions. The goal of the online algorithm is to minimize the dynamic regret, defined as the difference between the cumulative cost incurred by the algorithm and that of the best sequence of actions in hindsight. We propose two different online Model Predictive Control (MPC) algorithms to address this problem, namely the Certainty Equivalence MPC (CE-MPC) algorithm and the Optimistic MPC (O-MPC) algorithm. We show that under the standard stability assumption for the model estimate, the CE-MPC algorithm achieves $\mathcal{O}(T^{2/3})$ dynamic regret. We then extend this result to the setting where the stability assumption holds only for the true system model by proposing the O-MPC algorithm. We show that the O-MPC algorithm also achieves $\mathcal{O}(T^{2/3})$ dynamic regret, at the cost of some additional computation. We also present numerical studies to demonstrate the performance of our algorithm.

Submitted 31 October, 2022; v1 submitted 29 November, 2021; originally announced November 2021.
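
Note on the dynamic regret above: the abstract defines it in words only. One way to write that definition out, using notation that is assumed here and not taken from the paper ($c_t$ for the cost at time $t$, $(x_t, u_t)$ for the state and control produced by the online algorithm, and $(x_t^{*}, u_t^{*})$ for the trajectory under the best action sequence in hindsight), is:

$$R_T = \sum_{t=1}^{T} c_t(x_t, u_t) - \sum_{t=1}^{T} c_t(x_t^{*}, u_t^{*})$$

Under this reading, a bound of $\mathcal{O}(T^{2/3})$ is sublinear in $T$, so the average per-step regret $R_T / T = \mathcal{O}(T^{-1/3})$ vanishes as the horizon grows.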

arXiv:2109.05802 [pdf, other]

PyProD: A Machine Learning-Friendly Platform for Protection Analytics in Distribution Systems

Authors: Dongqi Wu, Dileep Kalathil, Miroslav Begovic, Le Xie

Abstract: This paper introduces PyProD, a Python-based machine learning (ML)-compatible testbed for evaluating the efficacy of protection schemes in electric distribution grids. This testbed is designed to bridge the gap between conventional power distribution grid analysis and the growing capability of ML-based decision-making algorithms, in particular in the context of protection system design and configuration. PyProD is shown to be capable of facilitating efficient design and evaluation of ML-based decision-making algorithms for protection devices in the future electric distribution grid, in which many distributed energy resources and prosumers permeate the system.

Submitted 13 September, 2021; originally announced September 2021.

Comments: This paper has been accepted for HICSS 2022 and will appear in the conference proceedings

arXiv:2008.05699 [pdf, other]

A Vision-Based Control Method for Autonomous Landing of Vertical Flight Aircraft On a Moving Platform Without Using GPS

Authors: Bochan Lee, Vishnu Saj, Moble Benedict, Dileep Kalathil

Abstract: The paper discusses a novel vision-based estimation and control approach to enable fully autonomous tracking and landing of vertical take-off and landing (VTOL) capable unmanned aerial vehicles (UAVs) on moving platforms without relying on a GPS signal. A unique feature of the present method is that it accomplishes this task without tracking the landing pad itself, instead utilizing a standardized visual cue installed normal to the landing pad and parallel to the pilot's/vehicle's line of sight. A computer vision system using a single monocular camera is developed to detect the visual cue and then accurately estimate the heading of the UAV and its relative distances in all three directions to the landing pad. Through comparison with a Vicon-based motion capture system, the capability of the present vision system to measure distances in real time to within a centimeter and heading to within a degree, given the right visual cue, is demonstrated. A gain-scheduled proportional integral derivative (PID) control system is integrated with the vision system and then implemented on a quadrotor UAV dynamic model in a realistic simulation program called Gazebo. Extensive simulations are conducted to demonstrate the ability of the controller to achieve robust tracking and landing on platforms moving in arbitrary trajectories. Repeated flight tests, using both stationary and moving platforms, are successfully conducted with less than 5 centimeters of landing error.

Submitted 16 August, 2020; v1 submitted 13 August, 2020; originally announced August 2020.

Comments: Presented at the VFS International 76th Annual Forum & Technology Display, October 6-8, 2020. Submitted to the Journal of Guidance, Control, and Dynamics (under review)

arXiv:2008.01231 [pdf, other]

Fully Decentralized Reinforcement Learning-based Control of Photovoltaics in Distribution Grids for Joint Provision of Real and Reactive Power

Authors: Rayan El Helou, Dileep Kalathil, Le Xie

Abstract: In this paper, we introduce a new framework to address the problem of voltage regulation in unbalanced distribution grids with deep photovoltaic penetration. In this framework, both real and reactive power setpoints are explicitly controlled at each solar panel smart inverter, and the objective is to simultaneously minimize system-wide voltage deviation and maximize solar power output. We formulate the problem as a Markov decision process with continuous action spaces and use proximal policy optimization, a reinforcement learning-based approach, to solve it, without the need for any forecast or explicit knowledge of network topology or line parameters. By representing the system in a quasi-steady state manner, and by carefully formulating the Markov decision process, we reduce the complexity of the problem and allow for fully decentralized (communication-free) policies, all of which make the trained policies much more practical and interpretable. Numerical simulations on a 240-node unbalanced distribution grid, based on a real network in the Midwest U.S., are used to validate the proposed framework and reinforcement learning approach.

Submitted 29 April, 2021; v1 submitted 3 August, 2020; originally announced August 2020.

arXiv:2006.11608 [pdf, other]

Robust Reinforcement Learning using Least Squares Policy Iteration with Provable Performance Guarantees

Authors: Kishan Panaganti, Dileep Kalathil

Abstract: This paper addresses the problem of model-free reinforcement learning for Robust Markov Decision Process (RMDP) with large state spaces. The goal of the RMDP framework is to find a policy that is robust against the parameter uncertainties due to the mismatch between the simulator model and real-world settings. We first propose the Robust Least Squares Policy Evaluation algorithm, which is a multi-step online model-free learning algorithm for policy evaluation. We prove the convergence of this algorithm using stochastic approximation techniques. We then propose the Robust Least Squares Policy Iteration (RLSPI) algorithm for learning the optimal robust policy. We also give a general weighted Euclidean norm bound on the error (closeness to optimality) of the resulting policy. Finally, we demonstrate the performance of our RLSPI algorithm on some standard benchmark problems.

Submitted 11 February, 2021; v1 submitted 20 June, 2020; originally announced June 2020.

Comments: 26 pages, 12 figures, 2 tables

arXiv:2004.00472 [pdf, other]

Learning to Cache and Caching to Learn: Regret Analysis of Caching Algorithms

Authors: Archana Bura, Desik Rengarajan, Dileep Kalathil, Srinivas Shakkottai, Jean-Francois Chamberland-Tremblay

Abstract: Crucial performance metrics of a caching algorithm include its ability to quickly and accurately learn a popularity distribution of requests. However, a majority of work on analytical performance analysis focuses on hit probability after an asymptotically large time has elapsed. We consider an online learning viewpoint, and characterize the "regret" in terms of the finite time difference between the hits achieved by a candidate caching algorithm with respect to a genie-aided scheme that places the most popular items in the cache. We first consider the Full Observation regime wherein all requests are seen by the cache. We show that the Least Frequently Used (LFU) algorithm is able to achieve order optimal regret, which is matched by an efficient counting algorithm design that we call LFU-Lite. We then consider the Partial Observation regime wherein only requests for items currently cached are seen by the cache, making it similar to an online learning problem related to the multi-armed bandit problem. We show how approaching this "caching bandit" using traditional approaches yields either high complexity or high regret, but a simple algorithm design that exploits the structure of the distribution can ensure order optimal regret. We conclude by illustrating our insights using numerical simulations.

Submitted 1 April, 2020; originally announced April 2020.
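
Note on the regret notion above: the following is a minimal Python sketch of how hits under plain LFU could be compared with the genie-aided scheme in the Full Observation regime. It is an illustration under stated assumptions, not the paper's LFU-Lite or its analysis. The function name `lfu_vs_genie`, the tie-breaking rule, and the toy request sequence are all hypothetical.

```python
from collections import Counter


def lfu_vs_genie(requests, popularity, cache_size):
    """Return (lfu_hits, genie_hits) for one request sequence.

    Hypothetical helper for illustration. Full observation is assumed:
    every request updates the counts whether or not it was a hit.
    """
    n_items = len(popularity)
    # Genie-aided cache: the cache_size items with the highest true popularity.
    genie_cache = set(
        sorted(range(n_items), key=lambda i: (-popularity[i], i))[:cache_size]
    )

    counts = Counter()
    lfu_hits = 0
    genie_hits = 0
    for item in requests:
        # LFU cache: the cache_size items requested most often so far.
        # Ties go to the lower item index (an arbitrary choice).
        lfu_cache = set(
            sorted(range(n_items), key=lambda i: (-counts[i], i))[:cache_size]
        )
        lfu_hits += item in lfu_cache
        genie_hits += item in genie_cache
        counts[item] += 1
    return lfu_hits, genie_hits


requests = [1, 0, 0, 0, 2, 0]      # toy request sequence
popularity = [0.6, 0.3, 0.1]       # assumed true popularity of items 0, 1, 2
lfu_hits, genie_hits = lfu_vs_genie(requests, popularity, cache_size=1)
regret = genie_hits - lfu_hits     # hits lost relative to the genie
print(lfu_hits, genie_hits, regret)  # prints: 3 4 1
```

On this toy sequence LFU loses one hit to the genie while its counts are still uninformative. Re-sorting all items on every request keeps the sketch short but is inefficient, and it says nothing about how LFU-Lite achieves its efficiency.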

arXiv:2003.02422 [pdf, other]

Deep Reinforcement Learning-Based Robust Protection in DER-Rich Distribution Grids

Authors: Dongqi Wu, Dileep Kalathil, Miroslav Begovic, Le Xie

Abstract: This paper introduces the concept of a Deep Reinforcement Learning based architecture for protective relay design in power distribution systems with many distributed energy resources (DERs). The performance of the widely used overcurrent protection scheme is hindered by the presence of distributed generation, power electronic interfaced devices and fault impedance. In this paper, a reinforcement learning-based approach is proposed to design and implement protective relays in the distribution grid. The particular algorithm used is a Long Short-Term Memory (LSTM) enhanced deep neural network that is highly accurate, communication-free and easy to implement. The proposed relay design is tested in OpenDSS simulation on the IEEE 34-node test feeder and demonstrates far superior performance over traditional overcurrent protection in terms of failure rate, robustness and response speed.

Submitted 1 June, 2021; v1 submitted 4 March, 2020; originally announced March 2020.

Comments: Submitted to IEEE Transactions on Smart Grid, under review

arXiv:1908.08180 [pdf, ps, other]

Creation of Synthetic Networked PMU Data: A Generative Adversarial Network Approach

Authors: Xiangtian Zheng, Bin Wang, Dileep Kalathil, Le Xie

Abstract: This paper introduces a machine learning-based approach to synthetically creating realistic phasor measurement unit (PMU) data streams of multiple transient types. In contrast to the existing literature of transient simulation-based data generation methods, we propose a generative adversarial network (GAN) based approach that learns directly from the historical data and simultaneously reproduces multiple PMU data streams. The synthetic PMU data streams reflect meaningful dynamic characteristics which observe first principles such as Kirchhoff's laws. The efficacy of this approach is demonstrated by numerical studies on the IEEE 39-bus system. We validate the fidelity and flexibility of the synthetic data via statistical resemblance and modal analysis approaches. Finally, we illustrate a practical application scenario for the usage of the synthetic PMU data, i.e., leveraging the synthetic data to improve the performance of event classification algorithms.

Submitted 6 April, 2020; v1 submitted 21 August, 2019; originally announced August 2019.

Comments: This manuscript has been submitted to IEEE Transactions on Power Systems

arXiv:1906.10815

Nested Reinforcement Learning Based Control for Protective Relays in Power Distribution Systems

Authors: Dongqi Wu, Xiangtian Zheng, Dileep Kalathil, Le Xie

Abstract: This paper envisions a new control architecture for the protective relay setting in future power distribution systems. With deepening penetration of distributed energy resources at the end-user level, it has been recognized as a key engineering challenge to redesign the protective relays in the future distribution system. Conceptually, these protective relays are the discrete ON/OFF control devices at the end of each branch and node in a power network. The key technical difficulty lies in how to set up the relay control logic so that the protection can successfully differentiate between heavy load and faulty operating conditions. This paper proposes a new nested reinforcement learning approach to take advantage of the structural properties of distribution networks and develop a new set of training methods for tuning the protective relays.

Submitted 25 June, 2019; originally announced June 2019.

arXiv:1906.01069

Selling Demand Response Using Options

Authors: Deepan Muthirayan, Dileep Kalathil, Sen Li, Kameshwar Poolla, Pravin Varaiya

Abstract: Wholesale electricity markets in many jurisdictions use a two-settlement structure: a day-ahead market for bulk power transactions and a real-time market for fine-grain supply-demand balancing. This paper explores trading demand response assets within this two-settlement market structure. We consider two approaches for trading demand response assets: (a) an intermediate spot market with contingent pricing, and (b) an over-the-counter options contract. In the first case, we characterize the competitive equilibrium of the spot market, and show that it is socially optimal. Economic orthodoxy advocates spot markets, but these require expensive infrastructure and regulatory blessing. In the second case, we characterize competitive equilibria and compare their efficiency with the idealized spot market. Options contracts are private bilateral over-the-counter transactions and do not require regulatory approval. We show that the optimal social welfare is, in general, not supported. We then design optimal option prices that minimize the social welfare gap. This optimal design serves to approximate the ideal spot market for demand response using options with modest loss of efficiency. Our results are validated through numerical simulations.

Submitted 2 August, 2020; v1 submitted 3 June, 2019; originally announced June 2019.

arXiv:1904.08361

Decoupled Data Based Approach for Learning to Control Nonlinear Dynamical Systems

Authors: Ran Wang, Karthikeya Parunandi, Dan Yu, Dileep Kalathil, Suman Chakravorty

Abstract: This paper addresses the problem of learning the optimal control policy for a nonlinear stochastic dynamical system with continuous state space, continuous action space and unknown dynamics. This class of problems is typically addressed in the stochastic adaptive control and reinforcement learning literature using model-based and model-free approaches respectively. Both methods rely on solving a dynamic programming problem, either directly or indirectly, for finding the optimal closed loop control policy. The inherent `curse of dimensionality' associated with the dynamic programming method also makes these approaches computationally difficult. This paper proposes a novel decoupled data-based control (D2C) algorithm that addresses this problem using a decoupled, `open loop - closed loop', approach. First, an open-loop deterministic trajectory optimization problem is solved using a black-box simulation model of the dynamical system. Then, a closed loop control is developed around this open loop trajectory by linearization of the dynamics about this nominal trajectory. By virtue of linearization, a linear quadratic regulator based algorithm can be used for this closed loop control. We show that the performance of the D2C algorithm is approximately optimal. Moreover, simulation performance suggests a significant reduction in training time compared to other state of the art algorithms.

Submitted 17 April, 2019; originally announced April 2019.

arXiv:1901.00959

QFlow: A Learning Approach to High QoE Video Streaming at the Wireless Edge

Authors: Rajarshi Bhattacharyya, Archana Bura, Desik Rengarajan, Mason Rumuly, Bainan Xia, Srinivas Shakkottai, Dileep Kalathil, Ricky K. P. Mok, Amogh Dhamdhere

Abstract: The predominant use of wireless access networks is for media streaming applications, which are only gaining popularity as ever more devices become available for this purpose. However, current access networks treat all packets identically, and lack the agility to determine which clients are most in need of service at a given time. Software reconfigurability of networking devices has seen wide adoption, and this in turn implies that agile control policies can now be instantiated on access networks. The goal of this work is to design, develop and demonstrate QFlow, a learning approach to create a value chain from the application on one side, to algorithms operating over reconfigurable infrastructure on the other, so that applications are able to obtain necessary resources for optimal performance. Using YouTube video streaming as an example, we illustrate how QFlow is able to adaptively provide such resources and attain a high QoE for all clients at a wireless access point.

Submitted 13 May, 2020; v1 submitted 3 January, 2019; originally announced January 2019.

Comments: Submitted to ToN in May, 2020

arXiv:1710.05394

Estimating Phase Duration for SPaT Messages

Authors: Shahana Ibrahim, Dileep Kalathil, Rene O. Sanchez, Pravin Varaiya

Abstract: A SPaT (Signal Phase and Timing) message describes for each lane the current phase at a signalized intersection together with an estimate of the residual time of that phase. Accurate SPaT messages can be used to construct a speed profile for a vehicle that reduces its fuel consumption as it approaches or leaves an intersection. This paper presents SPaT estimation algorithms at an intersection with a semi-actuated signal, using real-time signal phase measurements. The algorithms are evaluated using high-resolution data from two intersections in Montgomery County, MD. The algorithms can be readily implemented at signal controllers. The study supports three findings. First, real-time information dramatically improves the accuracy of the prediction of the residual time compared with prediction based on historical data alone. Second, as time increases the prediction of the residual time may increase or decrease. Third, as drivers differently weight errors in predicting `end of green' and `end of red', drivers on two different approaches may prefer different estimates of the residual time of the same phase.

Submitted 10 January, 2018; v1 submitted 15 October, 2017; originally announced October 2017.

Comments: 9 Pages, 13 Figures, Under review

arXiv:1608.06990

The Sharing Economy for the Smart Grid

Authors: Dileep Kalathil, Chenye Wu, Kameshwar Poolla, Pravin Varaiya

Abstract: The sharing economy has disrupted housing and transportation sectors. Homeowners can rent out their property when they are away on vacation; car owners can offer ride sharing services. These sharing economy business models are based on monetizing under-utilized infrastructure. They are enabled by peer-to-peer platforms that match eager sellers with willing buyers. Are there compelling sharing economy opportunities in the electricity sector? What products or services can be shared in tomorrow's Smart Grid? We begin by exploring sharing economy opportunities in the electricity sector, and discuss regulatory and technical obstacles to these opportunities. We then study the specific problem of a collection of firms sharing their electricity storage. We characterize equilibrium prices for shared storage in a spot market. We formulate storage investment decisions of the firms as a non-convex non-cooperative game. We show that under a mild alignment condition, a Nash equilibrium exists, it is unique, and it supports the social welfare. We discuss technology platforms necessary for the physical exchange of power, and market platforms necessary to trade electricity storage. We close with synthetic examples to illustrate our ideas.

Submitted 5 September, 2016; v1 submitted 24 August, 2016; originally announced August 2016.

Comments: 11 pages, 11 figures

arXiv:1411.0728

Approachability in Stackelberg Stochastic Games with Vector Costs

Authors: Dileep Kalathil, Vivek Borkar, Rahul Jain

Abstract: The notion of approachability was introduced by Blackwell [1] in the context of vector-valued repeated games. Blackwell's famous approachability theorem prescribes a strategy for approachability, i.e., for `steering' the average cost of a given agent towards a given target set, irrespective of the strategies of the other agents. In this paper, motivated by the multi-objective optimization/decision making problems in dynamically changing environments, we address the approachability problem in Stackelberg stochastic games with vector-valued cost functions. We make two main contributions. Firstly, we give a simple and computationally tractable strategy for approachability for Stackelberg stochastic games along the lines of Blackwell's. Secondly, we give a reinforcement learning algorithm for learning the approachable strategy when the transition kernel is unknown. We also recover as a by-product Blackwell's necessary and sufficient condition for approachability for convex sets in this setup and thus a complete characterization. We also give sufficient conditions for non-convex sets.

Submitted 20 June, 2016; v1 submitted 3 November, 2014; originally announced November 2014.

Comments: 18 Pages, Submitted to Dynamic Games and Applications

arXiv:1206.3582

doi: 10.1109/CDC.2012.6426587

Decentralized Learning for Multi-player Multi-armed Bandits

Authors: Dileep Kalathil, Naumaan Nayyar, Rahul Jain

Abstract: We consider the problem of distributed online learning with multiple players in multi-armed bandits (MAB) models. Each player can pick among multiple arms. When a player picks an arm, it gets a reward. We consider both the i.i.d. reward model and the Markovian reward model. In the i.i.d. model each arm is modelled as an i.i.d. process with an unknown distribution with an unknown mean. In the Markovian model, each arm is modelled as a finite, irreducible, aperiodic and reversible Markov chain with an unknown probability transition matrix and stationary distribution. The arms give different rewards to different players. If two players pick the same arm, there is a "collision", and neither of them gets any reward. There is no dedicated control channel for coordination or communication among the players. Any other communication between the users is costly and will add to the regret. We propose an online index-based distributed learning policy called ${\tt dUCB_4}$ algorithm that trades off \textit{exploration v. exploitation} in the right way, and achieves expected regret that grows at most as near-$O(\log^2 T)$. The motivation comes from opportunistic spectrum access by multiple secondary users in cognitive radio networks wherein they must pick among various wireless channels that look different to different users. This is the first distributed learning algorithm for multi-player MABs to the best of our knowledge.

Submitted 14 June, 2012; originally announced June 2012.

Comments: 33 pages, 3 figures. Submitted to IEEE Transactions on Information Theory

Showing 1–25 of 25 results for author: Kalathil, D