Search | arXiv e-print repository

Offline-Online Reinforcement Learning for Energy Pricing in Office Demand Response: Lowering Energy and Data Costs

Authors: Doseok Jang, Lucas Spangher, Manan Khattar, Utkarsha Agwan, Selvaprabuh Nadarajah, Costas Spanos

Abstract: Our team is proposing to run a full-scale energy demand response experiment in an office building. Although this is an exciting endeavor which will provide value to the community, collecting training data for the reinforcement learning agent is costly and will be limited. In this work, we examine how offline training can be leveraged to minimize data costs (accelerate convergence) and program implementation costs. We present two approaches to doing so: pretraining our model to warm start the experiment with simulated tasks, and using a planning model trained to simulate the real world's rewards to the agent. We present results that demonstrate the utility of offline reinforcement learning to efficient price-setting in the energy demand response problem.

Submitted 14 August, 2021; originally announced August 2021.

Comments: arXiv admin note: text overlap with arXiv:2104.14670

arXiv:2011.10690 [pdf, other]

Self-adapting Robustness in Demand Learning

Authors: Boxiao Chen, Selvaprabu Nadarajah, Parshan Pakiman, Stefanus Jasin

Abstract: We study dynamic pricing over a finite number of periods in the presence of demand model ambiguity. Departing from the typical no-regret learning environment, where price changes are allowed at any time, pricing decisions are made at pre-specified points in time and each price can be applied to a large number of arrivals. In this environment, which arises in retailing, a pricing decision based on an incorrect demand model can significantly impact cumulative revenue. We develop an adaptively-robust-learning (ARL) pricing policy that learns the true model parameters from the data while actively managing demand model ambiguity. It optimizes an objective that is robust with respect to a self-adapting set of demand models, where a given model is included in this set only if the sales data revealed from prior pricing decisions makes it "probable". As a result, it gracefully transitions from being robust when demand model ambiguity is high to minimizing regret when this ambiguity diminishes upon receiving more data. We characterize the stochastic behavior of ARL's self-adapting ambiguity sets and derive a regret bound that highlights the link between the scale of revenue loss and the customer arrival pattern. We also show that ARL, by being conscious of both model ambiguity and revenue, bridges the gap between a distributionally robust policy and a follow-the-leader policy, which focus on model ambiguity and revenue, respectively. We numerically find that the ARL policy, or its extension thereof, exhibits superior performance compared to distributionally robust, follow-the-leader, and upper-confidence-bound policies in terms of expected revenue and/or value at risk.

Submitted 20 November, 2020; originally announced November 2020.

MSC Class: 90C17; 68T05; 68W27; 68Q25

arXiv:2001.02798 [pdf, other]

Self-guided Approximate Linear Programs

Authors: Parshan Pakiman, Selvaprabu Nadarajah, Negar Soheili, Qihang Lin

Abstract: Approximate linear programs (ALPs) are well-known models based on value function approximations (VFAs) to obtain policies and lower bounds on the optimal policy cost of discounted-cost Markov decision processes (MDPs). Formulating an ALP requires (i) basis functions, the linear combination of which defines the VFA, and (ii) a state-relevance distribution, which determines the relative importance of different states in the ALP objective for the purpose of minimizing VFA error. Both these choices are typically heuristic: basis function selection relies on domain knowledge while the state-relevance distribution is specified using the frequency of states visited by a heuristic policy. We propose a self-guided sequence of ALPs that embeds random basis functions obtained via inexpensive sampling and uses the known VFA from the previous iteration to guide VFA computation in the current iteration. Self-guided ALPs mitigate the need for domain knowledge during basis function selection as well as the impact of the initial choice of the state-relevance distribution, thus significantly reducing the ALP implementation burden. We establish high probability error bounds on the VFAs from this sequence and show that a worst-case measure of policy performance is improved. We find that these favorable implementation and theoretical properties translate to encouraging numerical results on perishable inventory control and options pricing applications, where self-guided ALP policies improve upon policies from problem-specific methods. More broadly, our research takes a meaningful step toward application-agnostic policies and bounds for MDPs.

Submitted 12 October, 2021; v1 submitted 8 January, 2020; originally announced January 2020.

Comments: 52 pages

MSC Class: 90C39; 90C40; 90C05; 90C06; 90C15; 90C22; 90C90; 46C07; 93E20; 93E35; 68T99; 65K99 ACM Class: I.2.8; G.1.2; G.3

arXiv:1912.00539 [pdf, other]

An asynchronous incomplete block LU preconditioner for computational fluid dynamics on unstructured grids

Authors: Aditya Kashi, Siva Nadarajah

Abstract: We present a study of the effectiveness of asynchronous incomplete LU factorization preconditioners for the time-implicit solution of compressible flow problems while exploiting thread-parallelism within a compute node. A block variant of the asynchronous fine-grain parallel preconditioner adapted to a finite volume discretization of the compressible Navier-Stokes equations on unstructured grids is presented, and convergence theory is extended to the new variant. Experimental (numerical) results on the performance of these preconditioners on inviscid and viscous laminar two-dimensional steady-state test cases are reported. It is found, for these compressible flow problems, that the block variant performs much better in terms of convergence, parallel scalability and reliability than the original scalar asynchronous ILU preconditioner. For viscous flow, it is found that the ordering of unknowns may determine the success or failure of asynchronous block-ILU preconditioning, and an ordering of grid cells suitable for solving viscous problems is presented.

Submitted 4 October, 2020; v1 submitted 1 December, 2019; originally announced December 2019.

Comments: Accepted by SIAM SISC

MSC Class: 65F08; 65Y05; 65N22

arXiv:1908.03077 [pdf, ps, other]

A Data Efficient and Feasible Level Set Method for Stochastic Convex Optimization with Expectation Constraints

Authors: Qihang Lin, Selvaprabu Nadarajah, Negar Soheili, Tianbao Yang

Abstract: Stochastic convex optimization problems with expectation constraints (SOECs) are encountered in statistics and machine learning, business, and engineering. In data-rich environments, the SOEC objective and constraints contain expectations defined with respect to large datasets. Therefore, efficient algorithms for solving such SOECs need to limit the fraction of data points that they use, which we refer to as algorithmic data complexity. Recent stochastic first order methods exhibit low data complexity when handling SOECs but guarantee near-feasibility and near-optimality only at convergence. These methods may thus return highly infeasible solutions when heuristically terminated, as is often the case, due to theoretical convergence criteria being highly conservative. This issue limits the use of first order methods in several applications where the SOEC constraints encode implementation requirements. We design a stochastic feasible level set method (SFLS) for SOECs that has low data complexity and emphasizes feasibility before convergence. Specifically, our level-set method solves a root-finding problem by calling a novel first order oracle that computes a stochastic upper bound on the level-set function by extending mirror descent and online validation techniques. We establish that SFLS maintains a high-probability feasible solution at each root-finding iteration and exhibits favorable iteration complexity compared to state-of-the-art deterministic feasible level set and stochastic subgradient methods. Numerical experiments on three diverse applications validate the low data complexity of SFLS relative to the former approach and highlight how SFLS finds feasible solutions with small optimality gaps significantly faster than the latter method.

Submitted 1 January, 2020; v1 submitted 7 August, 2019; originally announced August 2019.

arXiv:1804.03731 [pdf, ps, other]

doi 10.1016/j.cma.2019.04.002

On the Geometric Conservation Law for the Non Linear Frequency Domain and Time-Spectral Methods

Authors: Marc Benoit, Siva Nadarajah

Abstract: The aim of this paper is to present and validate two new procedures to enforce the Geometric Conservation Law (GCL) on a moving grid for an Arbitrary Lagrangian Eulerian (ALE) formulation of the Euler equations discretized in time for either the Non Linear Frequency Domain (NLFD) or Time-Spectral (TS) methods. The equations are spatially discretized by a structured finite-volume scheme on a hexahedral mesh. The derived methodologies follow a general approach where the positions and the velocities of the grid points are known at each time step. The integrated face mesh velocities are derived either from the Approximation of the Exact Volumetric Increments (AEVI) relative to the undeformed mesh or exactly computed based on a Trilinear Mapping (TRI-MAP) between the physical space and the computational domain. The accuracy of the AEVI method highly depends on the computation of the volumetric increments and limits the temporal order of accuracy of the deduced integrated face mesh velocities to between one and two, thus defeating the purpose of the NLFD method, which possesses a spectral rate of convergence. However, the TRI-MAP method has proven to be more computationally efficient, ensuring the satisfaction of the GCL once the convergence of the time derivative of the cell volume is reached in Fourier space. The methods are validated numerically by verifying the conservation of uniform flow and by comparing the integrated face mesh velocities to the exact values derived from the mapping.

Submitted 10 April, 2018; originally announced April 2018.

Showing 1–6 of 6 results for author: Nadarajah, S