Showing 1–3 of 3 results for author: Kalwar, D

  1. arXiv:2311.02125  [pdf, other]

    cs.LG cs.AI math.OC

    Using General Value Functions to Learn Domain-Backed Inventory Management Policies

    Authors: Durgesh Kalwar, Omkar Shelke, Harshad Khadilkar

    Abstract: We consider the inventory management problem, where the goal is to balance conflicting objectives such as availability and wastage of a large range of products in a store. We propose a reinforcement learning (RL) approach that utilises General Value Functions (GVFs) to derive domain-backed inventory replenishment policies. The inventory replenishment decisions are modelled as a sequential decision…

    Submitted 3 November, 2023; originally announced November 2023.

  2. arXiv:2311.02119  [pdf, other]

    math.OC cs.AI

    Safe Sequential Optimization for Switching Environments

    Authors: Durgesh Kalwar, Vineeth B. S

    Abstract: We consider the problem of designing a sequential decision-making agent to maximize an unknown time-varying function which switches with time. At each step, the agent receives an observation of the function's value at a point decided by the agent. The observation could be corrupted by noise. The agent is also constrained to take safe decisions with high probability, i.e., the chosen points should…

    Submitted 3 November, 2023; originally announced November 2023.

  3. arXiv:2203.00874  [pdf, other]

    cs.LG cs.AI

    Follow your Nose: Using General Value Functions for Directed Exploration in Reinforcement Learning

    Authors: Durgesh Kalwar, Omkar Shelke, Somjit Nath, Hardik Meisheri, Harshad Khadilkar

    Abstract: Improving sample efficiency is a key challenge in reinforcement learning, especially in environments with large state spaces and sparse rewards. In the literature, this is resolved either through the use of auxiliary tasks (subgoals) or through clever exploration strategies. Exploration methods have been used to sample better trajectories in large environments while auxiliary tasks have been incorpora…

    Submitted 27 February, 2023; v1 submitted 2 March, 2022; originally announced March 2022.