
Showing 1–24 of 24 results for author: Khadilkar, H

Searching in archive cs.
  1. arXiv:2402.15478

    cs.LG stat.ML

    Transformers are Expressive, But Are They Expressive Enough for Regression?

    Authors: Swaroop Nath, Harshad Khadilkar, Pushpak Bhattacharyya

    Abstract: Transformers have become pivotal in Natural Language Processing, demonstrating remarkable success in applications like Machine Translation and Summarization. Given their widespread adoption, several works have attempted to analyze the expressivity of Transformers. Expressivity of a neural network is the class of functions it can approximate. A neural network is fully expressive if it can act as a…

    Submitted 7 June, 2024; v1 submitted 23 February, 2024; originally announced February 2024.

    Comments: 18 pages, 10 figures, 3 tables

  2. arXiv:2402.15473

    cs.CL cs.LG

    Leveraging Domain Knowledge for Efficient Reward Modelling in RLHF: A Case-Study in E-Commerce Opinion Summarization

    Authors: Swaroop Nath, Tejpalsingh Siledar, Sankara Sri Raghava Ravindra Muddu, Rupasai Rangaraju, Harshad Khadilkar, Pushpak Bhattacharyya, Suman Banerjee, Amey Patil, Sudhanshu Shekhar Singh, Muthusamy Chelliah, Nikesh Garera

    Abstract: Reinforcement Learning from Human Feedback (RLHF) has become a dominant strategy in aligning Language Models (LMs) with human values/goals. The key to the strategy is learning a reward model (φ), which can reflect the latent reward model of humans. While this strategy has proven effective, the training methodology requires a lot of human preference annotation (usually on the order of ten…

    Submitted 18 April, 2024; v1 submitted 23 February, 2024; originally announced February 2024.

    Comments: 19 pages, 6 figures, 21 tables

  3. arXiv:2311.17514

    cs.CL cs.AI

    Reinforcement Replaces Supervision: Query focused Summarization using Deep Reinforcement Learning

    Authors: Swaroop Nath, Harshad Khadilkar, Pushpak Bhattacharyya

    Abstract: Query-focused Summarization (QfS) deals with systems that generate summaries from document(s) based on a query. Motivated by the insight that Reinforcement Learning (RL) provides a generalization to Supervised Learning (SL) for Natural Language Generation, and thereby performs better (empirically) than SL, we use an RL-based approach for this task of QfS. Additionally, we also resolve the conflict…

    Submitted 29 November, 2023; originally announced November 2023.

  4. arXiv:2311.16171

    cs.AI cs.LG cs.MA

    Multi-Agent Learning of Efficient Fulfilment and Routing Strategies in E-Commerce

    Authors: Omkar Shelke, Pranavi Pathakota, Anandsingh Chauhan, Harshad Khadilkar, Hardik Meisheri, Balaraman Ravindran

    Abstract: This paper presents an integrated algorithmic framework for minimising product delivery costs in e-commerce (known as the cost-to-serve or C2S). One of the major challenges in e-commerce is the large volume of spatio-temporally diverse orders from multiple customers, each of which has to be fulfilled from one of several warehouses using a fleet of vehicles. This results in two levels of decision-m…

    Submitted 20 November, 2023; originally announced November 2023.

  5. arXiv:2311.02125

    cs.LG cs.AI math.OC

    Using General Value Functions to Learn Domain-Backed Inventory Management Policies

    Authors: Durgesh Kalwar, Omkar Shelke, Harshad Khadilkar

    Abstract: We consider the inventory management problem, where the goal is to balance conflicting objectives such as availability and wastage of a large range of products in a store. We propose a reinforcement learning (RL) approach that utilises General Value Functions (GVFs) to derive domain-backed inventory replenishment policies. The inventory replenishment decisions are modelled as a sequential decision…

    Submitted 3 November, 2023; originally announced November 2023.

  6. arXiv:2307.05189

    cs.LG

    Using Linear Regression for Iteratively Training Neural Networks

    Authors: Harshad Khadilkar

    Abstract: We present a simple linear regression based approach for learning the weights and biases of a neural network, as an alternative to standard gradient based backpropagation. The present work is exploratory in nature, and we restrict the description and experiments to (i) simple feedforward neural networks, (ii) scalar (single output) regression problems, and (iii) invertible activation functions. Ho…

    Submitted 14 July, 2023; v1 submitted 11 July, 2023; originally announced July 2023.

    Comments: 10 pages

  7. arXiv:2306.15913

    cs.LG cs.AI

    DCT: Dual Channel Training of Action Embeddings for Reinforcement Learning with Large Discrete Action Spaces

    Authors: Pranavi Pathakota, Hardik Meisheri, Harshad Khadilkar

    Abstract: The ability to learn robust policies while generalizing over large discrete action spaces is an open challenge for intelligent systems, especially in noisy environments that face the curse of dimensionality. In this paper, we present a novel framework to efficiently learn action embeddings that simultaneously allow us to reconstruct the original action as well as to predict the expected future sta…

    Submitted 28 June, 2023; originally announced June 2023.

    Comments: 17 pages

  8. arXiv:2305.07571

    cs.NE cs.AI

    Supplementing Gradient-Based Reinforcement Learning with Simple Evolutionary Ideas

    Authors: Harshad Khadilkar

    Abstract: We present a simple, sample-efficient algorithm for introducing large but directed learning steps in reinforcement learning (RL), through the use of evolutionary operators. The methodology uses a population of RL agents training with a common experience buffer, with occasional crossovers and mutations of the agents in order to search efficiently through the policy space. Unlike prior literature on…

    Submitted 10 May, 2023; originally announced May 2023.

    Comments: 17 pages

  9. arXiv:2210.17296

    cs.LG cs.AI

    Using Contrastive Samples for Identifying and Leveraging Possible Causal Relationships in Reinforcement Learning

    Authors: Harshad Khadilkar, Hardik Meisheri

    Abstract: A significant challenge in reinforcement learning is quantifying the complex relationship between actions and long-term rewards. The effects may manifest themselves over a long sequence of state-action pairs, making them hard to pinpoint. In this paper, we propose a method to link transitions with significant deviations in state with unusually large variations in subsequent rewards. Such transitio…

    Submitted 28 October, 2022; originally announced October 2022.

  10. arXiv:2207.13916

    cs.CV cs.AI cs.LG

    A Novel Data Augmentation Technique for Out-of-Distribution Sample Detection using Compounded Corruptions

    Authors: Ramya S. Hebbalaguppe, Soumya Suvra Goshal, Jatin Prakash, Harshad Khadilkar, Chetan Arora

    Abstract: Modern deep neural network models are known to erroneously classify out-of-distribution (OOD) test data into one of the in-distribution (ID) training classes with high confidence. This can have disastrous consequences for safety-critical applications. A popular mitigation strategy is to train a separate classifier that can detect such OOD samples at test time. In most practical settings OOD ex…

    Submitted 21 September, 2022; v1 submitted 28 July, 2022; originally announced July 2022.

    Comments: 16 pages of the main text, and supplemental material. Accepted in Research Track ECML'22. Project webpage: https://cnc-ood.github.io/

  11. arXiv:2206.06618

    cs.AI

    Solving the capacitated vehicle routing problem with timing windows using rollouts and MAX-SAT

    Authors: Harshad Khadilkar

    Abstract: The vehicle routing problem is a well-known class of NP-hard combinatorial optimisation problems in the literature. Traditional solution methods involve either carefully designed heuristics or time-consuming metaheuristics. Recent work in reinforcement learning has been a promising alternative approach, but has found it difficult to compete with traditional methods in terms of solution quality. This…

    Submitted 14 June, 2022; originally announced June 2022.

    Comments: 6 pages, 2 figures

    MSC Class: 90-08; ACM Class: I.2

  12. arXiv:2203.00885

    cs.LG cs.AI math.OC

    A Learning Based Framework for Handling Uncertain Lead Times in Multi-Product Inventory Management

    Authors: Hardik Meisheri, Somjit Nath, Mayank Baranwal, Harshad Khadilkar

    Abstract: Most existing literature on supply chain and inventory management considers stochastic demand processes with zero or constant lead times. While it is true that in certain niche scenarios, uncertainty in lead times can be ignored, most real-world scenarios exhibit stochasticity in lead times. These random fluctuations can be caused by uncertainty in the arrival of raw materials at the manufacturer's…

    Submitted 8 March, 2022; v1 submitted 2 March, 2022; originally announced March 2022.

  13. arXiv:2203.00874

    cs.LG cs.AI

    Follow your Nose: Using General Value Functions for Directed Exploration in Reinforcement Learning

    Authors: Durgesh Kalwar, Omkar Shelke, Somjit Nath, Hardik Meisheri, Harshad Khadilkar

    Abstract: Improving sample efficiency is a key challenge in reinforcement learning, especially in environments with large state spaces and sparse rewards. In the literature, this is addressed either through the use of auxiliary tasks (subgoals) or through clever exploration strategies. Exploration methods have been used to sample better trajectories in large environments while auxiliary tasks have been incorpora…

    Submitted 27 February, 2023; v1 submitted 2 March, 2022; originally announced March 2022.

  14. arXiv:2112.08736

    cs.AI cs.LG

    Learning to Minimize Cost-to-Serve for Multi-Node Multi-Product Order Fulfilment in Electronic Commerce

    Authors: Pranavi Pathakota, Kunwar Zaid, Anulekha Dhara, Hardik Meisheri, Shaun D Souza, Dheeraj Shah, Harshad Khadilkar

    Abstract: We describe a novel decision-making problem developed in response to the demands of retail electronic commerce (e-commerce). While working with logistics and retail industry business collaborators, we found that the cost of delivery of products from the most opportune node in the supply chain (a quantity called the cost-to-serve or CTS) is a key challenge. The large scale, high stochasticity, and…

    Submitted 16 December, 2021; originally announced December 2021.

  15. arXiv:2108.07555

    cs.LG cs.AI eess.SY math.OC

    Revisiting State Augmentation methods for Reinforcement Learning with Stochastic Delays

    Authors: Somjit Nath, Mayank Baranwal, Harshad Khadilkar

    Abstract: Several real-world scenarios, such as remote control and sensing, involve action and observation delays. The presence of delays degrades the performance of reinforcement learning (RL) algorithms, often to such an extent that algorithms fail to learn anything substantial. This paper formally describes the notion of Markov Decision Processes (MDPs) with stochastic delays and shows that dela…

    Submitted 17 August, 2021; originally announced August 2021.

    Comments: Accepted at CIKM'21

  16. arXiv:2102.12088

    cs.AI cs.LG

    Fast Approximate Solutions using Reinforcement Learning for Dynamic Capacitated Vehicle Routing with Time Windows

    Authors: Nazneen N Sultana, Vinita Baniwal, Ansuma Basumatary, Piyush Mittal, Supratim Ghosh, Harshad Khadilkar

    Abstract: This paper develops an inherently parallelised, fast, approximate learning-based solution to the generic class of Capacitated Vehicle Routing Problems with Time Windows and Dynamic Routing (CVRP-TWDR). Considering vehicles in a fleet as decentralised agents, we postulate that using reinforcement learning (RL) based adaptation is a key enabler for real-time route formation in a dynamic environment.…

    Submitted 14 April, 2021; v1 submitted 24 February, 2021; originally announced February 2021.

    Comments: 9 pages

  17. arXiv:2102.11762

    cs.AI cs.LG cs.MA

    School of hard knocks: Curriculum analysis for Pommerman with a fixed computational budget

    Authors: Omkar Shelke, Hardik Meisheri, Harshad Khadilkar

    Abstract: Pommerman is a hybrid cooperative/adversarial multi-agent environment, with challenging characteristics in terms of partial observability, limited or no communication, sparse and delayed rewards, and restrictive computational time limits. This makes it a challenging environment for reinforcement learning (RL) approaches. In this paper, we focus on developing a curriculum for learning a robust and…

    Submitted 24 February, 2021; v1 submitted 23 February, 2021; originally announced February 2021.

    Comments: 8 pages, Submitted to ALA workshop 2021

    Journal ref: CODS-COMAD 2022: 5th Joint International Conference on Data Science & Management of Data (9th ACM IKDD CODS and 27th COMAD)

  18. arXiv:2011.00424

    cs.LG cs.MA

    Sample Efficient Training in Multi-Agent Adversarial Games with Limited Teammate Communication

    Authors: Hardik Meisheri, Harshad Khadilkar

    Abstract: We describe our solution approach for Pommerman TeamRadio, a competition environment associated with NeurIPS 2019. The defining feature of our algorithm is achieving sample efficiency within a restrictive computational budget while beating the previous year's learning agents. The proposed algorithm (i) uses imitation learning to seed the policy, (ii) explicitly defines the communication protocol be…

    Submitted 1 November, 2020; originally announced November 2020.

  19. arXiv:2007.00463

    cs.AI

    A Generalized Reinforcement Learning Algorithm for Online 3D Bin-Packing

    Authors: Richa Verma, Aniruddha Singhal, Harshad Khadilkar, Ansuma Basumatary, Siddharth Nayak, Harsh Vardhan Singh, Swagat Kumar, Rajesh Sinha

    Abstract: We propose a Deep Reinforcement Learning (Deep RL) algorithm for solving the online 3D bin packing problem for an arbitrary number of bins and any bin size. The focus is on producing decisions that can be physically implemented by a robotic loading arm, a laboratory prototype used for testing the concept. The problem considered in this paper is novel in two ways. First, unlike the traditional 3D b…

    Submitted 1 July, 2020; originally announced July 2020.

    Comments: 9 pages, 9 figures

  20. arXiv:2006.04037

    cs.LG cs.AI cs.MA stat.ML

    Reinforcement Learning for Multi-Product Multi-Node Inventory Management in Supply Chains

    Authors: Nazneen N Sultana, Hardik Meisheri, Vinita Baniwal, Somjit Nath, Balaraman Ravindran, Harshad Khadilkar

    Abstract: This paper describes the application of reinforcement learning (RL) to multi-product inventory management in supply chains. The problem description and solution are both adapted from a real-world business solution. The novelty of this problem with respect to supply chain literature is (i) we consider concurrent inventory management of a large number (50 to 1000) of products with shared capacity, (…

    Submitted 7 June, 2020; originally announced June 2020.

  21. arXiv:2004.09846

    cs.LG cs.AI stat.ML

    SIBRE: Self Improvement Based REwards for Adaptive Feedback in Reinforcement Learning

    Authors: Somjit Nath, Richa Verma, Abhik Ray, Harshad Khadilkar

    Abstract: We propose a generic reward shaping approach for improving the rate of convergence in reinforcement learning (RL), called Self Improvement Based REwards, or SIBRE. The approach is designed for use in conjunction with any existing RL algorithm, and consists of rewarding improvement over the agent's own past performance. We prove that SIBRE converges in expectation under the same conditions as the o…

    Submitted 21 December, 2020; v1 submitted 21 April, 2020; originally announced April 2020.

    Comments: 7 pages, 10 figures

  22. arXiv:2003.14093

    physics.soc-ph cs.AI cs.LG q-bio.PE stat.ML

    Optimising Lockdown Policies for Epidemic Control using Reinforcement Learning

    Authors: Harshad Khadilkar, Tanuja Ganu, Deva P Seetharam

    Abstract: In the context of the ongoing Covid-19 pandemic, several reports and studies have attempted to model and predict the spread of the disease. There is also intense debate about policies for limiting the damage, both to health and to the economy. On the one hand, the health and safety of the population is the principal consideration for most countries. On the other hand, we cannot ignore the potentia…

    Submitted 1 May, 2020; v1 submitted 31 March, 2020; originally announced March 2020.

  23. arXiv:1911.04947

    cs.LG stat.ML

    Accelerating Training in Pommerman with Imitation and Reinforcement Learning

    Authors: Hardik Meisheri, Omkar Shelke, Richa Verma, Harshad Khadilkar

    Abstract: The Pommerman simulation was recently developed to mimic the classic Japanese game Bomberman, and focuses on competitive gameplay in a multi-agent setting. We focus on the 2×2 team version of Pommerman, developed for a competition at NeurIPS 2018. Our methodology involves training an agent initially through imitation learning on a noisy expert policy, followed by a proximal-policy optimizat…

    Submitted 13 November, 2019; v1 submitted 12 November, 2019; originally announced November 2019.

    Comments: Presented at Deep Reinforcement Learning workshop, NeurIPS-2019

  24. arXiv:1910.00211

    cs.AI cs.LG eess.SY

    Reinforcement Learning for Multi-Objective Optimization of Online Decisions in High-Dimensional Systems

    Authors: Hardik Meisheri, Vinita Baniwal, Nazneen N Sultana, Balaraman Ravindran, Harshad Khadilkar

    Abstract: This paper describes a purely data-driven solution to a class of sequential decision-making problems with a large number of concurrent online decisions, with applications to computing systems and operations research. We assume that while the micro-level behaviour of the system can be broadly captured by analytical expressions or simulation, the macro-level or emergent behaviour is complicated by n…

    Submitted 1 October, 2019; originally announced October 2019.

    Comments: 22 pages, 10 figures
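
Entry 6 (arXiv:2307.05189) restricts itself to invertible activation functions, and invertibility is what makes a regression-based update possible: inverting the activation turns a nonlinear fit into a linear one. The sketch below is not the paper's algorithm; it only illustrates that principle for a single tanh neuron on noiseless synthetic data, assuming every target lies strictly inside (-1, 1). The function name `fit_tanh_neuron` and the data are hypothetical.

```python
import numpy as np

def fit_tanh_neuron(X, y):
    """Fit y ~= tanh(X @ w + b) by least squares in pre-activation space.

    Hypothetical helper. Assumes every target lies strictly inside (-1, 1)
    so that arctanh(y) is finite.
    """
    z = np.arctanh(y)  # invert the activation to get the desired pre-activations
    A = np.hstack([X, np.ones((X.shape[0], 1))])  # extra column of ones for the bias
    coef, *_ = np.linalg.lstsq(A, z, rcond=None)  # ordinary linear least squares
    return coef[:-1], coef[-1]  # (weights, bias)

# Noiseless synthetic data generated by a known neuron.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_w, true_b = np.array([0.5, -0.3, 0.2]), 0.1
y = np.tanh(X @ true_w + true_b)

w, b = fit_tanh_neuron(X, y)
print(np.round(w, 3), round(float(b), 3))
```

Because the squared error is minimised on pre-activations rather than on outputs, the recovery is exact only for noiseless targets; with noisy data this is an approximation to the output-space least-squares fit.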
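
Entry 8 (arXiv:2305.07571) supplements gradient-based RL with occasional crossovers and mutations across a population of agents. The sketch below shows generic versions of such operators on flattened parameter vectors: uniform crossover, Gaussian mutation, and elitist selection. The operator choices, `sigma`, `n_elite`, and the toy fitness are assumptions for illustration, not the paper's settings, and the RL training and shared experience buffer that would run between evolution steps are omitted.

```python
import numpy as np

def crossover(parent_a, parent_b, rng):
    """Uniform crossover: each parameter is copied from one of the two parents."""
    mask = rng.random(parent_a.shape) < 0.5
    return np.where(mask, parent_a, parent_b)

def mutate(params, rng, sigma=0.02):
    """Gaussian mutation: add small zero-mean noise to every parameter."""
    return params + sigma * rng.normal(size=params.shape)

def evolve(population, scores, rng, n_elite=2):
    """Keep the n_elite best agents unchanged; refill the rest with mutated crossovers of elites."""
    order = np.argsort(scores)[::-1]  # indices sorted from best to worst score
    elites = [population[i] for i in order[:n_elite]]
    children = []
    while len(elites) + len(children) < len(population):
        i, j = rng.choice(n_elite, size=2, replace=False)  # two distinct elite parents
        children.append(mutate(crossover(elites[i], elites[j], rng), rng))
    return elites + children

rng = np.random.default_rng(1)
population = [rng.normal(size=10) for _ in range(6)]  # six hypothetical parameter vectors
scores = [float(-np.sum(p ** 2)) for p in population]  # toy fitness: closeness to zero
new_population = evolve(population, scores, rng)
print(len(new_population))
```

Keeping elites unchanged means an evolution step can never lose the best policy found so far, while the mutated children provide the large, directed jumps that gradient steps alone would not make.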
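
Entry 15 (arXiv:2108.07555) concerns state augmentation for delayed MDPs. As background, a common construction for a constant action delay is to append the actions that have been issued but have not yet taken effect to the latest observation, which restores the Markov property. The sketch below shows only that constant-delay case under one convention (an action takes effect `delay` steps after it is issued); it does not model the stochastic delays the paper addresses, and the class name `DelayAugmentedState` and the `noop_action` padding are hypothetical.

```python
from collections import deque

class DelayAugmentedState:
    """Augmented state for a constant action delay of `delay` steps (hypothetical helper).

    The agent conditions on the latest observation plus the `delay` most recent
    actions, which have been issued but have not yet acted on the environment.
    """

    def __init__(self, delay, noop_action=0):
        # Pad with no-op actions so the augmented state has a fixed size from step 0.
        self.pending = deque([noop_action] * delay, maxlen=delay)

    def augment(self, observation):
        return (observation, tuple(self.pending))

    def push(self, action):
        # Appending drops the oldest entry, i.e. the action that has now taken effect.
        self.pending.append(action)

aug = DelayAugmentedState(delay=2)
aug.push(1)
aug.push(2)
print(aug.augment("s0"))  # ('s0', (1, 2))
```

The augmented state grows linearly with the delay, which is one reason large or random delays make this approach expensive.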
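
Entry 21 (arXiv:2004.09846, SIBRE) rewards an agent for improving on its own past performance. A minimal sketch of that idea, assuming an episodic task: the shaped reward is the episode return minus a threshold, and the threshold is then moved toward the latest return by an exponential moving average. The class name `ImprovementReward`, the step size `beta`, and the moving-average update are illustrative assumptions, not the paper's exact formulation.

```python
class ImprovementReward:
    """Shapes an episode's return relative to a running estimate of past performance.

    Hypothetical helper: the threshold is an exponential moving average of past returns.
    """

    def __init__(self, beta=0.1, initial_threshold=0.0):
        self.beta = beta  # step size for updating the threshold
        self.threshold = initial_threshold  # running estimate of past performance

    def shape(self, episode_return):
        shaped = episode_return - self.threshold  # positive only if the agent beat its past level
        self.threshold += self.beta * (episode_return - self.threshold)
        return shaped

shaper = ImprovementReward(beta=0.5)
shaped = [shaper.shape(g) for g in [1.0, 1.0, 3.0]]
print(shaped)  # [1.0, 0.5, 2.25]
```

Repeating the same return yields a shrinking reward, so the agent is pushed to keep improving rather than settle at a fixed level; the shaped reward is meant to be fed to an existing RL algorithm in place of the raw return.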