
Showing 1–15 of 15 results for author: Meisheri, H

Searching in archive cs.
  1. arXiv:2311.16171

    cs.AI cs.LG cs.MA

    Multi-Agent Learning of Efficient Fulfilment and Routing Strategies in E-Commerce

    Authors: Omkar Shelke, Pranavi Pathakota, Anandsingh Chauhan, Harshad Khadilkar, Hardik Meisheri, Balaraman Ravindran

    Abstract: This paper presents an integrated algorithmic framework for minimising product delivery costs in e-commerce (known as the cost-to-serve or C2S). One of the major challenges in e-commerce is the large volume of spatio-temporally diverse orders from multiple customers, each of which has to be fulfilled from one of several warehouses using a fleet of vehicles. This results in two levels of decision-m…

    Submitted 20 November, 2023; originally announced November 2023.

  2. arXiv:2306.15913

    cs.LG cs.AI

    DCT: Dual Channel Training of Action Embeddings for Reinforcement Learning with Large Discrete Action Spaces

    Authors: Pranavi Pathakota, Hardik Meisheri, Harshad Khadilkar

    Abstract: The ability to learn robust policies while generalizing over large discrete action spaces is an open challenge for intelligent systems, especially in noisy environments that face the curse of dimensionality. In this paper, we present a novel framework to efficiently learn action embeddings that simultaneously allow us to reconstruct the original action as well as to predict the expected future sta…

    Submitted 28 June, 2023; originally announced June 2023.

    Comments: 17 pages

  3. arXiv:2210.17296

    cs.LG cs.AI

    Using Contrastive Samples for Identifying and Leveraging Possible Causal Relationships in Reinforcement Learning

    Authors: Harshad Khadilkar, Hardik Meisheri

    Abstract: A significant challenge in reinforcement learning is quantifying the complex relationship between actions and long-term rewards. The effects may manifest themselves over a long sequence of state-action pairs, making them hard to pinpoint. In this paper, we propose a method to link transitions with significant deviations in state with unusually large variations in subsequent rewards. Such transitio…

    Submitted 28 October, 2022; originally announced October 2022.

  4. arXiv:2203.00885

    cs.LG cs.AI math.OC

    A Learning Based Framework for Handling Uncertain Lead Times in Multi-Product Inventory Management

    Authors: Hardik Meisheri, Somjit Nath, Mayank Baranwal, Harshad Khadilkar

    Abstract: Most existing literature on supply chain and inventory management consider stochastic demand processes with zero or constant lead times. While it is true that in certain niche scenarios, uncertainty in lead times can be ignored, most real-world scenarios exhibit stochasticity in lead times. These random fluctuations can be caused due to uncertainty in arrival of raw materials at the manufacturer's…

    Submitted 8 March, 2022; v1 submitted 2 March, 2022; originally announced March 2022.

  5. arXiv:2203.00874

    cs.LG cs.AI

    Follow your Nose: Using General Value Functions for Directed Exploration in Reinforcement Learning

    Authors: Durgesh Kalwar, Omkar Shelke, Somjit Nath, Hardik Meisheri, Harshad Khadilkar

    Abstract: Improving sample efficiency is a key challenge in reinforcement learning, especially in environments with large state spaces and sparse rewards. In literature, this is resolved either through the use of auxiliary tasks (subgoals) or through clever exploration strategies. Exploration methods have been used to sample better trajectories in large environments while auxiliary tasks have been incorpora…

    Submitted 27 February, 2023; v1 submitted 2 March, 2022; originally announced March 2022.

  6. arXiv:2112.08736

    cs.AI cs.LG

    Learning to Minimize Cost-to-Serve for Multi-Node Multi-Product Order Fulfilment in Electronic Commerce

    Authors: Pranavi Pathakota, Kunwar Zaid, Anulekha Dhara, Hardik Meisheri, Shaun D Souza, Dheeraj Shah, Harshad Khadilkar

    Abstract: We describe a novel decision-making problem developed in response to the demands of retail electronic commerce (e-commerce). While working with logistics and retail industry business collaborators, we found that the cost of delivery of products from the most opportune node in the supply chain (a quantity called the cost-to-serve or CTS) is a key challenge. The large scale, high stochasticity, and…

    Submitted 16 December, 2021; originally announced December 2021.

  7. arXiv:2102.11762

    cs.AI cs.LG cs.MA

    School of hard knocks: Curriculum analysis for Pommerman with a fixed computational budget

    Authors: Omkar Shelke, Hardik Meisheri, Harshad Khadilkar

    Abstract: Pommerman is a hybrid cooperative/adversarial multi-agent environment, with challenging characteristics in terms of partial observability, limited or no communication, sparse and delayed rewards, and restrictive computational time limits. This makes it a challenging environment for reinforcement learning (RL) approaches. In this paper, we focus on developing a curriculum for learning a robust and…

    Submitted 24 February, 2021; v1 submitted 23 February, 2021; originally announced February 2021.

    Comments: 8 pages, Submitted to ALA workshop 2021

    Journal ref: CODS-COMAD 2022: 5th Joint International Conference on Data Science & Management of Data (9th ACM IKDD CODS and 27th COMAD)

  8. arXiv:2011.00424

    cs.LG cs.MA

    Sample Efficient Training in Multi-Agent Adversarial Games with Limited Teammate Communication

    Authors: Hardik Meisheri, Harshad Khadilkar

    Abstract: We describe our solution approach for Pommerman TeamRadio, a competition environment associated with NeurIPS 2019. The defining feature of our algorithm is achieving sample efficiency within a restrictive computational budget while beating the previous years learning agents. The proposed algorithm (i) uses imitation learning to seed the policy, (ii) explicitly defines the communication protocol be…

    Submitted 1 November, 2020; originally announced November 2020.

  9. arXiv:2006.04037

    cs.LG cs.AI cs.MA stat.ML

    Reinforcement Learning for Multi-Product Multi-Node Inventory Management in Supply Chains

    Authors: Nazneen N Sultana, Hardik Meisheri, Vinita Baniwal, Somjit Nath, Balaraman Ravindran, Harshad Khadilkar

    Abstract: This paper describes the application of reinforcement learning (RL) to multi-product inventory management in supply chains. The problem description and solution are both adapted from a real-world business solution. The novelty of this problem with respect to supply chain literature is (i) we consider concurrent inventory management of a large number (50 to 1000) of products with shared capacity, (…

    Submitted 7 June, 2020; originally announced June 2020.

  10. arXiv:1911.04947

    cs.LG stat.ML

    Accelerating Training in Pommerman with Imitation and Reinforcement Learning

    Authors: Hardik Meisheri, Omkar Shelke, Richa Verma, Harshad Khadilkar

    Abstract: The Pommerman simulation was recently developed to mimic the classic Japanese game Bomberman, and focuses on competitive gameplay in a multi-agent setting. We focus on the 2$\times$2 team version of Pommerman, developed for a competition at NeurIPS 2018. Our methodology involves training an agent initially through imitation learning on a noisy expert policy, followed by a proximal-policy optimizat…

    Submitted 13 November, 2019; v1 submitted 12 November, 2019; originally announced November 2019.

    Comments: Presented at Deep Reinforcement Learning workshop, NeurIPS-2019

  11. arXiv:1911.02771

    cs.SI cs.CY physics.soc-ph

    Characterizing behavioral trends in a community driven discussion platform

    Authors: Sachin Thukral, Arnab Chatterjee, Hardik Meisheri, Tushar Kataria, Aman Agarwal, Ishan Verma, Lipika Dey

    Abstract: This article presents a systematic analysis of the patterns of behavior of individuals as well as groups observed in community-driven platforms for discussion like Reddit, where users usually exchange information and viewpoints on their topics of interest. We perform a statistical analysis of the behavior of posts and model the users' interactions around them. A platform like Reddit which has grow…

    Submitted 7 November, 2019; originally announced November 2019.

    Comments: 19 pages. Extended version of arxiv:1809.07087. Springer Lecture Notes Format, to be published in Lecture Notes in Social Networks (Springer)

  12. arXiv:1910.00211

    cs.AI cs.LG eess.SY

    Reinforcement Learning for Multi-Objective Optimization of Online Decisions in High-Dimensional Systems

    Authors: Hardik Meisheri, Vinita Baniwal, Nazneen N Sultana, Balaraman Ravindran, Harshad Khadilkar

    Abstract: This paper describes a purely data-driven solution to a class of sequential decision-making problems with a large number of concurrent online decisions, with applications to computing systems and operations research. We assume that while the micro-level behaviour of the system can be broadly captured by analytical expressions or simulation, the macro-level or emergent behaviour is complicated by n…

    Submitted 1 October, 2019; originally announced October 2019.

    Comments: 22 pages, 10 figures

  13. arXiv:1809.07087

    cs.SI cs.CY cs.MA physics.soc-ph

    Analyzing behavioral trends in community driven discussion platforms like Reddit

    Authors: Sachin Thukral, Hardik Meisheri, Tushar Kataria, Aman Agarwal, Ishan Verma, Arnab Chatterjee, Lipika Dey

    Abstract: The aim of this paper is to present methods to systematically analyze individual and group behavioral patterns observed in community driven discussion platforms like Reddit where users exchange information and views on various topics of current interest. We conduct this study by analyzing the statistical behavior of posts and modeling user interactions around them. We have chosen Reddit as an exam…

    Submitted 19 September, 2018; originally announced September 2018.

    Comments: 8 pages, 9 figs, ASONAM 2018

  14. arXiv:1802.09046

    cs.NE q-bio.NC

    Multiclass Common Spatial Pattern for EEG based Brain Computer Interface with Adaptive Learning Classifier

    Authors: Hardik Meisheri, Nagraj Ramrao, Suman Mitra

    Abstract: In Brain Computer Interface (BCI), data generated from Electroencephalogram (EEG) is non-stationary with low signal to noise ratio and contaminated with artifacts. Common Spatial Pattern (CSP) algorithm has been proved to be effective in BCI for extracting features in motor imagery tasks, but it is prone to overfitting. Many algorithms have been devised to regularize CSP for two class problem, how…

    Submitted 6 March, 2021; v1 submitted 25 February, 2018; originally announced February 2018.

  15. arXiv:1710.02745

    cs.CL

    Multi-Document Summarization using Distributed Bag-of-Words Model

    Authors: Kaustubh Mani, Ishan Verma, Hardik Meisheri, Lipika Dey

    Abstract: As the number of documents on the web is growing exponentially, multi-document summarization is becoming more and more important since it can provide the main ideas in a document set in short time. In this paper, we present an unsupervised centroid-based document-level reconstruction framework using distributed bag of words model. Specifically, our approach selects summary sentences in order to mi…

    Submitted 11 June, 2018; v1 submitted 7 October, 2017; originally announced October 2017.