
Showing 1–50 of 54 results for author: Gopalan, A

Searching in archive cs.
  1. arXiv:2406.15648

    cs.LG math.ST stat.ML

    Testing the Feasibility of Linear Programs with Bandit Feedback

    Authors: Aditya Gangrade, Aditya Gopalan, Venkatesh Saligrama, Clayton Scott

    Abstract: While the recent literature has seen a surge in the study of constrained bandit problems, all existing methods for these begin by assuming the feasibility of the underlying problem. We initiate the study of testing such feasibility assumptions, and in particular address the problem in the linear bandit setting, thus characterising the costs of feasibility testing for an unknown linear program usin…

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: Spotlight presentation at ICML 2024

  2. A Multilingual Virtual Guide for Self-Attachment Technique

    Authors: Alicia Jiayun Law, Ruoyu Hu, Lisa Alazraki, Anandha Gopalan, Neophytos Polydorou, Abbas Edalat

    Abstract: In this work, we propose a computational framework that leverages existing out-of-language data to create a conversational agent for the delivery of Self-Attachment Technique (SAT) in Mandarin. Our framework does not require large-scale human translations, yet it achieves a comparable performance whilst also maintaining safety and reliability. We propose two different methods of augmenting availab…

    Submitted 25 October, 2023; originally announced October 2023.

    Journal ref: 2022 IEEE 4th International Conference on Cognitive Machine Intelligence (CogMI)

  3. arXiv:2310.09358

    cs.LG cs.AI

    Bad Values but Good Behavior: Learning Highly Misspecified Bandits and MDPs

    Authors: Debangshu Banerjee, Aditya Gopalan

    Abstract: Parametric, feature-based reward models are employed by a variety of algorithms in decision-making settings such as bandits and Markov decision processes (MDPs). The typical assumption under which the algorithms are analysed is realizability, i.e., that the true values of actions are perfectly explained by some parametric model in the class. We are, however, interested in the situation where the t…

    Submitted 22 February, 2024; v1 submitted 13 October, 2023; originally announced October 2023.

  4. arXiv:2309.02898

    cs.LG cs.CV

    A Unified Framework for Discovering Discrete Symmetries

    Authors: Pavan Karjol, Rohan Kashyap, Aditya Gopalan, Prathosh A. P

    Abstract: We consider the problem of learning a function respecting a symmetry from among a class of symmetries. We develop a unified framework that enables symmetry discovery across a broad range of subgroups including locally symmetric, dihedral and cyclic subgroups. At the core of the framework is a novel architecture composed of linear, matrix-valued and non-linear functions that expresses functions inv…

    Submitted 27 October, 2023; v1 submitted 6 September, 2023; originally announced September 2023.

  5. arXiv:2305.04516

    cs.CV cs.AI cs.LG

    Robust Traffic Light Detection Using Salience-Sensitive Loss: Computational Framework and Evaluations

    Authors: Ross Greer, Akshay Gopalkrishnan, Jacob Landgren, Lulua Rakla, Anish Gopalan, Mohan Trivedi

    Abstract: One of the most important tasks for ensuring safe autonomous driving systems is accurately detecting road traffic lights and accurately determining how they impact the driver's actions. In various real-world driving situations, a scene may have numerous traffic lights with varying levels of relevance to the driver, and thus, distinguishing and detecting the lights that are relevant to the driver a…

    Submitted 8 May, 2023; originally announced May 2023.

  6. arXiv:2301.05838

    cs.CV cs.AI cs.HC

    (Safe) SMART Hands: Hand Activity Analysis and Distraction Alerts Using a Multi-Camera Framework

    Authors: Ross Greer, Lulua Rakla, Anish Gopalan, Mohan Trivedi

    Abstract: Manual (hand-related) activity is a significant source of crash risk while driving. Accordingly, analysis of hand position and hand activity occupation is a useful component to understanding a driver's readiness to take control of a vehicle. Visual sensing through cameras provides a passive means of observing the hands, but its effectiveness varies depending on camera location. We introduce an alg…

    Submitted 29 January, 2023; v1 submitted 14 January, 2023; originally announced January 2023.

  7. arXiv:2301.03597

    cs.LG

    On the Minimax Regret for Linear Bandits in a wide variety of Action Spaces

    Authors: Debangshu Banerjee, Aditya Gopalan

    Abstract: As noted by Lattimore and Szepesvári (2020), characterizing the minimax regret of linear bandits in a wide variety of action spaces is an open problem. In this article we present an optimal regret lower bound for a wide class of convex action spaces.

    Submitted 9 January, 2023; originally announced January 2023.

  8. arXiv:2207.11597

    cs.LG cs.AI stat.ML

    Exploration in Linear Bandits with Rich Action Sets and its Implications for Inference

    Authors: Debangshu Banerjee, Avishek Ghosh, Sayak Ray Chowdhury, Aditya Gopalan

    Abstract: We present a non-asymptotic lower bound on the eigenspectrum of the design matrix generated by any linear bandit algorithm with sub-linear regret when the action set has well-behaved curvature. Specifically, we show that the minimum eigenvalue of the expected design matrix grows as $Ω(\sqrt{n})$ whenever the expected cumulative regret of the algorithm is $O(\sqrt{n})$, where $n$ is the learning ho…

    Submitted 7 January, 2023; v1 submitted 23 July, 2022; originally announced July 2022.

    Comments: Resubmit

  9. arXiv:2207.09090

    cs.LG cs.AI eess.SY

    Actor-Critic based Improper Reinforcement Learning

    Authors: Mohammadi Zaki, Avinash Mohan, Aditya Gopalan, Shie Mannor

    Abstract: We consider an improper reinforcement learning setting where a learner is given $M$ base controllers for an unknown Markov decision process, and wishes to combine them optimally to produce a potentially new controller that can outperform each of the base ones. This can be useful in tuning across controllers, learnt possibly in mismatched or simulated environments, to obtain a good controller for a…

    Submitted 19 July, 2022; originally announced July 2022.

    Comments: arXiv admin note: substantial text overlap with arXiv:2102.08201

  10. arXiv:2205.13617

    cs.LG math.OC

    Demystifying Approximate Value-based RL with $ε$-greedy Exploration: A Differential Inclusion View

    Authors: Aditya Gopalan, Gugan Thoppe

    Abstract: Q-learning and SARSA with $ε$-greedy exploration are leading reinforcement learning methods. Their tabular forms converge to the optimal Q-function under reasonable conditions. However, with function approximation, these methods exhibit strange behaviors such as policy oscillation, chattering, and convergence to different attractors (possibly even the worst policy) on different runs, apart from th…

    Submitted 10 February, 2023; v1 submitted 26 May, 2022; originally announced May 2022.

    Comments: 22 pages, 3 figures

    MSC Class: 93E35; 68Q32 ACM Class: I.2.0

  11. arXiv:2203.16810

    cs.LG

    Adaptive Estimation of Random Vectors with Bandit Feedback: A mean-squared error viewpoint

    Authors: Dipayan Sen, L. A. Prashanth, Aditya Gopalan

    Abstract: We consider the problem of sequentially learning to estimate, in the mean squared error (MSE) sense, a Gaussian $K$-vector of unknown covariance by observing only $m < K$ of its entries in each round. We first establish a concentration bound for MSE estimation. We then frame the estimation problem with bandit feedback, and propose a variant of the successive elimination algorithm. We also derive a…

    Submitted 11 January, 2024; v1 submitted 31 March, 2022; originally announced March 2022.

  12. arXiv:2201.07306

    cs.LG

    Bregman Deviations of Generic Exponential Families

    Authors: Sayak Ray Chowdhury, Patrick Saux, Odalric-Ambrym Maillard, Aditya Gopalan

    Abstract: We revisit the method of mixtures technique, also known as the Laplace method, to study the concentration phenomenon in generic exponential families. Combining the properties of the Bregman divergence associated with the log-partition function of the family with the method of mixtures for super-martingales, we establish a generic bound controlling the Bregman divergence between the parameter of the family…

    Submitted 13 July, 2023; v1 submitted 18 January, 2022; originally announced January 2022.

  13. arXiv:2112.00819

    cs.CL

    CO-STAR: Conceptualisation of Stereotypes for Analysis and Reasoning

    Authors: Teyun Kwon, Anandha Gopalan

    Abstract: Warning: this paper contains material which may be offensive or upsetting. While much of recent work has focused on the detection of hate speech and overtly offensive content, very little research has explored the more subtle but equally harmful language in the form of implied stereotypes. This is a challenging domain, made even more so by the fact that humans often struggle to understand and re…

    Submitted 1 December, 2021; originally announced December 2021.

    Comments: 12 pages, 1 figure

  14. arXiv:2110.12916

    cs.LG stat.ML

    On Slowly-varying Non-stationary Bandits

    Authors: Ramakrishnan Krishnamurthy, Aditya Gopalan

    Abstract: We consider minimisation of dynamic regret in non-stationary bandits with a slowly varying property. Namely, we assume that arms' rewards are stochastic and independent over time, but that the absolute difference between the expected rewards of any arm at any two consecutive time-steps is at most a drift limit $δ > 0$. For this setting that has not received enough attention in the past, we give a n…

    Submitted 25 October, 2021; originally announced October 2021.

    Comments: Under submission

  15. arXiv:2110.09648

    math.PR cs.NI

    Data Flow Dissemination in a Network

    Authors: Aditya Gopalan, Alexander Stolyar

    Abstract: We consider the following network model motivated, in particular, by blockchains and peer-to-peer live streaming. Data packet flows arrive at the network nodes and need to be disseminated to all other nodes. Packets are relayed through the network via links of finite capacity. A packet leaves the network when it is disseminated to all nodes. Our focus is on two communication disciplines, which det…

    Submitted 5 September, 2023; v1 submitted 18 October, 2021; originally announced October 2021.

    Comments: Revision. 26 pages, 6 figures, 3 tables

    MSC Class: 90B15; 60K25

  16. arXiv:2107.10492

    cs.LG cs.IT stat.ML

    Bandit Quickest Changepoint Detection

    Authors: Aditya Gopalan, Venkatesh Saligrama, Braghadeesh Lakshminarayanan

    Abstract: Many industrial and security applications employ a suite of sensors for detecting abrupt changes in temporal behavior patterns. These abrupt changes typically manifest locally, rendering only a small subset of sensors informative. Continuous monitoring of every sensor can be expensive due to resource constraints, and serves as a motivation for the bandit quickest changepoint detection problem, whe…

    Submitted 13 June, 2023; v1 submitted 22 July, 2021; originally announced July 2021.

    Comments: Some typos fixed in the NeurIPS 2021 version

  17. arXiv:2105.12849

    cs.LG

    CARLS: Cross-platform Asynchronous Representation Learning System

    Authors: Chun-Ta Lu, Yun Zeng, Da-Cheng Juan, Yicheng Fan, Zhe Li, Jan Dlabal, Yi-Ting Chen, Arjun Gopalan, Allan Heydon, Chun-Sung Ferng, Reah Miyara, Ariel Fuxman, Futang Peng, Zhen Li, Tom Duerig, Andrew Tomkins

    Abstract: In this work, we propose CARLS, a novel framework for augmenting the capacity of existing deep learning frameworks by enabling multiple components -- model trainers, knowledge makers and knowledge banks -- to concertedly work together in an asynchronous fashion across hardware platforms. The proposed CARLS is particularly suitable for learning paradigms where model training benefits from additiona…

    Submitted 26 May, 2021; originally announced May 2021.

  18. arXiv:2105.00210

    cs.LG

    Better than the Best: Gradient-based Improper Reinforcement Learning for Network Scheduling

    Authors: Mohammani Zaki, Avi Mohan, Aditya Gopalan, Shie Mannor

    Abstract: We consider the problem of scheduling in constrained queueing networks with a view to minimizing packet delay. Modern communication systems are becoming increasingly complex, and are required to handle multiple types of traffic with widely varying characteristics such as arrival rates and service times. This, coupled with the need for rapid network deployment, renders a bottom-up approach of first…

    Submitted 1 May, 2021; originally announced May 2021.

    Comments: 4 pages, 5 figures, RLNQ workshop at SIGMETRICS 2021

  19. arXiv:2102.08201

    cs.LG eess.SY

    Improper Reinforcement Learning with Gradient-based Policy Optimization

    Authors: Mohammadi Zaki, Avinash Mohan, Aditya Gopalan, Shie Mannor

    Abstract: We consider an improper reinforcement learning setting where a learner is given $M$ base controllers for an unknown Markov decision process, and wishes to combine them optimally to produce a potentially new controller that can outperform each of the base ones. This can be useful in tuning across controllers, learnt possibly in mismatched or simulated environments, to obtain a good controller for a…

    Submitted 3 July, 2021; v1 submitted 16 February, 2021; originally announced February 2021.

  20. arXiv:2012.11898

    cs.LG cs.AI

    Graph Autoencoders with Deconvolutional Networks

    Authors: Jia Li, Tomas Yu, Da-Cheng Juan, Arjun Gopalan, Hong Cheng, Andrew Tomkins

    Abstract: Recent studies have indicated that Graph Convolutional Networks (GCNs) act as a \emph{low pass} filter in the spectral domain and encode smoothed node representations. In this paper, we consider their opposite, namely Graph Deconvolutional Networks (GDNs) that reconstruct graph signals from smoothed node representations. We motivate the design of Graph Deconvolutional Networks via a combination of inv…

    Submitted 22 December, 2020; originally announced December 2020.

  21. arXiv:2011.01016

    cs.LG

    Stochastic Linear Bandits with Protected Subspace

    Authors: Advait Parulekar, Soumya Basu, Aditya Gopalan, Karthikeyan Shanmugam, Sanjay Shakkottai

    Abstract: We study a variant of the stochastic linear bandit problem wherein we optimize a linear objective function but rewards are accrued only orthogonal to an unknown subspace (which we interpret as a \textit{protected space}) given only zero-order stochastic oracle access to both the objective itself and protected subspace. In particular, at each round, the learner must choose whether to query the obje…

    Submitted 1 March, 2021; v1 submitted 2 November, 2020; originally announced November 2020.

  22. arXiv:2008.08885

    cs.LG stat.ML

    No-regret Algorithms for Multi-task Bayesian Optimization

    Authors: Sayak Ray Chowdhury, Aditya Gopalan

    Abstract: We consider multi-objective optimization (MOO) of an unknown vector-valued function in the non-parametric Bayesian optimization (BO) setting, with the aim being to learn points on the Pareto front of the objectives. Most existing BO algorithms do not model the fact that the multiple objectives, or equivalently, tasks can share similarities, and even the few that do lack rigorous, finite-time regre…

    Submitted 20 August, 2020; originally announced August 2020.

  23. arXiv:2007.12961

    cs.IT

    Sequential Multi-hypothesis Testing in Multi-armed Bandit Problems: An Approach for Asymptotic Optimality

    Authors: Gayathri R Prabhu, Srikrishna Bhashyam, Aditya Gopalan, Rajesh Sundaresan

    Abstract: We consider a multi-hypothesis testing problem involving a K-armed bandit. Each arm's signal follows a distribution from a vector exponential family. The actual parameters of the arms are unknown to the decision maker. The decision maker incurs a delay cost for delay until a decision and a switching cost whenever he switches from one arm to another. His goal is to minimise the overall cost until a…

    Submitted 28 July, 2020; v1 submitted 25 July, 2020; originally announced July 2020.

    Comments: arXiv admin note: text overlap with arXiv:1712.03682

  24. arXiv:2006.07562

    cs.LG stat.ML

    Explicit Best Arm Identification in Linear Bandits Using No-Regret Learners

    Authors: Mohammadi Zaki, Avi Mohan, Aditya Gopalan

    Abstract: We study the problem of best arm identification in linearly parameterised multi-armed bandits. Given a set of feature vectors $\mathcal{X}\subset\mathbb{R}^d,$ a confidence parameter $δ$ and an unknown vector $θ^*,$ the goal is to identify $\arg\max_{x\in\mathcal{X}}x^Tθ^*$, with probability at least $1-δ,$ using noisy measurements of the form $x^Tθ^*.$ For this fixed confidence ($δ$-PAC) setting,…

    Submitted 13 June, 2020; originally announced June 2020.

  25. arXiv:2004.12782

    cs.SI cs.LG q-bio.PE stat.AP stat.ML

    How Reliable are Test Numbers for Revealing the COVID-19 Ground Truth and Applying Interventions?

    Authors: Aditya Gopalan, Himanshu Tyagi

    Abstract: The number of confirmed cases of COVID-19 is often used as a proxy for the actual number of ground truth COVID-19 infected cases in both public discourse and policy making. However, the number of confirmed cases depends on the testing policy, and it is important to understand how the number of positive cases obtained using different testing policies reveals the unknown ground truth. We develop an…

    Submitted 24 April, 2020; originally announced April 2020.

  26. arXiv:2002.08583   

    cs.LG stat.ML

    Regret Minimization in Stochastic Contextual Dueling Bandits

    Authors: Aadirupa Saha, Aditya Gopalan

    Abstract: We consider the problem of stochastic $K$-armed dueling bandit in the contextual setting, where at each round the learner is presented with a context set of $K$ items, each represented by a $d$-dimensional feature vector, and the goal of the learner is to identify the best arm of each context set. However, unlike the classical contextual bandit setup, our framework only allows the learner to rece…

    Submitted 8 May, 2021; v1 submitted 20 February, 2020; originally announced February 2020.

    Comments: Wrong result with incremental contribution, major revision required

  27. Throughput Optimal Decentralized Scheduling with Single-bit State Feedback for a Class of Queueing Systems

    Authors: Avinash Mohan, Aditya Gopalan, Anurag Kumar

    Abstract: Motivated by medium access control for resource-challenged wireless Internet of Things (IoT), we consider the problem of queue scheduling with reduced queue state information. In particular, we consider a time-slotted scheduling model with $N$ sensor nodes, with pair-wise dependence, such that Nodes $i$ and $i + 1,~0 < i < N$ cannot transmit together. We develop new throughput-optimal scheduling p…

    Submitted 19 February, 2020; originally announced February 2020.

    Comments: 53 pages, 18 figures, IEEE/ACM Transactions on Networking

    Journal ref: IEEE/ACM Transactions on Networking, April 2020

  28. arXiv:2002.07994

    cs.LG cs.AI stat.ML

    Best-item Learning in Random Utility Models with Subset Choices

    Authors: Aadirupa Saha, Aditya Gopalan

    Abstract: We consider the problem of PAC learning the most valuable item from a pool of $n$ items using sequential, adaptively chosen plays of subsets of $k$ items, when, upon playing a subset, the learner receives relative feedback sampled according to a general Random Utility Model (RUM) with independent noise perturbations to the latent item utilities. We identify a new property of such a RUM, termed the…

    Submitted 18 February, 2020; originally announced February 2020.

    Comments: Accepted to 23rd International Conference on Artificial Intelligence and Statistics (AISTATS), 2020

  29. arXiv:2002.02567

    cs.DC cs.IT cs.SI

    Stability and Scalability of Blockchain Systems

    Authors: Aditya Gopalan, Abishek Sankararaman, Anwar Walid, Sriram Vishwanath

    Abstract: The blockchain paradigm provides a mechanism for content dissemination and distributed consensus on Peer-to-Peer (P2P) networks. While this paradigm has been widely adopted in industry, it has not been carefully analyzed in terms of its network scaling with respect to the number of peers. Applications for blockchain systems, such as cryptocurrencies and IoT, require this form of network scaling.…

    Submitted 18 December, 2020; v1 submitted 6 February, 2020; originally announced February 2020.

    Comments: This is the revised version of the paper

    MSC Class: 94A06 (Primary); 60H06 (Secondary) ACM Class: C.4; G.3; H.4.3

    Journal ref: Proc. ACM Meas. Anal. Comput. Syst. Vol. 4 No. 2 (2020) Article 35, pages 1-35

  30. arXiv:1911.08197

    cs.LG cs.IT stat.ML

    Sequential Mode Estimation with Oracle Queries

    Authors: Dhruti Shah, Tuhinangshu Choudhury, Nikhil Karamchandani, Aditya Gopalan

    Abstract: We consider the problem of adaptively PAC-learning a probability distribution $\mathcal{P}$'s mode by querying an oracle for information about a sequence of i.i.d. samples $X_1, X_2, \ldots$ generated from $\mathcal{P}$. We consider two different query models: (a) each query is an index $i$ for which the oracle reveals the value of the sample $X_i$, (b) each query is comprised of two indices $i$ a…

    Submitted 19 November, 2019; originally announced November 2019.

    Comments: A shorter version of this paper has been accepted for publication at Association for the Advancement of Artificial Intelligence - AAAI 2020

  31. arXiv:1911.01871

    cs.LG stat.ML

    On Online Learning in Kernelized Markov Decision Processes

    Authors: Sayak Ray Chowdhury, Aditya Gopalan

    Abstract: We develop algorithms with low regret for learning episodic Markov decision processes based on kernel approximation techniques. The algorithms are based on both the Upper Confidence Bound (UCB) as well as Posterior or Thompson Sampling (PSRL) philosophies, and work in the general setting of continuous state and action spaces when the true unknown transition dynamics are assumed to have smoothness…

    Submitted 4 November, 2019; originally announced November 2019.

    Comments: arXiv admin note: text overlap with arXiv:1805.08052

  32. arXiv:1911.01695

    cs.LG math.OC stat.ML

    Towards Optimal and Efficient Best Arm Identification in Linear Bandits

    Authors: Mohammadi Zaki, Avinash Mohan, Aditya Gopalan

    Abstract: We give a new algorithm for best arm identification in linearly parameterised bandits in the fixed confidence setting. The algorithm generalises the well-known LUCB algorithm of Kalyanakrishnan et al. (2012) by playing an arm which minimises a suitable notion of geometric overlap of the statistical confidence set for the unknown parameter, and is fully adaptive and computationally efficient as com…

    Submitted 7 November, 2019; v1 submitted 5 November, 2019; originally announced November 2019.

  33. arXiv:1911.01032

    cs.LG stat.ML

    On Batch Bayesian Optimization

    Authors: Sayak Ray Chowdhury, Aditya Gopalan

    Abstract: We present two algorithms for Bayesian optimization in the batch feedback setting, based on Gaussian process upper confidence bound and Thompson sampling approaches, along with frequentist regret guarantees and numerical results.

    Submitted 4 November, 2019; originally announced November 2019.

    Comments: All of Bayesian Nonparametrics workshop, Neural Information Processing Systems, 2018

  34. arXiv:1910.08805

    cs.LG stat.ML

    On Adaptivity in Information-constrained Online Learning

    Authors: Siddharth Mitra, Aditya Gopalan

    Abstract: We study how to adapt to smoothly-varying ('easy') environments in well-known online learning problems where acquiring information is expensive. For the problem of label efficient prediction, which is a budgeted version of prediction with expert advice, we present an online algorithm whose regret depends optimally on the number of labels allowed and $Q^*$ (the quadratic variation of the losses of…

    Submitted 6 December, 2019; v1 submitted 19 October, 2019; originally announced October 2019.

    Comments: 34th AAAI Conference on Artificial Intelligence (AAAI 2020). Short version at 11th Optimization for Machine Learning workshop (OPT 2019)

  35. arXiv:1909.07040

    cs.LG stat.ML

    Bayesian Optimization under Heavy-tailed Payoffs

    Authors: Sayak Ray Chowdhury, Aditya Gopalan

    Abstract: We consider black box optimization of an unknown function in the nonparametric Gaussian process setting when the noise in the observed function values can be heavy tailed. This is in contrast to existing literature that typically assumes sub-Gaussian noise distributions for queries. Under the assumption that the unknown function belongs to the Reproducing Kernel Hilbert Space (RKHS) induced by a k…

    Submitted 16 September, 2019; originally announced September 2019.

    Comments: 33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada, 2019

  36. arXiv:1903.00558

    cs.LG stat.ML

    From PAC to Instance-Optimal Sample Complexity in the Plackett-Luce Model

    Authors: Aadirupa Saha, Aditya Gopalan

    Abstract: We consider PAC-learning a good item from $k$-subsetwise feedback information sampled from a Plackett-Luce probability model, with instance-dependent sample complexity performance. In the setting where subsets of a fixed size can be tested and top-ranked feedback is made available to the learner, we give an algorithm with optimal instance-dependent sample complexity, for PAC best arm identificatio…

    Submitted 26 February, 2020; v1 submitted 1 March, 2019; originally announced March 2019.

    Comments: 56 pages, 17 figures

  37. arXiv:1903.00543

    cs.LG stat.ML

    Combinatorial Bandits with Relative Feedback

    Authors: Aadirupa Saha, Aditya Gopalan

    Abstract: We consider combinatorial online learning with subset choices when only relative feedback information from subsets is available, instead of bandit or semi-bandit feedback which is absolute. Specifically, we study two regret minimisation problems over subsets of a finite ground set $[n]$, with subset-wise relative preference information feedback according to the Multinomial logit choice model. In t…

    Submitted 26 February, 2020; v1 submitted 1 March, 2019; originally announced March 2019.

    Comments: 47 pages, 12 figures

    Journal ref: 33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada, 2019

  38. arXiv:1810.10321

    cs.LG stat.ML

    Active Ranking with Subset-wise Preferences

    Authors: Aadirupa Saha, Aditya Gopalan

    Abstract: We consider the problem of probably approximately correct (PAC) ranking $n$ items by adaptively eliciting subset-wise preference feedback. At each round, the learner chooses a subset of $k$ items and observes stochastic feedback indicating preference information of the winner (most preferred) item of the chosen subset drawn according to a Plackett-Luce (PL) subset choice model unknown a priori. Th…

    Submitted 1 March, 2019; v1 submitted 23 October, 2018; originally announced October 2018.

    Comments: In 22nd International Conference on Artificial Intelligence and Statistics (AISTATS), 2019. (44 pages, 8 figures). arXiv admin note: text overlap with arXiv:1808.04008

  39. arXiv:1808.04008

    cs.LG stat.ML

    PAC Battling Bandits in the Plackett-Luce Model

    Authors: Aadirupa Saha, Aditya Gopalan

    Abstract: We introduce the probably approximately correct (PAC) \emph{Battling-Bandit} problem with the Plackett-Luce (PL) subset choice model--an online learning framework where at each trial the learner chooses a subset of $k$ arms from a fixed set of $n$ arms, and subsequently observes a stochastic feedback indicating preference information of the items in the chosen subset, e.g., the most preferred item…

    Submitted 1 March, 2019; v1 submitted 12 August, 2018; originally announced August 2018.

    Comments: In 30th International Conference on Algorithmic Learning Theory (ALT), 2019. (45 pages)

  40. arXiv:1805.08052

    cs.LG stat.ML

    Online Learning in Kernelized Markov Decision Processes

    Authors: Sayak Ray Chowdhury, Aditya Gopalan

    Abstract: We consider online learning for minimizing regret in unknown, episodic Markov decision processes (MDPs) with continuous states and actions. We develop variants of the UCRL and posterior sampling algorithms that employ nonparametric Gaussian process priors to generalize across the state and action spaces. When the transition and reward functions of the true MDP are members of the associated Reprodu…

    Submitted 2 January, 2019; v1 submitted 21 May, 2018; originally announced May 2018.

    Comments: 22nd International Conference on Artificial Intelligence and Statistics (AISTATS), 2019

  41. arXiv:1712.03682

    cs.IT

    Learning to detect an oddball target with observations from an exponential family

    Authors: Gayathri R Prabhu, Srikrishna Bhashyam, Aditya Gopalan, Rajesh Sundaresan

    Abstract: The problem of detecting an odd arm from a set of K arms of a multi-armed bandit, with fixed confidence, is studied in a sequential decision-making scenario. Each arm's signal follows a distribution from a vector exponential family. All arms have the same parameters except the odd arm. The actual parameters of the odd and non-odd arms are unknown to the decision maker. Further, the decision maker…

    Submitted 12 June, 2022; v1 submitted 11 December, 2017; originally announced December 2017.

  42. arXiv:1706.04125

    cs.LG

    Online Learning for Structured Loss Spaces

    Authors: Siddharth Barman, Aditya Gopalan, Aadirupa Saha

    Abstract: We consider prediction with expert advice when the loss vectors are assumed to lie in a set described by the sum of atomic norm balls. We derive a regret bound for a general version of the online mirror descent (OMD) algorithm that uses a combination of regularizers, each adapted to the constituent atomic norms. The general result recovers standard OMD regret bounds, and yields regret bounds for n…

    Submitted 12 November, 2017; v1 submitted 13 June, 2017; originally announced June 2017.

    Comments: 24 pages

  43. arXiv:1704.06880

    cs.LG

    Misspecified Linear Bandits

    Authors: Avishek Ghosh, Sayak Ray Chowdhury, Aditya Gopalan

    Abstract: We consider the problem of online learning in misspecified linear stochastic multi-armed bandit problems. Regret guarantees for state-of-the-art linear bandit algorithms such as Optimism in the Face of Uncertainty Linear bandit (OFUL) hold under the assumption that the arms' expected rewards are perfectly linear in their features. It is, however, of interest to investigate the impact of potential m…

    Submitted 23 April, 2017; originally announced April 2017.

    Comments: Thirty-First AAAI Conference on Artificial Intelligence, 2017

  44. arXiv:1704.00445  [pdf, other]

    cs.LG

    On Kernelized Multi-armed Bandits

    Authors: Sayak Ray Chowdhury, Aditya Gopalan

    Abstract: We consider the stochastic bandit problem with a continuous set of arms, with the expected reward function over the arms assumed to be fixed but unknown. We provide two new Gaussian process-based algorithms for continuous bandit optimization, Improved GP-UCB (IGP-UCB) and GP-Thompson sampling (GP-TS), and derive corresponding regret bounds. Specifically, the bounds hold when the expected reward func…

    Submitted 17 May, 2017; v1 submitted 3 April, 2017; originally announced April 2017.

  45. arXiv:1611.10283  [pdf, ps, other]

    cs.LG stat.ML

    Bandit algorithms to emulate human decision making using probabilistic distortions

    Authors: Ravi Kumar Kolla, Prashanth L. A., Aditya Gopalan, Krishna Jagannathan, Michael Fu, Steve Marcus

    Abstract: Motivated by models of human decision making proposed to explain commonly observed deviations from conventional expected value preferences, we formulate two stochastic multi-armed bandit problems with distorted probabilities on the reward distributions: the classic $K$-armed bandit and the linearly parameterized bandit settings. We consider the aforementioned problems in the regret minimization as…

    Submitted 31 October, 2023; v1 submitted 30 November, 2016; originally announced November 2016.

    Comments: The material in this paper was presented in part at the 2017 AAAI Conference on Artificial Intelligence

  46. arXiv:1609.01508  [pdf, ps, other]

    cs.LG

    Low-rank Bandits with Latent Mixtures

    Authors: Aditya Gopalan, Odalric-Ambrym Maillard, Mohammadi Zaki

    Abstract: We study the task of maximizing rewards from recommending items (actions) to users sequentially interacting with a recommender system. Users are modeled as latent mixtures of C many representative user classes, where each class specifies a mean reward profile across actions. Both the user features (mixture distribution over classes) and the item features (mean reward vector per class) are unknown…

    Submitted 6 September, 2016; originally announced September 2016.

  47. arXiv:1603.09233  [pdf, ps, other]

    cs.LG

    Optimal Recommendation to Users that React: Online Learning for a Class of POMDPs

    Authors: Rahul Meshram, Aditya Gopalan, D. Manjunath

    Abstract: We describe and study a model for an Automated Online Recommendation System (AORS) in which a user's preferences can be time-dependent and can also depend on the history of past recommendations and play-outs. The three key features of the model that make it more realistic than existing models for recommendation systems are (1) user preference is inherently latent, (2) current recommendatio…

    Submitted 30 March, 2016; originally announced March 2016.

    Comments: 8 pages, submitted to conference

  48. arXiv:1602.08886  [pdf, other]

    cs.LG stat.ML

    Collaborative Learning of Stochastic Bandits over a Social Network

    Authors: Ravi Kumar Kolla, Krishna Jagannathan, Aditya Gopalan

    Abstract: We consider a collaborative online learning paradigm, wherein a group of agents connected through a social network are engaged in playing a stochastic multi-armed bandit game. Each time an agent takes an action, the corresponding reward is instantaneously observed by the agent, as well as its neighbours in the social network. We perform a regret analysis of various policies in this collaborative l…

    Submitted 11 July, 2016; v1 submitted 29 February, 2016; originally announced February 2016.

    Comments: 14 Pages, 6 Figures

  49. arXiv:1410.7528  [pdf, other]

    cs.IT

    Optimal WiFi Sensing via Dynamic Programming

    Authors: Abhinav Kumar, Rahul Vaze, Sibi Raj B Pillai, Aditya Gopalan

    Abstract: The problem of finding an optimal sensing schedule for a mobile device that encounters an intermittent WiFi access opportunity is considered. At any given time, the WiFi is in one of two modes, ON or OFF, and the mobile's incentive is to connect to the WiFi in the ON mode as soon as possible, while spending as little sensing energy as possible. We introduce a dynamic programming framework which enables th…

    Submitted 28 October, 2014; originally announced October 2014.

  50. arXiv:1406.7498  [pdf, ps, other]

    stat.ML cs.LG

    Thompson Sampling for Learning Parameterized Markov Decision Processes

    Authors: Aditya Gopalan, Shie Mannor

    Abstract: We consider reinforcement learning in parameterized Markov Decision Processes (MDPs), where the parameterization may induce correlation across transition probabilities or rewards. Consequently, observing a particular state transition might yield useful information about other, unobserved, parts of the MDP. We present a version of Thompson sampling for parameterized reinforcement learning problems,…

    Submitted 31 March, 2015; v1 submitted 29 June, 2014; originally announced June 2014.