Showing 1–50 of 108 results for author: Shakkottai, S

Searching in archive cs.
  1. arXiv:2407.06325  [pdf, other]

    cs.LG cs.DC math.OC

    CONGO: Compressive Online Gradient Optimization with Application to Microservices Management

    Authors: Jeremy Carleton, Prathik Vijaykumar, Divyanshu Saxena, Dheeraj Narasimha, Srinivas Shakkottai, Aditya Akella

    Abstract: We address the challenge of online convex optimization where the objective function's gradient exhibits sparsity, indicating that only a small number of dimensions possess non-zero gradients. Our aim is to leverage this sparsity to obtain useful estimates of the objective function's gradient even when the only information available is a limited number of function samples. Our motivation stems from…

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: 28 pages, 7 figures

  2. arXiv:2405.17401  [pdf, other]

    cs.LG cs.CV stat.ML

    RB-Modulation: Training-Free Personalization of Diffusion Models using Stochastic Optimal Control

    Authors: Litu Rout, Yujia Chen, Nataniel Ruiz, Abhishek Kumar, Constantine Caramanis, Sanjay Shakkottai, Wen-Sheng Chu

    Abstract: We propose Reference-Based Modulation (RB-Modulation), a new plug-and-play solution for training-free personalization of diffusion models. Existing training-free approaches exhibit difficulties in (a) style extraction from reference images in the absence of additional style or content text descriptions, (b) unwanted content leakage from reference style images, and (c) effective composition of styl…

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: Preprint. Under review

  3. arXiv:2404.07315  [pdf, other]

    eess.SY cs.AI cs.LG

    Structured Reinforcement Learning for Media Streaming at the Wireless Edge

    Authors: Archana Bura, Sarat Chandra Bobbili, Shreyas Rameshkumar, Desik Rengarajan, Dileep Kalathil, Srinivas Shakkottai

    Abstract: Media streaming is the dominant application over wireless edge (access) networks. The increasing softwarization of such networks has led to efforts at intelligent control, wherein application-specific actions may be dynamically taken to enhance the user experience. The goal of this work is to develop and demonstrate learning-based policies for optimal decision making to determine which clients to…

    Submitted 16 April, 2024; v1 submitted 10 April, 2024; originally announced April 2024.

    Comments: 15 pages, 14 figures

  4. arXiv:2402.11639  [pdf, other]

    cs.LG cs.AI cs.CL

    In-Context Learning with Transformers: Softmax Attention Adapts to Function Lipschitzness

    Authors: Liam Collins, Advait Parulekar, Aryan Mokhtari, Sujay Sanghavi, Sanjay Shakkottai

    Abstract: A striking property of transformers is their ability to perform in-context learning (ICL), a machine learning framework in which the learner is presented with a novel context during inference implicitly through some data, and tasked with making a prediction in that context. As such, that learner must adapt to the context without additional training. We explore the role of softmax attention in an I…

    Submitted 28 May, 2024; v1 submitted 18 February, 2024; originally announced February 2024.

  5. arXiv:2312.00852  [pdf, other]

    cs.LG cs.CV stat.ML

    Beyond First-Order Tweedie: Solving Inverse Problems using Latent Diffusion

    Authors: Litu Rout, Yujia Chen, Abhishek Kumar, Constantine Caramanis, Sanjay Shakkottai, Wen-Sheng Chu

    Abstract: Sampling from the posterior distribution poses a major computational challenge in solving inverse problems using latent diffusion models. Common methods rely on Tweedie's first-order moments, which are known to induce a quality-limiting bias. Existing second-order approximations are impractical due to prohibitive computational costs, making standard reverse diffusion processes intractable for post…

    Submitted 1 December, 2023; originally announced December 2023.

    Comments: Preprint

  6. arXiv:2311.00226  [pdf, other]

    eess.SP cs.LG

    Transformers are Provably Optimal In-context Estimators for Wireless Communications

    Authors: Vishnu Teja Kunde, Vicram Rajagopalan, Chandra Shekhara Kaushik Valmeekam, Krishna Narayanan, Srinivas Shakkottai, Dileep Kalathil, Jean-Francois Chamberland

    Abstract: Pre-trained transformers exhibit the capability of adapting to new tasks through in-context learning (ICL), where they efficiently utilize a limited set of prompts without explicit model optimization. The canonical communication problem of estimating transmitted symbols from received observations can be modelled as an in-context learning problem: Received observations are essentially a noisy fun…

    Submitted 14 June, 2024; v1 submitted 31 October, 2023; originally announced November 2023.

    Comments: 13 pages, 2 figures, 2 tables, preprint; abstract, references, theory updated

  7. arXiv:2307.06887  [pdf, other]

    cs.LG

    Provable Multi-Task Representation Learning by Two-Layer ReLU Neural Networks

    Authors: Liam Collins, Hamed Hassani, Mahdi Soltanolkotabi, Aryan Mokhtari, Sanjay Shakkottai

    Abstract: An increasingly popular machine learning paradigm is to pretrain a neural network (NN) on many tasks offline, then adapt it to downstream tasks, often by re-training only the last linear layer of the network. This approach yields strong downstream performance in a variety of contexts, demonstrating that multitask pretraining leads to effective feature learning. Although several recent theoretical…

    Submitted 6 June, 2024; v1 submitted 13 July, 2023; originally announced July 2023.

  8. arXiv:2307.00619  [pdf, other]

    cs.LG cs.AI stat.ML

    Solving Linear Inverse Problems Provably via Posterior Sampling with Latent Diffusion Models

    Authors: Litu Rout, Negin Raoof, Giannis Daras, Constantine Caramanis, Alexandros G. Dimakis, Sanjay Shakkottai

    Abstract: We present the first framework to solve linear inverse problems leveraging pre-trained latent diffusion models. Previously proposed algorithms (such as DPS and DDRM) only apply to pixel-space diffusion models. We theoretically analyze our algorithm showing provable sample recovery in a linear model setting. The algorithmic insight obtained from our analysis extends to more general settings often c…

    Submitted 2 July, 2023; originally announced July 2023.

    Comments: Preprint

  9. arXiv:2306.04050  [pdf, ps, other]

    cs.IT cs.CL cs.LG

    LLMZip: Lossless Text Compression using Large Language Models

    Authors: Chandra Shekhara Kaushik Valmeekam, Krishna Narayanan, Dileep Kalathil, Jean-Francois Chamberland, Srinivas Shakkottai

    Abstract: We provide new estimates of an asymptotic upper bound on the entropy of English using the large language model LLaMA-7B as a predictor for the next token given a window of past tokens. This estimate is significantly smaller than currently available estimates in \cite{cover1978convergent}, \cite{lutati2023focus}. A natural byproduct is an algorithm for lossless compression of English text which com…

    Submitted 26 June, 2023; v1 submitted 6 June, 2023; originally announced June 2023.

    Comments: 7 pages, 4 figures, 4 tables, preprint, added results on using LLMs with arithmetic coding

  10. arXiv:2305.18784  [pdf, ps, other]

    cs.LG cs.DC cs.MA cs.SI stat.ML

    Collaborative Multi-Agent Heterogeneous Multi-Armed Bandits

    Authors: Ronshee Chawla, Daniel Vial, Sanjay Shakkottai, R. Srikant

    Abstract: The study of collaborative multi-agent bandits has attracted significant attention recently. In light of this, we initiate the study of a new collaborative setting, consisting of $N$ agents such that each agent is learning one of $M$ stochastic multi-armed bandits to minimize their group cumulative regret. We develop decentralized algorithms which facilitate collaboration between the agents under…

    Submitted 2 July, 2024; v1 submitted 30 May, 2023; originally announced May 2023.

    Comments: To appear in the proceedings of ICML 2023

  11. arXiv:2305.03097  [pdf, other]

    cs.LG cs.AI

    Federated Ensemble-Directed Offline Reinforcement Learning

    Authors: Desik Rengarajan, Nitin Ragothaman, Dileep Kalathil, Srinivas Shakkottai

    Abstract: We consider the problem of federated offline reinforcement learning (RL), a scenario under which distributed learning agents must collaboratively learn a high-quality control policy only using small pre-collected datasets generated according to different unknown behavior policies. Naively combining a standard offline RL approach with a standard federated learning approach to solve this problem can…

    Submitted 4 May, 2023; originally announced May 2023.

  12. arXiv:2304.11199  [pdf, other]

    cs.NI eess.SY

    EdgeRIC: Empowering Realtime Intelligent Optimization and Control in NextG Networks

    Authors: Woo-Hyun Ko, Ushasi Ghosh, Ujwal Dinesha, Raini Wu, Srinivas Shakkottai, Dinesh Bharadia

    Abstract: Radio Access Networks (RAN) are increasingly softwarized and accessible via data-collection and control interfaces. RAN intelligent control (RIC) is an approach to manage these interfaces at different timescales. In this paper, we develop a RIC platform called RICworld, consisting of (i) EdgeRIC, which is colocated, but decoupled from the RAN stack, and can access RAN and application-level informa…

    Submitted 2 May, 2023; v1 submitted 21 April, 2023; originally announced April 2023.

    Comments: 16 pages, 15 figures

  13. arXiv:2302.07920  [pdf, other]

    cs.LG

    InfoNCE Loss Provably Learns Cluster-Preserving Representations

    Authors: Advait Parulekar, Liam Collins, Karthikeyan Shanmugam, Aryan Mokhtari, Sanjay Shakkottai

    Abstract: The goal of contrastive learning is to learn a representation that preserves underlying clusters by keeping samples with similar content, e.g. the "dogness" of a dog, close to each other in the space generated by the representation. A common and successful approach for tackling this unsupervised learning problem is minimizing the InfoNCE loss associated with the training samples, where each samp…

    Submitted 15 February, 2023; originally announced February 2023.

  14. arXiv:2302.06570  [pdf, ps, other]

    stat.ML cs.LG math.OC

    Beyond Uniform Smoothness: A Stopped Analysis of Adaptive SGD

    Authors: Matthew Faw, Litu Rout, Constantine Caramanis, Sanjay Shakkottai

    Abstract: This work considers the problem of finding a first-order stationary point of a non-convex function with potentially unbounded smoothness constant using a stochastic gradient oracle. We focus on the class of $(L_0,L_1)$-smooth functions proposed by Zhang et al. (ICLR'20). Empirical evidence suggests that these functions more closely capture practical machine learning problems as compared to the pe…

    Submitted 13 February, 2023; originally announced February 2023.

  15. arXiv:2302.01217  [pdf, other]

    stat.ML cs.AI cs.LG math.ST

    A Theoretical Justification for Image Inpainting using Denoising Diffusion Probabilistic Models

    Authors: Litu Rout, Advait Parulekar, Constantine Caramanis, Sanjay Shakkottai

    Abstract: We provide a theoretical justification for sample recovery using diffusion based image inpainting in a linear model setting. While most inpainting algorithms require retraining with each new mask, we prove that diffusion based inpainting generalizes well to unseen masks without retraining. We analyze a recently proposed popular diffusion based inpainting algorithm called RePaint (Lugmayr et al., 2…

    Submitted 2 February, 2023; originally announced February 2023.

    Comments: 30 pages, 5 figures, 1 Table

  16. arXiv:2211.04584  [pdf, other]

    cs.AI eess.SY

    Energy System Digitization in the Era of AI: A Three-Layered Approach towards Carbon Neutrality

    Authors: Le Xie, Tong Huang, Xiangtian Zheng, Yan Liu, Mengdi Wang, Vijay Vittal, P. R. Kumar, Srinivas Shakkottai, Yi Cui

    Abstract: The transition towards carbon-neutral electricity is one of the biggest game changers in addressing climate change since it addresses the dual challenges of removing carbon emissions from the two largest sectors of emitters: electricity and transportation. The transition to a carbon-neutral electric grid poses significant challenges to conventional paradigms of modern grid planning and operation.…

    Submitted 2 November, 2022; originally announced November 2022.

    Comments: To be published in Patterns (Cell Press)

  17. arXiv:2209.13048  [pdf, other]

    cs.LG cs.RO

    Enhanced Meta Reinforcement Learning using Demonstrations in Sparse Reward Environments

    Authors: Desik Rengarajan, Sapana Chaudhary, Jaewon Kim, Dileep Kalathil, Srinivas Shakkottai

    Abstract: Meta reinforcement learning (Meta-RL) is an approach wherein the experience gained from solving a variety of tasks is distilled into a meta-policy. The meta-policy, when adapted over only a small (or just a single) number of steps, is able to perform near-optimally on a new, related task. However, a major challenge to adopting this approach to solve real-world problems is that they are often assoc…

    Submitted 26 September, 2022; originally announced September 2022.

    Comments: Accepted to NeurIPS 2022; first two authors contributed equally

  18. arXiv:2209.11328  [pdf, other]

    cs.RO eess.SY

    Learning Certifiably Robust Controllers Using Fragile Perception

    Authors: Dawei Sun, Negin Musavi, Geir Dullerud, Sanjay Shakkottai, Sayan Mitra

    Abstract: Advances in computer vision and machine learning enable robots to perceive their surroundings in powerful new ways, but these perception modules have well-known fragilities. We consider the problem of synthesizing a safe controller that is robust despite perception errors. The proposed method constructs a state estimator based on Gaussian processes with input-dependent noises. This estimator compu…

    Submitted 22 September, 2022; originally announced September 2022.

  19. arXiv:2205.15196  [pdf, other]

    cs.LG stat.ML

    PAC Generalization via Invariant Representations

    Authors: Advait Parulekar, Karthikeyan Shanmugam, Sanjay Shakkottai

    Abstract: One method for obtaining generalizable solutions to machine learning tasks when presented with diverse training environments is to find invariant representations of the data. These are representations of the covariates such that the best model on top of the representation is invariant across training environments. In the context of linear Structural Equation Models (SEMs), invariant repre…

    Submitted 14 August, 2022; v1 submitted 30 May, 2022; originally announced May 2022.

  20. arXiv:2205.14790  [pdf, ps, other]

    cs.LG cs.DS

    Non-Stationary Bandits under Recharging Payoffs: Improved Planning with Sublinear Regret

    Authors: Orestis Papadigenopoulos, Constantine Caramanis, Sanjay Shakkottai

    Abstract: The stochastic multi-armed bandit setting has been recently studied in the non-stationary regime, where the mean payoff of each action is a non-decreasing function of the number of rounds passed since it was last played. This model captures natural behavioral aspects of the users which crucially determine the performance of recommendation platforms, ad placement systems, and more. Even assuming pr…

    Submitted 12 October, 2022; v1 submitted 29 May, 2022; originally announced May 2022.

    Comments: Accepted for publication to NeurIPS 2022

  21. arXiv:2205.13692  [pdf, other]

    cs.LG

    FedAvg with Fine Tuning: Local Updates Lead to Representation Learning

    Authors: Liam Collins, Hamed Hassani, Aryan Mokhtari, Sanjay Shakkottai

    Abstract: The Federated Averaging (FedAvg) algorithm, which consists of alternating between a few local stochastic gradient updates at client nodes, followed by a model averaging update at the server, is perhaps the most commonly used method in Federated Learning. Notwithstanding its simplicity, several empirical studies have illustrated that the output model of FedAvg, after a few fine-tuning steps, leads…

    Submitted 26 May, 2022; originally announced May 2022.

  22. arXiv:2203.12577  [pdf, other]

    cs.LG stat.ML

    Minimax Regret for Cascading Bandits

    Authors: Daniel Vial, Sujay Sanghavi, Sanjay Shakkottai, R. Srikant

    Abstract: Cascading bandits is a natural and popular model that frames the task of learning to rank from Bernoulli click feedback in a bandit setting. For the case of unstructured rewards, we prove matching upper and lower bounds for the problem-independent (i.e., gap-free) regret, both of which strictly improve the best known. A key observation is that the hard instances of this problem are those with smal…

    Submitted 10 October, 2022; v1 submitted 23 March, 2022; originally announced March 2022.

    Journal ref: Conference on Neural Information Processing Systems (NeurIPS) 2022

  23. arXiv:2203.04410  [pdf, other]

    cs.AI eess.SY

    OpenGridGym: An Open-Source AI-Friendly Toolkit for Distribution Market Simulation

    Authors: Rayan El Helou, Kiyeob Lee, Dongqi Wu, Le Xie, Srinivas Shakkottai, Vijay Subramanian

    Abstract: This paper presents OpenGridGym, an open-source Python-based package that allows for seamless integration of distribution market simulation with state-of-the-art artificial intelligence (AI) decision-making algorithms. We present the architecture and design choice for the proposed framework, elaborate on how users interact with OpenGridGym, and highlight its value by providing multiple cases to de…

    Submitted 6 March, 2022; originally announced March 2022.

  24. arXiv:2203.00076  [pdf, other]

    cs.LG cs.MA stat.ML

    Robust Multi-Agent Bandits Over Undirected Graphs

    Authors: Daniel Vial, Sanjay Shakkottai, R. Srikant

    Abstract: We consider a multi-agent multi-armed bandit setting in which $n$ honest agents collaborate over a network to minimize regret but $m$ malicious agents can disrupt learning arbitrarily. Assuming the network is the complete graph, existing algorithms incur $O( (m + K/n) \log (T) / \Delta)$ regret in this setting, where $K$ is the number of arms and $\Delta$ is the arm gap. For $m \ll K$, this improves over th…

    Submitted 26 January, 2023; v1 submitted 28 February, 2022; originally announced March 2022.

    Journal ref: Proceedings of the ACM on Measurement and Analysis of Computing Systems, December 2022

  25. arXiv:2202.05791  [pdf, other]

    stat.ML cs.LG math.OC

    The Power of Adaptivity in SGD: Self-Tuning Step Sizes with Unbounded Gradients and Affine Variance

    Authors: Matthew Faw, Isidoros Tziotis, Constantine Caramanis, Aryan Mokhtari, Sanjay Shakkottai, Rachel Ward

    Abstract: We study convergence rates of AdaGrad-Norm as an exemplar of adaptive stochastic gradient methods (SGD), where the step sizes change based on observed stochastic gradients, for minimizing non-convex, smooth objectives. Despite their popularity, the analysis of adaptive SGD lags behind that of non adaptive methods in this setting. Specifically, all prior works rely on some subset of the following a…

    Submitted 25 July, 2022; v1 submitted 11 February, 2022; originally announced February 2022.

    Comments: Accepted to COLT 2022

  26. arXiv:2202.04628  [pdf, other]

    cs.LG cs.AI

    Reinforcement Learning with Sparse Rewards using Guidance from Offline Demonstration

    Authors: Desik Rengarajan, Gargi Vaidya, Akshay Sarvesh, Dileep Kalathil, Srinivas Shakkottai

    Abstract: A major challenge in real-world reinforcement learning (RL) is the sparsity of reward feedback. Often, what is available is an intuitive but sparse reward function that only indicates whether the task is completed partially or fully. However, the lack of carefully designed, fine grain feedback implies that most existing RL algorithms fail to learn an acceptable policy in a reasonable time frame. T…

    Submitted 13 February, 2022; v1 submitted 9 February, 2022; originally announced February 2022.

  27. arXiv:2202.03483  [pdf, other]

    cs.LG

    MAML and ANIL Provably Learn Representations

    Authors: Liam Collins, Aryan Mokhtari, Sewoong Oh, Sanjay Shakkottai

    Abstract: Recent empirical evidence has driven conventional wisdom to believe that gradient-based meta-learning (GBML) methods perform well at few-shot learning because they learn an expressive data representation that is shared across tasks. However, the mechanics of GBML have remained largely mysterious from a theoretical perspective. In this paper, we prove that two well-known GBML methods, MAML and ANIL…

    Submitted 4 June, 2023; v1 submitted 7 February, 2022; originally announced February 2022.

  28. arXiv:2112.00885  [pdf, other]

    cs.LG cs.AI

    DOPE: Doubly Optimistic and Pessimistic Exploration for Safe Reinforcement Learning

    Authors: Archana Bura, Aria HasanzadeZonuzy, Dileep Kalathil, Srinivas Shakkottai, Jean-Francois Chamberland

    Abstract: Safe reinforcement learning is extremely challenging--not only must the agent explore an unknown environment, it must do so while ensuring no safety constraint violations. We formulate this safe reinforcement learning (RL) problem using the framework of a finite-horizon Constrained Markov Decision Process (CMDP) with an unknown transition probability function, where we model the safety requirement…

    Submitted 17 October, 2022; v1 submitted 1 December, 2021; originally announced December 2021.

    Comments: Accepted to NeurIPS 2022

  29. arXiv:2110.02128  [pdf, other]

    cs.LG stat.ML

    NeurWIN: Neural Whittle Index Network For Restless Bandits Via Deep RL

    Authors: Khaled Nakhleh, Santosh Ganji, Ping-Chun Hsieh, I-Hong Hou, Srinivas Shakkottai

    Abstract: Whittle index policy is a powerful tool to obtain asymptotically optimal solutions for the notoriously intractable problem of restless bandits. However, finding the Whittle indices remains a difficult problem for many practical restless bandits with convoluted transition kernels. This paper proposes NeurWIN, a neural Whittle index network that seeks to learn the Whittle indices for any restless ba…

    Submitted 19 January, 2022; v1 submitted 5 October, 2021; originally announced October 2021.

    Comments: Accepted for publication in NeurIPS 2021

  30. arXiv:2109.05546  [pdf, ps, other]

    cs.LG stat.ML

    Improved Algorithms for Misspecified Linear Markov Decision Processes

    Authors: Daniel Vial, Advait Parulekar, Sanjay Shakkottai, R. Srikant

    Abstract: For the misspecified linear Markov decision process (MLMDP) model of Jin et al. [2020], we propose an algorithm with three desirable properties. (P1) Its regret after $K$ episodes scales as $K \max \{ \varepsilon_{\text{mis}}, \varepsilon_{\text{tol}} \}$, where $\varepsilon_{\text{mis}}$ is the degree of misspecification and $\varepsilon_{\text{tol}}$ is a user-specified error tolerance. (P2) Its…

    Submitted 19 October, 2021; v1 submitted 12 September, 2021; originally announced September 2021.

    Comments: This version adds an intuitive explanation in Section 3

    Journal ref: International Conference on Artificial Intelligence and Statistics (AISTATS) 2022

  31. arXiv:2107.03263  [pdf, other]

    cs.LG

    Episodic Bandits with Stochastic Experts

    Authors: Nihal Sharma, Soumya Basu, Karthikeyan Shanmugam, Sanjay Shakkottai

    Abstract: We study a version of the contextual bandit problem where an agent can intervene through a set of stochastic expert policies. The agent interacts with the environment over episodes, with each episode having different context distributions; this results in the 'best expert' changing across episodes. Our goal is to develop an agent that tracks the best expert over episodes. We introduce the Empirica…

    Submitted 26 October, 2021; v1 submitted 7 July, 2021; originally announced July 2021.

  32. arXiv:2106.12729  [pdf, ps, other]

    cs.LG math.OC stat.ML

    Finite-Sample Analysis of Off-Policy TD-Learning via Generalized Bellman Operators

    Authors: Zaiwei Chen, Siva Theja Maguluri, Sanjay Shakkottai, Karthikeyan Shanmugam

    Abstract: In temporal difference (TD) learning, off-policy sampling is known to be more practical than on-policy sampling, and by decoupling learning from data collection, it enables data reuse. It is known that policy evaluation (including multi-step off-policy importance sampling) has the interpretation of solving a generalized Bellman equation. In this paper, we derive finite-sample bounds for any genera…

    Submitted 23 June, 2021; originally announced June 2021.

  33. arXiv:2106.11174  [pdf, other]

    cs.LG cs.CV

    Does Optimal Source Task Performance Imply Optimal Pre-training for a Target Task?

    Authors: Steven Gutstein, Brent Lance, Sanjay Shakkottai

    Abstract: Fine-tuning of pre-trained deep nets is commonly used to improve accuracies and training times for neural nets. It is generally assumed that pre-training a net for optimal source task performance best prepares it for fine-tuning to learn an arbitrary target task. This is generally not true. Stopping source task training, prior to optimal performance, can create a pre-trained net better suited for…

    Submitted 12 April, 2022; v1 submitted 21 June, 2021; originally announced June 2021.

  34. Job Dispatching Policies for Queueing Systems with Unknown Service Rates

    Authors: Tuhinangshu Choudhury, Gauri Joshi, Weina Wang, Sanjay Shakkottai

    Abstract: In multi-server queueing systems where there is no central queue holding all incoming jobs, job dispatching policies are used to assign incoming jobs to the queue at one of the servers. Classic job dispatching policies such as join-the-shortest-queue and shortest expected delay assume that the service rates and queue lengths of the servers are known to the dispatcher. In this work, we tackle the p…

    Submitted 10 June, 2021; v1 submitted 8 June, 2021; originally announced June 2021.

  35. arXiv:2105.10625  [pdf, other]

    cs.LG cs.DS

    Combinatorial Blocking Bandits with Stochastic Delays

    Authors: Alexia Atsidakou, Orestis Papadigenopoulos, Soumya Basu, Constantine Caramanis, Sanjay Shakkottai

    Abstract: Recent work has considered natural variations of the multi-armed bandit problem, where the reward distribution of each arm is a special function of the time passed since its last pulling. In this direction, a simple (yet widely applicable) model is that of blocking bandits, where an arm becomes unavailable for a deterministic number of rounds after each play. In this work, we extend the above mode…

    Submitted 21 May, 2021; originally announced May 2021.

    Comments: International Conference on Machine Learning, ICML'21

  36. arXiv:2105.01593  [pdf, ps, other]

    cs.LG stat.ML

    Regret Bounds for Stochastic Shortest Path Problems with Linear Function Approximation

    Authors: Daniel Vial, Advait Parulekar, Sanjay Shakkottai, R. Srikant

    Abstract: We propose an algorithm that uses linear function approximation (LFA) for stochastic shortest path (SSP). Under minimal assumptions, it obtains sublinear regret, is computationally efficient, and uses stationary policies. To our knowledge, this is the first such algorithm in the LFA literature (for SSP or other formulations). Our algorithm is a special case of a more general one, which achieves re…

    Submitted 19 October, 2021; v1 submitted 4 May, 2021; originally announced May 2021.

    Comments: This version removes most assumptions of the prior one

    Journal ref: International Conference on Machine Learning (ICML) 2022

  37. arXiv:2103.02729  [pdf, other]

    cs.LG

    Linear Bandit Algorithms with Sublinear Time Complexity

    Authors: Shuo Yang, Tongzheng Ren, Sanjay Shakkottai, Eric Price, Inderjit S. Dhillon, Sujay Sanghavi

    Abstract: We propose two linear bandits algorithms with per-step complexity sublinear in the number of arms $K$. The algorithms are designed for applications where the arm set is extremely large and slowly changing. Our key realization is that choosing an arm reduces to a maximum inner product search (MIPS) problem, which can be solved approximately without breaking regret guarantees. Existing approximate M…

    Submitted 9 June, 2022; v1 submitted 3 March, 2021; originally announced March 2021.

    Comments: Accepted at ICML 2022

  38. arXiv:2102.07078  [pdf, other]

    cs.LG math.OC

    Exploiting Shared Representations for Personalized Federated Learning

    Authors: Liam Collins, Hamed Hassani, Aryan Mokhtari, Sanjay Shakkottai

    Abstract: Deep neural networks have shown the ability to extract universal feature representations from data such as images and text that have been useful for a variety of learning tasks. However, the fruits of representation learning have yet to be fully-realized in federated settings. Although data in federated settings is often non-i.i.d. across clients, the success of centralized deep learning suggests…

    Submitted 24 March, 2023; v1 submitted 14 February, 2021; originally announced February 2021.

  39. arXiv:2102.01567  [pdf, ps, other]

    cs.LG math.OC stat.ML

    A Lyapunov Theory for Finite-Sample Guarantees of Asynchronous Q-Learning and TD-Learning Variants

    Authors: Zaiwei Chen, Siva Theja Maguluri, Sanjay Shakkottai, Karthikeyan Shanmugam

    Abstract: This paper develops a unified framework to study finite-sample convergence guarantees of a large class of value-based asynchronous reinforcement learning (RL) algorithms. We do this by first reformulating the RL algorithms as Markovian Stochastic Approximation (SA) algorithms to solve fixed-point equations. We then develop a Lyapunov analysis and derive mean-square error bounds on the co…

    Submitted 4 September, 2023; v1 submitted 2 February, 2021; originally announced February 2021.

  40. arXiv:2012.02876  [pdf, other]

    cs.LG stat.ML

    One-bit feedback is sufficient for upper confidence bound policies

    Authors: Daniel Vial, Sanjay Shakkottai, R. Srikant

    Abstract: We consider a variant of the traditional multi-armed bandit problem in which each arm is only able to provide one-bit feedback during each pull based on its past history of rewards. Our main result is the following: given an upper confidence bound policy which uses full-reward feedback, there exists a coding scheme for generating one-bit feedback, and a corresponding decoding scheme and arm select…

    Submitted 4 December, 2020; originally announced December 2020.

  41. arXiv:2011.01016  [pdf, other]

    cs.LG

    Stochastic Linear Bandits with Protected Subspace

    Authors: Advait Parulekar, Soumya Basu, Aditya Gopalan, Karthikeyan Shanmugam, Sanjay Shakkottai

    Abstract: We study a variant of the stochastic linear bandit problem wherein we optimize a linear objective function but rewards are accrued only orthogonal to an unknown subspace (which we interpret as a protected space) given only zero-order stochastic oracle access to both the objective itself and protected subspace. In particular, at each round, the learner must choose whether to query the obje…

    Submitted 1 March, 2021; v1 submitted 2 November, 2020; originally announced November 2020.

  42. arXiv:2010.14672  [pdf, other]

    cs.LG math.OC stat.ML

    How Does the Task Landscape Affect MAML Performance?

    Authors: Liam Collins, Aryan Mokhtari, Sanjay Shakkottai

    Abstract: Model-Agnostic Meta-Learning (MAML) has become increasingly popular for training models that can quickly adapt to new tasks via one or few stochastic gradient descent steps. However, the MAML objective is significantly more difficult to optimize compared to standard non-adaptive learning (NAL), and little is understood about how much MAML improves over NAL in terms of the fast adaptability of thei…

    Submitted 9 August, 2022; v1 submitted 27 October, 2020; originally announced October 2020.

  43. arXiv:2009.06606  [pdf, ps, other]

    cs.LG stat.ML

    Adaptive KL-UCB based Bandit Algorithms for Markovian and i.i.d. Settings

    Authors: Arghyadip Roy, Sanjay Shakkottai, R. Srikant

    Abstract: In the regret-based formulation of Multi-armed Bandit (MAB) problems, except in rare instances, much of the literature focuses on arms with i.i.d. rewards. In this paper, we consider the problem of obtaining regret guarantees for MAB problems in which the rewards of each arm form a Markov chain which may not belong to a single parameter exponential family. To achieve a logarithmic regret in such p…

    Submitted 8 October, 2022; v1 submitted 14 September, 2020; originally announced September 2020.

  44. arXiv:2008.00311  [pdf, other]

    cs.LG stat.ML

    Learning with Safety Constraints: Sample Complexity of Reinforcement Learning for Constrained MDPs

    Authors: Aria HasanzadeZonuzy, Archana Bura, Dileep Kalathil, Srinivas Shakkottai

    Abstract: Many physical systems have underlying safety considerations that require that the policy employed ensures the satisfaction of a set of constraints. The analytical formulation usually takes the form of a Constrained Markov Decision Process (CMDP). We focus on the case where the CMDP is unknown, and RL algorithms obtain samples to discover the model and compute an optimal constrained policy. Our goa…

    Submitted 1 March, 2021; v1 submitted 1 August, 2020; originally announced August 2020.

  45. arXiv:2007.03812  [pdf, other]

    cs.LG cs.DC cs.SI stat.ML

    Robust Multi-Agent Multi-Armed Bandits

    Authors: Daniel Vial, Sanjay Shakkottai, R. Srikant

    Abstract: Recent works have shown that agents facing independent instances of a stochastic $K$-armed bandit can collaborate to decrease regret. However, these works assume that each agent always recommends their individual best-arm estimates to other agents, which is unrealistic in envisioned applications (machine faults in distributed computing or spam in social recommendation systems). Hence, we generaliz…

    Submitted 10 October, 2021; v1 submitted 7 July, 2020; originally announced July 2020.

    Journal ref: ACM International Symposium on Mobile Ad Hoc Networking and Computing (MobiHoc) 2021

  46. arXiv:2007.01442  [pdf, other]

    cs.LG cs.DC cs.SI stat.ML

    Multi-Agent Low-Dimensional Linear Bandits

    Authors: Ronshee Chawla, Abishek Sankararaman, Sanjay Shakkottai

    Abstract: We study a multi-agent stochastic linear bandit with side information, parameterized by an unknown vector $θ^* \in \mathbb{R}^d$. The side information consists of a finite collection of low-dimensional subspaces, one of which contains $θ^*$. In our setting, agents can collaborate to reduce regret by sending recommendations across a communication graph connecting them. We present a novel decentrali…

    Submitted 25 May, 2022; v1 submitted 2 July, 2020; originally announced July 2020.

    Comments: To appear in IEEE Transactions on Automatic Control

  47. arXiv:2006.11683  [pdf, other]

    math.OC cs.GT cs.LG cs.MA

    Reinforcement Learning for Mean Field Games with Strategic Complementarities

    Authors: Kiyeob Lee, Desik Rengarajan, Dileep Kalathil, Srinivas Shakkottai

    Abstract: Mean Field Games (MFG) are the class of games with a very large number of agents and the standard equilibrium concept is a Mean Field Equilibrium (MFE). Algorithms for learning MFE in dynamic MFGs are unknown in general. Our focus is on an important subclass that possess a monotonicity property called Strategic Complementarities (MFG-SC). We introduce a natural refinement to the equilibrium concep…

    Submitted 1 February, 2021; v1 submitted 20 June, 2020; originally announced June 2020.

  48. arXiv:2004.00472  [pdf, other]

    cs.NI eess.SY

    Learning to Cache and Caching to Learn: Regret Analysis of Caching Algorithms

    Authors: Archana Bura, Desik Rengarajan, Dileep Kalathil, Srinivas Shakkottai, Jean-Francois Chamberland-Tremblay

    Abstract: Crucial performance metrics of a caching algorithm include its ability to quickly and accurately learn a popularity distribution of requests. However, a majority of work on analytical performance analysis focuses on hit probability after an asymptotically large time has elapsed. We consider an online learning viewpoint, and characterize the "regret" in terms of the finite time difference between t…

    Submitted 1 April, 2020; originally announced April 2020.

  49. arXiv:2003.03426  [pdf, other]

    cs.LG stat.ML

    Contextual Blocking Bandits

    Authors: Soumya Basu, Orestis Papadigenopoulos, Constantine Caramanis, Sanjay Shakkottai

    Abstract: We study a novel variant of the multi-armed bandit problem, where at each time step, the player observes an independently sampled context that determines the arms' mean rewards. However, playing an arm blocks it (across all contexts) for a fixed and known number of future time steps. The above contextual setting, which captures important scenarios such as recommendation systems or ad placement wit…

    Submitted 17 June, 2020; v1 submitted 6 March, 2020; originally announced March 2020.

  50. arXiv:2002.08405  [pdf, other]

    cs.LG stat.ML

    On Under-exploration in Bandits with Mean Bounds from Confounded Data

    Authors: Nihal Sharma, Soumya Basu, Karthikeyan Shanmugam, Sanjay Shakkottai

    Abstract: We study a variant of the multi-armed bandit problem where side information in the form of bounds on the mean of each arm is provided. We develop the novel non-optimistic Global Under-Explore (GLUE) algorithm which uses the provided mean bounds (across all the arms) to infer pseudo-variances for each arm, which in turn decide the rate of exploration for the arms. We analyze the regret of GLUE and…

    Submitted 10 June, 2021; v1 submitted 19 February, 2020; originally announced February 2020.