
Showing 1–14 of 14 results for author: Marinov, T V

Searching in archive cs.
  1. arXiv:2405.06480

    cs.LG cs.GT

    Incentive-compatible Bandits: Importance Weighting No More

    Authors: Julian Zimmert, Teodor V. Marinov

    Abstract: We study the problem of incentive-compatible online learning with bandit feedback. In this class of problems, the experts are self-interested agents who might misrepresent their preferences with the goal of being selected most often. The goal is to devise algorithms which are simultaneously incentive-compatible, that is the experts are incentivised to report their true preferences, and have no reg…

    Submitted 10 May, 2024; originally announced May 2024.

  2. arXiv:2403.19462

    cs.LG cs.PL

    Offline Imitation Learning from Multiple Baselines with Applications to Compiler Optimization

    Authors: Teodor V. Marinov, Alekh Agarwal, Mircea Trofin

    Abstract: This work studies a Reinforcement Learning (RL) problem in which we are given a set of trajectories collected with K baseline policies. Each of these policies can be quite suboptimal in isolation, and have strong performance in complementary parts of the state space. The goal is to learn a policy which performs as well as the best combination of baselines on the entire state space. We propose a si…

    Submitted 28 March, 2024; originally announced March 2024.

  3. arXiv:2305.17040

    cs.LG cs.CL

    A Mechanism for Sample-Efficient In-Context Learning for Sparse Retrieval Tasks

    Authors: Jacob Abernethy, Alekh Agarwal, Teodor V. Marinov, Manfred K. Warmuth

    Abstract: We study the phenomenon of \textit{in-context learning} (ICL) exhibited by large language models, where they can adapt to a new learning task, given a handful of labeled examples, without any explicit parameter optimization. Our goal is to explain how a pre-trained transformer model is able to perform ICL under reasonable assumptions on the pre-training process and the downstream tasks. We posit a…

    Submitted 26 May, 2023; originally announced May 2023.

  4. arXiv:2302.03784

    cs.LG stat.ML

    Leveraging User-Triggered Supervision in Contextual Bandits

    Authors: Alekh Agarwal, Claudio Gentile, Teodor V. Marinov

    Abstract: We study contextual bandit (CB) problems, where the user can sometimes respond with the best action in a given context. Such an interaction arises, for example, in text prediction or autocompletion settings, where a poor suggestion is simply ignored and the user enters the desired text instead. Crucially, this extra feedback is user-triggered on only a subset of the contexts. We develop a new fram…

    Submitted 7 February, 2023; originally announced February 2023.

  5. arXiv:2206.10022

    cs.LG

    Stochastic Online Learning with Feedback Graphs: Finite-Time and Asymptotic Optimality

    Authors: Teodor V. Marinov, Mehryar Mohri, Julian Zimmert

    Abstract: We revisit the problem of stochastic online learning with feedback graphs, with the goal of devising algorithms that are optimal, up to constants, both asymptotically and in finite time. We show that, surprisingly, the notion of optimal finite-time regret is not a uniquely defined property in this context and that, in general, it is decoupled from the asymptotic rate. We discuss alternative choice…

    Submitted 20 June, 2022; originally announced June 2022.

  6. arXiv:2206.01836

    cs.LG math.OC

    Dimension Independent Generalization of DP-SGD for Overparameterized Smooth Convex Optimization

    Authors: Yi-An Ma, Teodor Vanislavov Marinov, Tong Zhang

    Abstract: This paper considers the generalization performance of differentially private convex learning. We demonstrate that the convergence analysis of Langevin algorithms can be used to obtain new generalization bounds with differential privacy guarantees for DP-SGD. More specifically, by using some recently obtained dimension-independent convergence results for stochastic Langevin algorithms with convex…

    Submitted 3 June, 2022; originally announced June 2022.

  7. arXiv:2110.13282

    cs.LG

    The Pareto Frontier of model selection for general Contextual Bandits

    Authors: Teodor V. Marinov, Julian Zimmert

    Abstract: Recent progress in model selection raises the question of the fundamental limits of these techniques. Under specific scrutiny has been model selection for general contextual bandits with nested policy classes, resulting in a COLT2020 open problem. It asks whether it is possible to obtain simultaneously the optimal single algorithm guarantees over all policies in a nested sequence of policy classes…

    Submitted 25 October, 2021; originally announced October 2021.

  8. arXiv:2107.01264

    cs.LG

    Beyond Value-Function Gaps: Improved Instance-Dependent Regret Bounds for Episodic Reinforcement Learning

    Authors: Christoph Dann, Teodor V. Marinov, Mehryar Mohri, Julian Zimmert

    Abstract: We provide improved gap-dependent regret bounds for reinforcement learning in finite episodic Markov decision processes. Compared to prior work, our bounds depend on alternative definitions of gaps. These definitions are based on the insight that, in order to achieve a favorable regret, an algorithm does not need to learn how to behave optimally in states that are not reached by an optimal policy.…

    Submitted 26 October, 2021; v1 submitted 2 July, 2021; originally announced July 2021.

  9. arXiv:2006.09255

    cs.LG stat.ML

    Corralling Stochastic Bandit Algorithms

    Authors: Raman Arora, Teodor V. Marinov, Mehryar Mohri

    Abstract: We study the problem of corralling stochastic bandit algorithms, that is combining multiple bandit algorithms designed for a stochastic environment, with the goal of devising a corralling algorithm that performs almost as well as the best base algorithm. We give two general algorithms for this setting, which we show benefit from favorable regret guarantees. We show that the regret of the corrallin…

    Submitted 28 February, 2021; v1 submitted 16 June, 2020; originally announced June 2020.

  10. arXiv:2002.09609

    cs.LG stat.ML

    Private Stochastic Convex Optimization: Efficient Algorithms for Non-smooth Objectives

    Authors: Raman Arora, Teodor V. Marinov, Enayat Ullah

    Abstract: In this paper, we revisit the problem of private stochastic convex optimization. We propose an algorithm based on noisy mirror descent, which achieves optimal rates both in terms of statistical complexity and number of queries to a first-order stochastic oracle in the regime when the privacy parameter is inversely proportional to the number of samples.

    Submitted 17 November, 2020; v1 submitted 21 February, 2020; originally announced February 2020.

  11. arXiv:1907.12189

    cs.LG stat.ML

    Bandits with Feedback Graphs and Switching Costs

    Authors: Raman Arora, Teodor V. Marinov, Mehryar Mohri

    Abstract: We study the adversarial multi-armed bandit problem where partial observations are available and where, in addition to the loss incurred for each action, a \emph{switching cost} is incurred for shifting to a new action. All previously known results incur a factor proportional to the independence number of the feedback graph. We give a new algorithm whose regret guarantee depends only on the domina…

    Submitted 22 March, 2020; v1 submitted 28 July, 2019; originally announced July 2019.

    Comments: Camera ready from NeurIPS 2019, new algorithm and improved results in Section 3.2

  12. arXiv:1811.04127

    cs.LG cs.GT stat.ML

    Policy Regret in Repeated Games

    Authors: Raman Arora, Michael Dinitz, Teodor V. Marinov, Mehryar Mohri

    Abstract: The notion of \emph{policy regret} in online learning is a well-defined performance measure for the common scenario of adaptive adversaries, which more traditional quantities such as external regret do not take into account. We revisit the notion of policy regret and first show that there are online learning settings in which policy regret and external regret are incompatible: any sequence of pla…

    Submitted 22 March, 2020; v1 submitted 9 November, 2018; originally announced November 2018.

    Comments: Camera ready from NeurIPS 2018; 25 pages; Slightly updated results and proofs for Section 3 and Section 4

  13. arXiv:1808.00934

    cs.LG cs.AI stat.ML

    Streaming Kernel PCA with $\tilde{O}(\sqrt{n})$ Random Features

    Authors: Enayat Ullah, Poorya Mianjy, Teodor V. Marinov, Raman Arora

    Abstract: We study the statistical and computational aspects of kernel principal component analysis using random Fourier features and show that, under mild assumptions, $O(\sqrt{n} \log n)$ features suffice to achieve $O(1/ε^2)$ sample complexity. Furthermore, we give a memory-efficient streaming algorithm based on the classical Oja's algorithm that achieves this rate.

    Submitted 15 November, 2018; v1 submitted 2 August, 2018; originally announced August 2018.

    Comments: Advances in Neural Information Processing Systems (NIPS), 2018. 42 pages, 3 figures

  14. arXiv:1702.06818

    cs.LG stat.ML

    Stochastic Approximation for Canonical Correlation Analysis

    Authors: Raman Arora, Teodor V. Marinov, Poorya Mianjy, Nathan Srebro

    Abstract: We propose novel first-order stochastic approximation algorithms for canonical correlation analysis (CCA). Algorithms presented are instances of inexact matrix stochastic gradient (MSG) and inexact matrix exponentiated gradient (MEG), and achieve $ε$-suboptimality in the population objective in $\operatorname{poly}(\frac{1}ε)$ iterations. We also consider practical variants of the proposed algorit…

    Submitted 26 February, 2018; v1 submitted 22 February, 2017; originally announced February 2017.