
Showing 1–50 of 73 results for author: Lattimore, T

  1. arXiv:2406.06506  [pdf, ps, other]

    math.OC cs.LG stat.ML

    Online Newton Method for Bandit Convex Optimisation

    Authors: Hidde Fokkema, Dirk van der Hoeven, Tor Lattimore, Jack J. Mayo

    Abstract: We introduce a computationally efficient algorithm for zeroth-order bandit convex optimisation and prove that in the adversarial setting its regret is at most $d^{3.5} \sqrt{n} \mathrm{polylog}(n, d)$ with high probability where $d$ is the dimension and $n$ is the time horizon. In the stochastic setting the bound improves to $M d^{2} \sqrt{n} \mathrm{polylog}(n, d)$ where…

    Submitted 10 June, 2024; originally announced June 2024.

  2. arXiv:2402.06535  [pdf, other]

    math.OC cs.LG stat.ML

    Bandit Convex Optimisation

    Authors: Tor Lattimore

    Abstract: Bandit convex optimisation is a fundamental framework for studying zeroth-order convex optimisation. These notes cover the many tools used for this problem, including cutting plane methods, interior point methods, continuous exponential weights, gradient descent and online Newton step. The nuances between the many assumptions and setups are explained. Although there is not much truly new here, som…

    Submitted 10 June, 2024; v1 submitted 9 February, 2024; originally announced February 2024.

    Comments: 200 pages. More polished and some new results

  3. arXiv:2311.13294  [pdf, other]

    cs.LG cs.AI

    Probabilistic Inference in Reinforcement Learning Done Right

    Authors: Jean Tarbouriech, Tor Lattimore, Brendan O'Donoghue

    Abstract: A popular perspective in reinforcement learning (RL) casts the problem as probabilistic inference on a graphical model of the Markov decision process (MDP). The core object of study is the probability of each state-action pair being visited under the optimal policy. Previous approaches to approximate this quantity can be arbitrarily poor, leading to algorithms that do not implement genuine statist…

    Submitted 22 November, 2023; originally announced November 2023.

    Comments: NeurIPS 2023

  4. arXiv:2306.13053  [pdf, ps, other]

    cs.LG

    Context-lumpable stochastic bandits

    Authors: Chung-Wei Lee, Qinghua Liu, Yasin Abbasi-Yadkori, Chi Jin, Tor Lattimore, Csaba Szepesvári

    Abstract: We consider a contextual bandit problem with $S$ contexts and $K$ actions. In each round $t=1,2,\dots$, the learner observes a random context and chooses an action based on its past experience. The learner then observes a random reward whose mean is a function of the context and the action for the round. Under the assumption that the contexts can be lumped into $r\le \min\{S,K\}$ groups such that…

    Submitted 27 November, 2023; v1 submitted 22 June, 2023; originally announced June 2023.

  5. arXiv:2305.11908  [pdf, other]

    cs.HC cs.LG q-bio.NC stat.ML

    Sequential Best-Arm Identification with Application to Brain-Computer Interface

    Authors: Xin Zhou, Botao Hao, Jian Kang, Tor Lattimore, Lexin Li

    Abstract: A brain-computer interface (BCI) is a technology that enables direct communication between the brain and an external device or computer system. It allows individuals to interact with the device using only their thoughts, and holds immense potential for a wide range of applications in medicine, rehabilitation, and human augmentation. An electroencephalogram (EEG) and event-related potential (ERP)-b…

    Submitted 17 May, 2023; originally announced May 2023.

  6. arXiv:2302.05371  [pdf, ps, other]

    cs.LG math.OC stat.ML

    A Second-Order Method for Stochastic Bandit Convex Optimisation

    Authors: Tor Lattimore, András György

    Abstract: We introduce a simple and efficient algorithm for unconstrained zeroth-order stochastic convex bandits and prove its regret is at most $(1 + r/d)[d^{1.5} \sqrt{n} + d^3] \mathrm{polylog}(n, d, r)$ where $n$ is the horizon, $d$ the dimension and $r$ is the radius of a known ball containing the minimiser of the loss.

    Submitted 10 February, 2023; originally announced February 2023.

    Comments: 27 pages

  7. arXiv:2302.03683  [pdf, ps, other]

    cs.LG stat.ML

    Linear Partial Monitoring for Sequential Decision-Making: Algorithms, Regret Bounds and Applications

    Authors: Johannes Kirschner, Tor Lattimore, Andreas Krause

    Abstract: Partial monitoring is an expressive framework for sequential decision-making with an abundance of applications, including graph-structured and dueling bandits, dynamic pricing and transductive feedback models. We survey and extend recent results on the linear formulation of partial monitoring that naturally generalizes the standard linear bandit setting. The main result is that a single algorithm,…

    Submitted 13 November, 2023; v1 submitted 7 February, 2023; originally announced February 2023.

  8. arXiv:2302.03319  [pdf, ps, other]

    cs.LG math.ST stat.ML

    Leveraging Demonstrations to Improve Online Learning: Quality Matters

    Authors: Botao Hao, Rahul Jain, Tor Lattimore, Benjamin Van Roy, Zheng Wen

    Abstract: We investigate the extent to which offline demonstration data can improve online learning. It is natural to expect some improvement, but the question is how, and by how much? We show that the degree of improvement must depend on the quality of the demonstration data. To generate portable insights, we focus on Thompson sampling (TS) applied to a multi-armed bandit as a prototypical online learning…

    Submitted 17 May, 2023; v1 submitted 7 February, 2023; originally announced February 2023.

    Comments: Accepted at ICML 2023

  9. arXiv:2206.04640  [pdf, ps, other]

    cs.LG cs.IT math.ST stat.ME stat.ML

    Regret Bounds for Information-Directed Reinforcement Learning

    Authors: Botao Hao, Tor Lattimore

    Abstract: Information-directed sampling (IDS) has revealed its potential as a data-efficient algorithm for reinforcement learning (RL). However, theoretical understanding of IDS for Markov Decision Processes (MDPs) is still limited. We develop novel information-theoretic tools to bound the information ratio and cumulative information gain about the learning target. Our theoretical results shed light on the…

    Submitted 24 November, 2022; v1 submitted 9 June, 2022; originally announced June 2022.

    Comments: Accepted at NeurIPS 2022

  10. arXiv:2205.13170  [pdf, other]

    cs.LG stat.ML

    Distributed Contextual Linear Bandits with Minimax Optimal Communication Cost

    Authors: Sanae Amani, Tor Lattimore, András György, Lin F. Yang

    Abstract: We study distributed contextual linear bandits with stochastic contexts, where $N$ agents act cooperatively to solve a linear bandit-optimization problem with $d$-dimensional features over the course of $T$ rounds. For this problem, we derive the first ever information-theoretic lower bound $Ω(dN)$ on the communication cost of any algorithm that performs optimally in a regret minimization setup. W…

    Submitted 7 December, 2022; v1 submitted 26 May, 2022; originally announced May 2022.

  11. arXiv:2205.10895  [pdf, ps, other]

    cs.LG math.ST stat.ML

    Contextual Information-Directed Sampling

    Authors: Botao Hao, Tor Lattimore, Chao Qin

    Abstract: Information-directed sampling (IDS) has recently demonstrated its potential as a data-efficient reinforcement learning algorithm. However, it is still unclear what is the right form of information ratio to optimize when contextual information is available. We investigate the IDS design through two contextual bandit problems: contextual bandits with graph feedback and sparse linear contextual bandi…

    Submitted 9 June, 2022; v1 submitted 22 May, 2022; originally announced May 2022.

    Comments: Accepted at ICML 2022

  12. arXiv:2202.10997  [pdf, ps, other]

    math.OC cs.LG

    Minimax Regret for Partial Monitoring: Infinite Outcomes and Rustichini's Regret

    Authors: Tor Lattimore

    Abstract: We show that a version of the generalised information ratio of Lattimore and Gyorgy (2020) determines the asymptotic minimax regret for all finite-action partial monitoring games provided that (a) the standard definition of regret is used but the latent space where the adversary plays is potentially infinite; or (b) the regret introduced by Rustichini (1999) is used and the latent space is finite.…

    Submitted 22 February, 2022; originally announced February 2022.

    Comments: 28 pages

  13. arXiv:2110.15688  [pdf, other]

    stat.ML cs.LG

    Variational Bayesian Optimistic Sampling

    Authors: Brendan O'Donoghue, Tor Lattimore

    Abstract: We consider online sequential decision problems where an agent must balance exploration and exploitation. We derive a set of Bayesian `optimistic' policies which, in the stochastic multi-armed bandit case, includes the Thompson sampling policy. We provide a new analysis showing that any algorithm producing policies in the optimistic set enjoys $\tilde O(\sqrt{AT})$ Bayesian regret for a problem wi…

    Submitted 29 October, 2021; originally announced October 2021.

  14. arXiv:2107.02266  [pdf, other]

    math.ST cs.LG stat.ML

    Near-optimal inference in adaptive linear regression

    Authors: Koulik Khamaru, Yash Deshpande, Tor Lattimore, Lester Mackey, Martin J. Wainwright

    Abstract: When data is collected in an adaptive manner, even simple methods like ordinary least squares can exhibit non-normal asymptotic behavior. As an undesirable consequence, hypothesis tests and confidence intervals based on asymptotic normality can lead to erroneous results. We propose a family of online debiasing estimators to correct these distributional anomalies in least squares estimation. Our pr…

    Submitted 21 March, 2023; v1 submitted 5 July, 2021; originally announced July 2021.

    Comments: 51 pages, 7 figures

  15. arXiv:2106.01660  [pdf, ps, other]

    stat.ML cs.LG math.ST stat.ME

    Bandit Phase Retrieval

    Authors: Tor Lattimore, Botao Hao

    Abstract: We study a bandit version of phase retrieval where the learner chooses actions $(A_t)_{t=1}^n$ in the $d$-dimensional unit ball and the expected reward is $\langle A_t, θ_\star\rangle^2$ where $θ_\star \in \mathbb R^d$ is an unknown parameter vector. We prove that the minimax cumulative regret in this problem is $\smash{\tilde Θ(d \sqrt{n})}$, which improves on the best known bounds by a factor of…

    Submitted 4 June, 2021; v1 submitted 3 June, 2021; originally announced June 2021.

  16. arXiv:2106.00444  [pdf, other]

    cs.LG math.OC

    Minimax Regret for Bandit Convex Optimisation of Ridge Functions

    Authors: Tor Lattimore

    Abstract: We analyse adversarial bandit convex optimisation with an adversary that is restricted to playing functions of the form $f_t(x) = g_t(\langle x, θ\rangle)$ for convex $g_t : \mathbb R \to \mathbb R$ and unknown $θ\in \mathbb R^d$ that is homogeneous over time. We provide a short information-theoretic proof that the minimax regret is at most $O(d \sqrt{n} \log(n \operatorname{diam}(\mathcal K)))$ w…

    Submitted 6 June, 2021; v1 submitted 1 June, 2021; originally announced June 2021.

    Comments: Correcting an (instructive) error that leads to a weaker result

  17. arXiv:2105.14267  [pdf, other]

    stat.ML cs.LG math.ST

    Information Directed Sampling for Sparse Linear Bandits

    Authors: Botao Hao, Tor Lattimore, Wei Deng

    Abstract: Stochastic sparse linear bandits offer a practical model for high-dimensional online decision-making problems and have a rich information-regret structure. In this work we explore the use of information-directed sampling (IDS), which naturally balances the information-regret trade-off. We develop a class of information-theoretic Bayesian regret bounds that nearly match existing lower bounds on a v…

    Submitted 29 May, 2021; originally announced May 2021.

  18. arXiv:2104.02293  [pdf, other]

    cs.LG

    On the Optimality of Batch Policy Optimization Algorithms

    Authors: Chenjun Xiao, Yifan Wu, Tor Lattimore, Bo Dai, Jincheng Mei, Lihong Li, Csaba Szepesvari, Dale Schuurmans

    Abstract: Batch policy optimization considers leveraging existing data for policy construction before interacting with an environment. Although interest in this problem has grown significantly in recent years, its theoretical foundations remain under-developed. To advance the understanding of this problem, we provide three results that characterize the limits and possibilities of batch policy optimization i…

    Submitted 6 April, 2021; originally announced April 2021.

    Comments: 29 pages, 8 figures

  19. arXiv:2101.02055  [pdf, other]

    cs.LG

    Geometric Entropic Exploration

    Authors: Zhaohan Daniel Guo, Mohammad Gheshlaghi Azar, Alaa Saade, Shantanu Thakoor, Bilal Piot, Bernardo Avila Pires, Michal Valko, Thomas Mesnard, Tor Lattimore, Rémi Munos

    Abstract: Exploration is essential for solving complex Reinforcement Learning (RL) tasks. Maximum State-Visitation Entropy (MSVE) formulates the exploration problem as a well-defined policy optimization problem whose solution aims at visiting all states as uniformly as possible. This is in contrast to standard uncertainty-based approaches where exploration is transient and eventually vanishes. However, exis…

    Submitted 7 January, 2021; v1 submitted 6 January, 2021; originally announced January 2021.

  20. arXiv:2011.05944  [pdf, other]

    stat.ML cs.LG

    Asymptotically Optimal Information-Directed Sampling

    Authors: Johannes Kirschner, Tor Lattimore, Claire Vernade, Csaba Szepesvári

    Abstract: We introduce a simple and efficient algorithm for stochastic linear bandits with finitely many actions that is asymptotically optimal and (nearly) worst-case optimal in finite time. The approach is based on the frequentist information-directed sampling (IDS) framework, with a surrogate for the information gain that is informed by the optimization problem that defines the asymptotic lower bound. Ou…

    Submitted 2 July, 2021; v1 submitted 11 November, 2020; originally announced November 2020.

    Comments: Accepted at COLT 2021

  21. arXiv:2011.04020  [pdf, other]

    stat.ML cs.LG math.ST

    High-Dimensional Sparse Linear Bandits

    Authors: Botao Hao, Tor Lattimore, Mengdi Wang

    Abstract: Stochastic linear bandits with high-dimensional sparse features are a practical model for a variety of domains, including personalized medicine and online advertising. We derive a novel $Ω(n^{2/3})$ dimension-free minimax regret lower bound for sparse linear bandits in the data-poor regime where the horizon is smaller than the ambient dimension and where the feature vectors admit a well-conditione…

    Submitted 4 September, 2021; v1 submitted 8 November, 2020; originally announced November 2020.

    Comments: Accepted by NeurIPS 2020

  22. arXiv:2011.04019  [pdf, other]

    cs.LG math.ST stat.ML

    Sparse Feature Selection Makes Batch Reinforcement Learning More Sample Efficient

    Authors: Botao Hao, Yaqi Duan, Tor Lattimore, Csaba Szepesvári, Mengdi Wang

    Abstract: This paper provides a statistical analysis of high-dimensional batch Reinforcement Learning (RL) using sparse linear function approximation. When there is a large number of candidate features, our result sheds light on the fact that sparsity-aware methods can make batch RL more sample efficient. We first consider the off-policy policy evaluation problem. To evaluate a new target policy, we analyze…

    Submitted 8 November, 2020; originally announced November 2020.

  23. arXiv:2011.04018  [pdf, other]

    cs.LG math.ST stat.ML

    Online Sparse Reinforcement Learning

    Authors: Botao Hao, Tor Lattimore, Csaba Szepesvári, Mengdi Wang

    Abstract: We investigate the hardness of online reinforcement learning in fixed-horizon, sparse linear Markov decision processes (MDPs), with a special focus on the high-dimensional regime where the ambient dimension is larger than the number of episodes. Our contribution is two-fold. First, we provide a lower bound showing that linear regret is generally unavoidable in this case, even if there exists a policy…

    Submitted 10 February, 2021; v1 submitted 8 November, 2020; originally announced November 2020.

    Comments: Accepted at AISTATS 2021

  24. arXiv:2009.12228  [pdf, ps, other]

    math.OC cs.LG stat.ML

    Mirror Descent and the Information Ratio

    Authors: Tor Lattimore, András György

    Abstract: We establish a connection between the stability of mirror descent and the information ratio by Russo and Van Roy [2014]. Our analysis shows that mirror descent with suitable loss estimators and exploratory distributions enjoys the same bound on the adversarial regret as the bounds on the Bayesian regret for information-directed sampling. Along the way, we develop the theory for information-directe…

    Submitted 25 September, 2020; originally announced September 2020.

  25. arXiv:2006.05964  [pdf, other]

    cs.LG stat.ML

    Gaussian Gated Linear Networks

    Authors: David Budden, Adam Marblestone, Eren Sezener, Tor Lattimore, Greg Wayne, Joel Veness

    Abstract: We propose the Gaussian Gated Linear Network (G-GLN), an extension to the recently proposed GLN family of deep neural networks. Instead of using backpropagation to learn features, GLNs have a distributed and local credit assignment mechanism based on optimizing a convex objective. This gives rise to many desirable properties including universality, data-efficient online learning, trivial interpret…

    Submitted 21 October, 2020; v1 submitted 10 June, 2020; originally announced June 2020.

  26. arXiv:2006.05145  [pdf, other]

    cs.LG stat.CO stat.ML

    Matrix games with bandit feedback

    Authors: Brendan O'Donoghue, Tor Lattimore, Ian Osband

    Abstract: We study a version of the classical zero-sum matrix game with unknown payoff matrix and bandit feedback, where the players only observe each other's actions and a noisy payoff. This generalizes the usual matrix game, where the payoff matrix is known to the players. Despite numerous applications, this problem has received relatively little attention. Although adversarial bandit algorithms achieve lo…

    Submitted 12 June, 2021; v1 submitted 9 June, 2020; originally announced June 2020.

  27. arXiv:2006.00475  [pdf, other]

    math.OC cs.LG stat.ML

    Improved Regret for Zeroth-Order Adversarial Bandit Convex Optimisation

    Authors: Tor Lattimore

    Abstract: We prove that the information-theoretic upper bound on the minimax regret for zeroth-order adversarial bandit convex optimisation is at most $O(d^{2.5} \sqrt{n} \log(n))$, where $d$ is the dimension and $n$ is the number of interactions. This improves on $O(d^{9.5} \sqrt{n} \log(n)^{7.5})$ by Bubeck et al. (2017). The proof is based on identifying an improved exploratory distribution for convex fun…

    Submitted 25 September, 2020; v1 submitted 31 May, 2020; originally announced June 2020.

    Comments: To appear in Mathematical Statistics and Learning. 22 pages, 6 figures

  28. arXiv:2003.01704  [pdf, other]

    cs.LG stat.ML

    Model Selection in Contextual Stochastic Bandit Problems

    Authors: Aldo Pacchiano, My Phan, Yasin Abbasi-Yadkori, Anup Rao, Julian Zimmert, Tor Lattimore, Csaba Szepesvari

    Abstract: We study bandit model selection in stochastic environments. Our approach relies on a meta-algorithm that selects between candidate base algorithms. We develop a meta-algorithm-base algorithm abstraction that can work with general classes of base algorithms and different types of adversarial meta-algorithms. Our methods rely on a novel and generic smoothing transformation for bandit algorithms that…

    Submitted 4 December, 2022; v1 submitted 3 March, 2020; originally announced March 2020.

    Comments: 33 main pages, 15 appendix pages

  29. arXiv:2002.11182  [pdf, other]

    stat.ML cs.LG

    Information Directed Sampling for Linear Partial Monitoring

    Authors: Johannes Kirschner, Tor Lattimore, Andreas Krause

    Abstract: Partial monitoring is a rich framework for sequential decision making under uncertainty that generalizes many well known bandit models, including linear, combinatorial and dueling bandits. We introduce information directed sampling (IDS) for stochastic partial monitoring with a linear reward and observation structure. IDS achieves adaptive worst-case regret rates that depend on precise observabili…

    Submitted 25 February, 2020; originally announced February 2020.

  30. arXiv:1911.07676  [pdf, ps, other]

    stat.ML cs.LG

    Learning with Good Feature Representations in Bandits and in RL with a Generative Model

    Authors: Tor Lattimore, Csaba Szepesvari, Gellert Weisz

    Abstract: The construction by Du et al. (2019) implies that even if a learner is given linear features in $\mathbb R^d$ that approximate the rewards in a bandit with a uniform error of $ε$, then searching for an action that is optimal up to $O(ε)$ requires examining essentially all actions. We use the Kiefer-Wolfowitz theorem to prove a positive result that by checking only a few actions, a learner can alwa…

    Submitted 19 February, 2020; v1 submitted 18 November, 2019; originally announced November 2019.

    Comments: 13 pages

  31. arXiv:1910.06996  [pdf, other]

    cs.LG math.ST stat.ML

    Adaptive Exploration in Linear Contextual Bandit

    Authors: Botao Hao, Tor Lattimore, Csaba Szepesvari

    Abstract: Contextual bandits serve as a fundamental model for many sequential decision making tasks. The most popular theoretically justified approaches are based on the optimism principle. While these algorithms can be practical, they are known to be suboptimal asymptotically. On the other hand, existing asymptotically optimal algorithms for this problem do not exploit the linear structure in an optimal wa…

    Submitted 14 March, 2020; v1 submitted 15 October, 2019; originally announced October 2019.

    Comments: Accepted at AISTATS 2020

  32. arXiv:1910.01526  [pdf, other]

    cs.LG cs.IT stat.ML

    Gated Linear Networks

    Authors: Joel Veness, Tor Lattimore, David Budden, Avishkar Bhoopchand, Christopher Mattern, Agnieszka Grabska-Barwinska, Eren Sezener, Jianan Wang, Peter Toth, Simon Schmitt, Marcus Hutter

    Abstract: This paper presents a new family of backpropagation-free neural architectures, Gated Linear Networks (GLNs). What distinguishes GLNs from contemporary neural networks is the distributed and local nature of their credit assignment mechanism; each neuron directly predicts the target, forgoing the ability to learn feature representations in favor of rapid online learning. Individual neurons can model…

    Submitted 11 June, 2020; v1 submitted 30 September, 2019; originally announced October 2019.

    Comments: arXiv admin note: substantial text overlap with arXiv:1712.01897

  33. arXiv:1908.03568  [pdf, other]

    cs.LG cs.AI stat.ML

    Behaviour Suite for Reinforcement Learning

    Authors: Ian Osband, Yotam Doron, Matteo Hessel, John Aslanides, Eren Sezener, Andre Saraiva, Katrina McKinney, Tor Lattimore, Csaba Szepesvari, Satinder Singh, Benjamin Van Roy, Richard Sutton, David Silver, Hado Van Hasselt

    Abstract: This paper introduces the Behaviour Suite for Reinforcement Learning, or bsuite for short. bsuite is a collection of carefully-designed experiments that investigate core capabilities of reinforcement learning (RL) agents with two objectives. First, to collect clear, informative and scalable problems that capture key issues in the design of general and efficient learning algorithms. Second, to stud…

    Submitted 14 February, 2020; v1 submitted 9 August, 2019; originally announced August 2019.

  34. arXiv:1907.13062  [pdf, ps, other]

    cs.DS cs.AI

    Iterative Budgeted Exponential Search

    Authors: Malte Helmert, Tor Lattimore, Levi H. S. Lelis, Laurent Orseau, Nathan R. Sturtevant

    Abstract: We tackle two long-standing problems related to re-expansions in heuristic search algorithms. For graph search, A* can require $Ω(2^{n})$ expansions, where $n$ is the number of states within the final $f$ bound. Existing algorithms that address this problem like B and B' improve this bound to $Ω(n^2)$. For tree search, IDA* can also require $Ω(n^2)$ expansions. We describe a new algorithmic framew…

    Submitted 30 July, 2019; originally announced July 2019.

  35. arXiv:1907.05772  [pdf, other]

    cs.LG math.OC stat.ML

    Exploration by Optimisation in Partial Monitoring

    Authors: Tor Lattimore, Csaba Szepesvari

    Abstract: We provide a simple and efficient algorithm for adversarial $k$-action $d$-outcome non-degenerate locally observable partial monitoring games for which the $n$-round minimax regret is bounded by $6(d+1) k^{3/2} \sqrt{n \log(k)}$, matching the best known information-theoretic upper bound. The same algorithm also achieves near-optimal regret for full information, bandit and globally observable games.

    Submitted 25 October, 2019; v1 submitted 12 July, 2019; originally announced July 2019.

    Comments: high probability bounds, experiments and simplified algorithms/analysis

  36. arXiv:1906.03242  [pdf, other]

    cs.AI cs.DS

    Zooming Cautiously: Linear-Memory Heuristic Search With Node Expansion Guarantees

    Authors: Laurent Orseau, Levi H. S. Lelis, Tor Lattimore

    Abstract: We introduce and analyze two parameter-free linear-memory tree search algorithms. Under mild assumptions we prove our algorithms are guaranteed to perform only a logarithmic factor more node expansions than A* when the search space is a tree. Previously, the best guarantee for a linear-memory algorithm under similar assumptions was achieved by IDA*, which in the worst case expands quadratically mo…

    Submitted 7 June, 2019; originally announced June 2019.

    Comments: This paper and another independent IJCAI 2019 submission have been merged into a single paper that subsumes both of them (Helmert et al., 2019). This paper is placed here only for historical context. Please only cite the subsuming paper

  37. arXiv:1905.11817  [pdf, other]

    cs.LG stat.ML

    Connections Between Mirror Descent, Thompson Sampling and the Information Ratio

    Authors: Julian Zimmert, Tor Lattimore

    Abstract: The information-theoretic analysis by Russo and Van Roy (2014) in combination with minimax duality has proved a powerful tool for the analysis of online learning algorithms in full and partial information settings. In most applications there is a tantalising similarity to the classical analysis based on mirror descent. We make a formal connection, showing that the information-theoretic bounds in m…

    Submitted 28 May, 2019; originally announced May 2019.

  38. arXiv:1903.07890  [pdf, ps, other]

    cs.LG stat.ML

    On First-Order Bounds, Variance and Gap-Dependent Bounds for Adversarial Bandits

    Authors: Roman Pogodin, Tor Lattimore

    Abstract: We make three contributions to the theory of k-armed adversarial bandits. First, we prove a first-order bound for a modified variant of the INF strategy by Audibert and Bubeck [2009], without sacrificing worst case optimality or modifying the loss estimators. Second, we provide a variance analysis for algorithms based on follow the regularised leader, showing that without adaptation the variance o…

    Submitted 24 July, 2019; v1 submitted 19 March, 2019; originally announced March 2019.

    Comments: 14 pages

  39. Degenerate Feedback Loops in Recommender Systems

    Authors: Ray Jiang, Silvia Chiappa, Tor Lattimore, András György, Pushmeet Kohli

    Abstract: Machine learning is used extensively in recommender systems deployed in products. The decisions made by these systems can influence user beliefs and preferences which in turn affect the feedback the learning system receives - thus creating a feedback loop. This phenomenon can give rise to the so-called "echo chambers" or "filter bubbles" that have user and societal implications. In this paper, we…

    Submitted 27 March, 2019; v1 submitted 27 February, 2019; originally announced February 2019.

    Journal ref: Proceedings of AAAI/ACM Conference on AI, Ethics, and Society, Honolulu, HI, USA, January 27-28, 2019 (AIES '19)

  40. arXiv:1902.00470  [pdf, other]

    cs.LG math.OC stat.ML

    An Information-Theoretic Approach to Minimax Regret in Partial Monitoring

    Authors: Tor Lattimore, Csaba Szepesvari

    Abstract: We prove a new minimax theorem connecting the worst-case Bayesian regret and minimax regret under partial monitoring with no assumptions on the space of signals or decisions of the adversary. We then generalise the information-theoretic tools of Russo and Van Roy (2016) for proving Bayesian regret bounds and combine them with the minimax theorem to derive minimax regret bounds for various partial…

    Submitted 29 May, 2019; v1 submitted 1 February, 2019; originally announced February 2019.

    Comments: 29 pages, to appear in COLT 2019

  41. arXiv:1901.11530  [pdf, other]

    cs.LG cs.AI stat.ML

    A Geometric Perspective on Optimal Representations for Reinforcement Learning

    Authors: Marc G. Bellemare, Will Dabney, Robert Dadashi, Adrien Ali Taiga, Pablo Samuel Castro, Nicolas Le Roux, Dale Schuurmans, Tor Lattimore, Clare Lyle

    Abstract: We propose a new perspective on representation learning in reinforcement learning based on geometric properties of the space of value functions. We leverage this perspective to provide formal evidence regarding the usefulness of value functions as auxiliary tasks. Our formulation considers adapting the representation to minimize the (linear) approximation of the value function of all stationary po…

    Submitted 25 June, 2019; v1 submitted 31 January, 2019; originally announced January 2019.

  42. arXiv:1901.02230  [pdf, other]

    cs.LG stat.ML

    Soft-Bayes: Prod for Mixtures of Experts with Log-Loss

    Authors: Laurent Orseau, Tor Lattimore, Shane Legg

    Abstract: We consider prediction with expert advice under the log-loss with the goal of deriving efficient and robust algorithms. We argue that existing algorithms such as exponentiated gradient, online gradient descent and online Newton step do not adequately satisfy both requirements. Our main contribution is an analysis of the Prod algorithm that is robust to any data sequence and runs in linear time rel…

    Submitted 8 January, 2019; originally announced January 2019.

    Journal ref: Algorithmic Learning Theory 2017

  43. arXiv:1811.10928  [pdf, other]

    cs.AI

    Single-Agent Policy Tree Search With Guarantees

    Authors: Laurent Orseau, Levi H. S. Lelis, Tor Lattimore, Théophane Weber

    Abstract: We introduce two novel tree search algorithms that use a policy to guide search. The first algorithm is a best-first enumeration that uses a cost function that allows us to prove an upper bound on the number of nodes to be expanded before reaching a goal state. We show that this best-first algorithm is particularly well suited for `needle-in-a-haystack' problems. The second algorithm is based on s…

    Submitted 28 November, 2018; v1 submitted 27 November, 2018; originally announced November 2018.

    Journal ref: 32nd Conference on Neural Information Processing Systems (NIPS 2018), Montréal, Canada

  44. arXiv:1811.05154  [pdf, other]

    cs.LG stat.ML

    Garbage In, Reward Out: Bootstrapping Exploration in Multi-Armed Bandits

    Authors: Branislav Kveton, Csaba Szepesvari, Sharan Vaswani, Zheng Wen, Mohammad Ghavamzadeh, Tor Lattimore

    Abstract: We propose a bandit algorithm that explores by randomizing its history of rewards. Specifically, it pulls the arm with the highest mean reward in a non-parametric bootstrap sample of its history with pseudo rewards. We design the pseudo rewards such that the bootstrap mean is optimistic with a sufficiently high probability. We call our algorithm Giro, which stands for garbage in, reward out. We an…

    Submitted 20 June, 2019; v1 submitted 13 November, 2018; originally announced November 2018.

    Comments: Proceedings of the 36th International Conference on Machine Learning

  45. arXiv:1810.02567  [pdf, other]

    stat.ML cs.LG

    Online Learning to Rank with Features

    Authors: Shuai Li, Tor Lattimore, Csaba Szepesvári

    Abstract: We introduce a new model for online ranking in which the click probability factors into an examination and attractiveness function and the attractiveness function is a linear function of a feature vector and an unknown parameter. Only relatively mild assumptions are made on the examination function. A novel algorithm for this setup is analysed, showing that the dependence on the number of items is…

    Submitted 25 May, 2019; v1 submitted 5 October, 2018; originally announced October 2018.

  46. arXiv:1807.02089  [pdf, other]

    stat.ML cs.LG

    Linear Bandits with Stochastic Delayed Feedback

    Authors: Claire Vernade, Alexandra Carpentier, Tor Lattimore, Giovanni Zappella, Beyza Ermis, Michael Brueckner

    Abstract: Stochastic linear bandits are a natural and well-studied model for structured exploration/exploitation problems and are widely used in applications such as online marketing and recommendation. One of the main challenges faced by practitioners hoping to apply existing algorithms is that usually the feedback is randomly delayed and delays are only partially observable. For example, while a purchase…

    Submitted 2 March, 2020; v1 submitted 5 July, 2018; originally announced July 2018.

  47. arXiv:1806.05819  [pdf, other]

    cs.LG stat.ML

    BubbleRank: Safe Online Learning to Re-Rank via Implicit Click Feedback

    Authors: Chang Li, Branislav Kveton, Tor Lattimore, Ilya Markov, Maarten de Rijke, Csaba Szepesvari, Masrour Zoghi

    Abstract: In this paper, we study the problem of safe online learning to re-rank, where user feedback is used to improve the quality of displayed lists. Learning to rank has traditionally been studied in two settings. In the offline setting, rankers are typically learned from relevance labels created by judges. This approach has generally become standard in industrial applications of ranking, such as search…

    Submitted 29 June, 2019; v1 submitted 15 June, 2018; originally announced June 2018.

  48. arXiv:1806.02248  [pdf, other]

    stat.ML cs.LG

    TopRank: A practical algorithm for online stochastic ranking

    Authors: Tor Lattimore, Branislav Kveton, Shuai Li, Csaba Szepesvari

    Abstract: Online learning to rank is a sequential decision-making problem where in each round the learning agent chooses a list of items and receives feedback in the form of clicks from the user. Many sample-efficient algorithms have been proposed for this problem that assume a specific click model connecting rankings and user behavior. We propose a generalized click model that encompasses many existing mod…

    Submitted 18 March, 2019; v1 submitted 6 June, 2018; originally announced June 2018.

  49. arXiv:1805.09247  [pdf, ps, other]

    cs.LG stat.ML

    Cleaning up the neighborhood: A full classification for adversarial partial monitoring

    Authors: Tor Lattimore, Csaba Szepesvari

    Abstract: Partial monitoring is a generalization of the well-known multi-armed bandit framework where the loss is not directly observed by the learner. We complete the classification of finite adversarial partial monitoring to include all games, solving an open problem posed by Bartok et al. [2014]. Along the way we simplify and improve existing algorithms and correct errors in previous analyses. Our second…

    Submitted 23 May, 2018; originally announced May 2018.

    Comments: 24 pages

  50. arXiv:1712.01897  [pdf, other]

    cs.LG cs.IT

    Online Learning with Gated Linear Networks

    Authors: Joel Veness, Tor Lattimore, Avishkar Bhoopchand, Agnieszka Grabska-Barwinska, Christopher Mattern, Peter Toth

    Abstract: This paper describes a family of probabilistic architectures designed for online learning under the logarithmic loss. Rather than relying on non-linear transfer functions, our method gains representational power by the use of data conditioning. We state under general conditions a learnable capacity theorem that shows this approach can in principle learn any bounded Borel-measurable function on a c…

    Submitted 5 December, 2017; originally announced December 2017.

    Comments: 40 pages