
Showing 1–46 of 46 results for author: Neu, G

  1. arXiv:2406.12600  [pdf, ps, other]

    cs.LG

    Generalization bounds for mixing processes via delayed online-to-PAC conversions

    Authors: Baptiste Abeles, Eugenio Clerico, Gergely Neu

    Abstract: We study the generalization error of statistical learning algorithms in a non-i.i.d. setting, where the training data is sampled from a stationary mixing process. We develop an analytic framework for this scenario based on a reduction to online learning with delayed feedback. In particular, we show that the existence of an online learning algorithm with bounded regret (against a fixed statistical…

    Submitted 18 June, 2024; originally announced June 2024.

  2. arXiv:2406.04056  [pdf, other]

    cs.LG math.OC stat.ML

    Bisimulation Metrics are Optimal Transport Distances, and Can be Computed Efficiently

    Authors: Sergio Calo, Anders Jonsson, Gergely Neu, Ludovic Schwartz, Javier Segovia-Aguas

    Abstract: We propose a new framework for formulating optimal transport distances between Markov chains. Previously known formulations studied couplings between the entire joint distribution induced by the chains, and derived solutions via a reduction to dynamic programming (DP) in an appropriately defined Markov decision process. This formulation has, however, not led to particularly efficient algorithms so…

    Submitted 11 June, 2024; v1 submitted 6 June, 2024; originally announced June 2024.

  3. arXiv:2405.13755  [pdf, ps, other]

    cs.LG

    Offline RL via Feature-Occupancy Gradient Ascent

    Authors: Gergely Neu, Nneka Okolo

    Abstract: We study offline Reinforcement Learning in large infinite-horizon discounted Markov Decision Processes (MDPs) when the reward and transition models are linearly realizable under a known feature map. Starting from the classic linear-program formulation of the optimal control problem in MDPs, we develop a new algorithm that performs a form of gradient ascent in the space of feature occupancies, defi…

    Submitted 22 May, 2024; originally announced May 2024.

    Comments: 26 pages

  4. arXiv:2402.15411  [pdf, other]

    cs.LG

    Optimistic Information Directed Sampling

    Authors: Gergely Neu, Matteo Papini, Ludovic Schwartz

    Abstract: We study the problem of online learning in contextual bandit problems where the loss function is assumed to belong to a known parametric function class. We propose a new analytic framework for this setting that bridges the Bayesian theory of information-directed sampling due to Russo and Van Roy (2018) and the worst-case theory of Foster, Kakade, Qian, and Rakhlin (2021) based on the decision-esti…

    Submitted 27 June, 2024; v1 submitted 23 February, 2024; originally announced February 2024.

  5. arXiv:2402.13903  [pdf, ps, other]

    cs.LG math.OC stat.ML

    Dealing with unbounded gradients in stochastic saddle-point optimization

    Authors: Gergely Neu, Nneka Okolo

    Abstract: We study the performance of stochastic first-order methods for finding saddle points of convex-concave functions. A notorious challenge faced by such methods is that the gradients can grow arbitrarily large during optimization, which may result in instability and divergence. In this paper, we propose a simple and effective regularization technique that stabilizes the iterates and yields meaningful…

    Submitted 7 June, 2024; v1 submitted 21 February, 2024; originally announced February 2024.

    Comments: 23 pages

  6. arXiv:2310.01609  [pdf, ps, other]

    stat.ML cs.LG

    Adversarial Contextual Bandits Go Kernelized

    Authors: Gergely Neu, Julia Olkhovskaya, Sattar Vakili

    Abstract: We study a generalization of the problem of online learning in adversarial linear contextual bandits by incorporating loss functions that belong to a reproducing kernel Hilbert space, which allows for a more flexible modeling of complex decision-making scenarios. We propose a computationally efficient algorithm that makes use of a new optimistically biased estimator for the loss functions and achi…

    Submitted 2 October, 2023; originally announced October 2023.

  7. arXiv:2309.15771  [pdf, other]

    cs.LG stat.ML

    Importance-Weighted Offline Learning Done Right

    Authors: Germano Gabbianelli, Gergely Neu, Matteo Papini

    Abstract: We study the problem of offline policy optimization in stochastic contextual bandit problems, where the goal is to learn a near-optimal policy based on a dataset of decision data collected by a suboptimal behavior policy. Rather than making any structural assumptions on the reward function, we assume access to a given policy class and aim to compete with the best comparator policy within this clas…

    Submitted 27 September, 2023; originally announced September 2023.

  8. arXiv:2305.19674  [pdf, other]

    stat.ML cs.LG

    Online-to-PAC Conversions: Generalization Bounds via Regret Analysis

    Authors: Gábor Lugosi, Gergely Neu

    Abstract: We present a new framework for deriving bounds on the generalization error of statistical learning algorithms from the perspective of online learning. Specifically, we construct an online learning game called the "generalization game", where an online learner is trying to compete with a fixed statistical learning algorithm in predicting the sequence of generalization gaps on a training set of i.i.…

    Submitted 31 May, 2023; originally announced May 2023.

  9. arXiv:2305.12944  [pdf, ps, other]

    cs.LG

    Offline Primal-Dual Reinforcement Learning for Linear MDPs

    Authors: Germano Gabbianelli, Gergely Neu, Nneka Okolo, Matteo Papini

    Abstract: Offline Reinforcement Learning (RL) aims to learn a near-optimal policy from a fixed dataset of transitions collected by another policy. This problem has attracted a lot of attention recently, but most existing methods with strong theoretical guarantees are restricted to finite-horizon or tabular settings. In contrast, few algorithms for infinite-horizon settings with function approximation and m…

    Submitted 22 May, 2023; originally announced May 2023.

  10. arXiv:2305.00832  [pdf, ps, other]

    cs.LG stat.ML

    First- and Second-Order Bounds for Adversarial Linear Contextual Bandits

    Authors: Julia Olkhovskaya, Jack Mayo, Tim van Erven, Gergely Neu, Chen-Yu Wei

    Abstract: We consider the adversarial linear contextual bandit setting, which allows for the loss functions associated with each of $K$ arms to change over time without restriction. Assuming the $d$-dimensional contexts are drawn from a fixed known distribution, the worst-case expected regret over the course of $T$ rounds is known to scale as $\tilde O(\sqrt{Kd T})$. Under the additional assumption that the…

    Submitted 24 May, 2023; v1 submitted 1 May, 2023; originally announced May 2023.

  11. arXiv:2302.14004  [pdf, other]

    cs.LG stat.ML

    Optimistic Planning by Regularized Dynamic Programming

    Authors: Antoine Moulin, Gergely Neu

    Abstract: We propose a new method for optimistic planning in infinite-horizon discounted Markov decision processes based on the idea of adding regularization to the updates of an otherwise standard approximate value iteration procedure. This technique allows us to avoid contraction and monotonicity arguments typically required by existing analyses of approximate dynamic programming methods, and in particula…

    Submitted 14 June, 2023; v1 submitted 27 February, 2023; originally announced February 2023.

  12. arXiv:2210.12057  [pdf, ps, other]

    cs.LG math.OC

    Efficient Global Planning in Large MDPs via Stochastic Primal-Dual Optimization

    Authors: Gergely Neu, Nneka Okolo

    Abstract: We propose a new stochastic primal-dual optimization algorithm for planning in a large discounted Markov decision process with a generative model and linear function approximation. Assuming that the feature map approximately satisfies standard realizability and Bellman-closedness conditions and also that the feature vectors of all state-action pairs are representable as convex combinations of a sm…

    Submitted 31 January, 2023; v1 submitted 21 October, 2022; originally announced October 2022.

    Comments: 23 pages including reference and appendix

  13. arXiv:2210.09409  [pdf, other]

    math.OC cs.LG

    Sufficient Exploration for Convex Q-learning

    Authors: Fan Lu, Prashant Mehta, Sean Meyn, Gergely Neu

    Abstract: In recent years there has been a collective research effort to find new formulations of reinforcement learning that are simultaneously more efficient and more amenable to analysis. This paper concerns one approach that builds on the linear programming (LP) formulation of optimal control of Manne. A primal version is called logistic Q-learning, and a dual variant is convex Q-learning. This paper fo…

    Submitted 17 October, 2022; originally announced October 2022.

  14. arXiv:2209.10968  [pdf, other]

    cs.LG

    Proximal Point Imitation Learning

    Authors: Luca Viano, Angeliki Kamoutsi, Gergely Neu, Igor Krawczuk, Volkan Cevher

    Abstract: This work develops new algorithms with rigorous efficiency guarantees for infinite horizon imitation learning (IL) with linear function approximation without restrictive coherence assumptions. We begin with the minimax formulation of the problem and then outline how to leverage classical tools from optimization, in particular, the proximal-point method (PPM) and dual smoothing, for online and offl…

    Submitted 30 May, 2023; v1 submitted 22 September, 2022; originally announced September 2022.

  15. arXiv:2207.08956  [pdf, other]

    cs.LG stat.ML

    Online Learning with Off-Policy Feedback

    Authors: Germano Gabbianelli, Matteo Papini, Gergely Neu

    Abstract: We study the problem of online learning in adversarial bandit problems under a partial observability model called off-policy feedback. In this sequential decision making problem, the learner cannot directly observe its rewards, but instead sees the ones obtained by another unknown policy run in parallel (behavior policy). Instead of a standard exploration-exploitation dilemma, the learner has to f…

    Submitted 18 July, 2022; originally announced July 2022.

  16. arXiv:2205.13924  [pdf, ps, other]

    cs.LG

    Lifting the Information Ratio: An Information-Theoretic Analysis of Thompson Sampling for Contextual Bandits

    Authors: Gergely Neu, Julia Olkhovskaya, Matteo Papini, Ludovic Schwartz

    Abstract: We study the Bayesian regret of the renowned Thompson Sampling algorithm in contextual bandits with binary losses and adversarially-selected contexts. We adapt the information-theoretic perspective of Russo and Van Roy (2016) to the contextual setting by considering a lifted version of the information ratio defined in terms of the unknown model parameter instead of the optimal action or optimal policy as done…

    Submitted 6 March, 2023; v1 submitted 27 May, 2022; originally announced May 2022.

  17. arXiv:2202.04985  [pdf, other]

    stat.ML cs.LG

    Generalization Bounds via Convex Analysis

    Authors: Gábor Lugosi, Gergely Neu

    Abstract: Since the celebrated works of Russo and Zou (2016, 2019) and Xu and Raginsky (2017), it has been well known that the generalization error of supervised learning algorithms can be bounded in terms of the mutual information between their input and the output, given that the loss of any fixed hypothesis has a subgaussian tail. In this work, we generalize this result beyond the standard choice of Shann…

    Submitted 19 July, 2022; v1 submitted 10 February, 2022; originally announced February 2022.

  18. arXiv:2112.15430  [pdf, other]

    cs.LG cs.AI math.OC

    Robustness and risk management via distributional dynamic programming

    Authors: Mastane Achab, Gergely Neu

    Abstract: In dynamic programming (DP) and reinforcement learning (RL), an agent learns to act optimally in terms of expected long-term return by sequentially interacting with its environment modeled by a Markov decision process (MDP). More generally in distributional reinforcement learning (DRL), the focus is on the whole distribution of the return, not just its expectation. Although DRL-based methods produ…

    Submitted 28 December, 2021; originally announced December 2021.

  19. arXiv:2109.11909  [pdf, ps, other]

    cs.LG

    Learning to maximize global influence from local observations

    Authors: Gábor Lugosi, Gergely Neu, Julia Olkhovskaya

    Abstract: We study a family of online influence maximization problems where in a sequence of rounds $t=1,\ldots,T$, a decision maker selects one from a large number of agents with the goal of maximizing influence. Upon choosing an agent, the decision maker shares a piece of information with the agent, which then spreads in an unobserved network over which the agents communicate. The goal of the dec…

    Submitted 24 September, 2021; originally announced September 2021.

    Comments: arXiv admin note: substantial text overlap with arXiv:1805.11022

  20. arXiv:2102.00931  [pdf, other]

    cs.LG stat.ML

    Information-Theoretic Generalization Bounds for Stochastic Gradient Descent

    Authors: Gergely Neu, Gintare Karolina Dziugaite, Mahdi Haghifam, Daniel M. Roy

    Abstract: We study the generalization properties of the popular stochastic optimization method known as stochastic gradient descent (SGD) for optimizing general non-convex loss functions. Our main contribution is providing upper bounds on the generalization error that depend on local statistics of the stochastic gradients evaluated along the path of iterates calculated by SGD. The key factors our bounds dep…

    Submitted 15 August, 2021; v1 submitted 1 February, 2021; originally announced February 2021.

    Comments: COLT 2021

  21. arXiv:2010.11151  [pdf, other]

    cs.LG cs.AI stat.ML

    Logistic Q-Learning

    Authors: Joan Bas-Serrano, Sebastian Curi, Andreas Krause, Gergely Neu

    Abstract: We propose a new reinforcement learning algorithm derived from a regularized linear-programming formulation of optimal control in MDPs. The method is closely related to the classic Relative Entropy Policy Search (REPS) algorithm of Peters et al. (2010), with the key difference that our method introduces a Q-function that enables efficient exact model-free implementation. The main feature of our al…

    Submitted 25 February, 2021; v1 submitted 21 October, 2020; originally announced October 2020.

  22. arXiv:2007.01891  [pdf, ps, other]

    cs.LG stat.ML

    A Unifying View of Optimism in Episodic Reinforcement Learning

    Authors: Gergely Neu, Ciara Pike-Burke

    Abstract: The principle of optimism in the face of uncertainty underpins many theoretically successful reinforcement learning algorithms. In this paper we provide a general framework for designing, analyzing and implementing such algorithms in the episodic reinforcement learning problem. This framework is built upon Lagrangian duality, and demonstrates that every model-optimistic algorithm that constructs a…

    Submitted 3 July, 2020; originally announced July 2020.

  23. arXiv:2007.01612  [pdf, ps, other]

    cs.LG stat.ML

    Online learning in MDPs with linear function approximation and bandit feedback

    Authors: Gergely Neu, Julia Olkhovskaya

    Abstract: We consider an online learning problem where the learner interacts with a Markov decision process in a sequence of episodes, where the reward function is allowed to change between episodes in an adversarial manner and the learner only gets to observe the rewards associated with its actions. We allow the state space to be arbitrarily large, but we assume that all action-value functions can be repre…

    Submitted 12 June, 2021; v1 submitted 3 July, 2020; originally announced July 2020.

  24. arXiv:2002.00287  [pdf, ps, other]

    cs.LG stat.ML

    Efficient and Robust Algorithms for Adversarial Linear Contextual Bandits

    Authors: Gergely Neu, Julia Olkhovskaya

    Abstract: We consider an adversarial variant of the classic $K$-armed linear contextual bandit problem where the sequence of loss functions associated with each arm is allowed to change without restriction over time. Under the assumption that the $d$-dimensional contexts are generated i.i.d. at random from a known distribution, we develop computationally efficient algorithms based on the classic Exp3 algo…

    Submitted 24 May, 2022; v1 submitted 1 February, 2020; originally announced February 2020.

  25. arXiv:2001.10623  [pdf, ps, other]

    cs.LG math.ST stat.ML

    Fast Rates for Online Prediction with Abstention

    Authors: Gergely Neu, Nikita Zhivotovskiy

    Abstract: In the setting of sequential prediction of individual $\{0, 1\}$-sequences with expert advice, we show that by allowing the learner to abstain from the prediction by paying a cost marginally smaller than $\frac 12$ (say, $0.49$), it is possible to achieve expected regret bounds that are independent of the time horizon $T$. We exactly characterize the dependence on the abstention cost $c$ and the n…

    Submitted 20 June, 2020; v1 submitted 28 January, 2020; originally announced January 2020.

    Comments: 19 pages, minor corrections, to appear in COLT

  26. arXiv:1909.10904  [pdf, ps, other]

    math.OC cs.LG stat.ML

    Faster saddle-point optimization for solving large-scale Markov decision processes

    Authors: Joan Bas-Serrano, Gergely Neu

    Abstract: We consider the problem of computing optimal policies in average-reward Markov decision processes. This classical problem can be formulated as a linear program directly amenable to saddle-point optimization methods, albeit with a number of variables that is linear in the number of states. To address this issue, recent work has considered a linearly relaxed version of the resulting saddle-point pro…

    Submitted 10 January, 2020; v1 submitted 22 September, 2019; originally announced September 2019.

  27. arXiv:1906.07987  [pdf, other]

    cs.LG cs.AI stat.ML

    Adaptive Temporal-Difference Learning for Policy Evaluation with Per-State Uncertainty Estimates

    Authors: Hugo Penedones, Carlos Riquelme, Damien Vincent, Hartmut Maennel, Timothy Mann, Andre Barreto, Sylvain Gelly, Gergely Neu

    Abstract: We consider the core reinforcement-learning problem of on-policy value function approximation from a batch of trajectory data, and focus on various issues of Temporal Difference (TD) learning and Monte Carlo (MC) policy evaluation. The two methods are known to achieve complementary bias-variance trade-off properties, with TD tending to achieve lower variance but potentially higher bias. In this pa…

    Submitted 19 June, 2019; originally announced June 2019.

  28. arXiv:1902.08668  [pdf, other]

    stat.ML cs.LG

    Beating SGD Saturation with Tail-Averaging and Minibatching

    Authors: Nicole Mücke, Gergely Neu, Lorenzo Rosasco

    Abstract: While stochastic gradient descent (SGD) is one of the major workhorses in machine learning, the learning properties of many practically used variants are poorly understood. In this paper, we consider least squares learning in a nonparametric setting and contribute to filling this gap by focusing on the effect and interplay of multiple passes, mini-batching and averaging, and in particular tail ave…

    Submitted 26 May, 2019; v1 submitted 22 February, 2019; originally announced February 2019.

  29. arXiv:1902.03035  [pdf, ps, other]

    cs.LG stat.ML

    Bandit Principal Component Analysis

    Authors: Wojciech Kotłowski, Gergely Neu

    Abstract: We consider a partial-feedback variant of the well-studied online PCA problem where a learner attempts to predict a sequence of $d$-dimensional vectors in terms of a quadratic loss, while only having limited feedback about the environment's choices. We focus on a natural notion of bandit feedback where the learner only observes the loss associated with its own prediction. Based on the classical ob…

    Submitted 8 February, 2019; originally announced February 2019.

  30. Potential and Pitfalls of Multi-Armed Bandits for Decentralized Spatial Reuse in WLANs

    Authors: Francesc Wilhelmi, Sergio Barrachina-Muñoz, Cristina Cano, Boris Bellalta, Anders Jonsson, Gergely Neu

    Abstract: Spatial Reuse (SR) has recently gained attention to maximize the performance of IEEE 802.11 Wireless Local Area Networks (WLANs). Decentralized mechanisms are expected to be key in the development of SR solutions for next-generation WLANs, since many deployments are characterized by being uncoordinated by nature. However, the potential of decentralized mechanisms is limited by the significant lack…

    Submitted 14 December, 2018; v1 submitted 28 May, 2018; originally announced May 2018.

  31. arXiv:1805.11022  [pdf, ps, other]

    cs.LG stat.ML

    Online Influence Maximization with Local Observations

    Authors: Julia Olkhovskaya, Gergely Neu, Gábor Lugosi

    Abstract: We consider an online influence maximization problem in which a decision maker selects a node among a large number of possibilities and places a piece of information at the node. The node transmits the information to some others that are in the same connected component in a random graph. The goal of the decision maker is to reach as many nodes as possible, with the added complication that feedback…

    Submitted 28 May, 2018; originally announced May 2018.

  32. arXiv:1802.08009  [pdf, ps, other]

    cs.LG stat.ML

    Iterate averaging as regularization for stochastic gradient descent

    Authors: Gergely Neu, Lorenzo Rosasco

    Abstract: We propose and analyze a variant of the classic Polyak-Ruppert averaging scheme, broadly used in stochastic gradient methods. Rather than a uniform average of the iterates, we consider a weighted average, with weights decaying in a geometric fashion. In the context of linear least squares regression, we show that this averaging scheme has the same regularizing effect, and indeed is asymptoticall…

    Submitted 22 February, 2018; originally announced February 2018.

  33. arXiv:1802.04327  [pdf, other]

    cs.NI

    Wireless Optimisation via Convex Bandits: Unlicensed LTE/WiFi Coexistence

    Authors: Cristina Cano, Gergely Neu

    Abstract: Bandit Convex Optimisation (BCO) is a powerful framework for sequential decision-making in non-stationary and partially observable environments. In a BCO problem, a decision-maker sequentially picks actions to minimize the cumulative cost associated with these decisions, all while receiving partial feedback about the state of the environment. This formulation is a very natural fit for wireless-net…

    Submitted 5 February, 2018; originally announced February 2018.

    Comments: 11 pages, 5 figures

  34. arXiv:1710.11403  [pdf, other]

    cs.NI

    Collaborative Spatial Reuse in Wireless Networks via Selfish Multi-Armed Bandits

    Authors: Francesc Wilhelmi, Cristina Cano, Gergely Neu, Boris Bellalta, Anders Jonsson, Sergio Barrachina-Muñoz

    Abstract: Next-generation wireless deployments are characterized by being dense and uncoordinated, which often leads to inefficient use of resources and poor performance. To solve this, we envision the utilization of completely decentralized mechanisms to enable Spatial Reuse (SR). In particular, we focus on dynamic channel selection and Transmission Power Control (TPC). We rely on Reinforcement Learning (R…

    Submitted 13 November, 2018; v1 submitted 31 October, 2017; originally announced October 2017.

  35. arXiv:1710.05739  [pdf, ps, other]

    cs.LG math.OC stat.ML

    On the Hardness of Inventory Management with Censored Demand Data

    Authors: Gábor Lugosi, Mihalis G. Markakis, Gergely Neu

    Abstract: We consider a repeated newsvendor problem where the inventory manager has no prior information about the demand, and can access only censored/sales data. In analogy to multi-armed bandit problems, the manager needs to simultaneously "explore" and "exploit" with her inventory decisions, in order to minimize the cumulative cost. We make no probabilistic assumptions---importantly, independence or tim…

    Submitted 16 October, 2017; originally announced October 2017.

  36. arXiv:1705.10257  [pdf, ps, other]

    cs.LG stat.ML

    Boltzmann Exploration Done Right

    Authors: Nicolò Cesa-Bianchi, Claudio Gentile, Gábor Lugosi, Gergely Neu

    Abstract: Boltzmann exploration is a classic strategy for sequential decision-making under uncertainty, and is one of the most standard tools in Reinforcement Learning (RL). Despite its widespread use, there is virtually no theoretical understanding about the limitations or the actual benefits of this exploration scheme. Does it drive exploration in a meaningful way? Is it prone to misidentifying the optima…

    Submitted 7 November, 2017; v1 submitted 29 May, 2017; originally announced May 2017.

  37. arXiv:1705.07798  [pdf, other]

    cs.LG cs.AI stat.ML

    A unified view of entropy-regularized Markov decision processes

    Authors: Gergely Neu, Anders Jonsson, Vicenç Gómez

    Abstract: We propose a general framework for entropy-regularized average-reward reinforcement learning in Markov decision processes (MDPs). Our approach is based on extending the linear-programming formulation of policy optimization in MDPs to accommodate convex regularization functions. Our key result is showing that using the conditional entropy of the joint state-action distributions as regularization yi…

    Submitted 22 May, 2017; originally announced May 2017.

  38. arXiv:1702.08712  [pdf, ps, other]

    stat.ML cs.LG

    Algorithmic stability and hypothesis complexity

    Authors: Tongliang Liu, Gábor Lugosi, Gergely Neu, Dacheng Tao

    Abstract: We introduce a notion of algorithmic stability of learning algorithms, which we term argument stability, that captures stability of the hypothesis output by the learning algorithm in the normed space of functions from which hypotheses are selected. The main result of the paper bounds the generalization error of any learning algorithm in terms of its argument stability. The bounds are based…

    Submitted 3 August, 2017; v1 submitted 28 February, 2017; originally announced February 2017.

  39. arXiv:1702.06341  [pdf, ps, other]

    cs.LG math.OC stat.ML

    Fast rates for online learning in Linearly Solvable Markov Decision Processes

    Authors: Gergely Neu, Vicenç Gómez

    Abstract: We study the problem of online learning in a class of Markov decision processes known as linearly solvable MDPs. In the stationary version of this problem, a learner interacts with its environment by directly controlling the state transitions, attempting to balance a fixed state-dependent cost and a certain smooth cost penalizing extreme control inputs. In the current paper, we consider an online…

    Submitted 6 June, 2017; v1 submitted 21 February, 2017; originally announced February 2017.

  40. arXiv:1506.03271  [pdf, other]

    cs.LG stat.ML

    Explore no more: Improved high-probability regret bounds for non-stochastic bandits

    Authors: Gergely Neu

    Abstract: This work addresses the problem of regret minimization in non-stochastic multi-armed bandit problems, focusing on performance guarantees that hold with high probability. Such results are rather scarce in the literature since proving them requires a great deal of technical effort and significant modifications to the standard, more intuitive algorithms that come only with guarantees that hold on exp…

    Submitted 3 November, 2015; v1 submitted 10 June, 2015; originally announced June 2015.

    Comments: To appear at NIPS 2015

  41. arXiv:1503.05087  [pdf, ps, other]

    cs.LG stat.ML

    Importance weighting without importance weights: An efficient algorithm for combinatorial semi-bandits

    Authors: Gergely Neu, Gábor Bartók

    Abstract: We propose a sample-efficient alternative to importance weighting for situations where one only has sample access to the probability distribution that generates the observations. Our new method, called Geometric Resampling (GR), is described and analyzed in the context of online combinatorial optimization under semi-bandit feedback, where a learner sequentially selects its actions from a combinat…

    Submitted 31 August, 2016; v1 submitted 17 March, 2015; originally announced March 2015.

    Comments: To appear in JMLR

  42. arXiv:1502.06354  [pdf, ps, other]

    cs.LG stat.ML

    First-order regret bounds for combinatorial semi-bandits

    Authors: Gergely Neu

    Abstract: We consider the problem of online combinatorial optimization under semi-bandit feedback, where a learner has to repeatedly pick actions from a combinatorial decision set in order to minimize the total losses associated with its decisions. After making each decision, the learner observes the losses associated with its action, but not other losses. For this problem, there are several learning algori…

    Submitted 10 June, 2015; v1 submitted 23 February, 2015; originally announced February 2015.

    Comments: To appear at COLT 2015

  43. arXiv:1406.6812  [pdf, other]

    cs.LG stat.ML

    Online learning in MDPs with side information

    Authors: Yasin Abbasi-Yadkori, Gergely Neu

    Abstract: We study online learning of finite Markov decision process (MDP) problems when a side information vector is available. The problem is motivated by applications such as clinical trials, recommendation systems, etc. Such applications have an episodic structure, where each episode corresponds to a patient/customer. Our objective is to compete with the optimal dynamic policy that can take side informa…

    Submitted 26 June, 2014; originally announced June 2014.

  44. arXiv:1305.2732  [pdf, ps, other]

    cs.LG

    An efficient algorithm for learning with semi-bandit feedback

    Authors: Gergely Neu, Gábor Bartók

    Abstract: We consider the problem of online combinatorial optimization under semi-bandit feedback. The goal of the learner is to sequentially select its actions from a combinatorial decision set so as to minimize its cumulative loss. We propose a learning algorithm for this problem based on combining the Follow-the-Perturbed-Leader (FPL) prediction method with a novel loss estimation procedure called Geomet…

    Submitted 13 May, 2013; originally announced May 2013.

    Comments: submitted to ALT 2013

  45. arXiv:1302.5797  [pdf, ps, other]

    cs.LG

    Prediction by Random-Walk Perturbation

    Authors: Luc Devroye, Gábor Lugosi, Gergely Neu

    Abstract: We propose a version of the follow-the-perturbed-leader online prediction algorithm in which the cumulative losses are perturbed by independent symmetric random walks. The forecaster is shown to achieve an expected regret of the optimal order $O(\sqrt{n \log N})$ where $n$ is the time horizon and $N$ is the number of experts. More importantly, it is shown that the forecaster changes its prediction at most…

    Submitted 23 February, 2013; originally announced February 2013.

  46. arXiv:1206.5264  [pdf]

    cs.LG stat.ML

    Apprenticeship Learning using Inverse Reinforcement Learning and Gradient Methods

    Authors: Gergely Neu, Csaba Szepesvari

    Abstract: In this paper we propose a novel gradient algorithm to learn a policy from an expert's observed behavior assuming that the expert behaves optimally with respect to some unknown reward function of a Markovian Decision Problem. The algorithm's aim is to find a reward function such that the resulting optimal policy matches well the expert's observed behavior. The main difficulty is that the mapping f…

    Submitted 20 June, 2012; originally announced June 2012.

    Comments: Appears in Proceedings of the Twenty-Third Conference on Uncertainty in Artificial Intelligence (UAI2007)

    Report number: UAI-P-2007-PG-295-302