Search | arXiv e-print repository

Logarithmic Smoothing for Pessimistic Off-Policy Evaluation, Selection and Learning

Authors: Otmane Sakhi, Imad Aouali, Pierre Alquier, Nicolas Chopin

Abstract: This work investigates the offline formulation of the contextual bandit problem, where the goal is to leverage past interactions collected under a behavior policy to evaluate, select, and learn new, potentially better-performing, policies. Motivated by critical applications, we move beyond point estimators. Instead, we adopt the principle of pessimism where we construct upper bounds that assess a… ▽ More This work investigates the offline formulation of the contextual bandit problem, where the goal is to leverage past interactions collected under a behavior policy to evaluate, select, and learn new, potentially better-performing, policies. Motivated by critical applications, we move beyond point estimators. Instead, we adopt the principle of pessimism where we construct upper bounds that assess a policy's worst-case performance, enabling us to confidently select and learn improved policies. Precisely, we introduce novel, fully empirical concentration bounds for a broad class of importance weighting risk estimators. These bounds are general enough to cover most existing estimators and pave the way for the development of new ones. In particular, our pursuit of the tightest bound within this class motivates a novel estimator (LS), that logarithmically smooths large importance weights. The bound for LS is provably tighter than all its competitors, and naturally results in improved policy selection and learning strategies. Extensive policy evaluation, selection, and learning experiments highlight the versatility and favorable performance of LS. △ Less

Submitted 23 May, 2024; originally announced May 2024.

arXiv:2308.01566 [pdf, other]

Fast Slate Policy Optimization: Going Beyond Plackett-Luce

Authors: Otmane Sakhi, David Rohde, Nicolas Chopin

Abstract: An increasingly important building block of large scale machine learning systems is based on returning slates; an ordered lists of items given a query. Applications of this technology include: search, information retrieval and recommender systems. When the action space is large, decision systems are restricted to a particular structure to complete online queries quickly. This paper addresses the o… ▽ More An increasingly important building block of large scale machine learning systems is based on returning slates; an ordered lists of items given a query. Applications of this technology include: search, information retrieval and recommender systems. When the action space is large, decision systems are restricted to a particular structure to complete online queries quickly. This paper addresses the optimization of these large scale decision systems given an arbitrary reward function. We cast this learning problem in a policy optimization framework and propose a new class of policies, born from a novel relaxation of decision functions. This results in a simple, yet efficient learning algorithm that scales to massive action spaces. We compare our method to the commonly adopted Plackett-Luce policy class and demonstrate the effectiveness of our approach on problems with action space sizes in the order of millions. △ Less

Submitted 29 December, 2023; v1 submitted 3 August, 2023; originally announced August 2023.

Comments: Transactions on Machine Learning Research

arXiv:2306.15422 [pdf, other]

Debiasing Piecewise Deterministic Markov Process samplers using couplings

Authors: Adrien Corenflos, Matthew Sutton, Nicolas Chopin

Abstract: Monte Carlo methods - such as Markov chain Monte Carlo (MCMC) and piecewise deterministic Markov process (PDMP) samplers - provide asymptotically exact estimators of expectations under a target distribution. There is growing interest in alternatives to this asymptotic regime, in particular in constructing estimators that are exact in the limit of an infinite amount of computing processors, rather… ▽ More Monte Carlo methods - such as Markov chain Monte Carlo (MCMC) and piecewise deterministic Markov process (PDMP) samplers - provide asymptotically exact estimators of expectations under a target distribution. There is growing interest in alternatives to this asymptotic regime, in particular in constructing estimators that are exact in the limit of an infinite amount of computing processors, rather than in the limit of an infinite number of Markov iterations. In particular, Jacob et al. (2020) introduced coupled MCMC estimators to remove the non-asymptotic bias, resulting in MCMC estimators that can be embarrassingly parallelised. In this work, we extend the estimators of Jacob et al. (2020) to the continuous-time context and derive couplings for the bouncy, the boomerang and the coordinate samplers. Some preliminary empirical results are included that demonstrate the reasonable scaling of our method with the dimension of the target. △ Less

Submitted 27 June, 2023; originally announced June 2023.

Comments: 30 pages, 3 figures. This is a preliminary version which does not include all the experiments

arXiv:2210.13132 [pdf, other]

PAC-Bayesian Offline Contextual Bandits With Guarantees

Authors: Otmane Sakhi, Pierre Alquier, Nicolas Chopin

Abstract: This paper introduces a new principled approach for off-policy learning in contextual bandits. Unlike previous work, our approach does not derive learning principles from intractable or loose bounds. We analyse the problem through the PAC-Bayesian lens, interpreting policies as mixtures of decision rules. This allows us to propose novel generalization bounds and provide tractable algorithms to opt… ▽ More This paper introduces a new principled approach for off-policy learning in contextual bandits. Unlike previous work, our approach does not derive learning principles from intractable or loose bounds. We analyse the problem through the PAC-Bayesian lens, interpreting policies as mixtures of decision rules. This allows us to propose novel generalization bounds and provide tractable algorithms to optimize them. We prove that the derived bounds are tighter than their competitors, and can be optimized directly to confidently improve upon the logging policy offline. Our approach learns policies with guarantees, uses all available data and does not require tuning additional hyperparameters on held-out sets. We demonstrate through extensive experiments the effectiveness of our approach in providing performance guarantees in practical scenarios. △ Less

Submitted 27 May, 2023; v1 submitted 24 October, 2022; originally announced October 2022.

Comments: Accepted to ICML 2023

arXiv:2206.03369 [pdf, other]

Computational Doob's h-transforms for Online Filtering of Discretely Observed Diffusions

Authors: Nicolas Chopin, Andras Fulop, Jeremy Heng, Alexandre H. Thiery

Abstract: This paper is concerned with online filtering of discretely observed nonlinear diffusion processes. Our approach is based on the fully adapted auxiliary particle filter, which involves Doob's $h$-transforms that are typically intractable. We propose a computational framework to approximate these $h$-transforms by solving the underlying backward Kolmogorov equations using nonlinear Feynman-Kac form… ▽ More This paper is concerned with online filtering of discretely observed nonlinear diffusion processes. Our approach is based on the fully adapted auxiliary particle filter, which involves Doob's $h$-transforms that are typically intractable. We propose a computational framework to approximate these $h$-transforms by solving the underlying backward Kolmogorov equations using nonlinear Feynman-Kac formulas and neural networks. The methodology allows one to train a locally optimal particle filter prior to the data-assimilation procedure. Numerical experiments illustrate that the proposed approach can be orders of magnitude more efficient than state-of-the-art particle filters in the regime of highly informative observations, when the observations are extreme under the model, or if the state dimension is large. △ Less

Submitted 30 May, 2023; v1 submitted 7 June, 2022; originally announced June 2022.

Comments: 20 pages

Journal ref: ICML 2023

arXiv:2202.02264 [pdf, ps, other]

De-Sequentialized Monte Carlo: a parallel-in-time particle smoother

Authors: Adrien Corenflos, Nicolas Chopin, Simo Särkkä

Abstract: Particle smoothers are SMC (Sequential Monte Carlo) algorithms designed to approximate the joint distribution of the states given observations from a state-space model. We propose dSMC (de-Sequentialized Monte Carlo), a new particle smoother that is able to process $T$ observations in $\mathcal{O}(\log T)$ time on parallel architecture. This compares favourably with standard particle smoothers, th… ▽ More Particle smoothers are SMC (Sequential Monte Carlo) algorithms designed to approximate the joint distribution of the states given observations from a state-space model. We propose dSMC (de-Sequentialized Monte Carlo), a new particle smoother that is able to process $T$ observations in $\mathcal{O}(\log T)$ time on parallel architecture. This compares favourably with standard particle smoothers, the complexity of which is linear in $T$. We derive $\mathcal{L}_p$ convergence results for dSMC, with an explicit upper bound, polynomial in $T$. We then discuss how to reduce the variance of the smoothing estimates computed by dSMC by (i) designing good proposal distributions for sampling the particles at the initialization of the algorithm, as well as by (ii) using lazy resampling to increase the number of particles used in dSMC. Finally, we design a particle Gibbs sampler based on dSMC, which is able to perform parameter inference in a state-space model at a $\mathcal{O}(\log(T))$ cost on parallel hardware. △ Less

Submitted 4 February, 2022; originally announced February 2022.

Comments: 31 pages, 6 figures

arXiv:1702.00564 [pdf, other]

Modelling dependency completion in sentence comprehension as a Bayesian hierarchical mixture process: A case study involving Chinese relative clauses

Authors: Shravan Vasishth, Nicolas Chopin, Robin Ryder, Bruno Nicenboim

Abstract: We present a case-study demonstrating the usefulness of Bayesian hierarchical mixture modelling for investigating cognitive processes. In sentence comprehension, it is widely assumed that the distance between linguistic co-dependents affects the latency of dependency resolution: the longer the distance, the longer the retrieval time (the distance-based account). An alternative theory, direct-acces… ▽ More We present a case-study demonstrating the usefulness of Bayesian hierarchical mixture modelling for investigating cognitive processes. In sentence comprehension, it is widely assumed that the distance between linguistic co-dependents affects the latency of dependency resolution: the longer the distance, the longer the retrieval time (the distance-based account). An alternative theory, direct-access, assumes that retrieval times are a mixture of two distributions: one distribution represents successful retrievals (these are independent of dependency distance) and the other represents an initial failure to retrieve the correct dependent, followed by a reanalysis that leads to successful retrieval. We implement both models as Bayesian hierarchical models and show that the direct-access model explains Chinese relative clause reading time data better than the distance account. △ Less

Submitted 5 May, 2017; v1 submitted 2 February, 2017; originally announced February 2017.

Comments: 6 pages, 2 figures. To appear in the Proceedings of the Cognitive Science Conference 2017, London, UK

arXiv:1211.5901 [pdf, ps, other]

Bayesian learning of noisy Markov decision processes

Authors: Sumeetpal S. Singh, Nicolas Chopin, Nick Whiteley

Abstract: We consider the inverse reinforcement learning problem, that is, the problem of learning from, and then predicting or mimicking a controller based on state/action data. We propose a statistical model for such data, derived from the structure of a Markov decision process. Adopting a Bayesian approach to inference, we show how latent variables of the model can be estimated, and how predictions about… ▽ More We consider the inverse reinforcement learning problem, that is, the problem of learning from, and then predicting or mimicking a controller based on state/action data. We propose a statistical model for such data, derived from the structure of a Markov decision process. Adopting a Bayesian approach to inference, we show how latent variables of the model can be estimated, and how predictions about actions can be made, in a unified framework. A new Markov chain Monte Carlo (MCMC) sampler is devised for simulation from the posterior distribution. This step includes a parameter expansion step, which is shown to be essential for good convergence properties of the MCMC sampler. As an illustration, the method is applied to learning a human controller. △ Less

Submitted 26 November, 2012; originally announced November 2012.

arXiv:1205.4304 [pdf, ps, other]

In praise of the referee

Authors: Nicolas Chopin, Andrew Gelman, Kerrie L. Mengersen, Christian P. Robert

Abstract: There has been a lively debate in many fields, including statistics and related applied fields such as psychology and biomedical research, on possible reforms of the scholarly publishing system. Currently, referees contribute so much to improve scientific papers, both directly through constructive criticism and indirectly through the threat of rejection. We discuss ways in which new approaches to… ▽ More There has been a lively debate in many fields, including statistics and related applied fields such as psychology and biomedical research, on possible reforms of the scholarly publishing system. Currently, referees contribute so much to improve scientific papers, both directly through constructive criticism and indirectly through the threat of rejection. We discuss ways in which new approaches to journal publication could continue to make use of the valuable efforts of peer reviewers. △ Less

Submitted 19 May, 2012; originally announced May 2012.

Comments: 13 pages

Showing 1–9 of 9 results for author: Chopin, N