Model-free Posterior Sampling via Learning Rate Randomization

Tiapkin, Daniil; Belomestny, Denis; Calandriello, Daniele; Moulines, Eric; Munos, Remi; Naumov, Alexey; Perrault, Pierre; Valko, Michal; Menard, Pierre

arXiv:2310.18186 (stat)

[Submitted on 27 Oct 2023]

Abstract: In this paper, we introduce Randomized Q-learning (RandQL), a novel randomized model-free algorithm for regret minimization in episodic Markov Decision Processes (MDPs). To the best of our knowledge, RandQL is the first tractable model-free posterior sampling-based algorithm. We analyze the performance of RandQL in both tabular and non-tabular metric space settings. In tabular MDPs, RandQL achieves a regret bound of order $\widetilde{\mathcal{O}}(\sqrt{H^{5}SAT})$, where $H$ is the planning horizon, $S$ is the number of states, $A$ is the number of actions, and $T$ is the number of episodes. For a metric state-action space, RandQL enjoys a regret bound of order $\widetilde{\mathcal{O}}(H^{5/2} T^{(d_z+1)/(d_z+2)})$, where $d_z$ denotes the zooming dimension. Notably, RandQL achieves optimistic exploration without using bonuses, relying instead on a novel idea of learning rate randomization. Our empirical study shows that RandQL outperforms existing approaches on baseline exploration environments.
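
To make the idea of learning rate randomization concrete, here is a minimal tabular sketch. The abstract does not state the update rule, so everything specific below is an assumption made for illustration: the function name `randql_sketch`, the ensemble of Q-tables combined by a maximum, the Beta(H + 1, n + n0) step-size distribution, and the pseudo-count `n0` are hypothetical choices, not the authors' specification. The only feature taken from the abstract is that exploration comes from random step sizes and no bonus term is added to the target.

```python
import random


def randql_sketch(P, R, H, episodes, n_ensemble=8, n0=1, seed=0):
    """Toy episodic Q-learning with random step sizes (illustrative only).

    P[s][a] is a list of next-state probabilities, R[s][a] a reward in [0, 1].
    The Beta step sizes, ensemble size and pseudo-count n0 are assumptions.
    """
    rng = random.Random(seed)
    S, A = len(R), len(R[0])
    # Q[j][h][s][a]: start each table at the largest return attainable from step h.
    Q = [[[[float(H - h)] * A for _ in range(S)] for h in range(H)]
         for _ in range(n_ensemble)]
    counts = [[[0] * A for _ in range(S)] for _ in range(H)]

    def policy_q(h, s, a):
        # Value used for acting: maximum over the ensemble members.
        return max(Q[j][h][s][a] for j in range(n_ensemble))

    returns = []
    for _ in range(episodes):
        s, total = 0, 0.0
        for h in range(H):
            a = max(range(A), key=lambda b: policy_q(h, s, b))
            r = R[s][a]
            s_next = rng.choices(range(S), weights=P[s][a])[0]
            # Bootstrapped value of the next state (zero after the last step).
            v_next = 0.0 if h == H - 1 else max(
                policy_q(h + 1, s_next, b) for b in range(A))
            n = counts[h][s][a]
            counts[h][s][a] = n + 1
            for j in range(n_ensemble):
                # Random step size instead of a fixed schedule; no bonus is added.
                w = rng.betavariate(H + 1, n + n0)
                Q[j][h][s][a] = (1 - w) * Q[j][h][s][a] + w * (r + v_next)
            total += r
            s = s_next
        returns.append(total)
    return Q, returns
```

A call such as `randql_sketch(P, R, H=3, episodes=50)` on a small two-state MDP returns the ensemble of Q-tables and the per-episode returns. Each ensemble member draws its own step size, so the tables disagree most on rarely visited state-action pairs, and acting on their maximum favours those pairs. That is one plausible way randomness can stand in for an explicit bonus.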

Comments:	NeurIPS-2023
Subjects:	Machine Learning (stat.ML); Machine Learning (cs.LG)
Cite as:	arXiv:2310.18186 [stat.ML]
	(or arXiv:2310.18186v1 [stat.ML] for this version)
	https://doi.org/10.48550/arXiv.2310.18186

Submission history

From: Daniil Tiapkin
[v1] Fri, 27 Oct 2023 14:59:44 UTC (24,706 KB)
