Recurrent Neural-Linear Posterior Sampling for Nonstationary Contextual Bandits

Ramesh, Aditya; Rauber, Paulo; Conserva, Michelangelo; Schmidhuber, Jürgen

doi:10.1162/neco_a_01539

arXiv:2007.04750 (cs)

[Submitted on 9 Jul 2020 (v1), last revised 3 Nov 2023 (this version, v2)]

Abstract: An agent in a nonstationary contextual bandit problem should balance between exploration and the exploitation of (periodic or structured) patterns present in its previous experiences. Handcrafting an appropriate historical context is an attractive alternative to transform a nonstationary problem into a stationary problem that can be solved efficiently. However, even a carefully designed historical context may introduce spurious relationships or lack a convenient representation of crucial information. In order to address these issues, we propose an approach that learns to represent the relevant context for a decision based solely on the raw history of interactions between the agent and the environment. This approach relies on a combination of features extracted by recurrent neural networks with a contextual linear bandit algorithm based on posterior sampling. Our experiments on a diverse selection of contextual and noncontextual nonstationary problems show that our recurrent approach consistently outperforms its feedforward counterpart, which requires handcrafted historical contexts, while being more widely applicable than conventional nonstationary bandit algorithms. Although it is very difficult to provide theoretical performance guarantees for our new approach, we also prove a novel regret bound for linear posterior sampling with measurement error that may serve as a foundation for future theoretical work.
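
Illustrative sketch (not the authors' implementation): the abstract combines learned features with a contextual linear bandit algorithm based on posterior sampling. The Python code below shows only that second component in a simplified form, assuming a Gaussian prior and a known noise variance for each arm. The class name `LinearPosteriorSampling`, its parameters, and the `features` vector are hypothetical; in the paper the features would come from a recurrent network applied to the interaction history, which is not modeled here.

```python
import numpy as np


class LinearPosteriorSampling:
    """Per-arm Bayesian linear regression with posterior (Thompson) sampling.

    Simplifying assumptions (not taken from the paper): zero-mean Gaussian
    prior with variance `prior_var` and known reward noise variance `noise_var`.
    """

    def __init__(self, n_arms, n_features, prior_var=1.0, noise_var=1.0, seed=0):
        self.rng = np.random.default_rng(seed)
        self.noise_var = noise_var
        # One posterior precision matrix and one precision-weighted mean per arm.
        self.precision = np.stack(
            [np.eye(n_features) / prior_var for _ in range(n_arms)]
        )
        self.b = np.zeros((n_arms, n_features))

    def posterior_mean(self, arm):
        # Solve precision @ mean = b for the posterior mean of this arm.
        return np.linalg.solve(self.precision[arm], self.b[arm])

    def select(self, features):
        """Sample weights from each arm's posterior and act greedily on the sample."""
        sampled_rewards = []
        for arm in range(len(self.b)):
            cov = np.linalg.inv(self.precision[arm])
            cov = (cov + cov.T) / 2.0  # guard against numerical asymmetry
            weights = self.rng.multivariate_normal(self.posterior_mean(arm), cov)
            sampled_rewards.append(features @ weights)
        return int(np.argmax(sampled_rewards))

    def update(self, arm, features, reward):
        """Conjugate Gaussian update for the arm that was played."""
        self.precision[arm] += np.outer(features, features) / self.noise_var
        self.b[arm] += features * reward / self.noise_var


# Toy interaction loop with a stand-in feature vector and a made-up reward rule.
agent = LinearPosteriorSampling(n_arms=2, n_features=2)
env_rng = np.random.default_rng(1)
for _ in range(100):
    features = np.array([1.0, env_rng.normal()])  # placeholder for learned features
    arm = agent.select(features)
    reward = (0.8 if arm == 1 else 0.2) + 0.1 * env_rng.normal()
    agent.update(arm, features, reward)
```

Sampling weights from the posterior, rather than acting on the posterior mean, is what supplies exploration: arms with few observations keep a wide posterior and are still chosen occasionally.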

Subjects:	Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as:	arXiv:2007.04750 [cs.LG]
	(or arXiv:2007.04750v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2007.04750
Journal reference:	Neural Computation. 2022 Oct 7;34(11):2232-72
Related DOI:	https://doi.org/10.1162/neco_a_01539

Submission history

From: Aditya Ramesh
[v1] Thu, 9 Jul 2020 12:46:51 UTC (4,770 KB)
[v2] Fri, 3 Nov 2023 11:12:12 UTC (7,056 KB)