Model-Free Risk-Sensitive Reinforcement Learning

Delétang, Grégoire; Grau-Moya, Jordi; Kunesch, Markus; Genewein, Tim; Brekelmans, Rob; Legg, Shane; Ortega, Pedro A.

arXiv:2111.02907 (cs)

[Submitted on 4 Nov 2021]

Abstract: We extend temporal-difference (TD) learning to obtain risk-sensitive, model-free reinforcement learning algorithms. This extension can be regarded as a modification of the Rescorla-Wagner rule, where the (sigmoidal) stimulus is taken to be the event of either over- or underestimating the TD target. As a result, one obtains a stochastic approximation rule for estimating the free energy from i.i.d. samples generated by a Gaussian distribution with unknown mean and variance. Since the Gaussian free energy is known to be a certainty equivalent sensitive to both the mean and the variance, the learning rule has applications in risk-sensitive decision-making.
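
As an illustration of the kind of rule the abstract describes, the following is a minimal sketch and not necessarily the exact update used in the paper. It assumes the update v ← v + 2α·σ(β·δ)·δ, where δ = x − v is the prediction error, σ is the logistic sigmoid, and β is a risk parameter; the function names, the step-size schedule, and the parameter values are all hypothetical choices made for this example. For β = 0 the sigmoid equals 1/2 and the rule reduces to the usual error-correcting update v ← v + α·δ. For Gaussian samples with mean μ and standard deviation s, the fixed point of this assumed rule is the Gaussian free energy (1/β)·log E[exp(β·x)] = μ + β·s²/2.

```python
import math
import random


def sigmoid(z: float) -> float:
    """Logistic sigmoid, written via tanh so large |z| cannot overflow."""
    return 0.5 * (1.0 + math.tanh(0.5 * z))


def gaussian_free_energy(mean: float, std: float, beta: float) -> float:
    """Closed form of (1/beta) * log E[exp(beta * x)] for x ~ N(mean, std^2)."""
    return mean + 0.5 * beta * std ** 2


def estimate_free_energy(samples, beta: float, v0: float = 0.0) -> float:
    """Stochastic-approximation estimate of the free energy (assumed rule).

    Each prediction error delta = x - v is weighted by a sigmoid of
    beta * delta. With beta < 0, overestimates (delta < 0) get more
    weight, which pulls the estimate below the mean (risk-averse).
    """
    v = v0
    for t, x in enumerate(samples):
        alpha = 0.5 * (t + 1) ** -0.7  # decaying step size (illustrative choice)
        delta = x - v
        v += 2.0 * alpha * sigmoid(beta * delta) * delta
    return v


if __name__ == "__main__":
    rng = random.Random(0)
    mean, std, beta = 2.0, 1.0, -1.0
    samples = [rng.gauss(mean, std) for _ in range(200_000)]
    print("estimate:   ", estimate_free_energy(samples, beta))
    print("closed form:", gaussian_free_energy(mean, std, beta))
```

With a negative β the estimate settles below the sample mean, and with a positive β above it, which is the sense in which the free energy acts as a certainty equivalent that depends on both mean and variance.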

Comments:	DeepMind Tech Report: 13 pages, 4 figures
Subjects:	Machine Learning (cs.LG)
Cite as:	arXiv:2111.02907 [cs.LG]
	(or arXiv:2111.02907v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2111.02907

Submission history

From: Pedro Alejandro Ortega
[v1] Thu, 4 Nov 2021 14:27:46 UTC (2,101 KB)
