A new study in reinforcement learning theory shows that extending the temporal difference learning algorithm to support unbiased learning under state uncertainty explains the observed ramping behaviour of dopamine neurons.
Crown Copyright © 2022. Published by Elsevier Inc. All rights reserved.