When people must decide how long to keep waiting for a delayed reward that will arrive at an uncertain time, the distribution of possible reward times dictates which strategy maximizes reward. When the reward timing distribution is heavy-tailed (e.g., waiting on hold), there is a point at which waiting is no longer advantageous because the opportunity cost of waiting is too high. In contrast, when reward timing is more predictable (e.g., uniformly distributed), it is advantageous to wait as long as necessary for the reward. Although people learn to approximate optimal strategies, little is known about how this learning occurs. One possibility is that people learn a general cognitive representation of the probability distribution that governs reward timing and then infer a strategy from that model of the environment. Another possibility is that they learn an action policy in a way that depends more narrowly on direct task experience, such that general knowledge of the reward timing distribution is insufficient for expressing the optimal strategy. Here, in a series of studies in which participants decided how long to persist for delayed rewards before quitting, we provided participants with information about the reward timing distribution in several ways. Whether the information was provided through counterfactual feedback (Study 1), previous exposure (Studies 2a and 2b), or description (Studies 3a and 3b), it did not obviate the need for direct, feedback-driven learning in a decision context. Therefore, learning when to quit waiting for delayed rewards might depend on task-specific experience, not solely on probabilistic reasoning.
Keywords: Decision making; Delay of gratification; Learning; Persistence; Reward timing.
Copyright © 2023 The Authors. Published by Elsevier B.V. All rights reserved.
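
The contrast drawn in the abstract between heavy-tailed and uniform reward timing can be illustrated with a small numerical sketch. It is not the authors' task or analysis. The distributions (a uniform delay on 0 to 12 s and a Lomax-type heavy-tailed delay), the reward size, the 2 s inter-trial interval, and the function names `reward_rate` and `best_give_up_time` are all assumptions chosen for illustration. The sketch evaluates a simple "quit after T seconds" policy and reports which T gives the highest long-run reward rate under each distribution.

```python
import numpy as np


def reward_rate(delays, give_up_time, reward=1.0, iti=2.0):
    """Long-run reward per second for a 'quit after give_up_time' policy.

    delays: representative reward delays (seconds), assumed equally likely.
    iti: inter-trial interval paid after every trial (illustrative value).
    """
    delays = np.asarray(delays, dtype=float)
    # Probability that the reward arrives before the agent quits.
    p_reward = np.mean(delays <= give_up_time)
    # Average time spent waiting per trial (rewarded or abandoned).
    mean_wait = np.mean(np.minimum(delays, give_up_time))
    return reward * p_reward / (mean_wait + iti)


def best_give_up_time(delays, candidates):
    """Return the candidate give-up time with the highest reward rate."""
    rates = np.array([reward_rate(delays, t) for t in candidates])
    return float(candidates[int(np.argmax(rates))])


# Evenly spaced probabilities turn each quantile function into a
# deterministic stand-in for a large random sample.
p = (np.arange(100_000) + 0.5) / 100_000

# Assumed uniform delays on [0, 12] seconds.
uniform_delays = 12.0 * p

# Assumed heavy-tailed (Lomax) delays: scale 2 s, shape 1.
scale, shape = 2.0, 1.0
heavy_delays = scale * ((1.0 - p) ** (-1.0 / shape) - 1.0)

candidates = np.arange(0.5, 12.5, 0.5)  # give-up times from 0.5 s to 12 s
best_uniform = best_give_up_time(uniform_delays, candidates)
best_heavy = best_give_up_time(heavy_delays, candidates)

print("best give-up time, uniform delays:", best_uniform)
print("best give-up time, heavy-tailed delays:", best_heavy)
```

Under these assumed parameters, the uniform case favors the longest give-up time considered (waiting as long as necessary), whereas the heavy-tailed case favors quitting at an intermediate time, because long waits become increasingly unlikely to pay off soon.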