Recent evidence indicates that humans can precisely predict the outcome of occluded actions. It has been suggested that these predictions arise from a mental simulation that might run in real time. The present experiments aimed to specify the time course of this simulation process. Participants watched transiently occluded point-light actions, and the temporal outcome after occlusion was manipulated. Participants were instructed to judge the temporal coherence of the action after a short (Experiment 1) or a long (Experiment 2) occlusion period. Both experiments revealed a comparable negative point of subjective equality (PSE), indicating that action simulation consistently took longer than the observed action itself. No such temporal error was present when inverted actions were used (Experiment 3), ruling out a purely visually driven effect. The results suggest that the temporal error is due to costs arising from a switch from action perception to an internal simulation process involving motor representations.