Neutron reflectometry has long been a powerful tool for studying the interfacial properties of energy materials. Recently, time-resolved neutron reflectometry has been used to better understand transient phenomena in electrochemical systems. These measurements often comprise a large number of reflectivity curves acquired over a narrow q range, so each individual curve carries less information than a typical steady-state measurement. In this work, we present an approach that uses existing reinforcement learning tools to model time-resolved data and extract the time evolution of structure parameters. By treating the reflectivity curves taken at different times as individual states, we use the Soft Actor-Critic algorithm to optimize the time series of structure parameters that best represent the evolution of an electrochemical system. We show that this approach constitutes an elegant solution to the modeling of time-resolved neutron reflectometry data.
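
As a rough illustration of how time-resolved reflectivity curves could be cast as a reinforcement learning problem, the sketch below defines a toy episodic environment in which each measured curve corresponds to one step, the action is a proposed structure parameter, and the reward is the negative reduced chi-squared of the fit. Everything here is an assumption made for illustration and not the implementation used in this work: the class `TimeResolvedEnv`, the helper `episode_return`, the single-interface Fresnel model (a real system would need a multilayer model), the use of normalized time as the state, and the synthetic data. An off-the-shelf Soft Actor-Critic agent would then be trained to maximize the episode return; that training loop is omitted.

```python
import cmath
import math


def fresnel_reflectivity(q, sld):
    """Reflectivity of one sharp interface between vacuum and a substrate.

    q is the wavevector transfer in 1/Angstrom, sld the scattering length
    density in 1/Angstrom^2. This is a deliberately simple stand-in model.
    """
    k0 = q / 2.0
    k1 = cmath.sqrt(k0 * k0 - 4.0 * math.pi * sld)
    r = (k0 - k1) / (k0 + k1)
    return abs(r) ** 2


class TimeResolvedEnv:
    """Hypothetical environment: one step per measured reflectivity curve."""

    def __init__(self, q_values, curves, errors):
        self.q_values = q_values
        self.curves = curves    # curves[t][i]: reflectivity at time t, q_i
        self.errors = errors    # errors[t][i]: uncertainty on curves[t][i]
        self.t = 0

    def reset(self):
        self.t = 0
        return 0.0  # state: normalized time of the first curve

    def step(self, sld):
        """Score a proposed SLD against the current curve, then advance."""
        data, err = self.curves[self.t], self.errors[self.t]
        chi2 = sum(
            ((fresnel_reflectivity(q, sld) - r) / e) ** 2
            for q, r, e in zip(self.q_values, data, err)
        ) / len(data)
        self.t += 1
        done = self.t >= len(self.curves)
        state = 1.0 if done else self.t / len(self.curves)
        return state, -chi2, done


def episode_return(env, sld_series):
    """Sum of rewards when one SLD value is proposed per curve."""
    env.reset()
    total = 0.0
    for sld in sld_series:
        _, reward, _ = env.step(sld)
        total += reward
    return total


# Synthetic, noise-free data: the SLD grows linearly over five time steps.
q_values = [0.01 + 0.002 * i for i in range(20)]
true_slds = [2.0e-6 + 0.5e-6 * t for t in range(5)]
curves = [[fresnel_reflectivity(q, s) for q in q_values] for s in true_slds]
errors = [[0.05 * r for r in curve] for curve in curves]

env = TimeResolvedEnv(q_values, curves, errors)
print(episode_return(env, true_slds))      # true time series: no misfit
print(episode_return(env, [2.0e-6] * 5))   # static guess: penalized
```

In this toy setting the true parameter time series yields the highest possible return, while a guess that ignores the time evolution is penalized at every step after the first. This is the signal an agent would exploit when searching for the parameter trajectory.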