Dry weight (DW), defined as the lowest tolerated postdialysis weight following ultrafiltration (UF) of excess fluid volume, is essential to any dialysis prescription for hemodialysis (HD) patients. However, there is no gold standard for DW assessment, and accurate assessment is further complicated by individual variation and by dynamic changes stemming from the uncertainty of patients' conditions. As a result, the current empirical evaluation process is often crude, imprecise, experience-dependent, and labor-intensive. Here, we emphasize personalized, dynamic changes in DW over time rather than more accurate point-in-time DW estimates, and we formulate DW evaluation as a sequential decision-making problem using the Markov decision process (MDP) framework. A reinforcement learning (RL) algorithm based on a dueling double deep Q-network (Duel-DDQN) is proposed to optimize the DW assessment policy, and a multifaceted inspection is applied to assess the policy's effectiveness and safety. We use ten years of data from the Kidney Disease Center, covering 750 HD patients and 243,287 dialysis sessions. Good model calibration is confirmed, and off-policy evaluation shows that our policy outperforms alternative policies, suggesting a 7.71% decrease in the expected 5-year mortality rate and a 13.44% decrease in the incidence of intradialytic symptoms relative to the clinicians' strategy. The RL policy adjusts DW more frequently, responds to DW changes more actively, and observes a larger feature space. We hope the proposed solution will help clinicians assess and monitor DW dynamically, making the estimation process more refined, personalized, and intelligent.
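For readers unfamiliar with the two components named above, the sketch below illustrates how a dueling architecture and a double-DQN target update fit together. It is a minimal illustration only, assuming a fixed-length per-session state vector and a small discrete action space of DW-adjustment steps; all names, dimensions, and hyperparameters are hypothetical and are not the paper's implementation.

```python
# Minimal sketch of a dueling double DQN (Duel-DDQN) update in PyTorch.
# Assumptions (not from the paper): state_dim-sized session features, a
# discrete set of DW-adjustment actions, and an experience-replay batch.
import torch
import torch.nn as nn

class DuelingQNet(nn.Module):
    """Dueling architecture: shared trunk with separate value and advantage heads."""
    def __init__(self, state_dim: int, n_actions: int, hidden: int = 128):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.value = nn.Linear(hidden, 1)              # V(s)
        self.advantage = nn.Linear(hidden, n_actions)  # A(s, a)

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        h = self.trunk(state)
        v, a = self.value(h), self.advantage(h)
        # Q(s, a) = V(s) + A(s, a) - mean_a A(s, a), subtracting the mean
        # advantage for identifiability of the two streams.
        return v + a - a.mean(dim=1, keepdim=True)

def double_dqn_loss(online: DuelingQNet, target: DuelingQNet, batch, gamma: float = 0.99):
    """One temporal-difference loss on a replay batch of (s, a, r, s', done)."""
    s, a, r, s_next, done = batch
    q = online(s).gather(1, a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        # Double DQN: the online network selects the next action,
        # the target network evaluates it, reducing overestimation bias.
        a_next = online(s_next).argmax(dim=1, keepdim=True)
        q_next = target(s_next).gather(1, a_next).squeeze(1)
        y = r + gamma * (1.0 - done) * q_next
    return nn.functional.smooth_l1_loss(q, y)
```

In this setup the target network is a periodically synchronized copy of the online network; decoupling action selection from action evaluation is what distinguishes the double-DQN update from the standard Q-learning target.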