The primary objective of interactive medical image segmentation is to achieve precise segmentation with minimal human intervention, which holds significant clinical value both for pathological assessment prior to diagnosis and for prognosis of recovery. Among available interaction modes, click-based interaction is intuitive and lightweight compared with alternatives such as scribbles, bounding boxes, and extreme points. To improve the model's ability to interpret user clicks, we propose a comprehensive interactive segmentation framework trained with an iterative, click-weighted loss function. To strengthen the segmentation capability of the Plain-ViT backbone, we introduce a Residual Multi-Headed Self-Attention encoder with hierarchical inputs and residual connections, which provides multiple views of the data and yields a marked improvement in segmentation performance. We evaluate the robustness of the proposed framework on a self-compiled prostate T2-MRI dataset and on three publicly available datasets of other organs. Experimental results show that our segmentation model surpasses existing state-of-the-art methods, and that the iterative loss training strategy significantly accelerates the model's convergence during interaction. On the prostate dataset, the model achieves an Intersection over Union (IoU) of 88.11% and a Number of Clicks at 80% IoU (NoC@80) of 7.03.
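To illustrate the idea of a click-weighted loss, the sketch below shows one plausible formulation: a binary cross-entropy in which pixels near user clicks receive larger weights via a Gaussian falloff with distance. The function name `click_weighted_bce`, the Gaussian weighting scheme, and the `sigma` parameter are our own illustrative assumptions, not the exact loss defined in the paper.

```python
import math

def click_weighted_bce(probs, labels, clicks, sigma=2.0):
    """Binary cross-entropy where pixels close to user clicks are
    weighted more heavily (assumed Gaussian falloff; illustrative only).

    probs, labels: 2D lists of equal shape (predicted probabilities, 0/1 labels)
    clicks: list of (row, col) click coordinates
    """
    h, w = len(probs), len(probs[0])
    total, weight_sum = 0.0, 0.0
    for i in range(h):
        for j in range(w):
            # distance from this pixel to the nearest user click
            d = min(math.hypot(i - r, j - c) for r, c in clicks)
            # base weight 1, boosted by up to +1 near a click
            wgt = 1.0 + math.exp(-d * d / (2.0 * sigma * sigma))
            p = min(max(probs[i][j], 1e-7), 1.0 - 1e-7)  # clamp for log
            y = labels[i][j]
            total += wgt * -(y * math.log(p) + (1 - y) * math.log(1 - p))
            weight_sum += wgt
    return total / weight_sum
```

Under this weighting, an identical prediction error costs more when it occurs near a click than far from one, which is one way to encode the intuition that user clicks mark regions the model should correct first.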