Deep learning methods have dramatically advanced the state of the art in image object localization. However, commonly used supervised learning methods require large training datasets with pixel-level or bounding-box annotations. Obtaining such fine-grained annotations is extremely costly, especially in the medical imaging domain. In this work, we propose a novel weakly supervised method for breast cancer localization. The key advantage of our approach is that the model requires only image-level labels and uses a self-training strategy to refine the predicted localization step by step. We evaluated our approach on a large, clinically relevant mammogram dataset. The results show that our model significantly outperforms other methods trained similarly.