Identifying and interpreting fetal standard scan planes during 2-D ultrasound mid-pregnancy examinations are highly complex tasks, which require years of training. Apart from guiding the probe to the correct location, it can be equally difficult for a non-expert to identify relevant structures within the image. Automatic image processing can provide tools to help experienced as well as inexperienced operators with these tasks. In this paper, we propose a novel method based on convolutional neural networks, which can automatically detect 13 fetal standard views in freehand 2-D ultrasound data as well as provide a localization of the fetal structures via a bounding box. An important contribution is that the network learns to localize the target anatomy using weak supervision based on image-level labels only. The network architecture is designed to operate in real-time while providing optimal output for the localization task. We present results for real-time annotation, retrospective frame retrieval from saved videos, and localization on a very large and challenging dataset consisting of images and video recordings of full clinical anomaly screenings. We found that the proposed method achieved an average F1-score of 0.798 in a realistic classification experiment modeling real-time detection, and obtained a 90.09% accuracy for retrospective frame retrieval. Moreover, an accuracy of 77.8% was achieved on the localization task.