Non-contact heart rate (HR) measurement from facial videos has attracted considerable interest due to its convenience and cost-effectiveness. However, accurate and robust HR estimation under realistic conditions remains a very challenging problem. In this paper, we develop a novel system that achieves robust and accurate HR estimation under such conditions. First, to minimize tracking artifacts arising from large head motions and facial expressions, we propose a joint face detection and alignment method that produces alignment-friendly facial bounding boxes with reliable initial facial shapes, facilitating accurate and robust face alignment even in the presence of large pose variations and expressions. Second, whereas most existing methods [1-5] derive pulse signals from predetermined grid cells (i.e., local patches), our method uses varying-sized triangular patches generated adaptively to exclude the negative effects of non-rigid facial motions. Third, we propose an adaptive patch selection method that chooses patches which contain skin regions and are more likely to carry useful information, followed by independent component analysis, to obtain an accurate HR estimate. Extensive experiments on both public datasets and our own dataset demonstrate that, compared with state-of-the-art methods [1-3], our method reduces the root mean square error (RMSE) by a large margin (12% to 63%) and provides robust and accurate estimation under various challenging scenarios.
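
As a rough illustration of the final stage of such a pipeline, the sketch below reads an HR value off a pulse-like signal by locating its dominant spectral peak, and computes the RMSE used as the evaluation metric. It is not the proposed method: the face alignment, triangular patch generation, patch selection, and independent component analysis steps are omitted. The function names, the 0.7 to 4.0 Hz search band, and the Hann window are assumptions made for this example only.

```python
import numpy as np


def estimate_hr_bpm(signal, fs, lo_hz=0.7, hi_hz=4.0):
    """Return the dominant frequency of `signal` within [lo_hz, hi_hz], in beats per minute.

    `fs` is the sampling rate in Hz (e.g. the video frame rate).
    The band limits are illustrative assumptions, not values from the paper.
    """
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()                        # remove the DC component
    window = np.hanning(len(x))             # taper to reduce spectral leakage
    power = np.abs(np.fft.rfft(x * window)) ** 2
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    band = (freqs >= lo_hz) & (freqs <= hi_hz)
    peak_freq = freqs[band][np.argmax(power[band])]
    return 60.0 * peak_freq


def rmse(estimates, references):
    """Root mean square error between estimated and reference HR values."""
    err = np.asarray(estimates, dtype=float) - np.asarray(references, dtype=float)
    return float(np.sqrt(np.mean(err ** 2)))


if __name__ == "__main__":
    # Synthetic 20 s trace at 30 frames per second with a 1.2 Hz (72 bpm) pulse plus noise.
    fs = 30.0
    t = np.arange(0, 20, 1.0 / fs)
    rng = np.random.default_rng(0)
    trace = np.sin(2 * np.pi * 1.2 * t) + 0.5 * rng.standard_normal(t.size)
    print(estimate_hr_bpm(trace, fs))
```

With a 20 s window the frequency resolution is 0.05 Hz, i.e. 3 beats per minute, so longer windows or spectral interpolation would be needed for finer estimates.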