Heart rate and respiration rate are important vital signs for assessing a person's health. To estimate these vital signs accurately, we propose a multitask Siamese network (MTS) model that combines the advantages of the Siamese network and the multitask learning architecture. The MTS model was trained on images of the cheek area (including the nose and mouth) and the forehead area, with parameters shared between the Siamese branches, in order to extract features related to cardiac and respiratory information. The proposed model has a small number of parameters yet yields a vital-sign-prediction accuracy comparable to that of the single-task learning model; furthermore, it outperformed the conventional multitask learning model. As a result, the MTS model can predict the heart and respiratory signals simultaneously, with the number of parameters reduced by a factor of 16 and mean absolute errors of 2.84 and 4.21 for the heart and respiration rates, respectively. Owing to its light weight, the model would be well suited to implementation on edge devices such as mobile phones or other small portable devices.
Keywords: Siamese network; contactless technique; deep learning; heart rate; multitasking; respiration rate.
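The weight-sharing idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' architecture: the layer sizes, the single dense encoder, the concatenation of the two branch features, and all names (`encoder`, `hr_head`, `rr_head`, `predict`) are assumptions made purely for illustration. It shows only that applying one encoder to both facial regions avoids duplicating encoder parameters, while two task heads produce the heart and respiration outputs together.

```python
import random

def make_dense(n_in, n_out, rng):
    """Create a dense layer as (weights, biases) with small random weights."""
    w = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    b = [0.0] * n_out
    return w, b

def dense(layer, x, relu=True):
    """Apply a dense layer to vector x, optionally followed by ReLU."""
    w, b = layer
    y = [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]
    return [max(0.0, v) for v in y] if relu else y

def count_params(layer):
    """Number of trainable values (weights plus biases) in a dense layer."""
    w, b = layer
    return sum(len(row) for row in w) + len(b)

rng = random.Random(0)
N_IN, N_HID = 64, 16  # hypothetical sizes, not taken from the paper

# One encoder object reused by both branches: this reuse is the Siamese sharing.
encoder = make_dense(N_IN, N_HID, rng)
# Two task-specific heads (multitask part); fusion by concatenation is assumed.
hr_head = make_dense(2 * N_HID, 1, rng)
rr_head = make_dense(2 * N_HID, 1, rng)

def predict(cheek, forehead):
    """Encode both regions with the same weights, then run both task heads."""
    f_cheek = dense(encoder, cheek)
    f_forehead = dense(encoder, forehead)
    fused = f_cheek + f_forehead  # list concatenation of the two feature vectors
    hr = dense(hr_head, fused, relu=False)[0]
    rr = dense(rr_head, fused, relu=False)[0]
    return hr, rr

# Stand-in inputs: flattened feature vectors for the two facial regions.
cheek = [rng.random() for _ in range(N_IN)]
forehead = [rng.random() for _ in range(N_IN)]
hr, rr = predict(cheek, forehead)

heads = count_params(hr_head) + count_params(rr_head)
shared_total = count_params(encoder) + heads        # one shared encoder
unshared_total = 2 * count_params(encoder) + heads  # one encoder per region
print(shared_total, unshared_total)
```

In this toy setting the saving comes only from reusing the encoder across the two regions; the 16-fold reduction reported above depends on the actual architecture and baseline and is not reproduced by this sketch.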