Recent supervised deep learning methods have shown that heart rate (HR) can be measured remotely from facial videos. However, the performance of these supervised methods depends on the availability of large-scale labelled data, and they have been limited to 2D deep learning architectures that do not fully exploit 3D spatiotemporal information. To address both limitations, we present a novel 3D self-supervised spatiotemporal learning framework for remote HR estimation from facial videos. Concretely, we propose a landmark-based spatial augmentation that splits the face into several informative parts based on Shafer's dichromatic reflection model, and a novel sparsity-based temporal augmentation that exploits the Nyquist-Shannon sampling theorem to enhance the signal modelling ability. We evaluated our method on three public datasets; it outperformed other self-supervised methods and achieved accuracy competitive with state-of-the-art supervised methods.
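The abstract does not spell out how the sparsity-based temporal augmentation is constructed, so the following is only a minimal sketch of the sampling constraint it appeals to. It assumes that a sparse clip is formed by keeping every `stride`-th frame, and that the highest heart rate of interest is 240 beats per minute (4 Hz). The function names, the striding scheme, and the 240 bpm bound are illustrative assumptions, not details taken from the paper. Under the Nyquist-Shannon sampling theorem, the effective frame rate after subsampling must stay above twice the highest pulse frequency, which bounds the usable stride.

```python
import math


def max_temporal_stride(fps: float, max_hr_bpm: float = 240.0) -> int:
    """Largest integer frame stride that keeps the effective sampling
    rate strictly above the Nyquist rate for the given heart-rate bound.

    Both the striding scheme and the default 240 bpm bound are
    assumptions made for illustration.
    """
    f_max = max_hr_bpm / 60.0      # highest pulse frequency in Hz
    nyquist_rate = 2.0 * f_max     # minimum sampling rate (exclusive)
    # Need fps / stride > nyquist_rate, i.e. stride < fps / nyquist_rate.
    stride = math.ceil(fps / nyquist_rate) - 1
    return max(stride, 1)          # never subsample below every frame


def sparse_frame_indices(num_frames: int, stride: int, offset: int = 0) -> list[int]:
    """Indices of the frames kept when taking every `stride`-th frame."""
    return list(range(offset, num_frames, stride))


if __name__ == "__main__":
    stride = max_temporal_stride(fps=30.0)
    print(stride, sparse_frame_indices(10, stride))
```

For a 30 fps video and a 4 Hz bound, the Nyquist rate is 8 Hz, so a stride of 3 (effective rate 10 Hz) is admissible while a stride of 4 (7.5 Hz) would alias the fastest pulse signals.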