Heartbeat rhythm and heart rate (HR) are important physiological parameters of the human body. This study presents an efficient multi-hierarchical spatio-temporal convolutional network that quickly estimates remote photoplethysmography (rPPG) signals and HR from face video clips. First, a low-level face feature generation (LFFG) module extracts facial color distribution characteristics. Then, a three-dimensional (3D) spatio-temporal stack convolution (STSC) module and a multi-hierarchical feature fusion (MHFF) module strengthen the spatio-temporal correlation of multi-channel features. In the MHFF module, sparse optical flow captures subtle facial motion between frames and generates a self-adaptive region of interest (ROI) skin mask. Finally, a signal prediction (SP) module extracts the estimated rPPG signal. Heart rate estimation results show that the proposed network outperforms state-of-the-art methods on three datasets (UBFC-RPPG, COHFACE, and our own dataset), with mean absolute errors (MAE) of 2.15, 5.57, and 1.75 beats per minute (bpm), respectively.
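The abstract reports HR accuracy as MAE in bpm but does not say how HR is read off the predicted rPPG signal. A minimal sketch of one common approach follows: take the dominant spectral peak within a plausible heart rate band, then average the absolute errors over clips. This is not the proposed network. The function names, the 40 to 180 bpm band, the 0.5 bpm search step, and the 30 fps frame rate are all illustrative assumptions.

```python
import math


def estimate_hr_bpm(signal, fps, lo_bpm=40.0, hi_bpm=180.0, step_bpm=0.5):
    """Return the frequency (in bpm) with the highest spectral power in the band.

    Assumption: HR is taken as the dominant spectral peak of the rPPG signal;
    the abstract does not state which post-processing the authors use.
    """
    n = len(signal)
    mean = sum(signal) / n
    centered = [s - mean for s in signal]  # remove the DC component

    best_bpm, best_power = lo_bpm, -1.0
    steps = int(round((hi_bpm - lo_bpm) / step_bpm))
    for k in range(steps + 1):
        bpm = lo_bpm + k * step_bpm
        omega = 2.0 * math.pi * (bpm / 60.0) / fps  # radians per frame
        # Power of the discrete-time Fourier transform at this frequency.
        re = sum(c * math.cos(omega * t) for t, c in enumerate(centered))
        im = sum(c * math.sin(omega * t) for t, c in enumerate(centered))
        power = re * re + im * im
        if power > best_power:
            best_bpm, best_power = bpm, power
    return best_bpm


def mean_absolute_error(predicted, reference):
    """MAE between predicted and reference heart rates, in bpm."""
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(predicted)


if __name__ == "__main__":
    fps = 30.0  # hypothetical camera frame rate
    # Synthetic 10-second "rPPG" trace: a clean 72 bpm (1.2 Hz) sinusoid.
    trace = [math.sin(2.0 * math.pi * 1.2 * t / fps) for t in range(300)]
    print(estimate_hr_bpm(trace, fps))
    print(mean_absolute_error([70.0, 80.0], [72.0, 79.0]))
```

A short clip limits frequency resolution (a 10-second window resolves roughly 6 bpm between independent spectral bins), so real pipelines often use longer windows or band-pass filtering before peak picking.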