Video-based remote physiological measurement aims to estimate remote photoplethysmography (rPPG) signals from human face videos and then measure multiple vital signs (e.g. heart rate, respiration frequency) from those signals. Recent approaches achieve this by training deep neural networks, which normally require abundant face videos and synchronously recorded photoplethysmography (PPG) signals for supervision. However, collecting such annotated corpora is difficult in practice. In this paper, we introduce a novel frequency-inspired self-supervised framework that learns to estimate rPPG signals from face videos without the need for ground truth PPG signals. Given a video sample, we first augment it into multiple positive and negative samples, which contain signal frequencies similar and dissimilar, respectively, to those of the original. Specifically, positive samples are generated using spatial augmentation. Negative samples are generated via a learnable frequency augmentation module, which performs a non-linear signal frequency transformation on the input without excessively changing its visual appearance. Next, we introduce a local rPPG expert aggregation module to estimate rPPG signals from the augmented samples. It encodes complementary pulsation information from different face regions and aggregates it into one rPPG prediction. Finally, we propose a series of frequency-inspired losses, i.e. a frequency contrastive loss, a frequency ratio consistency loss, and a cross-video frequency agreement loss, to optimize the rPPG signals estimated from multiple augmented video samples and across temporally neighboring video samples. We conduct rPPG-based heart rate, heart rate variability and respiration frequency estimation on four standard benchmarks. The experimental results demonstrate that our method improves the state of the art by a large margin.
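As a point of reference for the last measurement step, the following is a minimal sketch of how a heart rate can be read off an already estimated rPPG signal by locating the dominant spectral peak. It is not the framework's own code: the function name `estimate_heart_rate_bpm`, the 0.7 to 4.0 Hz search band, the Hann window, and the 30 fps sampling rate are all illustrative assumptions.

```python
import numpy as np


def estimate_heart_rate_bpm(signal, fs, low_hz=0.7, high_hz=4.0):
    """Return the dominant frequency of `signal` in beats per minute.

    Hypothetical helper: the band limits (0.7-4.0 Hz, i.e. 42-240 bpm)
    are an assumed plausible heart rate range, not values from the paper.
    """
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()  # remove the DC component

    # Power spectrum of the Hann-windowed signal.
    window = np.hanning(len(x))
    power = np.abs(np.fft.rfft(x * window)) ** 2
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)

    # Restrict the peak search to the assumed heart rate band.
    band = (freqs >= low_hz) & (freqs <= high_hz)
    if not band.any():
        raise ValueError("signal too short to resolve the requested band")

    peak_freq = freqs[band][np.argmax(power[band])]
    return 60.0 * peak_freq


# Synthetic stand-in for an rPPG signal: a 1.2 Hz pulse (72 bpm)
# sampled at an assumed 30 frames per second for 10 seconds, plus noise.
fs = 30.0
t = np.arange(300) / fs
rng = np.random.default_rng(0)
rppg = np.sin(2 * np.pi * 1.2 * t) + 0.1 * rng.standard_normal(t.size)

hr_bpm = estimate_heart_rate_bpm(rppg, fs)
print(f"estimated heart rate: {hr_bpm:.1f} bpm")
```

With a 10 second window the frequency resolution is 0.1 Hz (6 bpm), so longer clips or spectral interpolation would be needed for finer estimates. The same peak search with a lower band would apply to respiration frequency.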