Camera-based physiological measurement is a growing field, with neural models providing state-of-the-art performance. Prior research has explored various ``end-to-end'' models; however, these methods still require several preprocessing steps. These additional operations are often non-trivial to implement, which makes replication and deployment difficult, and they can even have a higher computational budget than the ``core'' network itself. In this paper, we propose two novel and efficient neural models for camera-based physiological measurement, called EfficientPhys, that remove the need for face detection, segmentation, normalization, color space transformation, or any other preprocessing steps. Using raw video frames as input, our models achieve state-of-the-art accuracy on three public datasets. We show that this is the case whether using a transformer or convolutional backbone. We further evaluate the latency of the proposed networks and show that our most lightweight network also achieves a 33% improvement in efficiency.