Human head pose estimation is an essential problem in facial analysis, with many computer vision applications such as gaze estimation, virtual reality, and driver assistance. Given its importance, a compact model is needed to reduce computational cost while maintaining accuracy when deployed in facial-analysis applications such as large camera surveillance systems and AI cameras. In this work, we propose a lightweight model that effectively addresses the head pose estimation problem. Our approach has two main steps. 1) We first train multiple teacher models on the synthetic dataset 300W-LPA to obtain head pose pseudo labels. 2) We design an architecture with a ResNet18 backbone and train the proposed model on the ensemble of these pseudo labels via knowledge distillation. To evaluate the effectiveness of our model, we use AFLW-2000 and BIWI, two real-world head pose datasets. Experimental results show that our proposed model significantly improves accuracy compared with state-of-the-art head pose estimation methods. Furthermore, our model runs in real time at $\sim$300 FPS when inferring on a Tesla V100.
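The two steps above can be illustrated with a minimal sketch. The abstract does not specify how the teacher outputs are ensembled or which loss is used for distillation, so the sketch assumes a plain average of the teachers' (yaw, pitch, roll) predictions and a mean absolute error loss; the function names `ensemble_pseudo_labels` and `distillation_loss` and the sample angles are hypothetical.

```python
from statistics import fmean

def ensemble_pseudo_labels(teacher_preds):
    """Average per-teacher (yaw, pitch, roll) predictions for one image.

    Assumption: a simple arithmetic mean; the paper may combine teachers differently.
    """
    # zip(*...) groups all yaw values, then all pitch values, then all roll values
    return tuple(fmean(angle) for angle in zip(*teacher_preds))

def distillation_loss(student_pred, pseudo_label):
    """Mean absolute error in degrees between student output and pseudo label.

    Assumption: L1 loss, chosen here only for illustration.
    """
    return fmean(abs(s - t) for s, t in zip(student_pred, pseudo_label))

# Hypothetical predictions from three teachers for a single image, in degrees
teachers = [(10.0, -5.0, 2.0), (12.0, -4.0, 1.0), (11.0, -6.0, 3.0)]
pseudo = ensemble_pseudo_labels(teachers)

# Hypothetical student prediction compared against the ensembled pseudo label
loss = distillation_loss((9.0, -5.0, 2.0), pseudo)
```

In an actual training loop the student (here, the ResNet18-based model) would minimize this loss over many images. Note that averaging angles arithmetically is only reasonable away from the wraparound at ±180 degrees.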