We present a deep learning-based multi-task approach for head pose estimation in images. We contribute a network architecture and training strategy that harness the strong dependencies among face pose, alignment and visibility to produce a top-performing model for all three tasks. Our architecture is an encoder-decoder CNN with residual blocks and lateral skip connections. We show that combining head pose estimation with landmark-based face alignment significantly improves the performance of the former task. Further, placing the pose task at the bottleneck layer, at the end of the encoder, and the tasks that depend on spatial information, such as visibility and alignment, at the final decoder layer also contributes to the final performance. In our experiments, the proposed model outperforms the state of the art in the face pose and visibility tasks. With a final landmark regression step, it also produces face alignment results on par with the state of the art.
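
The task placement described above can be illustrated with a small shape-bookkeeping sketch. It is not the authors' implementation: the input size, channel widths, network depth, number of landmarks, the three pose outputs (assumed here to be yaw, pitch and roll) and the names `Stage` and `build_layout` are all hypothetical choices made for illustration. The sketch only traces feature-map sizes through a symmetric encoder-decoder and records which stage each task head would attach to.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    name: str
    channels: int
    size: int  # spatial side length of the (square) feature map


def build_layout(input_size=128, base_channels=32, depth=4, num_landmarks=68):
    """Trace feature-map shapes and task-head attachment points.

    All default values are illustrative assumptions, not values from the paper.
    """
    # Encoder: each stage halves the resolution and doubles the channels.
    encoder = []
    size, channels = input_size, base_channels
    for i in range(depth):
        size //= 2
        encoder.append(Stage(f"enc{i + 1}", channels, size))
        channels *= 2

    # The last encoder stage is the bottleneck (lowest spatial resolution).
    bottleneck = encoder[-1]

    # Decoder: each stage upsamples and merges the lateral skip connection
    # from the encoder stage that has the same resolution.
    decoder = []
    for i, skip in enumerate(reversed(encoder[:-1])):
        decoder.append(Stage(f"dec{i + 1}", skip.channels, skip.size))

    # Task heads: pose reads the bottleneck; the spatially dependent tasks
    # read the final decoder stage. Output counts are assumptions.
    heads = {
        "pose": (bottleneck.name, 3),                     # assumed yaw, pitch, roll
        "visibility": (decoder[-1].name, num_landmarks),  # one flag per landmark
        "alignment": (decoder[-1].name, num_landmarks),   # one map per landmark
    }
    return encoder, decoder, heads


if __name__ == "__main__":
    enc, dec, heads = build_layout()
    for stage in enc + dec:
        print(stage)
    print(heads)
```

Under these assumed defaults, the pose head reads the lowest-resolution feature map, where spatial detail has been pooled away, while the visibility and alignment heads read the highest-resolution decoder output, where the lateral skip connections have restored spatial detail.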