Eye gaze is an important non-verbal cue for human affect analysis. Recent gaze estimation work indicated that information from the full face region can benefit performance. Pushing this idea further, we propose an appearance-based method that, in contrast to a long-standing line of work in computer vision, only takes the full face image as input. Our method encodes the face image using a convolutional neural network with spatial weights applied on the feature maps to flexibly suppress or enhance information in different facial regions. Through extensive evaluation, we show that our full-face method significantly outperforms the state of the art for both 2D and 3D gaze estimation, achieving improvements of up to 14.3% on MPIIGaze and 27.7% on EYEDIAP for person-independent 3D gaze estimation. We further show that this improvement is consistent across different illumination conditions and gaze directions and particularly pronounced for the most challenging extreme head poses.
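The spatial-weighting idea can be illustrated with a minimal numpy sketch. This is a hypothetical simplification, not the paper's implementation: it assumes a single 1x1 convolution with a ReLU produces one scalar weight per spatial location, which then rescales every channel of the feature map so that informative facial regions are enhanced and others suppressed. All shapes and parameter names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def spatial_weights(features, w, b):
    """Apply a learned spatial weight map to CNN feature maps.

    features: (C, H, W) activation maps from a convolutional layer
    w: (C,) weights of an assumed 1x1 convolution producing the map
    b: scalar bias of that convolution
    Returns re-weighted features with the same (C, H, W) shape.
    """
    # 1x1 convolution across channels -> one scalar per spatial location
    weight_map = np.tensordot(w, features, axes=([0], [0])) + b  # (H, W)
    # ReLU keeps the spatial weights non-negative (suppress, never flip sign)
    weight_map = np.maximum(weight_map, 0.0)
    # Broadcast the (H, W) map over the channel axis
    return features * weight_map

# Toy example: 4 channels over a 6x6 spatial grid
C, H, W = 4, 6, 6
features = rng.standard_normal((C, H, W))
w = rng.standard_normal(C)
out = spatial_weights(features, w, 0.1)
assert out.shape == (C, H, W)
```

Locations where the weight map is driven to zero contribute nothing downstream, which is how the model can softly mask uninformative face regions while keeping the full-face input.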