Appearance-based gaze estimation aims to predict the 3D gaze direction from a single image. Although recent deep learning-based approaches perform well, they usually assume one calibrated face per input image and cannot estimate gaze for multiple people in real time. Real-world applications, however, require simultaneous gaze estimation for multiple people in the wild. In this paper, we propose GazeOnce, the first one-stage, end-to-end gaze estimation method, which simultaneously predicts gaze directions for multiple faces (>10) in an image. We also design a data generation pipeline and introduce a new dataset, MPSGaze, which contains full images of multiple people with 3D gaze ground truth. Experimental results show that our unified framework is both faster than state-of-the-art methods and achieves a lower gaze estimation error. The technique is suited to real-time applications with multiple users.
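To make "gaze estimation error" concrete, the sketch below computes the angle between predicted and ground-truth 3D gaze vectors for several faces in one image. It assumes the error is measured as angular error in degrees, a common convention in the gaze estimation literature that the abstract itself does not state. The function names and the sample vectors are hypothetical and are not taken from GazeOnce or MPSGaze.

```python
import math


def angular_error_deg(pred, target):
    """Angle in degrees between two non-zero 3D gaze vectors.

    Assumption: gaze error is the angle between the predicted and the
    ground-truth direction; the vectors need not be unit length.
    """
    dot = sum(p * t for p, t in zip(pred, target))
    norm = math.sqrt(sum(p * p for p in pred)) * math.sqrt(sum(t * t for t in target))
    # Clamp to [-1, 1] so rounding noise cannot push acos out of its domain.
    cos = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos))


def mean_angular_error_deg(preds, targets):
    """Mean angular error over all faces detected in one image."""
    errors = [angular_error_deg(p, t) for p, t in zip(preds, targets)]
    return sum(errors) / len(errors)


# Hypothetical predictions and ground truth for three faces in one image.
preds = [(0.0, 0.0, -1.0), (1.0, 0.0, -1.0), (0.0, 2.0, 0.0)]
targets = [(0.0, 0.0, -1.0), (0.0, 0.0, -1.0), (0.0, 0.0, -1.0)]
mean_error = mean_angular_error_deg(preds, targets)
```

A multi-person method would be scored by averaging this per-face error over every annotated face, which is why the helper takes lists of vectors rather than a single pair.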