Gaze estimation reveals where a person is looking and is an important clue for understanding human intention. The recent development of deep learning has revolutionized many computer vision tasks, and appearance-based gaze estimation is no exception. However, the field still lacks a guideline for designing deep learning algorithms for gaze estimation tasks. In this paper, we present a comprehensive review of appearance-based gaze estimation methods that use deep learning. We summarize the processing pipeline and discuss these methods from four perspectives: deep feature extraction, deep neural network architecture design, personal calibration, and device and platform. Since data pre-processing and post-processing are crucial for gaze estimation, we also survey face/eye detection methods, data rectification methods, 2D/3D gaze conversion methods, and gaze origin conversion methods. To fairly compare the performance of various gaze estimation approaches, we characterize all the publicly available gaze estimation datasets and collect the code of typical gaze estimation algorithms. We run these implementations and set up a benchmark that converts the results of different methods into the same evaluation metrics. This paper serves not only as a reference for developing deep learning-based gaze estimation methods but also as a guideline for future gaze estimation research. Implemented methods and data processing codes are available at http://phi-ai.org/GazeHub.
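To illustrate what comparing methods under the same evaluation metric can involve, the following is a minimal sketch, not the benchmark code from GazeHub. It assumes the metric is the angular error in degrees between predicted and ground-truth 3D gaze directions, which is commonly used in gaze estimation but is not named in the abstract. The function names and the pitch/yaw sign convention (camera looking along +z, so a gaze directed at the camera is -z) are illustrative assumptions.

```python
import math


def angles_to_vector(pitch, yaw):
    """Convert (pitch, yaw) in radians to a unit 3D gaze direction.

    Assumed convention (hypothetical, datasets differ): zero pitch and
    zero yaw correspond to a gaze pointing along -z, toward the camera.
    """
    x = -math.cos(pitch) * math.sin(yaw)
    y = -math.sin(pitch)
    z = -math.cos(pitch) * math.cos(yaw)
    return (x, y, z)


def angular_error_deg(g_pred, g_true):
    """Angle in degrees between two 3D gaze vectors of any nonzero length."""
    dot = sum(a * b for a, b in zip(g_pred, g_true))
    norm_pred = math.sqrt(sum(a * a for a in g_pred))
    norm_true = math.sqrt(sum(b * b for b in g_true))
    if norm_pred == 0.0 or norm_true == 0.0:
        raise ValueError("gaze vectors must be nonzero")
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cos_sim = max(-1.0, min(1.0, dot / (norm_pred * norm_true)))
    return math.degrees(math.acos(cos_sim))


if __name__ == "__main__":
    # One method outputs angles, another outputs a 3D vector; both are
    # brought to 3D vectors before the error is computed.
    pred = angles_to_vector(0.0, math.radians(10.0))
    truth = (0.0, 0.0, -1.0)
    print(round(angular_error_deg(pred, truth), 3))
```

Converting every method's output to a 3D direction first means a single error function can be applied regardless of whether a method predicts angles or vectors.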