Gaze estimation, which determines where a person is looking given an image of the person's full face, provides a valuable clue for understanding human intention. As in other domains of computer vision, deep learning (DL) methods have gained recognition in gaze estimation. However, gaze calibration problems still prevent existing methods from further improving performance. An effective solution is to directly predict the difference information between two eye images, as in the differential network (Diff-NN). However, this solution loses accuracy when only one inference image is used. We propose a differential residual model (DRNet), combined with a new loss function, to make use of the difference information between two eye images; we treat this difference information as auxiliary information. We evaluate DRNet mainly on two public datasets: (1) MpiiGaze and (2) Eyediap. Considering only eye features, DRNet outperforms state-of-the-art gaze estimation methods, with angular error of 4.57 and 6.14 on the MpiiGaze and Eyediap datasets, respectively. Furthermore, the experimental results demonstrate that DRNet is extremely robust to noisy images.
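The core idea of the abstract, using the difference between two eye-image embeddings as an auxiliary residual on top of a base gaze prediction, can be sketched as follows. This is a minimal illustrative model, not the paper's architecture: the encoder layers, head sizes, and the 36x60 grayscale input shape (an MPIIGaze-like eye-crop size) are all assumptions made for the sketch.

```python
import torch
import torch.nn as nn

class DRNetSketch(nn.Module):
    """Hypothetical sketch of a differential-residual gaze estimator.

    A shared CNN encodes each of the two eye images; the *difference*
    of the two embeddings is treated as auxiliary information and used
    as a residual correction to the base gaze prediction. All layer
    sizes here are illustrative, not taken from the paper.
    """

    def __init__(self, embed_dim: int = 32):
        super().__init__()
        # Shared feature extractor applied to both eye images.
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(4),
            nn.Flatten(),
            nn.Linear(8 * 4 * 4, embed_dim),
        )
        # Base head: gaze direction (yaw, pitch) from one eye's features.
        self.base_head = nn.Linear(embed_dim, 2)
        # Residual head: correction predicted from the feature difference.
        self.res_head = nn.Linear(embed_dim, 2)

    def forward(self, eye_a: torch.Tensor, eye_b: torch.Tensor) -> torch.Tensor:
        feat_a = self.encoder(eye_a)
        feat_b = self.encoder(eye_b)
        base = self.base_head(feat_a)
        # Difference information used only as auxiliary correction.
        residual = self.res_head(feat_a - feat_b)
        return base + residual


# Smoke test on random 36x60 grayscale eye crops.
model = DRNetSketch()
a = torch.randn(4, 1, 36, 60)
b = torch.randn(4, 1, 36, 60)
out = model(a, b)
print(out.shape)  # torch.Size([4, 2])
```

The residual formulation means that if the second image is unavailable or noisy, the base head still produces a usable estimate, which is consistent with the robustness claim in the abstract.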