As a critical cue for understanding human intention, gaze provides a key signal for Human-Computer Interaction (HCI) applications. Appearance-based gaze estimation, which directly regresses the gaze vector from eye images, has recently made great progress driven by Convolutional Neural Network (ConvNet) architectures and open-source large-scale gaze datasets. However, how to encode model-based knowledge into CNN models to further improve gaze estimation performance remains an open question. In this paper, we propose HybridGazeNet (HGN), a unified framework that explicitly encodes a geometric eyeball model into an appearance-based CNN architecture. Composed of a multi-branch network and an uncertainty module, HybridGazeNet is trained using a hybridized strategy. Experiments on multiple challenging gaze datasets show that HybridGazeNet achieves better accuracy and generalization than existing SOTA methods. The code will be released later.