Gaze estimation is of great importance to many scientific fields and daily applications, ranging from fundamental research in cognitive psychology to attention-aware mobile systems. While recent advancements in deep learning have yielded remarkable successes in building highly accurate gaze estimation systems, the associated high computational cost and the reliance on large-scale labeled gaze data for supervised learning pose challenges to the practical deployment of existing solutions. To move beyond these limitations, we present FreeGaze, a resource-efficient framework for unsupervised gaze representation learning. FreeGaze incorporates frequency-domain gaze estimation and contrastive gaze representation learning in its design. The former significantly alleviates the computational burden in both system calibration and gaze estimation, and dramatically reduces system latency; the latter overcomes the data labeling hurdle of existing supervised learning-based counterparts and ensures efficient gaze representation learning in the absence of gaze labels. Our evaluation on two gaze estimation datasets shows that FreeGaze achieves gaze estimation accuracy comparable to existing supervised learning-based approaches, while enabling speedups of up to 6.81 and 1.67 times in system calibration and gaze estimation, respectively.
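Neither component's implementation appears in this section; purely as an illustration of the two ideas named above, the following minimal sketch (assuming PyTorch and SciPy; `dct_low_freq`, `nt_xent`, and all parameters are hypothetical stand-ins, not the authors' code) pairs a DCT-based low-frequency feature extractor, standing in for frequency-domain gaze estimation, with a SimCLR-style NT-Xent loss, standing in for contrastive learning without gaze labels.

```python
# Illustrative sketch only -- not the FreeGaze implementation.
import numpy as np
import torch
import torch.nn.functional as F
from scipy.fft import dctn

def dct_low_freq(eye_img, keep=16):
    """2-D DCT of a grayscale eye crop, keeping only the top-left
    keep x keep (low-frequency) block: discarding high-frequency
    coefficients is one way a frequency-domain front end can shrink
    the input and the downstream compute."""
    coeffs = dctn(eye_img, norm="ortho")
    return coeffs[:keep, :keep].astype(np.float32).ravel()

def nt_xent(z1, z2, tau=0.5):
    """SimCLR-style NT-Xent loss: embeddings of two augmented views
    of the same image are pulled together and all other pairs pushed
    apart, so no gaze labels are required."""
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)  # (2N, D)
    sim = z @ z.t() / tau                               # cosine-similarity logits
    sim.fill_diagonal_(float("-inf"))                   # exclude self-pairs
    n = z1.size(0)
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
    return F.cross_entropy(sim, targets)

# Toy usage: DCT features of a batch of eye crops through a stand-in encoder,
# with additive noise as a placeholder for the second augmented view.
imgs = np.random.rand(8, 64, 64)
feats = torch.from_numpy(np.stack([dct_low_freq(im) for im in imgs]))  # (8, 256)
encoder = torch.nn.Linear(256, 32)
view1 = encoder(feats)
view2 = encoder(feats + 0.01 * torch.randn_like(feats))
loss = nt_xent(view1, view2)
```

The sketch only mirrors the shape of the idea: in the system described above, the frequency-domain transform and the construction of positive pairs are part of the training pipeline itself, which this toy example does not reproduce.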