Over the past few years, there has been increasing interest in interpreting gaze direction in unconstrained environments with limited supervision. Owing to data curation and annotation issues, transferring a gaze estimation method to other platforms, such as unconstrained outdoor settings or AR/VR, may lead to a significant drop in performance because accurately annotated data for model training are insufficient. In this paper, we explore the interesting yet challenging problem of gaze estimation with a limited amount of labelled data. The proposed method distills knowledge from the labelled subset using visual features, including identity-specific appearance, gaze trajectory consistency, and motion features. Given a gaze trajectory, the method uses label information from only the start and end frames of the gaze sequence. An extension of the proposed method further reduces the requirement to only the start frame, with a minor drop in the quality of the generated labels. We evaluate the proposed method on four benchmark datasets (CAVE, TabletGaze, MPII, and Gaze360) as well as web-crawled YouTube videos. Our method reduces the annotation effort to as little as 2.67% with minimal impact on performance, indicating the potential of our model to enable gaze estimation in 'in-the-wild' setups.
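As a rough illustration of the endpoint-only labelling idea, the sketch below generates pseudo-labels for the intermediate frames of a gaze sequence from the annotated start and end frames via spherical linear interpolation. The function names and the use of slerp are illustrative assumptions, not the paper's actual distillation pipeline, which additionally leverages identity-specific appearance, trajectory-consistency, and motion cues.

```python
import numpy as np

def slerp(g0, g1, t):
    """Spherical linear interpolation between two unit gaze vectors.

    Illustrative helper (an assumption, not the paper's method): blends the
    start and end gaze directions along the great circle connecting them.
    """
    g0, g1 = g0 / np.linalg.norm(g0), g1 / np.linalg.norm(g1)
    dot = np.clip(np.dot(g0, g1), -1.0, 1.0)
    omega = np.arccos(dot)
    if omega < 1e-6:                      # nearly identical directions
        return g0
    return (np.sin((1 - t) * omega) * g0 + np.sin(t * omega) * g1) / np.sin(omega)

def pseudo_label_sequence(gaze_start, gaze_end, num_frames):
    """Generate pseudo gaze labels for every frame of a trajectory,
    given ground-truth labels only for the first and last frames."""
    ts = np.linspace(0.0, 1.0, num_frames)
    return np.stack([slerp(gaze_start, gaze_end, t) for t in ts])

# Example: a 10-frame sequence annotated only at its endpoints.
labels = pseudo_label_sequence(np.array([0.0, 0.1, -1.0]),
                               np.array([0.3, -0.2, -0.9]),
                               num_frames=10)
print(labels.shape)  # (10, 3) gaze-direction pseudo-labels
```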