Gaze tracking is a valuable tool with applications in many fields, including medicine, psychology, virtual reality, marketing, and safety. It is therefore essential to have gaze tracking software that is both cost-efficient and high-performing. Accurately predicting gaze remains difficult, particularly in real-world situations where images are affected by motion blur, video compression, and noise. Super-resolution (SR) has been shown to improve the visual quality of images. This work examines whether super-resolution is also useful for improving appearance-based gaze tracking. We show that not all SR models preserve the gaze direction. We propose a two-step framework based on the SwinIR super-resolution model. The proposed method consistently outperforms the state of the art, particularly in scenarios involving low-resolution or degraded images. Furthermore, we examine the use of super-resolution through the lens of self-supervised learning for gaze prediction. Self-supervised learning aims to learn from unlabeled data in order to reduce the amount of labeled data required for downstream tasks. We propose a novel architecture called SuperVision, which fuses an SR backbone network with a ResNet18 (with some skip connections). The proposed SuperVision method uses five times less labeled data and yet outperforms, by 15%, the state-of-the-art GazeTR method, which uses 100% of the training data.