Deep neural networks have demonstrated superior performance on appearance-based gaze estimation tasks. However, due to variations in subjects, illumination, and background, performance degrades dramatically when a model is applied to a new domain. In this paper, we identify a gaze jitter phenomenon in cross-domain gaze estimation: the gaze predictions for two similar images can deviate severely in the target domain. This phenomenon is closely related to cross-domain gaze estimation, yet, surprisingly, it has not been noted previously. We therefore propose to use gaze jitter to analyze and optimize gaze domain adaptation. We find that the high-frequency component (HFC) is an important factor leading to jitter. Based on this finding, we add high-frequency components to input images using an adversarial attack and employ contrastive learning to encourage the model to learn similar representations for original and perturbed data, which reduces the impact of the HFC. We evaluate the proposed method on four cross-domain gaze estimation tasks; experimental results demonstrate that it significantly reduces gaze jitter and improves gaze estimation performance in the target domains.
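To make the two central notions concrete, the following is a minimal sketch, not the method evaluated in the paper. It assumes that jitter can be measured as the angle between the gaze vectors predicted for an image and for a slightly perturbed copy, and that the HFC of a grayscale image can be isolated with a simple FFT high-pass filter. The function names, the `radius` cutoff, and the `predict` callable are all hypothetical; the adversarial attack and the contrastive loss are not shown.

```python
import numpy as np


def angular_error_deg(g1, g2):
    """Angle in degrees between two 3D gaze direction vectors."""
    g1 = np.asarray(g1, dtype=float)
    g2 = np.asarray(g2, dtype=float)
    cos = np.dot(g1, g2) / (np.linalg.norm(g1) * np.linalg.norm(g2))
    # Clip to guard against rounding slightly outside [-1, 1].
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))


def high_frequency_component(image, radius):
    """Return the part of a 2D grayscale image whose spatial frequencies
    lie farther than `radius` (an assumed cutoff) from the DC term."""
    image = np.asarray(image, dtype=float)
    h, w = image.shape
    # Shift the spectrum so the zero frequency sits at (h // 2, w // 2).
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    yy, xx = np.ogrid[:h, :w]
    dist = np.sqrt((yy - h // 2) ** 2 + (xx - w // 2) ** 2)
    spectrum[dist <= radius] = 0  # zero out the low-frequency disc
    return np.real(np.fft.ifft2(np.fft.ifftshift(spectrum)))


def gaze_jitter_deg(predict, image, perturbed):
    """Assumed jitter measure: angular gap between the predictions for an
    image and a similar (perturbed) image. `predict` is any callable that
    maps an image to a 3D gaze vector."""
    return angular_error_deg(predict(image), predict(perturbed))
```

Under these assumptions, a model that is robust to the HFC would yield a small `gaze_jitter_deg` when `perturbed` is formed by adding a scaled `high_frequency_component(image, radius)` back onto `image`.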