Human eye contact is a form of non-verbal communication and can have a great influence on social behavior. Since the location and size of eye contact targets vary across videos, learning a generic video-independent eye contact detector remains a challenging task. In this work, we address the task of one-way eye contact detection for videos in the wild. Our goal is to build a unified model that can identify when a person is looking at their gaze target in an arbitrary input video. Since this requires time-series information about relative eye movements, we propose to formulate the task as a temporal segmentation problem. Due to the scarcity of labeled training data, we further propose a gaze target discovery method that generates pseudo-labels for unlabeled videos, allowing us to train a generic eye contact segmentation model on in-the-wild videos in an unsupervised manner. To evaluate our approach, we manually annotated a test dataset consisting of 52 videos of human conversations. Experimental results show that our eye contact segmentation model outperforms the previous video-dependent eye contact detector and achieves 71.88% framewise accuracy on our annotated test set. Our code and evaluation dataset are available at https://github.com/ut-vision/Video-Independent-ECS.