Real-time Multi-person Eyeblink Detection in the Wild for Untrimmed Video

Real-time eyeblink detection in the wild is broadly useful for fatigue detection, face anti-spoofing, emotion analysis, and similar tasks. Existing research generally focuses on single-person cases in trimmed video. However, the multi-person scenario in untrimmed video is also important for practical applications and has not yet received much attention. To address this, we shed light on this research field for the first time, with essential contributions on dataset, theory, and practice. In particular, we propose MPEblink, a large-scale dataset of 686 untrimmed videos with 8748 eyeblink events captured under multi-person conditions. The samples are taken from unconstrained films to reveal "in the wild" characteristics. We also propose a real-time multi-person eyeblink detection method. Unlike existing counterparts, our method runs in a one-stage spatio-temporal way with end-to-end learning capacity. Specifically, it simultaneously addresses the sub-tasks of face detection, face tracking, and human instance-level eyeblink detection. This paradigm has two main advantages: (1) eyeblink features can benefit from the face's global context (e.g., head pose and illumination condition) through joint optimization and interaction, and (2) addressing these sub-tasks in parallel rather than sequentially saves considerable time, which helps meet the real-time running requirement. Experiments on MPEblink verify the essential challenges of real-time multi-person eyeblink detection in the wild for untrimmed video. Our method also outperforms existing approaches by large margins while maintaining a high inference speed.

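To make "human instance-level eyeblink detection" in untrimmed video more concrete, the following is a minimal sketch of one possible post-processing step: turning per-frame blink scores for each tracked face into blink events. The abstract does not describe the model's output format, so the input layout (a mapping from a track ID to a list of per-frame blink probabilities), the function name `blink_events`, and the 0.5 threshold are all assumptions made for illustration, not the authors' method.

```python
from typing import Dict, List, Tuple


def blink_events(
    scores: Dict[int, List[float]],
    threshold: float = 0.5,  # hypothetical cut-off, not taken from the paper
) -> Dict[int, List[Tuple[int, int]]]:
    """Group consecutive above-threshold frames into (start, end) blink events.

    `scores` maps a hypothetical face track ID to its per-frame blink
    probabilities. Frame indices in the result are inclusive.
    """
    events: Dict[int, List[Tuple[int, int]]] = {}
    for track_id, probs in scores.items():
        spans: List[Tuple[int, int]] = []
        start = None  # first frame of the blink currently being tracked
        for frame, p in enumerate(probs):
            if p >= threshold and start is None:
                start = frame  # a blink begins
            elif p < threshold and start is not None:
                spans.append((start, frame - 1))  # the blink ended on the previous frame
                start = None
        if start is not None:
            # The video ended while a blink was still in progress.
            spans.append((start, len(probs) - 1))
        events[track_id] = spans
    return events


if __name__ == "__main__":
    demo = {0: [0.1, 0.8, 0.9, 0.2, 0.7], 1: [0.0, 0.0, 0.0]}
    print(blink_events(demo))
```

Keeping events separate per track ID is what distinguishes the multi-person setting from the single-person one: each person in the video gets an independent list of blink intervals.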