Head detection in indoor video is an essential component of building occupancy detection. While deep models have made remarkable progress in general object detection, their performance in complex indoor scenes remains unsatisfactory. Indoor surveillance video often contains cluttered background objects, among which heads appear at small scales and in diverse poses. In this paper, we propose the Motion-aware Pseudo Siamese Network (MPSN), an end-to-end approach that leverages head motion information to guide the deep model toward extracting effective head features in indoor scenarios. By taking the pixel-wise difference of adjacent frames as an auxiliary input, MPSN enhances human head motion information and suppresses irrelevant background objects. Compared with prior methods, it achieves superior performance on two indoor video datasets. Our experiments show that MPSN suppresses static background objects and highlights moving instances, especially human heads. We also compare different methods of capturing head motion, which demonstrates the simplicity and flexibility of MPSN. To validate the robustness of MPSN, we conduct adversarial experiments with a mathematical solution for small perturbations, which supports robust model selection. Finally, to confirm its potential in building control systems, we apply MPSN to occupancy counting. Code is available at https://github.com/pl-share/MPSN.
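The auxiliary input described above, the pixel-wise difference of adjacent frames, can be illustrated with a minimal sketch. This is not taken from the MPSN code base: the function name `frame_difference`, the use of an absolute difference, and the toy 4x4 grayscale frames are all assumptions made for illustration, and the sketch does not show how the difference is fed to the network.

```python
import numpy as np


def frame_difference(prev_frame, curr_frame):
    """Absolute pixel-wise difference of two uint8 frames of equal shape.

    Hypothetical helper for illustration; the absolute value is an assumption.
    """
    if prev_frame.shape != curr_frame.shape:
        raise ValueError("frames must have the same shape")
    # Widen to a signed type so the subtraction cannot wrap around in uint8.
    diff = curr_frame.astype(np.int16) - prev_frame.astype(np.int16)
    return np.abs(diff).astype(np.uint8)


# Toy data: a static background with one bright pixel that moves between frames.
background = np.full((4, 4), 50, dtype=np.uint8)
frame_t0 = background.copy()
frame_t1 = background.copy()
frame_t0[0, 0] = 200  # object position in the first frame
frame_t1[1, 1] = 200  # object position in the second frame

motion = frame_difference(frame_t0, frame_t1)
print(motion)
```

In this toy case the static background cancels to zero and only the two positions touched by the moving pixel remain non-zero, which is the intuition behind using the difference image to highlight moving instances.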