We introduce a novel framework to track multiple objects in overhead camera videos for airport checkpoint security scenarios where targets correspond to passengers and their baggage items. We propose a Self-Supervised Learning (SSL) technique to provide the model with information about instance segmentation uncertainty from overhead images. Our SSL approach improves object detection by employing test-time data augmentation and a regression-based, rotation-invariant pseudo-label refinement technique. Our pseudo-label generation method provides multiple geometrically transformed images as inputs to a Convolutional Neural Network (CNN), regresses the augmented detections generated by the network to reduce localization errors, and then clusters them using the mean-shift algorithm. The self-supervised detector model is used in a single-camera tracking algorithm to generate temporal identifiers for the targets. Our method also incorporates a multi-view trajectory association mechanism to maintain consistent temporal identifiers as passengers travel across camera views. An evaluation of detection, tracking, and association performance on videos obtained from multiple overhead cameras in a realistic airport checkpoint environment demonstrates the effectiveness of the proposed approach. Our results show that self-supervision improves object detection accuracy by up to $42\%$ without increasing the inference time of the model. Our multi-camera association method achieves up to $89\%$ multi-object tracking accuracy with an average computation time of less than $15$ ms.
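The pseudo-label generation step described above (geometric augmentation, mapping detections back to the original frame, and mean-shift clustering) can be illustrated with a minimal sketch. This is not the authors' implementation: the CNN detector and the regression step are omitted, detections are reduced to 2D box centers, and all function names, the flat kernel, the bandwidth, and the sample coordinates are assumptions chosen for illustration.

```python
import math

def rotate(point, angle_deg, center):
    """Rotate a 2D point about `center` by `angle_deg` degrees."""
    a = math.radians(angle_deg)
    dx, dy = point[0] - center[0], point[1] - center[1]
    return (center[0] + dx * math.cos(a) - dy * math.sin(a),
            center[1] + dx * math.sin(a) + dy * math.cos(a))

def mean_shift(points, bandwidth, iters=50, tol=1e-6):
    """Flat-kernel mean shift; returns a list of (mode, support) pairs."""
    modes = []
    for p in points:
        x = p
        for _ in range(iters):
            nbrs = [q for q in points if math.dist(x, q) <= bandwidth]
            if not nbrs:  # defensive guard against rounding effects
                break
            new = (sum(q[0] for q in nbrs) / len(nbrs),
                   sum(q[1] for q in nbrs) / len(nbrs))
            converged = math.dist(new, x) < tol
            x = new
            if converged:
                break
        modes.append(x)
    # Merge modes that converged to (almost) the same location.
    merged = []
    for m in modes:
        for i, (c, n) in enumerate(merged):
            if math.dist(m, c) <= bandwidth / 2:
                merged[i] = (((c[0] * n + m[0]) / (n + 1),
                              (c[1] * n + m[1]) / (n + 1)), n + 1)
                break
        else:
            merged.append((m, 1))
    return merged

def pseudo_labels(detections_per_angle, image_center, bandwidth):
    """Map detections from each rotated view back to the original frame,
    then cluster them. `detections_per_angle` is a hypothetical structure:
    {angle_deg: [(x, y), ...]} with centers given in the rotated frame."""
    points = [rotate(d, -angle, image_center)
              for angle, dets in detections_per_angle.items()
              for d in dets]
    return mean_shift(points, bandwidth)

# Simulated example: two objects seen under four rotations, with a small
# deterministic localization error standing in for detector noise.
CENTER = (160.0, 120.0)
TRUE_OBJECTS = [(100.0, 50.0), (200.0, 120.0)]
OFFSETS = {0: (1.0, -1.0), 90: (-1.0, 1.0), 180: (0.5, 0.5), 270: (-0.5, -0.5)}
simulated = {
    angle: [(rx + off[0], ry + off[1])
            for rx, ry in (rotate(t, angle, CENTER) for t in TRUE_OBJECTS)]
    for angle, off in OFFSETS.items()
}
labels = pseudo_labels(simulated, CENTER, bandwidth=10.0)
for mode, support in labels:
    print(f"pseudo-label at ({mode[0]:.1f}, {mode[1]:.1f}), support={support}")
```

In this sketch, the support count of each cluster (how many augmented views agree on a detection) is one plausible proxy for detection confidence; the abstract does not state how the original method uses cluster statistics.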