The recent trend in multiple object tracking (MOT) is to solve detection and tracking jointly, learning object detection and appearance features (or motion) simultaneously. Despite their competitive performance, joint detection-and-tracking methods usually fail to find accurate object associations in crowded scenes because of missed or false detections. In this paper, we jointly model counting, detection and re-identification in an end-to-end framework, named CountingMOT, tailored for crowded scenes. By imposing mutual object-count constraints between detection and counting, CountingMOT seeks a balance between object detection and crowd density map estimation, which can help it recover missed detections or reject false detections. Our approach is an attempt to bridge the gap between object detection, counting, and re-identification. This is in contrast to prior MOT methods that either ignore crowd density, and are thus prone to failure in crowded scenes, or depend on local correlations to build a graphical relationship for matching targets. The proposed tracker can perform online, real-time tracking and achieves state-of-the-art results on the public benchmarks MOT16 (MOTA of 77.6%), MOT17 (MOTA of 78.0%) and MOT20 (MOTA of 70.2%).
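The abstract does not give the exact form of the mutual object-count constraint, so the following is only a rough sketch of the general idea under stated assumptions, not the CountingMOT formulation. It assumes that (a) the count implied by a crowd density map is the sum of its pixel values, (b) a differentiable detection count is the sum of detection confidence scores, and (c) the constraint is an L1 penalty on the gap between the two. The helper names (`density_count`, `soft_detection_count`, `count_consistency_loss`, `reconcile_detections`) are hypothetical, and the top-k rule in `reconcile_detections` is an invented inference-time illustration of how a density count could recover missed detections or reject false ones.

```python
from typing import List, Sequence


def density_count(density_map: Sequence[Sequence[float]]) -> float:
    """Estimated crowd count: the sum (discrete integral) of a density map."""
    return sum(sum(row) for row in density_map)


def soft_detection_count(scores: Sequence[float]) -> float:
    """Assumed differentiable detection count: the sum of confidence scores."""
    return sum(scores)


def count_consistency_loss(density_map: Sequence[Sequence[float]],
                           scores: Sequence[float]) -> float:
    """Hypothetical L1 penalty on the gap between the two count estimates.

    Penalising |C_density - C_detection| pushes the counting branch and the
    detection branch to agree, so each can act as a check on the other.
    """
    return abs(density_count(density_map) - soft_detection_count(scores))


def reconcile_detections(scores: Sequence[float],
                         density_map: Sequence[Sequence[float]]) -> List[int]:
    """Hypothetical inference-time rule: keep the k highest-scoring candidates,
    where k is the density-map count rounded to the nearest integer.

    If k exceeds the number of confident boxes, lower-scored candidates are
    recovered; if k is smaller, the weakest boxes are rejected.
    Returns candidate indices ordered by descending score.
    """
    k = max(0, round(density_count(density_map)))
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


if __name__ == "__main__":
    density = [[0.5, 0.5], [1.0, 1.0]]  # integrates to 3.0 people
    scores = [0.9, 0.8, 0.4, 0.1]       # only two boxes pass a 0.5 threshold
    print(count_consistency_loss(density, scores))  # |3.0 - 2.2|, roughly 0.8
    print(reconcile_detections(scores, density))    # [0, 1, 2]
```

In the usage example, a plain 0.5 confidence threshold would keep only two boxes, while the density map suggests three people, so the third candidate (score 0.4) is recovered. The soft count in the loss is chosen because a thresholded count has no useful gradient; the hard top-k rule is only an aid to intuition.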