In crowded scenes, multi-view approaches to people tracking have the potential to handle occlusions better than single-view ones. They often rely on the tracking-by-detection paradigm, which involves detecting people first and then connecting the detections. In this paper, we argue that an even more effective approach is to predict people's motion over time and to infer their presence in individual frames from these motion predictions. This makes it possible to enforce consistency both over time and across the views of a single temporal frame. We validate our approach on the PETS2009 and WILDTRACK datasets and demonstrate that it outperforms state-of-the-art methods.