This work presents a unified, fully differentiable model for multi-person tracking that learns to associate detections into trajectories without relying on pre-computed tracklets. The model builds a dynamic spatiotemporal graph that aggregates spatial, contextual, and temporal information, enabling seamless information propagation across entire sequences. To improve occlusion handling, the graph can also encode scene-specific information. We also introduce a new large-scale dataset with 25 partially overlapping views, detailed scene reconstructions, and extensive occlusions. Experiments show that the model achieves state-of-the-art performance on public benchmarks and on the new dataset while remaining flexible across diverse conditions. Both the dataset and the approach will be publicly released to advance research in multi-person tracking.
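The abstract does not detail how the spatiotemporal graph is built or how associations are scored; the sketch below is only a minimal illustration of the general idea, assuming a window-based temporal graph over per-detection features and a small MLP edge scorer. All names (`EdgeScorer`, `temporal_edges`, the feature sizes, and the frame window) are hypothetical and not taken from the paper; spatial and contextual edges within a frame could be added analogously.

```python
# Minimal, illustrative sketch of a spatiotemporal detection graph with a
# learned edge scorer. Hypothetical names and sizes; NOT the paper's code.
import torch
import torch.nn as nn


class EdgeScorer(nn.Module):
    """Scores candidate detection pairs (edges) for identity association."""

    def __init__(self, node_dim: int = 32, hidden: int = 64):
        super().__init__()
        self.node_mlp = nn.Sequential(nn.Linear(node_dim, hidden), nn.ReLU())
        self.edge_mlp = nn.Sequential(
            nn.Linear(2 * hidden, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def forward(self, feats: torch.Tensor, edges: torch.Tensor) -> torch.Tensor:
        h = self.node_mlp(feats)                    # per-detection embeddings
        src, dst = edges[:, 0], edges[:, 1]
        pair = torch.cat([h[src], h[dst]], dim=-1)  # concat endpoint features
        return self.edge_mlp(pair).squeeze(-1)      # association logits


def temporal_edges(frame_ids: torch.Tensor, window: int = 2) -> torch.Tensor:
    """Connect detections whose frames are at most `window` apart, so
    information can propagate across the sequence (and short occlusions)."""
    n = frame_ids.numel()
    i, j = torch.meshgrid(torch.arange(n), torch.arange(n), indexing="ij")
    gap = frame_ids[j] - frame_ids[i]
    mask = (gap > 0) & (gap <= window)              # forward-in-time edges only
    return torch.stack([i[mask], j[mask]], dim=-1)


# Toy usage: 5 detections over 3 frames with random per-detection features.
frames = torch.tensor([0, 0, 1, 2, 2])
feats = torch.randn(5, 32)
edges = temporal_edges(frames, window=2)
scores = EdgeScorer()(feats, edges)                 # higher = same identity
print(edges.shape, scores.shape)
```

In a formulation of this kind, the edge logits would be supervised with identity labels and the whole pipeline trained end to end, which is what makes the association step fully differentiable rather than dependent on pre-computed tracklets.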