Crowd segmentation is a fundamental task serving as the basis of crowded scene analysis, and it is highly desirable to obtain refined pixel-level segmentation maps. However, it remains a challenging problem, as existing approaches either require dense pixel-level annotations to train deep learning models or merely produce rough segmentation maps from optical or particle flows with physical models. In this paper, we propose the Motion Prior-Aware Siamese Network (MPASNET) for unsupervised crowd semantic segmentation. This model not only eliminates the need for annotation but also yields high-quality segmentation maps. Specifically, we first analyze the coherent motion patterns across frames and then apply a circular region merging strategy to the collective particles to generate pseudo-labels. Moreover, we equip MPASNET with siamese branches for augmentation-invariant regularization and siamese feature aggregation. Experiments on benchmark datasets show that our model outperforms the state of the art by more than 12% in terms of mIoU.
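A minimal sketch of the circular region merging idea described above, assuming collective particles are 2D pixel coordinates and each particle is expanded to a disk of fixed radius whose union forms the binary pseudo-label mask. The function name, the `radius` parameter, and the use of a plain disk union are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def merge_circular_regions(particles, shape, radius):
    """Union of disks centered at collective-particle coordinates.

    particles: (N, 2) array of (row, col) positions (assumed input format)
    shape:     (H, W) of the output pseudo-label mask
    radius:    disk radius in pixels (hypothetical hyperparameter)
    Returns a binary uint8 mask usable as a segmentation pseudo-label.
    """
    rows, cols = np.ogrid[:shape[0], :shape[1]]
    mask = np.zeros(shape, dtype=bool)
    for r, c in particles:
        # Mark every pixel within `radius` of this particle as crowd.
        mask |= (rows - r) ** 2 + (cols - c) ** 2 <= radius ** 2
    return mask.astype(np.uint8)

# Example: two nearby particles whose disks overlap merge into one region.
particles = np.array([[10, 10], [10, 14]])
mask = merge_circular_regions(particles, (32, 32), radius=3)
```

Because overlapping disks naturally fuse, nearby coherently moving particles yield a single connected pseudo-label region rather than isolated points.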