Convolutional neural network (CNN) based crowd counting methods have achieved promising results in the past few years. However, scale variation remains a major obstacle to accurate count estimation. In this paper, we propose a multi-scale feature aggregation network (MSFANet) that alleviates this problem to some extent. Specifically, our approach consists of two feature aggregation modules: short aggregation (ShortAgg) and skip aggregation (SkipAgg). The ShortAgg module aggregates the features of adjacent convolution blocks, so that features with different receptive fields are fused gradually from the bottom to the top of the network. The SkipAgg module directly propagates features with small receptive fields to features with much larger receptive fields, which promotes the fusion of the two. In particular, the SkipAgg module introduces local self-attention features from Swin Transformer blocks to incorporate rich spatial information. Furthermore, we present a local-and-global counting loss that accounts for the non-uniform crowd distribution. Extensive experiments on four challenging datasets (ShanghaiTech, UCF_CC_50, UCF-QNRF, and WorldExpo'10) demonstrate that the proposed easy-to-implement MSFANet achieves promising results compared with previous state-of-the-art approaches.
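
The abstract does not state the form of the local-and-global counting loss, so the following is only an illustrative sketch of the general idea, not the formulation used by MSFANet. It assumes the global term is the absolute error of the total count and the local term is the mean absolute count error over non-overlapping square patches; the names `region_count` and `local_global_count_loss`, the patch size, and the weighting are all hypothetical.

```python
from typing import List

Grid = List[List[float]]  # a 2D density map stored as nested lists


def region_count(density: Grid, top: int, left: int, size: int) -> float:
    """Sum the density inside one square patch (slices clip at the borders)."""
    return sum(
        value
        for row in density[top:top + size]
        for value in row[left:left + size]
    )


def local_global_count_loss(pred: Grid, target: Grid, patch: int = 2,
                            local_weight: float = 1.0) -> float:
    """Hypothetical loss: global count error plus mean per-patch count error."""
    height, width = len(target), len(target[0])

    # Global term (assumed): absolute error of the total crowd count.
    global_err = abs(sum(map(sum, pred)) - sum(map(sum, target)))

    # Local term (assumed): mean absolute count error over non-overlapping
    # patches, which penalises placing people in the wrong region.
    local_errs = [
        abs(region_count(pred, top, left, patch)
            - region_count(target, top, left, patch))
        for top in range(0, height, patch)
        for left in range(0, width, patch)
    ]
    local_err = sum(local_errs) / len(local_errs)

    return global_err + local_weight * local_err


# Two maps with the same total count (2) but people in different corners.
target = [[1.0, 0.0, 0.0, 0.0],
          [0.0, 0.0, 0.0, 0.0],
          [0.0, 0.0, 0.0, 0.0],
          [0.0, 0.0, 0.0, 1.0]]
pred = [[0.0, 0.0, 0.0, 1.0],
        [0.0, 0.0, 0.0, 0.0],
        [0.0, 0.0, 0.0, 0.0],
        [1.0, 0.0, 0.0, 0.0]]

loss = local_global_count_loss(pred, target)
```

In this toy case the global term is zero because both maps sum to the same count, yet the local term is non-zero because every patch is miscounted. This is the kind of error a purely global count loss would miss when the crowd is distributed non-uniformly.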