Crowd counting aims to learn crowd density distributions and estimate the number of objects (e.g., people) in an image. The perspective effect, which strongly influences the distribution of data points, plays an important role in crowd counting. In this paper, we propose a novel perspective-aware approach, PANet, to address the perspective problem. Based on the observation that object size varies greatly within a single image due to the perspective effect, we propose the dynamic receptive fields (DRF) framework. The framework adjusts the receptive field through the dilated convolution parameters according to the input image, which helps the model extract more discriminative features for each local region. Unlike most previous works, which use Gaussian kernels to generate the density maps that serve as supervision, we propose the self-distilling supervision (SDS) training method. The ground-truth density maps are refined from the first training stage, and the perspective information is distilled into the model in the second stage. Experimental results on the ShanghaiTech Part_A and Part_B, UCF_QNRF, and UCF_CC_50 datasets demonstrate that the proposed PANet outperforms state-of-the-art methods by a large margin.
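For context, the conventional Gaussian-kernel supervision that the abstract contrasts with SDS can be sketched as follows. This is a minimal illustration of the common baseline, not the PANet or SDS procedure. The function names, the `(row, col)` annotation format, and the fixed `sigma` value are assumptions made for the example. Each annotated point contributes one normalized Gaussian, so the density map sums to the number of annotated objects.

```python
import math


def gaussian_density_map(points, height, width, sigma=2.0):
    """Build a density map with one normalized Gaussian per annotated point.

    points: list of integer (row, col) annotations (hypothetical input format).
    sigma:  fixed kernel width in pixels (an assumed value for illustration).
    """
    density = [[0.0] * width for _ in range(height)]
    radius = int(3 * sigma)  # truncate the kernel at three standard deviations
    for r0, c0 in points:
        weights = {}
        for r in range(max(0, r0 - radius), min(height, r0 + radius + 1)):
            for c in range(max(0, c0 - radius), min(width, c0 + radius + 1)):
                d2 = (r - r0) ** 2 + (c - c0) ** 2
                weights[(r, c)] = math.exp(-d2 / (2.0 * sigma ** 2))
        if not weights:
            continue  # the point lies too far outside the image to contribute
        total = sum(weights.values())
        # Normalize over in-image pixels so each point contributes exactly 1.
        for (r, c), w in weights.items():
            density[r][c] += w / total
    return density


def estimated_count(density):
    """The count estimate is the sum (discrete integral) of the density map."""
    return sum(sum(row) for row in density)


if __name__ == "__main__":
    heads = [(5, 5), (0, 0), (10, 12)]  # made-up annotations
    dmap = gaussian_density_map(heads, height=16, width=16)
    print(round(estimated_count(dmap), 6))
```

With a fixed `sigma`, every object gets the same kernel size regardless of its distance from the camera. That is one way to see why perspective matters for this kind of supervision.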