Previous work generally holds that improving the spatial invariance of convolutional networks is the key to object counting. However, after examining several mainstream counting networks, we were surprised to find that overly strict pixel-level spatial invariance causes density-map generation to overfit to annotation noise. In this paper, we replace the original convolution filters with locally connected Gaussian kernels to estimate spatial positions in the density map. The purpose is to let the feature-extraction process implicitly encourage the density-map generation process to overcome annotation noise. Inspired by previous work, we propose a low-rank approximation with translation invariance to efficiently approximate the resulting massive Gaussian convolution. Our work points to a new direction for follow-up research: investigating how to properly relax the overly strict pixel-level spatial invariance in object counting. We evaluate our methods on 4 mainstream object counting networks (i.e., MCNN, CSRNet, SANet, and ResNet-50). Extensive experiments were conducted on 7 popular benchmarks covering 3 applications (i.e., crowd, vehicle, and plant counting). Experimental results show that our methods significantly outperform other state-of-the-art methods and learn the spatial positions of objects promisingly well.
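The abstract does not specify how the locally connected Gaussian kernels or their low-rank approximation are parameterized, so the following is only a minimal NumPy sketch under our own assumptions. Each output pixel is filtered with its own isotropic Gaussian of width `sigmas[i, j]` (the locally connected case), and the stack of per-pixel kernels is then factorized into a few shared, translation-invariant basis filters plus per-pixel mixing weights. The function names `gaussian_kernel`, `patches`, `locally_connected_gaussian`, and `low_rank_gaussian` are hypothetical, and the truncated-SVD factorization is a stand-in chosen for illustration, not necessarily the authors' approximation (in a network the bases and weights would more plausibly be learned).

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def gaussian_kernel(sigma: float, size: int) -> np.ndarray:
    """Normalized isotropic 2-D Gaussian of shape (size, size)."""
    r = np.arange(size) - size // 2
    g = np.exp(-(r[:, None] ** 2 + r[None, :] ** 2) / (2.0 * sigma ** 2))
    return g / g.sum()


def patches(x: np.ndarray, size: int) -> np.ndarray:
    """All size x size neighbourhoods of a zero-padded x, shape (H, W, size, size)."""
    assert size % 2 == 1, "odd kernel size keeps output aligned with input"
    return sliding_window_view(np.pad(x, size // 2), (size, size))


def locally_connected_gaussian(x: np.ndarray, sigmas: np.ndarray, size: int) -> np.ndarray:
    """Exact locally connected filtering: pixel (i, j) uses its own Gaussian.

    Gaussians are symmetric, so correlation and convolution coincide here.
    """
    H, W = x.shape
    kernels = np.stack([gaussian_kernel(s, size) for s in sigmas.ravel()])
    kernels = kernels.reshape(H, W, size, size)
    return np.einsum("ijuv,ijuv->ij", patches(x, size), kernels)


def low_rank_gaussian(x: np.ndarray, sigmas: np.ndarray, size: int, rank: int) -> np.ndarray:
    """Rank-limited approximation: K(i,j,u,v) ~= sum_r w_r(i,j) * b_r(u,v)."""
    H, W = x.shape
    # Rows = pixels, columns = kernel taps (assumed factorization target).
    M = np.stack([gaussian_kernel(s, size).ravel() for s in sigmas.ravel()])
    U, S, Vt = np.linalg.svd(M, full_matrices=False)
    weights = (U[:, :rank] * S[:rank]).reshape(H, W, -1)  # per-pixel mixing weights
    bases = Vt[:rank].reshape(-1, size, size)              # shared, translation-invariant filters
    # One ordinary shared-weight filtering pass per basis filter ...
    responses = np.einsum("ijuv,ruv->ijr", patches(x, size), bases)
    # ... followed by a cheap per-pixel linear combination of the responses.
    return np.einsum("ijr,ijr->ij", responses, weights)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.random((16, 16))
    sigmas = rng.uniform(1.0, 3.0, size=x.shape)
    exact = locally_connected_gaussian(x, sigmas, size=9)
    for r in (1, 2, 4, 81):
        err = np.abs(low_rank_gaussian(x, sigmas, size=9, rank=r) - exact).max()
        print(f"rank={r:2d}  max abs error={err:.2e}")
```

At full rank the factorization reproduces the exact locally connected output; at small ranks the cost drops to a handful of shared convolutions plus a per-pixel weighted sum, which is the kind of saving a low-rank, translation-invariant approximation is meant to provide.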