Noisy annotations, such as missing annotations and location shifts, are common in crowd counting datasets because of multi-scale head sizes, heavy occlusion, and similar factors. These noisy annotations severely degrade model training, especially for density map-based methods. To alleviate their negative impact, we propose a novel crowd counting model with one convolution head and one transformer head, in which the two heads supervise each other in noisy areas, a scheme we call Cross-Head Supervision. The resultant model, CHS-Net, can synergize different types of inductive biases for better counting. In addition, we develop a progressive cross-head supervision learning strategy to stabilize the training process and provide more reliable supervision. Extensive experiments on the ShanghaiTech and QNRF datasets demonstrate superior performance over state-of-the-art methods. Code is available at https://github.com/RaccoonDML/CHSNet.
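To make the idea of two heads supervising each other concrete, the following is a minimal sketch in plain Python, not the implementation from the CHSNet repository. The abstract does not specify how noisy areas are identified or how the strategy progresses, so everything here is an assumption made for illustration: a cell is treated as noisy for a head when that head's absolute error against the annotation is among the largest `noisy_fraction` of cells, the other head's prediction replaces the annotation there, and `progressive_fraction` ramps that fraction up linearly over training. The function names `cross_head_targets`, `mse`, and `progressive_fraction` are hypothetical.

```python
def cross_head_targets(gt, pred_conv, pred_tran, noisy_fraction):
    """Build per-head regression targets for one flattened density map.

    Assumption (not stated in the abstract): a cell is "noisy" for a head
    when that head's absolute error against the annotation is among the
    largest `noisy_fraction` of cells. In noisy cells the target is the
    other head's prediction; elsewhere it is the annotated density.
    """
    n = len(gt)
    k = int(round(noisy_fraction * n))  # number of cells treated as noisy

    def noisy_cells(pred):
        # Indices sorted by descending absolute error; keep the top k.
        order = sorted(range(n), key=lambda i: abs(pred[i] - gt[i]), reverse=True)
        return set(order[:k])

    noisy_conv = noisy_cells(pred_conv)
    noisy_tran = noisy_cells(pred_tran)

    # Each head is supervised by the other head inside its own noisy cells.
    target_conv = [pred_tran[i] if i in noisy_conv else gt[i] for i in range(n)]
    target_tran = [pred_conv[i] if i in noisy_tran else gt[i] for i in range(n)]
    return target_conv, target_tran


def mse(pred, target):
    """Mean squared error between two equally sized flat maps."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def progressive_fraction(epoch, total_epochs, max_fraction=0.2):
    """Hypothetical linear ramp: start with pure annotation supervision and
    gradually trust the other head in more cells as training proceeds."""
    return max_fraction * min(1.0, epoch / total_epochs)


# Toy 4-cell density map: one annotated head at index 2.
gt = [0.0, 0.0, 1.0, 0.0]
pred_conv = [0.1, 0.0, 0.2, 0.0]
pred_tran = [0.0, 0.1, 0.3, 0.0]

fraction = 0.25  # treat the single worst cell as noisy
target_conv, target_tran = cross_head_targets(gt, pred_conv, pred_tran, fraction)
loss = mse(pred_conv, target_conv) + mse(pred_tran, target_tran)
```

In a real training loop the cross-head targets would normally be treated as constants, so that each head is pulled toward the other's prediction without gradients flowing back through the supervising head.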