Crowd counting is a key aspect of crowd analysis and is typically accomplished by estimating a crowd-density map and summing over its density values. However, this approach suffers from background noise accumulation and loss of density due to the broad Gaussian kernels used to create the ground-truth density maps. Narrowing the Gaussian kernel overcomes this issue, but existing approaches perform poorly when trained with such ground-truth density maps. To overcome this limitation, we propose using conditional diffusion models to predict density maps, as diffusion models are known to model complex distributions well and show high fidelity to the training data during crowd-density map generation. Furthermore, because the intermediate time steps of the diffusion process are noisy, we incorporate a regression branch for direct crowd estimation, used only during training, to improve feature learning. In addition, in contrast to existing crowd-counting pipelines, we exploit the stochastic nature of the diffusion model by producing multiple density maps to improve counting performance. We also depart from density summation and introduce contour detection followed by summation as the counting operation, which is more robust to background noise. We conduct extensive experiments on public datasets to validate the effectiveness of our method. Specifically, our novel crowd-counting pipeline reduces the counting error by up to $6\%$ on JHU-CROWD++ and up to $7\%$ on UCF-QNRF.
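The difference between the two counting operations can be illustrated with a toy example. The sketch below is not the paper's implementation: it assumes a density map stored as a nested list, uses 4-connected region labeling as a simple stand-in for contour detection, and treats each region above a hypothetical `threshold` as one person. The function names, the toy map, and the noise level are all illustrative assumptions.

```python
from collections import deque


def count_by_sum(density):
    """Classic counting: sum every value in the density map."""
    return sum(sum(row) for row in density)


def count_by_blobs(density, threshold=0.5):
    """Count 4-connected regions whose values exceed `threshold`.

    Stand-in for contour detection (an assumption of this sketch):
    each region above the threshold is counted as one person.
    """
    height, width = len(density), len(density[0])
    seen = [[False] * width for _ in range(height)]
    count = 0
    for r in range(height):
        for c in range(width):
            if seen[r][c] or density[r][c] <= threshold:
                continue
            # New region found: flood-fill it so it is counted only once.
            count += 1
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and not seen[ny][nx]
                            and density[ny][nx] > threshold):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return count


def make_toy_map(size=10, heads=((2, 2), (2, 7), (7, 4)), noise=0.02):
    """Hypothetical map: uniform background noise plus one sharp peak per head."""
    grid = [[noise] * size for _ in range(size)]
    for r, c in heads:
        grid[r][c] = 1.0
    return grid


toy = make_toy_map()
print(round(count_by_sum(toy), 2))  # about 4.94: three heads plus accumulated noise
print(count_by_blobs(toy))          # 3
```

In this toy setting the summed count drifts well above the true value of three because every background pixel contributes a little, while the region count ignores anything below the threshold. This only works when peaks are narrow enough to stay separated, which is consistent with the motivation for narrow kernels given above.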