In this paper, we propose a simple yet effective crowd counting and localization network named SCALNet. Unlike most existing works, which treat counting and localization as separate tasks, we formulate both as pixel-wise dense prediction problems and integrate them into a single end-to-end framework. Specifically, for crowd counting, we adopt a counting head supervised by the Mean Squared Error (MSE) loss. For crowd localization, the key insight is to recognize one keypoint per person, i.e., the center point of the head. We propose a localization head that separates individuals in dense crowds and is trained with two loss functions, i.e., the Negative-Suppressed Focal (NSF) loss and the False-Positive (FP) loss, which balance the positive and negative examples and suppress false-positive predictions, respectively. Experiments on the recent large-scale benchmark NWPU-Crowd show that our approach outperforms state-of-the-art methods by more than 5% in crowd localization and more than 10% in crowd counting. The code is publicly available at https://github.com/WangyiNTU/SCALNet.
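To make the dense-prediction formulation concrete, the following is a minimal sketch in plain Python, not the SCALNet implementation. It assumes the counting head outputs a density map whose sum gives the count, and the localization head outputs a per-pixel head-center probability map. All function names are hypothetical. `binary_focal_loss` is the generic focal loss, used here only as a stand-in, since the abstract does not define the NSF or FP losses. The 3x3 local-maximum peak extraction is likewise an assumption about how points could be read from the map.

```python
import math


def mse_loss(pred, target):
    """Mean squared error over all pixels of two equally sized 2D maps."""
    n = sum(len(row) for row in pred)
    total = sum((p - t) ** 2
                for prow, trow in zip(pred, target)
                for p, t in zip(prow, trow))
    return total / n


def count_from_density(density):
    """Assumed convention: the crowd count is the sum of the density map."""
    return sum(sum(row) for row in density)


def binary_focal_loss(prob, label, gamma=2.0, eps=1e-7):
    """Generic binary focal loss averaged over pixels (NOT the paper's NSF loss)."""
    total, n = 0.0, 0
    for prow, lrow in zip(prob, label):
        for p, y in zip(prow, lrow):
            p = min(max(p, eps), 1.0 - eps)    # clamp for a finite log
            pt = p if y == 1 else 1.0 - p      # probability of the true class
            total += -((1.0 - pt) ** gamma) * math.log(pt)
            n += 1
    return total / n


def peak_points(heatmap, threshold=0.5):
    """Return (row, col) of pixels >= threshold that are strict 3x3 local maxima."""
    h, w = len(heatmap), len(heatmap[0])
    points = []
    for r in range(h):
        for c in range(w):
            v = heatmap[r][c]
            if v < threshold:
                continue
            neighbours = [heatmap[i][j]
                          for i in range(max(0, r - 1), min(h, r + 2))
                          for j in range(max(0, c - 1), min(w, c + 2))
                          if (i, j) != (r, c)]
            if all(v > nb for nb in neighbours):
                points.append((r, c))
    return points
```

In this sketch, both heads produce maps at pixel resolution, so a single forward pass would yield a count (via `count_from_density`) and a set of head-center coordinates (via `peak_points`).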