Crowd counting based on density maps is generally regarded as a regression task, in which deep learning is used to learn the mapping between image content and crowd density distribution. Although great success has been achieved, pedestrians far away from the camera are difficult to detect, and such hard examples often outnumber the easy ones. Existing methods based on a plain Euclidean distance loss optimize hard and easy examples indiscriminately, so the densities of hard examples are often predicted to be too low or even zero, which results in large counting errors. To address this problem, we are the first to propose a Hard Example Focusing (HEF) algorithm for the regression task of crowd counting. The HEF algorithm makes our model rapidly focus on hard examples by attenuating the contribution of easy examples, so that higher importance is given to hard examples with wrong estimations. Moreover, the scale variations in crowd scenes are large, and scale annotations are labor-intensive and expensive. With the proposed multi-Scale Semantic Refining (SSR) strategy, the lower layers of our model can break through the limitations of deep networks and capture semantic features at different scales, allowing it to cope with scale variation sufficiently. We perform extensive experiments on six benchmark datasets to verify the proposed method. The results indicate the superiority of our method over state-of-the-art methods, while our designed model is also smaller and faster.
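To make the focusing idea concrete, the following is a minimal PyTorch sketch of what such a hard-example-weighted regression loss could look like. The abstract does not give the exact HEF formulation, so the modulating factor, the normalization by the maximum residual, and the gamma hyper-parameter are all illustrative assumptions rather than the paper's actual loss.

```python
import torch

def hef_mse_loss(pred, target, gamma=2.0, eps=1e-6):
    """Illustrative focal-style weighted MSE over predicted density maps.

    NOTE: a hypothetical sketch of Hard Example Focusing, not the
    paper's exact formulation.
    """
    # Plain per-pixel Euclidean term that, per the abstract, treats
    # hard and easy pixels indiscriminately.
    err = (pred - target) ** 2
    # Modulating factor: residuals normalized to [0, 1] and raised to
    # gamma, so well-fit (easy) pixels are attenuated while poorly fit
    # (hard) pixels keep close to full weight. Detached so the weights
    # act as constants during back-propagation.
    weight = (err.detach() / (err.detach().max() + eps)) ** gamma
    return (weight * err).mean()

# Toy usage: a batch of 4 single-channel density maps.
pred = torch.rand(4, 1, 96, 128, requires_grad=True)
target = torch.rand(4, 1, 96, 128)
loss = hef_mse_loss(pred, target)
loss.backward()
```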