Recent progress in visual crowd analysis shows a trend toward counting people by localizing or even detecting them, rather than simply summing a density map. This trend returns the field to its essence, counting by detection, which provides richer crowd information and has more practical applications. However, recent work on crowd localization and detection has two limitations: 1) typical detection methods cannot handle dense crowds or large variations in scale; 2) density-map heuristic methods perform poorly in position and box prediction, especially in high-density or large-size crowds. In this paper, we devise a tailored baseline for localization, detection, and counting in dense crowds from a new perspective, named LDC-Net for convenience, which has the following features: 1) a strong but minimalist paradigm that detects objects by predicting only a location map and a size map, which makes it possible to detect in scenes of any capacity ($0 \sim 10,000+$ persons); 2) excellent cross-scale ability under large variation, such as head sizes ranging over $0 \sim 100,000+$ pixels; 3) superior performance in location and box prediction tasks, together with counting performance competitive with density-based methods. Finally, the source code and pre-trained models will be released.
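To make the "location map plus size map" paradigm concrete, the following is a minimal sketch of how two such maps could be decoded into boxes and a count. It is not the LDC-Net implementation: the abstract does not describe the decoding step, so the function name `decode_boxes`, the `threshold` parameter, the 3x3 local-maximum rule, and the assumption that the size map stores a square box side length in pixels are all hypothetical choices made for illustration.

```python
import numpy as np


def decode_boxes(location_map, size_map, threshold=0.5):
    """Decode (x1, y1, x2, y2) boxes from a location map and a size map.

    Assumptions (not taken from the paper): each head is a local maximum
    of `location_map` above `threshold`, and `size_map` holds the side
    length of a square box at that pixel.
    """
    h, w = location_map.shape
    # Pad with -inf so border pixels can be compared with all 8 neighbours.
    padded = np.pad(location_map, 1, mode="constant", constant_values=-np.inf)
    # Stack the nine shifted views that make up each pixel's 3x3 window.
    windows = np.stack(
        [padded[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)]
    )
    # A peak equals the maximum of its window and exceeds the threshold.
    # Note: flat plateaus above the threshold yield several peaks.
    is_peak = (location_map >= windows.max(axis=0)) & (location_map > threshold)
    ys, xs = np.nonzero(is_peak)
    half = size_map[ys, xs] / 2.0
    # One row per detected head; the count is the number of rows.
    return np.stack([xs - half, ys - half, xs + half, ys + half], axis=1)
```

Under these assumptions, the crowd count is simply `len(decode_boxes(location_map, size_map))`, so localization, box prediction, and counting all come from the same two maps.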