Object detection generally requires sliding-window classifiers in traditional approaches or anchor-box-based predictions in modern deep learning approaches. However, either approach requires tedious box configurations. In this paper, we provide a new perspective in which detecting objects is formulated as a high-level semantic feature detection task. Like edge, corner, blob, and other feature detectors, the proposed detector scans for feature points all over the image, a task for which convolution is naturally suited. However, unlike these traditional low-level features, the proposed detector targets a higher-level abstraction: we look for central points where there are objects, and modern deep models are already capable of such high-level semantic abstraction. In addition, as in blob detection, we also predict the scales of the central points, which is likewise a straightforward convolution. Therefore, in this paper, pedestrian and face detection are simplified to a straightforward center and scale prediction task through convolutions. This way, the proposed method enjoys a box-free setting. Though structurally simple, it achieves competitive accuracy on several challenging benchmarks, including pedestrian detection and face detection. Furthermore, a cross-dataset evaluation demonstrates the superior generalization ability of the proposed method. Code and models are available at https://github.com/liuwei16/CSP and https://github.com/hasanirtiza/Pedestron.
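The center-and-scale formulation can be illustrated with a minimal decoding sketch, assuming the network outputs two maps on a feature grid downsampled by `stride`: a center heatmap of per-location object-center scores and a scale map holding the log of object height in feature-map units. Boxes are then recovered with a fixed width-to-height ratio (0.41 here, a common pedestrian convention; faces would need a different ratio). The function name `decode_center_scale`, the threshold, the stride of 4 and the 3x3 local-maximum test are hypothetical choices for illustration, and center-offset regression and non-maximum suppression are omitted, so this is not the authors' implementation.

```python
import math
from typing import List, Tuple

# (x1, y1, x2, y2, score) in input-image pixels
Box = Tuple[float, float, float, float, float]


def decode_center_scale(
    heatmap: List[List[float]],
    log_height: List[List[float]],
    stride: int = 4,             # assumed downsampling factor of the output maps
    aspect_ratio: float = 0.41,  # assumed fixed width / height ratio
    threshold: float = 0.5,      # hypothetical score threshold
) -> List[Box]:
    """Turn a center heatmap and a log-height scale map into boxes (sketch)."""
    rows, cols = len(heatmap), len(heatmap[0])
    boxes: List[Box] = []
    for i in range(rows):
        for j in range(cols):
            score = heatmap[i][j]
            if score < threshold:
                continue
            # Keep only 3x3 local maxima so each object yields one center point.
            neighbours = [
                heatmap[a][b]
                for a in range(max(0, i - 1), min(rows, i + 2))
                for b in range(max(0, j - 1), min(cols, j + 2))
            ]
            if score < max(neighbours):
                continue
            # Scale map stores log(height) in feature-map units (assumption).
            h = math.exp(log_height[i][j]) * stride
            w = aspect_ratio * h
            # Center of the grid cell, mapped back to image coordinates.
            cx, cy = (j + 0.5) * stride, (i + 0.5) * stride
            boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2, score))
    return boxes
```

For example, a 3x3 heatmap whose only peak (0.9) sits at the middle cell, with a log height of `math.log(25.0)` there, decodes to a single box 100 pixels tall and 41 pixels wide centered at (6, 6). No box shapes or anchors are configured in advance: every box comes from a detected center point plus one predicted scale value, which is the box-free property the abstract describes.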