Vision-based vehicle detection approaches have achieved remarkable success in recent years with the development of deep convolutional neural networks (CNNs). However, existing CNN-based algorithms suffer from the problem that convolutional features are scale-sensitive in the object detection task, while traffic images and videos commonly contain vehicles with a large variance of scales. In this paper, we delve into the source of scale sensitivity and reveal two key issues: 1) existing RoI pooling destroys the structure of small-scale objects; 2) the large intra-class distance caused by a large variance of scales exceeds the representation capability of a single network. Based on these findings, we present a scale-insensitive convolutional neural network (SINet) for fast detection of vehicles with a large variance of scales. First, we present a context-aware RoI pooling that maintains the contextual information and original structure of small-scale objects. Second, we present a multi-branch decision network that minimizes the intra-class distance of features. These lightweight techniques add no extra time complexity yet bring a prominent improvement in detection accuracy. The proposed techniques can be applied to any deep network architecture and keep it trainable end-to-end. Our SINet achieves state-of-the-art performance in terms of both accuracy and speed (up to 37 FPS) on the KITTI benchmark and on a new highway dataset, which contains a large variance of scales and extremely small objects.