Improving Performance of Object Detection using the Mechanisms of Visual Recognition in Humans

Object recognition systems are usually trained and evaluated on high-resolution images. In real-world applications, however, images often have low resolution or small size. In this study, we first track the performance of a state-of-the-art deep object recognition network, Faster-RCNN, as a function of image resolution. The results reveal that low-resolution images degrade recognition performance. They also show that different spatial frequencies convey different information about objects during recognition. This suggests that a multi-resolution recognition system can provide better insight into the optimal selection of features, which in turn leads to better object recognition. This is similar to the human visual system, which can represent a visual scene at multiple scales simultaneously. We then propose a multi-resolution object recognition framework in place of a single-resolution network and evaluate it on the PASCAL VOC2007 database. The experimental results show that our adapted multi-resolution Faster-RCNN framework outperforms the single-resolution Faster-RCNN on input images of various resolutions, increasing the mean Average Precision (mAP) by 9.14% across all resolutions and by 1.2% on full-spectrum images. Furthermore, the proposed model is robust over a wide range of spatial frequencies.
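To make the idea of evaluating a detector "as a function of image resolution" concrete, the sketch below builds several reduced-resolution copies of one grayscale image. It is only an illustration under stated assumptions: the abstract does not say how the low-resolution inputs were produced, so block averaging is a stand-in, and the names `downsample` and `resolution_pyramid` are hypothetical rather than taken from the paper.

```python
def downsample(image, factor):
    """Average non-overlapping factor x factor blocks of a 2-D grayscale image.

    `image` is a list of equal-length rows of numbers. Block averaging is an
    assumed stand-in for whatever resolution reduction the study used.
    """
    if factor < 1:
        raise ValueError("factor must be >= 1")
    # Crop so that both dimensions divide evenly by the factor.
    rows = len(image) - len(image) % factor
    cols = len(image[0]) - len(image[0]) % factor
    out = []
    for r in range(0, rows, factor):
        out_row = []
        for c in range(0, cols, factor):
            block = [image[r + i][c + j]
                     for i in range(factor) for j in range(factor)]
            out_row.append(sum(block) / len(block))
        out.append(out_row)
    return out


def resolution_pyramid(image, factors=(1, 2, 4)):
    """Map each reduction factor to the correspondingly downsampled image."""
    return {f: downsample(image, f) for f in factors}


# Usage: an 8x8 synthetic gradient image reduced by factors 1, 2, and 4.
image = [[float(r * 8 + c) for c in range(8)] for r in range(8)]
pyramid = resolution_pyramid(image)
for factor, level in pyramid.items():
    print(factor, len(level), "x", len(level[0]))
```

In a setup like the one the abstract describes, each level of such a pyramid could be passed to the same detector to compare mAP per resolution, or several levels could be fed to a multi-resolution model; how the authors actually combine resolutions is not specified here.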
