In recent years, many publications have shown that convolutional neural network (CNN) based features can outperform engineered features. However, little effort has been made so far to extract local features efficiently for a whole image. In this paper, we present an approach to compute patch-based local feature descriptors efficiently for whole images at once, in the presence of pooling and striding layers. Our approach is generic and can be applied to nearly all existing network architectures. This includes networks for all local feature extraction tasks such as camera calibration, patch matching, optical flow estimation, and stereo matching. In addition, our approach can be applied to other patch-based approaches like sliding window object detection and recognition. We complete our paper with a speed benchmark of popular CNN-based feature extraction approaches applied to whole images, with and without our speedup, and with example code (for Torch) that shows how an arbitrary CNN architecture can easily be converted by our approach.
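To make the setting concrete, the following is a minimal sketch (written in PyTorch for illustration; the paper's actual example code targets Torch, and the toy network, patch size, and layer sizes here are assumptions). It contrasts naive per-patch descriptor extraction with dense whole-image extraction obtained by turning the fully connected layer into an equivalent convolution. Note that, because of the pooling/striding layers, this naive dense variant only produces descriptors on a coarse strided grid; computing them efficiently for every pixel position despite pooling and striding is exactly what the approach described above addresses.

import torch
import torch.nn as nn

PATCH = 32  # hypothetical patch size for the toy descriptor network

# Toy patch descriptor network: a 32x32 patch is mapped to a 128-D descriptor.
features = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=5), nn.ReLU(),
    nn.MaxPool2d(2),                       # pooling layer with stride 2
    nn.Conv2d(32, 64, kernel_size=5), nn.ReLU(),
    nn.MaxPool2d(2),                       # second stride-2 pooling layer
)
# For a 32x32 input patch, the layers above yield a 64-channel 5x5 feature map.
patch_head = nn.Sequential(nn.Flatten(), nn.Linear(64 * 5 * 5, 128))

# Dense variant: the same weights, but the linear layer is rewritten as a 5x5
# convolution, so the network can be applied to a whole image in one pass.
dense_head = nn.Conv2d(64, 128, kernel_size=5)
dense_head.weight.data = patch_head[1].weight.data.view(128, 64, 5, 5)
dense_head.bias.data = patch_head[1].bias.data

image = torch.rand(1, 3, 256, 256)

# Per-patch extraction: one forward pass per patch location (slow when repeated
# for every pixel of the image).
patch = image[:, :, :PATCH, :PATCH]
desc_single = patch_head(features(patch))   # shape (1, 128)

# Whole-image extraction: a single forward pass, but descriptors are obtained
# only on a strided grid because of the two stride-2 pooling layers.
desc_map = dense_head(features(image))      # shape (1, 128, H', W')

# The descriptor at the top-left grid position matches the per-patch result.
print(torch.allclose(desc_single, desc_map[:, :, 0, 0].view(1, 128), atol=1e-5))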