Efficient detection and description of geometric regions in images is a prerequisite for localization and mapping in visual systems. Such systems still rely on traditional hand-crafted methods to efficiently generate lightweight descriptors, because the more powerful neural network models come with high compute and specific hardware requirements. In this paper, we focus on the adaptations that detection and description neural networks require to run on computationally limited platforms such as robots, mobile devices, and augmented reality devices. To that end, we investigate and adapt network quantization techniques to accelerate inference and enable deployment on compute-limited platforms. In addition, we revisit common practices in descriptor quantization and propose a binary descriptor normalization layer, which enables the generation of distinctive binary descriptors with a constant number of ones. ZippyPoint, our efficient quantized network with binary descriptors, improves network runtime speed and descriptor matching speed, and reduces 3D model size, each by at least an order of magnitude compared to full-precision counterparts. These improvements come at a minor cost in performance, as evaluated on the tasks of homography estimation, visual localization, and map-free visual relocalization. Code and models are available at https://github.com/menelaoskanakis/ZippyPoint.
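To illustrate why binary descriptors with a constant number of ones are convenient to match, here is a minimal NumPy sketch. It is an assumption-laden illustration, not ZippyPoint's binary descriptor normalization layer: the functions `binarize_constant_ones` and `hamming_matrix` are hypothetical, and the binarization uses plain top-k thresholding (a swapped-in technique) solely because it guarantees exactly `k` ones per descriptor. Matching then reduces to XOR and bit counting on packed bytes instead of floating-point dot products.

```python
import numpy as np


def binarize_constant_ones(desc: np.ndarray, k: int) -> np.ndarray:
    """Hypothetical top-k binarization (NOT the paper's normalization layer).

    Sets the k largest entries of each row to 1 and the rest to 0, so every
    binary descriptor contains exactly k ones.
    """
    n, d = desc.shape
    # Indices of the k largest values per row; order within the top-k is irrelevant.
    top = np.argpartition(-desc, k - 1, axis=1)[:, :k]
    bits = np.zeros((n, d), dtype=np.uint8)
    np.put_along_axis(bits, top, 1, axis=1)
    return bits


def hamming_matrix(a_bits: np.ndarray, b_bits: np.ndarray) -> np.ndarray:
    """Pairwise Hamming distances between two sets of {0,1} descriptors."""
    # Pack 8 bits per byte so the comparison works on compact bit strings.
    a = np.packbits(a_bits, axis=1)
    b = np.packbits(b_bits, axis=1)
    # XOR every pair of descriptors, then count the differing bits.
    xor = np.bitwise_xor(a[:, None, :], b[None, :, :])
    return np.unpackbits(xor, axis=2).sum(axis=2)


# Usage: binarize random float descriptors and match them by nearest Hamming distance.
rng = np.random.default_rng(0)
float_desc = rng.standard_normal((5, 256))
bits = binarize_constant_ones(float_desc, k=128)
dist = hamming_matrix(bits, bits)
nearest = dist.argmin(axis=1)
```

One consequence of fixing the number of ones: for two descriptors with `k` ones each, the Hamming distance equals `2k - 2 * popcount(a AND b)`, so it is always even and bounded by `2k`, which keeps distances comparable across all keypoints.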