We introduce a lightweight network that improves the descriptors of keypoints within the same image. The network takes the original descriptors and the geometric properties of the keypoints as input, and enhances the descriptors through an MLP-based self-boosting stage followed by a Transformer-based cross-boosting stage. The boosted descriptors can be either real-valued or binary. We use the proposed network to boost both hand-crafted descriptors (ORB, SIFT) and state-of-the-art learned descriptors (SuperPoint, ALIKE), and evaluate them on image matching, visual localization, and structure-from-motion tasks. The results show that our method significantly improves performance on each task, particularly in challenging cases such as large illumination changes or repetitive patterns. Our method requires only 3.2 ms on a desktop GPU and 27 ms on an embedded GPU to process 2000 features, which is fast enough for practical systems. The code and trained weights are publicly available at github.com/SJTU-ViSYS/FeatureBooster.
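To make the two-stage design concrete, here is a minimal NumPy sketch of the idea: a per-keypoint MLP refines each descriptor using its own geometry (self-boosting), then self-attention across all keypoints in the image exchanges context (cross-boosting), and the result can optionally be binarized by sign. The dimensions, random weights, single-head attention, and residual updates are illustrative assumptions, not the paper's exact architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_boost(desc, geom, W1, W2):
    """Self-boosting sketch: per-keypoint MLP over the descriptor
    concatenated with a geometric encoding (e.g. x, y, scale, angle)."""
    x = np.concatenate([desc, geom], axis=1)   # (N, D + G)
    h = np.maximum(x @ W1, 0.0)                # ReLU hidden layer
    return desc + h @ W2                       # residual refinement

def cross_boost(desc, Wq, Wk, Wv):
    """Cross-boosting sketch: single-head self-attention across
    all keypoints of the same image."""
    q, k, v = desc @ Wq, desc @ Wk, desc @ Wv
    attn = softmax(q @ k.T / np.sqrt(k.shape[1]), axis=-1)  # (N, N)
    return desc + attn @ v                     # residual update

# Toy problem sizes (assumed, not from the paper):
N, D, G, H = 500, 256, 4, 512
desc = rng.standard_normal((N, D)).astype(np.float32)  # input descriptors
geom = rng.standard_normal((N, G)).astype(np.float32)  # keypoint geometry

# Random weights stand in for the trained parameters.
W1 = 0.01 * rng.standard_normal((D + G, H)).astype(np.float32)
W2 = 0.01 * rng.standard_normal((H, D)).astype(np.float32)
Wq = 0.01 * rng.standard_normal((D, D)).astype(np.float32)
Wk = 0.01 * rng.standard_normal((D, D)).astype(np.float32)
Wv = 0.01 * rng.standard_normal((D, D)).astype(np.float32)

boosted = cross_boost(self_boost(desc, geom, W1, W2), Wq, Wk, Wv)
binary = boosted > 0                           # sign binarization for binary output
print(boosted.shape, binary.shape)
```

The residual form means the network only has to learn a correction to the original descriptor, which is one plausible reason such a boosting stage can stay lightweight.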