We introduce a lightweight network to improve descriptors of keypoints within the same image. The network takes the original descriptors and the geometric properties of the keypoints as input, and uses an MLP-based self-boosting stage and a Transformer-based cross-boosting stage to enhance the descriptors. The boosted descriptors can be either real-valued or binary. We use the proposed network to boost both hand-crafted descriptors (ORB, SIFT) and state-of-the-art learned descriptors (SuperPoint, ALIKE), and evaluate them on image matching, visual localization, and structure-from-motion tasks. The results show that our method significantly improves the performance of each task, particularly in challenging cases such as large illumination changes or repetitive patterns. Our method requires only 3.2 ms on a desktop GPU and 27 ms on an embedded GPU to process 2000 features, which is fast enough for practical systems. The code and trained weights are publicly available at github.com/SJTU-ViSYS/FeatureBooster.
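The two-stage boosting pipeline described above can be sketched as follows. This is a minimal illustrative sketch with untrained random weights, not the paper's implementation: the layer sizes, the single-head attention standing in for the Transformer cross-boosting stage, and all function names are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def linear(d_in, d_out):
    """Random weights standing in for trained parameters (sketch only)."""
    return rng.standard_normal((d_in, d_out)) * 0.1, np.zeros(d_out)

def mlp(x, layers):
    # Simple MLP with ReLU on hidden layers, applied per keypoint.
    for i, (w, b) in enumerate(layers):
        x = x @ w + b
        if i < len(layers) - 1:
            x = np.maximum(x, 0.0)
    return x

def self_attention(x, wq, wk, wv):
    # Single-head self-attention across all keypoints in one image,
    # a stand-in for the Transformer-based cross-boosting stage.
    q, k, v = x @ wq, x @ wk, x @ wv
    s = q @ k.T / np.sqrt(q.shape[1])
    s = s - s.max(axis=1, keepdims=True)  # numerical stability
    a = np.exp(s)
    a /= a.sum(axis=1, keepdims=True)
    return a @ v

def boost_descriptors(desc, geo, dim=32, binary=False):
    # Stage 1 (self-boosting): an MLP refines each descriptor from its
    # own values plus keypoint geometry (e.g. position, scale, angle).
    x = np.concatenate([desc, geo], axis=1)
    h = mlp(x, [linear(x.shape[1], 64), linear(64, dim)])
    # Stage 2 (cross-boosting): every keypoint attends to all other
    # keypoints in the same image, adding global context (residual).
    wq, _ = linear(dim, dim)
    wk, _ = linear(dim, dim)
    wv, _ = linear(dim, dim)
    h = h + self_attention(h, wq, wk, wv)
    # Output either binary (sign) or L2-normalized real-valued descriptors.
    if binary:
        return np.sign(h)
    return h / np.linalg.norm(h, axis=1, keepdims=True)

# 2000 keypoints with 128-D descriptors and 4 geometric attributes
desc = rng.standard_normal((2000, 128))
geo = rng.standard_normal((2000, 4))
out = boost_descriptors(desc, geo)
```

The same forward pass yields binary descriptors by thresholding the final features, which is what makes the boosted output usable with Hamming-distance matchers.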