LoFTR (arXiv:2104.00680) is an efficient deep learning method for finding local feature matches between image pairs. This paper reports on optimizing this method to run on devices with low computational performance and limited memory. The original LoFTR approach is based on a ResNet (arXiv:1512.03385) backbone and two modules based on the Linear Transformer (arXiv:2006.04768) architecture. In the presented work, only the coarse-matching block was kept, the number of parameters was significantly reduced, and the network was trained using a knowledge distillation technique. The comparison showed that, despite the significant reduction in model size, this approach lets the student model reach feature matching accuracy in the coarse matching block comparable to that of the teacher model. The paper also describes the additional steps required to make the model compatible with the NVIDIA TensorRT runtime, and presents an approach to optimizing the training procedure for low-end GPUs.
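The abstract does not spell out the distillation loss, so the following is only a minimal illustrative sketch: assuming both the teacher and the reduced student expose a coarse confidence matrix over patch pairs, a distillation step could match the student's matching distribution to the teacher's with a soft-target KL objective. The names student, teacher, and the temperature T are placeholders, not the paper's actual interface.

```python
import torch
import torch.nn.functional as F

def distillation_step(student, teacher, img0, img1, optimizer, T=1.0):
    # Teacher runs in inference mode; its coarse confidence matrix
    # (shape: batch x HW0 x HW1) serves as the soft target.
    with torch.no_grad():
        teacher_conf = teacher(img0, img1)
    student_conf = student(img0, img1)

    # KL divergence between temperature-softened matching distributions.
    loss = F.kl_div(
        F.log_softmax(student_conf / T, dim=-1),
        F.softmax(teacher_conf / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```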
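The TensorRT compatibility steps themselves are detailed later in the paper; as a rough sketch of one common route (not necessarily the authors' exact pipeline), the coarse-only student could be exported to ONNX with fixed input shapes and then compiled into an engine with trtexec. Input resolution, tensor names, and the opset version below are assumptions.

```python
import torch

# Trace the coarse-only student model to ONNX at a fixed, assumed resolution.
student.eval()
dummy0 = torch.randn(1, 1, 480, 640)  # grayscale image 0
dummy1 = torch.randn(1, 1, 480, 640)  # grayscale image 1
torch.onnx.export(
    student, (dummy0, dummy1), "loftr_coarse_student.onnx",
    input_names=["image0", "image1"],
    output_names=["conf_matrix"],
    opset_version=16,
)
```

The resulting ONNX file could then be built into a TensorRT engine on the target device, e.g. `trtexec --onnx=loftr_coarse_student.onnx --saveEngine=loftr_coarse_student.plan --fp16`.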