Deep-learning-based local feature extraction algorithms that combine detection and description have made significant progress in visible image matching. However, the end-to-end training of such frameworks is notoriously unstable due to the lack of strong supervision of detection and the inappropriate coupling between detection and description. The problem is magnified in cross-modal scenarios, in which most methods heavily rely on pre-training. In this paper, we recouple the independent constraints of detection and description in multimodal feature learning with a mutual weighting strategy, in which the detected probabilities of robust features are forced to peak and repeat, while features with high detection scores are emphasized during optimization. Unlike previous works, these weights are detached from back-propagation so that the detected probability of indistinct features is not directly suppressed and the training is more stable. Moreover, we propose the Super Detector, a detector that possesses a large receptive field and is equipped with learnable non-maximum suppression layers, to fulfill the harsh terms of detection. Finally, we build a benchmark that contains cross-modal visible, infrared, near-infrared, and synthetic aperture radar image pairs for evaluating the performance of features in feature matching and image registration tasks. Extensive experiments demonstrate that the features trained with the recoupled detection and description, named ReDFeat, surpass previous state-of-the-art methods on the benchmark, while the model can be readily trained from scratch.
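A minimal sketch of the mutual weighting idea, assuming PyTorch and hypothetical per-keypoint tensors `det_score`, `desc_loss`, and `det_loss` (none of these names come from the paper). The point it illustrates is the detachment described above: each branch's loss is weighted by a gradient-free copy of the other branch's signal, so the weighting path cannot directly suppress the detected probability of indistinct features.

```python
import torch

def mutually_weighted_loss(det_score, desc_loss, det_loss):
    """Hedged sketch of a recoupled detection/description loss.

    det_score : (B, N) detection probabilities of sampled features
    desc_loss : (B, N) per-feature description loss (e.g., contrastive)
    det_loss  : (B, N) per-feature detection loss (peakiness/repeatability)

    Both weights are detached so they carry no gradient: description is
    emphasized where detection is confident, detection is emphasized where
    description is already reliable, and neither weight back-propagates.
    """
    w_det = det_score.detach()                    # weights the description term
    w_desc = (-desc_loss).detach().softmax(dim=-1)  # weights the detection term (assumed form)

    loss_desc = (w_det * desc_loss).sum(-1) / w_det.sum(-1).clamp(min=1e-6)
    loss_det = (w_desc * det_loss).sum(-1)
    return (loss_desc + loss_det).mean()
```

The exact weighting functions are assumptions for illustration; the essential design choice, per the abstract, is only that the weights are excluded from back-propagation.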