We propose a compact pipeline to unify all the steps of Visual Localization: image retrieval, candidate re-ranking and initial pose estimation, and camera pose refinement. Our key assumption is that the deep features used for these individual tasks share common characteristics, so we should reuse them in all the procedures of the pipeline. Our DRAN (Deep Retrieval and image Alignment Network) is able to extract global descriptors for efficient image retrieval, use intermediate hierarchical features to re-rank the retrieval list and produce an initial pose guess, which is finally refined by means of a feature-metric optimization based on learned deep multi-scale dense features. DRAN is the first single network able to produce the features for the three steps of visual localization. DRAN achieves competitive performance in terms of robustness and accuracy under challenging conditions in public benchmarks, outperforming other unified approaches and consuming lower computational and memory cost than its counterparts using multiple networks. Code and models will be publicly available at https://github.com/jmorlana/DRAN.
翻译:暂无翻译