This work addresses cross-view camera pose estimation, i.e., determining the 3-DoF camera pose of a given ground-level image with respect to an aerial image of the local area. We propose SliceMatch, which consists of ground and aerial feature extractors, feature aggregators, and a pose predictor. The feature extractors extract dense features from the ground and aerial images. Given a set of candidate camera poses, the feature aggregators construct a single ground descriptor and a set of rotationally equivariant, pose-dependent aerial descriptors. Notably, our novel aerial feature aggregator has a cross-view attention module for ground-view-guided aerial feature selection, and it uses the geometric projection of the ground camera's viewing frustum onto the aerial image to pool features. Aerial descriptors are constructed efficiently by using precomputed masks and by re-assembling the aerial descriptors for rotated poses. SliceMatch is trained using contrastive learning, and pose estimation is formulated as a similarity comparison between the ground descriptor and the aerial descriptors. SliceMatch outperforms the state of the art by 19% and 62% in median localization error on the VIGOR and KITTI datasets, respectively, while running at 3x the FPS of the fastest baseline.
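The pooling-and-matching idea described above can be illustrated with a minimal pure-Python sketch. The pipeline pools aerial features from precomputed masks, re-assembles descriptors for rotated poses instead of re-pooling, and then picks the candidate pose whose descriptor is most similar to the ground descriptor. This is not the authors' implementation, and it omits the learned feature extractors and the cross-view attention module. It makes several assumptions that are not stated above:

- the ground camera has a 360° field of view;
- its frustum projection around each candidate location is approximated by `num_slices` equal angular wedges;
- slice features are mean-pooled;
- similarity is cosine similarity;
- the contrastive objective is an InfoNCE-style loss.

All function names (`wedge_masks`, `pool_slices`, `rotate`, `score_all`, `predict_pose`, `info_nce`) are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def wedge_masks(radius: int, num_slices: int) -> List[List[Tuple[int, int]]]:
    """Precompute, once, the pixel offsets (dy, dx) that fall in each angular slice.

    Slice s covers azimuths [s * w, (s + 1) * w) inside a disc of the given radius,
    with w = 2*pi / num_slices (hypothetical stand-in for the frustum projection).
    """
    width = 2 * math.pi / num_slices
    masks: List[List[Tuple[int, int]]] = [[] for _ in range(num_slices)]
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            if dx * dx + dy * dy > radius * radius:
                continue
            azimuth = math.atan2(dy, dx) % (2 * math.pi)
            # min() guards against azimuth rounding up to exactly 2*pi.
            masks[min(int(azimuth // width), num_slices - 1)].append((dy, dx))
    return masks


def pool_slices(feat: Sequence[Sequence[Sequence[float]]],
                centre: Tuple[int, int],
                masks: List[List[Tuple[int, int]]]) -> List[Vector]:
    """Mean-pool aerial features inside each slice mask around one candidate location.

    feat[row][col] is a C-dim feature vector; pixels outside the image are skipped.
    Returns the slice descriptors for heading index 0.
    """
    rows, cols, channels = len(feat), len(feat[0]), len(feat[0][0])
    r0, c0 = centre
    slices = []
    for offsets in masks:
        acc, n = [0.0] * channels, 0
        for dy, dx in offsets:
            r, c = r0 + dy, c0 + dx
            if 0 <= r < rows and 0 <= c < cols:
                acc = [a + f for a, f in zip(acc, feat[r][c])]
                n += 1
        slices.append([a / max(n, 1) for a in acc])
    return slices


def rotate(slices: List[Vector], k: int) -> List[Vector]:
    """Descriptor for a heading rotated by k slice widths: a cyclic shift, no re-pooling.

    Assumed convention: rotation k starts at slice k.
    """
    return slices[k:] + slices[:k]


def flatten_normalised(slices: List[Vector]) -> Vector:
    """Concatenate slice descriptors and L2-normalise the result."""
    flat = [v for s in slices for v in s]
    norm = math.sqrt(sum(v * v for v in flat)) or 1.0
    return [v / norm for v in flat]


def score_all(ground: List[Vector], aerial: List[List[Vector]]) -> List[List[float]]:
    """Cosine similarity of the ground descriptor with every (location, rotation) pair."""
    g = flatten_normalised(ground)
    return [[sum(x * y for x, y in zip(g, flatten_normalised(rotate(slices, k))))
             for k in range(len(slices))]
            for slices in aerial]


def predict_pose(scores: List[List[float]]) -> Tuple[int, int]:
    """Return (location index, rotation index) of the most similar candidate pose."""
    return max(((p, k) for p in range(len(scores)) for k in range(len(scores[p]))),
               key=lambda pk: scores[pk[0]][pk[1]])


def info_nce(scores: List[List[float]], positive: Tuple[int, int],
             temperature: float = 0.1) -> float:
    """InfoNCE-style contrastive loss: the true pose against all other candidates."""
    logits = [s / temperature for row in scores for s in row]
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    p, k = positive
    return log_z - scores[p][k] / temperature


if __name__ == "__main__":
    random.seed(0)
    size, channels = 32, 4
    feat = [[[random.gauss(0, 1) for _ in range(channels)] for _ in range(size)]
            for _ in range(size)]
    masks = wedge_masks(radius=6, num_slices=8)
    centres = [(r, c) for r in range(4, size, 8) for c in range(4, size, 8)]
    aerial = [pool_slices(feat, ctr, masks) for ctr in centres]
    ground = rotate(aerial[5], 3)  # synthetic ground descriptor for the demo
    scores = score_all(ground, aerial)
    print(predict_pose(scores), info_nce(scores, (5, 3)))
```

Under the 360° assumption, a heading change of one slice width only cyclically shifts the list of slice descriptors. Each location is therefore pooled once, and every orientation reuses that result. This illustrates why re-assembling descriptors for rotated poses can be cheaper than pooling each pose from scratch.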