Modern deep learning techniques that regress the relative camera pose between two images have difficulty dealing with challenging scenarios, such as large camera motions resulting in occlusions and significant changes in perspective that leave little overlap between images. These models continue to struggle even with the benefit of large supervised training datasets. To address the limitations of these models, we take inspiration from techniques that show regressing keypoint locations in 2D and 3D can be improved by estimating a discrete distribution over keypoint locations. Analogously, in this paper we explore improving camera pose regression by instead predicting a discrete distribution over camera poses. To realize this idea, we introduce DirectionNet, which estimates discrete distributions over the 5D relative pose space using a novel parameterization to make the estimation problem tractable. Specifically, DirectionNet factorizes relative camera pose, specified by a 3D rotation and a translation direction, into a set of 3D direction vectors. Since 3D directions can be identified with points on the sphere, DirectionNet estimates discrete distributions on the sphere as its output. We evaluate our model on challenging synthetic and real pose estimation datasets constructed from Matterport3D and InteriorNet. Promising results show a near 50% reduction in error over direct regression methods.
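Below is a minimal, illustrative sketch (not the authors' implementation) of the directional parameterization the abstract describes: each predicted direction is treated as a discrete distribution over points on the sphere (here an assumed equirectangular latitude/longitude grid), the direction is recovered via a spherical expectation, and a rotation is re-assembled from a set of estimated direction vectors by orthogonalization. The grid resolution and the SVD-based orthogonalization are assumptions made for the example, not details taken from the paper.

```python
import numpy as np

def sphere_grid(height=64, width=128):
    """Unit vectors for an equirectangular sampling of the sphere (assumed resolution)."""
    lat = np.linspace(-np.pi / 2, np.pi / 2, height)          # elevation
    lon = np.linspace(-np.pi, np.pi, width, endpoint=False)   # azimuth
    lat, lon = np.meshgrid(lat, lon, indexing="ij")
    xyz = np.stack([np.cos(lat) * np.cos(lon),
                    np.cos(lat) * np.sin(lon),
                    np.sin(lat)], axis=-1)                     # (H, W, 3)
    return xyz.reshape(-1, 3)

def expected_direction(probs, grid):
    """Spherical expectation of a discrete distribution over the grid points."""
    v = (probs.reshape(-1, 1) * grid).sum(axis=0)
    return v / np.linalg.norm(v)

def rotation_from_directions(dirs):
    """Project three (noisy) estimated direction vectors onto SO(3) via SVD."""
    u, _, vt = np.linalg.svd(np.stack(dirs))                   # dirs: list of three 3-vectors
    r = u @ vt
    if np.linalg.det(r) < 0:                                   # enforce det = +1
        r = u @ np.diag([1.0, 1.0, -1.0]) @ vt
    return r
```

In this sketch, a network would output one probability map per direction vector (three for the rotation, one for the translation direction); `expected_direction` converts each map into a unit vector, and `rotation_from_directions` snaps the three rotation-related vectors back onto a valid rotation matrix.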