Instant Visual Odometry Initialization for Mobile AR

Mobile AR applications benefit from fast initialization so that world-locked effects can be displayed instantly. However, standard visual odometry and SLAM algorithms require motion parallax to initialize (see Figure 1) and therefore suffer from delayed initialization. In this paper, we present a 6-DoF monocular visual odometry that initializes instantly and without motion parallax. Our main contribution is a pose estimator that decouples the estimation of the 5-DoF relative rotation and translation direction from that of the 1-DoF translation magnitude. While scale is not observable in a monocular vision-only setting, it is still paramount to estimate a consistent scale over the whole trajectory (even if not physically accurate) to avoid AR effects moving erroneously along depth. Our approach leverages the fact that depth errors are not perceivable to the user during rotation-only motion. As the user starts translating the device, depth becomes perceivable, and a consistent scale becomes possible to estimate. Our proposed algorithm transitions naturally between these two modes. We validate our contributions extensively on both a publicly available dataset and synthetic data. We show that, in low-parallax configurations, the proposed pose estimator outperforms the classical approaches to 6-DoF pose estimation used in the literature. We release a real-data dataset for the relative pose problem to facilitate comparison with future solutions. Our solution is used either as a full odometry or as a pre-SLAM component of any supported SLAM system (ARKit, ARCore) in world-locked AR effects on platforms such as Instagram and Facebook.
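
The geometric facts the abstract relies on can be checked numerically. The following is a minimal sketch, not the paper's estimator: it assumes an ideal pinhole camera with normalized coordinates, and the helper names (`rotation_y`, `project`, `reproject`) and all numeric values are hypothetical, chosen only for illustration. It writes the relative pose as a rotation, a unit translation direction (together 5 DoF), and a translation magnitude (1 DoF), then shows that a wrong depth causes no reprojection error under rotation-only motion, a visible error once the camera translates, and no error when depth and magnitude are scaled together.

```python
import numpy as np

def rotation_y(angle_rad):
    """Rotation matrix about the camera y-axis."""
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def project(point_cam):
    """Pinhole projection onto the normalized image plane (z = 1)."""
    return point_cam[:2] / point_cam[2]

def reproject(bearing, depth, R, direction, magnitude):
    """Place a point at `depth` along `bearing` in camera 1 and project it into camera 2.

    The relative pose is split into a 5-DoF part (R and a unit translation
    direction) and a 1-DoF part (the translation magnitude).
    """
    t = magnitude * direction / np.linalg.norm(direction)
    point_cam1 = depth * bearing
    point_cam2 = R @ point_cam1 + t
    return project(point_cam2)

# Hypothetical observation and motion, for illustration only.
bearing = np.array([0.1, -0.05, 1.0])
R = rotation_y(np.deg2rad(5.0))
direction = np.array([1.0, 0.0, 0.0])

# Rotation-only motion: a 10x depth error leaves the reprojection unchanged.
rotation_only_gap = np.linalg.norm(
    reproject(bearing, 2.0, R, direction, 0.0)
    - reproject(bearing, 20.0, R, direction, 0.0)
)

# Translating motion: the same depth error now shifts the reprojection.
translating_gap = np.linalg.norm(
    reproject(bearing, 2.0, R, direction, 0.1)
    - reproject(bearing, 20.0, R, direction, 0.1)
)

# Scale ambiguity: scaling depth and magnitude together changes nothing.
scale_gap = np.linalg.norm(
    reproject(bearing, 2.0, R, direction, 0.1)
    - reproject(bearing, 6.0, R, direction, 0.3)
)

print(rotation_only_gap, translating_gap, scale_gap)
```

The first and third gaps are zero up to floating-point error, while the second is clearly nonzero. This matches the abstract's argument: depth (and hence scale) only matters to the user once the device translates, and only a consistent scale, not a physically accurate one, is needed.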
