Structure-from-motion algorithms have an inherent limitation: the reconstruction can only be determined up to an unknown scale factor. Modern mobile devices are equipped with an inertial measurement unit (IMU), which can be used to estimate the scale of the reconstruction. We propose a method that recovers the metric scale given inertial measurements and camera poses. In the process, we also perform a temporal and spatial alignment of the camera and the IMU. Therefore, our solution can be easily combined with any existing visual reconstruction software. The method copes with noisy camera pose estimates, typically caused by motion blur or rolling shutter artifacts, by using a Rauch-Tung-Striebel (RTS) smoother. Furthermore, the scale estimation is performed in the frequency domain, which provides more robustness to inaccurate sensor time stamps and noisy IMU samples than the previously used time domain representation. In contrast to previous methods, our approach has no parameters that need to be tuned to achieve good performance. In the experiments, we show that the algorithm outperforms the state of the art in both accuracy and convergence speed of the scale estimate. The estimated scale is within around $1\%$ of the ground truth, depending on the recording. We also demonstrate that our method can improve the scale accuracy of Project Tango's built-in motion tracking.
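To illustrate why a frequency-domain comparison tolerates inaccurate time stamps, the following is a minimal sketch and not the formulation used in this work. It assumes a single axis, gravity-free and bias-free IMU accelerations already expressed in the camera frame, and a time offset modelled as a circular shift of the samples. The helper names (`dft_magnitudes`, `second_difference`, `estimate_scale`) are hypothetical. The scale is taken as the least-squares fit between the amplitude spectra of the inertial accelerations and of the up-to-scale accelerations obtained by differentiating the camera positions twice.

```python
import cmath
import math


def dft_magnitudes(signal):
    """Amplitude spectrum of a real sequence (naive O(n^2) DFT, bins 0..n/2)."""
    n = len(signal)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(signal))
        mags.append(abs(acc))
    return mags


def second_difference(positions, dt):
    """Central second difference: acceleration implied by sampled positions."""
    return [(positions[i - 1] - 2.0 * positions[i] + positions[i + 1]) / dt ** 2
            for i in range(1, len(positions) - 1)]


def estimate_scale(visual_acc, imu_acc):
    """Least-squares s minimising sum_k (|A_imu[k]| - s * |A_vis[k]|)^2.

    Only amplitudes are compared, so a pure time shift between the two
    signals (which changes phases only) does not affect the estimate.
    The DC bin is skipped, since constant offsets carry no scale information.
    """
    mag_vis = dft_magnitudes(visual_acc)[1:]
    mag_imu = dft_magnitudes(imu_acc)[1:]
    num = sum(v * i for v, i in zip(mag_vis, mag_imu))
    den = sum(v * v for v in mag_vis)
    return num / den


# Synthetic demo (all values are illustrative assumptions).
dt = 0.01           # sampling interval in seconds
true_scale = 2.5    # metres per reconstruction unit
metric_pos = [0.5 * math.sin(2 * math.pi * 1.0 * i * dt)
              + 0.2 * math.sin(2 * math.pi * 3.0 * i * dt)
              for i in range(200)]

# Camera positions are only known up to scale.
visual_pos = [p / true_scale for p in metric_pos]
visual_acc = second_difference(visual_pos, dt)

# IMU accelerations are metric but arrive with a time offset (circular shift).
metric_acc = second_difference(metric_pos, dt)
shift = 7
imu_acc = metric_acc[shift:] + metric_acc[:shift]

estimated = estimate_scale(visual_acc, imu_acc)
```

In this idealised setting the shifted inertial signal has the same amplitude spectrum as the unshifted one, so the estimate matches the true scale up to rounding; a sample-by-sample comparison in the time domain would instead be biased by the offset. Real data would additionally require gravity and bias handling and the camera-IMU alignment described above.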