Estimating the pose of a moving camera from monocular video is a challenging problem, especially due to the presence of moving objects in dynamic environments, where the performance of existing camera pose estimation methods is susceptible to pixels that are not geometrically consistent. To tackle this challenge, we present a robust dense indirect structure-from-motion method for videos that is based on dense correspondence initialized from pairwise optical flow. Our key idea is to optimize long-range video correspondence as dense point trajectories and use them to learn robust estimation of motion segmentation. A novel neural network architecture is proposed for processing irregular point trajectory data. Camera poses are then estimated and optimized with global bundle adjustment over the portion of long-range point trajectories that are classified as static. Experiments on the MPI Sintel dataset show that our system produces significantly more accurate camera trajectories compared to existing state-of-the-art methods. In addition, our method retains reasonable accuracy of camera poses on fully static scenes, consistently outperforming strong state-of-the-art end-to-end deep learning methods based on dense correspondence, demonstrating the potential of dense indirect methods built on optical flow and point trajectories. As the point trajectory representation is general, we further present results and comparisons on in-the-wild monocular videos with complex motion of dynamic objects. Code is available at https://github.com/bytedance/particle-sfm.
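The first stage of the pipeline described above chains pairwise optical flow into long-range point trajectories. The following is a minimal NumPy sketch of that chaining step only; the function name `chain_flow_to_trajectories` is our own, the flow fields here are synthetic, and a real system (such as the released code) would use bilinear flow sampling, occlusion checks, and learned motion segmentation rather than this nearest-pixel lookup.

```python
import numpy as np

def chain_flow_to_trajectories(flows, points):
    """Chain per-frame optical flow fields into long-range point trajectories.

    flows:  list of (H, W, 2) arrays; flows[t][y, x] is the (dx, dy)
            displacement of pixel (x, y) from frame t to frame t+1.
    points: (N, 2) array of (x, y) start positions in frame 0.
    Returns an (N, T+1, 2) array: each row is one point trajectory.
    """
    H, W, _ = flows[0].shape
    cur = points.astype(np.float64)
    traj = [cur.copy()]
    for flow in flows:
        # Nearest-pixel flow lookup; real systems interpolate bilinearly
        # and drop trajectories that leave the image or become occluded.
        xi = np.clip(np.round(cur[:, 0]).astype(int), 0, W - 1)
        yi = np.clip(np.round(cur[:, 1]).astype(int), 0, H - 1)
        cur = cur + flow[yi, xi]
        traj.append(cur.copy())
    return np.stack(traj, axis=1)

# Toy usage: three frames of constant rightward flow of 1 px.
flows = [np.full((8, 8, 2), (1.0, 0.0)) for _ in range(3)]
points = np.array([[2.0, 3.0]])
traj = chain_flow_to_trajectories(flows, points)
# A point starting at (2, 3) drifts to (5, 3) after three frames.
```

Bundle adjustment would then be run only over the subset of such trajectories that the segmentation network labels static.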