We propose GeoNet, a jointly unsupervised learning framework for monocular depth, optical flow, and ego-motion estimation from videos. The three components are coupled by the nature of 3D scene geometry and are jointly learned by our framework in an end-to-end manner. Specifically, geometric relationships are extracted from the predictions of the individual modules and then combined into an image reconstruction loss that reasons about static and dynamic scene parts separately. Furthermore, we propose an adaptive geometric consistency loss to increase robustness towards outliers and non-Lambertian regions, which resolves occlusions and texture ambiguities effectively. Experiments on the KITTI driving dataset show that our scheme achieves state-of-the-art results in all three tasks, outperforming previous unsupervised methods and performing comparably with supervised ones.
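To make the geometric coupling concrete, below is a minimal sketch (not the authors' released code) of the rigid flow that underlies the static-scene image reconstruction loss: pixels are back-projected using the predicted depth, transformed by the predicted ego-motion, and re-projected into the neighboring view. The function name `rigid_flow` and the inputs `depth`, `K`, and `T` are illustrative assumptions, not identifiers from the paper.

```python
import numpy as np

def rigid_flow(depth, K, T):
    """Flow induced by camera motion alone, under a static-scene assumption.

    depth: (h, w) predicted per-pixel depth of the target frame (assumed input)
    K:     (3, 3) camera intrinsics (assumed input)
    T:     (4, 4) predicted relative pose from target to source view (assumed input)
    """
    h, w = depth.shape
    # Homogeneous pixel grid of the target frame, shape (3, h*w).
    ys, xs = np.mgrid[0:h, 0:w]
    pix = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)])
    # Back-project to 3D camera coordinates using the predicted depth.
    cam = (np.linalg.inv(K) @ pix) * depth.ravel()
    # Apply the predicted ego-motion (rigid transformation).
    cam_h = np.vstack([cam, np.ones((1, h * w))])
    cam_src = (T @ cam_h)[:3]
    # Re-project into the source view and normalize by depth.
    pix_src = K @ cam_src
    pix_src = pix_src[:2] / np.clip(pix_src[2], 1e-6, None)
    # Rigid flow = projected coordinates minus original coordinates.
    return (pix_src - pix[:2]).reshape(2, h, w)
```

In the full framework, a flow of this kind warps the source frame toward the target frame; the photometric difference between the warped and the observed target frame then serves as the reconstruction loss for the static scene parts, while dynamic objects are handled separately by the residual flow component.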