We propose DFPNet, an unsupervised joint learning system for Depth, optical Flow, and camera Pose (egomotion) estimation from monocular image sequences. Because of the geometry of 3D scenes, these three components are tightly coupled; we exploit this coupling to train all three components jointly in an end-to-end manner. The network is trained with a single composite loss function comprising image-reconstruction losses for depth and optical flow, bidirectional consistency checks, and smoothness terms. Through hyperparameter tuning, we reduce the model to 8.4M parameters, less than 5% of the size of state-of-the-art DFP models. Evaluation on the KITTI and Cityscapes driving datasets shows that, despite the significantly smaller model size, our results are comparable to the state of the art on all three tasks.
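The composite loss described above can be sketched as follows. This is a minimal NumPy illustration of how photometric reconstruction, bidirectional consistency, and smoothness terms might be combined; the function names, loss weights, and exact term definitions are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def photometric_loss(img, img_reconstructed):
    # L1 photometric error between the target frame and its
    # warping-based reconstruction (from depth+pose or from flow)
    return np.mean(np.abs(img - img_reconstructed))

def consistency_loss(flow_fwd, flow_bwd_warped):
    # forward-backward consistency: the forward flow should cancel
    # the backward flow warped into the same frame
    return np.mean(np.abs(flow_fwd + flow_bwd_warped))

def smoothness_loss(pred):
    # first-order smoothness: penalize spatial gradients of a
    # predicted map (depth or flow)
    dy = np.abs(np.diff(pred, axis=0))
    dx = np.abs(np.diff(pred, axis=1))
    return np.mean(dy) + np.mean(dx)

def composite_loss(img, img_rec, flow_fwd, flow_bwd_warped, depth,
                   w_photo=1.0, w_cons=0.2, w_smooth=0.1):
    # weights are illustrative placeholders, not values from the paper
    return (w_photo * photometric_loss(img, img_rec)
            + w_cons * consistency_loss(flow_fwd, flow_bwd_warped)
            + w_smooth * (smoothness_loss(depth)
                          + smoothness_loss(flow_fwd)))
```

In practice such terms are computed per scale of the network's output pyramid and summed; the sketch above shows only a single scale for clarity.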