Self-supervised monocular depth estimation has been studied intensively in recent years because of its applications in robotics and autonomous driving. Much recent work improves depth estimation by increasing architectural complexity. This paper shows that state-of-the-art performance can also be achieved by improving the learning process rather than increasing model complexity. Specifically, we propose (i) disregarding small, potentially dynamic objects during training, and (ii) employing an appearance-based approach to separately estimate object pose for truly dynamic objects. We demonstrate that these simplifications reduce GPU memory usage by 29% and yield qualitatively and quantitatively improved depth maps. The code is available at https://github.com/kieran514/Dyna-DM.
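As an illustration of proposal (i), the following is a minimal sketch of how small, potentially dynamic objects might be dropped before training. It is not taken from the released code: the function name `filter_small_masks`, the `(N, H, W)` mask layout, and the default area threshold are all assumptions made for this example.

```python
import numpy as np


def filter_small_masks(masks, min_area_fraction=0.01):
    """Keep only instance masks that cover enough of the image.

    masks: boolean array of shape (N, H, W), one binary mask per
        potentially dynamic object (assumed layout).
    min_area_fraction: hypothetical threshold, expressed as a fraction
        of the image area; the value is illustrative, not from the paper.
    """
    masks = np.asarray(masks, dtype=bool)
    if masks.ndim != 3:
        raise ValueError("masks must have shape (N, H, W)")

    image_area = masks.shape[1] * masks.shape[2]
    # Number of pixels covered by each object mask.
    areas = masks.sum(axis=(1, 2))
    # Objects below the threshold are ignored during training.
    keep = areas >= min_area_fraction * image_area
    return masks[keep]
```

Under these assumptions, only the masks that survive the filter would be passed on to the object pose estimation of proposal (ii), so fewer objects need to be processed per training sample.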