3D reconstruction of depth and motion from monocular video in dynamic environments is a highly ill-posed problem due to the scale ambiguities introduced by the projection to the 2D image domain. In this work, we investigate the performance of current State-of-the-Art (SotA) deep multi-view systems in such environments. We find that current supervised methods work surprisingly well despite not modelling individual object motions, but make systematic errors due to a lack of dense ground truth data. To detect such errors at inference time, we extend the cost-volume-based Deep Video to Depth (DeepV2D) framework \cite{teed2018deepv2d} with a learned uncertainty. Our Deep Video to certain Depth (DeepV2cD) model i) performs on par with or better than the current SotA and ii) achieves a better uncertainty measure than the naive Shannon entropy. Our experiments show that a simple filter strategy based on this uncertainty can significantly reduce systematic errors. This results in cleaner reconstructions on both static and dynamic parts of the scene.
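As a minimal sketch of the filter strategy mentioned above, the snippet below masks out depth predictions whose learned per-pixel uncertainty exceeds a threshold before they enter the reconstruction. The function name, array shapes, and the threshold value are illustrative assumptions, not the paper's actual interface; in practice the cut-off would be tuned on a validation split.

\begin{verbatim}
# Hypothetical sketch: uncertainty-based filtering of a predicted depth map.
import numpy as np

def filter_depth_by_uncertainty(depth, uncertainty, threshold=0.7):
    """Drop depth estimates whose predicted uncertainty is too high.

    depth:        (H, W) array of predicted depths
    uncertainty:  (H, W) array of learned per-pixel uncertainties
    threshold:    assumed cut-off; tuned on held-out data in practice
    """
    filtered = depth.copy()
    filtered[uncertainty > threshold] = np.nan  # discard unreliable pixels
    return filtered

# Usage with dummy data of a typical image resolution
depth = np.random.uniform(1.0, 80.0, size=(192, 640))
uncertainty = np.random.uniform(0.0, 1.0, size=(192, 640))
clean_depth = filter_depth_by_uncertainty(depth, uncertainty)
\end{verbatim}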