We propose an online multi-view depth prediction approach on posed video streams, where the scene geometry information computed in the previous time steps is propagated to the current time step in an efficient and geometrically plausible way. The backbone of our approach is a real-time capable, lightweight encoder-decoder that relies on cost volumes computed from pairs of images. We extend it by placing a ConvLSTM cell at the bottleneck layer, which compresses an arbitrary amount of past information in its states. The novelty lies in propagating the hidden state of the cell by accounting for the viewpoint changes between time steps. At a given time step, we warp the previous hidden state into the current camera plane using the previous depth prediction. Our extension adds only a small overhead in computation time and memory consumption, while improving the depth predictions significantly. As a result, we outperform the existing state-of-the-art multi-view stereo methods on most of the evaluated metrics in hundreds of indoor scenes while maintaining real-time performance. Code is available at https://github.com/ardaduz/deep-video-mvs
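The geometric step described above, warping the previous hidden state into the current camera plane using the previous depth prediction, can be sketched as follows. This is a minimal illustration in PyTorch, not the authors' released implementation: the function name `warp_hidden_state`, the nearest-neighbor forward splatting, and the argument conventions are all assumptions made here for clarity.

```python
# A minimal sketch (NOT the authors' exact implementation) of warping the
# previous ConvLSTM hidden state into the current camera plane using the
# depth predicted at the previous time step. All names are hypothetical.
import torch


def warp_hidden_state(h_prev, depth_prev, K, T_curr_from_prev):
    """Forward-warp h_prev (B, C, H, W) into the current view.

    depth_prev:       (B, 1, H, W) depth predicted at the previous step,
                      expressed in the previous camera's frame.
    K:                (B, 3, 3) intrinsics at the hidden-state resolution.
    T_curr_from_prev: (B, 4, 4) relative pose taking previous-camera
                      coordinates to current-camera coordinates.
    """
    B, C, H, W = h_prev.shape
    device = h_prev.device

    # Back-project every previous-frame pixel to a 3D point using its depth.
    ys, xs = torch.meshgrid(
        torch.arange(H, device=device, dtype=torch.float32),
        torch.arange(W, device=device, dtype=torch.float32),
        indexing="ij",
    )
    ones = torch.ones_like(xs)
    pix = torch.stack([xs, ys, ones], dim=0).view(1, 3, -1)      # (1, 3, HW)
    rays = torch.inverse(K) @ pix                                # (B, 3, HW)
    pts_prev = rays * depth_prev.view(B, 1, -1)                  # (B, 3, HW)

    # Rigidly transform the points into the current camera's frame.
    R = T_curr_from_prev[:, :3, :3]
    t = T_curr_from_prev[:, :3, 3:]
    pts_curr = R @ pts_prev + t                                  # (B, 3, HW)

    # Project into the current image plane.
    proj = K @ pts_curr
    z = proj[:, 2:].clamp(min=1e-6)
    u = (proj[:, 0] / z[:, 0]).round().long()
    v = (proj[:, 1] / z[:, 0]).round().long()

    # Nearest-neighbor splat of the hidden-state features to their new
    # locations; pixels leaving the frame or behind the camera are dropped,
    # collisions resolve as "last write wins", and holes remain zero.
    h_warped = torch.zeros_like(h_prev)
    for b in range(B):
        valid = ((u[b] >= 0) & (u[b] < W) & (v[b] >= 0) & (v[b] < H)
                 & (pts_curr[b, 2] > 0))
        h_warped[b, :, v[b][valid], u[b][valid]] = \
            h_prev[b].view(C, -1)[:, valid]
    return h_warped
```

A practical system would resolve splatting collisions more carefully (e.g., with z-buffering) and fill the resulting holes; the sketch keeps the geometry but omits those details.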