We introduce FaDIV-Syn, a fast depth-independent view synthesis method. Our multi-view approach addresses the problem that view synthesis methods are often limited by their depth estimation stage, where incorrect depth predictions can lead to large projection errors. To avoid this issue, we efficiently warp multiple input images into the target frame for a range of assumed depth planes. The resulting tensor representation is fed into a U-Net-like CNN with gated convolutions, which directly produces the novel output view. We thereby sidestep explicit depth estimation, which improves efficiency and performance on transparent, reflective, and featureless scene parts. FaDIV-Syn handles both interpolation and extrapolation tasks and outperforms state-of-the-art extrapolation methods on the large-scale RealEstate10k dataset. In contrast to comparable methods, it is capable of real-time operation due to its lightweight architecture. We further demonstrate the data efficiency of FaDIV-Syn by training on fewer examples, as well as its generalization to higher resolutions and arbitrary depth ranges under severe depth discretization.
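To make the two core ideas in the abstract concrete, the following is a minimal sketch, not the authors' implementation: (1) backward-warping a source image into the target view for a set of assumed fronto-parallel depth planes via plane-induced homographies, and (2) a gated convolution block of the kind used in the U-Net-like CNN. All shapes, layer sizes, and helper names (`plane_sweep_volume`, `GatedConv2d`) are illustrative assumptions, not the paper's actual architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def plane_sweep_volume(src, K, R, t, depths, out_hw):
    """Warp one source image into the target camera for each assumed depth.

    src:    (B, 3, H_src, W_src) source image
    K:      (B, 3, 3) camera intrinsics (assumed shared by both views)
    R, t:   (B, 3, 3), (B, 3, 1) rigid transform from target to source camera
    depths: iterable of assumed depths for the fronto-parallel planes
    Returns a (B, 3 * len(depths), H, W) tensor of warped images.
    """
    B = src.shape[0]
    H, W = out_hw
    # Target-view pixel grid in homogeneous coordinates: (3, H*W).
    ys, xs = torch.meshgrid(
        torch.arange(H, dtype=src.dtype),
        torch.arange(W, dtype=src.dtype),
        indexing="ij",
    )
    pix = torch.stack(
        [xs.reshape(-1), ys.reshape(-1), torch.ones(H * W, dtype=src.dtype)], dim=0
    )

    warped = []
    for d in depths:
        # Homography induced by the plane at depth d: K (R + t n^T / d) K^-1,
        # with plane normal n = (0, 0, 1)^T in the target camera frame.
        n = torch.tensor([[0.0, 0.0, 1.0]], dtype=src.dtype).expand(B, 1, 3)
        H_mat = K @ (R + (t @ n) / d) @ torch.inverse(K)
        p = H_mat @ pix.unsqueeze(0)              # (B, 3, H*W) source coords
        p = p[:, :2] / p[:, 2:3].clamp(min=1e-6)  # perspective divide
        # Normalize to [-1, 1] for grid_sample (backward warping).
        gx = 2.0 * p[:, 0] / (src.shape[-1] - 1) - 1.0
        gy = 2.0 * p[:, 1] / (src.shape[-2] - 1) - 1.0
        grid = torch.stack([gx, gy], dim=-1).view(B, H, W, 2)
        warped.append(F.grid_sample(src, grid, align_corners=True))
    return torch.cat(warped, dim=1)


class GatedConv2d(nn.Module):
    """Gated convolution: a learned soft mask modulates the feature output."""

    def __init__(self, c_in, c_out, k=3):
        super().__init__()
        self.feat = nn.Conv2d(c_in, c_out, k, padding=k // 2)
        self.gate = nn.Conv2d(c_in, c_out, k, padding=k // 2)

    def forward(self, x):
        return torch.sigmoid(self.gate(x)) * F.elu(self.feat(x))
```

Stacking the warped images for all assumed depths yields the depth-independent tensor representation: the network sees the scene re-projected at every candidate depth and can learn to select consistent evidence per pixel, rather than committing to a single explicit depth estimate.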