We propose a new cascaded architecture for novel view synthesis, called RGBD-Net, which consists of two core components: a hierarchical depth regression network and a depth-aware generator network. The former predicts depth maps of the target views using adaptive depth scaling, while the latter leverages the predicted depth maps to render spatially and temporally consistent target images. In the experimental evaluation on standard datasets, RGBD-Net not only outperforms the state of the art by a clear margin, but it also generalizes well to new scenes without per-scene optimization. Moreover, we show that RGBD-Net can optionally be trained without depth supervision while still retaining high-quality rendering. Thanks to the depth regression network, RGBD-Net can also be used to create dense 3D point clouds that are more accurate than those produced by some state-of-the-art multi-view stereo methods.
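To make the cascaded two-stage design concrete, the sketch below illustrates how a depth regression stage could feed a depth-aware generator in a single forward pass. All class names, layer choices, and tensor shapes here are illustrative assumptions, not the paper's actual implementation.

```python
# Minimal sketch (PyTorch) of a cascaded depth-then-render pipeline.
# All modules are placeholders chosen for illustration only.
import torch
import torch.nn as nn

class DepthRegressionNet(nn.Module):
    """Stand-in for the hierarchical depth regression network."""
    def __init__(self):
        super().__init__()
        self.backbone = nn.Conv2d(3, 1, kernel_size=3, padding=1)  # placeholder layer

    def forward(self, source_images):
        # Predict a depth map for the target view from the source views.
        return self.backbone(source_images.mean(dim=1))  # (B, 1, H, W)

class DepthAwareGenerator(nn.Module):
    """Stand-in for the depth-aware generator network."""
    def __init__(self):
        super().__init__()
        self.render = nn.Conv2d(4, 3, kernel_size=3, padding=1)  # placeholder layer

    def forward(self, source_images, target_depth):
        # Render the target image conditioned on the predicted depth map.
        fused = torch.cat([source_images.mean(dim=1), target_depth], dim=1)
        return self.render(fused)  # (B, 3, H, W)

class RGBDNet(nn.Module):
    """Cascade: depth regression first, depth-aware image generation second."""
    def __init__(self):
        super().__init__()
        self.depth_net = DepthRegressionNet()
        self.generator = DepthAwareGenerator()

    def forward(self, source_images):
        target_depth = self.depth_net(source_images)                 # stage 1: depth
        target_image = self.generator(source_images, target_depth)   # stage 2: RGB
        return target_image, target_depth

# Dummy usage: a batch with 2 source views of 64x64 RGB images.
views = torch.randn(1, 2, 3, 64, 64)
image, depth = RGBDNet()(views)
print(image.shape, depth.shape)  # torch.Size([1, 3, 64, 64]) torch.Size([1, 1, 64, 64])
```

The returned depth map is what would also allow such a model to output dense 3D point clouds, as mentioned in the abstract.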