While 2D generative adversarial networks have enabled high-resolution image synthesis, they largely lack an understanding of the 3D world and the image formation process. Thus, they do not provide precise control over camera viewpoint or object pose. To address this problem, several recent approaches leverage intermediate voxel-based representations in combination with differentiable rendering. However, existing methods either yield low-resolution images or fall short in disentangling camera and scene properties, e.g., the object identity may vary with the viewpoint. In this paper, we propose a generative model for radiance fields, which have recently proven successful for novel view synthesis of a single scene. In contrast to voxel-based representations, radiance fields are not confined to a coarse discretization of the 3D space, yet allow for disentangling camera and scene properties while degrading gracefully in the presence of reconstruction ambiguity. By introducing a multi-scale patch-based discriminator, we demonstrate synthesis of high-resolution images while training our model from unposed 2D images alone. We systematically analyze our approach on several challenging synthetic and real-world datasets. Our experiments reveal that radiance fields are a powerful representation for generative image synthesis, leading to 3D-consistent models that render with high fidelity.
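To make the representation concrete, the following is a minimal sketch of a conditional radiance field in the notation standard to this literature; the positional encoding \(\gamma(\cdot)\) and the shape and appearance codes \(\mathbf{z}_s, \mathbf{z}_a\) are illustrative symbols, not a verbatim restatement of our formulation. The field maps a 3D location and viewing direction, together with latent codes, to a density and a color,
\[
g_\theta \colon \big(\gamma(\mathbf{x}),\, \gamma(\mathbf{d}),\, \mathbf{z}_s,\, \mathbf{z}_a\big) \;\mapsto\; (\sigma, \mathbf{c}),
\qquad \sigma \in \mathbb{R}^{+},\; \mathbf{c} \in \mathbb{R}^{3},
\]
and a pixel color is obtained by numerical quadrature along its camera ray \(\mathbf{r}(t) = \mathbf{o} + t\mathbf{d}\) with samples \(t_1 < \dots < t_N\):
\[
\hat{C}(\mathbf{r}) \;=\; \sum_{i=1}^{N} T_i \left(1 - e^{-\sigma_i \delta_i}\right) \mathbf{c}_i,
\qquad
T_i = \exp\!\Big(-\sum_{j<i} \sigma_j \delta_j\Big),
\qquad
\delta_i = t_{i+1} - t_i.
\]
Because density and color are predicted in world coordinates independently of the camera, viewpoint and scene content are disentangled by construction, which is the property the abstract contrasts against voxel-based pipelines.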
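The multi-scale patch-based discriminator admits a similarly short sketch: rather than discriminating full images, one samples a fixed K×K grid of pixel locations at a random scale and position, so only K×K rays are rendered per step while the discriminator still sees both global structure (large scale) and fine detail (small scale). The code below is a hedged illustration under these assumptions; the names sample_patch_coords, K, and img_size are invented for exposition and do not denote our reference implementation.

import numpy as np

def sample_patch_coords(img_size, K=32, rng=None):
    """Sample a KxK grid of pixel coordinates at a random scale
    and position inside an img_size x img_size image.

    Returns an array of shape (K, K, 2) holding (row, col) coordinates.
    """
    rng = rng or np.random.default_rng()
    # Scale s: fraction of the image spanned by the patch.
    # s = 1.0 gives a global, coarsely sampled view;
    # s = K / img_size gives a dense local crop at full detail.
    s_min = K / img_size
    s = rng.uniform(s_min, 1.0)
    span = s * (img_size - 1)  # patch extent in pixels
    # Random top-left corner keeping the patch inside the image.
    u = rng.uniform(0, img_size - 1 - span, size=2)
    # K evenly spaced sampling coordinates along each axis.
    rows = np.linspace(u[0], u[0] + span, K)
    cols = np.linspace(u[1], u[1] + span, K)
    return np.stack(np.meshgrid(rows, cols, indexing="ij"), axis=-1)

# Example: coordinates of a 32x32 patch drawn from a 128x128 image.
# Real patches would be read off the image by bilinear lookup at these
# (generally non-integer) locations; fake patches are rendered by casting
# the corresponding camera rays through the radiance field.
coords = sample_patch_coords(128, K=32)
print(coords.shape)  # (32, 32, 2)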