We introduce 3inGAN, an unconditional 3D generative model trained from 2D images of a single self-similar 3D scene. Such a model can be used to produce 3D "remixes" of a given scene by mapping spatial latent codes into a 3D volumetric representation, which can subsequently be rendered from arbitrary views using physically based volume rendering. By construction, the generated scenes remain view-consistent across arbitrary camera configurations, without any flickering or spatio-temporal artifacts. During training, we employ a combination of 2D Generative Adversarial Network (GAN) losses, computed on renderings obtained through differentiable volume tracing, and 3D GAN losses, applied across multiple scales, to enforce realism in both the generated 3D structure and its 2D renderings. We show results on semi-stochastic scenes of varying scale and complexity, obtained from real and synthetic sources. We demonstrate, for the first time, the feasibility of learning plausible view-consistent 3D scene variations from a single exemplar scene and provide qualitative and quantitative comparisons against recent related methods.
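To make the rendering step concrete, below is a minimal sketch of the front-to-back alpha compositing used in differentiable volume rendering, which the abstract refers to as physically based volume rendering. The function name, sampling scheme, and use of NumPy are illustrative assumptions for exposition, not the paper's exact formulation or implementation.

```python
import numpy as np

def composite_ray(densities, colors, deltas):
    """Front-to-back alpha compositing of samples along one ray.

    densities : (N,)   non-negative volume densities sigma_i at the samples
    colors    : (N, 3) RGB colors c_i at the samples
    deltas    : (N,)   distances between consecutive samples
    Returns the rendered RGB value for the ray.
    """
    # Opacity contributed by each ray segment.
    alphas = 1.0 - np.exp(-densities * deltas)
    # Transmittance: probability that light reaches each sample unoccluded.
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas]))[:-1]
    # Per-sample contribution weights, then weighted sum of colors.
    weights = trans * alphas
    return (weights[:, None] * colors).sum(axis=0)

# Toy usage: one ray with 64 uniformly spaced samples.
n = 64
rgb = composite_ray(
    densities=np.random.rand(n),
    colors=np.random.rand(n, 3),
    deltas=np.full(n, 0.05),
)
```

Because every operation in this compositing is differentiable, gradients from image-space (2D) GAN losses can flow back through the rendered pixels into the generated volumetric representation.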