We introduce GAUDI, a generative model capable of capturing the distribution of complex and realistic 3D scenes that can be rendered immersively from a moving camera. We tackle this challenging problem with a scalable yet powerful approach, where we first optimize a latent representation that disentangles radiance fields and camera poses. This latent representation is then used to learn a generative model that enables both unconditional and conditional generation of 3D scenes. Our model generalizes previous works that focus on single objects by removing the assumption that the camera pose distribution can be shared across samples. We show that GAUDI obtains state-of-the-art performance in the unconditional generative setting across multiple datasets and allows for conditional generation of 3D scenes given conditioning variables like sparse image observations or text that describes the scene.
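The two-stage recipe sketched in the abstract — first optimize a per-example latent representation, then fit a generative model over those latents — can be made concrete with a short illustration. Below is a minimal, hypothetical PyTorch sketch of the first stage as an auto-decoder: each training trajectory owns a learnable scene latent and a separate camera-pose latent, and both are optimized jointly with shared decoder networks. All names, network sizes, and the simplified photometric loss here are assumptions made for illustration; the paper's actual radiance-field decoder, volumetric rendering, and second-stage generative model over latents are elided.

```python
# Hedged sketch of GAUDI's first stage: auto-decoder latent optimization with
# disentangled latents for the radiance field and the camera trajectory.
# This is an illustrative toy, not the authors' implementation.
import torch
import torch.nn as nn

LATENT_SCENE, LATENT_POSE, NUM_SCENES = 64, 32, 16

class RadianceDecoder(nn.Module):
    """Toy stand-in for the radiance-field decoder:
    (scene latent, 3D point) -> (density, rgb)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(LATENT_SCENE + 3, 128), nn.ReLU(),
            nn.Linear(128, 4))  # 1 density channel + 3 rgb channels

    def forward(self, z, xyz):
        # z: (B, LATENT_SCENE), xyz: (B, N, 3)
        z = z.unsqueeze(1).expand(-1, xyz.shape[1], -1)
        out = self.net(torch.cat([z, xyz], dim=-1))
        return torch.relu(out[..., :1]), torch.sigmoid(out[..., 1:])

class PoseDecoder(nn.Module):
    """Toy stand-in for the pose decoder:
    (pose latent, normalized frame time) -> 6-DoF pose parameters."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(LATENT_POSE + 1, 64), nn.ReLU(),
            nn.Linear(64, 6))  # 3 translation + 3 rotation parameters

    def forward(self, z, t):
        return self.net(torch.cat([z, t], dim=-1))

# Auto-decoder: one learnable latent pair per training trajectory,
# optimized jointly with the shared decoder weights (no encoder network).
z_scene = nn.Embedding(NUM_SCENES, LATENT_SCENE)
z_pose = nn.Embedding(NUM_SCENES, LATENT_POSE)
rad_dec, pose_dec = RadianceDecoder(), PoseDecoder()

opt = torch.optim.Adam(
    list(z_scene.parameters()) + list(z_pose.parameters())
    + list(rad_dec.parameters()) + list(pose_dec.parameters()), lr=1e-3)

for step in range(3):  # a few illustrative steps on random placeholder data
    idx = torch.randint(0, NUM_SCENES, (4,))
    xyz = torch.rand(4, 256, 3)         # sampled 3D points (proxy for ray samples)
    target_rgb = torch.rand(4, 256, 3)  # placeholder ground-truth colors
    t = torch.rand(4, 1)                # normalized frame time along the trajectory

    density, rgb = rad_dec(z_scene(idx), xyz)  # density would feed the (elided) volumetric renderer
    pose = pose_dec(z_pose(idx), t)

    # Photometric reconstruction proxy plus a small pose regularizer so the
    # pose branch receives gradients; the paper's rendering-based losses are elided.
    loss = ((rgb - target_rgb) ** 2).mean() + 1e-4 * pose.pow(2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
    print(f"step {step}: loss={loss.item():.4f}")
```

The design choice worth noting is the absence of an encoder: the latents are free parameters found by optimization, which is what allows each sample to keep its own camera-pose distribution rather than sharing a single pose prior across the dataset, as the abstract emphasizes.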