In this work, we introduce CC3D, a conditional generative model that synthesizes complex 3D scenes conditioned on 2D semantic scene layouts and is trained using single-view images. Unlike most existing 3D GANs, whose applicability is limited to aligned single objects, we focus on generating complex scenes with multiple objects by modeling the compositional nature of 3D scenes. By devising a 2D layout-based approach to 3D synthesis and implementing a new 3D field representation with a stronger geometric inductive bias, we obtain a 3D GAN that is both efficient and of high quality, while allowing a more controllable generation process. Our evaluations on the synthetic 3D-FRONT and real-world KITTI-360 datasets demonstrate that our model generates scenes of higher visual and geometric quality than previous works.
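The abstract names the 2D layout-based lifting step without detailing it. As a rough illustration of how a layout-aligned 2D feature map can define a queryable 3D field, the NumPy sketch below makes several assumptions that are ours, not a description of CC3D's exact architecture: a 2D generator has already turned the layout into a top-down feature map, its channels are split into slices along the vertical axis to form a feature volume, and features at arbitrary 3D points are read out by trilinear interpolation. The function names `extrude_to_volume` and `sample_trilinear` are hypothetical.

```python
import numpy as np


def extrude_to_volume(feat2d: np.ndarray, depth: int) -> np.ndarray:
    """Lift a top-down (C*depth, H, W) feature map into a (C, depth, H, W) volume.

    Hypothetical illustration: the channel axis is split into `depth` slices along
    the vertical axis, so each top-down cell carries a column of features.
    """
    cd, h, w = feat2d.shape
    if cd % depth != 0:
        raise ValueError("channel count must be divisible by depth")
    return feat2d.reshape(cd // depth, depth, h, w)


def sample_trilinear(volume: np.ndarray, pts: np.ndarray) -> np.ndarray:
    """Trilinearly interpolate a (C, D, H, W) volume at N points.

    `pts` has shape (N, 3) with normalized coordinates in [0, 1], ordered as
    (height, row, column); values outside [0, 1] are clamped to the volume.
    Returns an (N, C) array of features.
    """
    c, d, h, w = volume.shape
    sizes = np.array([d, h, w])
    p = np.clip(pts, 0.0, 1.0) * (sizes - 1)    # continuous voxel coordinates
    i0 = np.floor(p).astype(int)                # lower corner indices
    i1 = np.minimum(i0 + 1, sizes - 1)          # upper corner indices (clamped)
    t = p - i0                                  # fractional offsets in [0, 1]

    out = np.zeros((pts.shape[0], c))
    for corner in range(8):                     # visit the 8 corners of each cell
        bits = [(corner >> k) & 1 for k in range(3)]
        idx = [i1[:, k] if b else i0[:, k] for k, b in enumerate(bits)]
        wgt = np.prod([t[:, k] if b else 1.0 - t[:, k] for k, b in enumerate(bits)], axis=0)
        out += wgt[:, None] * volume[:, idx[0], idx[1], idx[2]].T
    return out


# Usage: 8 feature channels x 4 height slices on a 16 x 16 top-down grid.
rng = np.random.default_rng(0)
feat2d = rng.standard_normal((8 * 4, 16, 16))
vol = extrude_to_volume(feat2d, depth=4)        # shape (8, 4, 16, 16)
pts = rng.uniform(size=(5, 3))                  # 5 query points in [0, 1]^3
feats = sample_trilinear(vol, pts)              # shape (5, 8)
```

Because the interpolation is linear in the volume values, gradients flow back to the 2D feature map, so a volume built this way can be trained end to end; in a full 3D GAN, a small decoder would map the sampled features to density and color for volume rendering.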