Recent breakthroughs in text-guided image generation have led to remarkable progress in text-to-3D synthesis. By optimizing neural radiance fields (NeRFs) directly from text, recent methods produce impressive results. Yet these methods offer limited control over the placement and appearance of individual objects, since they represent the scene as a whole. This is a major limitation in scenarios that require refining or manipulating objects within the scene. To address this shortcoming, we propose a novel Global-Local training framework for synthesizing a 3D scene using object proxies. A proxy represents the object's placement in the generated scene and optionally defines its coarse geometry. The key to our approach is to represent each object as an independent NeRF. We alternate between optimizing each NeRF on its own and optimizing it as part of the full scene. Thus, a complete representation of each object can be learned, while also creating a harmonious scene with matching style and lighting. We show that using proxies allows for a wide variety of editing options, such as adjusting the placement of each independent object, removing an object from the scene, or refining an object. Our results show that Set-the-Scene offers a powerful solution for scene synthesis and manipulation, filling a crucial gap in controllable text-to-3D synthesis.
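To make the alternating Global-Local scheme concrete, below is a minimal toy sketch in PyTorch. Every name in it (ObjectNeRF, compose, sds_loss, the proxy dictionaries, the prompts) is a hypothetical illustration, not the authors' code: a real implementation would use full volumetric ray rendering and a diffusion-based score distillation loss in place of these stand-ins.

```python
# Toy sketch of alternating Global-Local training with object proxies.
# All components are simplified placeholders, assumed for illustration only.
import torch
import torch.nn as nn

class ObjectNeRF(nn.Module):
    """Toy per-object radiance field: maps a 3D point to (density, rgb)."""
    def __init__(self, hidden=64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),  # 1 density channel + 3 rgb channels
        )

    def forward(self, pts):  # pts: (N, 3) in the object's local frame
        out = self.mlp(pts)
        density = torch.relu(out[..., :1])
        rgb = torch.sigmoid(out[..., 1:])
        return density, rgb

def compose(pts, nerfs, proxies):
    """Query each object NeRF in its own frame (via its proxy transform)
    and merge by keeping the highest-density sample at each point."""
    densities, rgbs = [], []
    for nerf, proxy in zip(nerfs, proxies):
        local_pts = (pts - proxy["translation"]) / proxy["scale"]
        d, c = nerf(local_pts)
        densities.append(d)
        rgbs.append(c)
    d = torch.stack(densities)                 # (K, N, 1)
    c = torch.stack(rgbs)                      # (K, N, 3)
    winner = d.argmax(dim=0, keepdim=True)     # which object dominates each point
    rgb = torch.gather(c, 0, winner.expand(-1, -1, 3)).squeeze(0)
    return d.max(dim=0).values, rgb

def sds_loss(density, rgb, prompt):
    """Placeholder for a text-guided score-distillation loss;
    the prompt is unused in this toy stand-in."""
    return (density.mean() - 0.5).pow(2) + (rgb.mean() - 0.5).pow(2)

# Two objects with hypothetical proxies fixing placement and scale.
nerfs = [ObjectNeRF() for _ in range(2)]
proxies = [{"translation": torch.tensor([0.0, 0.0, 0.0]), "scale": 1.0},
           {"translation": torch.tensor([1.0, 0.0, 0.0]), "scale": 0.5}]
object_prompts = ["a chair", "a table"]
scene_prompt = "a chair next to a table"
opt = torch.optim.Adam([p for n in nerfs for p in n.parameters()], lr=1e-3)

for step in range(100):
    pts = torch.rand(256, 3) * 2 - 1  # random points, a stand-in for ray samples
    if step % 2 == 0:
        # Local phase: optimize each object NeRF on its own prompt.
        loss = sum(sds_loss(*nerf((pts - pr["translation"]) / pr["scale"]), prompt)
                   for nerf, pr, prompt in zip(nerfs, proxies, object_prompts))
    else:
        # Global phase: compose all objects via their proxies and
        # optimize the full scene against the scene prompt.
        loss = sds_loss(*compose(pts, nerfs, proxies), scene_prompt)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The alternation is the point of the sketch: the local phase lets each NeRF learn a complete object even where the composed scene would occlude it, while the global phase harmonizes style and lighting across objects; editing then amounts to changing a proxy's transform or dropping a NeRF from the composition.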