Text-guided diffusion models have shown superior performance in image/video generation and editing, while few explorations have been performed in 3D scenarios. In this paper, we discuss three fundamental and interesting problems on this topic. First, we equip text-guided diffusion models to achieve \textbf{3D-consistent generation}. Specifically, we integrate a NeRF-like neural field to generate low-resolution coarse results for a given camera view. Such results provide 3D priors as conditioning information for the following diffusion process. During denoising diffusion, we further enhance 3D consistency by modeling cross-view correspondences with a novel two-stream (corresponding to two different views) asynchronous diffusion process. Second, we study \textbf{3D local editing} and propose a two-step solution that can generate 360$^{\circ}$ manipulated results by editing an object from a single view. In Step 1, we perform 2D local editing by blending the predicted noises. In Step 2, we conduct a noise-to-text inversion process that maps the 2D blended noises into the view-independent text embedding space. Once the corresponding text embedding is obtained, 360$^{\circ}$ images can be generated. Last but not least, we extend our model to perform \textbf{one-shot novel view synthesis} by fine-tuning on a single image, showing for the first time the potential of leveraging text guidance for novel view synthesis. Extensive experiments and various applications show the prowess of our 3DDesigner. The project page is available at \url{https://3ddesigner-diffusion.github.io/}.
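To make Step 1 of the local-editing pipeline concrete, the following is a minimal sketch of blending predicted noises inside a masked region during one denoising step. It is not the paper's implementation; the noise predictor, embeddings, mask, and schedule values are hypothetical placeholders, and a deterministic DDIM-style update is assumed for illustration.

\begin{verbatim}
# Illustrative sketch only (assumptions, not the paper's code):
# blend the noise predicted for the edit prompt with the noise predicted
# for the original prompt, restricted to a spatial mask, then take one
# deterministic DDIM-style denoising step.
import numpy as np

def predict_noise(x_t, t, text_embedding):
    """Placeholder for a text-conditioned noise predictor (e.g. a U-Net)."""
    rng = np.random.default_rng(t)
    return rng.standard_normal(x_t.shape).astype(np.float32)

def blended_denoise_step(x_t, t, emb_orig, emb_edit, mask,
                         alpha_bar_t, alpha_bar_prev):
    """One denoising step whose predicted noise is blended inside `mask`."""
    eps_orig = predict_noise(x_t, t, emb_orig)       # noise for original content
    eps_edit = predict_noise(x_t, t, emb_edit)       # noise for edited prompt
    eps = mask * eps_edit + (1.0 - mask) * eps_orig  # blend only in edited region
    # Deterministic DDIM update from the blended noise estimate.
    x0_pred = (x_t - np.sqrt(1.0 - alpha_bar_t) * eps) / np.sqrt(alpha_bar_t)
    return np.sqrt(alpha_bar_prev) * x0_pred + np.sqrt(1.0 - alpha_bar_prev) * eps
\end{verbatim}

In Step 2, the blended noises obtained this way would then be inverted into the view-independent text embedding space so that 360$^{\circ}$ edited views can be rendered.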