Virtual reality and augmented reality (XR) bring increasing demand for 3D content. However, creating high-quality 3D content requires tedious work from a human expert. In this work, we study the challenging task of lifting a single image to a 3D object and, for the first time, demonstrate the ability to generate a plausible 3D object with 360° views that correspond well with the given reference image. By conditioning on the reference image, our model can fulfill the long-standing goal of synthesizing novel views of objects from a single image. Our technique sheds light on a promising direction for easing the workflows of 3D artists and XR designers. We propose a novel framework, dubbed NeuralLift-360, that utilizes a depth-aware neural radiance field (NeRF) representation and learns to craft the scene guided by denoising diffusion models. By introducing a ranking loss, our NeuralLift-360 can be guided by rough depth estimates obtained in the wild. We also adopt a CLIP-guided sampling strategy for the diffusion prior, providing coherent guidance. Extensive experiments demonstrate that our NeuralLift-360 significantly outperforms existing state-of-the-art baselines. Project page: https://vita-group.github.io/NeuralLift-360/
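To illustrate how rough, in-the-wild depth estimates can guide the reconstruction without trusting their absolute scale, the minimal PyTorch sketch below shows one common form of pairwise ranking loss over depth. The function name `depth_ranking_loss`, the random pair-sampling scheme, and all parameter names are illustrative assumptions under this reading, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F


def depth_ranking_loss(pred_depth: torch.Tensor,
                       ref_depth: torch.Tensor,
                       num_pairs: int = 1024,
                       min_gap: float = 1e-4) -> torch.Tensor:
    """Pairwise ranking loss between a rendered depth map and a rough
    monocular depth estimate. Only the relative ordering of sampled pixel
    pairs is supervised, so a noisy estimate can still provide useful
    guidance even when its absolute values are unreliable.

    Note: a hedged sketch of the general technique, not NeuralLift-360's
    exact loss definition.
    """
    flat_pred = pred_depth.reshape(-1)
    flat_ref = ref_depth.reshape(-1)
    n = flat_pred.numel()

    # Randomly sample pixel pairs (i, j).
    idx_i = torch.randint(0, n, (num_pairs,), device=pred_depth.device)
    idx_j = torch.randint(0, n, (num_pairs,), device=pred_depth.device)

    # Target ordering taken from the rough reference depth.
    ref_diff = flat_ref[idx_i] - flat_ref[idx_j]
    sign = torch.sign(ref_diff)
    valid = ref_diff.abs() > min_gap  # skip pairs with ambiguous ordering

    # Logistic ranking loss: penalize predictions that contradict the ordering.
    pred_diff = flat_pred[idx_i] - flat_pred[idx_j]
    loss = F.softplus(-sign * pred_diff)
    return loss[valid].mean() if valid.any() else pred_diff.sum() * 0.0


# Example usage with random tensors standing in for a NeRF-rendered depth map
# and a monocular depth estimate of the reference image:
rendered = torch.rand(64, 64, requires_grad=True)
estimated = torch.rand(64, 64)
print(depth_ranking_loss(rendered, estimated))
```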