We present a novel method for performing flexible, 3D-aware image content manipulation while enabling high-quality novel view synthesis. While NeRF-based approaches are effective for novel view synthesis, such models memorize the radiance for every point in a scene within a neural network. Since these models are scene-specific and lack an explicit 3D scene representation, classical editing operations such as shape manipulation or combining scenes are not possible; as a result, editing and combining NeRF-based scenes has not been demonstrated. With the aim of obtaining interpretable and controllable scene representations, our model couples learnt scene-specific feature volumes with a scene-agnostic neural rendering network. With this hybrid representation, we decouple neural rendering from scene-specific geometry and appearance. We can generalize to novel scenes by optimizing only the scene-specific 3D feature representation, while keeping the parameters of the rendering network fixed. The rendering function learnt during the initial training stage can thus be easily applied to new scenes, making our approach more flexible. More importantly, since the feature volumes are independent of the rendering model, we can manipulate and combine scenes by editing their corresponding feature volumes. The edited volume can then be plugged into the rendering model to synthesize high-quality novel views. We demonstrate various scene manipulations, including mixing scenes, deforming objects, and inserting objects into scenes, while still producing photo-realistic results.
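To make the hybrid representation concrete, the following is a minimal PyTorch-style sketch, not the authors' implementation: a learnable per-scene feature volume sampled trilinearly at 3D points, paired with a shared rendering MLP that maps sampled features and view directions to density and color. All class names, shapes, and hyperparameters (e.g. a 32-channel 128^3 volume) are illustrative assumptions.

```python
# Minimal sketch (assumed structure, not released code) of a hybrid
# scene representation: per-scene feature volume + shared renderer.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SceneFeatureVolume(nn.Module):
    """Learnable 3D grid of latent features, optimized per scene."""
    def __init__(self, channels=32, resolution=128):
        super().__init__()
        # (1, C, D, H, W) feature volume.
        self.volume = nn.Parameter(
            0.1 * torch.randn(1, channels, resolution, resolution, resolution))

    def forward(self, points):
        # points: (N, 3) in [-1, 1]^3; trilinearly sample features.
        grid = points.view(1, -1, 1, 1, 3)               # (1, N, 1, 1, 3)
        feats = F.grid_sample(self.volume, grid, align_corners=True)
        return feats.view(self.volume.shape[1], -1).t()  # (N, C)

class SharedRenderer(nn.Module):
    """Scene-agnostic network mapping sampled features and view
    directions to density and color; trained once, then frozen."""
    def __init__(self, channels=32, hidden=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels + 3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 4))  # (sigma, r, g, b)

    def forward(self, feats, view_dirs):
        out = self.mlp(torch.cat([feats, view_dirs], dim=-1))
        sigma = F.softplus(out[..., :1])   # non-negative density
        rgb = torch.sigmoid(out[..., 1:])  # color in [0, 1]
        return sigma, rgb

# Generalizing to a novel scene: optimize only the new feature
# volume while the pretrained renderer stays fixed.
renderer = SharedRenderer()
for p in renderer.parameters():
    p.requires_grad_(False)
new_scene = SceneFeatureVolume()
optim = torch.optim.Adam(new_scene.parameters(), lr=1e-2)

# Scene editing then amounts to operating directly on the volumes,
# e.g. splicing a region of one scene's volume into another's
# before rendering (scene_a, scene_b are SceneFeatureVolume instances):
# scene_a.volume.data[..., :64, :64, :64] = \
#     scene_b.volume.data[..., :64, :64, :64]
```

Under this split, adapting to a new scene touches only the volume parameters, and edits such as mixing, deforming, or inserting objects reduce to direct operations on the volume tensors that are then fed to the frozen renderer.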