Image translation and manipulation have gained increasing attention along with the rapid development of deep generative models. Although existing approaches have produced impressive results, they mainly operate in 2D space. In light of recent advances in NeRF-based 3D-aware generative models, we introduce a new task, Semantic-to-NeRF translation, which aims to reconstruct a 3D scene modelled by NeRF, conditioned on a single-view semantic mask as input. To kick off this novel task, we propose the Sem2NeRF framework. In particular, Sem2NeRF addresses this highly challenging task by encoding the semantic mask into a latent code that controls the 3D scene representation of a pre-trained decoder. To further improve the accuracy of the mapping, we integrate a new region-aware learning strategy into the design of both the encoder and the decoder. We verify the efficacy of the proposed Sem2NeRF and demonstrate that it outperforms several strong baselines on two benchmark datasets. Code and video are available at https://donydchen.github.io/sem2nerf/
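The encoder-to-pretrained-decoder mapping described above can be illustrated with a minimal sketch. All module names, shapes, and the camera-pose format below are illustrative assumptions, not the authors' actual implementation; the stand-in generator only mimics the interface of a frozen, pre-trained NeRF-based 3D-aware decoder such as the one Sem2NeRF builds on.

```python
import torch
import torch.nn as nn


class MaskEncoder(nn.Module):
    """Encode a one-hot semantic mask (B, C, H, W) into a latent code (B, D)."""

    def __init__(self, num_classes: int = 19, latent_dim: int = 256):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(num_classes, 64, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(128, 256, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(256, latent_dim)

    def forward(self, mask: torch.Tensor) -> torch.Tensor:
        feat = self.backbone(mask).flatten(1)
        return self.fc(feat)


class FrozenNeRFGenerator(nn.Module):
    """Stand-in for a pre-trained NeRF-based 3D-aware generator.

    In practice this would be the real decoder with loaded weights; its
    parameters are kept frozen while only the encoder is trained.
    """

    def __init__(self, latent_dim: int = 256, image_size: int = 64):
        super().__init__()
        self.image_size = image_size
        self.to_rgb = nn.Linear(latent_dim, 3 * image_size * image_size)
        for p in self.parameters():
            p.requires_grad_(False)  # decoder weights stay fixed

    def forward(self, z: torch.Tensor, camera_pose: torch.Tensor) -> torch.Tensor:
        # A real decoder renders by volumetric sampling along camera rays;
        # a linear map keeps this example self-contained and runnable.
        img = self.to_rgb(z).view(-1, 3, self.image_size, self.image_size)
        return torch.tanh(img)


# Usage: translate one semantic mask into a latent code and render a view.
encoder, decoder = MaskEncoder(), FrozenNeRFGenerator()
mask = torch.zeros(1, 19, 128, 128)  # one-hot semantic mask, single view
pose = torch.tensor([[0.2, -0.1]])   # illustrative (yaw, pitch) camera pose
latent = encoder(mask)
rendered = decoder(latent, pose)     # (1, 3, 64, 64) image from the chosen pose
```

Because the decoder is pre-trained and frozen, the latent code is the only channel through which the semantic mask can influence the 3D scene, which is why the accuracy of this encoding (and the region-aware learning strategy that refines it) is central to the method.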