While recent NeRF-based generative models can generate diverse 3D-aware images, they struggle to produce images containing user-specified characteristics. In this paper, we propose a novel model, referred to as the conditional generative neural radiance fields (CG-NeRF), which generates multi-view images reflecting extra input conditions such as images or text. While preserving the common characteristics of a given input condition, the proposed model generates diverse images in fine detail. We propose: 1) a novel unified architecture that disentangles shape and appearance from a condition given in various forms and 2) a pose-consistent diversity loss for generating multimodal outputs while maintaining view consistency. Experimental results show that the proposed method maintains consistent image quality across various condition types and achieves superior fidelity and diversity compared to existing NeRF-based generative models.
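The abstract does not spell out the pose-consistent diversity loss. As a minimal sketch under stated assumptions (the exact formulation in CG-NeRF may differ), a mode-seeking style diversity term evaluated on two renderings at the *same* camera pose could look like the following; `diversity_loss` and its arguments are hypothetical names for illustration:

```python
import numpy as np

def diversity_loss(img_a, img_b, z_a, z_b, eps=1e-8):
    """Hypothetical mode-seeking style diversity term.

    Both images are assumed to be rendered at the SAME camera pose but
    from different noise codes z_a, z_b, so maximizing the ratio of
    image-space distance to latent-space distance encourages diverse
    outputs without letting viewpoint changes masquerade as diversity.
    Returns the negated ratio so that minimizing it maximizes diversity.
    """
    num = np.mean(np.abs(img_a - img_b))    # distance between renderings
    den = np.mean(np.abs(z_a - z_b)) + eps  # distance between noise codes
    return -num / den
```

Under this sketch, the loss would be added to the usual adversarial and condition-consistency objectives; the shared pose is what makes the diversity "pose-consistent".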