We propose Auto-Encoding Neural Radiance Fields (AE-NeRF), a novel framework for 3D-aware object manipulation. Formulated as an auto-encoder, our model extracts disentangled 3D attributes such as 3D shape, appearance, and camera pose from an image, and renders a high-quality image from these attributes through a disentangled generative Neural Radiance Field (NeRF). To improve disentanglement, we present two losses: a global-local attribute consistency loss defined between input and output, and a swapped-attribute classification loss. Since training such an auto-encoding network from scratch without ground-truth shape and appearance information is non-trivial, we present a stage-wise training scheme that substantially improves performance. We conduct experiments demonstrating the effectiveness of the proposed model over the latest methods and provide extensive ablation studies.
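To make the idea of manipulating disentangled attributes concrete, the following is a minimal sketch of attribute swapping between two encoded images. It is not the AE-NeRF network: `toy_encode` and `toy_decode` are hypothetical stand-ins (a plain split and concatenation of a flat vector) for the learned encoder and the NeRF-based renderer, and the code sizes in `dims` are arbitrary assumptions chosen for illustration.

```python
from dataclasses import dataclass, replace

import numpy as np


@dataclass(frozen=True, eq=False)
class Attributes:
    """Disentangled attributes of one image (toy representation)."""
    shape: np.ndarray       # latent shape code
    appearance: np.ndarray  # latent appearance code
    pose: np.ndarray        # camera pose parameters


def toy_encode(image, dims=(4, 4, 3)):
    """Hypothetical encoder: split a flat vector into three attribute codes."""
    flat = np.asarray(image, dtype=np.float64).ravel()
    n_shape, n_app, n_pose = dims
    if flat.size != n_shape + n_app + n_pose:
        raise ValueError("input size does not match the assumed code sizes")
    return Attributes(
        shape=flat[:n_shape],
        appearance=flat[n_shape:n_shape + n_app],
        pose=flat[n_shape + n_app:],
    )


def toy_decode(attrs):
    """Hypothetical decoder: concatenate the codes back into a flat vector."""
    return np.concatenate([attrs.shape, attrs.appearance, attrs.pose])


def swap_appearance(a, b):
    """Exchange appearance codes while keeping each input's shape and pose."""
    return replace(a, appearance=b.appearance), replace(b, appearance=a.appearance)


# Two stand-in "images" (random vectors, assumed only for the demonstration).
rng = np.random.default_rng(0)
img_a, img_b = rng.normal(size=11), rng.normal(size=11)

attrs_a, attrs_b = toy_encode(img_a), toy_encode(img_b)
swapped_a, swapped_b = swap_appearance(attrs_a, attrs_b)

# Output keeps A's shape and pose but carries B's appearance.
out_a = toy_decode(swapped_a)
```

In a real model the encoder and renderer are learned, so a swap like this only yields a clean edit when the attribute codes are actually disentangled, which is what the two proposed losses are meant to encourage.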