Building on advances in neural radiance fields (NeRF), 3D human rendering, including novel view synthesis and pose animation, has progressed rapidly. However, most existing methods are trained per person, and that training typically requires multi-view videos. This paper addresses a new and challenging task: rendering novel views and novel poses of a person unseen in training, using only multi-view images as input. For this task, we propose a simple yet effective method that trains a generalizable NeRF conditioned on multi-view images. The key ingredient is a dedicated representation that combines a canonical NeRF with a volume deformation scheme. Working in a canonical space lets our method learn properties shared across humans and generalize easily to different people. Volume deformation connects the canonical space with the input and target images and is used to query image features for radiance and density prediction. We derive the deformation from a parametric 3D human model fitted to the input images, which works well in practice when combined with our canonical NeRF. Experiments on both real and synthetic data, covering novel view synthesis and pose animation, demonstrate the efficacy of our method.
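To make the volume deformation idea concrete, the following is a minimal sketch of one common way to map query points from a posed space into a canonical space using a fitted parametric body model. The abstract does not state which body model or deformation formula is used, so everything here is an assumption for illustration: the sketch assumes an SMPL-like model with per-vertex skinning weights, borrows each point's weights from its nearest posed vertex, and inverts the blended linear blend skinning transform. The function name `warp_to_canonical` and all parameter names are hypothetical.

```python
import numpy as np


def warp_to_canonical(points, posed_verts, skin_weights, bone_transforms):
    """Map query points from a posed space to a canonical space (inverse LBS).

    Hypothetical helper, not the paper's implementation.

    points:          (P, 3) query points, e.g. samples along camera rays
    posed_verts:     (V, 3) vertices of the fitted body model in the posed space
    skin_weights:    (V, J) per-vertex skinning weights (each row sums to 1)
    bone_transforms: (J, 4, 4) canonical-to-posed transform of each joint
    """
    # Assumption: each point borrows the skinning weights of its nearest vertex.
    sq_dist = ((points[:, None, :] - posed_verts[None, :, :]) ** 2).sum(-1)  # (P, V)
    nearest = sq_dist.argmin(axis=1)                                         # (P,)
    weights = skin_weights[nearest]                                          # (P, J)

    # Blend the per-joint transforms per point, then invert the blended matrix.
    blended = np.einsum("pj,jab->pab", weights, bone_transforms)             # (P, 4, 4)
    inverse = np.linalg.inv(blended)                                         # (P, 4, 4)

    # Apply the inverse transform in homogeneous coordinates.
    homo = np.concatenate([points, np.ones((len(points), 1))], axis=1)       # (P, 4)
    canonical = np.einsum("pab,pb->pa", inverse, homo)
    return canonical[:, :3]
```

In a pipeline of this kind, the canonical coordinates would be passed to the canonical NeRF, while the same points, expressed in the input pose, would be projected into the input images to fetch features. Nearest-vertex weight lookup is chosen here only because it is simple; it is a stand-in for whatever scheme the method actually uses.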