We propose a novel neural rendering pipeline, Hybrid Volumetric-Textural Rendering (HVTR), which synthesizes virtual human avatars from arbitrary poses efficiently and at high quality. First, we learn to encode articulated human motions on a dense UV manifold of the human body surface. To handle complicated motions (e.g., self-occlusions), we then leverage the encoded information on the UV manifold to construct a 3D volumetric representation based on a dynamic pose-conditioned neural radiance field. While this allows us to represent 3D geometry with changing topology, volumetric rendering is computationally heavy. Hence we employ only a rough volumetric representation using a pose-conditioned downsampled neural radiance field (PD-NeRF), which we can render efficiently at low resolutions. In addition, we learn 2D textural features that are fused with rendered volumetric features in image space. The key advantage of our approach is that we can then convert the fused features into a high-resolution, high-quality avatar by a fast GAN-based textural renderer. We demonstrate that hybrid rendering enables HVTR to handle complicated motions, render high-quality avatars under user-controlled poses/shapes and even loose clothing, and most importantly, be efficient at inference time. Our experiments also show that HVTR achieves state-of-the-art quantitative results.
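The hybrid pipeline above can be sketched schematically: coarse, 3D-aware features come from low-resolution volumetric rendering (PD-NeRF), learned 2D textural features live at full image resolution, and the two are fused in image space before a fast 2D renderer produces the final avatar. The sketch below is a minimal illustration of that data flow only; all function names, feature dimensions, and the SMPL-style pose vector are assumptions for illustration, not the authors' implementation, and random features stand in for the learned networks.

```python
# Conceptual sketch of the HVTR hybrid rendering flow (assumption-laden,
# not the authors' code): coarse volumetric features + 2D textural
# features, fused in image space, then mapped to an image.
import numpy as np

def render_pd_nerf(pose, low_res=32, feat_dim=16):
    """Stand-in for the pose-conditioned downsampled NeRF (PD-NeRF):
    volumetric rendering at a coarse resolution yields a low-res
    feature image rather than a full-resolution RGB image."""
    rng = np.random.default_rng(int(pose.sum() * 1000) % (2**32))
    return rng.standard_normal((low_res, low_res, feat_dim))

def texture_features(pose, high_res=128, feat_dim=16):
    """Stand-in for the learned 2D textural features encoded on the
    UV manifold and projected into image space."""
    rng = np.random.default_rng((int(pose.sum() * 1000) + 1) % (2**32))
    return rng.standard_normal((high_res, high_res, feat_dim))

def upsample_nearest(feat, factor):
    """Nearest-neighbour upsampling of an HxWxC feature image."""
    return np.repeat(np.repeat(feat, factor, axis=0), factor, axis=1)

def hybrid_render(pose, high_res=128):
    vol = render_pd_nerf(pose)                      # coarse 3D-aware features
    factor = high_res // vol.shape[0]
    vol_up = upsample_nearest(vol, factor)          # lift to image resolution
    tex = texture_features(pose, high_res)          # 2D textural features
    fused = np.concatenate([vol_up, tex], axis=-1)  # fuse in image space
    # The GAN-based textural renderer would map `fused` to an RGB avatar;
    # a channel average serves as a placeholder here.
    return fused.mean(axis=-1)

pose = np.ones(72)  # e.g., an SMPL-style pose vector (assumption)
img = hybrid_render(pose)
print(img.shape)  # (128, 128)
```

The efficiency argument is visible in the shapes: the expensive volumetric step runs at 32x32 while only the cheap 2D fusion and rendering operate at 128x128.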