Rendering moving human bodies from free viewpoints using only a monocular video is a challenging problem: the information is too sparse to model complicated human body structures and motions across both the view and pose dimensions. Neural radiance fields (NeRF) have shown great power in novel view synthesis and have been applied to human body rendering. However, most current NeRF-based methods incur huge costs for both training and rendering, which impedes their wide application in real-life scenarios. In this paper, we propose a rendering framework that can learn moving human body structures extremely quickly from a monocular video. The framework integrates neural fields with neural voxels. Specifically, a set of generalizable neural voxels is constructed. Pretrained on various human bodies, these general voxels represent a basic skeleton and provide strong geometric priors. During fine-tuning, individual voxels are constructed to learn differential textures, complementing the general voxels. Learning a novel body can thus be further accelerated, taking only a few minutes. Our method shows significantly higher training efficiency than previous methods while maintaining similar rendering quality. Project page: https://taoranyi.com/gneuvox