The ability to create realistic, animatable and relightable head avatars from casual video sequences would open up wide-ranging applications in communication and entertainment. Current methods either build on explicit 3D morphable meshes (3DMM) or exploit neural implicit representations. The former are limited by fixed topology, while the latter are non-trivial to deform and inefficient to render. Furthermore, existing approaches entangle lighting in the color estimation, which limits their ability to re-render the avatar in new environments. In contrast, we propose PointAvatar, a deformable point-based representation that disentangles the source color into intrinsic albedo and normal-dependent shading. We demonstrate that PointAvatar bridges the gap between existing mesh-based and implicit representations, combining high-quality geometry and appearance with topological flexibility, ease of deformation and rendering efficiency. We show that our method is able to generate animatable 3D avatars using monocular videos from multiple sources, including hand-held smartphones, laptop webcams and internet videos, achieving state-of-the-art quality in challenging cases where previous methods fail, e.g., thin hair strands, while being significantly more efficient in training than competing methods.