High-quality reconstruction of controllable 3D head avatars from 2D videos is highly desirable for virtual human applications in movies, games, and telepresence. Neural implicit fields provide a powerful representation for modeling 3D head avatars with personalized shape, expressions, and facial parts, e.g., hair and mouth interior, that go beyond the linear 3D morphable model (3DMM). However, existing methods do not model fine-scale facial features, nor do they offer local control of facial parts that extrapolates to asymmetric expressions from monocular videos. Further, most methods condition only on 3DMM parameters, which have poor locality, and resolve local features with a single global neural field. We build on part-based implicit shape models that decompose a global deformation field into local ones. Our novel formulation models multiple implicit deformation fields with local, semantic, rig-like control via 3DMM-based parameters and representative facial landmarks. Further, we propose a local control loss and an attention-mask mechanism that promote sparsity in each learned deformation field. Our formulation renders sharper, locally controllable nonlinear deformations than previous implicit monocular approaches, especially for the mouth interior, asymmetric expressions, and facial details.
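To make the part-based idea concrete, the sketch below (plain NumPy) shows how several local deformation fields, each gated by a soft attention mask centered on a representative facial landmark, can be blended into a single global deformation. All names, the Gaussian mask, and the toy linear "local fields" are illustrative assumptions for exposition, not the paper's implementation.

```python
# Hypothetical sketch: blending K landmark-anchored local deformation fields
# into one global deformation via soft attention masks.
import numpy as np

K = 4                                    # number of facial parts (assumption)
landmarks = np.random.randn(K, 3) * 0.1  # representative landmark per part
codes = np.random.randn(K, 8)            # per-part expression codes (3DMM-like, assumed)
W = np.random.randn(K, 8, 3) * 0.01      # toy linear stand-ins for local deformation MLPs
sigma = 0.05                              # mask bandwidth (assumption)

def attention_masks(x):
    """Soft mask per part: Gaussian falloff around each landmark, normalized."""
    d2 = ((x[None, :] - landmarks) ** 2).sum(-1)   # squared distance to each landmark, (K,)
    m = np.exp(-d2 / (2 * sigma ** 2))
    return m / (m.sum() + 1e-8)

def global_deformation(x):
    """Blend the K local deformations with their attention-mask weights."""
    local = np.einsum('kc,kcd->kd', codes, W)      # (K, 3) local offsets at x
    m = attention_masks(x)                         # (K,) mask weights at x
    return (m[:, None] * local).sum(0)             # (3,) global offset

x = np.array([0.0, 0.05, 0.02])  # a query point on the canonical head
print(global_deformation(x))

# A local-control-style penalty could discourage each field from acting far from
# its landmark, e.g. sum_k (1 - m_k(x)) * ||local_k(x)||_1 averaged over sampled
# points -- a sketch of the sparsity idea, not the paper's exact loss.
```

The design choice illustrated here is that locality comes from the masks rather than from the fields themselves, which is what allows one part (e.g., a mouth corner) to deform independently of the rest of the face.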