Traditional 3D morphable face models (3DMMs) provide fine-grained control over expression but cannot easily capture geometric and appearance details. Neural volumetric representations approach photorealism but are hard to animate and do not generalize well to unseen expressions. To tackle this problem, we propose IMavatar (Implicit Morphable avatar), a novel method for learning implicit head avatars from monocular videos. Inspired by the fine-grained control mechanisms afforded by conventional 3DMMs, we represent the expression- and pose-related deformations via learned blendshapes and skinning fields. These attributes are pose-independent and can be used to morph the canonical geometry and texture fields given novel expression and pose parameters. We employ ray marching and iterative root-finding to locate the canonical surface intersection for each pixel. A key contribution is our novel analytical gradient formulation that enables end-to-end training of IMavatars from videos. We show quantitatively and qualitatively that our method improves geometry and covers a more complete expression space compared to state-of-the-art methods.
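The deformation model described above follows the 3DMM recipe: a canonical point is first offset by expression blendshapes and pose correctives, then transformed by linear blend skinning with learned, continuous skinning weights. The sketch below illustrates this forward map; it is not the authors' implementation, and all module names, tensor shapes, and the FLAME-style factorization of pose into per-bone rigid transforms are assumptions made for illustration.

```python
# Minimal sketch (not the authors' code) of how canonical points could be
# deformed by learned expression blendshapes, pose correctives, and a
# skinning-weight field, in the spirit of FLAME-style linear blend skinning.
# Module names, tensor shapes, and the joint count are illustrative.
import torch

def deform_canonical_points(x_c, psi, theta, bone_transforms,
                            blendshape_net, skinning_net):
    """Map canonical points to deformed (observation) space.

    x_c:             (N, 3) canonical points
    psi:             (n_expr,) expression parameters
    theta:           (n_pose,) pose parameters
    bone_transforms: (n_joints, 4, 4) rigid transforms derived from theta
    blendshape_net:  x_c -> (E, P) expression/pose bases at x_c
    skinning_net:    x_c -> (N, n_joints) skinning weights (rows sum to 1)
    """
    E, P = blendshape_net(x_c)          # E: (N, n_expr, 3), P: (N, n_pose, 3)
    # Additive blendshape offsets, as in 3DMMs.
    offset = torch.einsum("nek,e->nk", E, psi) \
           + torch.einsum("npk,p->nk", P, theta)
    x_off = x_c + offset

    w = skinning_net(x_c)                                  # (N, n_joints)
    # Linear blend skinning: weighted sum of per-bone rigid transforms.
    T = torch.einsum("nj,jab->nab", w, bone_transforms)    # (N, 4, 4)
    x_h = torch.cat([x_off, torch.ones_like(x_off[:, :1])], dim=-1)
    x_d = torch.einsum("nab,nb->na", T, x_h)[:, :3]
    return x_d
```

Because the blendshape bases and skinning weights are predicted from the canonical point alone, they are independent of the driving expression and pose, which is what allows novel parameters to morph the canonical geometry and texture fields.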
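Rendering requires the inverse of this map: for each camera ray, one must locate the canonical point whose deformed image lies on the ray, which the abstract describes as ray marching plus iterative root-finding. Below is a minimal, hypothetical sketch of such a correspondence search for a single deformed point, using a plain Newton iteration with an autograd Jacobian; the paper's actual solver, initialization strategy, and tolerances may differ.

```python
# Illustrative sketch (assumed, not the paper's exact solver) of iterative
# root-finding that pulls a deformed point x_d back to canonical space:
# solve g(x_c) = deform_fn(x_c) - x_d = 0 by Newton iteration.
import torch

def find_canonical_correspondence(x_d, deform_fn, x_init,
                                  n_steps=20, tol=1e-5):
    """x_d: (3,) deformed point; deform_fn: canonical -> deformed map."""
    x_c = x_init.clone()
    for _ in range(n_steps):
        residual = deform_fn(x_c) - x_d                         # g(x_c)
        if residual.norm() < tol:
            break
        # 3x3 Jacobian of the deformation at the current estimate.
        J = torch.autograd.functional.jacobian(deform_fn, x_c)
        # Newton update x_c <- x_c - J^{-1} g(x_c); assumes J is invertible.
        x_c = x_c - torch.linalg.solve(J, residual)
    return x_c.detach()
```

Note that the analytical gradient formulation highlighted in the abstract means gradients are not backpropagated through these solver iterations; instead, the gradient of the recovered canonical point with respect to the network parameters is obtained analytically from the root condition, which is what makes end-to-end training from video tractable.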