Multi-view volumetric rendering techniques have recently shown great potential in modeling and synthesizing high-quality head avatars. A common approach to capturing full head dynamic performance is to track the underlying geometry using a mesh-based template or 3D cube-based graphics primitives. While these model-based approaches achieve promising results, they often fail to learn complex geometric details such as the mouth interior, hair, and topological changes over time. This paper presents a novel approach to building highly photorealistic digital head avatars. Our method learns a canonical space via an implicit function parameterized by a neural network. It leverages multiresolution hash encoding in the learned feature space, enabling high-quality rendering, faster training, and high-resolution outputs. At test time, our method is driven by a monocular RGB video. Here, an image encoder extracts face-specific features that also condition the learnable canonical space, encouraging deformation-dependent texture variations during training. We also propose a novel optical-flow-based loss that ensures correspondences in the learned canonical space, thus encouraging artifact-free and temporally consistent renderings. We demonstrate results on challenging facial expressions and present free-viewpoint renderings at interactive real-time rates for medium image resolutions. Our method outperforms all existing approaches, both visually and numerically. We will release our multiple-identity dataset to encourage further research. Our project page is available at: https://vcai.mpi-inf.mpg.de/projects/HQ3DAvatar/
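The multiresolution hash encoding mentioned above can be illustrated with a minimal NumPy sketch. This is not the paper's implementation; it is a hedged approximation of the Instant-NGP-style encoding it builds on: a 3D point is mapped to grid cells at several resolutions, each cell corner is hashed into a small learnable feature table, and the corner features are trilinearly interpolated and concatenated across levels. All names (`hash_coords`, `encode`, `base_res`, `growth`) are illustrative assumptions.

```python
import numpy as np

# Spatial hash: XOR of coordinates multiplied by large primes (Instant-NGP style).
PRIMES = np.array([1, 2654435761, 805459861], dtype=np.uint64)

def hash_coords(coords, table_size):
    """Map (N, 3) integer grid coordinates to (N,) indices into a feature table."""
    c = coords.astype(np.uint64)
    h = c[:, 0] * PRIMES[0]
    h ^= c[:, 1] * PRIMES[1]
    h ^= c[:, 2] * PRIMES[2]
    return h % np.uint64(table_size)

def encode(x, tables, base_res=16, growth=1.5):
    """Multiresolution hash encoding of points x in [0, 1]^3.

    tables: list of (table_size, F) learnable feature arrays, one per level.
    Returns an (N, L*F) feature vector that would be fed to the MLP.
    """
    feats = []
    for level, table in enumerate(tables):
        res = int(base_res * growth ** level)     # grid resolution at this level
        xs = x * res
        x0 = np.floor(xs).astype(np.int64)        # lower corner of enclosing cell
        w = xs - x0                               # trilinear interpolation weights
        acc = np.zeros((x.shape[0], table.shape[1]))
        for dz in (0, 1):                         # loop over the 8 cell corners
            for dy in (0, 1):
                for dx in (0, 1):
                    corner = x0 + np.array([dx, dy, dz])
                    idx = hash_coords(corner, table.shape[0])
                    wgt = np.where([dx, dy, dz], w, 1 - w).prod(axis=1, keepdims=True)
                    acc += wgt * table[idx]
        feats.append(acc)
    return np.concatenate(feats, axis=1)

# Usage: 4 levels, 2 features per level -> an 8-dimensional encoding per point.
tables = [np.random.randn(2 ** 14, 2) * 1e-4 for _ in range(4)]
points = np.random.rand(5, 3)
features = encode(points, tables)
print(features.shape)  # (5, 8)
```

Because the tables are small and indexed by hashing rather than dense storage, memory stays bounded as resolution grows, which is what makes the fast training and high-resolution rendering claimed in the abstract feasible.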