Constructing drivable and photorealistic 3D head avatars has become a central task in AR/XR, enabling immersive and expressive user experiences. With the emergence of high-fidelity and efficient representations such as 3D Gaussians, recent works have pushed toward ultra-detailed head avatars. Existing approaches typically fall into two categories: rule-based analytic rigging and neural-network-based deformation fields. While effective in constrained settings, both approaches often fail to generalize to unseen expressions and poses, particularly in extreme reenactment scenarios. Other methods constrain Gaussians to the global texel space of 3DMMs to reduce rendering complexity. However, these texel-based avatars tend to underutilize the underlying mesh structure: they apply minimal analytic deformation and rely heavily on neural regressors and heuristic regularization in UV space, which weakens geometric consistency and limits extrapolation to complex, out-of-distribution deformations. To address these limitations, we introduce TexAvatars, a hybrid avatar representation that combines the explicit geometric grounding of analytic rigging with the spatial continuity of texel space. Our approach predicts local geometric attributes in UV space via CNNs but drives 3D deformation through mesh-aware Jacobians, enabling smooth and semantically meaningful transitions across triangle boundaries. This hybrid design separates semantic modeling from geometric control, resulting in improved generalization, interpretability, and stability. Furthermore, TexAvatars captures fine-grained expression effects, including muscle-induced wrinkles, glabellar lines, and realistic mouth cavity geometry, with high fidelity. Our method achieves state-of-the-art performance under extreme pose and expression variations, demonstrating strong generalization in challenging head reenactment settings.
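The hybrid design described above can be pictured with a minimal PyTorch sketch. It is not the paper's implementation: the class and function names (`TexelAttributeCNN`, `triangle_jacobians`, `deform_gaussians`), the conditioning dimension, channel counts, and UV resolution are all hypothetical placeholders assumed only for illustration. The sketch shows the split the abstract relies on: a CNN regresses local per-texel attributes in UV space, while the deformation that actually moves the Gaussians comes from per-triangle Jacobians computed analytically from the rest and posed meshes.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TexelAttributeCNN(nn.Module):
    """Hypothetical decoder that maps expression parameters to UV-space maps of
    local Gaussian attributes (e.g. offsets, log-scales, rotations)."""

    def __init__(self, cond_dim=100, out_channels=10, uv_res=256):
        super().__init__()
        self.uv_res = uv_res
        self.fc = nn.Linear(cond_dim, 64 * (uv_res // 16) ** 2)
        # Four stride-2 transposed convolutions upsample by 16x to full UV resolution.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 64, 4, 2, 1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, 2, 1), nn.ReLU(),
            nn.ConvTranspose2d(32, 16, 4, 2, 1), nn.ReLU(),
            nn.ConvTranspose2d(16, out_channels, 4, 2, 1),
        )

    def forward(self, cond):
        x = self.fc(cond).view(-1, 64, self.uv_res // 16, self.uv_res // 16)
        return self.decoder(x)  # (B, C, uv_res, uv_res) attribute maps


def triangle_jacobians(verts_rest, verts_posed, faces):
    """Per-triangle 3x3 Jacobians that map rest-pose edge vectors (plus the
    triangle normal) to their posed counterparts; this is the mesh-aware,
    analytic part of the deformation."""

    def frame(verts):
        v0, v1, v2 = verts[faces[:, 0]], verts[faces[:, 1]], verts[faces[:, 2]]
        e1, e2 = v1 - v0, v2 - v0
        n = F.normalize(torch.cross(e1, e2, dim=-1), dim=-1)
        return torch.stack([e1, e2, n], dim=-1)  # (F, 3, 3)

    rest, posed = frame(verts_rest), frame(verts_posed)
    return posed @ torch.inverse(rest)  # (F, 3, 3)


def deform_gaussians(mu_local, tri_of_gaussian, jacobians, tri_origin_posed):
    """Carries Gaussian means, expressed in local triangle coordinates, into
    posed space through the triangle Jacobians of their parent triangles."""
    J = jacobians[tri_of_gaussian]                      # (N, 3, 3)
    mu_posed = (J @ mu_local.unsqueeze(-1)).squeeze(-1)  # (N, 3)
    return mu_posed + tri_origin_posed[tri_of_gaussian]
```

Under these assumptions, the CNN never predicts global motion: it only supplies local, UV-resident corrections, and all rigid and non-rigid transport across the surface is carried by the mesh Jacobians, which is the separation of semantic modeling from geometric control that the abstract credits for improved generalization and stability.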