Talking head generation aims to synthesize faces that preserve the identity of a source image while imitating the motion of a driving image. Most pioneering methods rely primarily on 2D representations and thus inevitably suffer from face distortion under large head rotations. Recent works instead employ explicit 3D structural representations or implicit neural rendering to improve performance under large pose changes. Nevertheless, the fidelity of identity and expression remains unsatisfactory, especially for novel-view synthesis. In this paper, we propose HiDe-NeRF, which achieves high-fidelity and free-view talking-head synthesis. Drawing on the recently proposed Deformable Neural Radiance Fields, HiDe-NeRF represents the 3D dynamic scene as a canonical appearance field and an implicit deformation field, where the former comprises the canonical source face and the latter models the driving pose and expression. In particular, we improve fidelity in two respects: (i) to enhance identity expressiveness, we design a generalized appearance module that leverages multi-scale volume features to preserve face shape and details; (ii) to improve expression preciseness, we propose a lightweight deformation module that explicitly decouples pose and expression to enable precise expression modeling. Extensive experiments demonstrate that our approach generates better results than previous works. Project page: https://www.waytron.net/hidenerf/
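To make the canonical-field / deformation-field decomposition concrete, the following is a minimal PyTorch sketch of a generic deformable-NeRF rendering step, not the authors' implementation: all class names, layer sizes, and conditioning dimensions (`motion_dim`, `id_dim`) are hypothetical placeholders. The deformation network warps observation-space samples into canonical space conditioned on a driving motion code (pose and expression), and the canonical appearance network, conditioned on source-identity features, supplies color and density for standard volume rendering.

```python
import torch
import torch.nn as nn


class DeformationField(nn.Module):
    """Hypothetical sketch: maps observation-space points, conditioned on a
    driving motion code (pose + expression), to canonical-space coordinates."""
    def __init__(self, motion_dim=64, hidden=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3 + motion_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3),          # predicted offset Δx
        )

    def forward(self, x, motion_code):
        # x: (S, 3) sampled points along a ray; motion_code: (S, motion_dim)
        delta = self.mlp(torch.cat([x, motion_code], dim=-1))
        return x + delta                   # canonical-space coordinates


class CanonicalAppearanceField(nn.Module):
    """Hypothetical sketch: canonical radiance field queried at warped points,
    conditioned on source-identity features (e.g. multi-scale volume features)."""
    def __init__(self, id_dim=64, hidden=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3 + id_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),          # RGB (3) + density (1)
        )

    def forward(self, x_canonical, id_feat):
        out = self.mlp(torch.cat([x_canonical, id_feat], dim=-1))
        rgb = torch.sigmoid(out[..., :3])
        sigma = torch.relu(out[..., 3:])
        return rgb, sigma


def render_ray(points, deltas, motion_code, id_feat, deform, appearance):
    """Standard volume rendering over one ray after warping samples to canonical space.
    points: (S, 3) samples; deltas: (S,) distances between adjacent samples."""
    x_canon = deform(points, motion_code)                  # warp to canonical space
    rgb, sigma = appearance(x_canon, id_feat)              # query canonical field
    alpha = 1.0 - torch.exp(-sigma.squeeze(-1) * deltas)   # per-sample opacity
    trans = torch.cumprod(
        torch.cat([torch.ones(1), 1.0 - alpha + 1e-10])[:-1], dim=0)
    weights = alpha * trans
    return (weights.unsqueeze(-1) * rgb).sum(dim=0)        # composited pixel color
```

In this generic formulation, identity is carried only by the conditioning of the canonical field and motion only by the conditioning of the deformation field, which is the separation the abstract's appearance and deformation modules are designed to enforce.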