Face animation has made significant progress in computer vision. However, prevailing GAN-based methods suffer from unnatural distortions and artifacts caused by complex motion deformations. In this paper, we propose a Face Animation framework with an attribute-guided Diffusion Model (FADM), which is the first work to exploit the superior modeling capacity of diffusion models for photo-realistic talking-head generation. To mitigate the uncontrollable synthesis behavior of the diffusion model, we design an Attribute-Guided Conditioning Network (AGCN) that adaptively combines coarse animation features with 3D face reconstruction results, incorporating appearance and motion conditions into the diffusion process. These designs help FADM rectify unnatural artifacts and distortions and enrich high-fidelity facial details through iterative diffusion refinement guided by accurate animation attributes. FADM can also flexibly and effectively improve existing animation videos. Extensive experiments on widely used talking-head benchmarks validate the effectiveness of FADM over prior methods.
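To make the conditioning idea concrete, below is a minimal, hypothetical sketch of attribute-guided conditioning in PyTorch. The module name (AttributeGuidedConditioning), the inputs (coarse_feat, attr_3dmm), and all dimensions are illustrative assumptions, not the authors' implementation; it only shows how coarse animation features and 3D reconstruction attributes could be adaptively fused into a signal that conditions a denoising network.

```python
# Illustrative sketch (not the official FADM code): adaptive fusion of coarse
# animation features with 3D face attributes into a diffusion conditioning vector.
import torch
import torch.nn as nn

class AttributeGuidedConditioning(nn.Module):
    """Fuses coarse animation features with 3D face attributes (assumed inputs)."""
    def __init__(self, feat_dim=256, attr_dim=64, cond_dim=256):
        super().__init__()
        self.attr_proj = nn.Linear(attr_dim, feat_dim)      # lift 3DMM-style coefficients
        self.gate = nn.Sequential(                           # adaptive per-channel gating
            nn.Linear(feat_dim * 2, feat_dim), nn.Sigmoid()
        )
        self.out = nn.Linear(feat_dim, cond_dim)

    def forward(self, coarse_feat, attr_3dmm):
        # coarse_feat: (B, feat_dim) pooled features from a coarse animation network
        # attr_3dmm:   (B, attr_dim) expression/pose coefficients from 3D reconstruction
        attr = self.attr_proj(attr_3dmm)
        g = self.gate(torch.cat([coarse_feat, attr], dim=-1))
        fused = g * coarse_feat + (1.0 - g) * attr            # adaptive combination
        return self.out(fused)                                # condition for the denoiser

# Usage: the resulting vector could modulate a denoising UNet at each diffusion step,
# e.g. by adding it to the timestep embedding.
cond = AttributeGuidedConditioning()(torch.randn(2, 256), torch.randn(2, 64))
```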