In this work, we advance neural head avatar technology to megapixel resolution while focusing on the particularly challenging task of cross-driving synthesis, i.e., the case where the appearance of the driving image differs substantially from that of the animated source image. We propose a set of new neural architectures and training methods that leverage both medium-resolution video data and high-resolution image data to achieve the desired levels of rendered image quality and generalization to novel views and motion. We demonstrate that the proposed architectures and methods produce convincing high-resolution neural avatars, outperforming competing methods in the cross-driving scenario. Lastly, we show how a trained high-resolution neural avatar model can be distilled into a lightweight student model that runs in real time and locks the identities of the neural avatars to several dozen pre-defined source images. Real-time operation and identity lock are essential for many practical applications of head avatar systems.