In this work, we advance neural head avatar technology to megapixel resolution while focusing on the particularly challenging task of cross-driving synthesis, i.e., when the appearance of the driving image is substantially different from the animated source image. We propose a set of new neural architectures and training methods that can leverage both medium-resolution video data and high-resolution image data to achieve the desired levels of rendered image quality and generalization to novel views and motion. We demonstrate that the proposed architectures and methods produce convincing high-resolution neural avatars, outperforming competing methods in the cross-driving scenario. Lastly, we show how a trained high-resolution neural avatar model can be distilled into a lightweight student model that runs in real time and locks the identities of neural avatars to several dozen pre-defined source images. Real-time operation and identity lock are essential for many practical applications of head avatar systems.