As the quality of few-shot facial animation from landmarks increases, new applications become possible, such as ultra-low-bandwidth video chat compression with a high degree of realism. However, important challenges remain before the experience is usable in real-world conditions. In particular, current approaches fail to render profile views without distortion while running in a low-compute regime. We focus on this key problem by introducing a multi-frame embedding, dubbed Frontalizer, to improve the rendering of profile views. In addition to this core improvement, we explore learning a latent code that conditions generation alongside landmarks to better convey facial expressions. Our dense models achieve a 22% improvement in perceptual quality and a 73% reduction in landmark error over the first-order-model baseline on a subset of DFDC videos containing head movements. Adapted to mobile architectures, our models outperform the previous state of the art (improving perceptual quality by more than 16% and reducing landmark error by more than 47% on two datasets) while running in real time on an iPhone 8 with very low bandwidth requirements.