We address the task of unconditional head motion generation, to animate still human faces in a low-dimensional semantic space. Deviating from audio-conditioned talking-head generation, which seldom emphasizes realistic head motions, we devise a GAN-based architecture that yields rich head motion sequences while avoiding known pitfalls associated with GANs. Namely, the autoregressive generation of incremental outputs ensures smooth trajectories, while a multi-scale discriminator operating on input pairs drives the generation toward a better handling of high- and low-frequency signals and less mode collapse. We experimentally demonstrate the relevance of the proposed architecture and compare it with models that have shown state-of-the-art performance on similar tasks.
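To illustrate the two ideas highlighted in the abstract (autoregressive generation of pose increments, and a discriminator that scores pairs of poses at multiple temporal scales), the following is a minimal sketch and not the authors' implementation; the module names, the 6-dimensional pose space, and all layer sizes are illustrative assumptions.

```python
# Minimal sketch (illustrative, not the paper's architecture): a generator that
# autoregressively emits small pose increments, accumulated into a smooth head
# trajectory, and a discriminator that scores pairs of poses at several strides.
import torch
import torch.nn as nn

POSE_DIM, NOISE_DIM, HIDDEN_DIM = 6, 16, 128  # assumed low-dimensional pose space


class IncrementalGenerator(nn.Module):
    """Predicts a pose delta at each step from noise and the previous pose."""

    def __init__(self):
        super().__init__()
        self.rnn = nn.GRUCell(POSE_DIM + NOISE_DIM, HIDDEN_DIM)
        self.head = nn.Linear(HIDDEN_DIM, POSE_DIM)

    def forward(self, init_pose, noise):               # noise: (B, T, NOISE_DIM)
        B, T, _ = noise.shape
        h = noise.new_zeros(B, HIDDEN_DIM)
        pose, poses = init_pose, []
        for t in range(T):
            h = self.rnn(torch.cat([pose, noise[:, t]], dim=-1), h)
            pose = pose + 0.1 * torch.tanh(self.head(h))  # small increments -> smooth path
            poses.append(pose)
        return torch.stack(poses, dim=1)                # (B, T, POSE_DIM)


class PairDiscriminator(nn.Module):
    """Scores pairs of poses sampled at a given temporal stride (one scale)."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * POSE_DIM, HIDDEN_DIM), nn.LeakyReLU(0.2),
            nn.Linear(HIDDEN_DIM, 1),
        )

    def forward(self, seq, stride=1):                   # seq: (B, T, POSE_DIM)
        pairs = torch.cat([seq[:, :-stride], seq[:, stride:]], dim=-1)
        return self.net(pairs).mean(dim=1)              # one realism score per sequence


if __name__ == "__main__":
    G = IncrementalGenerator()
    fake = G(torch.zeros(4, POSE_DIM), torch.randn(4, 50, NOISE_DIM))
    scores = [PairDiscriminator()(fake, stride=s) for s in (1, 4, 16)]  # multi-scale pairs
    print(fake.shape, [s.shape for s in scores])
```

The intent of the sketch is only to make the abstract's claims concrete: predicting bounded increments rather than absolute poses biases the output toward smooth trajectories, while pair-based critics at short and long strides expose both high-frequency jitter and low-frequency drift to the adversarial loss.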