Facial expression synthesis has achieved remarkable advances with the advent of Generative Adversarial Networks (GANs). However, GAN-based approaches mostly generate photo-realistic results only when the testing data distribution is close to the training data distribution; the quality of GAN results degrades significantly when testing images come from even a slightly different distribution. Moreover, recent work has shown that facial expressions can be synthesized by changing only localized face regions. In this work, we propose a pixel-based facial expression synthesis method in which each output pixel observes only one input pixel. The proposed method achieves good generalization capability by leveraging only a few hundred training images. Experimental results demonstrate that the proposed method performs comparably to state-of-the-art GANs on in-dataset images and significantly better on out-of-dataset images. In addition, the proposed model is two orders of magnitude smaller, which makes it suitable for deployment on resource-constrained devices.
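To make the core idea concrete, the sketch below illustrates one simple way a pixel-to-pixel mapping of this kind could be realized: an independent 1D linear model per pixel location, fitted in closed form over flattened input/output face pairs. This is a minimal illustration under assumed modeling choices (per-pixel least squares with an affine model); the regression form and regularization used in the actual method may differ, and all names and toy data here are hypothetical.

```python
# Minimal sketch of a pixel-based mapping where each output pixel
# observes only one input pixel. Assumes a per-pixel affine model
# y_i = a_i * x_i + b_i fitted by least squares; this is an
# illustration, not the paper's exact formulation.
import numpy as np

def fit_pixel_model(X, Y):
    """Fit an independent linear model for every pixel location.

    X, Y: (n_images, n_pixels) arrays of flattened input/target
    face pairs. Returns per-pixel slopes a and intercepts b.
    """
    x_mean = X.mean(axis=0)
    y_mean = Y.mean(axis=0)
    x_c = X - x_mean
    y_c = Y - y_mean
    # Closed-form least squares per pixel: a = cov(x, y) / var(x).
    var = (x_c ** 2).sum(axis=0) + 1e-8   # guard against divide-by-zero
    a = (x_c * y_c).sum(axis=0) / var
    b = y_mean - a * x_mean
    return a, b

def synthesize(x, a, b):
    """Map a new flattened face to the target expression pixel-wise."""
    return np.clip(a * x + b, 0.0, 1.0)

# Toy usage: a few hundred 64x64 training pairs, matching the scale
# of training data mentioned in the abstract (synthetic data here).
rng = np.random.default_rng(0)
X = rng.random((300, 64 * 64))            # input faces (toy data)
Y = np.clip(0.8 * X + 0.1, 0.0, 1.0)      # target expressions (toy data)
a, b = fit_pixel_model(X, Y)
out = synthesize(rng.random(64 * 64), a, b)
```

Note that such a model stores only two parameters per pixel, which is consistent with the abstract's claim of a model two orders of magnitude smaller than typical GAN generators.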