Generating portrait images by controlling the motions of existing faces is an important task of great consequence to social media industries. For easy use and intuitive control, semantically meaningful and fully disentangled parameters should be used as modifications. However, many existing techniques either do not provide such fine-grained controls or rely on indirect editing methods, i.e., mimicking the motions of other individuals. In this paper, a Portrait Image Neural Renderer (PIRenderer) is proposed to control face motions with the parameters of three-dimensional morphable face models (3DMMs). The proposed model can generate photo-realistic portrait images with accurate movements according to intuitive modifications. Experiments on both direct and indirect editing tasks demonstrate the superiority of this model. Meanwhile, we further extend this model to tackle the audio-driven facial reenactment task by extracting sequential motions from audio inputs. We show that our model can generate coherent videos with convincing movements from only a single reference image and a driving audio stream. Our source code is available at https://github.com/RenYurui/PIRender.
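To make the control interface concrete, the sketch below illustrates the intended usage pattern: a single reference portrait plus a vector of semantically meaningful 3DMM motion coefficients produces an edited image. The coefficient split (64-dim expression, 3-dim rotation, 3-dim translation) and the `StubPortraitRenderer` stand-in network are illustrative assumptions, not the released PIRender implementation; consult the repository above for the actual interface.

```python
# Minimal usage sketch in PyTorch. The stub network below is a stand-in
# for the neural renderer; dimensions and names are assumptions made for
# illustration only.
import torch
import torch.nn as nn

EXP_DIM, ROT_DIM, TRANS_DIM = 64, 3, 3  # assumed 3DMM coefficient split
MOTION_DIM = EXP_DIM + ROT_DIM + TRANS_DIM


class StubPortraitRenderer(nn.Module):
    """Stand-in renderer: maps a reference image plus a 3DMM motion
    descriptor to an edited portrait at the same resolution."""

    def __init__(self, motion_dim: int = MOTION_DIM):
        super().__init__()
        self.motion_proj = nn.Linear(motion_dim, 3 * 256 * 256)
        self.blend = nn.Conv2d(6, 3, kernel_size=3, padding=1)

    def forward(self, reference: torch.Tensor, motion: torch.Tensor) -> torch.Tensor:
        # Broadcast the motion code into image space, then fuse it with
        # the reference identity/appearance before decoding an image.
        b = reference.shape[0]
        motion_map = self.motion_proj(motion).view(b, 3, 256, 256)
        return torch.tanh(self.blend(torch.cat([reference, motion_map], dim=1)))


if __name__ == "__main__":
    renderer = StubPortraitRenderer()
    reference = torch.rand(1, 3, 256, 256)  # a single reference portrait
    motion = torch.zeros(1, MOTION_DIM)     # neutral motion descriptor
    # Edit only the head rotation coefficients, leaving expression untouched:
    motion[0, EXP_DIM:EXP_DIM + ROT_DIM] = torch.tensor([0.1, 0.0, 0.0])
    edited = renderer(reference, motion)
    print(edited.shape)  # torch.Size([1, 3, 256, 256])
```

Because the expression, rotation, and translation coefficients occupy disjoint slices of the descriptor, each attribute can be modified independently, which is the fully disentangled control the abstract describes.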