We address the problem of learning person-specific facial priors from a small number (e.g., 20) of portrait photos of the same person. This enables us to edit this specific person's facial appearance, such as expression and lighting, while preserving their identity and high-frequency facial details. Key to our approach, which we dub DiffusionRig, is a diffusion model conditioned on, or "rigged by," crude 3D face models estimated from single in-the-wild images by an off-the-shelf estimator. At a high level, DiffusionRig learns to map simplistic renderings of 3D face models to realistic photos of a given person. Specifically, DiffusionRig is trained in two stages: It first learns generic facial priors from a large-scale face dataset and then person-specific priors from a small portrait photo collection of the person of interest. By learning the CGI-to-photo mapping with such personalized priors, DiffusionRig can "rig" the lighting, facial expression, head pose, etc. of a portrait photo, conditioned only on coarse 3D models, while preserving this person's identity and other high-frequency characteristics. Qualitative and quantitative experiments show that DiffusionRig outperforms existing approaches in both identity preservation and photorealism. Please see the project website, https://diffusionrig.github.io, for the supplemental material, video, code, and data.
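The two-stage training described above can be sketched in heavily simplified toy form. This is not the paper's code: the "denoiser" is a single linear map, the images and 3D renderings are random stand-in vectors, and all names and hyperparameters here are illustrative assumptions. The sketch only shows the structure: a denoiser conditioned on a coarse rendering, trained with the standard noise-prediction objective first on a large generic set, then finetuned on ~20 photos of one person.

```python
import numpy as np

rng = np.random.default_rng(0)

def add_noise(x0, t, T=100):
    """Forward diffusion: mix clean image x0 with Gaussian noise at step t
    (simple cosine schedule; a stand-in for the real DDPM schedule)."""
    alpha_bar = np.cos(0.5 * np.pi * t / T) ** 2
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps, eps

def denoiser(x_t, cond, w):
    """Toy linear 'network': predicts the noise from the noisy image
    concatenated with the coarse 3D rendering (the conditioning signal)."""
    return w @ np.concatenate([x_t, cond])

def train(pairs, w, lr=1e-3, steps=200, T=100):
    """One training stage: regress the added noise (DDPM-style objective)
    on (photo, rendering) pairs, via plain gradient descent."""
    for _ in range(steps):
        x0, cond = pairs[rng.integers(len(pairs))]
        t = rng.integers(1, T)
        x_t, eps = add_noise(x0, t, T)
        pred = denoiser(x_t, cond, w)
        grad = np.outer(pred - eps, np.concatenate([x_t, cond]))
        w = w - lr * grad
    return w

d = 8                       # toy "image" dimensionality
w = np.zeros((d, 2 * d))    # denoiser parameters

# Stage 1: generic facial prior from a large dataset (random toys here).
generic = [(rng.standard_normal(d), rng.standard_normal(d)) for _ in range(500)]
w = train(generic, w)

# Stage 2: person-specific prior from a small (~20 photo) collection.
personal = [(rng.standard_normal(d), rng.standard_normal(d)) for _ in range(20)]
w = train(personal, w, lr=1e-4)
```

At inference, the same idea runs in reverse: the rendering of an edited 3D face model (new expression, lighting, or pose) conditions the denoising steps, so the generated photo follows the edit while the learned person-specific prior fills in identity and detail.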