We present a novel approach to single-view face relighting in the wild. Handling non-diffuse effects, such as global illumination or cast shadows, has long been a challenge in face relighting. Prior work often assumes Lambertian surfaces or simplified lighting models, or involves estimating 3D shape, albedo, or a shadow map. Such estimation, however, is error-prone and requires many training examples with lighting ground truth to generalize well. Our work bypasses the need for accurate estimation of intrinsic components and can be trained solely on 2D images without any light stage data, multi-view images, or lighting ground truth. Our key idea is to leverage a conditional diffusion implicit model (DDIM) for decoding a disentangled light encoding along with other encodings related to 3D shape and facial identity inferred from off-the-shelf estimators. We also propose a novel conditioning technique that eases the modeling of the complex interaction between light and geometry by using a rendered shading reference to spatially modulate the DDIM. We achieve state-of-the-art performance on the standard Multi-PIE benchmark and can photorealistically relight in-the-wild images. Please visit our page: https://diffusion-face-relighting.github.io
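The spatial modulation mentioned above can be illustrated with a minimal sketch. This is not the authors' code: it assumes a SPADE-style scale-and-shift conditioning in which per-pixel scale (`gamma`) and shift (`beta`) maps are predicted from the rendered shading reference (here via a 1x1 convolution, implemented as a per-channel linear map) and applied to an intermediate DDIM feature map. All names, shapes, and the single-layer predictor are illustrative assumptions.

```python
# Hedged sketch of spatial modulation by a shading reference (not the paper's
# implementation). A 1x1 "conv" maps the 1-channel shading map to C-channel
# scale/shift maps, which then modulate a (C, H, W) feature map per pixel.
import numpy as np

def spatial_modulate(h, shading, w_gamma, w_beta):
    """Modulate features h (C, H, W) with scale/shift maps predicted
    from a shading reference (1, H, W) via per-channel linear weights."""
    gamma = w_gamma[:, None, None] * shading  # (C, H, W) scale map
    beta = w_beta[:, None, None] * shading    # (C, H, W) shift map
    # Residual-style modulation: zero weights leave h unchanged.
    return h * (1.0 + gamma) + beta

rng = np.random.default_rng(0)
C, H, W = 8, 16, 16
h = rng.standard_normal((C, H, W))       # hypothetical DDIM feature map
shading = rng.random((1, H, W))          # rendered shading reference
out = spatial_modulate(h, shading, rng.standard_normal(C), rng.standard_normal(C))
print(out.shape)  # (8, 16, 16)
```

Conditioning via per-pixel modulation (rather than, say, concatenation) lets the lighting signal rescale features locally, which is a natural fit for shading, whose effect on appearance is multiplicative and spatially varying.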