Synthesizing face images from monochrome sketches is one of the most fundamental tasks in the field of image-to-image translation. However, it remains challenging to (1) make models learn high-dimensional face features such as geometry and color, and (2) take into account the characteristics of the input sketches. Existing methods often use sketches as indirect (or auxiliary) inputs to guide the model, which results in the loss of sketch features or the alteration of geometry information. In this paper, we introduce the Sketch-Guided Latent Diffusion Model (SGLDM), an LDM-based network architecture trained on a paired sketch-face dataset. We apply a Multi-Auto-Encoder (AE) to encode the input sketches of different face regions from pixel space into a feature map in latent space, which reduces the dimension of the sketch input while preserving the geometry-related information of local face details. We build the sketch-face paired dataset using an existing method that extracts an edge map from an image. We then introduce Stochastic Region Abstraction (SRA), a data augmentation approach that makes SGLDM more robust to sketch input at arbitrary abstraction levels. The evaluation study shows that SGLDM can synthesize high-quality face images with different expressions, facial accessories, and hairstyles from various sketches at different abstraction levels.
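As a rough illustration of how a sketch-face pair could be derived from a face image by edge-map extraction, the following minimal sketch uses a plain Sobel gradient-magnitude filter in NumPy. The Sobel filter, the `threshold` value, and the helper names `sobel_edge_map` and `make_pair` are assumptions made for illustration only; the abstract does not name the edge extractor actually used to build the dataset.

```python
import numpy as np


def sobel_edge_map(gray, threshold=0.25):
    """Binary edge map (1 = stroke) from a 2-D grayscale image.

    Assumption: Sobel is only a stand-in for whichever edge
    extractor was used to build the real dataset.
    """
    gray = np.asarray(gray, dtype=np.float64)
    h, w = gray.shape
    # Replicate border pixels so the output keeps the input size.
    padded = np.pad(gray, 1, mode="edge")

    kx = np.array([[-1.0, 0.0, 1.0],
                   [-2.0, 0.0, 2.0],
                   [-1.0, 0.0, 1.0]])
    ky = kx.T

    gx = np.zeros_like(gray)
    gy = np.zeros_like(gray)
    # Accumulate the 3x3 filter responses via shifted views.
    for i in range(3):
        for j in range(3):
            window = padded[i:i + h, j:j + w]
            gx += kx[i, j] * window
            gy += ky[i, j] * window

    magnitude = np.hypot(gx, gy)
    peak = magnitude.max()
    if peak > 0:
        magnitude = magnitude / peak  # normalise to [0, 1]
    return (magnitude >= threshold).astype(np.uint8)


def make_pair(face_rgb, threshold=0.25):
    """Return (sketch, face) for one H x W x 3 image (hypothetical helper)."""
    gray = np.asarray(face_rgb, dtype=np.float64).mean(axis=-1)
    return sobel_edge_map(gray, threshold), face_rgb
```

Each face image then yields one monochrome sketch that is pixel-aligned with its target, which is the property a paired dataset needs; a learned or more elaborate edge detector could be swapped in without changing the pairing logic.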