Image editing using diffusion models has grown at a remarkable pace in recent years. Prior works enable controlling and editing images in various ways: some use high-level conditioning such as text, while others rely on low-level conditioning. However, most of them lack fine-grained control over the properties of the individual objects in an image, i.e., object-level image editing. In this work, we consider an image as a composition of multiple objects, each defined by various properties. Among these properties, we identify structure and appearance as the most intuitive to understand and the most useful for editing. We propose the Structure-and-Appearance Paired Diffusion model (PAIR-Diffusion), which is trained on structure and appearance information explicitly extracted from images. The proposed model lets users inject a reference image's appearance into the input image at both the object and global levels. In addition, PAIR-Diffusion allows editing the structure while keeping the style of individual image components unchanged. We extensively evaluate our method on the LSUN datasets and the CelebA-HQ face dataset, demonstrating fine-grained control over both structure and appearance at the object level. We also apply the method to Stable Diffusion to edit any real image at the object level.
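To make the object-level conditioning idea concrete, the sketch below illustrates one plausible way to derive per-object appearance vectors: masked average pooling of a pretrained encoder's features within each segmentation region (the structure). This is a minimal illustration, not the authors' implementation; the function name `per_object_appearance` and the tensor shapes are assumptions made here for clarity.

```python
import torch
import torch.nn.functional as F

def per_object_appearance(features: torch.Tensor, seg: torch.Tensor) -> torch.Tensor:
    """Pool one appearance vector per object via masked averaging.

    features: (C, Hf, Wf) feature map from any pretrained encoder (assumed).
    seg:      (H, W) integer segmentation map; each id marks one object.
    Returns:  (num_objects, C), one appearance vector per object id.
    """
    # Align the feature map spatially with the segmentation map.
    features = F.interpolate(
        features.unsqueeze(0), size=seg.shape, mode="bilinear", align_corners=False
    ).squeeze(0)

    vecs = []
    for obj_id in seg.unique():
        mask = (seg == obj_id).float()       # (H, W) binary mask for this object
        area = mask.sum().clamp(min=1.0)     # guard against empty regions
        vecs.append((features * mask).sum(dim=(1, 2)) / area)
    return torch.stack(vecs)                 # (num_objects, C)
```

Under this view, object-level appearance editing amounts to replacing one object's pooled vector with the corresponding vector computed from a reference image before conditioning the diffusion model, while the segmentation map itself can be edited to change structure without touching the appearance vectors.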