Image-based fashion design with AI techniques has attracted increasing attention in recent years. We focus on a new fashion design task that aims to transfer a reference appearance image onto a clothing image while preserving the structure of the clothing image. This task is challenging because no reference images exist for the newly designed output fashion images. Although diffusion-based image translation and neural style transfer (NST) enable flexible style transfer, they often fail to realistically maintain the original structure of the image during reverse diffusion, especially when the reference appearance image differs greatly from common clothing appearances. To tackle this issue, we present a novel diffusion-model-based unsupervised structure-aware transfer method that semantically generates new clothes from a given clothing image and a reference appearance image. Specifically, we decouple the foreground clothing using semantic masks automatically generated from conditioned labels, and the mask is further used as guidance in the denoising process to preserve structural information. Moreover, we employ a pre-trained vision Transformer (ViT) for both appearance and structure guidance. Our experimental results show that the proposed method outperforms state-of-the-art baseline models, generating more realistic images for the fashion design task. Code and a demo can be found at https://github.com/Rem105-210/DiffFashion.
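To make the mask-guidance idea concrete, below is a minimal PyTorch-style sketch of one mask-guided reverse-diffusion step. The function name `masked_denoising_step`, the `denoise_fn` interface, and the mask convention (1 = foreground clothing) are all illustrative assumptions, not the paper's exact implementation: the masked foreground follows the appearance-guided generation, while the unmasked region is taken from the structure image noised to the matching timestep.

```python
import torch

def masked_denoising_step(
    x_t: torch.Tensor,         # current noisy latent of the generated design
    x_struct_t: torch.Tensor,  # structure (clothing) image noised to timestep t
    mask: torch.Tensor,        # binary mask, 1 = foreground clothing
    denoise_fn,                # pretrained denoiser: (x_t, t) -> x_{t-1}
    t: int,
) -> torch.Tensor:
    """One mask-guided reverse-diffusion step (hypothetical sketch)."""
    # Denoise the generated latent by one step.
    x_prev = denoise_fn(x_t, t)
    # Advance the structure image's latent to the same timestep.
    x_struct_prev = denoise_fn(x_struct_t, t)
    # Blend: generate inside the clothing mask, preserve structure outside it.
    return mask * x_prev + (1.0 - mask) * x_struct_prev
```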
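The ViT-based guidance can likewise be sketched as two losses computed from a frozen pre-trained ViT: a global-embedding loss for appearance and a patch self-similarity loss for structure. The `vit` interface below (returning a [CLS] token and patch keys) and both loss choices are assumptions for illustration; the paper's exact formulation may differ.

```python
import torch.nn.functional as F

def vit_guidance_losses(vit, gen, appearance_ref, structure_ref):
    """Hypothetical ViT guidance losses; `vit(img) -> (cls_token, patch_keys)`."""
    cls_gen, keys_gen = vit(gen)
    cls_app, _ = vit(appearance_ref)
    _, keys_struct = vit(structure_ref)

    # Appearance: match global [CLS] embeddings of the output and the reference.
    loss_app = F.mse_loss(cls_gen, cls_app)

    # Structure: match the self-similarity of patch keys with the clothing image,
    # which captures spatial layout while discarding appearance details.
    def self_sim(k):
        k = F.normalize(k, dim=-1)
        return k @ k.transpose(-1, -2)

    loss_struct = F.mse_loss(self_sim(keys_gen), self_sim(keys_struct))
    return loss_app, loss_struct
```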