Multi-modal image generation and editing have diverse applications in the fashion industry. The goal is to create a desired high-fidelity image guided by multi-modal conditional signals. Most existing methods learn each kind of condition guidance by introducing extra models, or they ignore style prior knowledge; as a result, they struggle to handle combinations of multiple signals and suffer from low fidelity. In this paper, we integrate both style prior knowledge and flexible multi-modal control into one unified two-stage framework, M6-Fashion, focusing on practical AI-aided fashion design. In the first stage, it decouples style codes in both the spatial and semantic dimensions to guarantee high-fidelity image generation. M6-Fashion uses self-correction for non-autoregressive generation to improve inference speed, enhance holistic consistency, and support various signal controls. Extensive experiments on a large-scale clothing dataset, M2C-Fashion, demonstrate superior performance on various image generation and editing tasks. The M6-Fashion model can serve as a promising AI designer for the fashion industry.