Image-based head swapping aims to stitch a source head onto another source body seamlessly. This seldom-studied task faces two major challenges: 1) preserving the head and body from different sources while generating a seamless transition region; 2) the lack of a paired head swapping dataset and benchmark. In this paper, we propose a semantic-mixing diffusion model for head swapping (HS-Diffusion), which consists of a latent diffusion model (LDM) and a semantic layout generator. We blend the semantic layouts of the source head and the source body, and then inpaint the transition region with the semantic layout generator, achieving coarse-grained head swapping. The semantic-mixing LDM further performs fine-grained head swapping, conditioned on the inpainted layout, through a progressive fusion process, while reconstructing the head and body with high quality. Furthermore, we propose a semantic calibration strategy for natural inpainting and a neck alignment for geometric realism. Importantly, we construct a new image-based head swapping benchmark and design two tailored metrics (Mask-FID and Focal-FID). Extensive experiments demonstrate the superiority of our framework. The code will be available at https://github.com/qinghew/HS-Diffusion.
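As a rough illustration of the coarse-grained step described above (this is not the authors' implementation; the class indices, array shapes, and the `blend_layouts` helper are assumptions for the sketch), the snippet below blends a head-source semantic layout with a body-source layout and marks a transition band as unknown, which a layout generator would then inpaint:

```python
import numpy as np

# Hypothetical semantic class indices; the actual label set depends on the
# parsing model used and may differ.
HEAD_CLASSES = {1, 2, 3}      # e.g. face, hair, ears
BODY_CLASSES = {4, 5, 6, 7}   # e.g. torso, arms, clothes
UNKNOWN = 0                   # label for the region to be inpainted


def blend_layouts(head_layout: np.ndarray, body_layout: np.ndarray,
                  transition_band: int = 12) -> np.ndarray:
    """Paste head classes from the head-source layout onto the body-source
    layout, then erase a band around the seam so that a semantic layout
    generator can inpaint a plausible transition (e.g. the neck region)."""
    assert head_layout.shape == body_layout.shape  # (H, W) integer label maps

    blended = body_layout.copy()
    head_mask = np.isin(head_layout, list(HEAD_CLASSES))
    blended[head_mask] = head_layout[head_mask]

    # Find the lowest row containing head pixels and clear a band around it.
    rows = np.where(head_mask.any(axis=1))[0]
    if rows.size > 0:
        seam = rows.max()
        lo = max(seam - transition_band // 2, 0)
        hi = min(seam + transition_band, blended.shape[0])
        band = blended[lo:hi]
        band[np.isin(band, list(HEAD_CLASSES | BODY_CLASSES))] = UNKNOWN

    return blended
```

In the pipeline sketched by the abstract, the inpainted layout produced from this blended map would then serve as the condition for the semantic-mixing LDM in the fine-grained stage.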