Image-based head swapping aims to stitch a source head onto another source body seamlessly. This seldom-studied task faces two major challenges: 1) preserving the head and body from different sources while generating a seamless transition region; 2) the lack of a paired head swapping dataset and benchmark. In this paper, we propose a semantic-mixing diffusion model for head swapping (HS-Diffusion), which consists of a latent diffusion model (LDM) and a semantic layout generator. We blend the semantic layouts of the source head and the source body, and then inpaint the transition region with the semantic layout generator, achieving coarse-grained head swapping. The semantic-mixing LDM further performs fine-grained head swapping, conditioned on the inpainted layout, through a progressive fusion process, while reconstructing the head and body with high quality. Furthermore, we propose a semantic calibration strategy for natural inpainting and a neck alignment for geometric realism. Importantly, we construct a new image-based head swapping benchmark and design two tailored metrics (Mask-FID and Focal-FID). Extensive experiments demonstrate the superiority of our framework. The code will be available at https://github.com/qinghew/HS-Diffusion.
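As a rough illustration of the coarse-grained step described above (this is not the authors' implementation; the class indices, array shapes, and the `blend_layouts` helper are assumptions for the sketch), the snippet below blends a head-source semantic layout with a body-source layout and marks a transition band as unknown, which a layout generator would then inpaint:

```python
import numpy as np

# Hypothetical semantic class indices; the actual label set depends on the
# parsing model used and may differ.
HEAD_CLASSES = {1, 2, 3}      # e.g. face, hair, ears
BODY_CLASSES = {4, 5, 6, 7}   # e.g. torso, arms, clothes
UNKNOWN = 0                   # label for the region to be inpainted


def blend_layouts(head_layout: np.ndarray, body_layout: np.ndarray,
                  transition_band: int = 12) -> np.ndarray:
    """Paste head classes from the head-source layout onto the body-source
    layout, then erase a band around the seam so that a semantic layout
    generator can inpaint a plausible transition (e.g. the neck region)."""
    assert head_layout.shape == body_layout.shape  # (H, W) integer label maps

    blended = body_layout.copy()
    head_mask = np.isin(head_layout, list(HEAD_CLASSES))
    blended[head_mask] = head_layout[head_mask]

    # Find the lowest row containing head pixels and clear a band around it.
    rows = np.where(head_mask.any(axis=1))[0]
    if rows.size > 0:
        seam = rows.max()
        lo = max(seam - transition_band // 2, 0)
        hi = min(seam + transition_band, blended.shape[0])
        band = blended[lo:hi]
        band[np.isin(band, list(HEAD_CLASSES | BODY_CLASSES))] = UNKNOWN

    return blended
```

In the pipeline sketched by the abstract, the inpainted layout produced from this blended map would then serve as the condition for the semantic-mixing LDM in the fine-grained stage.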