Diffusion models have recently emerged as a powerful generative tool. Despite great progress, existing diffusion models mainly focus on uni-modal control, i.e., the diffusion process is driven by only one modality of condition. To further unleash users' creativity, it is desirable for the model to be controllable by multiple modalities simultaneously, e.g., generating and editing faces by describing the age (text-driven) while drawing the face shape (mask-driven). In this work, we present Collaborative Diffusion, where pre-trained uni-modal diffusion models collaborate to achieve multi-modal face generation and editing without re-training. Our key insight is that diffusion models driven by different modalities are inherently complementary with regard to the latent denoising steps, upon which bilateral connections can be established. Specifically, we propose the dynamic diffuser, a meta-network that adaptively hallucinates multi-modal denoising steps by predicting the spatial-temporal influence functions for each pre-trained uni-modal model. Collaborative Diffusion not only combines the generation capabilities of uni-modal diffusion models, but also integrates multiple uni-modal manipulations to perform multi-modal editing. Extensive qualitative and quantitative experiments demonstrate the superiority of our framework in both image quality and condition consistency.
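To make the mechanism concrete, below is a minimal sketch of how predicted spatial-temporal influence functions could fuse the denoising predictions of several frozen uni-modal diffusion models. This is an illustrative assumption, not the authors' released implementation: the names `DynamicDiffuser`, `collaborative_denoise_step`, and the `model(x_t, t, cond)` call signature are hypothetical placeholders.

```python
import torch
import torch.nn as nn


class DynamicDiffuser(nn.Module):
    """Hypothetical sketch: a small meta-network that predicts a per-pixel,
    per-timestep influence map for one pre-trained uni-modal diffusion model."""

    def __init__(self, in_channels: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels + 1, hidden, 3, padding=1),  # +1 channel for the timestep
            nn.SiLU(),
            nn.Conv2d(hidden, 1, 3, padding=1),
        )

    def forward(self, x_t: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        # Broadcast the timestep as an extra spatial channel, then predict raw influence logits.
        t_map = t.view(-1, 1, 1, 1).float().expand(-1, 1, *x_t.shape[2:])
        return self.net(torch.cat([x_t, t_map], dim=1))


def collaborative_denoise_step(x_t, t, unimodal_models, conditions, diffusers):
    """One denoising step that fuses noise predictions from several frozen
    uni-modal diffusion models, weighted by softmax-normalized influence maps.
    All argument interfaces here are assumed for illustration."""
    eps_preds, logits = [], []
    for model, cond, diffuser in zip(unimodal_models, conditions, diffusers):
        with torch.no_grad():                       # pre-trained uni-modal models stay frozen
            eps_preds.append(model(x_t, t, cond))   # each model's noise prediction
        logits.append(diffuser(x_t, t))             # spatial-temporal influence logits
    # Normalize influences across models so they sum to one at every pixel and timestep.
    weights = torch.softmax(torch.stack(logits, dim=0), dim=0)
    return (weights * torch.stack(eps_preds, dim=0)).sum(dim=0)
```

In this sketch only the dynamic diffusers would be trained, while the uni-modal diffusion models remain frozen, which mirrors the "without re-training" claim in the abstract.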