Unite and Conquer: Plug & Play Multi-Modal Synthesis using Diffusion Models - Zhuanzhi Papers


Diffusion models · Paired data · DDPM · Modality · Multimodal

April 20, 2023

Unite and Conquer: Plug & Play Multi-Modal Synthesis using Diffusion Models

Nithin Gopalakrishnan Nair, Wele Gedara Chaminda Bandara, Vishal M. Patel

from arxiv, Accepted at CVPR 2023

Generating photos that satisfy multiple constraints finds broad utility in the content creation industry. A key hurdle to accomplishing this task is the need for paired data consisting of all modalities (i.e., constraints) and their corresponding output. Moreover, existing methods need retraining using paired data across all modalities to introduce a new condition. This paper proposes a solution to this problem based on denoising diffusion probabilistic models (DDPMs). Our motivation for choosing diffusion models over other generative models comes from the flexible internal structure of diffusion models. Since each sampling step in the DDPM follows a Gaussian distribution, we show that there exists a closed-form solution for generating an image given various constraints. Our method can unite multiple diffusion models trained on multiple sub-tasks and conquer the combined task through our proposed sampling strategy. We also introduce a novel reliability parameter that allows different off-the-shelf diffusion models, trained across various datasets, to be used at sampling time alone to guide generation toward the desired outcome satisfying multiple constraints. We perform experiments on various standard multimodal tasks to demonstrate the effectiveness of our approach. More details can be found at https://nithin-gk.github.io/projectpages/Multidiff/index.html
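The abstract's remark that each DDPM sampling step is Gaussian can be illustrated with a small sketch. It is not the paper's sampling rule, which the abstract does not spell out. It only shows the standard fact that a reliability-weighted product of Gaussians with a shared variance is again Gaussian and has a closed form. The function name `combine_gaussian_steps`, the shared `sigma`, and the way the weights enter are all assumptions made for illustration.

```python
import numpy as np


def combine_gaussian_steps(means, sigma, reliabilities):
    """Fuse K Gaussian step proposals N(mean_i, sigma^2 I) in closed form.

    Hypothetical helper: it normalises prod_i N(x; mean_i, sigma^2 I)^{r_i},
    which is Gaussian with mean sum_i(r_i * mean_i) / sum_i(r_i) and
    standard deviation sigma / sqrt(sum_i r_i).
    """
    means = np.asarray(means, dtype=float)      # shape (K, ...)
    r = np.asarray(reliabilities, dtype=float)  # shape (K,)
    if r.ndim != 1 or r.shape[0] != means.shape[0]:
        raise ValueError("need exactly one reliability per model")
    if np.any(r < 0) or r.sum() <= 0:
        raise ValueError("reliabilities must be non-negative with a positive sum")

    weights = r / r.sum()
    # Contract the model axis: weighted average of the per-model means.
    fused_mean = np.tensordot(weights, means, axes=1)
    fused_sigma = sigma / np.sqrt(r.sum())
    return fused_mean, fused_sigma


# Two models propose different means for the same step; the second is
# trusted three times as much as the first.
mean, std = combine_gaussian_steps([[0.0, 0.0], [2.0, 4.0]], 1.0, [1.0, 3.0])
```

In this toy setting a larger reliability pulls the fused step toward that model's proposal, which is the qualitative role the abstract assigns to the reliability parameter.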

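Diffusion models

Diffusion models are a class of generative models that have developed rapidly and attracted wide attention in recent years. Through a sequence of noising and denoising steps, they link a complex image distribution to a Gaussian distribution, so that the model can gradually denoise randomly sampled Gaussian noise into an image.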
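The noising half of that process can be sketched in a few lines. The example below assumes the commonly used DDPM setup with a linear variance schedule, where a noisy sample at step t is drawn directly as x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps. The schedule values and the helper names `make_alpha_bar` and `add_noise` are illustrative choices, not something specified on this page.

```python
import numpy as np


def make_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative signal-retention factors for a linear beta schedule (assumed)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)


def add_noise(x0, t, alpha_bar, rng):
    """Sample x_t given a clean image x0 by mixing it with Gaussian noise."""
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return xt, eps


rng = np.random.default_rng(0)
alpha_bar = make_alpha_bar()
x0 = np.ones((4, 4))                                     # stand-in for an image
x_early, _ = add_noise(x0, 0, alpha_bar, rng)            # almost unchanged
x_late, _ = add_noise(x0, len(alpha_bar) - 1, alpha_bar, rng)  # nearly pure noise
```

A trained denoiser is what reverses this chain; it is omitted here because it requires a learned network.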
