UPainting: Unified Text-to-Image Diffusion Generation with Cross-modal Guidance - Zhuanzhi Papers

November 3, 2022

UPainting: Unified Text-to-Image Diffusion Generation with Cross-modal Guidance

Wei Li, Xue Xu, Xinyan Xiao, Jiachen Liu, Hu Yang, Guohao Li, Zhanpeng Wang, Zhifan Feng, Qiaoqiao She, Yajuan Lyu, Hua Wu

From arXiv, first version, 16 pages

Diffusion generative models have recently greatly improved the power of text-conditioned image generation. Existing image generation models mainly include text-conditional diffusion models and cross-modal guided diffusion models, which are good at small-scene and complex-scene image generation, respectively. In this work, we propose a simple yet effective approach, namely UPainting, to unify simple and complex scene image generation, as shown in Figure 1. Based on architecture improvements and diverse guidance schedules, UPainting effectively integrates cross-modal guidance from a pretrained image-text matching model into a text-conditional diffusion model that uses a pretrained Transformer language model as its text encoder. Our key finding is that combining the power of a large-scale Transformer language model in understanding language with that of an image-text matching model in capturing cross-modal semantics and style is effective for improving the sample fidelity and image-text alignment of generated images. In this way, UPainting gains a more general image generation capability and can generate images of both simple and complex scenes more effectively. To compare text-to-image models comprehensively, we further create a more general benchmark, UniBench, with well-written Chinese and English prompts covering both simple and complex scenes. We compare UPainting with recent models and find that UPainting greatly outperforms other models in terms of caption similarity and image fidelity in both simple and complex scenes. UPainting project page: https://upainting.github.io/.
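
The abstract says UPainting folds guidance from a pretrained image-text matching model into a text-conditional diffusion model under several guidance schedules, but it gives no formulas. The sketch below is therefore only an illustration under stated assumptions, not the authors' method. It assumes the common pattern from the diffusion literature: classifier-free guidance on the text condition, plus a gradient term from an image-text matching score scaled by sqrt(1 - alpha_bar_t), as in classifier guidance. The function names, the three schedule shapes (`constant`, `linear_decay`, `cosine_decay`), and the finite-difference gradient are all hypothetical. The finite-difference gradient stands in for autograd through a real matching model.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def guidance_scale(t: int, num_steps: int, base: float, kind: str = "constant") -> float:
    """Hypothetical schedule for the cross-modal guidance weight.

    Sampling runs from t = num_steps - 1 (noisiest) down to t = 0 (cleanest).
    """
    if num_steps < 2:
        return base
    progress = 1.0 - t / (num_steps - 1)  # 0.0 at the first step, 1.0 at the last
    if kind == "constant":
        return base
    if kind == "linear_decay":
        return base * (1.0 - progress)
    if kind == "cosine_decay":
        return base * 0.5 * (1.0 + math.cos(math.pi * progress))
    raise ValueError(f"unknown schedule: {kind}")


def numerical_grad(
    score: Callable[[Sequence[float]], float], x: Sequence[float], h: float = 1e-5
) -> Vector:
    """Central-difference gradient of a scalar matching score with respect to x.

    A stand-in for backpropagating through an image-text matching model.
    """
    grad: Vector = []
    for i in range(len(x)):
        plus = list(x)
        plus[i] += h
        minus = list(x)
        minus[i] -= h
        grad.append((score(plus) - score(minus)) / (2.0 * h))
    return grad


def guided_eps(
    eps_uncond: Sequence[float],
    eps_text: Sequence[float],
    match_grad: Sequence[float],
    alpha_bar_t: float,
    text_scale: float,
    match_scale: float,
) -> Vector:
    """Classifier-free text guidance plus a matching-score gradient term.

    eps_hat = eps_u + s_text * (eps_c - eps_u) - s_match * sqrt(1 - alpha_bar_t) * grad
    """
    sigma_t = math.sqrt(1.0 - alpha_bar_t)  # noise level of x_t
    return [
        u + text_scale * (c - u) - match_scale * sigma_t * g
        for u, c, g in zip(eps_uncond, eps_text, match_grad)
    ]


if __name__ == "__main__":
    # Toy example: the "matching score" is a linear projection onto a fixed text embedding.
    text_emb = (0.6, 0.8)
    x_t = [1.0, 2.0]
    grad = numerical_grad(lambda v: text_emb[0] * v[0] + text_emb[1] * v[1], x_t)
    s = guidance_scale(t=40, num_steps=50, base=2.0, kind="cosine_decay")
    print(guided_eps([0.0, 1.0], [1.0, 1.0], grad, alpha_bar_t=0.75, text_scale=3.0, match_scale=s))
```

In a real sampler, `match_grad` would be the gradient of the matching model's image-text similarity with respect to the noisy image (or its predicted clean image). `match_scale` would be recomputed from a schedule at every step. Decaying the weight toward the end of sampling is just one plausible choice here. The abstract does not say which schedules UPainting uses.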
