文本到图像合成 (Layout-Bridging Text-to-Image Synthesis) - 专知论文

会员服务 ·

0

Learning · MoDELS · Extensibility · Guidance · Performer ·

2022 年 8 月 12 日

Layout-Bridging Text-to-Image Synthesis

翻译：文本到图像合成

Jiadong Liang,Wenjie Pei,Feng Lu

The crux of text-to-image synthesis stems from the difficulty of preserving the cross-modality semantic consistency between the input text and the synthesized image. Typical methods, which seek to model the text-to-image mapping directly, could only capture keywords in the text that indicates common objects or actions but fail to learn their spatial distribution patterns. An effective way to circumvent this limitation is to generate an image layout as guidance, which is attempted by a few methods. Nevertheless, these methods fail to generate practically effective layouts due to the diversity of input text and object location. In this paper we push for effective modeling in both text-to-layout generation and layout-to-image synthesis. Specifically, we formulate the text-to-layout generation as a sequence-to-sequence modeling task, and build our model upon Transformer to learn the spatial relationships between objects by modeling the sequential dependencies between them. In the stage of layout-to-image synthesis, we focus on learning the textual-visual semantic alignment per object in the layout to precisely incorporate the input text into the layout-to-image synthesizing process. To evaluate the quality of generated layout, we design a new metric specifically, dubbed Layout Quality Score, which considers both the absolute distribution errors of bounding boxes in the layout and the mutual spatial relationships between them. Extensive experiments on three datasets demonstrate the superior performance of our method over state-of-the-art methods on both predicting the layout and synthesizing the image from the given text.

翻译：文本到图像合成的柱石源于难以保持输入文本和合成图像之间的跨模式语义一致性。典型的方法试图直接模拟文本到图像的映射,但只能捕捉文本中显示共同对象或行动的关键字, 但却没有学习它们的空间分布模式。绕开这一限制的有效方法就是生成图像布局作为指导, 这是少数方法尝试的。然而, 由于输入文本和对象位置的多样性, 这些方法无法产生实际有效的布局。在本文中, 我们推力在文本到布局生成和布局到图像合成中进行有效的建模。具体地说, 我们将文本到布局生成作为顺序到图像映射的模型, 并且用变动器来学习天体之间的空间关系, 在布局到图像合成的合成阶段, 我们侧重于在布局上给出的文本到准确的图像到将输入文本纳入到文本的文本中, 具体地将图像的布局和布局的布局的布局的布局的绝对质量关系。

0

相关内容

Learning

【Hugging Face】指导文本生成与约束波束搜索🤗Transformers，Guiding Text Generation with Constrained Beam Search in 🤗 Transformers

【Hugging Face】指导文本生成与约束波束搜索🤗Transformers，Guiding Text Generation with Constrained Beam Search in 🤗 Transformers

专知会员服务

22+阅读 · 2022年3月18日

2020数据工程师成长路线图

专知会员服务

19+阅读 · 2020年9月6日

史上最全！358篇机器学习&自然语言处理综述论文！都这儿了

专知会员服务

129+阅读 · 2020年7月18日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

165+阅读 · 2020年3月18日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

95+阅读 · 2020年3月12日

【北京大学】探索提取跨模态信息进行图像caption，Exploring and Distilling Cross-Modal Information for Image Captioning

【北京大学】探索提取跨模态信息进行图像caption，Exploring and Distilling Cross-Modal Information for Image Captioning

专知会员服务

54+阅读 · 2020年3月3日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

ACM MM 2022 Call for Papers

ACM MM 2022 Call for Papers

CCF多媒体专委会

5+阅读 · 2022年3月29日

ACM TOMM Call for Papers

ACM TOMM Call for Papers

CCF多媒体专委会

2+阅读 · 2022年3月23日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

中国图象图形学学会CSIG

0+阅读 · 2021年11月16日

【ICIG2021】Latest News & Announcements of the Industry Talk1

【ICIG2021】Latest News & Announcements of the Industry Talk1

中国图象图形学学会CSIG

0+阅读 · 2021年7月28日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

【论文推荐】最新5篇图像描述生成（Image Caption）相关论文—情感、注意力机制、遥感图像、序列到序列、深度神经结构

【论文推荐】最新5篇图像描述生成（Image Caption）相关论文—情感、注意力机制、遥感图像、序列到序列、深度神经结构

专知

66+阅读 · 2018年1月31日

ResNet, AlexNet, VGG, Inception：各种卷积网络架构的理解

ResNet, AlexNet, VGG, Inception：各种卷积网络架构的理解

全球人工智能

20+阅读 · 2017年12月17日

基于GIS的玉米品种种植灾害胁迫风险评价与区划制图方法研究

国家自然科学基金

0+阅读 · 2013年12月31日

p-MSK1 (Thr581)影响结直肠癌预后的机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

大气压直流微等离子体辅助液相合成和液相修饰纳米粒子研究

国家自然科学基金

0+阅读 · 2012年12月31日

综合InSAR与GPS江苏沿海湿地储水量变化监测研究

国家自然科学基金

0+阅读 · 2012年12月31日

锡林郭勒草地排放的挥发性有机物对大气环境的影响研究

国家自然科学基金

0+阅读 · 2012年12月31日

含2-氨基嘧啶π-共轭聚合物的合成及其光、电性能研究

国家自然科学基金

0+阅读 · 2011年12月31日

乌鲁木齐市大气污染空间格局变化趋势及酸沉降的研究

国家自然科学基金

0+阅读 · 2011年12月31日

张应力作用下颅底软骨联合的差异蛋白质组学研究

国家自然科学基金

0+阅读 · 2009年12月31日

一种多旋翼多功能空中机器人的设计研究

国家自然科学基金

0+阅读 · 2009年12月31日

MIMO检测与合并中的智能信号处理研究

国家自然科学基金

0+阅读 · 2009年12月31日

KPE: Keypoint Pose Encoding for Transformer-based Image Generation

Arxiv

0+阅读 · 2022年10月6日

Progressive Denoising Model for Fine-Grained Text-to-Image Generation

Arxiv

0+阅读 · 2022年10月5日

Architecture-Agnostic Masked Image Modeling -- From ViT back to CNN

Arxiv

0+阅读 · 2022年9月29日

From Show to Tell: A Survey on Image Captioning

Arxiv

15+阅读 · 2021年7月14日

Unifying Vision-and-Language Tasks via Text Generation

Arxiv

10+阅读 · 2021年2月4日

Semi-supervised Medical Image Segmentation through Dual-task Consistency

Arxiv

14+阅读 · 2020年9月9日

Contrastive learning of global and local features for medical image segmentation with limited annotations

Arxiv

19+阅读 · 2020年6月18日

FocalMix: Semi-Supervised Learning for 3D Medical Image Detection

FocalMix: Semi-Supervised Learning for 3D Medical Image Detection

Arxiv

10+阅读 · 2020年3月20日

LayoutLM: Pre-training of Text and Layout for Document Image Understanding

LayoutLM: Pre-training of Text and Layout for Document Image Understanding

Arxiv

12+阅读 · 2020年2月19日

Exploring Visual Relationship for Image Captioning

Exploring Visual Relationship for Image Captioning

Arxiv

15+阅读 · 2018年9月19日

VIP会员

文章信息

相关主题

相关VIP内容

【Hugging Face】指导文本生成与约束波束搜索🤗Transformers，Guiding Text Generation with Constrained Beam Search in 🤗 Transformers

【Hugging Face】指导文本生成与约束波束搜索🤗Transformers，Guiding Text Generation with Constrained Beam Search in 🤗 Transformers

专知会员服务

22+阅读 · 2022年3月18日

2020数据工程师成长路线图

专知会员服务

19+阅读 · 2020年9月6日

史上最全！358篇机器学习&自然语言处理综述论文！都这儿了

专知会员服务

129+阅读 · 2020年7月18日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

165+阅读 · 2020年3月18日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

95+阅读 · 2020年3月12日

【北京大学】探索提取跨模态信息进行图像caption，Exploring and Distilling Cross-Modal Information for Image Captioning

【北京大学】探索提取跨模态信息进行图像caption，Exploring and Distilling Cross-Modal Information for Image Captioning

专知会员服务

54+阅读 · 2020年3月3日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

《人工智能的法律与伦理：军事自主机器独特挑战的深度剖析》316页

【ICML2025教程】生成式人工智能遇上强化学习

《特洛伊木马货柜：武器化集装箱的战略威胁》最新报告

全球人工智能社会发展研究报告(2025)

相关资讯

ACM MM 2022 Call for Papers

ACM MM 2022 Call for Papers

CCF多媒体专委会

5+阅读 · 2022年3月29日

ACM TOMM Call for Papers

ACM TOMM Call for Papers

CCF多媒体专委会

2+阅读 · 2022年3月23日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

中国图象图形学学会CSIG

0+阅读 · 2021年11月16日

【ICIG2021】Latest News & Announcements of the Industry Talk1

【ICIG2021】Latest News & Announcements of the Industry Talk1

中国图象图形学学会CSIG

0+阅读 · 2021年7月28日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

【论文推荐】最新5篇图像描述生成（Image Caption）相关论文—情感、注意力机制、遥感图像、序列到序列、深度神经结构

【论文推荐】最新5篇图像描述生成（Image Caption）相关论文—情感、注意力机制、遥感图像、序列到序列、深度神经结构

专知

66+阅读 · 2018年1月31日

ResNet, AlexNet, VGG, Inception：各种卷积网络架构的理解

ResNet, AlexNet, VGG, Inception：各种卷积网络架构的理解

全球人工智能

20+阅读 · 2017年12月17日

相关论文

KPE: Keypoint Pose Encoding for Transformer-based Image Generation

Arxiv

0+阅读 · 2022年10月6日

Progressive Denoising Model for Fine-Grained Text-to-Image Generation

Arxiv

0+阅读 · 2022年10月5日

Architecture-Agnostic Masked Image Modeling -- From ViT back to CNN

Arxiv

0+阅读 · 2022年9月29日

From Show to Tell: A Survey on Image Captioning

Arxiv

15+阅读 · 2021年7月14日

Unifying Vision-and-Language Tasks via Text Generation

Arxiv

10+阅读 · 2021年2月4日

Semi-supervised Medical Image Segmentation through Dual-task Consistency

Arxiv

14+阅读 · 2020年9月9日

Contrastive learning of global and local features for medical image segmentation with limited annotations

Arxiv

19+阅读 · 2020年6月18日

FocalMix: Semi-Supervised Learning for 3D Medical Image Detection

FocalMix: Semi-Supervised Learning for 3D Medical Image Detection

Arxiv

10+阅读 · 2020年3月20日

LayoutLM: Pre-training of Text and Layout for Document Image Understanding

LayoutLM: Pre-training of Text and Layout for Document Image Understanding

Arxiv

12+阅读 · 2020年2月19日

Exploring Visual Relationship for Image Captioning

Exploring Visual Relationship for Image Captioning

Arxiv

15+阅读 · 2018年9月19日

相关基金

基于GIS的玉米品种种植灾害胁迫风险评价与区划制图方法研究

国家自然科学基金

0+阅读 · 2013年12月31日

p-MSK1 (Thr581)影响结直肠癌预后的机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

大气压直流微等离子体辅助液相合成和液相修饰纳米粒子研究

国家自然科学基金

0+阅读 · 2012年12月31日

综合InSAR与GPS江苏沿海湿地储水量变化监测研究

国家自然科学基金

0+阅读 · 2012年12月31日

锡林郭勒草地排放的挥发性有机物对大气环境的影响研究

国家自然科学基金

0+阅读 · 2012年12月31日

含2-氨基嘧啶π-共轭聚合物的合成及其光、电性能研究

国家自然科学基金

0+阅读 · 2011年12月31日

乌鲁木齐市大气污染空间格局变化趋势及酸沉降的研究

国家自然科学基金

0+阅读 · 2011年12月31日

张应力作用下颅底软骨联合的差异蛋白质组学研究

国家自然科学基金

0+阅读 · 2009年12月31日

一种多旋翼多功能空中机器人的设计研究

国家自然科学基金

0+阅读 · 2009年12月31日

MIMO检测与合并中的智能信号处理研究

国家自然科学基金

0+阅读 · 2009年12月31日

微信扫码咨询专知VIP会员