扩展文本条件下的文本到图像生成: $P+$ ($P+$: Extended Textual Conditioning in Text-to-Image Generation) - 专知论文

会员服务 ·

0

图像模型 · 文本到图像生成 · 图像生成 · U-Net · 表现力 ·

2023 年 3 月 27 日

$P+$: Extended Textual Conditioning in Text-to-Image Generation

翻译：扩展文本条件下的文本到图像生成: $P+$

Andrey Voynov,Qinghao Chu,Daniel Cohen-Or,Kfir Aberman

We introduce an Extended Textual Conditioning space in text-to-image models, referred to as $P+$. This space consists of multiple textual conditions, derived from per-layer prompts, each corresponding to a layer of the denoising U-net of the diffusion model. We show that the extended space provides greater disentangling and control over image synthesis. We further introduce Extended Textual Inversion (XTI), where the images are inverted into $P+$, and represented by per-layer tokens. We show that XTI is more expressive and precise, and converges faster than the original Textual Inversion (TI) space. The extended inversion method does not involve any noticeable trade-off between reconstruction and editability and induces more regular inversions. We conduct a series of extensive experiments to analyze and understand the properties of the new space, and to showcase the effectiveness of our method for personalizing text-to-image models. Furthermore, we utilize the unique properties of this space to achieve previously unattainable results in object-style mixing using text-to-image models. Project page: https://prompt-plus.github.io

翻译：我们在文本到图像模型中引入了一个扩展文本条件空间，称为$P+$。该空间包含了多个文本条件，这些条件来自扩散模型中去噪U-Net的每一层提示，每个条件都对应一个层。我们证明了扩展空间提供了更大的图像合成解缠和控制。我们进一步引入了扩展文本反转（XTI），其中图像被反向到$P+$，并由每层的token表示。我们证明了XTI比原始的文本反转（TI）空间更具表现力和精度，并且收敛速度更快。扩展反转方法不涉及重建和编辑之间的明显折衷，并引发了更规则的反转。我们进行了一系列广泛的实验来分析和理解新空间的性质，并展示了我们的方法在个性化文本到图像模型方面的有效性。此外，我们利用该空间的独特属性，在使用文本到图像模型进行对象样式混合时实现了以前无法实现的结果。项目页面：https://prompt-plus.github.io

0

相关内容

图像模型

【AAAI2023】用于复杂场景图像合成的特征金字塔扩散模型

【AAAI2023】用于复杂场景图像合成的特征金字塔扩散模型

专知会员服务

22+阅读 · 2022年12月5日

【硬核书】矩阵代数基础，248页pdf

【硬核书】矩阵代数基础，248页pdf

专知会员服务

88+阅读 · 2021年12月9日

【文本生成现代方法】Modern Methods for Text Generation

【文本生成现代方法】Modern Methods for Text Generation

专知会员服务

44+阅读 · 2020年9月11日

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

82+阅读 · 2020年7月26日

【CVPR2020】通过自适应GANs生成不同的图像，Diverse Image Generation via Self-Conditioned GANs

【CVPR2020】通过自适应GANs生成不同的图像，Diverse Image Generation via Self-Conditioned GANs

专知会员服务

34+阅读 · 2020年6月19日

语言视觉预训练语言模型揭密，Behind the Scene: Revealing the Secrets of Pre-trained Vision-and-Language Models

语言视觉预训练语言模型揭密，Behind the Scene: Revealing the Secrets of Pre-trained Vision-and-Language Models

专知会员服务

36+阅读 · 2020年5月20日

【ACL2020】对抗性文本生成，Improving Adversarial Text Generation

专知会员服务

52+阅读 · 2020年5月5日

【AAAI2020】知识图谱的生成式对抗零样本关系学习，Generative Adversarial Zero-Shot Relational Learning for Knowledge Graphs

【AAAI2020】知识图谱的生成式对抗零样本关系学习，Generative Adversarial Zero-Shot Relational Learning for Knowledge Graphs

专知会员服务

64+阅读 · 2020年1月11日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

VCIP 2022 Call for Demos

VCIP 2022 Call for Demos

CCF多媒体专委会

1+阅读 · 2022年6月6日

RoBERTa for Chinese：大规模中文预训练RoBERTa模型

RoBERTa for Chinese：大规模中文预训练RoBERTa模型

AINLP

30+阅读 · 2019年9月8日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

【论文推荐】最新八篇生成对抗网络相关论文—条件翻译、RGB-D动作识别、量子生成对抗网络、语义对齐、视频摘要、视觉-文本注意力

【论文推荐】最新八篇生成对抗网络相关论文—条件翻译、RGB-D动作识别、量子生成对抗网络、语义对齐、视频摘要、视觉-文本注意力

专知

15+阅读 · 2018年5月15日

【论文推荐】最新六篇对抗自编码器相关论文—多尺度网络节点表示、生成对抗自编码、逆映射、Wasserstein、条件对抗、去噪

【论文推荐】最新六篇对抗自编码器相关论文—多尺度网络节点表示、生成对抗自编码、逆映射、Wasserstein、条件对抗、去噪

专知

20+阅读 · 2018年4月7日

【论文推荐】最新5篇图像描述生成（Image Caption）相关论文—情感、注意力机制、遥感图像、序列到序列、深度神经结构

【论文推荐】最新5篇图像描述生成（Image Caption）相关论文—情感、注意力机制、遥感图像、序列到序列、深度神经结构

专知

66+阅读 · 2018年1月31日

最新5篇生成对抗网络相关论文推荐—FusedGAN、DeblurGAN、AdvGAN、CipherGAN、MMD GANS

最新5篇生成对抗网络相关论文推荐—FusedGAN、DeblurGAN、AdvGAN、CipherGAN、MMD GANS

专知

23+阅读 · 2018年1月18日

Generative Adversarial Text to Image Synthesis论文解读

Generative Adversarial Text to Image Synthesis论文解读

统计学习与视觉计算组

13+阅读 · 2017年6月9日

miR-148a调控Wnt5A拮抗剂基因甲基化修饰在结核感染过程中的机制研究

国家自然科学基金

0+阅读 · 2014年12月31日

PPAR β/δ基因在结直肠癌血管生成调控中的作用及分子机理

国家自然科学基金

2+阅读 · 2014年12月31日

NiO-GaN组合激光器件的电致激光发射机理研究

国家自然科学基金

0+阅读 · 2013年12月31日

蓝宝石外延生长GaN厚膜的自分离及断裂机理研究

国家自然科学基金

0+阅读 · 2012年12月31日

循环载荷下单晶高温合金变形和裂纹形核扩展的位错机理研究

国家自然科学基金

0+阅读 · 2012年12月31日

基于复杂网络的中文文本语义相似度研究

国家自然科学基金

3+阅读 · 2012年12月31日

带有特定类型非线性微分算子的常微分方程边值问题研究

国家自然科学基金

0+阅读 · 2012年12月31日

基于语义的图像合成

国家自然科学基金

0+阅读 · 2011年12月31日

不同断裂形式的宏细观机理及裂纹扩展方向研究

国家自然科学基金

0+阅读 · 2009年12月31日

汉语句子理解中语义和句法整合的认知神经机制

国家自然科学基金

0+阅读 · 2009年12月31日

Optimal Decision Trees For Interpretable Clustering with Constraints (Extended Version)

Arxiv

0+阅读 · 2023年5月16日

Multi-modal Visual Understanding with Prompts for Semantic Information Disentanglement of Image

Arxiv

0+阅读 · 2023年5月16日

A Conditional Denoising Diffusion Probabilistic Model for Radio Interferometric Image Reconstruction

Arxiv

0+阅读 · 2023年5月16日

A Lightweight Domain Adversarial Neural Network Based on Knowledge Distillation for EEG-based Cross-subject Emotion Recognition

Arxiv

0+阅读 · 2023年5月12日

SUR-adapter: Enhancing Text-to-Image Pre-trained Diffusion Models with Large Language Models

Arxiv

0+阅读 · 2023年5月12日

New constructions of optimal linear codes from simplicial complexes

Arxiv

0+阅读 · 2023年5月12日

Foundations of Spatial Perception for Robotics: Hierarchical Representations and Real-time Systems

Arxiv

0+阅读 · 2023年5月11日

Nested Hierarchical Transformer: Towards Accurate, Data-Efficient and Interpretable Visual Understanding

Arxiv

12+阅读 · 2021年12月30日

SDD-FIQA: Unsupervised Face Image Quality Assessment with Similarity Distribution Distance

Arxiv

13+阅读 · 2021年3月10日

Adversarial Mutual Information for Text Generation

Adversarial Mutual Information for Text Generation

Arxiv

13+阅读 · 2020年6月30日

VIP会员

文章信息

相关主题

文本到图像生成

相关VIP内容

【AAAI2023】用于复杂场景图像合成的特征金字塔扩散模型

【AAAI2023】用于复杂场景图像合成的特征金字塔扩散模型

专知会员服务

22+阅读 · 2022年12月5日

【硬核书】矩阵代数基础，248页pdf

【硬核书】矩阵代数基础，248页pdf

专知会员服务

88+阅读 · 2021年12月9日

【文本生成现代方法】Modern Methods for Text Generation

【文本生成现代方法】Modern Methods for Text Generation

专知会员服务

44+阅读 · 2020年9月11日

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

82+阅读 · 2020年7月26日

【CVPR2020】通过自适应GANs生成不同的图像，Diverse Image Generation via Self-Conditioned GANs

【CVPR2020】通过自适应GANs生成不同的图像，Diverse Image Generation via Self-Conditioned GANs

专知会员服务

34+阅读 · 2020年6月19日

语言视觉预训练语言模型揭密，Behind the Scene: Revealing the Secrets of Pre-trained Vision-and-Language Models

语言视觉预训练语言模型揭密，Behind the Scene: Revealing the Secrets of Pre-trained Vision-and-Language Models

专知会员服务

36+阅读 · 2020年5月20日

【ACL2020】对抗性文本生成，Improving Adversarial Text Generation

专知会员服务

52+阅读 · 2020年5月5日

【AAAI2020】知识图谱的生成式对抗零样本关系学习，Generative Adversarial Zero-Shot Relational Learning for Knowledge Graphs

【AAAI2020】知识图谱的生成式对抗零样本关系学习，Generative Adversarial Zero-Shot Relational Learning for Knowledge Graphs

专知会员服务

64+阅读 · 2020年1月11日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

热门VIP内容

开通专知VIP会员享更多权益服务

自动驾驶轨迹规划中的基础模型：进展综述与开放挑战

《用于提升多域战备的大型语言模型辅助场景生成器》报告

【斯坦福博士论文】为人类使用优化 AI 模型

国防领域人工智能规模化应用的理论与实践

相关资讯

VCIP 2022 Call for Demos

VCIP 2022 Call for Demos

CCF多媒体专委会

1+阅读 · 2022年6月6日

RoBERTa for Chinese：大规模中文预训练RoBERTa模型

RoBERTa for Chinese：大规模中文预训练RoBERTa模型

AINLP

30+阅读 · 2019年9月8日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

【论文推荐】最新八篇生成对抗网络相关论文—条件翻译、RGB-D动作识别、量子生成对抗网络、语义对齐、视频摘要、视觉-文本注意力

【论文推荐】最新八篇生成对抗网络相关论文—条件翻译、RGB-D动作识别、量子生成对抗网络、语义对齐、视频摘要、视觉-文本注意力

专知

15+阅读 · 2018年5月15日

【论文推荐】最新六篇对抗自编码器相关论文—多尺度网络节点表示、生成对抗自编码、逆映射、Wasserstein、条件对抗、去噪

【论文推荐】最新六篇对抗自编码器相关论文—多尺度网络节点表示、生成对抗自编码、逆映射、Wasserstein、条件对抗、去噪

专知

20+阅读 · 2018年4月7日

【论文推荐】最新5篇图像描述生成（Image Caption）相关论文—情感、注意力机制、遥感图像、序列到序列、深度神经结构

【论文推荐】最新5篇图像描述生成（Image Caption）相关论文—情感、注意力机制、遥感图像、序列到序列、深度神经结构

专知

66+阅读 · 2018年1月31日

最新5篇生成对抗网络相关论文推荐—FusedGAN、DeblurGAN、AdvGAN、CipherGAN、MMD GANS

最新5篇生成对抗网络相关论文推荐—FusedGAN、DeblurGAN、AdvGAN、CipherGAN、MMD GANS

专知

23+阅读 · 2018年1月18日

Generative Adversarial Text to Image Synthesis论文解读

Generative Adversarial Text to Image Synthesis论文解读

统计学习与视觉计算组

13+阅读 · 2017年6月9日

相关论文

Optimal Decision Trees For Interpretable Clustering with Constraints (Extended Version)

Arxiv

0+阅读 · 2023年5月16日

Multi-modal Visual Understanding with Prompts for Semantic Information Disentanglement of Image

Arxiv

0+阅读 · 2023年5月16日

A Conditional Denoising Diffusion Probabilistic Model for Radio Interferometric Image Reconstruction

Arxiv

0+阅读 · 2023年5月16日

A Lightweight Domain Adversarial Neural Network Based on Knowledge Distillation for EEG-based Cross-subject Emotion Recognition

Arxiv

0+阅读 · 2023年5月12日

SUR-adapter: Enhancing Text-to-Image Pre-trained Diffusion Models with Large Language Models

Arxiv

0+阅读 · 2023年5月12日

New constructions of optimal linear codes from simplicial complexes

Arxiv

0+阅读 · 2023年5月12日

Foundations of Spatial Perception for Robotics: Hierarchical Representations and Real-time Systems

Arxiv

0+阅读 · 2023年5月11日

Nested Hierarchical Transformer: Towards Accurate, Data-Efficient and Interpretable Visual Understanding

Arxiv

12+阅读 · 2021年12月30日

SDD-FIQA: Unsupervised Face Image Quality Assessment with Similarity Distribution Distance

Arxiv

13+阅读 · 2021年3月10日

Adversarial Mutual Information for Text Generation

Adversarial Mutual Information for Text Generation

Arxiv

13+阅读 · 2020年6月30日

相关基金

miR-148a调控Wnt5A拮抗剂基因甲基化修饰在结核感染过程中的机制研究

国家自然科学基金

0+阅读 · 2014年12月31日

PPAR β/δ基因在结直肠癌血管生成调控中的作用及分子机理

国家自然科学基金

2+阅读 · 2014年12月31日

NiO-GaN组合激光器件的电致激光发射机理研究

国家自然科学基金

0+阅读 · 2013年12月31日

蓝宝石外延生长GaN厚膜的自分离及断裂机理研究

国家自然科学基金

0+阅读 · 2012年12月31日

循环载荷下单晶高温合金变形和裂纹形核扩展的位错机理研究

国家自然科学基金

0+阅读 · 2012年12月31日

基于复杂网络的中文文本语义相似度研究

国家自然科学基金

3+阅读 · 2012年12月31日

带有特定类型非线性微分算子的常微分方程边值问题研究

国家自然科学基金

0+阅读 · 2012年12月31日

基于语义的图像合成

国家自然科学基金

0+阅读 · 2011年12月31日

不同断裂形式的宏细观机理及裂纹扩展方向研究

国家自然科学基金

0+阅读 · 2009年12月31日

汉语句子理解中语义和句法整合的认知神经机制

国家自然科学基金

0+阅读 · 2009年12月31日

微信扫码咨询专知VIP会员