This paper proposes a method for generating images of customized objects specified by users. The method is built on a general framework that bypasses the lengthy optimization required by previous approaches, which typically follow a per-object optimization paradigm. Our framework adopts an encoder to capture high-level identifiable semantics of objects, producing an object-specific embedding in a single feed-forward pass. The acquired object embedding is then passed to a text-to-image synthesis model for subsequent generation. To effectively blend the object-aware embedding space into a well-developed text-to-image model under the same generation context, we investigate different network designs and training strategies, and propose a simple yet effective regularized joint training scheme with an object identity preservation loss. Additionally, we propose a caption generation scheme that proves critical to faithfully reflecting the object-specific embedding in the generation process while retaining control and editing abilities. Once trained, the network is able to produce diverse content and styles, conditioned on both texts and objects. We demonstrate through experiments that our method synthesizes images with compelling output quality, appearance diversity, and object fidelity, without the need for test-time optimization. Systematic studies are also conducted to analyze our models, providing insights for future work.
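For illustration only, the following is a minimal sketch (not the authors' code) of the pipeline the abstract describes: an image encoder maps an object image to an embedding in one forward pass, that embedding is injected into the text-conditioning sequence of a text-to-image model, and training combines a denoising loss with an identity-preservation regularizer. All module names, dimensions, and the exact form of the identity loss are assumptions made for this sketch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ObjectEncoder(nn.Module):
    """Maps an object image to a single object-specific embedding (one forward pass)."""
    def __init__(self, embed_dim: int = 768):
        super().__init__()
        # Stand-in for a pretrained vision backbone; the real system would reuse
        # an existing image encoder rather than this toy CNN.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=4), nn.GELU(),
            nn.Conv2d(64, 256, 4, stride=4), nn.GELU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.proj = nn.Linear(256, embed_dim)  # project into the text-embedding space

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        return self.proj(self.backbone(image))  # shape: (batch, embed_dim)


def inject_object_token(text_embeds: torch.Tensor,
                        obj_embed: torch.Tensor,
                        slot: int) -> torch.Tensor:
    """Replace a placeholder token embedding in the caption with the object embedding."""
    out = text_embeds.clone()
    out[:, slot, :] = obj_embed
    return out


def joint_loss(noise_pred: torch.Tensor,
               noise_target: torch.Tensor,
               obj_embed: torch.Tensor,
               obj_embed_ref: torch.Tensor,
               lambda_id: float = 0.1) -> torch.Tensor:
    """Hypothetical regularized joint objective: denoising loss plus an
    identity-preservation term that keeps the learned object embedding close
    to a frozen reference embedding."""
    denoise = F.mse_loss(noise_pred, noise_target)
    identity = F.mse_loss(obj_embed, obj_embed_ref.detach())
    return denoise + lambda_id * identity
```

In this sketch, the object embedding simply overwrites one token position in the caption's embedding sequence; the paper's caption generation scheme and the concrete conditioning mechanism may differ.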