Encoder-based Domain Tuning for Fast Personalization of Text-to-Image Models

Text-to-image personalization aims to teach a pre-trained diffusion model to reason about novel, user-provided concepts, embedding them into new scenes guided by natural language prompts. However, current personalization approaches struggle with lengthy training times, high storage requirements, or loss of identity. To overcome these limitations, we propose an encoder-based domain-tuning approach. Our key insight is that by underfitting on a large set of concepts from a given domain, we can improve generalization and create a model that is more amenable to quickly adding novel concepts from the same domain. Specifically, we employ two components. First, an encoder that takes as input a single image of a target concept from a given domain, e.g. a specific face, and learns to map it into a word embedding representing the concept. Second, a set of regularized weight offsets for the text-to-image model that learn how to effectively ingest additional concepts. Together, these components are used to guide the learning of unseen concepts, allowing us to personalize a model using only a single image and as few as 5 training steps, accelerating personalization from dozens of minutes to seconds while preserving quality.
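To make the two components concrete, here is a minimal toy sketch in plain Python. It is not the paper's architecture or code. It assumes a frozen square matrix `W` standing in for the pre-trained model, a hypothetical "encoder" that simply scales the image features by 0.8 to produce an initial concept embedding, and an L2 penalty (`lam`) as the regularizer on the weight offsets `dW`. All names, values, and the quadratic loss are illustrative assumptions. The sketch only shows why a good encoder initialization can make a handful of tuning steps sufficient.

```python
# Toy illustration (assumption-laden): encoder-initialized embedding plus
# L2-regularized weight offsets, tuned for a few gradient steps.

def matvec(A, v):
    """Multiply matrix A (list of rows) by vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def loss_fn(e, dW, W, target, lam):
    """Squared reconstruction error plus L2 penalty on the offsets dW."""
    A = [[w + d for w, d in zip(wr, dr)] for wr, dr in zip(W, dW)]
    r = [p - t for p, t in zip(matvec(A, e), target)]
    reg = sum(d * d for row in dW for d in row)
    return sum(x * x for x in r) + lam * reg

def personalize(e_init, W, target, steps=5, lr=0.05, lam=0.1):
    """Run `steps` gradient steps on the embedding e and the offsets dW.

    W is assumed square (n x n) and stays frozen; only e and dW change.
    """
    n = len(e_init)
    e = list(e_init)
    dW = [[0.0] * n for _ in range(n)]
    history = [loss_fn(e, dW, W, target, lam)]
    for _ in range(steps):
        A = [[w + d for w, d in zip(wr, dr)] for wr, dr in zip(W, dW)]
        r = [p - t for p, t in zip(matvec(A, e), target)]
        # Gradient w.r.t. the embedding: 2 * A^T r
        grad_e = [2.0 * sum(A[i][j] * r[i] for i in range(n)) for j in range(n)]
        # Gradient w.r.t. the offsets: 2 * r e^T + 2 * lam * dW
        grad_dW = [[2.0 * r[i] * e[j] + 2.0 * lam * dW[i][j] for j in range(n)]
                   for i in range(n)]
        e = [ej - lr * g for ej, g in zip(e, grad_e)]
        dW = [[dW[i][j] - lr * grad_dW[i][j] for j in range(n)] for i in range(n)]
        history.append(loss_fn(e, dW, W, target, lam))
    return e, dW, history

# Hypothetical data: 4-dimensional "image features" of one new concept.
features = [1.0, 0.5, -0.5, 2.0]
W = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]  # frozen model
target = list(features)

# Stand-in encoder: predicts an embedding close to, but not exactly, the target.
e_encoder = [0.8 * f for f in features]
e_zero = [0.0] * 4  # baseline without an encoder

_, _, hist_enc = personalize(e_encoder, W, target)
_, _, hist_zero = personalize(e_zero, W, target)
print("encoder init: loss before/after 5 steps:", hist_enc[0], hist_enc[-1])
print("zero init:    loss after 5 steps:", hist_zero[-1])
```

In this toy setting, starting from the encoder's prediction leaves only a small residual for the five steps to correct, while the zero-initialized baseline is still far from the target after the same budget. The real method operates on a diffusion model and learned encoder, which this sketch does not attempt to reproduce.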
