Text-to-image generation models represent the next step of evolution in image synthesis, offering a natural means of flexible yet fine-grained control over the result. One emerging area of research is the rapid adaptation of large text-to-image models to smaller datasets or new visual concepts. However, the most efficient method of adaptation, called textual inversion, suffers from long training times, which both restricts practical applications and slows down research experiments. In this work, we study the training dynamics of textual inversion, aiming to speed it up. We observe that most concepts are learned at early stages and do not improve later, but standard model convergence metrics fail to indicate this. To address it, we propose a simple early stopping criterion that only requires evaluating the textual inversion loss on a fixed set of inputs at every training iteration. Our experiments on both Latent Diffusion and Stable Diffusion models for 93 concepts demonstrate the competitive performance of our method, speeding up adaptation by up to 15 times with no significant drops in quality.
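To make the criterion concrete, below is a minimal PyTorch sketch of one way such a stopping rule could work. A toy model stands in for the frozen diffusion backbone, training steps use freshly sampled noise (as in standard textual inversion), while the stopping signal is the loss re-evaluated on one fixed input. The plateau test on a sliding window, and its `window` and `threshold` values, are illustrative assumptions, not the paper's exact procedure or tuned constants.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Toy stand-in for the frozen model: the only trainable parameter is
# the new token embedding, mirroring the textual inversion setup.
W = torch.randn(16, 16)                  # frozen "model" weights
target = torch.randn(16)                 # concept to be recovered
embedding = torch.zeros(16, requires_grad=True)
opt = torch.optim.Adam([embedding], lr=1e-2)

def denoise_loss(emb, noise):
    # Stand-in for the denoising objective; in the real setting the
    # inputs would be latents, Gaussian noise, and timesteps passed
    # through the frozen U-Net together with the learned embedding.
    return F.mse_loss(W @ emb + noise, W @ target)

# Fix ONE evaluation input before training and reuse it at every
# iteration, so the evaluation curve is smooth and comparable
# across steps instead of being dominated by sampling noise.
fixed_noise = torch.randn(16) * 0.1

def should_stop(history, window=50, threshold=0.1):
    # Plateau test: stop once the recent segment of the fixed-input
    # loss curve is nearly flat relative to its overall variation.
    if len(history) < 2 * window:
        return False
    h = torch.tensor(history)
    return (h[-window:].var() / (h.var() + 1e-12)).item() < threshold

history = []
for step in range(5000):
    opt.zero_grad()
    # Stochastic training loss: fresh noise each step, as usual.
    loss = denoise_loss(embedding, torch.randn(16) * 0.1)
    loss.backward()
    opt.step()
    # Deterministic monitoring loss: always the same fixed input.
    with torch.no_grad():
        history.append(denoise_loss(embedding, fixed_noise).item())
    if should_stop(history):
        print(f"stopping early at step {step}")
        break
```

The key design point is the separation of the two losses: the training loss stays stochastic, while the monitoring loss is computed on frozen inputs, so its flattening reflects convergence of the embedding rather than noise in the objective.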