Schrödinger's Bat: Diffusion Models Sometimes Generate Polysemous Words in Superposition

Recent work has shown that despite their impressive capabilities, text-to-image diffusion models such as DALL-E 2 (Ramesh et al., 2022) can display strange behaviours when a prompt contains a word with multiple possible meanings, often generating images containing both senses of the word (Rassin et al., 2022). In this work we seek to put forward a possible explanation of this phenomenon. Using the similar Stable Diffusion model (Rombach et al., 2022), we first show that when given an input that is the sum of encodings of two distinct words, the model can produce an image containing both concepts represented in the sum. We then demonstrate that the CLIP encoder used to encode prompts (Radford et al., 2021) encodes polysemous words as a superposition of meanings, and that using linear algebraic techniques we can edit these representations to influence the senses represented in the generated images. Combining these two findings, we suggest that the homonym duplication phenomenon described by Rassin et al. (2022) is caused by diffusion models producing images representing both of the meanings that are present in superposition in the encoding of a polysemous word.
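The two linear-algebraic ideas in the abstract (an encoding that is a sum of two meanings, and editing that encoding to suppress one sense) can be illustrated with a toy sketch. Everything below is an assumption made for illustration: the 4-dimensional vectors, the `animal` and `sports` sense directions, the mixing weights, and the `project_out` helper are hypothetical and are not taken from the paper, from CLIP, or from Stable Diffusion. Orthogonal projection is used here only as one plausible example of a linear edit; the abstract does not specify which technique the authors use.

```python
import numpy as np


def unit(v):
    """Return v scaled to unit length."""
    return v / np.linalg.norm(v)


def project_out(encoding, direction):
    """Remove the component of `encoding` that lies along `direction`."""
    d = unit(direction)
    return encoding - np.dot(encoding, d) * d


# Hypothetical, hand-made sense directions (NOT real CLIP embeddings).
animal = np.array([1.0, 0.0, 0.0, 0.0])
sports = np.array([0.0, 1.0, 0.0, 0.0])

# Toy encoding of the polysemous word "bat" as a superposition (weighted sum)
# of the two senses. The weights are arbitrary illustration values.
bat = 0.6 * animal + 0.8 * sports

# Linear edit: project out the sports sense, leaving only the animal sense.
bat_edited = project_out(bat, sports)

# Measure how much of each sense is present before and after the edit.
animal_before = float(np.dot(bat, unit(animal)))
sports_before = float(np.dot(bat, unit(sports)))
animal_after = float(np.dot(bat_edited, unit(animal)))
sports_after = float(np.dot(bat_edited, unit(sports)))
```

In this toy setting the edit removes the `sports` component while leaving the `animal` component unchanged, because the two hypothetical directions are orthogonal. Real encoder representations are high-dimensional and sense directions need not be orthogonal, so an edit of this kind would generally affect other components as well.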

