We find that DALLE-2 appears to have a hidden vocabulary that can be used to generate images from seemingly absurd prompts. For example, \texttt{Apoploe vesrreaitais} appears to mean birds, and \texttt{Contarra ccetnxniams luryca tanniounons} (sometimes) means bugs or pests. These prompts often produce consistent results in isolation, and sometimes also in combination. We present a black-box method for discovering words that seem random but nonetheless have some correspondence to visual concepts. This raises important security and interpretability challenges.