The strength of modern generative models lies in their ability to be controlled through text-based prompts. Typical "hard" prompts are made from interpretable words and tokens, and must be hand-crafted by humans. There are also "soft" prompts, which consist of continuous feature vectors. These can be discovered using powerful optimization methods, but they cannot be easily interpreted, re-used across models, or plugged into a text-based interface. We describe an approach to robustly optimize hard text prompts through efficient gradient-based optimization. Our approach automatically generates hard text-based prompts for both text-to-image and text-to-text applications. In the text-to-image setting, the method creates hard prompts for diffusion models, allowing API users to easily generate, discover, and mix and match image concepts without prior knowledge of how to prompt the model. In the text-to-text setting, we show that effective hard prompts can be automatically discovered for tuning language models (LMs) on classification tasks.
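To make the idea of gradient-based optimization of hard prompts concrete, below is a minimal, self-contained sketch of one way such a scheme can be set up: a continuous "soft" prompt is updated with gradients, but the loss is evaluated at its projection onto the nearest vocabulary embeddings so the result always decodes to discrete tokens. The toy loss, the projection helper, and all names here are illustrative assumptions, not the paper's exact algorithm or hyperparameters; in practice the frozen embedding table and the loss would come from the real model (e.g., a CLIP similarity for text-to-image).

```python
# Hedged sketch: gradient-based search for a discrete ("hard") prompt.
# All components below are toy stand-ins, not the paper's implementation.
import torch

torch.manual_seed(0)
vocab_size, dim, prompt_len = 1000, 64, 8

# Frozen stand-ins for a real model: a token-embedding table and a toy objective.
embedding_table = torch.randn(vocab_size, dim)
target_direction = torch.randn(dim)  # toy target; a real loss would score task performance

def loss_fn(prompt_embeds: torch.Tensor) -> torch.Tensor:
    # Placeholder task loss: reward alignment of the prompt with a target direction.
    return -(prompt_embeds.mean(dim=0) @ target_direction)

def project_to_tokens(soft: torch.Tensor):
    # Snap each continuous vector to its nearest neighbor in the embedding table.
    dists = torch.cdist(soft, embedding_table)      # (prompt_len, vocab_size)
    token_ids = dists.argmin(dim=1)
    return embedding_table[token_ids], token_ids

# Initialize the soft prompt from random vocabulary embeddings.
soft_prompt = embedding_table[torch.randint(vocab_size, (prompt_len,))].clone()
soft_prompt.requires_grad_(True)
opt = torch.optim.Adam([soft_prompt], lr=0.1)

for step in range(200):
    hard_embeds, _ = project_to_tokens(soft_prompt.detach())
    # Evaluate the loss at the projected (hard) point, but let the gradient
    # flow to the continuous soft prompt (straight-through style update).
    hard_embeds = hard_embeds + (soft_prompt - soft_prompt.detach())
    loss = loss_fn(hard_embeds)
    opt.zero_grad()
    loss.backward()
    opt.step()

_, final_tokens = project_to_tokens(soft_prompt.detach())
print("discovered hard-token ids:", final_tokens.tolist())
```

The resulting token ids can be decoded with the model's tokenizer into a plain-text prompt, which is what makes the output interpretable, reusable across models, and usable through a text-only API, in contrast to soft prompts.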