Recent progress in generative models, especially in text-guided diffusion models, has enabled the production of aesthetically pleasing imagery resembling the works of professional human artists. However, one has to carefully compose the textual description, called the prompt, and augment it with a set of clarifying keywords. Since aesthetics are challenging to evaluate computationally, human feedback is needed to determine the optimal prompt formulation and keyword combination. In this paper, we present a human-in-the-loop approach to learning the most useful combination of prompt keywords using a genetic algorithm. We also show how such an approach can improve the aesthetic appeal of images generated from the same descriptions.