Despite the progress made in style transfer, most previous work focuses on transferring only relatively simple features such as color or texture, while overlooking more abstract concepts such as overall artistic expression or painter-specific traits. However, these abstract semantics can be captured by models like DALL-E or CLIP, which have been trained on huge datasets of images and text. In this paper, we propose StylerDALLE, a style transfer method that exploits both of these models and uses natural language to describe abstract art styles. Specifically, we formulate the language-guided style transfer task as non-autoregressive token sequence translation, i.e., from the input content image to the output stylized image, in the discrete latent space of a large-scale pretrained vector-quantized tokenizer. To incorporate style information, we propose a Reinforcement Learning strategy with CLIP-based language supervision that ensures stylization and content preservation simultaneously. Experimental results demonstrate the superiority of our method, which can effectively transfer art styles following language instructions at different granularities. Code is available at https://github.com/zipengxuc/StylerDALLE.
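The CLIP-based language supervision described above can be illustrated with a minimal sketch. This is a hypothetical reward function, not the paper's actual objective: it assumes image and text embeddings come from CLIP's encoders, and combines a style-alignment term (stylized image vs. style prompt) with a content-preservation term (stylized image vs. content image) via a hypothetical weight `lam`.

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity, as used by CLIP to compare embeddings."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def style_transfer_reward(stylized_emb: np.ndarray,
                          style_text_emb: np.ndarray,
                          content_img_emb: np.ndarray,
                          lam: float = 0.5) -> float:
    """Hypothetical RL reward balancing stylization and content preservation.

    stylized_emb:    CLIP image embedding of the generated stylized image
    style_text_emb:  CLIP text embedding of the style instruction
    content_img_emb: CLIP image embedding of the input content image
    lam:             trade-off weight (illustrative, not from the paper)
    """
    style_score = cosine(stylized_emb, style_text_emb)      # stylization
    content_score = cosine(stylized_emb, content_img_emb)   # content preservation
    return lam * style_score + (1.0 - lam) * content_score
```

Such a scalar reward can supervise the non-autoregressive token decoder via policy-gradient updates, since the discrete token sampling step is not differentiable.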