Despite promising results, style transfer requires preparing style images in advance, which can limit creativity and accessibility. Following human instructions, on the other hand, is the most natural way to perform artistic style transfer and can significantly improve controllability for visual-effect applications. We introduce a new task, language-driven image style transfer (\texttt{LDIST}), which manipulates the style of a content image guided by a text instruction. We propose the contrastive language visual artist (CLVA), which learns to extract visual semantics from style instructions and accomplishes \texttt{LDIST} with a patch-wise style discriminator. The discriminator considers the correlation between language and patches of style images or transferred results to jointly embed style instructions. CLVA further compares contrastive pairs of content images and style instructions to strengthen the mutual relationships between transferred results: results from the same content image should preserve consistent content structures, and results from style instructions with similar visual semantics should present analogous style patterns. Experiments show that CLVA is effective and achieves superb transferred results on \texttt{LDIST}.
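To make the idea of correlating language with image patches concrete, the following toy sketch scores how well a style instruction matches an image patch by patch. It is not CLVA's discriminator, which is a learned network. Here we simply assume that some encoder has already produced a grid of feature vectors for the image and a single embedding vector for the instruction, and the helper names (`cosine`, `patch_embeddings`, `patch_text_score`) are hypothetical. Each non-overlapping patch is average-pooled, compared with the text embedding by cosine similarity, and the per-patch scores are averaged.

```python
import math
from typing import List

Vector = List[float]
FeatureMap = List[List[Vector]]  # H x W grid of D-dimensional feature vectors (assumed input)


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def patch_embeddings(fmap: FeatureMap, patch: int) -> List[Vector]:
    """Average-pool each non-overlapping patch x patch window into one vector."""
    h, w, dim = len(fmap), len(fmap[0]), len(fmap[0][0])
    out: List[Vector] = []
    for top in range(0, h - patch + 1, patch):
        for left in range(0, w - patch + 1, patch):
            acc = [0.0] * dim
            for r in range(top, top + patch):
                for c in range(left, left + patch):
                    for k in range(dim):
                        acc[k] += fmap[r][c][k]
            out.append([v / (patch * patch) for v in acc])
    return out


def patch_text_score(fmap: FeatureMap, text_emb: Vector, patch: int) -> float:
    """Hypothetical patch-wise matching score: mean cosine(patch, instruction)."""
    patches = patch_embeddings(fmap, patch)
    return sum(cosine(p, text_emb) for p in patches) / len(patches)


if __name__ == "__main__":
    # Left half of the map "matches" the instruction embedding, right half does not.
    half = [[[1.0, 0.0] if c < 2 else [0.0, 1.0] for c in range(4)] for _ in range(4)]
    print(patch_text_score(half, [1.0, 0.0], patch=2))  # 0.5
```

Scoring per patch rather than on the whole image means a result is rewarded only when the instructed style appears across local regions, which is the intuition behind a patch-wise discriminator; a learned model would replace the fixed cosine score with trainable layers.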