Automatic image editing is in high demand owing to its numerous applications, and natural language instructions are essential for the flexible, intuitive editing that users imagine. StyleCLIP, a pioneering work in text-driven image editing, finds an edit direction in the CLIP space and then edits the image by mapping that direction to the StyleGAN space. However, it is difficult to tune the additional inputs that such methods require beyond the original image and the text instruction. In this study, we propose a method that constructs the edit direction adaptively in the StyleGAN and CLIP spaces with an SVM. Our model represents the edit direction as a normal vector in the CLIP space, obtained by training an SVM to classify positive and negative images. These images are retrieved, according to the CLIP similarity between each image and the text instruction, from a large-scale image corpus originally used to pre-train StyleGAN. We confirmed that our model performs as well as the StyleCLIP baseline while requiring only simple inputs and no additional computational time.
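The pipeline described above (retrieve positive/negative images by CLIP similarity to the instruction, then take the normal vector of a linear SVM as the edit direction) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the corpus size, the number of retrieved images, and the embeddings themselves (random vectors standing in for real 512-dimensional CLIP features) are all placeholder assumptions.

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

# Placeholder embeddings: in the real method these would be CLIP image
# features of the pre-training corpus and the CLIP text feature of the
# instruction (e.g. 512-d vectors from a CLIP ViT-B/32 encoder).
corpus = rng.normal(size=(1000, 512))
corpus /= np.linalg.norm(corpus, axis=1, keepdims=True)
text = rng.normal(size=512)
text /= np.linalg.norm(text)

# Retrieve positives/negatives by cosine similarity to the instruction;
# the cutoff of 100 images per class is an arbitrary choice here.
sim = corpus @ text
order = np.argsort(sim)
pos, neg = corpus[order[-100:]], corpus[order[:100]]

# Train a linear SVM to separate the two sets; the (normalized) weight
# vector is the normal of the decision boundary, i.e. the edit direction
# in CLIP space.
X = np.vstack([pos, neg])
y = np.concatenate([np.ones(len(pos)), np.zeros(len(neg))])
svm = LinearSVC(C=1.0).fit(X, y)
direction = svm.coef_[0] / np.linalg.norm(svm.coef_[0])
```

The resulting `direction` would then be mapped to the StyleGAN latent space to perform the actual edit, analogously to StyleCLIP's global-direction approach.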