Hair editing is an interesting and challenging problem in computer vision and graphics. Many existing methods require well-drawn sketches or masks as conditional inputs for editing; however, these interactions are neither straightforward nor efficient. To free users from this tedious interaction process, this paper proposes a new hair editing interaction mode that enables manipulating hair attributes individually or jointly, conditioned on texts or reference images provided by users. To this end, we encode the image and text conditions in a shared embedding space and propose a unified hair editing framework by leveraging the powerful image-text representation capability of the Contrastive Language-Image Pre-Training (CLIP) model. With carefully designed network structures and loss functions, our framework performs high-quality hair editing in a disentangled manner. Extensive experiments demonstrate the superiority of our approach in terms of manipulation accuracy, visual realism of the editing results, and preservation of irrelevant attributes. The project repository is available at https://github.com/wty-ustc/HairCLIP.
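To illustrate the idea of encoding text and image conditions in a shared embedding space, the sketch below uses OpenAI's public `clip` package to embed a hairstyle description and a reference photo with the same CLIP model. This is only a minimal, assumed illustration of the conditioning step, not the authors' actual mapper network; the prompt text and the file path `reference_hair.jpg` are placeholders.

```python
# Minimal sketch: embedding a text prompt and a reference image into
# CLIP's shared space, as a conditioning signal for hair editing.
# Requires: pip install torch git+https://github.com/openai/CLIP.git
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Text condition, e.g. a target hairstyle description (placeholder prompt).
text_tokens = clip.tokenize(["curly bob hairstyle"]).to(device)

# Image condition, e.g. a reference photo of the desired hair.
# "reference_hair.jpg" is a placeholder path for illustration.
image = preprocess(Image.open("reference_hair.jpg")).unsqueeze(0).to(device)

with torch.no_grad():
    text_emb = model.encode_text(text_tokens)   # shape: (1, 512)
    image_emb = model.encode_image(image)       # shape: (1, 512)

# Both embeddings live in the same 512-d space after L2 normalization,
# so either one can serve as the conditioning vector for an editing network.
text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)
image_emb = image_emb / image_emb.norm(dim=-1, keepdim=True)
```

Because both modalities are mapped into one space, a single downstream editing module can consume either kind of condition interchangeably, which is what enables the unified text-or-reference-image interaction described above.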