With the advent of deep learning, text-guided image retouching techniques have emerged in the multi-modal domain. Most existing text-guided methods, however, rely on object-level supervision to constrain the region that may be modified. This not only makes these algorithms harder to develop, but also limits how widely deep learning can be applied to image retouching. To address this concern, we propose a text-guided, mask-free image retouching approach that yields consistent results. To retouch images without mask supervision, our method constructs plausible, edge-sharp masks from the text for each object in the image. Extensive experiments show that our method produces high-quality, accurate images guided by natural language. The source code will be released soon.
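As a rough, purely illustrative sketch of deriving an editing mask from text alone (this is not the method described above; the encoder choice, function name, temperature, and min-max normalization are all assumptions for the example), one could score each spatial image feature against a text embedding from a pretrained vision-language encoder and normalize the result into a soft per-pixel mask:

```python
import torch
import torch.nn.functional as F

def text_to_soft_mask(text_emb: torch.Tensor,
                      img_feats: torch.Tensor,
                      temperature: float = 0.07) -> torch.Tensor:
    """Illustrative only: text_emb is a (D,) text embedding and img_feats is an
    (H, W, D) grid of image features, both assumed to come from some pretrained
    vision-language encoder (e.g. a CLIP-style model)."""
    H, W, D = img_feats.shape
    # Cosine similarity between the text query and every spatial feature.
    sim = F.cosine_similarity(
        img_feats.reshape(H * W, D),
        text_emb.unsqueeze(0).expand(H * W, D),
        dim=-1,
    )
    # Softmax over spatial positions gives a normalized attention map;
    # rescaling it to [0, 1] yields a soft editing mask.
    attn = torch.softmax(sim / temperature, dim=0).reshape(H, W)
    mask = (attn - attn.min()) / (attn.max() - attn.min() + 1e-8)
    return mask

# Example with random tensors standing in for real encoder outputs.
mask = text_to_soft_mask(torch.randn(512), torch.randn(16, 16, 512))
print(mask.shape)  # torch.Size([16, 16])
```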