Most existing image restoration methods use neural networks to learn strong image-level priors from large-scale data in order to estimate the missing information. However, these methods still struggle when the degraded image suffers a severe information deficit. Introducing external priors or using reference images to supply the missing information is also limited in its applicable domains. In contrast, text input is more readily available and conveys information with greater flexibility. In this work, we design an effective framework that lets the user guide the restoration of degraded images with text descriptions. We exploit the text-image feature compatibility of CLIP to ease the difficulty of fusing text and image features. Our framework can be applied to various image restoration tasks, including image inpainting, image super-resolution, and image colorization. Extensive experiments demonstrate the effectiveness of our method.
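To make the fusion idea concrete, below is a minimal sketch (not the authors' implementation) of how a CLIP text embedding, which lives in the same joint space as CLIP image features, can condition a restoration network. The `RestorationNet` module and the additive fusion scheme are hypothetical illustrations; only the CLIP feature extraction uses the real `transformers` API.

```python
# Minimal sketch: conditioning a restoration network on CLIP text features.
# "RestorationNet" and the additive fusion are assumptions for illustration,
# not the paper's actual architecture.
import torch
import torch.nn as nn
from transformers import CLIPModel, CLIPProcessor

clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

class RestorationNet(nn.Module):
    """Toy encoder-decoder that fuses a CLIP text embedding with image features."""
    def __init__(self, text_dim=512, feat_dim=64):
        super().__init__()
        self.encoder = nn.Conv2d(3, feat_dim, 3, padding=1)
        # Project the text embedding into the image feature space before fusing.
        self.text_proj = nn.Linear(text_dim, feat_dim)
        self.decoder = nn.Conv2d(feat_dim, 3, 3, padding=1)

    def forward(self, degraded, text_emb):
        feats = self.encoder(degraded)                      # (B, C, H, W)
        cond = self.text_proj(text_emb)[:, :, None, None]   # broadcast over spatial dims
        return self.decoder(feats + cond)                   # additive fusion, one simple choice

# Usage: describe the missing content in text, then restore.
with torch.no_grad():
    tokens = processor(text=["a red brick house with green shutters"],
                       return_tensors="pt", padding=True)
    text_emb = clip.get_text_features(**tokens)             # (1, 512) in CLIP's joint space

net = RestorationNet()
degraded = torch.rand(1, 3, 224, 224)                       # placeholder degraded image
restored = net(degraded, text_emb)
```

Because CLIP aligns text and image embeddings in one space, the projected text vector can be injected into image features directly, which is the compatibility property the abstract appeals to.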