Iterative Prompt Learning for Unsupervised Backlit Image Enhancement

We propose a novel unsupervised backlit image enhancement method, abbreviated as CLIP-LIT, by exploring the potential of Contrastive Language-Image Pre-Training (CLIP) for pixel-level image enhancement. We show that the open-world CLIP prior not only aids in distinguishing between backlit and well-lit images, but also in perceiving heterogeneous regions with different luminance, facilitating the optimization of the enhancement network. Unlike high-level and image manipulation tasks, directly applying CLIP to enhancement tasks is non-trivial, owing to the difficulty in finding accurate prompts. To solve this issue, we devise a prompt learning framework that first learns an initial prompt pair by constraining the text-image similarity between the prompt (negative/positive sample) and the corresponding image (backlit image/well-lit image) in the CLIP latent space. Then, we train the enhancement network based on the text-image similarity between the enhanced result and the initial prompt pair. To further improve the accuracy of the initial prompt pair, we iteratively fine-tune the prompt learning framework to reduce the distribution gaps between the backlit images, enhanced results, and well-lit images via rank learning, boosting the enhancement performance. Our method alternates between updating the prompt learning framework and enhancement network until visually pleasing results are achieved. Extensive experiments demonstrate that our method outperforms state-of-the-art methods in terms of visual quality and generalization ability, without requiring any paired data.
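The three similarity-based objectives described above (initial prompt learning, enhancement training, and rank-based prompt refinement) can be illustrated with a small sketch. The abstract does not state the loss formulas, so everything below is an assumption made for illustration: the softmax over cosine similarities, the cross-entropy form, the hinge-style ranking with a `margin` of 0.1, and all function names are hypothetical. Plain NumPy vectors stand in for CLIP image and text embeddings; no real CLIP model is loaded.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def negative_prob(img, t_neg, t_pos):
    """Softmax probability that an image embedding matches the negative
    (backlit) prompt rather than the positive (well-lit) prompt."""
    e_neg = np.exp(cosine(img, t_neg))
    e_pos = np.exp(cosine(img, t_pos))
    return e_neg / (e_neg + e_pos)

def prompt_init_loss(img, is_well_lit, t_neg, t_pos):
    """Assumed cross-entropy loss for learning the initial prompt pair:
    backlit images should match t_neg, well-lit images should match t_pos."""
    p_neg = negative_prob(img, t_neg, t_pos)
    return -np.log(1.0 - p_neg) if is_well_lit else -np.log(p_neg)

def enhancement_loss(enhanced, t_neg, t_pos):
    """Assumed loss for the enhancement network: push the enhanced result
    away from the negative prompt (prompts are held fixed in this stage)."""
    return negative_prob(enhanced, t_neg, t_pos)

def rank_loss(backlit, enhanced, well_lit, t_neg, t_pos, margin=0.1):
    """Assumed hinge-style ranking loss for prompt refinement: the
    'backlit-ness' score should be ordered backlit > enhanced > well-lit."""
    s_b = negative_prob(backlit, t_neg, t_pos)
    s_e = negative_prob(enhanced, t_neg, t_pos)
    s_w = negative_prob(well_lit, t_neg, t_pos)
    return max(0.0, s_e - s_b + margin) + max(0.0, s_w - s_e + margin)

# Toy 2-D embeddings (purely illustrative, not CLIP outputs).
t_pos = np.array([1.0, 0.0])      # stand-in for the positive prompt embedding
t_neg = np.array([0.0, 1.0])      # stand-in for the negative prompt embedding
well_lit = np.array([1.0, 0.1])
backlit = np.array([0.1, 1.0])
enhanced = np.array([1.0, 1.0])   # halfway between the two prompts

print("init loss (well-lit):", prompt_init_loss(well_lit, True, t_neg, t_pos))
print("enhancement loss:", enhancement_loss(enhanced, t_neg, t_pos))
print("rank loss:", rank_loss(backlit, enhanced, well_lit, t_neg, t_pos))
```

In an actual training loop, the prompt embeddings would be learnable parameters passed through the CLIP text encoder, and the two stages would alternate: update the prompts with the ranking objective while the enhancement network is frozen, then update the network with the prompts frozen.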
