While Transformers have achieved remarkable performance in various high-level vision tasks, it remains challenging to exploit their full potential in image restoration. The crux lies in the limited depth at which Transformers can be applied in the typical encoder-decoder framework for image restoration, owing to the heavy self-attention computation load and inefficient communication across layers at different depths (scales). In this paper, we present a deep and effective Transformer-based network for image restoration, termed U2-Former, which employs the Transformer as the core operation to perform image restoration in a deep encoding and decoding space. Specifically, it leverages a nested U-shaped structure to facilitate interactions across layers with different scales of feature maps. Furthermore, we improve the computational efficiency of the basic Transformer block by introducing a feature-filtering mechanism that compresses the token representation. Beyond the typical supervision schemes for image restoration, our U2-Former also performs contrastive learning in multiple aspects to further decouple the noise component from the background image. Extensive experiments on various image restoration tasks, including reflection removal, rain streak removal and dehazing, demonstrate the effectiveness of the proposed U2-Former.