Convolutional neural networks (CNNs) and Transformers have achieved great success in multimedia applications. However, little effort has been made to harmonize these two architectures effectively and efficiently for image deraining. This paper aims to unify the two architectures so as to exploit their complementary learning merits for image deraining. In particular, the local connectivity and translation equivariance of CNNs and the global aggregation ability of self-attention (SA) in Transformers are fully exploited for specific local-context and global-structure representations. Based on the observation that the rain distribution reveals the location and degree of degradation, we introduce a degradation prior to aid background recovery and accordingly present an association refinement deraining scheme. A novel multi-input attention module (MAM) is proposed to associate rain perturbation removal with background recovery. Moreover, we equip our model with effective depth-wise separable convolutions to learn specific feature representations while trading off computational complexity. Extensive experiments show that our proposed method (dubbed ELF) outperforms the state-of-the-art approach MPRNet by 0.25 dB on average, while requiring only 11.7\% and 42.1\% of its computational cost and parameters, respectively. The source code is available at https://github.com/kuijiang94/Magic-ELF.
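For readers unfamiliar with the building block mentioned above, the following is a minimal PyTorch sketch of a generic depth-wise separable convolution, which factorizes a dense convolution into a per-channel spatial filter plus a 1x1 channel mixer to cut parameters and FLOPs. It illustrates the standard technique only; the kernel size and channel counts are assumptions, not the exact layer configuration used in ELF.

\begin{verbatim}
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Depth-wise separable convolution: a per-channel (depth-wise) k x k
    convolution followed by a 1x1 point-wise convolution. For large channel
    counts this reduces parameters and FLOPs roughly by a factor of k*k
    relative to a dense k x k convolution."""
    def __init__(self, in_ch: int, out_ch: int, kernel_size: int = 3):
        super().__init__()
        padding = kernel_size // 2
        # groups=in_ch makes each spatial filter see a single input channel
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size,
                                   padding=padding, groups=in_ch)
        # 1x1 convolution mixes information across channels
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.pointwise(self.depthwise(x))

# Quick shape check on a dummy feature map (channel count is illustrative)
feat = torch.randn(1, 64, 128, 128)
block = DepthwiseSeparableConv(64, 64)
print(block(feat).shape)  # torch.Size([1, 64, 128, 128])
\end{verbatim}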