Learning A Sparse Transformer Network for Effective Image Deraining

Transformer-based methods have achieved strong performance in image deraining because they can model non-local information, which is vital for high-quality image reconstruction. In this paper, we find that most existing Transformers use all token similarities from the query-key pairs for feature aggregation. However, when the query tokens differ from the key tokens, the self-attention values estimated from them still take part in feature aggregation, which interferes with clear image restoration. To overcome this problem, we propose an effective DeRaining network, Sparse Transformer (DRSformer), that adaptively keeps the most useful self-attention values for feature aggregation so that the aggregated features better facilitate high-quality image reconstruction. Specifically, we develop a learnable top-k selection operator that adaptively retains, for each query, the most crucial attention scores from the keys. In addition, because the naive feed-forward network in Transformers does not model the multi-scale information that is important for latent clear image restoration, we develop an effective mixed-scale feed-forward network to generate better features for image deraining. To learn an enriched set of hybrid features that incorporates local context from CNN operators, we equip our model with a mixture-of-experts feature compensator, yielding a cooperative refinement deraining scheme. Extensive experimental results on commonly used benchmarks demonstrate that the proposed method achieves favorable performance against state-of-the-art approaches. The source code and trained models are available at https://github.com/cschenxiang/DRSformer.
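
The top-k idea described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `topk_sparse_attention` is hypothetical, it uses a single head on plain Python lists, and `k` is a fixed argument here, whereas the abstract describes a learnable selection operator. It only shows how the low-scoring keys are masked out before the softmax, so that they contribute nothing to the aggregated feature.

```python
import math


def topk_sparse_attention(queries, keys, values, k):
    """Single-head attention keeping only the k largest scores per query.

    Assumption: k is a fixed integer; the paper's operator is learnable.
    queries: list of vectors of length d
    keys:    list of vectors of length d
    values:  list of vectors, one per key
    """
    if not 1 <= k <= len(keys):
        raise ValueError("k must be between 1 and the number of keys")

    scale = 1.0 / math.sqrt(len(queries[0]))
    outputs = []
    for q in queries:
        # Scaled dot-product similarity between this query and every key.
        scores = [scale * sum(a * b for a, b in zip(q, key)) for key in keys]

        # Indices of the k highest scores; every other key is masked out.
        keep = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]

        # Softmax over the retained scores only (masked keys get weight 0).
        top = max(scores[i] for i in keep)
        exps = {i: math.exp(scores[i] - top) for i in keep}
        total = sum(exps.values())
        weights = [exps[i] / total if i in exps else 0.0 for i in range(len(keys))]

        # Aggregate the value vectors with the sparse attention weights.
        dim = len(values[0])
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]
        )
    return outputs
```

With `k` equal to the number of keys, the sketch reduces to ordinary dense softmax attention; smaller values of `k` discard the weakest query-key matches, which is the kind of interference the abstract argues against.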

