Most existing blind image Super-Resolution (SR) methods assume that the blur kernels are space-invariant. However, the blur involved in real applications is usually space-variant due to object motion, out-of-focus, etc., resulting in a severe performance drop of advanced SR methods. To address this problem, we first introduce two new datasets with out-of-focus blur, i.e., NYUv2-BSR and Cityscapes-BSR, to support further research on blind SR with space-variant blur. Based on these datasets, we design a novel Cross-MOdal fuSion network (CMOS) that estimates both blur and semantics simultaneously, which leads to improved SR results. It involves a feature Grouping Interactive Attention (GIA) module that makes the two modalities interact more effectively and avoids inconsistency. GIA can also be used for the interaction of other features because of the universality of its structure. Qualitative and quantitative experiments on the above datasets and real-world images, compared with state-of-the-art methods, demonstrate the superiority of our method, e.g., surpassing MANet by +1.91 PSNR / +0.0048 SSIM on NYUv2-BSR.
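The abstract does not spell out the internals of the GIA module, so the snippet below is only a minimal illustrative sketch, not the paper's implementation: it shows one plausible way to let channel-grouped blur features and semantic features gate each other before fusion. The class name, the gating form, and the group count are all hypothetical assumptions introduced for illustration.

```python
# Hypothetical sketch of channel-grouped cross-modal gating between a blur
# branch and a semantic branch. This is an assumption for illustration and
# does NOT reproduce the exact GIA design described in the CMOS paper.
import torch
import torch.nn as nn


class GroupedCrossModalFusion(nn.Module):
    """Split two modality features into channel groups and re-weight each
    group by a gate computed from the corresponding group of the other
    modality, then fuse the results with a 1x1 convolution."""

    def __init__(self, channels: int, groups: int = 4):
        super().__init__()
        assert channels % groups == 0
        self.groups = groups
        gc = channels // groups
        # Per-group gates: global pooling + 1x1 conv + sigmoid (assumed form).
        self.sem_gate = nn.ModuleList([
            nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Conv2d(gc, gc, 1), nn.Sigmoid())
            for _ in range(groups)])
        self.blur_gate = nn.ModuleList([
            nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Conv2d(gc, gc, 1), nn.Sigmoid())
            for _ in range(groups)])
        self.fuse = nn.Conv2d(2 * channels, channels, 1)

    def forward(self, blur_feat: torch.Tensor, sem_feat: torch.Tensor) -> torch.Tensor:
        # blur_feat, sem_feat: (B, C, H, W) features from the two branches.
        blur_groups = blur_feat.chunk(self.groups, dim=1)
        sem_groups = sem_feat.chunk(self.groups, dim=1)
        out = []
        for i in range(self.groups):
            # Each modality is modulated by a gate from the other, so
            # inconsistent responses in one branch can be suppressed.
            out.append(blur_groups[i] * self.sem_gate[i](sem_groups[i]))
            out.append(sem_groups[i] * self.blur_gate[i](blur_groups[i]))
        return self.fuse(torch.cat(out, dim=1))


if __name__ == "__main__":
    block = GroupedCrossModalFusion(channels=64, groups=4)
    blur = torch.randn(2, 64, 48, 48)
    sem = torch.randn(2, 64, 48, 48)
    print(block(blur, sem).shape)  # torch.Size([2, 64, 48, 48])
```

Because the block takes two generic feature maps of equal shape, the same grouping-and-gating pattern could in principle be reused for interactions between other feature pairs, which is consistent with the universality claim made for GIA.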