Images taken in dynamic scenes may contain unwanted motion blur, which significantly degrades visual quality. Such blur causes short- and long-range, region-specific smoothing artifacts that are often directional and non-uniform, and thus difficult to remove. Inspired by the recent success of transformers on computer vision and image processing tasks, we develop Stripformer, a transformer-based architecture that constructs intra- and inter-strip tokens to reweight image features in the horizontal and vertical directions, catching blur patterns of different orientations. It stacks interlaced intra-strip and inter-strip attention layers to reveal blur magnitudes. Beyond detecting region-specific blur patterns of various orientations and magnitudes, Stripformer is also a token- and parameter-efficient transformer model, demanding much less memory and computation than the vanilla transformer while performing better without relying on tremendous training data. Experimental results show that Stripformer performs favorably against state-of-the-art models in dynamic scene deblurring.
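For intuition, the sketch below shows one way the intra- and inter-strip attention described above could be realized in PyTorch. It is a minimal illustration under assumptions, not the authors' implementation: the class names, the mean-pooling used to form inter-strip tokens, and the reliance on nn.MultiheadAttention are all hypothetical choices made for clarity.

```python
# Hypothetical sketch of strip-wise attention: rows (or columns) of a feature
# map are treated as token sequences (intra-strip), and pooled strip tokens
# attend to each other (inter-strip). Illustrative only.
import torch
import torch.nn as nn

class StripAttention(nn.Module):
    """Intra-strip attention: self-attention within each horizontal or vertical strip."""
    def __init__(self, dim: int, num_heads: int = 4, horizontal: bool = True):
        super().__init__()
        self.horizontal = horizontal
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W) feature map
        b, c, h, w = x.shape
        if self.horizontal:
            # Each of the H rows becomes a length-W token sequence.
            seq = x.permute(0, 2, 3, 1).reshape(b * h, w, c)
        else:
            # Each of the W columns becomes a length-H token sequence.
            seq = x.permute(0, 3, 2, 1).reshape(b * w, h, c)
        out, _ = self.attn(seq, seq, seq)
        if self.horizontal:
            out = out.reshape(b, h, w, c).permute(0, 3, 1, 2)
        else:
            out = out.reshape(b, w, h, c).permute(0, 3, 2, 1)
        return x + out  # residual connection

class InterStripAttention(nn.Module):
    """Inter-strip attention: each strip is pooled to one token, and the
    H (or W) strip tokens attend to one another to capture global structure."""
    def __init__(self, dim: int, num_heads: int = 4, horizontal: bool = True):
        super().__init__()
        self.horizontal = horizontal
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        if self.horizontal:
            tokens = x.mean(dim=3).permute(0, 2, 1)       # (B, H, C): one token per row strip
        else:
            tokens = x.mean(dim=2).permute(0, 2, 1)       # (B, W, C): one token per column strip
        out, _ = self.attn(tokens, tokens, tokens)
        if self.horizontal:
            return x + out.permute(0, 2, 1).unsqueeze(3)  # broadcast back over width
        else:
            return x + out.permute(0, 2, 1).unsqueeze(2)  # broadcast back over height

# Usage: interlacing the two attention types, mirroring the stacked design
# described in the abstract (shapes are preserved throughout).
feat = torch.randn(2, 64, 32, 32)
block = nn.Sequential(StripAttention(64), InterStripAttention(64))
print(block(feat).shape)  # torch.Size([2, 64, 32, 32])
```

The token efficiency claimed above follows from this layout: attending within a strip costs attention over W (or H) tokens per strip, and attending across pooled strips costs attention over only H (or W) tokens, rather than the H×W tokens a vanilla transformer would process at once.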