Efficient Transformer for Single Image Super-Resolution

Single image super-resolution (SISR) has made great strides with the development of deep learning. However, most existing studies focus on building more complex neural networks with a massive number of layers, which brings heavy computational cost and memory consumption. Recently, as the Transformer has achieved strong results in NLP tasks, more and more researchers have begun to explore its application to computer vision tasks. However, because of the heavy computational cost and high GPU memory occupation of the vision Transformer, such networks cannot be designed too deep. To address this problem, we propose a novel Efficient Super-Resolution Transformer (ESRT) for fast and accurate image super-resolution. ESRT is a hybrid Transformer in which a CNN-based SR network is placed in front to extract deep features. Specifically, ESRT is formed from two backbones: a lightweight CNN backbone (LCB) and a lightweight Transformer backbone (LTB). The LCB is a lightweight SR network that extracts deep SR features at a low computational cost by dynamically adjusting the size of the feature map. The LTB is made up of an efficient Transformer (ET) with a small GPU memory occupation, which benefits from the novel efficient multi-head attention (EMHA). In EMHA, a feature split module (FSM) splits the long sequence into sub-segments, and the attention operation is then applied to each sub-segment. This module significantly decreases the GPU memory occupation. Extensive experiments show that ESRT achieves competitive results. Compared with the original Transformer, which occupies 16057M of GPU memory, the proposed ET occupies only 4191M of GPU memory while achieving better performance.
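The splitting idea behind the FSM can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`full_attention`, `split_attention`, `score_entries`), the single-head formulation, and the choice of equal, non-overlapping sub-segments are all assumptions made for illustration, since the abstract only states that the long sequence is split and attention is applied to the sub-segments. The sketch shows why splitting helps: full attention stores an n × n score matrix, while s sub-segments store s matrices of size (n/s) × (n/s), which is s times fewer entries.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def full_attention(q, k, v):
    """Scaled dot-product attention over the whole sequence, (n, d) -> (n, d)."""
    scores = q @ k.T / np.sqrt(q.shape[-1])  # (n, n) score matrix
    return softmax(scores) @ v


def split_attention(q, k, v, num_splits):
    """Attention applied independently to equal sub-segments (illustrative).

    Assumption: segments are contiguous and non-overlapping; the paper's
    actual FSM may differ in detail.
    """
    n, d = q.shape
    if n % num_splits != 0:
        raise ValueError("sequence length must be divisible by num_splits")
    seg = n // num_splits
    # Reshape (n, d) -> (num_splits, seg, d) so each segment attends to itself.
    qs = q.reshape(num_splits, seg, d)
    ks = k.reshape(num_splits, seg, d)
    vs = v.reshape(num_splits, seg, d)
    scores = qs @ ks.transpose(0, 2, 1) / np.sqrt(d)  # (num_splits, seg, seg)
    out = softmax(scores) @ vs                        # (num_splits, seg, d)
    return out.reshape(n, d)


def score_entries(n, num_splits=1):
    """Number of attention-score entries that must be held in memory."""
    seg = n // num_splits
    return num_splits * seg * seg
```

For a sequence of 1024 tokens, `score_entries(1024)` is 1048576 while `score_entries(1024, 4)` is 262144. This counts score-matrix entries only; it is not meant to reproduce the GPU memory figures reported above, which depend on the full network.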
