Implicit neural representations have recently shown promising capability for representing images at arbitrary resolutions. In this paper, we present a Local Implicit Transformer (LIT), which integrates an attention mechanism and a frequency encoding technique into a local implicit image function. We design a cross-scale local attention block to effectively aggregate local features. To further improve representational power, we propose a Cascaded LIT (CLIT) that exploits multi-scale features, along with a cumulative training strategy that gradually increases the upsampling scales during training. We have conducted extensive experiments to validate the effectiveness of these components and analyze various training strategies. The qualitative and quantitative results demonstrate that LIT and CLIT achieve favorable results and outperform prior works on arbitrary-scale super-resolution tasks.
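To make the cumulative training idea concrete, below is a minimal sketch of one way the range of sampled upsampling scales could be widened as training progresses. This is an illustrative assumption, not the authors' exact schedule; the function `sample_scale` and its parameters are hypothetical.

```python
import random

# Hypothetical cumulative scale schedule: the pool of candidate upsampling
# scales widens as training progresses, so later epochs also see larger scales.
def sample_scale(epoch, total_epochs, min_scale=1.0, max_scale=4.0):
    # Fraction of training completed determines the current upper bound.
    progress = (epoch + 1) / total_epochs
    current_max = min_scale + (max_scale - min_scale) * progress
    # Draw a continuous scale uniformly from the currently allowed range.
    return random.uniform(min_scale, current_max)

if __name__ == "__main__":
    total_epochs = 10
    for epoch in range(total_epochs):
        s = sample_scale(epoch, total_epochs)
        print(f"epoch {epoch}: sampled upsampling scale {s:.2f}")
```

Under this kind of schedule, early epochs train mostly on small scale factors, while the hardest (largest) scales are only introduced once the model has learned the easier ones.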