Recently, learned image compression methods have achieved remarkable progress, with some outperforming the traditional VVC image codec. The advantage of learned methods over traditional codecs can be largely attributed to their powerful nonlinear transform coding. Convolutional layers and shifted-window transformer (Swin-T) blocks are the basic building blocks of the neural networks used in these methods, and their representation capability plays an important role in nonlinear transform coding. In this paper, to improve the ability of the vanilla convolution to extract local features, we propose a novel prior-guided convolution (PGConv), in which asymmetric convolutions (AConvs) and difference convolutions (DConvs) are introduced to strengthen skeleton elements and to extract high-frequency information, respectively. A re-parameterization strategy is also used to reduce the computational complexity of PGConv. Moreover, to improve the ability of the Swin-T block to extract non-local features, we propose a novel multi-scale gated transformer (MGT), in which dilated window-based multi-head self-attention blocks with different dilation rates and depth-wise convolution layers with different kernel sizes extract multi-scale features, and a gating mechanism is introduced to enhance non-linearity. Finally, we propose a novel joint Multi-scale Gated Transformer and Prior-guided Convolutional Network (MGTPCN) for learned image compression. Experimental results show that MGTPCN surpasses state-of-the-art algorithms, achieving a better trade-off between performance and complexity.
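The re-parameterization strategy mentioned above rests on a standard identity: parallel 3x3, 1x3, and 3x1 convolution branches whose outputs are summed can be folded, at inference time, into a single 3x3 kernel by zero-padding the asymmetric kernels to 3x3 and adding them. The sketch below verifies this identity in plain NumPy for a single channel; the helper names and the explicit cross-correlation loop are illustrative assumptions, not the paper's actual PGConv implementation.

```python
import numpy as np

def cross_correlate2d(x, k):
    """Valid-mode 2D cross-correlation (no padding, stride 1)."""
    kh, kw = k.shape
    H, W = x.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

def merge_asymmetric_branches(k3x3, k1x3, k3x1):
    """Fold parallel 3x3 / 1x3 / 3x1 kernels into one 3x3 kernel:
    the asymmetric kernels are zero-padded onto the centre row/column."""
    merged = k3x3.copy()
    merged[1, :] += k1x3[0]      # 1x3 kernel lands on the centre row
    merged[:, 1] += k3x1[:, 0]   # 3x1 kernel lands on the centre column
    return merged

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8))
k3x3 = rng.standard_normal((3, 3))
k1x3 = rng.standard_normal((1, 3))
k3x1 = rng.standard_normal((3, 1))

# Training-time view: three parallel branches, spatially aligned and summed.
# The slices pick the positions of the 1x3/3x1 outputs that coincide with
# the 3x3 branch's valid output grid.
y_branches = (cross_correlate2d(x, k3x3)
              + cross_correlate2d(x, k1x3)[1:-1, :]
              + cross_correlate2d(x, k3x1)[:, 1:-1])

# Inference-time view: one merged 3x3 convolution, same result.
y_merged = cross_correlate2d(x, merge_asymmetric_branches(k3x3, k1x3, k3x1))
```

Because the merged kernel reproduces the branch sum exactly, the multi-branch structure costs nothing at inference, which is how the re-parameterization reduces PGConv's deployed complexity.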