Convolutional neural networks have achieved excellent results on the compressed video quality enhancement task in recent years. State-of-the-art methods exploit the spatiotemporal information of adjacent frames mainly through deformable convolution. However, the offset fields in deformable convolution are difficult to train, and their instability during training often leads to offset overflow, which reduces the efficiency of correlation modeling. In this work, we propose a transformer-based compressed video quality enhancement (TVQE) method, consisting of a Swin-AutoEncoder based Spatio-Temporal feature Fusion (SSTF) module and a Channel-wise Attention based Quality Enhancement (CAQE) module. The proposed SSTF module learns both local and global features with the help of the Swin-AutoEncoder, which improves the ability of correlation modeling. Meanwhile, the window-based Swin Transformer and the encoder-decoder structure greatly improve execution efficiency. The proposed CAQE module, in turn, computes channel attention, which aggregates the temporal information between channels in the feature map and thereby achieves efficient fusion of inter-frame information. Extensive experimental results on the JCT-VC test sequences show that the proposed method achieves better performance on average in both subjective and objective quality. Our method also outperforms existing ones in terms of both inference speed and GPU consumption.
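To make the channel-attention idea concrete, the following is a minimal sketch of a generic squeeze-and-excitation style gate in plain Python. It is an illustration under stated assumptions, not the CAQE module itself: the abstract does not specify CAQE's layers, so the bottleneck structure, the helper names (`global_average_pool`, `dense`, `channel_attention`), and the toy weights are all hypothetical. The point is only to show how a per-channel weight can be derived from the feature map and used to rescale channels that carry information from different frames.

```python
import math


def global_average_pool(features):
    """Squeeze step: reduce each channel (a 2D list) to its mean value."""
    return [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in features
    ]


def dense(vector, weights, bias):
    """Fully connected layer; `weights` holds one row per output unit."""
    return [
        sum(w * x for w, x in zip(row, vector)) + b
        for row, b in zip(weights, bias)
    ]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(features, w1, b1, w2, b2):
    """Return (rescaled features, per-channel attention weights).

    Assumed structure (not taken from the paper): pool -> dense -> ReLU
    -> dense -> sigmoid, then multiply every channel by its weight.
    """
    squeezed = global_average_pool(features)
    hidden = [max(0.0, h) for h in dense(squeezed, w1, b1)]  # ReLU
    weights = [sigmoid(z) for z in dense(hidden, w2, b2)]
    scaled = [
        [[value * weight for value in row] for row in channel]
        for channel, weight in zip(features, weights)
    ]
    return scaled, weights


# Toy input: 3 channels of 2x2 features, standing in for fused frame features.
features = [
    [[1.0, 1.0], [1.0, 1.0]],
    [[2.0, 2.0], [2.0, 2.0]],
    [[0.0, 0.0], [0.0, 0.0]],
]

# Hypothetical hand-picked weights for a 3 -> 2 -> 3 bottleneck.
w1 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
b1 = [0.0, 0.0]
w2 = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
b2 = [0.0, 0.0, 0.0]

scaled, weights = channel_attention(features, w1, b1, w2, b2)
print(weights)
```

In a trained network the two dense layers would be learned, so channels that contribute more useful inter-frame information receive weights closer to 1 and the rest are attenuated.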