Motion estimation and motion compensation are indispensable parts of inter prediction in video coding. Since object motion vectors mostly fall at fractional-pixel positions, the original reference pictures may not provide an accurate reference for motion compensation. In this paper, we propose a deep reference picture generator that creates a picture more relevant to the frame currently being encoded, thereby further reducing temporal redundancy and improving video compression efficiency. Inspired by recent progress in Convolutional Neural Networks (CNNs), we build the generator with a dilated CNN. Moreover, we insert the generated deep picture into Versatile Video Coding (VVC) as a reference picture and perform a comprehensive set of experiments to evaluate the effectiveness of our network on the latest VVC Test Model, VTM. The experimental results demonstrate that our proposed method achieves on average a 9.7% bit-rate saving compared with VVC under the low-delay P configuration.
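To illustrate the building block named above (not the paper's actual network, whose architecture and training details follow later), a dilated convolution samples input pixels `dilation` steps apart, enlarging the receptive field without adding parameters. A minimal pure-Python sketch of a single 2D dilated convolution with "same" zero padding, assuming odd kernel sizes:

```python
def dilated_conv2d(x, kernel, dilation=1):
    """Naive 2D dilated convolution over nested lists.

    x      : H x W input (list of lists of numbers)
    kernel : kh x kw filter (odd kh, kw assumed)
    Uses 'same' zero padding, so the output is also H x W.
    A dilation of d makes the effective kernel span (k-1)*d + 1 pixels.
    """
    kh, kw = len(kernel), len(kernel[0])
    H, W = len(x), len(x[0])
    eff_h = (kh - 1) * dilation + 1   # effective kernel height
    eff_w = (kw - 1) * dilation + 1   # effective kernel width
    pad_h, pad_w = eff_h // 2, eff_w // 2
    out = [[0.0] * W for _ in range(H)]
    for i in range(H):
        for j in range(W):
            acc = 0.0
            for a in range(kh):
                for b in range(kw):
                    ii = i - pad_h + a * dilation
                    jj = j - pad_w + b * dilation
                    if 0 <= ii < H and 0 <= jj < W:  # zero padding
                        acc += kernel[a][b] * x[ii][jj]
            out[i][j] = acc
    return out
```

Stacking such layers with growing dilation rates (e.g. 1, 2, 4) lets a compact network aggregate context from a wide area of the reference picture, which is the usual motivation for choosing dilated over plain convolutions in a generator.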