Previous deep video compression approaches only use a single-scale motion compensation strategy and rarely adopt the mode prediction techniques of traditional standards like H.264/H.265 for either motion or residual compression. In this work, we first propose a coarse-to-fine (C2F) deep video compression framework for better motion compensation, in which we perform motion estimation, compression and compensation twice in a coarse-to-fine manner. Our C2F framework achieves better motion compensation results without significantly increasing bit costs. Observing that the hyperprior information (i.e., the mean and variance values) from the hyperprior networks contains discriminative statistical information for different patches, we also propose two efficient hyperprior-guided mode prediction methods. Specifically, taking the hyperprior information as input, we propose two mode prediction networks that respectively predict the optimal block resolutions for better motion coding and decide whether to skip the residual information of each block for better residual coding, without introducing additional bit cost and with only negligible extra computation. Comprehensive experimental results demonstrate that our proposed C2F video compression framework, equipped with the new hyperprior-guided mode prediction methods, achieves state-of-the-art performance on the HEVC, UVG and MCL-JCV datasets.
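To make the hyperprior-guided mode prediction idea more concrete, the PyTorch sketch below shows one plausible way a small network could map hyperprior mean and variance maps to a per-block skip/code decision for the residual. This is a minimal illustrative sketch, not the paper's implementation: the module name, channel counts, layer structure, and the 0.5 thresholding rule are all assumptions.

```python
# Illustrative sketch (assumptions, not the authors' code): a tiny network that
# takes hyperprior statistics (mean and variance maps) and outputs a binary
# skip/code decision per spatial position of the latent, where each position
# corresponds to a block of pixels in the frame.
import torch
import torch.nn as nn


class HyperpriorModePredictor(nn.Module):
    """Predicts a skip-residual decision from hyperprior mean/variance maps."""

    def __init__(self, hyper_channels: int = 64):
        super().__init__()
        # Mean and variance maps are concatenated along the channel dimension.
        self.net = nn.Sequential(
            nn.Conv2d(2 * hyper_channels, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 2, kernel_size=1),  # 2 logits: code residual vs. skip
        )

    def forward(self, mean: torch.Tensor, var: torch.Tensor) -> torch.Tensor:
        x = torch.cat([mean, var], dim=1)
        logits = self.net(x)                       # (B, 2, h, w)
        probs = torch.softmax(logits, dim=1)
        skip_mask = (probs[:, 1:2] > 0.5).float()  # 1 = skip this block's residual
        return skip_mask


# Toy usage with random hyperprior statistics for a 16x16 latent grid.
if __name__ == "__main__":
    predictor = HyperpriorModePredictor(hyper_channels=64)
    mean = torch.randn(1, 64, 16, 16)
    var = torch.rand(1, 64, 16, 16)
    mask = predictor(mean, var)
    print(mask.shape)  # torch.Size([1, 1, 16, 16])
```

Since the decision network only consumes statistics that the decoder already has, such a design would add no extra bits to the stream, consistent with the claim in the abstract.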