Most existing neural video compression methods adopt the predictive coding framework, which first generates a predicted frame and then encodes the residue between it and the current frame. However, in terms of compression ratio, predictive coding is only a sub-optimal solution, as it relies on a simple subtraction operation to remove redundancy across frames. In this paper, we propose a deep contextual video compression framework to enable a paradigm shift from predictive coding to conditional coding. In particular, we try to answer the following questions: how to define, use, and learn the condition under a deep video compression framework. To tap the potential of conditional coding, we propose using the feature-domain context as the condition. This enables us to leverage the high-dimensional context to carry rich information to both the encoder and the decoder, which helps reconstruct high-frequency contents for higher video quality. Our framework is also extensible, as the condition can be flexibly designed. Experiments show that our method significantly outperforms the previous state-of-the-art (SOTA) deep video compression methods. Compared with x265 using the veryslow preset, we achieve a 26.0% bitrate saving on 1080p standard test videos.
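To make the paradigm contrast concrete, below is a minimal PyTorch sketch of the two coding styles. All module and variable names (ResidualCodec, ConditionalCodec, context_net, x_ref_warped, channel widths) are illustrative placeholders, not the paper's actual architecture; quantization, entropy coding, and motion estimation are omitted.

```python
import torch
import torch.nn as nn

class ResidualCodec(nn.Module):
    """Predictive coding: the residue x_t - x_pred is encoded, decoded,
    then added back to the prediction (redundancy removed by subtraction only)."""
    def __init__(self, latent_ch=8):
        super().__init__()
        self.enc = nn.Conv2d(3, latent_ch, 4, stride=2, padding=1)
        self.dec = nn.ConvTranspose2d(latent_ch, 3, 4, stride=2, padding=1)

    def forward(self, x_t, x_pred):
        y = self.enc(x_t - x_pred)        # y would be quantized + entropy-coded
        return x_pred + self.dec(y)

class ConditionalCodec(nn.Module):
    """Conditional coding: a learned, high-dimensional feature-domain context
    conditions both the encoder and the decoder; no explicit subtraction."""
    def __init__(self, ctx_ch=16, latent_ch=8):
        super().__init__()
        # context extracted from the (motion-compensated) reference frame
        self.context_net = nn.Conv2d(3, ctx_ch, 3, padding=1)
        self.enc = nn.Conv2d(3 + ctx_ch, latent_ch, 4, stride=2, padding=1)
        self.dec = nn.ConvTranspose2d(latent_ch, ctx_ch, 4, stride=2, padding=1)
        self.fuse = nn.Conv2d(ctx_ch * 2, 3, 3, padding=1)

    def forward(self, x_t, x_ref_warped):
        ctx = self.context_net(x_ref_warped)          # feature-domain condition
        y = self.enc(torch.cat([x_t, ctx], dim=1))    # encoder sees the context
        f = self.dec(y)
        return self.fuse(torch.cat([f, ctx], dim=1))  # decoder reuses the context

x_t = torch.rand(1, 3, 64, 64)      # current frame
x_pred = torch.rand(1, 3, 64, 64)   # motion-compensated prediction/reference
print(ResidualCodec()(x_t, x_pred).shape)     # torch.Size([1, 3, 64, 64])
print(ConditionalCodec()(x_t, x_pred).shape)  # torch.Size([1, 3, 64, 64])
```

The key difference is where the temporal redundancy is handled: the residual codec is hard-wired to subtraction in the pixel domain, while the conditional codec lets the network learn how to exploit the reference through a richer feature-domain context.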