Using multiple reference frames can significantly improve compression efficiency in neural video compression. However, in low-latency scenarios, most existing neural video compression frameworks use only the previous frame as reference, and the few frameworks that do use multiple previous frames adopt only a simple multi-reference-frame propagation mechanism. In this paper, we present a more reasonable multi-reference-frame propagation mechanism for neural video compression, called the butterfly multi-reference frame propagation mechanism (Butterfly), which enables more effective feature fusion across multiple reference frames. With it, we can generate a more accurate temporal context conditional prior for the Contextual Coding Module. In addition, when the number of decoded frames is smaller than the required number of reference frames, we duplicate the nearest reference frame to fill the missing slots, which works better than duplicating the furthest one. Experimental results show that our method significantly outperforms the previous state-of-the-art (SOTA), and our neural codec achieves a 7.6% bitrate saving on the HEVC Class D dataset compared with our single-reference-frame base model under the same compression configuration.
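As a minimal illustration of the reference-padding rule mentioned above, the following sketch (hypothetical helper and variable names, not the authors' code) fills the missing reference slots with copies of the nearest decoded frame rather than the furthest one:

```python
def pad_references(decoded, num_refs):
    """Build the reference list for the current frame.

    decoded:  list of already-decoded frames, oldest first.
    num_refs: number of reference frames the model expects.

    Sketch only, assuming this padding rule: references are ordered
    nearest first, and when fewer than num_refs frames have been
    decoded, the remaining (furthest) slots are filled with copies of
    the nearest decoded frame.
    """
    refs = list(reversed(decoded[-num_refs:]))  # nearest decoded frame first
    while len(refs) < num_refs:
        refs.append(refs[0])                    # duplicate the nearest reference
    return refs


# Example: only two frames decoded, but four references required.
# The two missing slots are padded with frame "f2", the nearest one.
print(pad_references(["f1", "f2"], 4))  # ['f2', 'f1', 'f2', 'f2']
```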