For any video codec, coding efficiency depends highly on whether the current signal to be encoded can find relevant contexts from previously reconstructed signals. Traditional codecs have verified that more contexts bring substantial coding gains, albeit at the cost of encoding time. However, for the emerging neural video codec (NVC), its contexts are still limited, leading to a low compression ratio. To boost NVC, this paper proposes increasing context diversity in both the temporal and spatial dimensions. First, we guide the model to learn hierarchical quality patterns across frames, which enriches long-term yet high-quality temporal contexts. Furthermore, to tap the potential of the optical-flow-based coding framework, we introduce group-based offset diversity, where cross-group interaction is proposed for better context mining. In addition, this paper also adopts a quadtree-based partition to increase spatial context diversity when encoding the latent representation in parallel. Experiments show that our codec obtains a 23.5% bitrate saving over the previous SOTA NVC. Better yet, our codec surpasses ECM, the next-generation traditional codec still under development, in both RGB and YUV420 color spaces in terms of PSNR. The code is at https://github.com/microsoft/DCVC.
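To make the quadtree-based spatial partition more concrete, below is a minimal PyTorch sketch of the general idea: the latent is split into four groups via a quadtree-style pattern, groups are coded one after another, and already-decoded groups serve as spatial context for the remaining ones, while all positions within a group can be processed in parallel. All names here (`quadtree_masks`, `ToyGroupCoder`, the mean-only context model, and rounding as a stand-in for quantization plus entropy coding) are illustrative assumptions, not the paper's actual implementation; see the repository above for the real codec.

```python
import torch
import torch.nn as nn

def quadtree_masks(h, w, device="cpu"):
    """Four binary masks assigning each position of a 2x2 block to a
    different coding group (a quadtree-style partition), so that all
    positions within one group can be entropy-coded in parallel."""
    masks = []
    for dy, dx in [(0, 0), (0, 1), (1, 0), (1, 1)]:
        m = torch.zeros(1, 1, h, w, device=device)
        m[:, :, dy::2, dx::2] = 1.0
        masks.append(m)
    return masks

class ToyGroupCoder(nn.Module):
    """Codes the latent group by group; previously decoded groups act as
    spatial context when predicting the distribution of the next group."""
    def __init__(self, channels):
        super().__init__()
        # Hypothetical context model: predicts a per-position mean from
        # whatever has been decoded so far.
        self.ctx = nn.Conv2d(channels, channels, kernel_size=5, padding=2)

    def forward(self, latent):
        _, _, h, w = latent.shape
        decoded = torch.zeros_like(latent)
        for mask in quadtree_masks(h, w, latent.device):
            mean = self.ctx(decoded)
            # Mean-shifted rounding stands in for actual quantization
            # followed by entropy coding with the predicted distribution.
            q = torch.round(latent - mean) + mean
            decoded = decoded + mask * q
        return decoded

if __name__ == "__main__":
    coder = ToyGroupCoder(channels=16)
    y = torch.randn(1, 16, 8, 8)
    y_hat = coder(y)
    print(y_hat.shape)  # torch.Size([1, 16, 8, 8])
```

Only four sequential steps are needed regardless of resolution, which is what allows the spatial context diversity to grow without giving up parallel decoding within each group.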