Neural video codecs (NVCs), leveraging the power of end-to-end learning, have demonstrated remarkable coding efficiency improvements over traditional video codecs. Recent research has begun to pay attention to the quality structures in NVCs, optimizing them by introducing explicit hierarchical designs. However, less attention has been paid to reference structure design, which should fundamentally be aligned with the hierarchical quality structure. In addition, the hierarchical quality structure itself leaves significant room for further optimization. To address these challenges, we propose EHVC, an efficient hierarchical neural video codec featuring three key innovations: (1) a hierarchical multi-reference scheme that draws on traditional video codec design to align the reference and quality structures, thereby addressing the reference-quality mismatch; (2) a lookahead strategy that uses encoder-side context from future frames to enhance the quality structure; and (3) a layer-wise quality scale with a random quality training strategy to stabilize the quality structure during inference. With these improvements, EHVC significantly outperforms state-of-the-art NVCs. Code will be released at: https://github.com/bytedance/NEVC.
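To make the idea of aligning reference and quality structures concrete, the following is a minimal sketch of a hierarchical layer assignment and a layer-aware reference picker, in the spirit of the hierarchical low-delay configurations used by traditional codecs. It is not EHVC's actual scheme: the function names `layer_of` and `select_references`, the period of 4, the limit of two references, and the selection rule are all illustrative assumptions.

```python
# Illustrative sketch only: names, period, and selection rule are assumptions,
# not taken from EHVC.

def layer_of(frame_idx: int, period: int = 4) -> int:
    """Return the hierarchical layer of a frame (0 = highest-quality layer).

    Assumes `period` is a power of two.
    """
    if frame_idx % period == 0:
        return 0
    layer = 1
    step = period // 2
    # Halve the step until the frame index falls on the step grid.
    while frame_idx % step != 0:
        step //= 2
        layer += 1
    return layer


def select_references(frame_idx: int, num_refs: int = 2, period: int = 4) -> list[int]:
    """Pick reference frames for `frame_idx` from previously coded frames.

    Hypothetical rule: always keep the nearest previous frame, then walk
    backwards and add frames that sit in a strictly lower (higher-quality)
    layer than the last reference added, or in layer 0.
    """
    if frame_idx == 0:
        return []  # the first frame has no references
    refs = [frame_idx - 1]
    best_layer = layer_of(frame_idx - 1, period)
    j = frame_idx - 2
    while j >= 0 and len(refs) < num_refs:
        candidate_layer = layer_of(j, period)
        if candidate_layer < best_layer or candidate_layer == 0:
            refs.append(j)
            best_layer = candidate_layer
        j -= 1
    return refs


if __name__ == "__main__":
    for i in range(9):
        print(i, layer_of(i), select_references(i))
```

Under these assumptions, frame 3 (lowest-quality layer) references frame 2 for temporal proximity and frame 0 for quality, which is the kind of reference-quality alignment the abstract describes.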