This paper introduces a learned hierarchical B-frame coding scheme in response to the Grand Challenge on Neural Network-based Video Coding at ISCAS 2023. We specifically address three issues: (1) B-frame coding, (2) YUV 4:2:0 coding, and (3) content-adaptive variable-rate coding with a single model. Most learned video codecs operate internally in the RGB domain and are designed for P-frame coding; B-frame coding for YUV 4:2:0 content remains largely under-explored. In addition, although prior work has used conditional convolution for variable-rate coding, most of it does not take content information into account. We build our scheme on conditional augmented normalizing flows (CANF). The scheme features conditional motion and inter-frame codecs for efficient B-frame coding. To handle YUV 4:2:0 content, two conditional inter-frame codecs process the Y and UV components separately, with the coding of the UV components additionally conditioned on the Y component. Moreover, we introduce adaptive feature modulation in every convolutional layer, which takes into account both the content information and the coding levels of B-frames to achieve content-adaptive variable-rate coding. Experimental results show that, in terms of PSNR-YUV, our model outperforms x265 and the winner of the previous year's challenge on commonly used datasets.
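To make the notions of hierarchical B-frame coding and "coding levels" concrete, the following is a minimal sketch of a dyadic coding order within one group of pictures (GOP). It is an illustration only, not the scheme's actual implementation: the abstract does not state the GOP size or the exact hierarchy, so the power-of-two GOP, the assumption that frames 0 and `gop_size` are already coded, and the names `BFrame` and `hierarchical_b_order` are all hypothetical.

```python
from typing import List, NamedTuple


class BFrame(NamedTuple):
    index: int       # display-order position within the GOP
    past_ref: int    # earlier reference frame (display order)
    future_ref: int  # later reference frame (display order)
    level: int       # coding level; 1 is the coarsest


def hierarchical_b_order(gop_size: int) -> List[BFrame]:
    """Return the B-frames of one GOP in coding order.

    Assumption (not from the paper): a dyadic hierarchy in which
    frames 0 and gop_size are already-coded reference frames.
    """
    if gop_size < 2 or gop_size & (gop_size - 1):
        raise ValueError("gop_size must be a power of two >= 2")

    order: List[BFrame] = []
    segments = [(0, gop_size)]  # intervals bounded by coded frames
    level = 1
    while segments:
        next_segments = []
        for left, right in segments:
            if right - left < 2:
                continue  # no frame left between the two references
            mid = (left + right) // 2
            # The middle frame is predicted from both interval ends.
            order.append(BFrame(mid, left, right, level))
            next_segments += [(left, mid), (mid, right)]
        segments = next_segments
        level += 1
    return order


if __name__ == "__main__":
    for frame in hierarchical_b_order(8):
        print(frame)
```

With `gop_size=8`, the coding order in display indices is 4, 2, 6, 1, 3, 5, 7, at levels 1, 2, 2, 3, 3, 3, 3. A per-frame level of this kind is the sort of signal the abstract describes as conditioning the feature modulation, although how the model encodes it is not stated here.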