Conventional video compression methods employ a linear transform and a block motion model, and the steps of motion estimation, mode and quantization parameter selection, and entropy coding are optimized individually because of the combinatorial nature of the end-to-end optimization problem. Learned video compression allows end-to-end rate-distortion optimized training of all nonlinear modules, the quantization parameter, and the entropy model simultaneously. While previous work on learned video compression considered training a sequential video codec by end-to-end optimization of a cost averaged over pairs of successive frames, it is well known in conventional video compression that hierarchical, bi-directional coding outperforms sequential compression. In this paper, we propose for the first time end-to-end optimization of a hierarchical, bi-directional motion-compensated learned codec by accumulating the cost function over fixed-size groups of pictures (GOP). Experimental results show that, as expected, the rate-distortion performance of our proposed learned bi-directional GOP coder exceeds that of the state-of-the-art end-to-end optimized learned sequential compression.
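To make the idea of hierarchical, bi-directional coding with a cost accumulated over a GOP more concrete, the following is a minimal sketch and not the authors' implementation. It assumes a GOP whose size is a power of two, with frames 0 and `gop_size` coded as key frames, and it assumes the accumulated cost is a plain sum of per-frame distortion plus a Lagrange multiplier times per-frame rate; the abstract does not specify the exact form of the loss. The names `hierarchical_order`, `gop_cost`, and `lam` are hypothetical.

```python
def hierarchical_order(gop_size):
    """Return the coding order of the in-between frames of one GOP.

    Each entry is (frame, past_reference, future_reference).
    Assumption: frames 0 and gop_size are key frames that are already
    coded, and gop_size is a power of two.
    """
    if gop_size < 1 or gop_size & (gop_size - 1):
        raise ValueError("gop_size must be a positive power of two")
    order = []
    step = gop_size
    while step > 1:
        half = step // 2
        # At each level, code the midpoint of every interval from the
        # two already-coded frames at its ends.
        for lo in range(0, gop_size, step):
            order.append((lo + half, lo, lo + step))
        step = half
    return order


def gop_cost(distortions, rates, lam):
    """Accumulate a rate-distortion cost over all frames of one GOP.

    Assumption: the cost is sum_t (D_t + lam * R_t); the actual loss
    used in the paper may be weighted or normalized differently.
    """
    if len(distortions) != len(rates):
        raise ValueError("distortions and rates must have equal length")
    return sum(d + lam * r for d, r in zip(distortions, rates))
```

For example, `hierarchical_order(8)` codes frame 4 first from frames 0 and 8, then frames 2 and 6, then the odd frames, so every frame is predicted from one past and one future reference that has already been coded. In a learned codec, a cost such as `gop_cost` would be evaluated once per GOP and backpropagated through all modules together, rather than once per pair of successive frames.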