Learned video compression methods have shown great promise in matching the rate-distortion (R-D) performance of traditional video codecs. However, existing learned video compression schemes bind the prediction mode to a fixed network framework. They cannot support multiple inter-prediction modes and are therefore inapplicable to many scenarios. In this paper, to break this limitation, we propose a versatile learned video compression (VLVC) framework that uses one model to support all possible prediction modes. Specifically, to realize versatile compression, we first build a motion compensation module that applies multiple 3D motion vector fields (i.e., voxel flows) for weighted trilinear warping in spatial-temporal space. The voxel flows convey the temporal reference position, which helps decouple inter-prediction modes from framework design. Second, for multiple-reference-frame prediction, we apply a flow prediction module that predicts accurate motion trajectories with unified polynomial functions. We show that the flow prediction module can substantially reduce the transmission cost of voxel flows. Experimental results demonstrate that the proposed VLVC not only supports versatile compression in various settings, but is also the first end-to-end learned video compression method that outperforms the latest VVC/H.266 standard reference software in terms of MS-SSIM.
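To make the idea of weighted trilinear warping with voxel flows concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes grayscale reference frames stacked into a `(T, H, W)` volume, voxel flows stored as `(dx, dy, t)` per pixel (spatial offsets plus an absolute temporal position), per-pixel weights that sum to one, and border clamping. The names `trilinear_sample` and `voxel_flow_warp` are hypothetical and chosen only for illustration.

```python
import numpy as np


def trilinear_sample(refs, x, y, t):
    """Sample a (T, H, W) volume at fractional (x, y, t); coordinates are clamped."""
    T, H, W = refs.shape
    x = np.clip(x, 0, W - 1)
    y = np.clip(y, 0, H - 1)
    t = np.clip(t, 0, T - 1)

    # Integer corners of the enclosing cell and fractional offsets inside it.
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    t0 = np.floor(t).astype(int)
    x1 = np.minimum(x0 + 1, W - 1)
    y1 = np.minimum(y0 + 1, H - 1)
    t1 = np.minimum(t0 + 1, T - 1)
    fx, fy, ft = x - x0, y - y0, t - t0

    # Blend the eight neighbouring voxels with their trilinear weights.
    out = np.zeros(x.shape, dtype=np.float64)
    for ti, wt in ((t0, 1 - ft), (t1, ft)):
        for yi, wy in ((y0, 1 - fy), (y1, fy)):
            for xi, wx in ((x0, 1 - fx), (x1, fx)):
                out += wt * wy * wx * refs[ti, yi, xi]
    return out


def voxel_flow_warp(refs, flows, weights):
    """Predict a frame as a weighted sum of trilinear samples.

    refs:    (T, H, W) reference frames (assumed layout)
    flows:   (M, H, W, 3) voxel flows, last axis = (dx, dy, t) (assumed layout)
    weights: (M, H, W) blending weights, assumed to sum to 1 per pixel
    """
    _, H, W = refs.shape
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    pred = np.zeros((H, W), dtype=np.float64)
    for flow, w in zip(flows, weights):
        pred += w * trilinear_sample(
            refs, xs + flow[..., 0], ys + flow[..., 1], flow[..., 2]
        )
    return pred
```

Because the temporal coordinate is part of the flow itself, the same routine covers different prediction modes in this sketch: a flow with `t` fixed at one reference index behaves like unidirectional prediction, while a fractional `t` between two references blends them, which resembles bidirectional interpolation.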