Video streaming usage has risen significantly as entertainment, education, and business increasingly rely on online video. Better video compression can improve users' access to content and its quality while reducing overall energy use and cost. In this paper, we present an application of the MuZero algorithm to the challenge of video compression. Specifically, we target the problem of learning a rate control policy that selects the quantization parameters (QP) in the encoding process of libvpx, an open-source VP9 video compression library widely used by popular video-on-demand (VOD) services. We treat this as a sequential decision-making problem: maximize video quality subject to an episodic constraint imposed by the target bitrate. Notably, we introduce a novel self-competition-based reward mechanism for constrained reinforcement learning (RL) problems in which the difficulty of satisfying the constraint varies, a setting that is challenging for existing constrained RL methods. We demonstrate that MuZero-based rate control achieves an average 6.28% reduction in the size of the compressed videos for the same delivered video quality level (measured as PSNR BD-rate) compared to libvpx's two-pass VBR rate control policy, while exhibiting better constraint-satisfaction behavior.
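To make the idea of a self-competition reward concrete, the following is a minimal sketch, not the formulation used in this work. The abstract states only that the reward is based on self-competition under a bitrate constraint; the comparison rule (smaller overshoot wins if either side exceeds the target, otherwise higher PSNR wins), the exponential-moving-average baseline, and all names (`EpisodeResult`, `self_competition_reward`, `update_baseline`) are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class EpisodeResult:
    """Outcome of encoding one video (hypothetical container)."""
    psnr: float     # mean PSNR of the encoded video, in dB
    bitrate: float  # achieved bitrate, in the same unit as the target


def overshoot(result: EpisodeResult, target_bitrate: float) -> float:
    """Amount by which the achieved bitrate exceeds the target (0 if within budget)."""
    return max(0.0, result.bitrate - target_bitrate)


def self_competition_reward(current: EpisodeResult,
                            baseline: EpisodeResult,
                            target_bitrate: float) -> float:
    """Return +1, -1, or 0 depending on whether the current episode beats
    the agent's own historical baseline on the same video (assumed rule)."""
    cur_over = overshoot(current, target_bitrate)
    base_over = overshoot(baseline, target_bitrate)

    # If either side violates the bitrate constraint, the smaller overshoot wins.
    if cur_over > 0.0 or base_over > 0.0:
        if cur_over < base_over:
            return 1.0
        if cur_over > base_over:
            return -1.0
        return 0.0

    # Both satisfy the constraint: the higher quality wins.
    if current.psnr > baseline.psnr:
        return 1.0
    if current.psnr < baseline.psnr:
        return -1.0
    return 0.0


def update_baseline(baseline: EpisodeResult,
                    current: EpisodeResult,
                    decay: float = 0.9) -> EpisodeResult:
    """Exponential moving average of past performance (assumed baseline form)."""
    return EpisodeResult(
        psnr=decay * baseline.psnr + (1.0 - decay) * current.psnr,
        bitrate=decay * baseline.bitrate + (1.0 - decay) * current.bitrate,
    )
```

Because the reward in this sketch depends only on a comparison with the agent's own past result for the same video and target, its scale stays the same whether the target bitrate is easy or hard to meet, which is one plausible way to handle variable constraint-satisfaction difficulty.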