Video coding technology has been continuously improved to achieve higher compression ratios at higher resolutions. However, state-of-the-art video coding standards, such as H.265/HEVC and Versatile Video Coding (VVC), are still designed on the assumption that the compressed video will be watched by humans. With the tremendous advance and maturation of deep neural networks in solving computer vision tasks, more and more videos are analyzed directly by deep neural networks without human involvement. This conventional design for video coding standards is not optimal when the compressed video is consumed by computer vision applications. While the human visual system is consistently sensitive to high-contrast content, the impact of individual pixels on a computer vision algorithm depends on the specific task it performs. In this paper, we explore and summarize recent progress on computer-vision-task-oriented video coding and on the emerging video coding standard, Video Coding for Machines (VCM).