The latest video coding standard, Versatile Video Coding (VVC), achieves almost twice the coding efficiency of its predecessor, High Efficiency Video Coding (HEVC). However, for intra coding, achieving this efficiency requires 31 times the computational complexity of HEVC, which makes VVC challenging for low-power and real-time applications. This paper proposes a novel machine learning approach that employs two modalities of features, both jointly and separately, to simplify the intra coding decision. First, a set of features is extracted using the existing DCT core of VVC to assess texture characteristics; these features form the first modality of data. This produces high-quality features with almost no overhead. The distribution of intra modes in the neighboring blocks forms the second modality of data, which provides statistical information about the frame. Second, a two-step feature reduction method is designed to reduce the size of the feature set, so that a lightweight model with a limited number of parameters can be used to learn the intra mode decision task. Third, three separate training strategies are proposed: (1) an offline training strategy that uses the first modality alone, (2) an online training strategy that uses the second modality alone, and (3) a mixed online-offline strategy that uses bimodal learning. Finally, a low-complexity encoding algorithm is proposed based on these learning strategies. Extensive experimental results show that the proposed methods can reduce encoding time by up to 24%, with a negligible loss of coding efficiency. Moreover, the results demonstrate how a bimodal learning strategy can boost learning performance. Lastly, the proposed method has a very low computational overhead (0.2%) and uses existing components of a VVC encoder, which makes it much more practical than competing solutions.
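To make the two feature modalities concrete, the following is a minimal sketch and not the paper's actual feature set, which the abstract does not specify. It assumes a hypothetical three-value texture descriptor (the share of AC energy in horizontal-only, vertical-only, and mixed DCT frequencies) and a normalized histogram of neighboring intra modes over the 67 VVC intra modes. The function names `texture_features`, `neighbor_mode_histogram`, and `bimodal_features` are illustrative, and the DCT is computed here with NumPy rather than with an encoder's DCT core.

```python
import numpy as np

NUM_INTRA_MODES = 67  # VVC intra modes: planar, DC, and 65 angular modes


def dct2_matrix(n):
    """Return the orthonormal DCT-II basis matrix of size n x n."""
    k = np.arange(n).reshape(-1, 1)  # frequency index
    i = np.arange(n).reshape(1, -1)  # sample index
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)
    return m


def texture_features(block, eps=1e-12):
    """Hypothetical texture descriptor (assumption, not the paper's features).

    Returns the fraction of AC energy in (a) horizontal-only frequencies,
    (b) vertical-only frequencies, and (c) mixed frequencies of a square block.
    """
    n = block.shape[0]
    d = dct2_matrix(n)
    coeffs = d @ block.astype(np.float64) @ d.T  # separable 2-D DCT
    energy = coeffs ** 2
    energy[0, 0] = 0.0  # ignore the DC term
    total = energy.sum()
    if total < eps:  # flat block: no texture to describe
        return np.zeros(3)
    horizontal = energy[0, 1:].sum()  # variation along columns only
    vertical = energy[1:, 0].sum()    # variation along rows only
    mixed = energy[1:, 1:].sum()
    return np.array([horizontal, vertical, mixed]) / total


def neighbor_mode_histogram(modes):
    """Normalized histogram of the intra modes chosen by neighboring blocks."""
    modes = np.asarray(modes, dtype=np.int64)
    hist = np.bincount(modes, minlength=NUM_INTRA_MODES).astype(np.float64)
    count = hist.sum()
    return hist / count if count > 0 else hist


def bimodal_features(block, neighbor_modes):
    """Concatenate both modalities into one vector for a lightweight model."""
    return np.concatenate(
        [texture_features(block), neighbor_mode_histogram(neighbor_modes)]
    )


# Usage: an 8x8 block with vertical stripes and four neighboring mode indices.
block = np.tile(np.arange(8), (8, 1))
features = bimodal_features(block, [0, 0, 1, 50])
```

In a setup like this, the texture part could feed an offline-trained model, the histogram part could feed a model updated online during encoding, and the concatenated vector corresponds to the bimodal case; how the paper combines them in practice is beyond what the abstract states.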