Attention-Based Neural Networks for Chroma Intra Prediction in Video Coding

Neural networks can improve several modules of advanced video coding schemes. In particular, compression of colour components has been shown to benefit greatly from machine learning models, thanks to attention-based architectures that allow the prediction to exploit specific samples in the reference region. However, such architectures tend to be complex and computationally intensive, and may be difficult to deploy in a practical video coding pipeline. This work focuses on reducing the complexity of such methodologies in order to design a set of simplified and cost-effective attention-based architectures for chroma intra-prediction. A novel size-agnostic multi-model approach is proposed to reduce the complexity of the inference process. The resulting simplified architecture is still capable of outperforming state-of-the-art methods. Moreover, a collection of simplifications is presented to further reduce the complexity overhead of the proposed prediction architecture. Thanks to these simplifications, the number of parameters is reduced by around 90% with respect to the original attention-based methodologies. The simplifications include a framework for reducing the overhead of the convolutional operations, a simplified cross-component processing model integrated into the original architecture, and a methodology for integer-precision approximations aimed at fast and hardware-aware implementations. The proposed schemes are integrated into the Versatile Video Coding (VVC) prediction pipeline, retaining the compression efficiency of state-of-the-art chroma intra-prediction methods based on neural networks while offering different directions for significantly reducing coding complexity.
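To make the idea of attention over the reference region concrete, the following is a minimal sketch, not the architecture proposed in this work. It assumes the attention scores come from raw luma similarity between each position in the block and each boundary sample, whereas the networks described above learn their features with convolutional branches. The function name `attention_chroma_predict`, its parameters, and the `temperature` value are hypothetical choices made for illustration only.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention_chroma_predict(luma_block, boundary_luma, boundary_chroma,
                             temperature=0.1):
    """Predict chroma for an N x N block by attending over boundary samples.

    luma_block:      (N, N) reconstructed luma, assumed already downsampled
                     to the chroma resolution.
    boundary_luma:   (B,) luma values of the reference (boundary) samples.
    boundary_chroma: (B, 2) Cb and Cr values of the same reference samples.
    Returns an (N, N, 2) array of predicted Cb/Cr values.

    Assumption: similarity is the negative absolute luma difference; a
    learned model would use feature embeddings instead.
    """
    n = luma_block.shape[0]
    queries = luma_block.reshape(-1, 1)       # (N*N, 1)
    keys = boundary_luma.reshape(1, -1)       # (1, B)
    scores = -np.abs(queries - keys) / temperature   # (N*N, B)
    weights = softmax(scores, axis=1)         # each row sums to 1
    pred = weights @ boundary_chroma          # (N*N, 2)
    return pred.reshape(n, n, 2)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    luma = rng.random((4, 4))
    b_luma = rng.random(9)          # e.g. 4 above + 4 left + 1 corner sample
    b_chroma = rng.random((9, 2))
    print(attention_chroma_predict(luma, b_luma, b_chroma).shape)
```

Because each row of attention weights sums to one, every predicted value is a convex combination of the boundary chroma samples. That shows how the prediction can favour specific reference samples rather than fitting a single model to the whole boundary.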
