Perceptual Quality Assessment of Face Video Compression: A Benchmark and An Effective Method

Recent years have witnessed an exponential increase in the demand for face video compression, and the success of artificial intelligence has pushed face video coding beyond the boundaries of traditional hybrid video coding. Generative coding approaches, which leverage the statistical priors of face videos, have been identified as promising alternatives that offer reasonable perceptual rate-distortion trade-offs. However, the great diversity of distortion types in the spatial and temporal domains, arising from codecs that range from traditional hybrid coding frameworks to generative models, presents grand challenges for compressed face video quality assessment (VQA). In this paper, we introduce the large-scale Compressed Face Video Quality Assessment (CFVQA) database, the first attempt to systematically understand the perceptual quality and the diverse compression distortions of face videos. The database contains 3,240 compressed face video clips at multiple compression levels, derived from 135 source videos with diverse content using six representative video codecs: two traditional methods based on hybrid coding frameworks, two end-to-end methods, and two generative methods. In addition, we develop a FAce VideO IntegeRity (FAVOR) index for face video compression that measures perceptual quality by taking into account the distinct content characteristics and temporal priors of face videos. Experimental results demonstrate its superior performance on the proposed CFVQA dataset. The benchmark is publicly available at: https://github.com/Yixuan423/Compressed-Face-Videos-Quality-Assessment.
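
The clip counts above imply a simple database layout, provided one assumes that every source video is compressed by all six codecs at the same number of levels. The abstract does not state this assumption. Under it, 3,240 / 135 = 24 clips per source and 24 / 6 = 4 compression levels per codec. The Python sketch below checks that arithmetic and enumerates the assumed (source, codec, level) grid. The codec labels are hypothetical placeholders grouped by the three families the abstract names; they are not the actual codecs used in CFVQA.

```python
from itertools import product

# Counts stated in the abstract.
NUM_SOURCES = 135
NUM_CLIPS = 3240

# Hypothetical placeholder labels: the abstract gives only the codec families
# (two hybrid, two end-to-end, two generative), not their names.
CODECS = ["hybrid_a", "hybrid_b", "e2e_a", "e2e_b", "gen_a", "gen_b"]


def levels_per_codec(num_clips: int, num_sources: int, num_codecs: int) -> int:
    """Infer compression levels per codec, ASSUMING a full, uniform grid.

    Raises ValueError if the counts cannot form such a grid, rather than
    returning a misleading non-integer figure.
    """
    per_source, rem = divmod(num_clips, num_sources)
    if rem:
        raise ValueError("clips do not split evenly across source videos")
    levels, rem = divmod(per_source, num_codecs)
    if rem:
        raise ValueError("clips per source do not split evenly across codecs")
    return levels


def enumerate_clips(num_sources: int, codecs: list, num_levels: int) -> list:
    """List every (source index, codec label, level index) of the assumed grid."""
    return list(product(range(num_sources), codecs, range(num_levels)))


if __name__ == "__main__":
    levels = levels_per_codec(NUM_CLIPS, NUM_SOURCES, len(CODECS))
    clips = enumerate_clips(NUM_SOURCES, CODECS, levels)
    print(levels, len(clips))  # prints: 4 3240
```

If the released database uses a different number of levels for different codecs, the uniform-grid assumption fails; the per-codec level counts should be checked against the benchmark repository rather than taken from this sketch.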
