With the explosive growth of User Generated Content (UGC), UGC video quality assessment (VQA) has become increasingly important for improving users' Quality of Experience (QoE). However, most existing UGC VQA studies focus only on the visual distortions of videos, ignoring the fact that users' QoE also depends on the accompanying audio signals. In this paper, we conduct the first study of UGC audio and video quality assessment (AVQA). Specifically, we construct the first UGC AVQA database, named the SJTU-UAV database, which contains 520 in-the-wild UGC audio and video (A/V) sequences, and we conduct a user study to obtain the mean opinion scores (MOSs) of the A/V sequences. We then analyze the content of the SJTU-UAV database from both the audio and video perspectives to characterize the database. We also design a family of AVQA models that fuse popular VQA methods with audio features via a support vector regressor (SVR). We validate the effectiveness of the proposed models on three databases. The experimental results show that, with the help of audio signals, the VQA models evaluate perceptual quality more accurately. The database will be released to facilitate further research.
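The abstract does not say how the VQA and audio features are combined before regression. The sketch below is one plausible reading, not the authors' method. It assumes that each A/V sequence is represented by a vector of video-quality features and a vector of audio features. It also assumes that fusion means simple concatenation (`fuse_features` is a hypothetical helper) and that the regressor is scikit-learn's RBF-kernel `SVR`. SRCC and PLCC are assumed as evaluation criteria because VQA studies commonly report them. All feature values and MOSs in the demo are synthetic placeholders, not SJTU-UAV data.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR


def fuse_features(video_feats, audio_feats):
    """Hypothetical fusion step: concatenate per-sequence video and audio features."""
    return np.hstack([video_feats, audio_feats])


def fit_predict_svr(features, mos, train_idx, test_idx, C=10.0):
    """Train an RBF-kernel SVR on the training rows and predict scores for the test rows."""
    # The scaler is fitted on training rows only (inside the pipeline), so each
    # feature dimension contributes comparably to the RBF kernel distance.
    model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=C, gamma="scale"))
    model.fit(features[train_idx], mos[train_idx])
    return model.predict(features[test_idx])


def correlation_metrics(pred, mos):
    """Return (SRCC, PLCC) between predicted scores and subjective MOSs."""
    srcc = spearmanr(pred, mos)[0]
    plcc = pearsonr(pred, mos)[0]
    return float(srcc), float(plcc)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 520                          # sequence count; all values below are synthetic
    video = rng.normal(size=(n, 8))  # stand-in for features taken from a VQA model
    audio = rng.normal(size=(n, 4))  # stand-in for audio features
    # Assumed synthetic MOS that depends on both modalities, for illustration only.
    mos = video[:, 0] + audio[:, 0] + 0.1 * rng.normal(size=n)

    perm = rng.permutation(n)
    train_idx, test_idx = perm[:416], perm[416:]  # 80/20 split

    pred_v = fit_predict_svr(video, mos, train_idx, test_idx)
    pred_av = fit_predict_svr(fuse_features(video, audio), mos, train_idx, test_idx)
    print("video only  SRCC/PLCC:", correlation_metrics(pred_v, mos[test_idx]))
    print("video+audio SRCC/PLCC:", correlation_metrics(pred_av, mos[test_idx]))
```

Comparing the video-only model with the fused model on the same split shows the kind of gain the abstract reports. In this toy setup, the gain appears only because the synthetic MOS was built to depend on the audio features.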