No-reference (NR) perceptual video quality assessment (VQA) is a complex, unsolved, and important problem for social and streaming media applications. Efficient and accurate video quality predictors are needed to monitor and guide the processing of billions of shared, often imperfect, user-generated content (UGC) videos. Unfortunately, current NR models are limited in their prediction capabilities on real-world, "in-the-wild" UGC video data. To advance progress on this problem, we created the largest (by far) subjective video quality dataset, containing 39,000 real-world distorted videos, 117,000 space-time localized video patches ('v-patches'), and 5.5M human perceptual quality annotations. Using this, we created two unique NR-VQA models: (a) a local-to-global region-based NR VQA architecture (called PVQ) that learns to predict global video quality and achieves state-of-the-art performance on 3 UGC datasets, and (b) a first-of-a-kind space-time video quality mapping engine (called PVQ Mapper) that helps localize and visualize perceptual distortions in space and time. We will make the new database and prediction models available immediately following the review process.