User-generated content (UGC) videos have dominated the Internet in recent years. While it is well recognized that the perceptual quality of these videos can be affected by diverse factors, few existing methods explicitly explore the effects of different factors in video quality assessment (VQA) for UGC videos, i.e., the UGC-VQA problem. In this work, we make the first attempt to disentangle the effects of aesthetic quality issues and technical quality issues arising from the complicated video generation processes in the UGC-VQA problem. To overcome the absence of separate supervision for each issue during disentanglement, we propose the Limited View Biased Supervisions (LVBS) scheme, in which two separate evaluators are trained with decomposed views specifically designed for each issue. Composed of an Aesthetic Quality Evaluator (AQE) and a Technical Quality Evaluator (TQE) under the LVBS scheme, the proposed Disentangled Objective Video Quality Evaluator (DOVER) reaches excellent performance on the UGC-VQA problem, with a Spearman rank-order correlation coefficient (SRCC) of 0.91 on KoNViD-1k, 0.89 on LSVQ, and 0.88 on YouTube-UGC. More importantly, our blind subjective studies show that the separate evaluators in DOVER effectively match human perception of their respective disentangled quality issues. Code and demos are released at https://github.com/teowu/dover.
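The results above are reported as SRCC, which measures how well the ranking of predicted quality scores agrees with the ranking of human mean opinion scores (MOS). The sketch below shows how that metric is computed: rank both lists (tied values share their average rank), then take the Pearson correlation of the ranks. It is a generic illustration and not DOVER's evaluation code; the function names `average_ranks` and `srcc` are hypothetical, and in practice `scipy.stats.spearmanr` computes the same quantity.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def srcc(predicted: Sequence[float], mos: Sequence[float]) -> float:
    """Spearman rank-order correlation: Pearson correlation of the ranks."""
    if len(predicted) != len(mos) or len(predicted) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rp, rm = average_ranks(predicted), average_ranks(mos)
    n = len(rp)
    mean_p, mean_m = sum(rp) / n, sum(rm) / n
    cov = sum((a - mean_p) * (b - mean_m) for a, b in zip(rp, rm))
    var_p = sum((a - mean_p) ** 2 for a in rp)
    var_m = sum((b - mean_m) ** 2 for b in rm)
    if var_p == 0 or var_m == 0:
        raise ValueError("SRCC is undefined for a constant sequence")
    return cov / (var_p * var_m) ** 0.5


if __name__ == "__main__":
    # Hypothetical model predictions and MOS for four videos.
    predicted = [0.2, 0.5, 0.9, 0.4]
    mos = [1.5, 3.0, 4.6, 3.2]
    print(srcc(predicted, mos))  # 0.8: two adjacent videos are swapped in rank
```

Because SRCC depends only on ranks, a quality model can be evaluated against MOS on a different scale without any calibration step, which is why it is the standard headline metric for VQA benchmarks such as those listed above.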