User-generated-content (UGC) videos have dominated the Internet in recent years. While many methods attempt to objectively assess the quality of these UGC videos, the mechanisms of human quality perception in the UGC-VQA problem have yet to be explored. To better explain these quality perception mechanisms and learn more robust representations, we aim to disentangle the effects of aesthetic quality issues and technical quality issues arising from the complicated video generation processes in the UGC-VQA problem. To overcome the absence of respective supervision during disentanglement, we propose the Limited View Biased Supervisions (LVBS) scheme, in which two separate evaluators are trained with decomposed views specifically designed for each type of issue. Composed of an Aesthetic Quality Evaluator (AQE) and a Technical Quality Evaluator (TQE) under the LVBS scheme, the proposed Disentangled Objective Video Quality Evaluator (DOVER) reaches excellent performance (0.91 SRCC on KoNViD-1k, 0.89 SRCC on LSVQ, 0.88 SRCC on YouTube-UGC) on the UGC-VQA problem. More importantly, our blind subjective studies prove that the separate evaluators in DOVER can effectively match human perception on the respective disentangled quality issues. Code and demos are released at https://github.com/teowu/dover.