In this work, we propose a no-reference video quality assessment method that aims for high generalization capability in cross-content, cross-resolution, and cross-frame-rate quality prediction. In particular, we evaluate the quality of a video by learning effective feature representations in the spatial-temporal domain. In the spatial domain, to handle variations in resolution and content, we impose a Gaussian distribution constraint on the quality features. The unified distribution significantly reduces the domain gap between different video samples, yielding a more generalized quality feature representation. Along the temporal dimension, inspired by the mechanisms of visual perception, we propose a pyramid temporal aggregation module that incorporates short-term and long-term memory to aggregate frame-level quality. Experiments show that our method outperforms state-of-the-art methods in cross-dataset settings and achieves comparable performance in intra-dataset configurations, demonstrating its high generalization capability.
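To make the Gaussian distribution constraint concrete, the following is a minimal sketch of one plausible realization: a closed-form KL-divergence penalty that pulls the empirical per-dimension distribution of a batch of quality features toward the standard normal N(0, 1). The function name and the choice of regularizer form are illustrative assumptions, not the paper's stated implementation.

```python
import numpy as np

def gaussian_feature_regularizer(features):
    """Closed-form KL divergence between the empirical per-dimension
    Gaussian of a feature batch (shape: batch x dims) and N(0, 1).

    Hypothetical sketch: adding this term to the training loss pushes
    quality features from videos of different contents and resolutions
    toward one shared unit-Gaussian distribution, shrinking the domain
    gap between samples.
    """
    mu = features.mean(axis=0)            # per-dimension mean
    var = features.var(axis=0) + 1e-8     # per-dimension variance
    # KL( N(mu, var) || N(0, 1) ) per dimension, averaged over dims
    kl = 0.5 * (var + mu ** 2 - 1.0 - np.log(var))
    return float(kl.mean())

rng = np.random.default_rng(0)
standardized = rng.standard_normal((256, 8))   # already close to N(0, 1)
shifted = standardized * 3.0 + 2.0             # mismatched distribution
print(gaussian_feature_regularizer(standardized) <
      gaussian_feature_regularizer(shifted))
```

Features whose batch statistics already match N(0, 1) incur a near-zero penalty, while shifted or rescaled features are penalized, which is the intended unifying effect.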
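The pyramid temporal aggregation can likewise be sketched in a simplified form. Here, each pyramid level pools frame-level scores over progressively longer non-overlapping windows (a stand-in for short-term memory), a global average serves as the long-term term, and the levels are combined with equal weights. The function name, window scheme, and equal weighting are assumptions for illustration only.

```python
import numpy as np

def pyramid_temporal_aggregation(frame_scores, levels=3):
    """Aggregate frame-level quality scores into one video-level score.

    Hypothetical sketch: level k averages scores over non-overlapping
    windows of length 2**k (short-term memory); a final global average
    over all frames acts as long-term memory; levels are then combined
    with equal weights.
    """
    frame_scores = np.asarray(frame_scores, dtype=float)
    level_scores = []
    for k in range(levels):
        win = 2 ** k                      # window doubles at each level
        n = len(frame_scores) // win * win
        pooled = frame_scores[:n].reshape(-1, win).mean(axis=1)
        level_scores.append(pooled.mean())
    level_scores.append(frame_scores.mean())   # long-term: global mean
    return float(np.mean(level_scores))

per_frame = [0.8, 0.7, 0.9, 0.6, 0.75, 0.85, 0.7, 0.8]
video_score = pyramid_temporal_aggregation(per_frame)
```

A learned model would typically weight the levels rather than average them uniformly, but the multi-scale structure is the essential idea.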