In Neural Processing Letters 50,3 (2019), a machine learning approach to blind video quality assessment was proposed. It is based on temporal pooling of video-frame features taken from the last pooling layer of deep convolutional neural networks. The method was validated on two established benchmark datasets and reportedly gave results far better than the previous state of the art. In this letter we report the results of our careful reimplementation. The performance claimed in the paper cannot be reached; the reproduced results are in fact below the state of the art by a large margin. We show that the originally reported, erroneous performance results are a consequence of two cases of data leakage: information from outside the training dataset was used both in the fine-tuning stage and in the model evaluation.
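As a minimal sketch of the pooling idea described above (not the authors' exact pipeline; the pooling statistics and dimensions here are illustrative assumptions), temporal pooling collapses the per-frame feature vectors of one video into a single fixed-length video-level descriptor, on which a quality regressor can then be trained:

```python
import numpy as np

def temporal_pool(frame_features: np.ndarray) -> np.ndarray:
    """Pool a (num_frames, num_features) array of per-frame CNN features
    (e.g. from the last pooling layer of a pretrained network) into one
    video-level descriptor via mean-and-std pooling over the time axis."""
    return np.concatenate([frame_features.mean(axis=0),
                           frame_features.std(axis=0)])

# Toy example: 10 frames, each with a 4-dimensional feature vector.
rng = np.random.default_rng(0)
frames = rng.normal(size=(10, 4))
descriptor = temporal_pool(frames)
print(descriptor.shape)  # (8,) — mean and std concatenated
```

Note that to avoid the kind of leakage discussed here, all frames of a given test video (and any content derived from it) must be excluded from both the fine-tuning and training sets; the split has to be made at the video level, not the frame level.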