The ground truth used for training image, video, or speech quality prediction models is based on the Mean Opinion Scores (MOS) obtained from subjective experiments. Usually, multiple experiments, mostly with different test participants, are needed to obtain enough data to train quality models based on machine learning. Each of these experiments is subject to an experiment-specific bias, so the rating of the same file may differ substantially between two experiments (e.g., depending on the overall quality distribution). These differing ratings for the same distortion levels confuse neural networks during training and lead to lower performance. To overcome this problem, we propose a bias-aware loss function that estimates each dataset's bias during training with a linear function and takes it into account while optimising the network weights. We demonstrate the effectiveness of the proposed method by training and validating quality prediction models on synthetic and subjective image and speech quality datasets.
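The idea of a bias-aware loss can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes that each dataset's bias is modelled as a least-squares linear mapping from the model's predictions to that dataset's MOS values, and that the loss is the mean squared error after applying this mapping. The function names `estimate_linear_bias` and `bias_aware_mse` and the synthetic numbers are hypothetical and chosen only for illustration.

```python
import numpy as np


def estimate_linear_bias(pred, mos):
    """Least-squares fit of mos ~ a * pred + b for a single dataset.

    Assumption: the dataset bias is a first-order polynomial.
    Needs at least two distinct prediction values per dataset.
    """
    a, b = np.polyfit(pred, mos, deg=1)  # coefficients, highest degree first
    return a, b


def bias_aware_mse(pred, mos, dataset_ids):
    """MSE computed after mapping predictions through each dataset's bias."""
    pred = np.asarray(pred, dtype=float)
    mos = np.asarray(mos, dtype=float)
    dataset_ids = np.asarray(dataset_ids)

    squared_errors = np.empty_like(pred)
    biases = {}
    for ds in np.unique(dataset_ids):
        mask = dataset_ids == ds
        a, b = estimate_linear_bias(pred[mask], mos[mask])
        biases[ds] = (a, b)
        # Compare the bias-mapped prediction with the dataset's own ratings.
        squared_errors[mask] = (a * pred[mask] + b - mos[mask]) ** 2
    return squared_errors.mean(), biases


# Synthetic example: two experiments rate the same underlying quality,
# but the second one compresses and shifts the rating scale.
quality = np.linspace(1.0, 5.0, 20)
pred = np.concatenate([quality, quality])              # ideal predictions
mos = np.concatenate([quality, 0.8 * quality + 0.5])   # biased second dataset
ids = np.array([0] * 20 + [1] * 20)

plain_mse = np.mean((pred - mos) ** 2)
loss, biases = bias_aware_mse(pred, mos, ids)
print(plain_mse, loss, biases)
```

In this toy setup the plain MSE penalises predictions that are in fact correct, while the bias-aware variant does not, because the linear offset of the second experiment is absorbed by the fitted coefficients. How the coefficients interact with gradient computation in a real training loop is a design decision this sketch leaves open.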