Objective audio quality measurement systems often use perceptual models to predict the subjective quality scores that processed signals receive in listening tests. Most systems map several metrics of perceived degradation to a single score that predicts subjective quality. This requires a quality mapping stage trained on real listening test data using statistical learning (i.e., a data-driven approach), with the distortion metrics as input features. In practice, however, reliable training data is limited and usually insufficient to comprehensively train large learning models. Models of cognitive effects in objective systems can nevertheless improve the learning model. Specifically, by accounting for the salience of certain distortion types, they provide additional features to the mapping stage that improve the learning process, especially when training data is scarce. We propose a novel data-driven salience model that informs the quality mapping stage by explicitly estimating the interactions between cognitive and degradation metrics using a salience measure. On a representative validation dataset, systems incorporating the novel salience model are shown to outperform both equivalent systems that rely only on statistical learning to combine cognitive and degradation metrics and other well-known measurement systems.
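The idea of handing explicit cognitive/degradation interactions to the quality mapping stage can be illustrated with a minimal sketch. It is not the salience model proposed here: it assumes, purely for illustration, that an interaction is the product of one degradation metric and one salience value, and that the mapping stage is a closed-form ridge regression. The function names (`interaction_features`, `fit_ridge`, `predict`) and the random placeholder data are hypothetical.

```python
import numpy as np


def interaction_features(degradation, salience):
    """Pairwise products of degradation metrics and salience values.

    Assumption: an interaction is modeled as a plain product. The input
    shapes are (n_items, d) and (n_items, s); the output is (n_items, d * s).
    """
    n_items = degradation.shape[0]
    products = degradation[:, :, None] * salience[:, None, :]
    return products.reshape(n_items, -1)


def fit_ridge(features, scores, alpha=1e-3):
    """Closed-form ridge regression; the intercept is not penalized."""
    design = np.hstack([features, np.ones((features.shape[0], 1))])
    penalty = alpha * np.eye(design.shape[1])
    penalty[-1, -1] = 0.0  # leave the intercept column unregularized
    return np.linalg.solve(design.T @ design + penalty, design.T @ scores)


def predict(weights, features):
    """Apply the fitted linear mapping (last weight is the intercept)."""
    design = np.hstack([features, np.ones((features.shape[0], 1))])
    return design @ weights


if __name__ == "__main__":
    # Random placeholder values only, NOT listening test data.
    rng = np.random.default_rng(0)
    degradation = rng.random((30, 2))  # hypothetical degradation metrics
    salience = rng.random((30, 2))     # hypothetical salience values
    scores = rng.random(30) * 100.0    # placeholder quality scores

    features = interaction_features(degradation, salience)
    weights = fit_ridge(features, scores)
    print(predict(weights, features).shape)
```

Computing the products up front means the mapping stage does not have to discover the interactions from a small training set, which is the situation the abstract describes. A baseline that simply concatenates both metric sets would leave that burden on the learner.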