Objective speech quality assessment is usually conducted by comparing the received speech signal with its clean reference, whereas human listeners can evaluate speech quality without any reference, as in mean opinion score (MOS) tests. Non-intrusive speech quality assessment has recently attracted much attention because clean reference signals for objective evaluation are often unavailable in real scenarios. In this paper, we propose MetricNet, a novel non-intrusive speech quality measurement model that leverages label distribution learning and joint speech reconstruction learning to achieve significantly better performance than existing non-intrusive speech quality measurement models. We demonstrate that the proposed approach yields high correlation with intrusive objective evaluations of speech quality on clean, noisy, and processed speech data.
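Label distribution learning, in its generic form, replaces a single scalar regression target with a probability distribution over discretized score values. The model is trained to match that distribution, and its prediction is collapsed back to a scalar by taking the expectation. The sketch below illustrates only this general technique under stated assumptions. MetricNet's actual bin layout, smoothing kernel, and loss are not given in this abstract, so the score range [1.0, 4.5], the 36 bins, the Gaussian width `sigma`, and the helper names `score_to_distribution`, `kl_divergence`, and `expected_score` are all hypothetical.

```python
import math

def score_to_distribution(score, lo=1.0, hi=4.5, num_bins=36, sigma=0.1):
    """Hypothetical helper: spread a scalar quality score into a discrete
    label distribution over evenly spaced bin centres (Gaussian kernel)."""
    step = (hi - lo) / (num_bins - 1)
    centres = [lo + i * step for i in range(num_bins)]
    # Unnormalized Gaussian weight of each bin centre around the true score
    weights = [math.exp(-((c - score) ** 2) / (2 * sigma ** 2)) for c in centres]
    total = sum(weights)
    return centres, [w / total for w in weights]

def kl_divergence(target, predicted, eps=1e-12):
    """KL(target || predicted), a common training loss for label distributions.
    Bins with zero target mass contribute nothing to the sum."""
    return sum(t * math.log((t + eps) / (p + eps))
               for t, p in zip(target, predicted) if t > 0)

def expected_score(centres, probs):
    """Collapse a (predicted) distribution back to a scalar score via its mean."""
    return sum(c * p for c, p in zip(centres, probs))

if __name__ == "__main__":
    centres, target = score_to_distribution(3.2)
    uniform = [1.0 / len(centres)] * len(centres)  # stand-in for an untrained model
    print("expected score of target:", round(expected_score(centres, target), 4))
    print("KL(target || uniform):", round(kl_divergence(target, uniform), 4))
```

Predicting a full distribution rather than a point estimate gives the model a smoother training signal and a built-in notion of uncertainty. The expected value recovers a scalar score that can be correlated with intrusive metrics.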