Wideband codecs such as AMR-WB or EVS are widely used in (mobile) speech communication. The quality of coded speech is often evaluated subjectively in an absolute category rating (ACR) listening test. However, the ACR test is impractical for online monitoring of speech communication networks. Perceptual evaluation of speech quality (PESQ) is one of the widely used metrics that instrumentally predict the results of an ACR test. However, the PESQ algorithm requires an original reference signal, which is usually unavailable in network monitoring, and this limits its applicability. NISQA is a new non-intrusive, neural-network-based speech quality measure that focuses on super-wideband speech signals. In this work, however, we aim to predict the well-known PESQ metric with a non-intrusive PESQ-DNN model. We illustrate the potential of this model by predicting the PESQ scores of wideband-coded speech obtained from AMR-WB or EVS codecs operating at different bitrates in noisy, tandeming, and error-prone transmission conditions. We compare our method with the state-of-the-art network topologies of QualityNet, WaweNet, and DNSMOS, all applied to PESQ prediction, by measuring the mean absolute error (MAE) and the linear correlation coefficient (LCC). The proposed PESQ-DNN achieves the best total MAE and LCC of 0.11 and 0.92, respectively, in conditions without frame loss, and remains the best when frame loss is included. Note that our model could similarly be used to non-intrusively predict POLQA or other (intrusive) metrics. Upon article acceptance, code will be provided on GitHub.
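The two figures of merit used for the comparison, MAE and LCC, can be sketched as follows. This is a minimal illustration assuming the standard definitions (mean absolute error and Pearson's linear correlation) computed over paired per-file scores; it is not the authors' evaluation code, and the function names `mae` and `lcc` as well as the sample values are hypothetical.

```python
import math


def mae(predicted, reference):
    """Mean absolute error between predicted and reference (e.g., PESQ) scores."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(predicted)


def lcc(predicted, reference):
    """Pearson linear correlation coefficient between two score lists."""
    n = len(predicted)
    if n != len(reference) or n < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_p = sum(predicted) / n
    mean_r = sum(reference) / n
    # Covariance and variances (unnormalized; the 1/n factors cancel in the ratio).
    cov = sum((p - mean_p) * (r - mean_r) for p, r in zip(predicted, reference))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_r = sum((r - mean_r) ** 2 for r in reference)
    if var_p == 0 or var_r == 0:
        raise ValueError("correlation is undefined for constant inputs")
    return cov / math.sqrt(var_p * var_r)


if __name__ == "__main__":
    # Hypothetical scores: a constant offset of 0.5 gives MAE 0.5 but perfect LCC.
    pred = [1.5, 2.5, 3.5]
    ref = [1.0, 2.0, 3.0]
    print(mae(pred, ref), lcc(pred, ref))
```

The example shows why both metrics are reported: LCC is insensitive to a constant bias in the predictions, whereas MAE captures it.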