Human subjective evaluation is the gold standard for evaluating speech quality optimized for human perception. Perceptual objective metrics serve as a proxy for subjective scores. We recently developed a non-intrusive speech quality metric called Deep Noise Suppression Mean Opinion Score (DNSMOS), trained on scores from ITU-T Rec. P.808 subjective evaluation. The P.808 scores reflect only the overall quality of an audio clip. The ITU-T Rec. P.835 subjective evaluation framework additionally gives standalone quality scores for the speech and for the background noise. In this work, we train an objective metric on P.835 human ratings that outputs three scores: i) speech quality (SIG), ii) background noise quality (BAK), and iii) overall quality (OVRL) of the audio. The developed metric is highly correlated with human ratings, with a Pearson correlation coefficient (PCC) of 0.94 for SIG and 0.98 for BAK and OVRL. To our knowledge, this is the first non-intrusive P.835 predictor. DNSMOS P.835 is made publicly available as an Azure service.
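For readers unfamiliar with the agreement measure quoted above, the following is a minimal sketch of how a Pearson correlation coefficient between predicted and human scores can be computed. The function name `pearson_cc` and the rating values are hypothetical and chosen only for illustration; they are not data from this work, and the sketch does not reproduce the evaluation protocol used here.

```python
import math


def pearson_cc(predicted, human):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(predicted) != len(human) or len(predicted) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_h = sum(human) / n
    # Sum of products of deviations from the means (unnormalized covariance).
    cov = sum((p - mean_p) * (h - mean_h) for p, h in zip(predicted, human))
    # Sums of squared deviations (unnormalized variances).
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_h = sum((h - mean_h) ** 2 for h in human)
    if var_p == 0 or var_h == 0:
        raise ValueError("PCC is undefined when one sequence is constant")
    return cov / math.sqrt(var_p * var_h)


if __name__ == "__main__":
    # Hypothetical scores on a 1-5 MOS scale; NOT results from the paper.
    predicted_sig = [3.1, 3.8, 2.4, 4.2, 3.5]
    human_sig = [3.0, 4.0, 2.5, 4.1, 3.3]
    print(f"PCC = {pearson_cc(predicted_sig, human_sig):.3f}")
```

A value close to 1 indicates that the predicted scores rise and fall together with the human ratings, which is the sense in which the metric is described as highly correlated with them.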