Speech intelligibility assessment is essential for many speech-related applications. However, most objective intelligibility metrics are intrusive: they require clean reference speech in addition to the degraded or processed signal under evaluation. Furthermore, existing metrics such as STOI are designed primarily for normal-hearing listeners, and their accuracy in predicting speech intelligibility for hearing-impaired listeners remains limited. The Gammachirp Envelope Similarity Index (GESI) can estimate intelligibility for hearing-impaired listeners, but it is also intrusive because it depends on reference signals, which limits its applicability in real-world scenarios. To overcome this limitation, this study proposes DeepGESI, a non-intrusive deep learning-based model that predicts the speech intelligibility of hearing-impaired listeners accurately and efficiently without requiring any clean reference speech. Experimental results show that, under the test conditions of the 2nd Clarity Prediction Challenge (CPC2) dataset, the GESI scores predicted by DeepGESI correlate strongly with the actual GESI scores. In addition, the proposed model predicts substantially faster than conventional methods.