Non-intrusive intelligibility prediction is important for realistic scenarios, where a clean reference signal is difficult to access. However, constructing many non-intrusive predictors requires either ground-truth intelligibility labels or clean reference signals for supervised learning. In this work, we leverage an unsupervised uncertainty estimation method to predict speech intelligibility; it requires neither intelligibility labels nor reference signals to train the predictor. Our experiments demonstrate that the uncertainty of state-of-the-art end-to-end automatic speech recognition (ASR) models is highly correlated with speech intelligibility. The proposed method is evaluated on two databases, and the results show that the unsupervised uncertainty measures of ASR models correlate more strongly with speech intelligibility obtained from listening tests than the predictions made by widely used intrusive methods.
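As a rough illustration of the idea, the sketch below computes one possible uncertainty measure from an ASR model's per-step output distributions and correlates it with listening scores. The abstract does not state which uncertainty measure is used, so the mean token entropy here is an assumption, and the names `mean_entropy` and `pearson` are hypothetical helpers rather than part of the proposed method. The input is assumed to be a list of posterior distributions over the vocabulary, one per decoding step, as an end-to-end ASR decoder might expose.

```python
import math


def mean_entropy(posteriors):
    """Average Shannon entropy (in nats) over decoding steps.

    `posteriors` is a list of probability distributions, one per
    decoding step. Assumption: entropy stands in for the model's
    uncertainty; the source does not specify the measure.
    """
    total = 0.0
    for dist in posteriors:
        # Zero-probability entries contribute nothing to the entropy.
        total -= sum(p * math.log(p) for p in dist if p > 0.0)
    return total / len(posteriors)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Synthetic, illustrative inputs only (not results from the paper).
confident = [[0.97, 0.02, 0.01], [0.90, 0.05, 0.05]]
uncertain = [[0.40, 0.30, 0.30], [0.34, 0.33, 0.33]]
print(mean_entropy(confident) < mean_entropy(uncertain))  # True
```

No labels or reference signals enter the uncertainty computation; listening scores are needed only for the evaluation step, where `pearson` would be applied to per-utterance uncertainties and the corresponding intelligibility scores. One would presumably expect a negative correlation, since higher uncertainty should accompany lower intelligibility.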