Automatic assessment of dysarthric speech is essential for sustained treatment and rehabilitation. However, collecting atypical speech is challenging and often leads to data scarcity. To tackle this problem, we propose a novel automatic severity assessment method for dysarthric speech that combines a self-supervised model with multi-task learning. Wav2vec 2.0 XLS-R is jointly trained on two tasks: severity classification and auxiliary automatic speech recognition (ASR). For the baseline experiments, we employ hand-crafted acoustic features with machine learning classifiers such as SVM, MLP, and XGBoost. Evaluated on the Korean dysarthric speech QoLT database, our model outperforms the traditional baseline methods, with a relative improvement of 1.25% in F1-score. In addition, the proposed model surpasses the model trained without the ASR head, achieving a 10.61% relative improvement. Furthermore, we show how multi-task learning affects severity classification performance by analyzing the latent representations and the regularization effect.
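The multi-task setup described above can be sketched as a shared encoder feeding two heads: an utterance-level severity classifier (cross-entropy loss) and a frame-level ASR head (CTC loss), optimized with a weighted joint loss. The sketch below is a minimal illustration, not the authors' implementation: the `encoder` argument stands in for wav2vec 2.0 XLS-R, and the head designs, mean pooling, and loss weight `alpha` are all assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiTaskSeverityModel(nn.Module):
    """Shared encoder with a severity head and an auxiliary ASR head.

    `encoder` is a stand-in for wav2vec 2.0 XLS-R; it must map an input
    to frame-wise features of shape (batch, time, hidden_dim).
    """
    def __init__(self, encoder, hidden_dim, num_severity_classes, vocab_size):
        super().__init__()
        self.encoder = encoder
        # Utterance-level head: one severity label per utterance.
        self.severity_head = nn.Linear(hidden_dim, num_severity_classes)
        # Frame-level head: per-frame token logits for CTC-based ASR.
        self.asr_head = nn.Linear(hidden_dim, vocab_size)

    def forward(self, x):
        h = self.encoder(x)                       # (batch, time, hidden_dim)
        sev_logits = self.severity_head(h.mean(dim=1))  # mean-pool over time
        asr_logits = self.asr_head(h)             # (batch, time, vocab_size)
        return sev_logits, asr_logits

def joint_loss(sev_logits, sev_targets, asr_log_probs, asr_targets,
               input_lengths, target_lengths, alpha=0.5):
    """Weighted sum of classification and CTC losses (alpha is an assumption)."""
    ce = F.cross_entropy(sev_logits, sev_targets)
    ctc = F.ctc_loss(asr_log_probs, asr_targets, input_lengths, target_lengths)
    return ce + alpha * ctc

# Usage with a toy linear "encoder" in place of XLS-R:
encoder = nn.Linear(16, 32)  # applied frame-wise to (batch, time, 16)
model = MultiTaskSeverityModel(encoder, hidden_dim=32,
                               num_severity_classes=4, vocab_size=30)
x = torch.randn(2, 50, 16)
sev_logits, asr_logits = model(x)
# CTC expects log-probs shaped (time, batch, vocab).
log_probs = asr_logits.log_softmax(-1).transpose(0, 1)
targets = torch.randint(1, 30, (2, 10))  # avoid blank index 0
loss = joint_loss(sev_logits, torch.tensor([0, 1]), log_probs, targets,
                  torch.full((2,), 50), torch.full((2,), 10))
```

In practice the ASR head acts as a regularizer: forcing the shared representation to remain useful for recognition discourages the severity classifier from overfitting the small dysarthric dataset.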