Automatic assessment of dysarthric speech is essential for sustained treatment and rehabilitation. However, obtaining atypical speech is challenging, often leading to data scarcity. To tackle this problem, we propose a novel automatic severity assessment method for dysarthric speech that uses a self-supervised model in conjunction with multi-task learning. Wav2vec 2.0 XLS-R is jointly trained on two different tasks: severity level classification and an auxiliary automatic speech recognition (ASR) task. For the baseline experiments, we employ hand-crafted features, such as eGeMAPS and linguistic features, with SVM, MLP, and XGBoost classifiers. Evaluated on the Korean dysarthric speech QoLT database, our model outperforms the traditional baseline methods, with a relative increase of 4.79% in classification accuracy. In addition, the proposed model surpasses a model trained without the ASR head, achieving a 10.09% relative improvement. Furthermore, we show how multi-task learning affects severity classification performance by analyzing the latent representations and the regularization effect.
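The following is a minimal sketch of the multi-task architecture described above: a shared wav2vec 2.0 XLS-R encoder with a severity classification head and an auxiliary CTC-based ASR head trained jointly. The class name, head structure, checkpoint size, label counts, and loss weight `lam` are illustrative assumptions, not the authors' exact implementation.

```python
import torch
import torch.nn as nn
from transformers import Wav2Vec2Model

class MultiTaskXLSR(nn.Module):
    # num_severity_levels and vocab_size are placeholder values (assumed).
    def __init__(self, num_severity_levels=5, vocab_size=32):
        super().__init__()
        # Shared self-supervised encoder; the 300M XLS-R checkpoint is one
        # possible choice (the abstract does not specify the variant).
        self.encoder = Wav2Vec2Model.from_pretrained(
            "facebook/wav2vec2-xls-r-300m")
        hidden = self.encoder.config.hidden_size
        # Severity head: mean-pool frame features, then classify.
        self.severity_head = nn.Linear(hidden, num_severity_levels)
        # Auxiliary ASR head: per-frame projection trained with CTC.
        self.asr_head = nn.Linear(hidden, vocab_size)

    def forward(self, input_values):
        frames = self.encoder(input_values).last_hidden_state  # (B, T, H)
        severity_logits = self.severity_head(frames.mean(dim=1))  # (B, C)
        asr_logits = self.asr_head(frames)  # (B, T, V), fed to CTC loss
        return severity_logits, asr_logits

def joint_loss(severity_logits, asr_logits, severity_labels,
               transcripts, input_lengths, target_lengths, lam=0.3):
    # Cross-entropy for severity, CTC for the auxiliary ASR task;
    # lam weights the ASR objective and its value here is assumed.
    ce = nn.functional.cross_entropy(severity_logits, severity_labels)
    log_probs = asr_logits.log_softmax(-1).transpose(0, 1)  # (T, B, V) for CTC
    ctc = nn.functional.ctc_loss(log_probs, transcripts,
                                 input_lengths, target_lengths)
    return ce + lam * ctc
```

The shared encoder is the key design choice: the CTC gradient from the ASR head regularizes the encoder's latent representations, which is consistent with the regularization effect analyzed in the paper.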