This paper considers a representation learning strategy to model speech signals from patients with Parkinson's disease and cleft lip and palate. In particular, it compares different parametrized representation types, such as wideband and narrowband spectrograms and wavelet-based scalograms, with the goal of quantifying the representation capacity of each. Methods for quantification include the ability of the proposed model to classify the different pathologies and the associated disease severity. Additionally, this paper proposes a novel fusion strategy, called multi-spectral fusion, that combines wideband and narrowband spectral resolutions using a representation learning strategy based on autoencoders. The proposed models are able to classify the speech of Parkinson's disease patients with accuracies of up to 95\%. The proposed models were also able to assess the dysarthria severity of Parkinson's disease patients with a Spearman correlation of up to 0.75. These results outperform those reported in the literature where the same problem was addressed using the same corpus.
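For intuition, the following is a minimal sketch of how a multi-spectral fusion autoencoder could be structured, assuming a PyTorch implementation; the layer sizes, flattened spectrogram inputs, and fusion by concatenation of the two bottlenecks are illustrative assumptions, not the paper's exact architecture.

```python
# Hypothetical multi-spectral fusion autoencoder: one encoder per spectral
# resolution, a fused bottleneck, and one decoder per reconstruction target.
import torch
import torch.nn as nn

class MultiSpectralFusionAE(nn.Module):
    def __init__(self, wb_dim=1024, nb_dim=1024, latent_dim=128):
        super().__init__()
        # Separate encoders for wideband and narrowband spectrogram frames.
        self.enc_wb = nn.Sequential(nn.Linear(wb_dim, 256), nn.ReLU(),
                                    nn.Linear(256, latent_dim))
        self.enc_nb = nn.Sequential(nn.Linear(nb_dim, 256), nn.ReLU(),
                                    nn.Linear(256, latent_dim))
        # Decoders reconstruct each resolution from the fused representation.
        self.dec_wb = nn.Sequential(nn.Linear(2 * latent_dim, 256), nn.ReLU(),
                                    nn.Linear(256, wb_dim))
        self.dec_nb = nn.Sequential(nn.Linear(2 * latent_dim, 256), nn.ReLU(),
                                    nn.Linear(256, nb_dim))

    def forward(self, x_wb, x_nb):
        # Fuse the two spectral views by concatenating their embeddings.
        z = torch.cat([self.enc_wb(x_wb), self.enc_nb(x_nb)], dim=-1)
        return self.dec_wb(z), self.dec_nb(z), z

# Usage with flattened wideband/narrowband spectrogram frames.
model = MultiSpectralFusionAE()
x_wb, x_nb = torch.randn(8, 1024), torch.randn(8, 1024)
rec_wb, rec_nb, z = model(x_wb, x_nb)
loss = (nn.functional.mse_loss(rec_wb, x_wb)
        + nn.functional.mse_loss(rec_nb, x_nb))
```

The fused bottleneck `z` would then serve as the learned feature vector for downstream pathology classification or severity regression.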