The detection of pathologies from speech features is usually framed as a binary classification task, with one class representing a specific pathology and the other representing healthy speech. In this work, we train neural networks, large margin classifiers, and tree boosting machines to distinguish among four different pathologies: Parkinson's disease, laryngeal cancer, cleft lip and palate, and oral squamous cell carcinoma. We demonstrate that latent representations extracted at different layers of a pre-trained wav2vec 2.0 system can be used effectively to classify these types of pathological voices. We evaluate the robustness of our classifiers in two ways: by convolving the test data with room impulse responses, and by applying the classifiers to unseen speech corpora. Depending on the model and the noise conditions, our approach achieves unweighted average F1 scores between 74.1% and 97.0%. The systems generalize well to unseen recordings of healthy speakers drawn from a variety of sources.
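The two evaluation ingredients mentioned above can be sketched in a few lines. The block below is a minimal illustration under stated assumptions, not the authors' pipeline. `apply_rir` simulates reverberation by convolving a dry signal with a room impulse response. Truncating the result to the input length and rescaling it to the input's peak amplitude are assumptions made here for illustration, since the abstract does not specify either step. `unweighted_average_f1` computes the per-class F1 score and takes its plain mean (macro F1), which is how an unweighted average F1 score is usually defined. The function names and the toy labels are hypothetical.

```python
import numpy as np


def apply_rir(signal, rir):
    """Simulate a reverberant test condition by convolving a dry signal with an RIR.

    Assumptions (not from the paper): the output is cut to the input length and
    rescaled so that its peak amplitude matches the dry signal's peak.
    """
    signal = np.asarray(signal, dtype=float)
    rir = np.asarray(rir, dtype=float)
    wet = np.convolve(signal, rir, mode="full")[: len(signal)]
    peak = np.max(np.abs(wet))
    if peak > 0:
        wet = wet * (np.max(np.abs(signal)) / peak)
    return wet


def unweighted_average_f1(y_true, y_pred, labels):
    """Macro F1: compute F1 = 2TP / (2TP + FP + FN) per class, then average equally."""
    scores = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        # A class with no true or predicted samples contributes 0.
        scores.append(2 * tp / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy example with three placeholder classes.
y_true = ["A", "A", "B", "B", "C", "C"]
y_pred = ["A", "B", "B", "B", "C", "A"]
print(unweighted_average_f1(y_true, y_pred, labels=["A", "B", "C"]))
```

Because every class contributes equally to the mean, a classifier cannot reach a high unweighted average F1 score by favouring the majority class. For non-degenerate inputs, the value should match scikit-learn's `f1_score(y_true, y_pred, average="macro")`.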