In this paper, we propose a new approach to pathological speech synthesis. Instead of using healthy speech as the source, we customise an existing pathological speech sample to a new speaker's voice characteristics. This approach alleviates the evaluation problem one normally faces when converting typical speech to pathological speech. In our approach, the voice conversion (VC) model does not need to be optimised for speech degradation, only for the speaker change. This change in the optimisation objective ensures that any degradation in naturalness is due to the conversion process and not to the model exaggerating the characteristics of a speech pathology. As a proof of concept, we convert dysarthric speech using the UASpeech database and an autoencoder-based VC technique. Subjective evaluation results show reasonable naturalness for high-intelligibility dysarthric speakers. For mid- and low-intelligibility speakers, however, lower intelligibility seems to introduce a marginal degradation in naturalness scores compared to the ground truth. Conversion of speaker characteristics is successful for low- and high-intelligibility speakers but not for mid-intelligibility speakers. Further investigation is needed to determine whether the differences in results across intelligibility levels are due to the intelligibility levels themselves or to the individual speakers.