Automatic recognition of pathological speech, particularly from children with articulatory impairments, is challenging for several reasons. One obstacle is the lack of domain-specific data, which hinders the use of speech recognition in speech-based applications targeting pathological speakers. To address this challenge, we investigate several data augmentation techniques that simulate training data to improve children's speech recognition, taking cleft lip and palate (CLP) speech as the case study. The augmentation techniques explored in this study include vocal tract length perturbation (VTLP), reverberation, speaking rate modification, pitch modification, and speech feature modification using cycle-consistent adversarial networks (CycleGAN). We find that data augmentation significantly improves CLP speech recognition performance, and the improvement is particularly evident with CycleGAN-based feature modification, VTLP, and reverberation. More specifically, our systems achieve a lower phone error rate than systems trained without data augmentation.
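The phone error rate mentioned above is conventionally the Levenshtein (edit) distance between the reference and hypothesized phone sequences, divided by the reference length. The following is a minimal sketch of that standard definition, not the scoring tool used in this study; the function names `edit_distance` and `phone_error_rate` and the example phone labels are illustrative assumptions.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Minimum number of substitutions, deletions, and insertions
    needed to turn `ref` into `hyp` (Levenshtein distance)."""
    # prev[j] holds the distance between the processed prefix of ref
    # and the first j symbols of hyp.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of r
                curr[j - 1] + 1,     # insertion of h
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def phone_error_rate(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """Edit distance normalized by the reference length (assumed definition)."""
    if len(ref) == 0:
        raise ValueError("reference sequence must not be empty")
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    # Hypothetical phone labels, for illustration only.
    reference = ["k", "ae", "t"]
    hypothesis = ["k", "ae", "p"]
    print(phone_error_rate(reference, hypothesis))
```

Because the rate is normalized by the reference length only, a hypothesis with many insertions can score above 1.0; a lower value indicates better recognition.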