Speech generation and enhancement based on articulatory movements can facilitate communication when ordinary verbal communication is not possible, e.g., for patients who have lost the ability to speak. Although various techniques have been proposed to this end, electropalatography (EPG), a monitoring technique that records contact between the tongue and the hard palate during speech, has not been adequately explored. Herein, we propose a novel multimodal EPG-to-speech (EPG2S) system that uses EPG and speech signals for speech generation and enhancement. Different fusion strategies based on multiple combinations of EPG and noisy speech signals are examined, and the viability of the proposed method is investigated. Experimental results indicate that EPG2S achieves desirable speech generation results from EPG signals alone, and that adding noisy speech signals further improves quality and intelligibility. Likewise, EPG2S achieves high-quality speech enhancement from audio signals alone, and adding EPG signals further improves performance. Among the strategies examined, late fusion is found to be the most effective for simultaneous speech generation and enhancement.