Translating imagined speech from human brain activity into voice is a challenging and intriguing research problem that could provide a new means of human communication via brain signals. Efforts to reconstruct speech from brain activity have shown their potential using invasive measures of spoken speech data; however, they have faced challenges in reconstructing imagined speech. In this paper, we propose NeuroTalk, which converts non-invasive brain signals of imagined speech into the user's own voice. Our model was trained on spoken speech EEG and generalized to adapt to the domain of imagined speech, thus allowing natural correspondence between the imagined speech and the voice used as ground truth. In our framework, an automatic speech recognition decoder contributed to decomposing the phonemes of the generated speech, thereby demonstrating the potential of voice reconstruction from unseen words. Our results imply the potential of speech synthesis from human EEG signals, not only during spoken speech but also from the brain signals of imagined speech.