Silent speech decoding (SSD) based on surface electromyography (sEMG) has become a widely studied task in recent years. Although several methods have been proposed that successfully decode sEMG into audio, some problems remain. In this paper, we propose an optimized sequence-to-sequence (Seq2Seq) approach to synthesize voice from subvocal sEMG. Both subvocal and vocal sEMG are collected and preprocessed as input data. We then extract durations from the alignment between the subvocal and vocal signals and use them to regulate the subvocal sEMG so that its length follows that of the audio. In addition, we use phoneme classification and vocal sEMG reconstruction modules to improve model performance. Finally, experiments on a Mandarin speaker dataset comprising 6.49 hours of data demonstrate that the proposed model improves both the mapping accuracy and the quality of the reconstructed voice.
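The duration-based length regulation described above can be illustrated with a minimal sketch. It is not the implementation used in this work: it assumes that a monotonic frame-level alignment path (for example, one produced by dynamic time warping) is already available, and that regulation is done by simply repeating subvocal frames. The function names `durations_from_alignment` and `length_regulate` are hypothetical.

```python
from collections import Counter


def durations_from_alignment(path, num_subvocal_frames):
    """Count how many vocal frames are assigned to each subvocal frame.

    `path` is an assumed monotonic alignment given as
    (subvocal_index, vocal_index) pairs. Each vocal frame is assigned to
    the first subvocal frame it is paired with, so the durations sum to
    the number of vocal frames covered by the path.
    """
    owner = {}
    for sub_idx, vocal_idx in path:
        owner.setdefault(vocal_idx, sub_idx)
    counts = Counter(owner.values())
    return [counts.get(i, 0) for i in range(num_subvocal_frames)]


def length_regulate(frames, durations):
    """Repeat each subvocal frame according to its duration."""
    regulated = []
    for frame, duration in zip(frames, durations):
        regulated.extend([frame] * duration)
    return regulated


# Toy alignment: 4 subvocal frames aligned to 5 vocal frames.
toy_path = [(0, 0), (0, 1), (1, 2), (2, 2), (3, 3), (3, 4)]
toy_durations = durations_from_alignment(toy_path, 4)
toy_regulated = length_regulate(["a", "b", "c", "d"], toy_durations)
```

In this toy case the regulated sequence has one entry per vocal frame, so a downstream decoder would see an input whose length matches the target audio frames. A subvocal frame that receives no vocal frame gets a duration of zero and is dropped, which is one of several possible design choices.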