Each year, millions of individuals lose the ability to speak intelligibly due to causes such as neuromuscular disease, stroke, trauma, and head and neck cancer surgery (e.g., laryngectomy) or treatment (e.g., radiotherapy toxicity to the speech articulators). Effective communication is crucial for daily life, and losing the ability to speak leads to isolation, depression, anxiety, and a host of other detrimental sequelae. Noninvasive surface electromyography (sEMG) has shown promise for restoring speech output in these individuals. The goal is to record sEMG signals from multiple articulatory sites as people silently produce speech and then decode those signals to enable fluent and natural communication. Currently, many fundamental questions about orofacial neuromuscular signals during speech articulation remain unanswered, including (1) the data structure of orofacial sEMG signals, (2) the distribution shift of sEMG signals across individuals, (3) the ability of sEMG signals to span the entire English-language phonetic space during silent speech articulation, and (4) the generalization capability of noninvasive sEMG-based silent speech interfaces. We address these questions through a series of experiments with healthy human subjects. We show that sEMG signals exhibit a graph data structure and that the distribution shift across individuals is given by a change of basis. Furthermore, we show that silently voiced articulations spanning the entire English-language phonetic space can be decoded using small neural networks that can be trained with little data, and that such architectures work well across individuals. To ensure transparency and reproducibility, we open-source all data and code used in this study.
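To make the change-of-basis claim concrete, one minimal illustrative reading (the symbols $x_A$, $x_B$, $M$, and $d$ are our own and do not appear in the abstract) is that the $d$-dimensional sEMG feature vectors recorded from two individuals A and B are related by a fixed invertible linear map,

$$x_B(t) \approx M \, x_A(t), \qquad M \in \mathrm{GL}(d, \mathbb{R}),$$

so that, under this reading, a decoder trained on one individual could in principle be adapted to another by estimating the single matrix $M$ rather than retraining from scratch.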