Speech neuroprostheses have the potential to enable communication for people with dysarthria or anarthria. Recent advances have demonstrated high-quality text decoding and speech synthesis from electrocorticographic grids placed on the cortical surface. Here, we investigate a less invasive measurement modality in three participants, namely stereotactic EEG (sEEG), which provides sparse sampling from multiple brain regions, including subcortical regions. To evaluate whether sEEG can also be used to synthesize audio from neural recordings, we employ a recurrent encoder-decoder model based on modern deep learning methods. We find that speech can indeed be reconstructed with correlations of up to 0.8 from these minimally invasive recordings, despite limited amounts of training data. In particular, the architecture we employ naturally picks up on the temporal nature of the data and thereby outperforms an existing benchmark based on non-regressive convolutional neural networks.
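To make the abstract's description concrete, the sketch below shows one way a recurrent encoder-decoder could map windows of sEEG features to mel-spectrogram frames, together with the per-bin Pearson correlation commonly behind statements such as "correlations of up to 0.8". The GRU layers, layer sizes, feature dimensions (64 channels, 80 mel bins), and the helper pearson_per_mel are illustrative assumptions, not the authors' exact architecture or evaluation pipeline.

```python
"""Minimal sketch of a recurrent encoder-decoder for sEEG-to-speech
regression, assuming GRU layers and mel-spectrogram targets."""
import torch
import torch.nn as nn


class RecurrentEncoderDecoder(nn.Module):
    def __init__(self, n_channels=64, n_mels=80, hidden=256):
        super().__init__()
        # Encoder: bidirectional GRU over the sEEG feature sequence.
        self.encoder = nn.GRU(n_channels, hidden, batch_first=True,
                              bidirectional=True)
        # Decoder: unidirectional GRU reading the encoder outputs
        # (a simplified, non-autoregressive stand-in for a full seq2seq decoder).
        self.decoder = nn.GRU(2 * hidden, hidden, batch_first=True)
        # Linear readout to mel-spectrogram frames.
        self.readout = nn.Linear(hidden, n_mels)

    def forward(self, x):
        # x: (batch, time, n_channels) sEEG features, e.g. high-gamma power.
        enc, _ = self.encoder(x)      # (batch, time, 2 * hidden)
        dec, _ = self.decoder(enc)    # (batch, time, hidden)
        return self.readout(dec)      # (batch, time, n_mels)


def pearson_per_mel(pred, target):
    """Pearson correlation per mel bin, averaged over bins."""
    pred = pred - pred.mean(dim=0, keepdim=True)
    target = target - target.mean(dim=0, keepdim=True)
    num = (pred * target).sum(dim=0)
    den = pred.norm(dim=0) * target.norm(dim=0) + 1e-8
    return (num / den).mean()


if __name__ == "__main__":
    model = RecurrentEncoderDecoder()
    seeg = torch.randn(1, 200, 64)    # 200 time steps of 64-channel features
    mel = model(seeg)
    print(mel.shape)                  # torch.Size([1, 200, 80])
    print(pearson_per_mel(mel[0], torch.randn(200, 80)))
```

The recurrent layers process the sEEG sequence step by step, which is the property the abstract credits for outperforming the non-regressive convolutional benchmark; a convolutional model sees only a fixed local context per output frame.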