Decoding language from brain activity is a long-awaited goal in both healthcare and neuroscience. Major milestones have recently been reached thanks to intracranial devices: subject-specific pipelines trained on invasive brain responses to basic language tasks are now starting to efficiently decode interpretable features (e.g. letters, words, spectrograms). However, scaling this approach to natural speech and non-invasive brain recordings remains a major challenge. Here, we propose a single end-to-end architecture trained with contrastive learning across a large cohort of individuals to predict self-supervised representations of natural speech. We evaluate our model on four public datasets, encompassing 169 volunteers recorded with magneto- or electro-encephalography (M/EEG) while they listened to natural speech. The results show that our model can identify, from 3 s of MEG signals, the corresponding speech segment with up to 72.5% top-10 accuracy out of 1,594 distinct segments (and 44% top-1 accuracy), and up to 19.1% out of 2,604 segments for EEG recordings, hence allowing the decoding of phrases absent from the training set. Model comparison and ablation analyses show that this performance directly benefits from our original design choices, namely the use of (i) a contrastive objective, (ii) pretrained representations of speech and (iii) a common convolutional architecture simultaneously trained across several participants. Together, these results delineate a promising path to decode natural language processing in real time from non-invasive recordings of brain activity.
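To make the contrastive objective and the top-k segment identification concrete, the following is a minimal sketch in plain Python. It is a generic InfoNCE-style formulation and not the implementation evaluated here. The cosine similarity, the `temperature` value, and all function names are assumptions introduced for illustration. Each row of `brain` stands for a latent predicted from one M/EEG window, and the row of `speech` at the same index stands for the representation of the matching speech segment.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    # Scale to unit length; fall back to 1.0 to avoid dividing by zero.
    norm = math.sqrt(dot(v, v)) or 1.0
    return [x / norm for x in v]


def similarity_matrix(brain, speech):
    """scores[i][j] = cosine similarity between brain latent i and speech latent j."""
    b = [normalize(x) for x in brain]
    s = [normalize(x) for x in speech]
    return [[dot(bi, sj) for sj in s] for bi in b]


def contrastive_loss(scores, temperature=0.1):
    """Mean cross-entropy of a softmax over candidate segments.

    The correct candidate for row i is column i; `temperature` is an
    assumed hyperparameter, not a value taken from the text.
    """
    total = 0.0
    for i, row in enumerate(scores):
        logits = [x / temperature for x in row]
        m = max(logits)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_z - logits[i]
    return total / len(scores)


def top_k_accuracy(scores, k):
    """Fraction of rows whose true segment ranks among the k best candidates."""
    hits = 0
    for i, row in enumerate(scores):
        rank = sum(1 for x in row if x > row[i])  # candidates scoring higher
        hits += rank < k
    return hits / len(scores)


if __name__ == "__main__":
    # Synthetic stand-in data: "brain" latents are noisy copies of speech latents.
    rng = random.Random(0)
    speech = [[rng.gauss(0, 1) for _ in range(16)] for _ in range(50)]
    brain = [[x + rng.gauss(0, 1) for x in seg] for seg in speech]
    scores = similarity_matrix(brain, speech)
    print("loss:", contrastive_loss(scores))
    print("top-1:", top_k_accuracy(scores, 1))
    print("top-10:", top_k_accuracy(scores, 10))
```

Under this reading, a figure such as top-10 accuracy out of 1,594 segments corresponds to `top_k_accuracy(scores, 10)` computed with 1,594 candidate columns. With uninformative scores the loss equals the logarithm of the number of candidates, which gives a chance-level reference.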