Brain decoding, understood as the process of mapping brain activity to the stimuli that generated it, has been an active research area in recent years. In the case of language stimuli, recent studies have shown that it is possible to decode an fMRI scan into an embedding of the word a subject is reading. However, such word embeddings are designed for natural language processing tasks rather than for brain decoding, and therefore limit our ability to recover the precise stimulus. In this work, we propose to classify an fMRI scan directly, mapping it to the corresponding word within a fixed vocabulary. Unlike existing work, we evaluate on scans from previously unseen subjects. We argue that this is a more realistic setup, and we present a model that can decode fMRI data from unseen subjects. Our model achieves 5.22% Top-1 and 13.59% Top-5 accuracy on this challenging task, significantly outperforming all the competitive baselines we consider. Furthermore, we use the decoded words to guide language generation with the GPT-2 model. In this way, we advance the quest for a system that translates brain activity into coherent text.
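To make the evaluation protocol concrete, the following is a minimal sketch of how Top-k accuracy over a fixed vocabulary and a split by held-out subject could be computed. It is an illustration under stated assumptions, not the authors' implementation: the function names `top_k_accuracy` and `split_by_subject`, the subject labels, and the toy scores are all hypothetical, and the toy numbers are not data from the paper.

```python
from typing import Hashable, Iterable, List, Sequence, Tuple


def top_k_accuracy(
    scores: Sequence[Sequence[float]], targets: Sequence[int], k: int
) -> float:
    """Fraction of scans whose true word index is among the k highest-scoring words."""
    if len(scores) != len(targets):
        raise ValueError("scores and targets must have the same length")
    if not scores:
        raise ValueError("need at least one scan")
    hits = 0
    for row, target in zip(scores, targets):
        # Indices of the k largest scores for this scan.
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        hits += target in top_k
    return hits / len(targets)


def split_by_subject(
    subject_ids: Sequence[Hashable], held_out: Iterable[Hashable]
) -> Tuple[List[int], List[int]]:
    """Return (train_idx, test_idx) so that no held-out subject appears in training."""
    held_out_set = set(held_out)
    train = [i for i, s in enumerate(subject_ids) if s not in held_out_set]
    test = [i for i, s in enumerate(subject_ids) if s in held_out_set]
    return train, test


# Toy, made-up classifier scores over a 4-word vocabulary (hypothetical values).
toy_scores = [
    [0.10, 0.60, 0.20, 0.10],  # true word 1: ranked first
    [0.40, 0.30, 0.20, 0.10],  # true word 1: ranked second
    [0.70, 0.15, 0.10, 0.05],  # true word 3: ranked last
]
toy_targets = [1, 1, 3]

# Hypothetical subject labels, one per scan; subject "s3" is held out for testing.
toy_subjects = ["s1", "s2", "s3"]
train_idx, test_idx = split_by_subject(toy_subjects, held_out=["s3"])
```

Splitting by subject rather than by scan is what keeps every test scan from a person the model has never seen, which is the setting the abstract describes as more realistic.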