A fundamental question in neurolinguistics concerns the brain regions involved in syntactic and semantic processing during speech comprehension, both at the lexical level (word processing) and at the supra-lexical level (sentence and discourse processing). To what extent are these regions separated or intertwined? To address this question, we trained a lexical language model, GloVe, and a supra-lexical language model, GPT-2, on a text corpus from which we selectively removed either syntactic or semantic information. We then assessed to what extent these information-restricted models were able to predict the fMRI signal time courses of humans listening to naturalistic text. We also manipulated the amount of contextual information provided to GPT-2 in order to determine the integration windows of the brain regions involved in supra-lexical processing. Our analyses show that, while most brain regions involved in language are sensitive to both syntactic and semantic variables, the relative magnitudes of these effects vary considerably across regions. Furthermore, we found an asymmetry between the left and right hemispheres: semantic and syntactic processing are more dissociated in the left hemisphere than in the right, and the left and right hemispheres show greater sensitivity to short and long contexts, respectively. The use of information-restricted NLP models thus sheds new light on the spatial organization of syntactic processing, semantic processing and compositionality.
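To make the prediction step concrete, here is a minimal sketch of how features from two information-restricted models could be compared on their ability to predict a brain signal. The abstract does not specify the regression method, so the ridge regression, the held-out Pearson correlation score, and all names (`ridge_fit`, `encoding_score`, `syntactic`, `semantic`, `bold`) are assumptions made for illustration. The data are synthetic random arrays, not fMRI recordings, and the simulated voxels are driven by the "semantic" features by construction. Hemodynamic response modelling is omitted.

```python
import numpy as np


def ridge_fit(X, Y, alpha):
    """Closed-form ridge regression: W = (X^T X + alpha * I)^-1 X^T Y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ Y)


def voxelwise_correlation(Y_true, Y_pred):
    """Pearson correlation between measured and predicted signal, per voxel."""
    yt = Y_true - Y_true.mean(axis=0)
    yp = Y_pred - Y_pred.mean(axis=0)
    denom = np.linalg.norm(yt, axis=0) * np.linalg.norm(yp, axis=0)
    return (yt * yp).sum(axis=0) / denom


def encoding_score(X, Y, n_train, alpha=1.0):
    """Fit on the first n_train time points and score on the remaining ones."""
    W = ridge_fit(X[:n_train], Y[:n_train], alpha)
    return voxelwise_correlation(Y[n_train:], X[n_train:] @ W)


# Synthetic stand-ins (assumption): one feature row per scan for each model.
rng = np.random.default_rng(0)
n_scans, n_dims, n_voxels = 400, 10, 5
syntactic = rng.standard_normal((n_scans, n_dims))
semantic = rng.standard_normal((n_scans, n_dims))

# Simulated voxels driven only by the semantic features, plus noise.
true_weights = rng.standard_normal((n_dims, n_voxels))
bold = semantic @ true_weights + 0.5 * rng.standard_normal((n_scans, n_voxels))

r_syntactic = encoding_score(syntactic, bold, n_train=300)
r_semantic = encoding_score(semantic, bold, n_train=300)
print("mean r, syntactic features:", r_syntactic.mean())
print("mean r, semantic features:", r_semantic.mean())
```

In this toy setting the semantic features score higher only because the voxels were simulated that way. With real data, the per-voxel difference between the two scores is what would indicate a region's relative sensitivity to each type of information.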