More predictable words are easier to process: they are read faster and elicit smaller neural signals associated with processing difficulty, most notably the N400 component of the event-related brain potential. Thus, it has been argued that the prediction of upcoming words is a key component of language comprehension, and that studying the amplitude of the N400 is a valuable way to investigate the predictions we make. In this study, we investigate whether the linguistic predictions of computational language models or those of humans better reflect the way in which natural language stimuli modulate the amplitude of the N400. One important difference between the two is that while language models base their predictions exclusively on the preceding linguistic context, humans may also rely on other factors. We find that the predictions of three top-of-the-line contemporary language models (GPT-3, RoBERTa, and ALBERT) match the N400 more closely than human predictions do. This suggests that the predictive processes underlying the N400 may be more sensitive to the surface-level statistics of language than previously thought.
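To make concrete what it means for a language model to base its predictions exclusively on the preceding linguistic context, the sketch below computes word surprisal (negative log probability of a word given its context), a standard measure of a model's predictability estimate. This is a minimal illustration under stated assumptions, not the authors' pipeline: GPT-3 is accessible only via an API and RoBERTa/ALBERT are masked models scored differently, so the openly available GPT-2 stands in here; the Hugging Face transformers library and the example sentences are assumptions of the sketch.

import math
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

# GPT-2 as an open stand-in for the causal models discussed above.
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def word_surprisal(context: str, word: str) -> float:
    """Surprisal, in bits, of `word` given the preceding `context`."""
    context_ids = tokenizer.encode(context)
    # A leading space makes GPT-2's BPE treat this as a word-initial token.
    word_ids = tokenizer.encode(" " + word)
    input_ids = torch.tensor([context_ids + word_ids])
    with torch.no_grad():
        logits = model(input_ids).logits        # shape: (1, seq_len, vocab)
    log_probs = torch.log_softmax(logits, dim=-1)
    # Sum log-probabilities over the word's subtokens; the logits at
    # position p predict the token at position p + 1.
    total = sum(
        log_probs[0, len(context_ids) + i - 1, tok].item()
        for i, tok in enumerate(word_ids)
    )
    return -total / math.log(2)  # convert natural log to bits

# A highly predictable continuation should yield lower surprisal than an
# anomalous one (illustrative sentence pair, not the study's stimuli).
print(word_surprisal("He spread the warm bread with", "butter"))
print(word_surprisal("He spread the warm bread with", "socks"))

In a comparison like the one described here, such surprisal values (or the corresponding probabilities) would be related to single-trial N400 amplitudes, in the same way that human cloze probabilities are traditionally used.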