We advance an information-theoretic model of human language processing in the brain in which incoming linguistic input is processed at two levels: in terms of a heuristic interpretation and in terms of error correction. We propose that these two kinds of information processing have distinct electroencephalographic signatures, corresponding to the well-documented N400 and P600 components of language-related event-related potentials (ERPs). Formally, we show that the information content (surprisal) of a word in context can be decomposed into two quantities: (A) heuristic surprise, which signals the processing difficulty of a word given its inferred context and corresponds to the N400 signal; and (B) the discrepancy signal, which reflects the divergence between the true context and the inferred context and corresponds to the P600 signal. Both quantities can be estimated using modern NLP techniques. We validate our theory by successfully simulating ERP patterns elicited by a variety of linguistic manipulations in previously reported experimental data from Ryskin et al. (2021). Our theory is in principle compatible with traditional cognitive theories that assume a "good-enough" heuristic interpretation stage, but it gives that stage a precise information-theoretic formulation.
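The abstract does not state the decomposition explicitly. One standard identity with this two-part shape, assuming a latent-context model in which the word w depends on the true context c only through an inferred context ĉ, is -log p(w | c) = E_{ĉ ~ p(ĉ | c, w)}[-log p(w | ĉ)] + D_KL(p(ĉ | c, w) || p(ĉ | c)). The first term plays the role of heuristic surprise (difficulty of w under the inferred readings), and the second plays the role of a discrepancy signal (how far the reading had to move away from the prior interpretation). Treat this as an illustrative sketch under that assumption, not as the authors' exact formulation. The toy numbers and the function name `decompose_surprisal` are hypothetical; in practice, the probabilities would come from language models.

```python
import math

# Hypothetical toy model (illustrative numbers, not data from the paper):
# the true context c admits three candidate inferred contexts c_hat.
# prior[k]      = p(c_hat_k | c): plausibility of each reading before seeing w
# likelihood[k] = p(w | c_hat_k): probability of the next word under each reading
prior = [0.7, 0.2, 0.1]
likelihood = [0.01, 0.3, 0.5]


def decompose_surprisal(prior, likelihood):
    """Split -log2 p(w|c) into an expected-surprisal term and a KL term (bits).

    Assumes w depends on c only through c_hat, so
    p(w|c) = sum_k p(c_hat_k|c) * p(w|c_hat_k).
    """
    p_w = sum(p * l for p, l in zip(prior, likelihood))  # marginal p(w | c)
    surprisal = -math.log2(p_w)
    # Posterior over inferred contexts after seeing w: q_k = p(c_hat_k | c, w)
    posterior = [p * l / p_w for p, l in zip(prior, likelihood)]
    # (A) heuristic-surprise analogue: expected surprisal of w under the posterior
    heuristic = sum(q * -math.log2(l) for q, l in zip(posterior, likelihood) if q > 0)
    # (B) discrepancy analogue: KL divergence of the posterior from the prior
    discrepancy = sum(q * math.log2(q / p) for q, p in zip(posterior, prior) if q > 0)
    return surprisal, heuristic, discrepancy


s, h, d = decompose_surprisal(prior, likelihood)
print(f"surprisal={s:.3f}  heuristic={h:.3f}  discrepancy={d:.3f}")
```

Because the posterior is exact here, the two terms sum to the total surprisal, and both are non-negative. If the prior already favors a reading that predicts w well, the discrepancy term shrinks and most of the surprisal is carried by the heuristic term.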