To model behavioral and neural correlates of language comprehension in naturalistic environments, researchers have turned to broad-coverage tools from natural-language processing and machine learning. Where syntactic structure is explicitly modeled, prior work has relied predominantly on context-free grammars (CFGs), yet such formalisms are not sufficiently expressive for human languages. Combinatory Categorial Grammars (CCGs) are sufficiently expressive, directly compositional models of grammar with flexible constituency that affords incremental interpretation. In this work we evaluate whether a more expressive CCG provides a better model than a CFG for human neural signals collected with fMRI while participants listen to an audiobook story. We further test between variants of CCG that differ in how they handle optional adjuncts. These evaluations are carried out against a baseline that includes estimates of next-word predictability from a Transformer neural network language model. Such a comparison reveals unique contributions of CCG structure-building predominantly in the left posterior temporal lobe: CCG-derived measures offer a superior fit to neural signals compared to those derived from a CFG. These effects are spatially distinct from bilateral superior temporal effects that are unique to predictability. Neural effects for structure-building are thus separable from predictability during naturalistic listening, and those effects are best characterized by a grammar whose expressive power is motivated on independent linguistic grounds.