Temporary syntactic ambiguities arise when the beginning of a sentence is compatible with multiple syntactic analyses. We investigate to what extent neural language models (LMs) exhibit uncertainty over such analyses when processing temporarily ambiguous inputs, and how that uncertainty is modulated by disambiguating cues. We probe the LM's expectations by generating from it: we use stochastic decoding to derive a set of sentence completions, and estimate the probability that the LM assigns to each interpretation from the distribution of parses across completions. Unlike scoring-based methods for targeted syntactic evaluation, this technique makes it possible to explore completions that are not hypothesized in advance by the researcher. We apply this method to study the behavior of two LMs (GPT2 and an LSTM) on three types of temporary ambiguity, using materials from human sentence processing experiments. We find that LMs can track multiple analyses simultaneously, and that the degree of uncertainty varies across constructions and contexts. In response to disambiguating cues, the LMs often select the correct interpretation, but occasional errors point to areas for improvement.
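To make the sampling-and-parsing procedure concrete, the sketch below shows one way it could be implemented, assuming the HuggingFace transformers library and GPT2. This is a minimal illustration, not the paper's exact pipeline: in particular, the `classify` argument is a hypothetical stand-in for the parser-based step that maps each completion to one of the competing syntactic analyses.

```python
from collections import Counter

import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()


def sample_completions(prefix, n=100, max_new_tokens=15):
    """Draw n completions of a temporarily ambiguous prefix via stochastic decoding."""
    input_ids = tokenizer(prefix, return_tensors="pt").input_ids
    with torch.no_grad():
        outputs = model.generate(
            input_ids,
            do_sample=True,  # sample from the LM's distribution instead of greedy/beam search
            max_new_tokens=max_new_tokens,
            num_return_sequences=n,
            pad_token_id=tokenizer.eos_token_id,
        )
    # Strip the prefix tokens so only the generated continuation remains.
    return [
        tokenizer.decode(out[input_ids.shape[1]:], skip_special_tokens=True)
        for out in outputs
    ]


def interpretation_distribution(prefix, classify, n=100):
    """Estimate P(analysis | prefix) as the relative frequency of each parse
    label across the sampled completions. `classify` maps a completion string
    to an analysis label (in the paper's setup, via a syntactic parser)."""
    labels = [classify(c) for c in sample_completions(prefix, n)]
    return {label: count / n for label, count in Counter(labels).items()}


# Toy usage for an NP/S ambiguity ("the detective" = direct object vs. subject
# of an embedded clause). The lambda is a deliberately crude proxy for a parser:
# a completion starting with a finite verb signals the embedded-clause reading.
dist = interpretation_distribution(
    "The suspect knew the detective",
    classify=lambda c: "embedded-S" if c.split()[:1] in (["was"], ["had"]) else "direct-object",
    n=50,
)
print(dist)
```

Because the estimate is a relative frequency over samples rather than a score for a pre-specified continuation, it can surface interpretations the researcher did not anticipate; the cost is sampling noise, which shrinks as n grows.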