Understanding longer narratives or participating in conversations requires tracking the discourse entities that have been mentioned. Indefinite noun phrases (NPs), such as 'a dog', frequently introduce discourse entities, but this behavior is modulated by sentential operators such as negation. For example, 'a dog' in 'Arthur doesn't own a dog' does not introduce a discourse entity because of the negation. In this work, we adapt the psycholinguistic assessment of language models paradigm to higher-level linguistic phenomena and introduce an English evaluation suite that targets knowledge of the interactions between sentential operators and indefinite NPs. We use this evaluation suite for a fine-grained investigation of the entity tracking abilities of the Transformer-based models GPT-2 and GPT-3. We find that while the models are to a certain extent sensitive to the interactions we investigate, they are all challenged by the presence of multiple NPs, and their behavior is not systematic. This suggests that even models at the scale of GPT-3 do not fully acquire basic entity tracking abilities.